Robots.txt blocking internal resources on a WordPress site
-
Hi all,
We've recently migrated a WordPress website from staging to live, but the robots.txt file was deleted in the process. I've created the following new one:
User-agent: *
Allow: /
Disallow: /wp-admin/
Disallow: /wp-includes/
Disallow: /wp-content/plugins/
Disallow: /wp-content/cache/
Disallow: /wp-content/themes/
Allow: /wp-admin/admin-ajax.php
However, in the Semrush site audit I now get a warning that a lot of pages have issues with internal resources blocked by the robots.txt file. These blocked internal resources are all cached and minified assets: CSS files, images, and scripts.
Does this mean that Google won't crawl and render parts of these pages correctly because of the blocked resources, and thus won't be able to follow those links or index the images? In other words, is this any cause for concern regarding SEO?
Of course I can change the robots.txt again, but will URLs like https://example.com/wp-content/cache/minify/df983.js end up in the index?
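For what it's worth, here is a minimal Python sketch of how Google resolves these rules (the most specific matching path wins, and on a tie Allow wins), run against a few hypothetical paths; it ignores wildcards and other edge cases of the full spec, and the same check can be re-run against any revised set of rules:

# Minimal sketch of Google's longest-match rule for robots.txt:
# the most specific matching path wins; on a tie, Allow wins.
# The rules below mirror the robots.txt posted above; the paths are examples.

rules = [
    ("allow", "/"),
    ("disallow", "/wp-admin/"),
    ("disallow", "/wp-includes/"),
    ("disallow", "/wp-content/plugins/"),
    ("disallow", "/wp-content/cache/"),
    ("disallow", "/wp-content/themes/"),
    ("allow", "/wp-admin/admin-ajax.php"),
]

def is_allowed(path, rules):
    # Keep every rule whose path is a prefix of the requested path,
    # then pick the longest one; an Allow rule wins a tie in length.
    matches = [(len(rule_path), kind) for kind, rule_path in rules
               if path.startswith(rule_path)]
    if not matches:
        return True  # no rule matches, so crawling is allowed
    matches.sort(key=lambda m: (m[0], m[1] == "allow"))
    return matches[-1][1] == "allow"

for path in ["/some-page/",
             "/wp-content/uploads/photo.jpg",
             "/wp-content/cache/minify/df983.js",
             "/wp-admin/admin-ajax.php"]:
    print(path, "->", "allowed" if is_allowed(path, rules) else "blocked")

With these rules, the minified file under /wp-content/cache/ is blocked from crawling, which is what the Semrush audit is flagging; whether such a URL ends up in the index is a separate question.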
Thanks for your thoughts!
-
Thanks for the answer!
Last question: is /wp-admin/admin-ajax.php an important file that needs to be crawlable? I found this explanation: https://wordpress.stackexchange.com/questions/190993/why-use-admin-ajax-php-and-how-does-it-work/191073#191073
However, on this specific website there is no HTML at all when I check that file's output, only a single line with a 0 on it.
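For reference, the bare 0 seems to be WordPress's default reply when admin-ajax.php is called without an action parameter; plugins and themes normally call it from JavaScript with an action, so viewing the file directly doesn't show its real use. Here is a minimal sketch that reproduces the check outside the browser (example.com stands in for the real domain, and the exact status code can vary between WordPress versions):

# Request admin-ajax.php with no "action" parameter and show what comes back.
# WordPress answers with a bare "0" in that case (often with HTTP 400),
# which is why the source view shows only a single line containing 0.

import urllib.error
import urllib.request

url = "https://example.com/wp-admin/admin-ajax.php"  # placeholder domain

try:
    with urllib.request.urlopen(url, timeout=10) as resp:
        status, body = resp.status, resp.read().decode()
except urllib.error.HTTPError as err:
    # Recent WordPress versions send 400 Bad Request here; the body is still "0".
    status, body = err.code, err.read().decode()

print(status, repr(body))  # e.g. 400 '0'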
-
I would leave all the Disallow rules out except for the /wp-admin/ one. For example, I'd rewrite the robots.txt file to read:
User-agent: *
Disallow: /wp-admin/
Also, you generally do want Google to index your cached content: in the event your server goes down, Google will still be able to make your content available.
I hope that helps. Let me know how that works out for you!
-
Thanks for the clear answer.
I've changed the robots.txt to:
User-agent: *
Allow: /
Disallow: /wp-admin/
Disallow: /wp-includes/
Disallow: /wp-content/themes/
Allow: /wp-admin/admin-ajax.php
This should avoid problems with (parts of) the cached content not being indexed.
Or should I leave all the Disallows out?
-
Hey there --
Blocking resources with the robots.txt file prevents search engines from crawling that content; if your goal is to keep content out of the index, a noindex tag is better suited for that.
Previous best practice was to block access to the /wp-includes/ and /wp-content/ directories and the like, but that's no longer necessary.
Today, Google fetches all your CSS and JavaScript files so it can render your pages completely. Search engines now treat your page's layout and presentation as a key part of how they evaluate quality.
So yes, blocking those resources might have some impact on your SEO.
Also, if you're using a plugin to cache content, you do want Google to be able to crawl that cached content. And in my experience, Googlebot does a good job of not indexing files under /wp-content/.
So your example URL, https://example.com/wp-content/cache/minify/df983.js, shouldn't end up in the index.
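One practical note on the crawl-versus-index distinction: for non-HTML files like those minified scripts there is no meta robots tag you can add, so if you ever did want an explicit noindex signal on them, it would go in an X-Robots-Tag response header, and Google can only see that header if the file stays crawlable. Here's a minimal, purely illustrative sketch using Python's built-in http.server (a real WordPress install would set the header in its Apache or nginx configuration instead):

# Serve static files but add "X-Robots-Tag: noindex" to JS and CSS responses:
# the files stay crawlable (so pages that reference them can be rendered),
# while search engines are asked not to index the files themselves.

from http.server import HTTPServer, SimpleHTTPRequestHandler

class NoIndexStaticHandler(SimpleHTTPRequestHandler):
    def end_headers(self):
        if self.path.endswith((".js", ".css")):
            self.send_header("X-Robots-Tag", "noindex")
        super().end_headers()

if __name__ == "__main__":
    # Serves the current directory on http://localhost:8000 for demonstration.
    HTTPServer(("localhost", 8000), NoIndexStaticHandler).serve_forever()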
Hope this helps some.