Robots.txt blocking internal WordPress resources
-
Hi all,
We've recently migrated a WordPress website from staging to live, but the robots.txt file was deleted in the process. I've created the following new one:
User-agent: *
Allow: /
Disallow: /wp-admin/
Disallow: /wp-includes/
Disallow: /wp-content/plugins/
Disallow: /wp-content/cache/
Disallow: /wp-content/themes/
Allow: /wp-admin/admin-ajax.php
However, the SemRush site audit now reports that a lot of pages have issues with blocked internal resources in the robots.txt file. These blocked internal resources are all cached and minified assets: stylesheets, images and scripts.
Does this mean that Google won't be able to render parts of these pages correctly because of the blocked resources, and therefore won't follow those links or index the images? In other words, is this a cause for concern for SEO?
Of course I can change the robots.txt again, but will URLs like https://example.com/wp-content/cache/minify/df983.js end up in the index?
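For what it's worth, a quick way to sanity-check which of those paths the rules block is Python's built-in urllib.robotparser. A rough sketch (the URLs are just the placeholder examples from above; note that Python applies rules top-down rather than by longest match like Googlebot, so the Allow exception is listed first and the redundant Allow: / is left out):

import urllib.robotparser

# The same directives as the robots.txt above, reordered for Python's
# first-match-wins parser so it mirrors Google's longest-match behaviour.
rules = """\
User-agent: *
Allow: /wp-admin/admin-ajax.php
Disallow: /wp-admin/
Disallow: /wp-includes/
Disallow: /wp-content/plugins/
Disallow: /wp-content/cache/
Disallow: /wp-content/themes/
"""

parser = urllib.robotparser.RobotFileParser()
parser.parse(rules.splitlines())

# Placeholder URLs for illustration.
for url in [
    "https://example.com/wp-content/cache/minify/df983.js",
    "https://example.com/wp-admin/admin-ajax.php",
    "https://example.com/sample-page/",
]:
    print(url, "->", "allowed" if parser.can_fetch("*", url) else "blocked")

The cached minify file comes back as blocked, which matches what SemRush is flagging.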
Thanks for your thoughts!
-
Thanks for the answer!
Last question: is /wp-admin/admin-ajax.php an important file that needs to stay crawlable? I found this explanation: https://wordpress.stackexchange.com/questions/190993/why-use-admin-ajax-php-and-how-does-it-work/191073#191073
However, on this specific website, when I open that URL and view the source there is no HTML at all, only a single line with 0 on it.
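If I understand that explanation correctly, the single 0 is just the default reply admin-ajax.php sends back when no registered action matches the request, so seeing 0 in the browser doesn't necessarily mean the file is broken. A rough Python sketch of what a request to it looks like (the domain and the action name are made up for illustration):

import urllib.error
import urllib.parse
import urllib.request

# "my_plugin_action" is a hypothetical action name; real themes and plugins
# register their own handlers via the WordPress AJAX hooks.
data = urllib.parse.urlencode({"action": "my_plugin_action"}).encode()
req = urllib.request.Request("https://example.com/wp-admin/admin-ajax.php", data=data)

try:
    with urllib.request.urlopen(req) as resp:
        body = resp.read().decode()
except urllib.error.HTTPError as err:
    # WordPress answers with a plain "0" (often alongside a 4xx status)
    # when nothing is hooked to the requested action.
    body = err.read().decode()

print(body)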
-
I would leave all the disallows out except for the /wp-admin/ section. For example, I'd rewrite the robots.txt file to read:
User-agent: *
Disallow: /wp-admin/
Also, you do want Google to index your cached content. In the event your servers go down, it will still be able to make your content available.
I hope that helps. Let me know how that works out for you!
-
Thanks for the clear answer.
I've changed the robots.txt to:
User-agent: *
Allow: /
Disallow: /wp-admin/
Disallow: /wp-includes/
Disallow: /wp-content/themes/
Allow: /wp-admin/admin-ajax.php
This should avoid problems with (parts of) the cached content not being indexed.
Or should I leave all the Disallows out?
-
Hey there --
Blocking resources with the robots.txt file prevents search engines from crawling that content; if the goal is to keep content out of the index, a noindex tag is better suited for that.
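For example, if the goal were to keep something out of the index while still letting Google crawl it, you'd normally handle that on the page or file itself rather than in robots.txt, roughly like this:
<meta name="robots" content="noindex">
or, for non-HTML files such as minified CSS and JS, with an HTTP response header:
X-Robots-Tag: noindex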
In the past, best practice was to block access to the /wp-includes/ and /wp-content/ directories and the like, but that's no longer necessary.
Today, Google will fetch all of your styling and JavaScript files so it can render your pages completely. Search engines now try to understand your page's layout and presentation as a key part of how they evaluate quality.
So, yes, blocking those resources might have some impact on your SEO.
Also, if you're using a plugin to cache content, you want Google to be able to crawl that cached content. And in my experience, Googlebot does a good job of not indexing files under /wp-content/ anyway.
So your example file, https://example.com/wp-content/cache/minify/df983.js, shouldn't end up in the index.
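If you want to double-check, a search like site:example.com inurl:wp-content/cache (with example.com swapped for your real domain) should come back empty or close to it.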
Hope this helps some.
Related Questions
-
Robots.txt for Facet Results
Hi, does anyone know how to properly add facet URLs to robots.txt? E.g. one of our facet URLs: http://www.key.co.uk/en/key/platform-trolleys-trucks#facet:-10028265807368&productBeginIndex:0&orderBy:5&pageView:list& Everything after the # will need to be blocked on all pages with a facet. Thank you
Intermediate & Advanced SEO | BeckyKey
-
International Site Migration
Hi guys, we're in the process of launching an international ecommerce site (Magento CMS) for two different countries (Australia and the US), then later on expanding to other countries like the UK, Canada, etc. The plan is for each country to have its own sub-folder, e.g. www.domain.com/us, www.domain.com.au/au, www.domain.com.au/uk. A lot of the content between these English-based countries is the same, e.g. the same product descriptions. So in order to prevent duplication, from what I've read we will need to add hreflang tags to every single page on the site? So for: Australian pages: United States pages: Just wanted to make sure this is the correct strategy (will hreflang prevent duplicate content issues?) and anything else I should be considering? Thank you, Chris
Intermediate & Advanced SEO | jayoliverwright
-
Baidu Spider appearing on robots.txt
Hi, I'm not too sure what to do about this or what to think of it. This magically appeared in my company's robots.txt file (literally magically appeared; the text is below):
User-agent: Baiduspider
User-agent: Baiduspider-video
User-agent: Baiduspider-image
Disallow: /
I know that Baidu is the Google of China, but I'm not sure why this would appear in our robots.txt all of a sudden. Should I be worried about a hack? Also, would I want to disallow Baidu from crawling my company's website? Thanks for your help, -Reed
Intermediate & Advanced SEO | IceIcebaby
-
International SEO Question
The company I work for has a website, www.example.com, that ranks very well in English-speaking countries (US, UK, CA). For legal reasons, we now need to create www.example.co.uk to be accessible and rank in google.co.uk. Obviously we want this change to be as smooth as possible with little effect on rankings in the UK. We have two options that we're talking through at the moment: 1) Use the hreflang tag on both the .com and the .co.uk to tell Google which site to rank in each country. My worry with this is that we might lose our rankings in the UK as it will be a brand new site with little to no links pointing to it. 2) 301 redirect to the .co.uk based on UK IP addresses. I'm skeptical about this: as a 301 passes most of the link juice, I'm not sure how Google would treat this type of thing. Would the .com lose ranking? So my questions are: would we lose ranking in the UK if we use option 1? Would option 2 work? What would you do? Any help is appreciated.
Intermediate & Advanced SEO | awestwood
-
URL blocked
Hi there, I have recently noticed that we have a link from an authoritative website; however, when I looked at the code, it looked like this: <a href="http://www.mydomain.com/" title="blocked::http://www.mydomain.com/">keyword</a> You will notice that the code contains 'blocked::'. What is this? Does it have the same effect as a nofollow tag? Thanks for any help
Intermediate & Advanced SEO | Paul78
-
Internal Anchor Text Penalty Clarification
I believe we may be seeing the initial stages of a penalty for over-using internal anchor text on our ecommerce site. Per Rand and other training, we added related product links and popular category links to our product and category pages. At the time, we did not have an HTML sitemap in the footer. We're a small-to-medium-sized site with 1,700+ products. We have since added an HTML sitemap of our categories to our footer. Now we have category links in the sitemap, plus category pages and product pages with targeted anchor text. I'm beginning to see downward movement on some of those targeted categories. If I have an HTML sitemap in the footer (category index), should I get rid of the popular category links throughout the rest of the site? Also, with more frequency, I'm seeing a "product index" and "category index" in footers. Is this a best practice? Thanks.
Intermediate & Advanced SEO | AWCthreads
-
Does It Really Matter to Restrict Dynamic URLs by Robots.txt?
Today, I was checking Google Webmaster Tools and found that 117 dynamic URLs are restricted by robots.txt. I have added the following syntax to my robots.txt (you can get more of an idea from the following Excel sheet):
#Dynamic URLs
Disallow: /?osCsid
Disallow: /?q=
Disallow: /?dir=
Disallow: /?p=
Disallow: /*?limit=
Disallow: /*review-form
I have concerns about the following kinds of pages. Sorting by specification: http://www.vistastores.com/table-lamps?dir=asc&order=name Items per page: http://www.vistastores.com/table-lamps?dir=asc&limit=60&order=name Product pagination: http://www.vistastores.com/table-lamps?p=2 Will this hold back the organic performance of my category pages?
Intermediate & Advanced SEO | CommercePundit
-
Site Wide Internal Navigation links
Hello all, all our category pages (www.pitchcare.com/shop) are linked to from every product page via the sidebar navigation, which results in every category page having over 1,700 links with the same anchor text. I have noticed that the category pages don't appear to be ranked when they most definitely should be. For example, http://www.pitchcare.com/shop/moss-control/index.html is not ranked for the term "moss control"; instead, another of our deeper pages is ranked on page 1. Reading a previous SEOmoz article, "Excessive Internal Anchor Text Linking / Manipulation Can Trip An Automated Penalty on Google": "I recently had my second run-in with a penalty at Google that appears to punish sites for excessive internal linking with 'optimized' (or 'keyword stuffed anchor text') links. When the links were removed (in both cases, they were found in the footer of the website sitewide), the rankings were restored immediately following Google's next crawl, indicating a fully automated filter (rather than a manual penalty requiring a re-consideration request)." Do you think we may have triggered a penalty? If so, what would be the best way to tackle this? Could we add nofollows on the product pages? Cheers, Todd
Intermediate & Advanced SEO | toddyC