Moz Q&A is closed.
After more than 13 years, and tens of thousands of questions, Moz Q&A closed on 12th December 2024. Whilst we’re not completely removing the content - many posts will still be possible to view - we have locked both new posts and new replies. More details here.
Robots.txt blocked internal resources WordPress
-
Hi all,
We've recently migrated a WordPress website from staging to live, but the robots.txt was deleted. I've created the following new one:
User-agent: *
Allow: /
Disallow: /wp-admin/
Disallow: /wp-includes/
Disallow: /wp-content/plugins/
Disallow: /wp-content/cache/
Disallow: /wp-content/themes/
Allow: /wp-admin/admin-ajax.php
However, in the site audit on SemRush, I now get a warning that a lot of pages have issues with blocked internal resources in the robots.txt file. These blocked internal resources are all cached and minified CSS elements: links, images and scripts.
Does this mean that Google won't crawl some parts of these pages with blocked resources correctly and thus won't be able to follow these links and index the images? In other words, is this any cause for concern regarding SEO?
Of course I can change the robots.txt again, but will URLs like https://example.com/wp-content/cache/minify/df983.js end up in the index?
Thanks for your thoughts!
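To see which paths the rules above actually block, here is a minimal Python sketch. It assumes the longest-match precedence that Google documents for robots.txt (the most specific matching rule wins, and a tie goes to Allow). The function name `is_allowed` is made up for illustration, and wildcards (`*`, `$`) are not handled. Python's built-in `urllib.robotparser` is not used because, as far as I know, it checks rules in file order, which would let the leading `Allow: /` match everything.

```python
def is_allowed(rules, path):
    """Return True if `path` may be crawled under `rules`.

    `rules` is a list of (directive, path_prefix) tuples for one
    user-agent group. The longest matching prefix wins; on a tie,
    Allow wins. Wildcards are NOT handled in this sketch.
    """
    best_len = -1
    allowed = True  # no matching rule means the path is crawlable
    for directive, prefix in rules:
        if not path.startswith(prefix):
            continue
        is_allow = directive.lower() == "allow"
        if len(prefix) > best_len or (len(prefix) == best_len and is_allow):
            best_len = len(prefix)
            allowed = is_allow
    return allowed


# The rules from the robots.txt in the question, in file order.
ORIGINAL_RULES = [
    ("Allow", "/"),
    ("Disallow", "/wp-admin/"),
    ("Disallow", "/wp-includes/"),
    ("Disallow", "/wp-content/plugins/"),
    ("Disallow", "/wp-content/cache/"),
    ("Disallow", "/wp-content/themes/"),
    ("Allow", "/wp-admin/admin-ajax.php"),
]

for path in [
    "/wp-content/cache/minify/df983.js",
    "/wp-admin/admin-ajax.php",
    "/wp-admin/options.php",
]:
    print(path, "allowed" if is_allowed(ORIGINAL_RULES, path) else "blocked")
```

Under these assumptions the minified cache file is blocked from crawling, which matches what the site audit reports.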
-
Thanks for the answer!
Last question: is /wp-admin/admin-ajax.php an important part that has to be crawled? I found this explanation: https://wordpress.stackexchange.com/questions/190993/why-use-admin-ajax-php-and-how-does-it-work/191073#191073
However, on this specific website there is no html at all when I check the source code, only one line with 0 on it.
-
I would leave all the disallows out except for the /wp-admin/ section. For example, I'd rewrite the robots.txt file to read:
User-agent: *
Disallow: /wp-admin/
Also, you kind of want Google to index your cached content. In the event your servers go down it will still be able to make your content available.
I hope that helps. Let me know how that works out for you!
-
Thanks for the clear answer.
I've changed the robots.txt to:
User-agent: *
Allow: /
Disallow: /wp-admin/
Disallow: /wp-includes/
Disallow: /wp-content/themes/
Allow: /wp-admin/admin-ajax.php
This should avoid problems with not indexing (parts of) cached content.
Or should I leave all the Disallows out?
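A rough way to check whether a page still references blocked resources under this revised file is sketched below in Python. It collects script, stylesheet and image URLs from a page's HTML and flags those under the remaining Disallow prefixes. The sample HTML, the theme name `mytheme` and the helper names are made up for illustration, and the prefix check is deliberately simplified: it ignores the `Allow: /wp-admin/admin-ajax.php` override.

```python
from html.parser import HTMLParser
from urllib.parse import urlparse

# Disallow prefixes from the revised robots.txt above.
# Simplification: the Allow override for admin-ajax.php is ignored.
DISALLOWED_PREFIXES = ["/wp-admin/", "/wp-includes/", "/wp-content/themes/"]


class ResourceCollector(HTMLParser):
    """Collect URLs of scripts, stylesheets and images referenced by a page."""

    def __init__(self):
        super().__init__()
        self.urls = []

    def handle_starttag(self, tag, attrs):
        # Which attribute holds the resource URL for each tag we care about.
        attr_name = {"script": "src", "img": "src", "link": "href"}.get(tag)
        if attr_name is None:
            return
        value = dict(attrs).get(attr_name)
        if value:
            self.urls.append(value)


def blocked_resources(html):
    """Return the referenced URLs whose path starts with a disallowed prefix."""
    collector = ResourceCollector()
    collector.feed(html)
    return [
        url
        for url in collector.urls
        if any(urlparse(url).path.startswith(p) for p in DISALLOWED_PREFIXES)
    ]


# Hypothetical page source, only for demonstration.
SAMPLE_HTML = """
<link rel="stylesheet" href="https://example.com/wp-content/themes/mytheme/style.css">
<script src="https://example.com/wp-includes/js/jquery/jquery.js"></script>
<script src="https://example.com/wp-content/cache/minify/df983.js"></script>
<img src="https://example.com/wp-content/uploads/photo.jpg">
"""

for url in blocked_resources(SAMPLE_HTML):
    print("blocked:", url)
```

In this made-up sample the cache file is now crawlable, but the theme stylesheet and the script under /wp-includes/ would still be reported as blocked.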
-
Hey there --
Blocking resources with the robots.txt file prevents search engines from crawling that content; the noindex tag is better suited to preventing content from being indexed.
Previous best practice was to block access to the /wp-includes/ and /wp-content/ directories, but that's no longer necessary.
Today, Google will fetch all your styling and JavaScript files so they can render your pages completely. Search engines now try to understand your page's layout and presentation as a key part of how they evaluate quality.
So, yeah, this might have some impact on your SEO.
Also, if you're using a plugin to cache content, you should want Google to crawl your cached content. And in my experience, Googlebot does a good job of not indexing /wp-content/ sections.
So your example URL, https://example.com/wp-content/cache/minify/df983.js, shouldn't end up in their index.
Hope this helps some.
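On the noindex point: a crawler can only see a noindex tag on a page it is allowed to fetch, so the tag should not be combined with a robots.txt Disallow for the same URL. Here is a small Python sketch for checking whether a page's HTML carries a robots meta tag with noindex. The class and function names are made up for illustration, and it only looks at `<meta name="robots">`, not at HTTP headers or crawler-specific meta names.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collect the directives found in <meta name="robots" content="..."> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = dict(attrs)
        if (attr.get("name") or "").lower() != "robots":
            return
        content = attr.get("content") or ""
        # Directives are comma-separated, e.g. "noindex, follow".
        for directive in content.split(","):
            directive = directive.strip().lower()
            if directive:
                self.directives.add(directive)


def has_noindex(html):
    """Return True if the HTML contains a robots meta tag with noindex."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return "noindex" in parser.directives


# Hypothetical page head, only for demonstration.
page = '<html><head><meta name="robots" content="noindex, follow"></head></html>'
print(has_noindex(page))
```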