Robots.txt blocked internal resources (WordPress)

Mat_C

Hi all,

We recently migrated a WordPress website from staging to live, but the robots.txt file was deleted in the process. I've created the following new one:

User-agent: *
Allow: /
Disallow: /wp-admin/
Disallow: /wp-includes/
Disallow: /wp-content/plugins/
Disallow: /wp-content/cache/
Disallow: /wp-content/themes/
Allow: /wp-admin/admin-ajax.php

However, the SemRush site audit now reports that a lot of pages have issues with blocked internal resources in the robots.txt file. All of these blocked internal resources are cached and minified elements: CSS links, images and scripts.
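For reference, Google documents (and RFC 9309 specifies) that when Allow and Disallow rules conflict, the most specific, i.e. longest, matching path wins, and Allow wins a tie. Under that reading, the file above blocks the cached, minified assets while still allowing admin-ajax.php. Below is a minimal Python sketch of that precedence, assuming a single `User-agent: *` group and plain path prefixes (no `*` or `$` wildcards). The cache URL and admin-ajax.php come from this thread; the other two sample paths are hypothetical.

```python
# Sketch of longest-match robots.txt precedence (RFC 9309): the matching rule
# with the longest path wins, and Allow wins a tie. Simplifications
# (assumptions): one "User-agent: *" group, plain prefixes only (no "*" or
# "$" wildcards), no percent-encoding normalisation.

ROBOTS_TXT = """\
User-agent: *
Allow: /
Disallow: /wp-admin/
Disallow: /wp-includes/
Disallow: /wp-content/plugins/
Disallow: /wp-content/cache/
Disallow: /wp-content/themes/
Allow: /wp-admin/admin-ajax.php
"""


def parse_rules(text: str) -> list[tuple[str, str]]:
    """Collect (directive, path) pairs from Allow/Disallow lines."""
    rules = []
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        field, sep, value = line.partition(":")
        if sep and field.strip().lower() in ("allow", "disallow"):
            rules.append((field.strip().lower(), value.strip()))
    return rules


def is_allowed(rules: list[tuple[str, str]], path: str) -> bool:
    """Longest matching rule decides; no matching rule means allowed."""
    best_len, allowed = -1, True
    for directive, prefix in rules:
        if not prefix or not path.startswith(prefix):
            continue  # empty rules and non-matching rules are ignored
        if len(prefix) > best_len or (len(prefix) == best_len and directive == "allow"):
            best_len, allowed = len(prefix), directive == "allow"
    return allowed


rules = parse_rules(ROBOTS_TXT)
for path in [
    "/wp-content/cache/minify/df983.js",  # minified asset from the question
    "/wp-admin/admin-ajax.php",
    "/wp-admin/options.php",              # hypothetical admin page
    "/sample-page/",                      # hypothetical public page
]:
    print(path, "allowed" if is_allowed(rules, path) else "blocked")
# /wp-content/cache/minify/df983.js blocked
# /wp-admin/admin-ajax.php allowed
# /wp-admin/options.php blocked
# /sample-page/ allowed
```

Python's standard urllib.robotparser isn't used here because it applies the first matching rule in file order, so with `Allow: /` listed first it would report every path as allowed.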

Does this mean that Google won't correctly crawl parts of the pages with blocked resources, and therefore won't be able to follow these links or index the images? In other words, is this a cause for concern for SEO?

Of course I can change the robots.txt again, but would URLs like https://example.com/wp-content/cache/minify/df983.js then end up in the index?

Thanks for your thoughts!

Mat_C

Thanks for the answer!

Last question: is /wp-admin/admin-ajax.php an important file that needs to be crawled? I found this explanation: https://wordpress.stackexchange.com/questions/190993/why-use-admin-ajax-php-and-how-does-it-work/191073#191073

However, on this particular website, when I check the source code of that URL there is no HTML at all, only a single line containing 0.

JordanLowry

I would leave out all the Disallow rules except the one for /wp-admin/. In other words, I'd rewrite the robots.txt file to read:

User-agent: *
Disallow: /wp-admin/

Also, you generally want Google to index your cached content. That way, if your servers go down, Google will still be able to make your content available.

I hope that helps. Let me know how that works out for you!

Mat_C

Thanks for the clear answer.

I've changed the robots.txt to:

User-agent: *
Allow: /
Disallow: /wp-admin/
Disallow: /wp-includes/
Disallow: /wp-content/themes/
Allow: /wp-admin/admin-ajax.php

This should avoid problems with (parts of) the cached content not being indexed.

Or should I leave all the Disallows out?
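Applying longest-match precedence (RFC 9309; Allow wins ties) to the two candidate files in this thread makes the difference concrete. This is a sketch assuming a single `User-agent: *` group and plain path prefixes; `MINIMAL` and `REVISED` are labels for the two-line file suggested above and the revised file just posted, and the theme and wp-includes paths are illustrative, not taken from this site.

```python
# Compare the two candidate robots.txt files from this thread using
# longest-match precedence (RFC 9309); Allow wins ties.
# Assumptions: one "User-agent: *" group, plain prefixes only (no wildcards).

MINIMAL = """\
User-agent: *
Disallow: /wp-admin/
"""

REVISED = """\
User-agent: *
Allow: /
Disallow: /wp-admin/
Disallow: /wp-includes/
Disallow: /wp-content/themes/
Allow: /wp-admin/admin-ajax.php
"""


def is_allowed(robots_txt: str, path: str) -> bool:
    """Return True if the longest matching rule allows the path."""
    best_len, allowed = -1, True  # no matching rule means allowed
    for line in robots_txt.splitlines():
        field, _, value = line.split("#", 1)[0].partition(":")
        field, value = field.strip().lower(), value.strip()
        if field not in ("allow", "disallow") or not value or not path.startswith(value):
            continue
        if len(value) > best_len or (len(value) == best_len and field == "allow"):
            best_len, allowed = len(value), field == "allow"
    return allowed


paths = [
    "/wp-content/cache/minify/df983.js",           # from the question
    "/wp-content/themes/example-theme/style.css",  # hypothetical theme file
    "/wp-includes/js/jquery/jquery.min.js",        # illustrative core script
    "/wp-admin/admin-ajax.php",
]
for path in paths:
    print(f"{path}: minimal={is_allowed(MINIMAL, path)} revised={is_allowed(REVISED, path)}")
# /wp-content/cache/minify/df983.js: minimal=True revised=True
# /wp-content/themes/example-theme/style.css: minimal=True revised=False
# /wp-includes/js/jquery/jquery.min.js: minimal=True revised=False
# /wp-admin/admin-ajax.php: minimal=False revised=True
```

Both versions unblock the cache directory. Under the minimal file, admin-ajax.php falls under `Disallow: /wp-admin/` because nothing re-allows it (WordPress's own default virtual robots.txt includes an `Allow: /wp-admin/admin-ajax.php` line), while the revised file still blocks theme and wp-includes CSS and JavaScript.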

JordanLowry

Hey there --

Blocking resources with the robots.txt file prevents search engines from crawling them; a noindex tag is better suited to preventing content from being indexed.
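A meta robots tag can only sit in an HTML page, so a file like the minified script in the question would need the equivalent X-Robots-Tag HTTP response header instead. One caveat: Google only sees that header if it is allowed to fetch the file, so the same path must not also be disallowed in robots.txt. On a WordPress site the header would normally be set in the Apache or Nginx configuration or by a plugin; the Python WSGI middleware below is only a hypothetical illustration of the mechanism, and `NOINDEX_PREFIX`, `static_app` and `headers_for` are names made up for this example.

```python
# Hypothetical sketch: WSGI middleware that adds "X-Robots-Tag: noindex" to
# responses whose path starts with an assumed prefix. The files stay
# crawlable (they must not be disallowed in robots.txt, or crawlers never
# see the header) but search engines are asked not to index them.
from typing import Callable, Iterable

NOINDEX_PREFIX = "/wp-content/cache/"  # assumption: the cache plugin's directory


def noindex_middleware(app: Callable) -> Callable:
    """Wrap a WSGI app so matching responses get an X-Robots-Tag header."""

    def wrapped(environ: dict, start_response: Callable) -> Iterable[bytes]:
        path = environ.get("PATH_INFO", "")

        def add_header(status: str, headers: list, exc_info=None):
            if path.startswith(NOINDEX_PREFIX):
                headers = list(headers) + [("X-Robots-Tag", "noindex")]
            return start_response(status, headers, exc_info)

        return app(environ, add_header)

    return wrapped


def static_app(environ, start_response):
    """Hypothetical stand-in for whatever actually serves the files."""
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"..."]


def headers_for(path: str) -> list:
    """Run one request through the middleware and return the response headers."""
    captured = []

    def record(status, headers, exc_info=None):
        captured.extend(headers)

    noindex_middleware(static_app)({"PATH_INFO": path}, record)
    return captured


print(headers_for("/wp-content/cache/minify/df983.js"))
# [('Content-Type', 'text/plain'), ('X-Robots-Tag', 'noindex')]
print(headers_for("/sample-page/"))
# [('Content-Type', 'text/plain')]
```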

Previously, best practice was to block access to the /wp-includes/ and /wp-content/ directories, among others, but that's no longer necessary.

Today, Google fetches your stylesheets and JavaScript files so it can render your pages completely. Search engines now try to understand a page's layout and presentation as a key part of how they evaluate its quality.

So yes, blocking these resources might have some impact on your SEO.
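To see which of a page's rendering resources the original robots.txt blocks, here is a rough Python sketch of the kind of check behind a "blocked internal resources" warning (not any audit tool's actual implementation). It collects the stylesheets, scripts and images a page references and flags the internal ones under the original Disallow prefixes for /wp-includes/ and /wp-content/. It assumes plain prefix matching, which suffices for these directories because the original file has no more specific Allow rule under them; the page markup and all URLs except the cache script are hypothetical.

```python
# Rough sketch: collect render-relevant resources (stylesheets, scripts,
# images) from a page and flag the internal ones under a Disallow prefix.
# Assumption: plain prefix matching, adequate here because the original
# robots.txt has no more specific Allow rule under these directories.
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse

DISALLOWED = ("/wp-includes/", "/wp-content/plugins/",
              "/wp-content/cache/", "/wp-content/themes/")


class ResourceCollector(HTMLParser):
    """Collects absolute URLs of stylesheets, scripts and images."""

    def __init__(self, base_url: str):
        super().__init__()
        self.base_url = base_url
        self.urls: list[str] = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        rel = (attrs.get("rel") or "").lower().split()
        if tag in ("script", "img"):
            url = attrs.get("src")
        elif tag == "link" and "stylesheet" in rel:
            url = attrs.get("href")
        else:
            url = None
        if url:
            self.urls.append(urljoin(self.base_url, url))


def blocked_internal_resources(html: str, base_url: str) -> list[str]:
    """Return same-host resource URLs whose path starts with a Disallow prefix."""
    collector = ResourceCollector(base_url)
    collector.feed(html)
    collector.close()
    host = urlparse(base_url).netloc
    return [
        u for u in collector.urls
        if urlparse(u).netloc == host and urlparse(u).path.startswith(DISALLOWED)
    ]


# Hypothetical page markup; only the df983.js URL comes from the question.
PAGE = """
<link rel="stylesheet" href="/wp-content/cache/minify/a1b2c.css">
<script src="https://example.com/wp-content/cache/minify/df983.js"></script>
<img src="/wp-content/uploads/photo.jpg">
<script src="https://cdn.example.net/lib.js"></script>
"""
for url in blocked_internal_resources(PAGE, "https://example.com/sample-page/"):
    print("blocked:", url)
# blocked: https://example.com/wp-content/cache/minify/a1b2c.css
# blocked: https://example.com/wp-content/cache/minify/df983.js
```

The uploads image and the external CDN script are not flagged: the first sits outside the disallowed directories, and the second is on another host, so this site's robots.txt does not govern it.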

Also, if you're using a plugin to cache content, you should want Google to crawl your cached content. In my experience, Googlebot does a good job of not indexing files in the /wp-content/ directories.

So your example URL, https://example.com/wp-content/cache/minify/df983.js, shouldn't end up in Google's index.

Hope this helps some.

Moz Q&A is closed.

Related Questions

Using hreflang for international pages - is this how you do it?

If Robots.txt have blocked an Image (Image URL) but the other page which can be indexed has this image, how is the image treated?

Do internal links from non-indexed pages matter?

"noindex, follow" or "robots.txt" for thin content pages

How Do I Generate a Sitemap for a Large Wordpress Site?

Soft 404's from pages blocked by robots.txt -- cause for concern?

Finding broken links / resources by topic

Robots.txt is blocking Wordpress Pages from Googlebot?
