Robots.txt blocked internal resources in WordPress
-
Hi all,
We've recently migrated a WordPress website from staging to live, but the robots.txt was deleted in the process. I've created the following new one:
User-agent: *
Allow: /
Disallow: /wp-admin/
Disallow: /wp-includes/
Disallow: /wp-content/plugins/
Disallow: /wp-content/cache/
Disallow: /wp-content/themes/
Allow: /wp-admin/admin-ajax.php
However, the site audit on SemRush now reports that a lot of pages have issues with internal resources blocked by the robots.txt file. These blocked internal resources are all cached and minified CSS elements: links, images and scripts.
Does this mean that Google won't crawl some parts of these pages with blocked resources correctly and thus won't be able to follow these links and index the images? In other words, is this any cause for concern regarding SEO?
Of course I can change the robots.txt again, but will URLs like https://example.com/wp-content/cache/minify/df983.js end up in the index?
Thanks for your thoughts!
-
Thanks for the answer!
Last question: is /wp-admin/admin-ajax.php an important part that has to be crawled? I found this explanation: https://wordpress.stackexchange.com/questions/190993/why-use-admin-ajax-php-and-how-does-it-work/191073#191073
However, on this specific website there is no HTML at all when I check the source code, only one line with 0 on it.
-
I would leave all the disallows out except for the /wp-admin/ section. For example, I'd rewrite the robots.txt file to read:
User-agent: *
Disallow: /wp-admin/
Also, you kind of want Google to index your cached content. In the event your servers go down, it will still be able to make your content available.
I hope that helps. Let me know how that works out for you!
-
Thanks for the clear answer.
I've changed the robots.txt to:
User-agent: *
Allow: /
Disallow: /wp-admin/
Disallow: /wp-includes/
Disallow: /wp-content/themes/
Allow: /wp-admin/admin-ajax.php
This should avoid problems with not indexing (parts of) cached content.
Or should I leave all the Disallows out?
-
Hey there --
Blocking resources with the robots.txt file prevents search engines from crawling that content. For preventing content from being indexed, the noindex tag would be better suited.
Previous best practice was to block access to the /wp-includes/ and /wp-content/ directories, etc., but that's no longer necessary.
Today, Google will fetch all your styling and JavaScript files so they can render your pages completely. Search engines now try to understand your page's layout and presentation as a key part of how they evaluate quality.
So, yeah this might have some impact on your SEO.
Also, if you're using a plugin to cache content, you should want Google to crawl your cached content. And in my experience, Googlebot does a good job of not indexing /wp-content/ sections.
So your example page, https://example.com/wp-content/cache/minify/df983.js, shouldn't end up in their index.
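If you want to sanity-check which paths each version of the robots.txt in this thread blocks from crawling, here is a minimal Python sketch. It assumes the longest-match rule that Google documents (the most specific matching path wins, and Allow wins a tie), handles only plain path prefixes (no * or $ wildcards), and treats the whole file as one User-agent: * group. The helper names parse_rules and is_allowed and the theme path are made up for illustration. It is a rough approximation of how a crawler reads the file, not Google's actual parser.

```python
# Rough sketch: longest-match evaluation of simple robots.txt rules.
# Assumptions: a single "User-agent: *" group, plain prefixes only (no * or $).

ORIGINAL = """User-agent: *
Allow: /
Disallow: /wp-admin/
Disallow: /wp-includes/
Disallow: /wp-content/plugins/
Disallow: /wp-content/cache/
Disallow: /wp-content/themes/
Allow: /wp-admin/admin-ajax.php
"""

REVISED = """User-agent: *
Allow: /
Disallow: /wp-admin/
Disallow: /wp-includes/
Disallow: /wp-content/themes/
Allow: /wp-admin/admin-ajax.php
"""


def parse_rules(robots_txt):
    """Collect (directive, path) pairs, ignoring comments and other fields."""
    rules = []
    for line in robots_txt.splitlines():
        line = line.split("#", 1)[0].strip()
        if ":" not in line:
            continue
        field, value = line.split(":", 1)
        field, value = field.strip().lower(), value.strip()
        if field in ("allow", "disallow") and value:
            rules.append((field, value))
    return rules


def is_allowed(rules, path):
    """Longest matching rule wins; Allow beats a Disallow of equal length."""
    best_len, allowed = -1, True  # no matching rule means crawlable
    for field, rule_path in rules:
        if not path.startswith(rule_path):
            continue
        longer = len(rule_path) > best_len
        tie_for_allow = len(rule_path) == best_len and field == "allow"
        if longer or tie_for_allow:
            best_len, allowed = len(rule_path), field == "allow"
    return allowed


if __name__ == "__main__":
    paths = [
        "/wp-content/cache/minify/df983.js",
        "/wp-admin/admin-ajax.php",
        "/wp-content/themes/mytheme/style.css",  # hypothetical theme file
    ]
    for name, text in (("original", ORIGINAL), ("revised", REVISED)):
        rules = parse_rules(text)
        for path in paths:
            verdict = "allowed" if is_allowed(rules, path) else "blocked"
            print(f"{name}: {path} -> {verdict}")
```

Under these assumptions, the original file blocks the minified cache file while the revised one lets it through. Both versions still block anything under /wp-content/themes/, so theme stylesheets would remain uncrawlable.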
Hope this helps some.