Robots.txt blocked internal resources WordPress
-
Hi all,
We've recently migrated a WordPress website from staging to live, but the robots.txt was deleted in the process. I've created the following new one:
User-agent: *
Allow: /
Disallow: /wp-admin/
Disallow: /wp-includes/
Disallow: /wp-content/plugins/
Disallow: /wp-content/cache/
Disallow: /wp-content/themes/
Allow: /wp-admin/admin-ajax.php
However, in the site audit on SemRush, I now get a warning that a lot of pages have issues with blocked internal resources in the robots.txt file. These blocked internal resources are all cached and minified CSS elements: links, images and scripts.
Does this mean that Google won't crawl some parts of these pages with blocked resources correctly and thus won't be able to follow these links and index the images? In other words, is this any cause for concern regarding SEO?
Of course I can change the robots.txt again, but will URLs like https://example.com/wp-content/cache/minify/df983.js end up in the index?
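For anyone who wants to check which URLs the file above actually blocks from crawling, here is a minimal Python sketch. It assumes the "longest matching rule wins, Allow wins ties" precedence that Google documents for robots.txt, and it only handles plain path prefixes (no `*` or `$` wildcards). The names `RULES` and `is_allowed` are made up for this illustration. As far as I know, Python's built-in urllib.robotparser evaluates rules in file order instead, so the leading `Allow: /` would make it report everything as allowed, which is why the matching is written by hand here.

```python
from urllib.parse import urlsplit

# Rules copied from the robots.txt in the question (the "User-agent: *" group).
RULES = [
    ("allow", "/"),
    ("disallow", "/wp-admin/"),
    ("disallow", "/wp-includes/"),
    ("disallow", "/wp-content/plugins/"),
    ("disallow", "/wp-content/cache/"),
    ("disallow", "/wp-content/themes/"),
    ("allow", "/wp-admin/admin-ajax.php"),
]


def is_allowed(url, rules=RULES):
    """Return True if the URL path may be crawled under longest-match precedence.

    Simplification (assumption): plain prefix matching, no wildcards.
    """
    path = urlsplit(url).path or "/"
    best_kind, best_len = "allow", -1  # no matching rule means allowed
    for kind, prefix in rules:
        if not path.startswith(prefix):
            continue
        longer = len(prefix) > best_len
        tie_goes_to_allow = len(prefix) == best_len and kind == "allow"
        if longer or tie_goes_to_allow:
            best_kind, best_len = kind, len(prefix)
    return best_kind == "allow"


if __name__ == "__main__":
    for url in (
        "https://example.com/wp-content/cache/minify/df983.js",
        "https://example.com/wp-admin/admin-ajax.php",
        "https://example.com/wp-admin/options.php",
        "https://example.com/some-post/",
    ):
        print(url, "->", "allowed" if is_allowed(url) else "blocked")
```

Under these assumptions the minified file in /wp-content/cache/ comes out as blocked, while /wp-admin/admin-ajax.php stays crawlable because its Allow rule is longer than the /wp-admin/ Disallow.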
Thanks for your thoughts!
-
Thanks for the answer!
Last question: is /wp-admin/admin-ajax.php an important part that has to be crawled? I found this explanation: https://wordpress.stackexchange.com/questions/190993/why-use-admin-ajax-php-and-how-does-it-work/191073#191073
However, on this specific website there is no HTML at all when I check the source code of that URL, only one line with 0 on it.
-
I would leave all the disallows out except for the /wp-admin/ section. For example, I'd rewrite the robots.txt file to read:
User-agent: *
Disallow: /wp-admin/
Also, you kind of want Google to index your cached content. In the event your servers go down it will still be able to make your content available.
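If you want to sanity-check the two-line file before deploying it, a small sketch using Python's standard urllib.robotparser could look like this. The helper names `build_parser` and `report` and the sample URLs are only illustrative assumptions, and the "Googlebot" agent simply falls back to the `User-agent: *` group.

```python
from urllib.robotparser import RobotFileParser

# The simplified robots.txt suggested above.
SIMPLIFIED_ROBOTS_TXT = """\
User-agent: *
Disallow: /wp-admin/
"""


def build_parser(robots_txt):
    """Parse robots.txt text from a string instead of fetching it over HTTP."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


def report(parser, urls, agent="Googlebot"):
    """Map each URL to True (crawlable) or False (blocked) for the given agent."""
    return {url: parser.can_fetch(agent, url) for url in urls}


if __name__ == "__main__":
    parser = build_parser(SIMPLIFIED_ROBOTS_TXT)
    urls = [
        "https://example.com/wp-content/cache/minify/df983.js",
        "https://example.com/wp-admin/options.php",
        "https://example.com/wp-admin/admin-ajax.php",
    ]
    for url, allowed in report(parser, urls).items():
        print(url, "->", "allowed" if allowed else "blocked")
```

One thing this makes visible: with only `Disallow: /wp-admin/` in the file, /wp-admin/admin-ajax.php is blocked as well, so the `Allow: /wp-admin/admin-ajax.php` line would have to stay if that endpoint should remain crawlable.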
I hope that helps. Let me know how that works out for you!
-
Thanks for the clear answer.
I've changed the robots.txt to:
User-agent: *
Allow: /
Disallow: /wp-admin/
Disallow: /wp-includes/
Disallow: /wp-content/themes/
Allow: /wp-admin/admin-ajax.php
This should avoid problems with not indexing (parts of) cached content.
Or should I leave all the Disallows out?
-
Hey there --
Blocking resources with the robots.txt file prevents search engines from crawling content; the noindex tag is better suited for preventing content from being indexed.
Previous best practice was to block access to the /wp-includes/ and /wp-content/ directories, among others, but that's no longer necessary.
Today, Google will fetch all your styling and JavaScript files so they can render your pages completely. Search engines now try to understand your page's layout and presentation as a key part of how they evaluate quality.
So, yeah, this might have some impact on your SEO.
Also, if you're using a plugin to cache content, you should want Google to crawl your cached content. And in my experience, Googlebot does a good job of not indexing /wp-content/ sections.
So your example URL, https://example.com/wp-content/cache/minify/df983.js, shouldn't end up in their index.
Hope this helps some.