Robots.txt blocking internal resources in WordPress
-
Hi all,
We've recently migrated a WordPress website from staging to live, but the robots.txt file was deleted. I've created the following new one:
User-agent: *
Allow: /
Disallow: /wp-admin/
Disallow: /wp-includes/
Disallow: /wp-content/plugins/
Disallow: /wp-content/cache/
Disallow: /wp-content/themes/
Allow: /wp-admin/admin-ajax.php

However, in the Semrush site audit I now get a warning that a lot of pages have issues with internal resources blocked by the robots.txt file. These blocked internal resources are all cached and minified CSS elements: links, images, and scripts.
Does this mean that Google won't render the pages with blocked resources correctly, and thus won't be able to follow those links and index the images? In other words, is this any cause for concern regarding SEO?
Of course I can change the robots.txt again, but will URLs like https://example.com/wp-content/cache/minify/df983.js end up in the index?
Thanks for your thoughts!
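As a quick sanity check, the rules above can be run through Python's stdlib `urllib.robotparser`. One caveat in this sketch: Python's parser applies rules first-match (Google uses longest-match), so the specific Allow line is listed first and the redundant `Allow: /` is dropped; both interpretations agree on the URLs checked here.

```python
from urllib import robotparser

# Sketch only: same rules as the robots.txt in the question, reordered so
# the specific Allow line comes first (Python's robotparser is first-match,
# Google's matcher is longest-match; they agree on the results below).
rules = """\
User-agent: *
Allow: /wp-admin/admin-ajax.php
Disallow: /wp-admin/
Disallow: /wp-includes/
Disallow: /wp-content/plugins/
Disallow: /wp-content/cache/
Disallow: /wp-content/themes/
"""

rp = robotparser.RobotFileParser()
rp.parse(rules.splitlines())

# The minified cache asset from the audit warning is blocked from crawling...
print(rp.can_fetch("Googlebot", "https://example.com/wp-content/cache/minify/df983.js"))  # False
# ...while normal pages and the AJAX endpoint stay crawlable.
print(rp.can_fetch("Googlebot", "https://example.com/some-page/"))                        # True
print(rp.can_fetch("Googlebot", "https://example.com/wp-admin/admin-ajax.php"))           # True
```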
-
Thanks for the answer!
Last question: is /wp-admin/admin-ajax.php an important part that has to be crawled? I found this explanation: https://wordpress.stackexchange.com/questions/190993/why-use-admin-ajax-php-and-how-does-it-work/191073#191073
However, on this specific website there is no HTML at all when I check the source code, only a single line with a 0 on it. (As far as I can tell, that 0 is admin-ajax.php's default response when it's requested without a valid action parameter.)
-
I would leave all the disallows out except for the /wp-admin/ section. For example, I'd rewrite the robots.txt file to read:
User-agent: *
Disallow: /wp-admin/

Also, you kind of want Google to index your cached content. In the event your servers go down, it will still be able to make your content available.
I hope that helps. Let me know how that works out for you!
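For what it's worth, here's a quick check of that slimmed-down file with Python's stdlib `urllib.robotparser` (a sketch against the hypothetical URLs from the question, not your live site):

```python
from urllib import robotparser

# Sketch: the minimal robots.txt suggested above, fed to Python's stdlib parser.
rp = robotparser.RobotFileParser()
rp.parse("""\
User-agent: *
Disallow: /wp-admin/
""".splitlines())

# Cached/minified assets are crawlable again, so rendering isn't blocked...
print(rp.can_fetch("Googlebot", "https://example.com/wp-content/cache/minify/df983.js"))  # True
# ...while the admin area stays off-limits.
print(rp.can_fetch("Googlebot", "https://example.com/wp-admin/options.php"))              # False
```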
-
Thanks for the clear answer.
I've changed the robots.txt to:
User-agent: *
Allow: /
Disallow: /wp-admin/
Disallow: /wp-includes/
Disallow: /wp-content/themes/
Allow: /wp-admin/admin-ajax.php

This should avoid problems with not indexing (parts of) cached content.
Or should I leave all the Disallows out?
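One thing worth noting about the revised file: theme stylesheets and scripts live under /wp-content/themes/, so that Disallow still blocks them. A sketch with Python's stdlib `urllib.robotparser` (Allow line listed first because Python matches first-rule-wins, unlike Google's longest-match; the redundant `Allow: /` is dropped, and the theme path is hypothetical):

```python
from urllib import robotparser

# Sketch of the revised robots.txt above, reordered for Python's
# first-match parser; Google's longest-match rules agree on these URLs.
rp = robotparser.RobotFileParser()
rp.parse("""\
User-agent: *
Allow: /wp-admin/admin-ajax.php
Disallow: /wp-admin/
Disallow: /wp-includes/
Disallow: /wp-content/themes/
""".splitlines())

# Cached assets are no longer blocked...
print(rp.can_fetch("Googlebot", "https://example.com/wp-content/cache/minify/df983.js"))    # True
# ...but theme CSS/JS still is (hypothetical theme path), which can keep the
# "blocked internal resources" warning alive for pages styled by the theme.
print(rp.can_fetch("Googlebot", "https://example.com/wp-content/themes/mytheme/style.css"))  # False
```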
-
Hey there --
Blocking resources with the robots.txt file prevents search engines from crawling content; the noindex tag is better suited for preventing content from being indexed.
Previous best practice did dictate blocking access to the /wp-includes/ and /wp-content/ directories, but that's no longer necessary.
Today, Google fetches all of your styling and JavaScript files so it can render your pages completely. Search engines now try to understand your page's layout and presentation as a key part of how they evaluate quality.
So, yes, this might have some impact on your SEO.
Also, if you're using a plugin to cache content, you should want Google to crawl that cached content. And in my experience, Googlebot does a good job of not indexing files under /wp-content/.
So your example file, https://example.com/wp-content/cache/minify/df983.js, shouldn't end up in the index.
Hope this helps some.