Google showing high volume of URLs blocked by robots.txt in in index-should we be concerned?
-
if we search site:domain.com vs www.domain.com, We see: 130,000 vs 15,000 results. When reviewing the site:domain.com results, we're finding that the majority of the URLs showing are blocked by robots.txt. They are subdomains that we use as production environments (and contain similar content as the rest of our site).
And, we also find the message "In order to show you the most relevant results, we have omitted some entries very similar to the 541 already displayed." SEER Interactive mentions that this is one way to gauge a Panda penalty: http://www.seerinteractive.com/blog/100-panda-recovery-what-we-learned-to-identify-issues-get-your-traffic-back
We were hit by Panda some time back--is this an issue we should address? Should we unblock the subdomains and add noindex, follow?
-
I think it's worth it. I'm not sure what CMS you're using, but it shouldn't take much time to add noindex,follow to the header of all your pages, and then remove the robots.txt directive that's preventing them from being crawled.
-
thanks--I am concerned about if we should go through the process of unblocking them--they are all showing in the SERPs with the "This URL is blocked by robots.txt"--is it worrisome that such a large % of our URLs in the SERPs are showing as blocked by robots.txt with the "omitted from search results" message?
-
If Google has already crawled/indexed the subdomains before, then adding noindex, follow is probably the best approach. This is because if you just block the sites with robots.txt, Google will still know that they pages exist, but won't be able to crawl them, resulting in it taking a long time for the pages to be de-indexed, if ever. Additionally, if those subdomains have any links, then that link value is lost because Google can't crawl the pages.
Adding noindex,follow will tell Google definitely to remove those subdomains from their index, as well as help preserve any link equity they've accumulated.
Got a burning SEO question?
Subscribe to Moz Pro to gain full access to Q&A, answer questions, and ask your own.
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
Index, follow on a paginated page with a different rel=canonical URL
Hello, I have a question about meta robots ="index, follow" and rel=canonical on category page pagination. Should the sorted page be <meta name="robots" content="index,follow"></meta name="robots" content="index,follow"> since the rel="canonical" is pointing to a separate page that is different from the URL? Any thoughts on this topic would be awesome. Thanks. Main Category Page
Intermediate & Advanced SEO | | Choice
https://www.site.com/category/
<meta name="robots" content="index,follow"><link rel="canonical" href="https: www.site.com="" category="" "=""></link rel="canonical" href="https:></meta name="robots" content="index,follow"> Sorted Page
https://www.site.com/category/?p=2&dir=asc&order=name
<meta name="robots" content="index, follow"=""><link rel="canonical" href="https: www.site.com="" category="" ?p="2""></link rel="canonical" href="https:></meta name="robots" content="index,> As you can see, the meta robots is telling Google to index https://www.site.com/category/?p=2&dir=asc&order=name , yet saying the canonical page is https://www.site.com/category/?p=2 .0 -
Google is showing the Wrong Date for our Holiday.
Friday, December 16 is Free Shipping Day. However, Google says it's actually 2 days later. Does anybody know how we can get that changed? See: https://www.google.com/webhp?sourceid=chrome-instant&ion=1&espv=2&ie=UTF-8#q=free shipping day
Intermediate & Advanced SEO | | lk990 -
How to change URL structure in google webmasters
Is there any way to ask google to indexed the website in following URL structure abc.com/category/postname (I have this structure on my website) But Currently google indexed my website posts as - abc.com/postname/category How I can tell google to follow the right structure?
Intermediate & Advanced SEO | | Michael.Leonard0 -
What do you add to your robots.txt on your ecommerce sites?
We're looking at expanding our robots.txt, we currently don't have the ability to noindex/nofollow. We're thinking about adding the following: Checkout Basket Then possibly: Price Theme Sortby other misc filters. What do you include?
Intermediate & Advanced SEO | | ThomasHarvey0 -
Google Indexed Old Backups Help!
I have the bad habit of renaming a html page sitting on my server, before uploading a new version. I usually do this after a major change. So after the upload, on my server would be "product.html" as well as "product050714".html. I just stumbled on the fact G has been indexing these backups. Can I just delete them and produce a 404?
Intermediate & Advanced SEO | | alrockn0 -
Huge Google index on E-commerce site
Hi Guys, I got a question which i can't understand. I'm working on a e-commerce site which recently got a CMS update including URL updates.
Intermediate & Advanced SEO | | ssiebn7
We did a lot of 301's on the old url's (around 3000 /4000 i guess) and submitted a new sitemap (around 12.000 urls, of which 10.500 are indexed). The strange thing is.. When i check the indexing status in webmaster tools Google tells me there are over 98.000 url's indexed.
Doing the site:domainx.com Google tells me there are 111.000 url's indexed. Another strange thing which another forum member describes here : Cache date has been reverted And next to that old url's (which have a 301 for about a month now) keep showing up in the index. Does anyone know what i could do to solve the problem?0 -
Robots.txt error message in Google Webmaster from a later date than the page was cached, how is that?
I have error messages in Google Webmaster that state that Googlebot encountered errors while attempting to access the robots.txt. The last date that this was reported was on December 25, 2012 (Merry Christmas), but the last cache date was November 16, 2012 (http://webcache.googleusercontent.com/search?q=cache%3Awww.etundra.com/robots.txt&ie=utf-8&oe=utf-8&aq=t&rls=org.mozilla:en-US:official&client=firefox-a). How could I get this error if the page hasn't been cached since November 16, 2012?
Intermediate & Advanced SEO | | eTundra0 -
Does anyone know if certain DMOZ categories are blocked/never get indexed on google?
Hi all, After waiting many months I was happy to see a certain site listed on DMOZ, then months later still haven't seen the dmoz category indexed in google. It makes me wonder if certain categories don't get indexed or blocked or even previously penalized by google. The category in question is a regional one : http://www.dmoz.org/Regional/North_America/United_States/New_Jersey/Localities/G/Garfield/Business_and_Economy/ Anyone come across this before? Dave
Intermediate & Advanced SEO | | davebrown19750