Google showing high volume of URLs blocked by robots.txt in the index: should we be concerned?

nicole.healthline

If we search site:domain.com vs. www.domain.com, we see 130,000 vs. 15,000 results. When reviewing the site:domain.com results, we're finding that the majority of the URLs showing are blocked by robots.txt. They are on subdomains that we use as production environments (and that contain content similar to the rest of our site).

We also see the message "In order to show you the most relevant results, we have omitted some entries very similar to the 541 already displayed." SEER Interactive mentions that this is one way to gauge a Panda penalty: http://www.seerinteractive.com/blog/100-panda-recovery-what-we-learned-to-identify-issues-get-your-traffic-back

We were hit by Panda some time back--is this an issue we should address? Should we unblock the subdomains and add noindex, follow?

TakeshiYoung

I think it's worth it. I'm not sure what CMS you're using, but it shouldn't take much time to add a noindex,follow robots meta tag to the <head> of all your pages, and then remove the robots.txt directive that's preventing them from being crawled.
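If the CMS has no setting for this, the tag can be added at the template or output level. Below is a minimal sketch in Python, assuming the pages are available as HTML strings; the function name add_noindex_meta is hypothetical and not part of any CMS.

```python
import re

NOINDEX_TAG = '<meta name="robots" content="noindex, follow">'

# Matches the opening <head> tag, with or without attributes (but not <header>).
_HEAD_OPEN = re.compile(r"<head(\s[^>]*)?>", re.IGNORECASE)
# Matches an existing robots meta tag so a second one is not added.
_ROBOTS_META = re.compile(r"<meta[^>]+name=[\"']robots[\"']", re.IGNORECASE)


def add_noindex_meta(html: str) -> str:
    """Return html with a noindex, follow robots meta tag placed inside <head>.

    Hypothetical helper for illustration only.
    """
    if _ROBOTS_META.search(html):
        return html  # page already declares a robots policy; leave it alone
    match = _HEAD_OPEN.search(html)
    if match is None:
        return html  # no <head> found, so nothing safe to do
    return html[: match.end()] + "\n  " + NOINDEX_TAG + html[match.end():]
```

The tag only takes effect once the robots.txt block is lifted, since a crawler that is not allowed to fetch the page never sees it.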

nicole.healthline

Thanks. I'm unsure whether we should go through the process of unblocking them. They are all showing in the SERPs with the "This URL is blocked by robots.txt" notice. Is it worrisome that such a large percentage of our URLs in the SERPs are showing as blocked by robots.txt, along with the "omitted from search results" message?

TakeshiYoung

If Google has already crawled/indexed the subdomains before, then adding noindex, follow is probably the best approach. If you just block the sites with robots.txt, Google will still know that the pages exist but won't be able to crawl them, so it can take a long time for the pages to be de-indexed, if they ever are. Additionally, if those subdomains have any links, then that link value is lost because Google can't crawl the pages.
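To illustrate the difference, here is a small sketch using Python's standard urllib.robotparser. The rule sets and the staging.domain.com hostname are assumptions made up for the example, not the poster's actual configuration.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for a subdomain that blocks all crawling.
BLOCKING_RULES = ["User-agent: *", "Disallow: /"]
# The same file after the Disallow rule is emptied (nothing is blocked).
OPEN_RULES = ["User-agent: *", "Disallow:"]


def can_crawl(rules, url, agent="Googlebot"):
    """Return True if the given robots.txt lines let `agent` fetch `url`."""
    parser = RobotFileParser()
    parser.parse(rules)
    return parser.can_fetch(agent, url)


url = "http://staging.domain.com/some-page"  # hypothetical URL
print(can_crawl(BLOCKING_RULES, url))  # blocked: a noindex tag on the page is never seen
print(can_crawl(OPEN_RULES, url))      # crawlable: the noindex tag can be read
```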

Adding noindex,follow will tell Google definitively to remove those subdomains from its index, and it will help preserve any link equity they've accumulated.
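Once the tag is rolled out, it may be worth spot-checking that pages really carry it. A rough sketch with Python's standard html.parser follows, assuming you already have the page HTML as a string; the class and function names are hypothetical.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collects directives from <meta name="robots" content="..."> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = dict(attrs)
        if (attr.get("name") or "").lower() == "robots":
            content = attr.get("content") or ""
            # Split "noindex, follow" into individual lowercase directives.
            self.directives.update(
                part.strip().lower() for part in content.split(",") if part.strip()
            )


def robots_directives(html: str) -> set:
    """Return the set of robots meta directives found in the HTML."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return parser.directives


sample = '<html><head><meta name="robots" content="noindex, follow"></head></html>'
print("noindex" in robots_directives(sample))
```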
