Google showing high volume of URLs blocked by robots.txt in index: should we be concerned?

nicole.healthline

If we search site:domain.com vs. site:www.domain.com, we see 130,000 vs. 15,000 results. When reviewing the site:domain.com results, we're finding that the majority of the URLs shown are blocked by robots.txt. They are on subdomains that we use as production environments (and that contain content similar to the rest of our site).

We also see the message "In order to show you the most relevant results, we have omitted some entries very similar to the 541 already displayed." SEER Interactive mentions that this is one way to gauge a Panda penalty: http://www.seerinteractive.com/blog/100-panda-recovery-what-we-learned-to-identify-issues-get-your-traffic-back

We were hit by Panda some time back. Is this an issue we should address? Should we unblock the subdomains and add noindex, follow?

TakeshiYoung

I think it's worth it. I'm not sure what CMS you're using, but it shouldn't take much time to add noindex,follow to the head of all your pages, and then remove the robots.txt directive that's preventing them from being crawled.

nicole.healthline

Thanks. I'm not sure whether we should go through the process of unblocking them. They are all showing in the SERPs with "This URL is blocked by robots.txt". Is it worrisome that such a large percentage of our URLs in the SERPs show as blocked by robots.txt, along with the "omitted from search results" message?

TakeshiYoung

If Google has already crawled/indexed the subdomains before, then adding noindex, follow is probably the best approach. If you just block the sites with robots.txt, Google will still know that the pages exist but won't be able to crawl them, so it can take a long time for the pages to be de-indexed, if they ever are. Additionally, if those subdomains have any links, that link value is lost because Google can't crawl the pages.
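To see why a blocked page can never have its noindex read, here is a rough Python sketch using the standard library's robots.txt parser. The robots.txt contents and the staging.example.com URL are made-up placeholders, not taken from the site in question, and Python's parser only approximates how Googlebot interprets the rules.

```python
from urllib.robotparser import RobotFileParser


def can_crawl(robots_txt: str, url: str, agent: str = "Googlebot") -> bool:
    """Return True if the given robots.txt text lets `agent` fetch `url`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


# Hypothetical robots.txt for a subdomain that blocks every crawler.
BLOCKED = "User-agent: *\nDisallow: /\n"

# Hypothetical robots.txt after the block is removed (empty Disallow allows all).
OPEN = "User-agent: *\nDisallow:\n"

# Placeholder URL, assumed for illustration only.
PAGE = "http://staging.example.com/some-article"

if __name__ == "__main__":
    print("blocked robots.txt, crawlable:", can_crawl(BLOCKED, PAGE))
    print("open robots.txt, crawlable:", can_crawl(OPEN, PAGE))
```

While the first file is in place, a crawler that obeys it never downloads the page, so any noindex tag in the HTML goes unseen.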

Adding noindex,follow will tell Google definitively to remove those subdomains from its index, as well as help preserve any link equity they've accumulated.
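If it helps to verify the rollout, a small script along these lines could confirm that a page's HTML actually carries the meta robots tag. This is only a sketch: the sample HTML is invented, and it checks the meta tag only, not an X-Robots-Tag HTTP header.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collects directives from <meta name="robots" content="..."> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = dict(attrs)
        if (attr.get("name") or "").lower() != "robots":
            return
        content = attr.get("content") or ""
        # Directives are comma separated, e.g. "noindex, follow".
        for part in content.split(","):
            part = part.strip().lower()
            if part:
                self.directives.add(part)


def robots_directives(html: str) -> set:
    """Return the set of meta robots directives found in an HTML string."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return parser.directives


# Invented sample page, standing in for one of the subdomain pages.
SAMPLE = (
    "<html><head>"
    '<meta name="robots" content="noindex, follow">'
    "</head><body>Duplicate content</body></html>"
)

if __name__ == "__main__":
    found = robots_directives(SAMPLE)
    print("noindex present:", "noindex" in found)
    print("follow present:", "follow" in found)
```

Running this against a sample of fetched pages, after the robots.txt block is lifted, would show whether the tag made it into every template.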
