Disallow statement - is this tiny anomaly enough to render Disallow invalid?
-
A Google site search (site:'hbn.hoovers.com') shows 171,000 results for this subdomain. That is not the desired result - this subdomain is 100% duplicate content, and we don't want search engines spending any time here.
Robots.txt is set up mostly right to disallow all search engines from indexing this site. That asterisk at the end of the disallow statement looks pretty harmless - but could that be why the site has been indexed?
User-agent: *
Disallow: /*
-
Interesting. I'd never heard that before.
We've never had GA or GWT on these mirror sites before, so it's hard to say what Google is doing these days.
But the goal is definitely to make them and their contents invisible to SEs. We'll get GWT on there and start removing URLs.
Thanks!
-
The additional asterisk shouldn't do you any harm, although standard practice seems to be just putting the "/".
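A minimal sketch of that standard form - under the original robots.txt rules, Disallow matches by URL prefix, so the trailing asterisk adds nothing for crawlers that honor the directive:

User-agent: *
Disallow: /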
Does it seem like Google is still crawling this subdomain when you look at the crawl stats in Webmaster Tools? While a disallow in robots.txt will usually stop bots from crawling, it doesn't prevent them from indexing URLs, or from keeping pages in the index that were picked up before the disallow was put in place. If you want these pages removed from the index, you can request removal through Webmaster Tools and also use a meta robots noindex tag as opposed to the robots.txt file. Moz has a good article about it here: http://moz.com/blog/robot-access-indexation-restriction-techniques-avoiding-conflicts
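As a rough sketch, the meta tag goes in the head of each page you want dropped. One caveat worth flagging (it's the point of the Moz article above): Google has to be able to crawl a page to see the tag, so the robots.txt block would need to be lifted while the noindex takes effect:

<meta name="robots" content="noindex">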
If you're just worried about bots crawling the subdomain, it's possible they've already stopped crawling it but continue to index it due to history or other signals suggesting they should.
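If editing every page template on a mirror site isn't practical, an alternative sketch - assuming the subdomain runs on Apache with mod_headers enabled - is to send the equivalent signal as an HTTP header for every URL on the host:

# Hypothetical Apache config for the mirror subdomain
Header set X-Robots-Tag "noindex"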
Related Questions
-
Invalid Microdata - How much of an impact does invalid microdata have on SERPs?
How much of an impact does invalid microdata have on SERPs? The lowdown: we are located in Australia and run our business on the Bigcommerce platform. The problem is that Google is crawling our Bigcommerce store in USD and displaying our microdata price in USD instead of AUD. How much of a problem is this in terms of SEO? We have seen many of our top-3 rankings slip a few pegs to the middle or bottom of the top 10, and we're also getting Google Shopping microdata warnings. Does anyone have a solution for fixing this invalid price microdata (USD displayed where it should be AUD) on the Bigcommerce platform (Stencil Cornerstone based template)? And are there any other technical elements you notice at first glance on our website that might be contributing to the SERP decline from the top 3 to the top 10? URL: https://wwww.fishingtackleshop.com.au
Technical SEO | oceanstorm
-
Fetch and Render misses middle chunk of page
Hey folks, I was checking out a site with Search Console's "Fetch and Render" function and found something potentially worrisome. A big chunk of the middle of the page (the homepage) shows up as empty space in the preview render window. The site isn't doing so hot in terms of rankings, and I'm wondering if this issue is the cause (since it could mean that 80% of the copy on the homepage is invisible to Google). A few other details: the specific content isn't showing in either view - both the "What Google sees" and "What the visitor sees" renders are missing this chunk of the page. The content IS visible in cached versions of the page, and the HTML for the content seems to be in the page source. The "Fetch" part returns "Complete" as opposed to "Partial", so I don't THINK it's a matter of JavaScript getting blocked by robots.txt. This website was built using the WordPress theme "Suco", and the parts of the page that aren't rendering are all built with the Themify Builder tool. Not ALL of the Themify Builder elements are showing up as blank - there's a slider element that's rendering just fine. Any ideas on what could cause whole portions of a page not to show up in Fetch and Render? Thanks!
Technical SEO | BrianAlpert78
-
Have I done enough SEO on this page to make a difference?
Hi, my home page has been a thorn in my side for as long as I can remember. On normal sites I am OK with SEO, but when it comes to my magazine site it is a whole new ball game - everything is different. I have been working with a developer who told me to remove the intro to the site on the home page and to move the section at the bottom about the magazine, but I am not sure this is right. I want the site to rank well for "lifestyle magazine". Before our upgrade we ranked well for this and other terms - we were number one for a very long time and then stayed on the first page - but since the upgrade I am jumping between pages 9, 10, and 6, and I'm not sure why. I would like to know if the advice I have been given is correct: have I done enough on the page to rank well for "lifestyle magazine", or should I do what I was taught previously and have an intro to the site so Google can pick up "lifestyle magazine" and other terms? The site is www.in2town.co.uk. Many thanks for your input.
Technical SEO | ClaireH-184886
-
Spider Indexed Disallowed URLs
Hi there, In order to reduce the huge amount of duplicate content and titles for a client, we disallowed all spiders from some areas of the site in August via the robots.txt file. This was followed by a huge decrease in errors in our SEOmoz crawl report, which, of course, made us satisfied. In the meantime, we haven't changed anything in the back-end, the robots.txt file, FTP, the website, or anything else. But our crawl report came in this November, and all of a sudden all the errors were back. We've checked the errors and noticed URLs that are definitely disallowed. The disallowing of these URLs is also verified by Google Webmaster Tools and other robots.txt checkers, and when we search for a disallowed URL in Google, it says that it's blocked for spiders. Where did these errors come from? Did the SEOmoz spider break through our disallow, or something? You can see the drop and the increase in errors in the attached image (LAAFj.jpg). Thanks in advance.
Technical SEO | ooseoo
-
Is Noindex Enough To Solve My Duplicate Content Issue?
Hello SEO gurus! I have a client who runs 7 web properties: 6 of them are satellite websites, and the 7th is his company's main website. For a long while, my company has, among other things, blogged on a hosted blog at www.hismainwebsite.com/blog, and when we were optimizing for one of the other satellite websites, we would simply link to it in the article. Now, however, the client has gone ahead and set up separate blogs on every one of the satellite websites as well, and he has a nifty plug-in on the main website's blog that pipes the articles we write into the corresponding satellite blog. My concern is duplicate content. In a sense, this is like autoblogging - the only thing that keeps it from being heinous is that the client is autoblogging himself. He thinks it will be a great feature for giving users of his satellite websites some fresh content to read - and I agree, as I think the combination of publishing and e-commerce is a thing of the future - but I really want to avoid the duplicate content issue and a possible SEO/SERP hit. I am thinking that noindexing each of the satellite websites' blog pages might suffice, but I'd like to hear from all of you whether even this may not be a foolproof solution. Thanks in advance! Kind regards, Mike
Technical SEO | RCNOnlineMarketing
-
How long does it take for traffic to bounce back from an accidental robots.txt disallow of root?
We accidentally uploaded a robots.txt that disallowed the root for all user agents last Tuesday, and we did not catch the error until yesterday - so six days of total exposure. Organic traffic is down 20%. Google has since indexed the correct version of the robots.txt file. However, we're still seeing awful titles/descriptions in the SERPs, and traffic is not coming back. GWT shows that not many pages were actually removed from the index, but we're still seeing drastic ranking decreases. Anyone been through this? Any sort of timeline for a recovery? Much appreciated!
Technical SEO | bheard
-
How to disallow Google and Roger?
Hey guys and girls, I have a question. I want to disallow all robots from accessing a certain root link - get rid of the bots:

User-agent: *
Disallow: /index.php?_a=login&redir=/index.php?_a=tellafriend%26productId=*

Will this stop the bots from accessing any link that has the prefix you see before the asterisk? And will at least Google and Roger obey it by reading "User-agent: *"? I know this isn't the standard procedure, but if it works for Google and the SEOmoz bot, we are good.
Technical SEO | iFix
-
Differences between Lynx Viewer, Fetch as Googlebot and SEOMoz Googlebot Rendering
Three tools to render a site as Googlebot would see it: the SEOmoz toolbar, the Lynx Viewer (http://www.yellowpipe.com/yis/tools/lynx/lynx_viewer.php), and Fetch as Googlebot. I have a website where I can see the dropdown menus in regular browser rendering, in the Lynx Viewer, and in Fetch as Googlebot. However, in the SEOmoz toolbar's "render as Googlebot" tool, I am unable to see these dropdown menus when I have JavaScript disabled. Does this matter? Which of these tools is the best way to see how Googlebot views your site?
Technical SEO | qlkasdjfw
Fetch as Googlebot. I have a website where I can see dropdown menus in regular browser rendering, Lynxviewer and Fetch as Googlebot. However, in the SEOMoz toolbar 'render as googlebot' tool, I am unable to see these dropdown menus when I have javascript disabled. Does this matter? Which of these tools is a better way to see how googlebot views your site?0