Robots.txt: crawling URLs we don't want crawled

ShearingsGroup

Hello

We run a number of websites, and underneath them we have testing websites (sub-domains). On those test sites we have a robots.txt that disallows everything. When I logged into Moz this morning, I could see that the Moz spider had crawled our test sites even though we have told it not to.

Does anyone have any ideas on how we can stop this from happening?

Peterli

Hi there!

Thanks for reaching out to us! I am sorry that Roger (rogerbot, our crawler) does not seem to be following your robots.txt directives. To make sure Roger doesn't crawl a site, you can put the following directive above the general directives in that site's robots.txt:

User-agent: rogerbot
Disallow: /

Once this is in place you should find our crawler to be a lot more obedient towards your site.
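If you want to sanity-check the file before the next crawl, here is a minimal sketch using Python's standard urllib.robotparser module. It only shows how that parser reads the rules; it is not how rogerbot itself is implemented, and real crawlers may interpret edge cases differently. The host test.example.com and the helper name is_allowed are placeholders for illustration, not anything from your setup.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for a test sub-domain: the rogerbot group sits
# above the general group, as suggested in the reply above.
ROBOTS_TXT = """\
User-agent: rogerbot
Disallow: /

User-agent: *
Disallow: /
"""

# The same rogerbot group with the directive name misspelled and no
# general group after it. Python's parser skips the unknown line.
MISSPELLED_TXT = """\
User-agent: rogerbot
Dissallow: /
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt text lets user_agent fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    # test.example.com is a placeholder host, not a real test site.
    url = "https://test.example.com/any/page"
    print("correct file allows rogerbot:", is_allowed(ROBOTS_TXT, "rogerbot", url))
    print("misspelled file allows rogerbot:", is_allowed(MISSPELLED_TXT, "rogerbot", url))
```

Keep in mind that each sub-domain is a separate host, so the file has to be served from the root of every test sub-domain (for example https://test.example.com/robots.txt), not only from the main site.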

Hope this helps, please let us know if you have any more questions about our crawler.

Best,

Peter
Moz Help Team.
