Robots.txt crawling URL's we dont want it to
-
Hello
We run a number of websites and underneath them we have testing websites (sub-domains), on those sites we have robots.txt disallowing everything. When I logged into MOZ this morning I could see the MOZ spider had crawled our test sites even though we have said not to.
Does anyone have an ideas how we can stop this happening?
-
Hi there!
Thanks for reaching out to us! I am sorry if Roger is somehow not following your robots.txt directives. To ensure that Roger doesn't crawl your site you can put the following directive above your general directives in your robots.txt:
User-agent: rogerbot
Dissallow: /Once this is in place you should find our crawler to be a lot more obedient towards your site.
Hope this helps, please let us know if you have any more questions about our crawler.
Best,
Peter
Moz Help Team.
Got a burning SEO question?
Subscribe to Moz Pro to gain full access to Q&A, answer questions, and ask your own.
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
I'm struggling to understand (and fix) why I'm getting a 404 error. The URL includes this "%5Bnull%20id=43484%5D" but I cannot find that anywhere in the referring URL. Does anyone know why please? Thanks
Can you help with how to fix this 404 error please? It appears that I have a redirect from one page to the other, although the referring page URL works, but it appears to be linking to another URL with this code at the end of the the URL - %5Bnull%20id=43484%5D that I'm struggling to find and fix. Thanks
Technical SEO | | Nichole.wynter20200 -
No: 'noindex' detected in 'robots' meta tag
I'm getting an error in Search Console that pages on my site show No: 'noindex' detected in 'robots' meta tag. However, when I inspect the pages html, it does not show noindex. In fact, it shows index, follow. Majority of pages show the error and are not indexed by Google...Not sure why this is happening. Unfortunately I can't post images on here but I've linked some url's below. The page below in search console shows the error above... https://mixeddigitaleduconsulting.com/ As does this one. https://mixeddigitaleduconsulting.com/independent-school-marketing-communications/ However, this page does not have the error and is indexed by Google. The meta robots tag looks identical. https://mixeddigitaleduconsulting.com/blog/leadership-team/jill-goodman/ Any and all help is appreciated.
Technical SEO | | Sean_White_Consult0 -
Google has deindexed a page it thinks is set to 'noindex', but is in fact still set to 'index'
A page on our WordPress powered website has had an error message thrown up in GSC to say it is included in the sitemap but set to 'noindex'. The page has also been removed from Google's search results. Page is https://www.onlinemortgageadvisor.co.uk/bad-credit-mortgages/how-to-get-a-mortgage-with-bad-credit/ Looking at the page code, plus using Screaming Frog and Ahrefs crawlers, the page is very clearly still set to 'index'. The SEO plugin we use has not been changed to 'noindex' the page. I have asked for it to be reindexed via GSC but I'm concerned why Google thinks this page was asked to be noindexed. Can anyone help with this one? Has anyone seen this before, been hit with this recently, got any advice...?
Technical SEO | | d.bird0 -
What are the negative implications of listing URLs in a sitemap that are then blocked in the robots.txt?
In running a crawl of a client's site I can see several URLs listed in the sitemap that are then blocked in the robots.txt file. Other than perhaps using up crawl budget, are there any other negative implications?
Technical SEO | | richdan0 -
New Website, New URL, New Content - What do we do with the old site? Are 301's the only option?
We've just built a new site for a client. They were adamant on changing the url. The new site is entirely new content, however the subject mater is the same. Some pages are even titled very similarly. Is is advisable to keep the old site running, and link it to the new site? Permanently, or temporarily? Do we simply place redirects from the old site the new? Old site was 30 pages, new site is 80 pages. So redirects won't be available to all the new pages. It seems a shame to trash the old site, it is getting some good traffic, and the content - although outdated is unique and of a high quality. Old url is 4+ yrs old, the new url is new. Some enlightened opinions would be greatly welcomed. Thanks
Technical SEO | | MarketsOnline0 -
Redirect old URL's from referring sites?
Hi I have just came across some URL's from the previous web designer and the site structure has now changed. There are some links on the web however that are still pointing at the old deep weblinks. Without having to contact each site it there a way to automatically sort the links from the old structure www.mydomain.com/show/english/index.aspx to just www.mydomain.com Many Thanks
Technical SEO | | ocelot0 -
What's the best canonicalization method?
Hi there - is there a canonicalization method that is better than others? Our developers have used the
Technical SEO | | GBC0 -
Removing robots.txt on WordPress site problem
Hi..am a little confused since I ticked the box in WordPress to allow search engines to now crawl my site (previously asked for them not to) but Google webmaster tools is telling me I still have robots.txt blocking them so am unable to submit the sitemap. Checked source code and the robots instruction has gone so a little lost. Any ideas please?
Technical SEO | | Wallander0