Standard Syntax in robots.txt doesn't prevent Moz bot from crawling
-
A client is getting many false positive site crawl errors for things like duplicate titles and duplicate content on pages that include /tag/ in the URL. An example is https://needquest.com/place_tag/autism-spectrum-disorder/page/4/
To resolve this we have set up a disallow statement in the robots.txt file that says
Disallow: /page/
For some reason this appears not to work, as the site crawl errors continue to list pages like this. Does anyone understand why that would be and what we need to do to properly disallow crawling these pages?
-
Thanks, Tawny,
If you look at Duplicate titles, check the first one (https://needquest.com/place_tag/autism-spectrum-disorder/). All the URLs with a duplicate title have /page/ in them. I will suggest they move the Allow statement and see if that helps.
-
I'm not seeing that URL coming up with Duplicate Title or Duplicate Content issues — when I search by that URL I see no Content issues at that URL. I do see that URL in the All Crawled Pages section, but I can't find it bringing up Content issues in the app.
That said, I took a look at your robots.txt file, and I think this could be a result of having an Allow command before the rest of the Disallow commands. I think possibly if you put that Allow command at the end of the block of Disallow commands, rogerbot would see the disallow for /page/ and stop crawling those URLs.
If you're still running into trouble, I would suggest writing in to us at help@moz.com so we can take a closer look at the Campaign and what could be going on there.
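For anyone wanting to see why rule order can matter, here is a minimal sketch using Python's standard-library `urllib.robotparser`, which applies the first rule that matches. The `Allow: /` line and both rule sets are hypothetical (the client's actual Allow line isn't quoted in this thread), and whether rogerbot evaluates rules in the same first-match way is an assumption, not something confirmed here.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rule sets; the real Allow line in the client's file is not shown in the thread.
ALLOW_FIRST = """\
User-agent: *
Allow: /
Disallow: /page/
"""

ALLOW_LAST = """\
User-agent: *
Disallow: /page/
Allow: /
"""


def can_crawl(robots_txt: str, url: str, agent: str = "rogerbot") -> bool:
    """Return True if Python's stdlib parser would let `agent` fetch `url`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


url = "https://needquest.com/page/2/"
print(can_crawl(ALLOW_FIRST, url))  # True: "Allow: /" matches first, so the Disallow is never reached
print(can_crawl(ALLOW_LAST, url))   # False: "Disallow: /page/" matches first
```

Crawlers that follow the longest-match rule (as Google documents and RFC 9309 specifies) would block this URL in both cases, so the ordering only matters for first-match parsers.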
-
Any reason the Disallow: /page/ isn't preventing URLs like
https://needquest.com/place_tag/autism-spectrum-disorder/page/4/
from generating duplicate descriptions and title errors in our site crawl? It was my hope that those pages wouldn't be crawled at all.
-
Sorry, Tawny ... I did go back and correct my question. We did apply Disallow: /page/ to address this issue. The /place_tag/ segment is found in many pages we DO want crawled and indexed ... we only want to disallow the page 2, page 3, page 4, etc. pages.
(We also disallowed /tag/, /category/, and a few other common issues that generate false positives in the site crawl.)
-
Hey there!
Tawny from Moz's Help Team here.
Adding a disallow directive for /tag/ won't help with the example URL you've provided — that URL doesn't have /tag/ in the URL pathway. To block us from seeing content like that URL you listed, you'd need a disallow directive for /place_tag/.
If you include that disallow directive, that should stop us from seeing duplicate content on pages with /place_tag/ in the URL.
Hope that helps! If you've still got questions, feel free to shoot us a note over at help@moz.com and we'll do our best to sort things out with you.
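To illustrate the matching behavior described above, here is a minimal sketch using Python's standard-library `urllib.robotparser`. It treats each Disallow value as a prefix of the URL path, which is the classic robots.txt behavior. The rule set is hypothetical and modeled on this thread (the client's real file isn't shown), and rogerbot's own parser may differ, so treat the results as an approximation.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules modeled on the thread; the client's real robots.txt is not shown here.
ROBOTS_TXT = """\
User-agent: *
Disallow: /page/
Disallow: /tag/
"""

EXAMPLE_URL = "https://needquest.com/place_tag/autism-spectrum-disorder/page/4/"


def is_allowed(robots_txt: str, url: str, agent: str = "rogerbot") -> bool:
    """Return True if Python's stdlib parser would let `agent` fetch `url`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


# The path starts with /place_tag/, not /page/ or /tag/, so neither rule matches.
print(is_allowed(ROBOTS_TXT, EXAMPLE_URL))                          # True
# A path that actually begins with /page/ is blocked.
print(is_allowed(ROBOTS_TXT, "https://needquest.com/page/2/"))      # False
# Adding a rule for the leading segment blocks the example URL.
print(is_allowed(ROBOTS_TXT + "Disallow: /place_tag/\n", EXAMPLE_URL))  # False
```

Note that `Disallow: /place_tag/` blocks every URL under that segment, including the first page of each tag. Some crawlers also honor a `*` wildcard (for example `Disallow: /*/page/`), which would target only the paginated URLs; Python's parser does not implement wildcards, and whether rogerbot does is worth confirming with Moz support.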