Standard Syntax in robots.txt doesn't prevent Moz bot from crawling
-
A client is getting many false positive site crawl errors for things like duplicate titles and duplicate content on pages that include /tag/ in the URL. An example is https://needquest.com/place_tag/autism-spectrum-disorder/page/4/
To resolve this we have set up a disallow statement in the robots.txt file that says:
Disallow: /page/
For some reason this appears not to work, as the site crawl errors continue to list pages like this. Does anyone understand why that would be and what we need to do to properly disallow crawling these pages?
-
Thanks, Tawny.
If you look at Duplicate titles, check the first one (https://needquest.com/place_tag/autism-spectrum-disorder/). All the URLs with a duplicate title have /page/ in them. I will suggest they move the Allow statement and see if that helps.
-
I'm not seeing that URL coming up with Duplicate Title or Duplicate Content issues — when I search by that URL I see no Content issues at that URL. I do see that URL in the All Crawled Pages section, but I can't find it bringing up Content issues in the app.
That said, I took a look at your robots.txt file, and I think this could be a result of having an Allow command before the rest of the Disallow commands. I think possibly if you put that Allow command at the end of the block of Disallow commands, rogerbot would see the disallow for /page/ and stop crawling those URLs.
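Concretely, the reordering described above would look something like this (the actual contents of the site's robots.txt are an assumption here; only the relative ordering of the Allow and Disallow lines matters):

```
User-agent: *
Disallow: /page/
Disallow: /tag/
Disallow: /category/
# Allow moved after the Disallow block, so a crawler that honors
# the first matching rule encounters the /page/ disallow before
# the blanket Allow
Allow: /
```

Note that not all crawlers resolve Allow/Disallow conflicts the same way (some use first match, others longest match), which is why moving the Allow line is worth testing rather than assuming.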
If you're still running into trouble, I would suggest writing in to us at help@moz.com so we can take a closer look at the Campaign and what could be going on there.
-
Any reason the Disallow: /page/ isn't preventing URLs like
https://needquest.com/place_tag/autism-spectrum-disorder/page/4/
from generating duplicate description and title errors in our site crawl? It was my hope that those pages wouldn't be crawled at all.
-
Sorry, Tawny ... I did go back and correct my question. We did apply Disallow: /page/ to address this issue. The /place_tag/ path is found in many pages we DO want to crawl and index ... here we only want to disallow those page 2, page 3, page 4, etc. pages.
(We also disallowed /tag/, /category/, and a few other common issues that generate false positives in the site crawl.)
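If rogerbot honors * wildcards in Disallow patterns (Google's crawler does; it would be worth confirming rogerbot's behavior with Moz support), a pattern-based rule could block only the paginated URLs while leaving the top-level /place_tag/ pages crawlable. A hypothetical sketch:

```
User-agent: rogerbot
# Blocks /place_tag/autism-spectrum-disorder/page/4/ and similar
# paginated URLs, but not /place_tag/autism-spectrum-disorder/ itself
Disallow: /*/page/
```

Under strict wildcard-free robots.txt semantics, Disallow values are matched as path prefixes from the site root, so a plain "Disallow: /page/" only blocks URLs whose path begins with /page/.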
-
Hey there!
Tawny from Moz's Help Team here.
Adding a disallow directive for /tag/ won't help with the example URL you've provided — that URL doesn't have /tag/ in the URL pathway. To block us from seeing content like that URL you listed, you'd need a disallow directive for /place_tag/.
If you include that disallow directive, that should stop us from seeing duplicate content on pages with /place_tag/ in the URL.
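The prefix-matching behavior described above can be checked locally with Python's standard-library robots.txt parser. The rules below are an assumed reconstruction of the file under discussion, not its actual contents:

```python
from urllib import robotparser

# Hypothetical robots.txt resembling the one discussed in this thread
rules = """\
User-agent: *
Disallow: /page/
Disallow: /tag/
Disallow: /category/
"""

rp = robotparser.RobotFileParser()
rp.parse(rules.splitlines())

# Wildcard-free Disallow values are prefix matches against the URL path,
# so "Disallow: /page/" blocks /page/2/ ...
print(rp.can_fetch("rogerbot", "https://needquest.com/page/2/"))  # False

# ... but NOT /place_tag/autism-spectrum-disorder/page/4/, whose path
# begins with /place_tag/, not /page/
print(rp.can_fetch(
    "rogerbot",
    "https://needquest.com/place_tag/autism-spectrum-disorder/page/4/"))  # True
```

Adding `Disallow: /place_tag/` to the rules would make the second check return False, which matches the advice above.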
Hope that helps! If you've still got questions, feel free to shoot us a note over at help@moz.com and we'll do our best to sort things out with you.