Standard Syntax in robots.txt doesn't prevent Moz bot from crawling
-
A client is getting many false positive site crawl errors for things like duplicate titles and duplicate content on pages that include /tag/ in the URL. An example is https://needquest.com/place_tag/autism-spectrum-disorder/page/4/
To resolve this we have set up a disallow statement in the robots.txt file that says
Disallow: /page/
For some reason this appears not to work, as the site crawl errors continue to list pages like this. Does anyone understand why that would be and what we need to do to properly disallow crawling these pages?
-
Thanks, Tawny,
If you look at Duplicate titles, check the first one (https://needquest.com/place_tag/autism-spectrum-disorder/). All the URLs with a duplicate title have /page/ in them. I will suggest they move the Allow statement and see if that helps.
-
I'm not seeing that URL coming up with Duplicate Title or Duplicate Content issues — when I search by that URL I see no Content issues at that URL. I do see that URL in the All Crawled Pages section, but I can't find it bringing up Content issues in the app.
That said, I took a look at your robots.txt file, and I think this could be a result of having an Allow command before the rest of the Disallow commands. I think possibly if you put that Allow command at the end of the block of Disallow commands, rogerbot would see the disallow for /page/ and stop crawling those URLs.
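For anyone who wants to see how rule order can matter, here is a minimal sketch using Python's standard-library `urllib.robotparser`, which applies the first matching rule in file order. That is only one possible behaviour: whether rogerbot evaluates rules this way is not confirmed in this thread, and the two rule blocks and the test URL below are hypothetical, since the client's actual robots.txt is not quoted here.

```python
from urllib.robotparser import RobotFileParser


def can_crawl(robots_txt: str, url: str, agent: str = "rogerbot") -> bool:
    """Return True if this parser would let `agent` fetch `url`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


# Hypothetical rule blocks, written only to illustrate ordering.
ALLOW_FIRST = """User-agent: *
Allow: /
Disallow: /page/
"""

DISALLOW_FIRST = """User-agent: *
Disallow: /page/
Allow: /
"""

# Hypothetical URL whose path begins with /page/.
URL = "https://needquest.com/page/2/"

if __name__ == "__main__":
    # A first-match parser stops at "Allow: /" and never reaches the Disallow.
    print("Allow first    ->", can_crawl(ALLOW_FIRST, URL))
    # With the Disallow listed first, the same URL is blocked.
    print("Disallow first ->", can_crawl(DISALLOW_FIRST, URL))
```

Crawlers that pick the most specific (longest) matching rule instead would treat both files the same, so this only shows why moving the Allow line is worth testing.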
If you're still running into trouble, I would suggest writing in to us at help@moz.com so we can take a closer look at the Campaign and what could be going on there.
-
Any reason the Disallow: /page/ isn't preventing URLs like
https://needquest.com/place_tag/autism-spectrum-disorder/page/4/
from generating duplicate descriptions and title errors in our site crawl? It was my hope that those pages wouldn't be crawled at all.
-
Sorry, Tawny ... I did go back and correct my question. We did apply Disallow: /page/ to address this issue. The /place_tag/ is found in many pages we DO want to crawl and index ... and here we only want to disallow those page 2, page 3, page 4, etc. pages.
(We also disallowed /tag/, /category/, and a few other common issues that generate false positives in the site crawl.)
-
Hey there!
Tawny from Moz's Help Team here.
Adding a disallow directive for /tag/ won't help with the example URL you've provided — that URL doesn't have /tag/ in the URL pathway. To block us from seeing content like that URL you listed, you'd need a disallow directive for /place_tag/.
If you include that disallow directive, that should stop us from seeing duplicate content on pages with /place_tag/ in the URL.
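The point about which directive covers the example URL can be checked with a short sketch. It uses Python's standard-library `urllib.robotparser`, which compares each Disallow value against the start of the URL path. This is an illustration of plain prefix matching, not a reproduction of rogerbot itself; the helper name `is_allowed` and the one-rule robots.txt strings are made up for the example.

```python
from urllib.robotparser import RobotFileParser

# The example URL from the original question.
EXAMPLE_URL = "https://needquest.com/place_tag/autism-spectrum-disorder/page/4/"


def is_allowed(rules: str, url: str, agent: str = "rogerbot") -> bool:
    """Return True if a prefix-matching parser lets `agent` fetch `url`."""
    parser = RobotFileParser()
    parser.parse(rules.splitlines())
    return parser.can_fetch(agent, url)


if __name__ == "__main__":
    # Try each directive mentioned in the thread on its own.
    for directive in ("/tag/", "/page/", "/place_tag/"):
        rules = f"User-agent: *\nDisallow: {directive}"
        blocked = not is_allowed(rules, EXAMPLE_URL)
        print(f"Disallow: {directive:<12} blocks example URL: {blocked}")
```

Under prefix matching, only `/place_tag/` blocks the example, because the URL path starts with `/place_tag/` and merely contains `/page/` further along.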
Hope that helps! If you've still got questions, feel free to shoot us a note over at help@moz.com and we'll do our best to sort things out with you.