Standard Syntax in robots.txt doesn't prevent Moz bot from crawling
-
A client is getting many false positive site crawl errors for things like duplicate titles and duplicate content on pages that include /tag/ in the URL. An example is https://needquest.com/place_tag/autism-spectrum-disorder/page/4/
To resolve this we have set up a disallow statement in the robots.txt file that says
Disallow: /page/

For some reason this appears not to work, as the site crawl errors continue to list pages like this. Does anyone understand why that would be and what we need to do to properly disallow crawling these pages?
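For what it's worth, under the original robots.txt spec a Disallow rule is a simple path-prefix match, so `Disallow: /page/` only blocks URLs whose path begins with `/page/`; it never matches `/page/` appearing in the middle of a path. A quick sketch with Python's standard-library parser (which implements plain prefix matching, no wildcards) illustrates this, using the URLs from the question:

```python
from urllib.robotparser import RobotFileParser

# Minimal sketch: the stdlib parser implements the original robots.txt
# spec, where each Disallow rule is a simple path-prefix match.
rp = RobotFileParser()
rp.parse("""
User-agent: *
Disallow: /page/
""".splitlines())

# Blocked: the path starts with /page/
print(rp.can_fetch("rogerbot", "https://needquest.com/page/2/"))

# NOT blocked: /page/ appears mid-path, so the prefix rule never matches
print(rp.can_fetch(
    "rogerbot",
    "https://needquest.com/place_tag/autism-spectrum-disorder/page/4/"))
```

Note that crawlers such as Googlebot (and possibly rogerbot) support `*` wildcards as an extension, which this stdlib parser does not, so this only demonstrates the baseline prefix behavior.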
-
Thanks, Tawny,
If you look at Duplicate titles, check the first one (https://needquest.com/place_tag/autism-spectrum-disorder/). All the URLs with a duplicate title have /page/ in them. I will suggest they move the Allow statement and see if that helps.
-
I'm not seeing that URL coming up with Duplicate Title or Duplicate Content issues — when I search by that URL I see no Content issues at that URL. I do see that URL in the All Crawled Pages section, but I can't find it bringing up Content issues in the app.
That said, I took a look at your robots.txt file, and I think this could be a result of having an Allow command before the rest of the Disallow commands. I think possibly if you put that Allow command at the end of the block of Disallow commands, rogerbot would see the disallow for /page/ and stop crawling those URLs.
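The reordering described above would look roughly like this. The specific paths here are placeholders standing in for whatever the site's real file contains, and note that under RFC 9309 rule order within a group shouldn't matter (the longest matching rule wins), but some parsers evaluate rules first-match, so moving the Allow last is a reasonable experiment:

```
User-agent: *
Disallow: /page/
Disallow: /tag/
Disallow: /category/
# Allow moved to the end of the group; the target shown is a common
# WordPress default and just a placeholder for the file's real Allow rule
Allow: /wp-admin/admin-ajax.php
```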
If you're still running into trouble, I would suggest writing in to us at help@moz.com so we can take a closer look at the Campaign and what could be going on there.
-
Any reason the Disallow: /page/ isn't preventing URLs like
https://needquest.com/place_tag/autism-spectrum-disorder/page/4/
from generating duplicate descriptions and title errors in our site crawl? It was my hope that those pages wouldn't be crawled at all.
-
Sorry, Tawny ... I did go back and correct my question. We did apply Disallow: /page/ to address this issue. The /place_tag/ path is found in many pages we DO want to crawl and index ... here we only want to disallow those page 2, page 3, page 4, etc. pages.
(We also disallowed /tag/, /category/, and a few other common issues that generate false positives in the site crawl.)
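Since /page/ only appears mid-path on these URLs, one option is a wildcard rule that matches pagination anywhere in the path while leaving the /place_tag/ landing pages crawlable. This assumes rogerbot honors the `*` wildcard extension that Googlebot and Bingbot support, which is worth confirming with Moz before relying on it:

```
User-agent: *
# Blocks /place_tag/foo/page/2/, /category/bar/page/3/, etc.,
# but not /place_tag/foo/ itself.
# (Wildcards are an extension to the original robots.txt spec;
# confirm the crawler supports them before relying on this.)
Disallow: /*/page/
```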
-
Hey there!
Tawny from Moz's Help Team here.
Adding a disallow directive for /tag/ won't help with the example URL you've provided — that URL doesn't have /tag/ in the URL pathway. To block us from seeing content like that URL you listed, you'd need a disallow directive for /place_tag/.
If you include that disallow directive, that should stop us from seeing duplicate content on pages with /place_tag/ in the URL.
Hope that helps! If you've still got questions, feel free to shoot us a note over at help@moz.com and we'll do our best to sort things out with you.