Standard Syntax in robots.txt doesn't prevent Moz bot from crawling
-
A client is getting many false positive site crawl errors for things like duplicate titles and duplicate content on pages that include /tag/ in the URL. An example is https://needquest.com/place_tag/autism-spectrum-disorder/page/4/
To resolve this, we have set up a disallow statement in the robots.txt file that says
Disallow: /page/
For some reason this appears not to work, as the site crawl errors continue to list pages like this. Does anyone understand why that would be and what we need to do to properly disallow crawling these pages?
-
Thanks, Tawny,
If you look at Duplicate titles, check the first one (https://needquest.com/place_tag/autism-spectrum-disorder/). All the URLs with a duplicate title have /page/ in them. I will suggest they move the Allow statement and see if that helps.
-
I'm not seeing that URL coming up with Duplicate Title or Duplicate Content issues. When I search by that URL, I see no Content issues for it. I do see it in the All Crawled Pages section, but I can't find it flagged for Content issues in the app.
That said, I took a look at your robots.txt file, and I think this could be a result of having an Allow command before the rest of the Disallow commands. If you move that Allow command to the end of the block of Disallow commands, rogerbot may see the disallow for /page/ and stop crawling those URLs.
If you're still running into trouble, I would suggest writing in to us at help@moz.com so we can take a closer look at the Campaign and what could be going on there.
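To illustrate why rule order can matter, here is a minimal sketch using Python's standard urllib.robotparser, which applies the first rule that matches. The thread does not show the site's actual Allow line, so "Allow: /" and the example.com URL are hypothetical, and whether rogerbot resolves rules in file order or by the most specific match is not stated here.

```python
from urllib.robotparser import RobotFileParser

def allowed(robots_lines, url, agent="rogerbot"):
    """Parse robots.txt lines and report whether `agent` may fetch `url`."""
    parser = RobotFileParser()
    parser.parse(robots_lines)
    return parser.can_fetch(agent, url)

# Hypothetical rule sets: the same two rules in a different order.
allow_first = ["User-agent: *", "Allow: /", "Disallow: /page/"]
allow_last = ["User-agent: *", "Disallow: /page/", "Allow: /"]

url = "https://example.com/page/2/"
print(allowed(allow_first, url))  # True: "Allow: /" matches first
print(allowed(allow_last, url))   # False: "Disallow: /page/" matches first
```

A parser that follows the most-specific-match rule would block the URL in both cases, so this only shows what a first-match parser does.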
-
Any reason the Disallow: /page/ isn't preventing URLs like
https://needquest.com/place_tag/autism-spectrum-disorder**/page/**4/
from generating duplicate descriptions and title errors in our site crawl? It was my hope that those pages wouldn't be crawled at all.
-
Sorry, Tawny ... I did go back and correct my question. We did apply Disallow: /page/ to address this issue. The /place_tag/ is found in many pages we DO want to crawl and index ... here we only want to disallow the page 2, page 3, page 4, etc. pages.
(We also disallowed /tag/, /category/, and a few other common issues that generate false positives in the site crawl.)
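One way to target only the paginated URLs is a wildcard rule. The sketch below is a small matcher written for illustration (not Moz's code) that follows the pattern syntax in RFC 9309: a rule matches from the start of the path, "*" stands for any run of characters, and a trailing "$" anchors the end. The rule "/*/page/" and the function name rule_matches are assumptions, and whether rogerbot honors wildcards should be confirmed with Moz.

```python
import re

def rule_matches(pattern: str, path: str) -> bool:
    """Return True if a robots.txt path pattern matches `path`.

    Matching starts at the beginning of the path; '*' matches any run
    of characters and a trailing '$' anchors the end of the path.
    """
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # Escape the literal pieces and join them with '.*' for each wildcard.
    regex = ".*".join(re.escape(part) for part in pattern.split("*"))
    if anchored:
        regex += "$"
    return re.match(regex, path) is not None

paged = "/place_tag/autism-spectrum-disorder/page/4/"
first = "/place_tag/autism-spectrum-disorder/"

print(rule_matches("/page/", paged))    # False: the path does not start with /page/
print(rule_matches("/*/page/", paged))  # True: the wildcard covers the leading folders
print(rule_matches("/*/page/", first))  # False: the first page stays crawlable
```

Under these assumptions, "Disallow: /*/page/" would block page 2, page 3, and so on while leaving the main /place_tag/ pages open.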
-
Hey there!
Tawny from Moz's Help Team here.
Adding a disallow directive for /tag/ won't help with the example URL you've provided — that URL doesn't have /tag/ in the URL pathway. To block us from seeing content like that URL you listed, you'd need a disallow directive for /place_tag/.
If you include that disallow directive, that should stop us from seeing duplicate content on pages with /place_tag/ in the URL.
Hope that helps! If you've still got questions, feel free to shoot us a note over at help@moz.com and we'll do our best to sort things out with you.
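As a rough check of the point about /tag/ versus /place_tag/, here is a sketch using Python's standard urllib.robotparser, which compares each rule against the start of the URL path. It is a stand-in for illustration, not rogerbot itself, and the helper name can_crawl is made up for this example.

```python
from urllib.robotparser import RobotFileParser

def can_crawl(disallow_path, url, agent="rogerbot"):
    """Build a one-rule robots.txt and report whether `agent` may fetch `url`."""
    parser = RobotFileParser()
    parser.parse(["User-agent: *", f"Disallow: {disallow_path}"])
    return parser.can_fetch(agent, url)

paged = "https://needquest.com/place_tag/autism-spectrum-disorder/page/4/"
first = "https://needquest.com/place_tag/autism-spectrum-disorder/"

print(can_crawl("/tag/", paged))        # True: the path starts with /place_tag/, not /tag/
print(can_crawl("/place_tag/", paged))  # False: the rule matches the start of the path
print(can_crawl("/place_tag/", first))  # False: the first page is blocked as well
```

Note the last line: a plain /place_tag/ rule also blocks the pages under /place_tag/ that should stay crawlable.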