Standard Syntax in robots.txt doesn't prevent Moz bot from crawling
-
A client is getting many false positive site crawl errors for things like duplicate titles and duplicate content on pages that include /tag/ in the URL. An example is https://needquest.com/place_tag/autism-spectrum-disorder/page/4/
To resolve this we have set up a disallow statement in the robots.txt file that says
Disallow: /page/
For some reason this appears not to work, as the site crawl errors continue to list pages like this. Does anyone understand why that would be and what we need to do to properly disallow crawling of these pages?
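One detail worth checking: under the original robots.txt specification, a Disallow value is matched as a prefix of the URL path, so Disallow: /page/ only blocks URLs whose path begins with /page/, not URLs that merely contain /page/ somewhere in the middle. Python's standard urllib.robotparser implements that prefix behavior and can be used to sanity-check a rule (crawler names and URLs here are taken from this thread):

```python
# Demo: robots.txt Disallow rules are prefix matches against the URL path
# (per the original robots.txt spec, which urllib.robotparser implements;
# some crawlers additionally support * wildcards, which this parser does not).
from urllib import robotparser

rp = robotparser.RobotFileParser()
rp.parse("""
User-agent: *
Disallow: /page/
""".splitlines())

# Path starts with /page/, so the prefix rule matches -> blocked (prints False)
print(rp.can_fetch("rogerbot", "https://needquest.com/page/2/"))

# /page/ appears mid-path, so the prefix rule does NOT match -> still crawlable (prints True)
print(rp.can_fetch("rogerbot", "https://needquest.com/place_tag/autism-spectrum-disorder/page/4/"))
```

If the crawler in question honors wildcard patterns, a rule such as `Disallow: */page/` is the usual way to match /page/ anywhere in the path; consult the crawler's own documentation before relying on that.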
-
Thanks, Tawny,
If you look at Duplicate titles, check the first one (https://needquest.com/place_tag/autism-spectrum-disorder/). All the URLs with a duplicate title have /page/ in them. I will suggest they move the Allow statement and see if that helps.
-
I'm not seeing that URL coming up with Duplicate Title or Duplicate Content issues — when I search by that URL I see no Content issues at that URL. I do see that URL in the All Crawled Pages section, but I can't find it bringing up Content issues in the app.
That said, I took a look at your robots.txt file, and I think this could be a result of having an Allow command before the rest of the Disallow commands. I think possibly if you put that Allow command at the end of the block of Disallow commands, rogerbot would see the disallow for /page/ and stop crawling those URLs.
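For illustration, a minimal sketch of the reordering described above. The actual robots.txt file isn't shown in this thread, so the Allow target below is hypothetical; only the Disallow paths come from the discussion:

```
User-agent: *
# Disallow rules first ...
Disallow: /page/
Disallow: /tag/
Disallow: /category/
# ... with the Allow directive moved to the end of the block
# (hypothetical path -- substitute the site's real Allow target)
Allow: /example-allowed-path/
```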
If you're still running into trouble, I would suggest writing in to us at help@moz.com so we can take a closer look at the Campaign and what could be going on there.
-
Any reason the Disallow: /page/ isn't preventing URLs like
https://needquest.com/place_tag/autism-spectrum-disorder/page/4/
from generating duplicate description and title errors in our site crawl? It was my hope that those pages wouldn't be crawled at all.
-
Sorry, Tawny ... I did go back and correct my question. We did apply Disallow: /page/ to address this issue. The /place_tag/ path is found in many pages we DO want to crawl and index ... here we only want to disallow the page 2, page 3, page 4, etc. pages.
(We also disallowed /tag/, /category/, and a few other common issues that generate false positives in the site crawl.)
-
Hey there!
Tawny from Moz's Help Team here.
Adding a disallow directive for /tag/ won't help with the example URL you've provided — that URL doesn't have /tag/ in the URL pathway. To block us from seeing content like that URL you listed, you'd need a disallow directive for /place_tag/.
If you include that disallow directive, that should stop us from seeing duplicate content on pages with /place_tag/ in the URL.
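A sketch of the directive described above, assuming the rule should apply to all user agents (note that, as discussed earlier in the thread, this would block every /place_tag/ URL, not just the paginated ones):

```
User-agent: *
Disallow: /place_tag/
```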
Hope that helps! If you've still got questions, feel free to shoot us a note over at help@moz.com and we'll do our best to sort things out with you.