Standard Syntax in robots.txt doesn't prevent Moz bot from crawling
-
A client is getting many false positive site crawl errors for things like duplicate titles and duplicate content on pages that include /tag/ in the URL. An example is https://needquest.com/place_tag/autism-spectrum-disorder/page/4/
To resolve this we have set up a disallow statement in the robots.txt file that says
Disallow: /page/
For some reason this appears not to work, as the site crawl errors continue to list pages like this. Does anyone understand why that would be and what we need to do to properly disallow crawling these pages?
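One thing worth checking is how the rule is matched. In standard robots.txt syntax a Disallow value is compared against the start of the URL path, so Disallow: /page/ covers /page/4/ but not a path that only contains /page/ further along. The sketch below uses Python's standard-library parser as a stand-in. It is not rogerbot, and the two-line robots.txt and the top-level /page/4/ URL are simplified assumptions for illustration, not the site's actual file.

```python
from urllib.robotparser import RobotFileParser

# Simplified robots.txt (an assumption; the real file contains more rules).
ROBOTS_LINES = [
    "User-agent: *",
    "Disallow: /page/",
]


def is_allowed(url, agent="rogerbot"):
    """Return True if the stand-in parser would let `agent` fetch `url`."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_LINES)
    return parser.can_fetch(agent, url)


# The URL from the question: /page/ appears in the middle of the path.
nested = "https://needquest.com/place_tag/autism-spectrum-disorder/page/4/"
# Hypothetical URL whose path starts with /page/, for contrast.
top_level = "https://needquest.com/page/4/"

print(is_allowed(nested))     # rule does not apply, so the fetch is allowed
print(is_allowed(top_level))  # rule applies, so the fetch is blocked
```

If rogerbot matches paths the same way, the pagination URLs under /place_tag/ would remain crawlable with this rule in place.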
-
Thanks, Tawny,
If you look at Duplicate titles, check the first one (https://needquest.com/place_tag/autism-spectrum-disorder/). All the URLs with a duplicate title have /page/ in them. I will suggest they move the Allow statement and see if that helps.
-
I'm not seeing that URL come up with Duplicate Title or Duplicate Content issues. When I search by that URL, I see no Content issues for it. I do see it in the All Crawled Pages section, but I can't find it bringing up Content issues in the app.
That said, I took a look at your robots.txt file, and I think this could be a result of having an Allow command before the rest of the Disallow commands. If you move that Allow command to the end of the block of Disallow commands, rogerbot might see the disallow for /page/ and stop crawling those URLs.
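To illustrate why rule order can matter, here is a minimal sketch using Python's standard-library parser, which applies the first rule that matches. The robots.txt contents and the URL are hypothetical, since the site's actual file is not quoted in this thread, and this is not a statement of how rogerbot resolves conflicts. Parsers that follow RFC 9309 pick the longest matching rule instead, so for them the order would make no difference.

```python
from urllib.robotparser import RobotFileParser


def can_crawl(robots_lines, url, agent="rogerbot"):
    """Parse the given robots.txt lines and ask whether `agent` may fetch `url`."""
    parser = RobotFileParser()
    parser.parse(robots_lines)
    return parser.can_fetch(agent, url)


# Hypothetical files: same rules, different order.
ALLOW_FIRST = ["User-agent: *", "Allow: /", "Disallow: /page/"]
ALLOW_LAST = ["User-agent: *", "Disallow: /page/", "Allow: /"]

URL = "https://needquest.com/page/4/"  # hypothetical top-level pagination URL

# The stdlib parser stops at the first matching rule, so a broad Allow
# placed first hides the Disallow that follows it.
print(can_crawl(ALLOW_FIRST, URL))
print(can_crawl(ALLOW_LAST, URL))
```

With a first-match parser, the broad Allow wins when it comes first and the Disallow wins when the Allow is moved to the end.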
If you're still running into trouble, I would suggest writing in to us at help@moz.com so we can take a closer look at the Campaign and what could be going on there.
-
Any reason the Disallow: /page/ isn't preventing URLs like
https://needquest.com/place_tag/autism-spectrum-disorder/page/4/
from generating duplicate descriptions and title errors in our site crawl? It was my hope that those pages wouldn't be crawled at all.
-
Sorry, Tawny ... I did go back and correct my question. We did apply Disallow: /page/ to address this issue. The /place_tag/ segment is found in many pages we DO want to crawl and index ... here we only want to disallow the paginated pages (page 2, page 3, page 4, etc.).
(We also disallowed /tag/, /category/, and a few other common issues that generate false positives in the site crawl.)
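For the goal described here (keep /place_tag/ pages, drop only the paginated ones), a wildcard rule is one option to consider. The sketch below is a small hand-written matcher for the pattern syntax in RFC 9309, where * matches any run of characters and a trailing $ anchors the end of the path. The rule /*/page/ and the helper rule_matches are assumptions for illustration and do not come from this thread. Wildcard support varies by crawler, so whether rogerbot honors it should be confirmed with Moz.

```python
import re


def rule_matches(pattern, path):
    """Check a robots.txt path pattern against a URL path.

    '*' matches any run of characters and a trailing '$' anchors the end;
    otherwise the pattern must match from the start of the path.
    """
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # Escape the literal pieces and join them with '.*' for each wildcard.
    regex = ".*".join(re.escape(part) for part in pattern.split("*"))
    if anchored:
        regex += r"\Z"
    return re.match(regex, path) is not None


first_page = "/place_tag/autism-spectrum-disorder/"
page_four = "/place_tag/autism-spectrum-disorder/page/4/"

print(rule_matches("/page/", page_four))     # plain prefix rule: no match
print(rule_matches("/*/page/", page_four))   # wildcard rule: matches page 4
print(rule_matches("/*/page/", first_page))  # first page is left alone
```

Under these assumptions, Disallow: /*/page/ would cover the page 2, page 3, page 4 URLs while leaving the first /place_tag/ page crawlable.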
-
Hey there!
Tawny from Moz's Help Team here.
Adding a disallow directive for /tag/ won't help with the example URL you've provided, because that URL doesn't have /tag/ in its path. To block us from seeing content like that URL you listed, you'd need a disallow directive for /place_tag/.
If you include that disallow directive, that should stop us from seeing duplicate content on pages with /place_tag/ in the URL.
Hope that helps! If you've still got questions, feel free to shoot us a note over at help@moz.com and we'll do our best to sort things out with you.