Moz "Crawl Diagnostics" doesn't respect robots.txt
-
Hello, I've just had a new website crawled by the Moz bot. It's come back with thousands of errors saying things like:
- Duplicate content
- Overly dynamic URLs
- Duplicate Page Titles
The duplicate content & URLs it's found are all blocked in the robots.txt so why am I seeing these errors?
Here's an example of some of the robots.txt that blocks things like dynamic URLs and directories (which Moz bot ignored):
Disallow: /?mode=
Disallow: /?limit=
Disallow: /?dir=
Disallow: /?p=*&
Disallow: /?SID=
Disallow: /reviews/
Disallow: /home/
Many thanks for any info on this issue.
-
Hi Si, has this issue been resolved?
-
Hey Si,
Thanks for writing in. We don't seem to be having an overarching issue with our crawler ignoring robots.txt files, so I did some research in the Google Webmaster Tools help documentation. It looks like most crawlers need an asterisk (*) wildcard in the disallow directive to recognize that every page with a given dynamic parameter is disallowed, because a directive is otherwise matched against the start of the URL path. The "Pattern Matching" section of this resource should give you more information about setting up the robots.txt with the correct disallow directives to block those pages: http://support.google.com/webmasters/bin/answer.py?hl=en&answer=156449
If you add the asterisk to the disallow directive and you are still seeing these pages crawled, it would help if you sent an email with your campaign information to our support desk at help@moz.com so our engineers can look into this more directly.
I hope this helps.
Chiaryn
-
If you have an "index,(no)follow" meta tag on those pages, I think they will be crawled even though you have them blocked in robots.txt. So adding "noindex" to those pages might make it work the way you want.
-
Is the / actually in the URL at that spot? Or is your link more like http://www.example.com/abcd?p=147?
If you give an example of a full URL that includes one of your blocked dynamic URLs, we can take a better look. If your robots.txt is set up correctly, the crawler shouldn't find that stuff, but give us more info if you're able.
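To illustrate why the position of the / matters, here is a minimal Python sketch, assuming Google-style pattern matching: a rule is matched from the start of the path plus query string, * matches any run of characters, and a trailing $ anchors the end. The helper names rule_to_regex and is_blocked, and the wildcard rule /*?p=, are hypothetical and are not Moz's or Google's implementation. The sketch also ignores Allow rules, user-agent groups, and rule precedence.

```python
import re
from typing import Iterable
from urllib.parse import urlsplit


def rule_to_regex(rule: str) -> "re.Pattern":
    """Turn a Disallow value into a regex (assumed Google-style semantics)."""
    anchored = rule.endswith("$")           # trailing '$' means "end of URL"
    body = rule[:-1] if anchored else rule
    # Escape literal pieces and let each '*' match any run of characters.
    pattern = ".*".join(re.escape(part) for part in body.split("*"))
    return re.compile(pattern + ("$" if anchored else ""))


def is_blocked(url: str, rules: Iterable[str]) -> bool:
    """Return True if any rule matches the URL's path (plus query string)."""
    parts = urlsplit(url)
    target = parts.path or "/"
    if parts.query:
        target += "?" + parts.query
    # re.match anchors at the start, which mirrors prefix matching.
    return any(rule_to_regex(rule).match(target) for rule in rules)


# Rules quoted in the question above.
ORIGINAL_RULES = ["/?mode=", "/?limit=", "/?dir=", "/?p=*&", "/?SID=",
                  "/reviews/", "/home/"]
# Hypothetical wildcard variant that matches the parameter on any path.
WILDCARD_RULES = ["/*?p="]

root_url = "http://www.example.com/?p=2&dir=asc"
deep_url = "http://www.example.com/abcd?p=147"

print(is_blocked(root_url, ORIGINAL_RULES))  # rule starts at the root path
print(is_blocked(deep_url, ORIGINAL_RULES))  # "/abcd" precedes the "?"
print(is_blocked(deep_url, WILDCARD_RULES))  # '*' absorbs "abcd"
```

Under these assumptions, the original rules block the root URL but not http://www.example.com/abcd?p=147, while a wildcard rule such as /*?p= covers both.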