Moz "Crawl Diagnostics" doesn't respect robots.txt
-
Hello, I've just had a new website crawled by the Moz bot. It's come back with thousands of errors saying things like:
- Duplicate content
- Overly dynamic URLs
- Duplicate Page Titles
The duplicate content & URLs it's found are all blocked in the robots.txt so why am I seeing these errors?
Here's an example of some of the robots.txt that blocks things like dynamic URLs and directories (which Moz bot ignored):
Disallow: /?mode=
Disallow: /?limit=
Disallow: /?dir=
Disallow: /?p=*&
Disallow: /?SID=
Disallow: /reviews/
Disallow: /home/
Many thanks for any info on this issue.
-
Hi Si, has this issue been resolved?
-
Hey Si,
Thanks for writing in. We don't seem to be having an overarching issue with our crawler ignoring robots.txt files, so I did some research in Google Webmaster Tools. It looks like most crawlers require an asterisk in the disallow directive to recognize that all pages of a dynamic URL are being disallowed. The "Pattern Matching" section of this resource should give you more information about setting up the robots.txt with the correct disallow directives to block those pages: http://support.google.com/webmasters/bin/answer.py?hl=en&answer=156449
If you add the asterisk to the disallow directive and you are still seeing these pages crawled, it would help if you sent an email with your campaign information to our support desk at help@moz.com so our engineers can look into this more directly.
I hope this helps.
Chiaryn
-
If you have an "index,(no)follow" meta on those pages I think they will be crawled even though you have them blocked in robots.txt. So by adding "noindex" on those pages it might work as you want it to.
-
Is the / actually in the URL at that spot? Or is your link like http://www.example.com/abcd?p=147?
If you give an example full URL that includes one of your blocked dynamic URLs, we can take a better look. If your robots.txt is set up correctly, the crawler shouldn't find those pages, but give us more info if you're able.
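To illustrate why the position of the / matters, here is a minimal Python sketch of Google-style pattern matching, where a rule is anchored at the start of the path, `*` matches any run of characters, and a trailing `$` anchors the end. The `rule_matches` helper and the sample paths are hypothetical, written only for this illustration; this is not Moz's crawler code, and other crawlers may match differently.

```python
import re


def rule_matches(pattern: str, url_path: str) -> bool:
    """Return True if a Disallow pattern matches a path-plus-query string.

    Assumed semantics (Google-style): the pattern must match from the
    start of the path, '*' matches any run of characters, and a trailing
    '$' anchors the match at the end of the URL.
    """
    anchored_end = pattern.endswith("$")
    if anchored_end:
        pattern = pattern[:-1]
    # Escape the literal pieces and join them with '.*' for each wildcard.
    regex = ".*".join(re.escape(part) for part in pattern.split("*"))
    if anchored_end:
        regex += "$"
    # re.match anchors at the start of the string, like a robots.txt rule.
    return re.match(regex, url_path) is not None


# Rules from the thread, tested against hypothetical paths.
print(rule_matches("/?p=*&", "/?p=2&dir=asc"))         # query sits on the homepage
print(rule_matches("/?p=*&", "/abcd?p=147&dir=asc"))   # query sits on a deeper page
print(rule_matches("/*?p=*&", "/abcd?p=147&dir=asc"))  # leading wildcard added
print(rule_matches("/reviews/", "/reviews/page-2"))    # plain directory prefix
```

Under these assumptions, a rule such as `Disallow: /?p=*&` only covers query strings attached directly to the site root. A page like `/abcd?p=147&dir=asc` is not covered unless the rule starts with a wildcard, for example `/*?p=*&`.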