Moz "Crawl Diagnostics" doesn't respect robots.txt
-
Hello, I've just had a new website crawled by the Moz bot. It's come back with thousands of errors saying things like:
- Duplicate content
- Overly dynamic URLs
- Duplicate Page Titles
The duplicate content & URLs it's found are all blocked in the robots.txt so why am I seeing these errors?
Here's an example of some of the robots.txt that blocks things like dynamic URLs and directories (which Moz bot ignored):
Disallow: /?mode=
Disallow: /?limit=
Disallow: /?dir=
Disallow: /?p=*&
Disallow: /?SID=
Disallow: /reviews/
Disallow: /home/
Many thanks for any info on this issue.
-
Hi Si, has this issue been resolved?
-
Hey Si,
Thanks for writing in. We don't appear to have a wider issue with our crawler ignoring robots.txt files, so I did some research in the Google Webmaster Tools documentation. It looks like most crawlers require an asterisk in the Disallow directive to recognize that all pages matching a dynamic URL pattern are being disallowed. The "Pattern Matching" section of this resource should give you more information about setting up the robots.txt with the correct Disallow directives to block those pages: http://support.google.com/webmasters/bin/answer.py?hl=en&answer=156449
If you add the asterisk to the Disallow directives and you are still seeing these pages crawled, it would help if you emailed your campaign information to our support desk at help@moz.com so our engineers can look into this more directly.
I hope this helps.
Chiaryn
-
If you have an "index,(no)follow" meta tag on those pages, I think they will be crawled even though you have them blocked in robots.txt. Adding "noindex" to those pages might make it work the way you want.
-
Is the / actually in the URL at that spot? Or is your link like http://www.example.com/abcd?p=147?
If you give an example of a full URL that includes one of your blocked dynamic URLs, we can take a better look. If your robots.txt is set up correctly, the crawler shouldn't find that stuff, but give us more info if you're able.
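To make the prefix question above concrete, here is a minimal Python sketch of wildcard-style Disallow matching as Google's documentation describes it: a rule is matched from the start of the path plus query string, `*` matches any run of characters, and a trailing `$` anchors the end. The function names `rule_to_regex` and `is_blocked` are hypothetical, and whether the Moz crawler follows exactly these semantics is an assumption. The sketch also ignores Allow rules, user-agent groups, and rule precedence.

```python
import re
from urllib.parse import urlsplit


def rule_to_regex(rule):
    """Turn a Disallow value into a regex (hypothetical helper).

    '*' becomes a wildcard and a trailing '$' anchors the end of the URL.
    """
    anchored = rule.endswith("$")
    if anchored:
        rule = rule[:-1]
    # Escape each literal piece, then join the pieces with '.*' for every '*'.
    body = ".*".join(re.escape(part) for part in rule.split("*"))
    return re.compile(body + ("$" if anchored else ""))


def is_blocked(url, rules):
    """Return True if any Disallow rule matches the URL's path and query."""
    parts = urlsplit(url)
    target = parts.path or "/"
    if parts.query:
        target += "?" + parts.query
    # re.match only matches at the start of the string, which gives the
    # "rule must match from the beginning of the path" behaviour.
    return any(rule_to_regex(rule).match(target) for rule in rules if rule)


original_rules = ["/?p=*&", "/reviews/"]   # taken from the question
wildcard_rules = ["/*?p="]                 # assumed fix: leading wildcard

print(is_blocked("http://www.example.com/?p=2&dir=asc", original_rules))  # True
print(is_blocked("http://www.example.com/abcd?p=147", original_rules))    # False
print(is_blocked("http://www.example.com/abcd?p=147", wildcard_rules))    # True
```

Under these assumptions, `Disallow: /?p=*&` only covers query strings attached directly to the site root, so a URL like http://www.example.com/abcd?p=147 is still crawlable until the rule starts with a wildcard such as `/*?p=`.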