Crawl Errors Confusing Me
-
The SEOMoz crawl tool is telling me that I have a slew of crawl errors on the blog of one domain. All are related to the MSNbot. And related to trackbacks (which we do want to block, right?) and attachments (makes sense to block those, too) ... any idea why these are crawl issues with MSNbot and not Google? My robots.txt is here: http://www.wevegotthekeys.com/robots.txt.
Thanks, MJ
-
I'm a little late to the party, but I want to summarize what I see as the answer.
1. The "Search Engine Blocked by Robots.txt" is only a warning, and not an error. If you intend for these pages not to get crawled (and it does seem like you have a good reason for this), then there is nothing to worry about.
2. The reason the warning appears for MSNbot and not Google is that currently, your robots.txt allows Google to crawl those files. As Daniel pointed out, you would need to add the identical directives to your robots.txt file to make this happen. Does that make sense? Or you could just add all of these files under the * directive to apply to all robots.
-
Yes, I thought that's what you meant ... thanks!
-
I am saying this:
User-agent: Googlebot Noindex: /key-west-blog/*?* Noindex: /key-west-blog/*.rss Noindex: /key-west-blog/*feed Noindex: /key-west-blog/*trackback Noindex: /key-west-blog/*wp- Noindex: /key-west-blog/tag/ Noindex: /key-west-blog/search/ Noindex: /key-west-blog/archives/ Noindex: /key-west-blog/category/ Noindex: /key-west-blog/2009 Noindex: /key-west-blog/2010 and this:
User-agent: Googlebot-Mobile
Noindex: /key-west-blog/?
Noindex: /key-west-blog/*.rss
Noindex: /key-west-blog/*feed
Noindex: /key-west-blog/*trackback
Noindex: /key-west-blog/*wp-
Noindex: /key-west-blog/tag/
Noindex: /key-west-blog/search/
Noindex: /key-west-blog/archives/
Noindex: /key-west-blog/category/
Noindex: /key-west-blog/2009
Noindex: /key-west-blog/2010They use Noindex which is a syntax I am unfamiliar with in robots.txt. So you can check out http://www.robotstxt.org/robotstxt.html for more info on robots.txt and proper syntaxt. I would change Noindex: to Disallow: and that should fix the error in the robots.txt file.
-
The robots.txt file DOES contain
User-agent: Msnbot Crawl-delay: 120 Disallow: /key-west-blog/*?* Disallow: /key-west-blog/*.rss Disallow: /key-west-blog/*feed Disallow: /key-west-blog/*trackback Disallow: /key-west-blog/*wp- Disallow: /key-west-blog/*login.php Disallow: /key-west-blog/tag/ Disallow: /key-west-blog/search/ Disallow: /key-west-blog/archives/ Disallow: /key-west-blog/category/ Disallow: /key-west-blog/2009 Disallow: /key-west-blog/2010 But you are saying I should remove the lines with noindex?
-
In your robots.txt file, you have the Disallow: command under MSNbot and Noindex: under Googlebot. Noindex is not a robots.txt command. Change Noindex: to Disallow: and those pages will be blocked for all bots. Not sure if that is what is causing the issue, but that would explain the discrepancy. If you want to noindex a page, you do it with a meta tag like this:
You can change follow to nofollow if you want, really doesn't matter much.
-
I have the same problem looks like MSN bot is disallowed from accessing wordpress content. So pages show up as ?page=111 so from what I understand so far anything that shows as below is blocked from MSNbot. I don't have a definite answer for you as to what to do, but I can tell you will need to "allow" msn bot the googlebot is.
Disallow: /key-west-blog/*?*
Got a burning SEO question?
Subscribe to Moz Pro to gain full access to Q&A, answer questions, and ask your own.
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
Is it possible to block Moz from crawling sites?
Hi, is it possible to stop Moz from crawling a site at the server level? Not that I am looking to do this or anything, but here's why I'm asking. I have been crawling a site that is managed (currently by 2 parties), and I noticed that this week pages crawled went from 80 (last week) to 1 page!! I know, what? See my image attached... and the issues all went to zero "0"....! So is it possible that someone can't prevent Moz from crawling the site at the server level? I checked the robots.txt file on the site, but nothing there. I'm curious. dYNUwjd.jpg
Moz Pro | | co.mc0 -
How do I exclude my blog subfolder from being crawled with my main domain (www.) folder?
I am trying to setup two separate campaigns for my blog and for my main site. While it is easy enough to do through the wizard the results I am getting for my main site still include pages that are in my blog sub folder. Please Advise!
Moz Pro | | sameufemia0 -
Crawl Diagnostics - Crawling way more pages than my site has?
Hello all, I'm fairly new here, more of a paid search guy dabbling in SEO on the side. I have a client that I have in SEOMoz and the Crawl Diagnostics report is showing 10,000+ pages crawled and I think the site has at most 800 pages (e-commerce site using freewebstore.org as the platform). Any reasons this would be happening?
Moz Pro | | LodestoneGen0 -
"Does not respond to web requests" error
When trying to set up a new campaign I get the following message:
Moz Pro | | bshanahan
"Roger has detected a problem: We have detected that the domain www.chicagofinancialadvisers.com does not respond to web requests. Using this domain, we will be unable to crawl your site or present accurate SERP information." Can someone please tell me what I need to do on my site to make this work? I haven't seen this before and have done many other campaigns. Thanks a lot!0 -
90% of our sites that are designed are in wordpress and the report brings up "duplicate" content errors. I presume this is down to a conical error?
We are looking at getting the Agency version of SEOMoz and are based in the UK Could you please tell me what would be the best way to correct this issue as this appears to be a problem with all our clients websites. an example would be www.fsgenergy.co.uk Would you also be able to suggest the best SEO plugin to use with SEOMOz ? Many thanks Paul
Moz Pro | | KloodLtd1 -
Wordpress Plugin Causing Mobile Switcher 404 Errors
Hi All, Has anyone seen this 404 error before? I installed a wordpress plugin that would allow users to switch between a mobile and desktop theme. This was about 6 months ago and I thought nothing of it. Since I am new to SeoMoz, I have become aware of this lovely problem. After setting up my campaign, I now have 1600 something 404 errors due to this plugin. It looks like this plugin creates 4 to 5 links for each one of my posts and they all return up as a 404 error. Example: http://frogfanreport.com/football/page/36/ or http://frogfanreport.com/football/page/36/?wpmp_switcher=desktop I just noticed this morning in Google Webmaster tools that the errors are starting to show up there. Has anyone seen this? Or know if this is a problem and what to do? zach
Moz Pro | | TCUFrogFanReport0 -
Why Is SEOMOZ No Longer crawling All Of My Site
Hi all, I joined Seomoz over a month ago and Roger has been crawling all of the pages on the site approx 20 pages. Through out the last few weeks I have been working on the errors and notices identified by Roger. However, this week Roger has only re-crawled 1 page and is not picking up all the other pages. Has any one come across this problem. can you recommend any thing to resolve it? Many thanks in advance....
Moz Pro | | Dan280 -
Dismiss crawl diagnostics error
Hello everyone, Is there a way to dismiss some errors in the Crawl Diagnostics tool so they don't appear again? It happens so that some of the errors are never going to be fixed because of their nature. For example, 'Title too long' errors that point to some of the threads on my forum - it doesn't make sense to change the title of a thread posted by user just for the sake of the error disappearing from the 'Crawl Diagnostics' tool. 🙂 Otherwise the CD interface gets a little bit cluttered with errors which I will never fix anyway. I wonder how others deal with this problem. Thanks.
Moz Pro | | MaratM0