Crawl Errors Confusing Me
-
The SEOMoz crawl tool is telling me that I have a slew of crawl errors on the blog of one domain. All are related to the MSNbot. And related to trackbacks (which we do want to block, right?) and attachments (makes sense to block those, too) ... any idea why these are crawl issues with MSNbot and not Google? My robots.txt is here: http://www.wevegotthekeys.com/robots.txt.
Thanks, MJ
-
I'm a little late to the party, but I want to summarize what I see as the answer.
1. The "Search Engine Blocked by Robots.txt" is only a warning, and not an error. If you intend for these pages not to get crawled (and it does seem like you have a good reason for this), then there is nothing to worry about.
2. The reason the warning appears for MSNbot and not Google is that currently, your robots.txt allows Google to crawl those files. As Daniel pointed out, you would need to add the identical directives to your robots.txt file to make this happen. Does that make sense? Or you could just add all of these files under the * directive to apply to all robots.
-
Yes, I thought that's what you meant ... thanks!
-
I am saying this:
User-agent: Googlebot Noindex: /key-west-blog/*?* Noindex: /key-west-blog/*.rss Noindex: /key-west-blog/*feed Noindex: /key-west-blog/*trackback Noindex: /key-west-blog/*wp- Noindex: /key-west-blog/tag/ Noindex: /key-west-blog/search/ Noindex: /key-west-blog/archives/ Noindex: /key-west-blog/category/ Noindex: /key-west-blog/2009 Noindex: /key-west-blog/2010 and this:
User-agent: Googlebot-Mobile
Noindex: /key-west-blog/?
Noindex: /key-west-blog/*.rss
Noindex: /key-west-blog/*feed
Noindex: /key-west-blog/*trackback
Noindex: /key-west-blog/*wp-
Noindex: /key-west-blog/tag/
Noindex: /key-west-blog/search/
Noindex: /key-west-blog/archives/
Noindex: /key-west-blog/category/
Noindex: /key-west-blog/2009
Noindex: /key-west-blog/2010They use Noindex which is a syntax I am unfamiliar with in robots.txt. So you can check out http://www.robotstxt.org/robotstxt.html for more info on robots.txt and proper syntaxt. I would change Noindex: to Disallow: and that should fix the error in the robots.txt file.
-
The robots.txt file DOES contain
User-agent: Msnbot Crawl-delay: 120 Disallow: /key-west-blog/*?* Disallow: /key-west-blog/*.rss Disallow: /key-west-blog/*feed Disallow: /key-west-blog/*trackback Disallow: /key-west-blog/*wp- Disallow: /key-west-blog/*login.php Disallow: /key-west-blog/tag/ Disallow: /key-west-blog/search/ Disallow: /key-west-blog/archives/ Disallow: /key-west-blog/category/ Disallow: /key-west-blog/2009 Disallow: /key-west-blog/2010 But you are saying I should remove the lines with noindex?
-
In your robots.txt file, you have the Disallow: command under MSNbot and Noindex: under Googlebot. Noindex is not a robots.txt command. Change Noindex: to Disallow: and those pages will be blocked for all bots. Not sure if that is what is causing the issue, but that would explain the discrepancy. If you want to noindex a page, you do it with a meta tag like this:
You can change follow to nofollow if you want, really doesn't matter much.
-
I have the same problem looks like MSN bot is disallowed from accessing wordpress content. So pages show up as ?page=111 so from what I understand so far anything that shows as below is blocked from MSNbot. I don't have a definite answer for you as to what to do, but I can tell you will need to "allow" msn bot the googlebot is.
Disallow: /key-west-blog/*?*
Got a burning SEO question?
Subscribe to Moz Pro to gain full access to Q&A, answer questions, and ask your own.
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
Moz Crawl Test error
Moz crawl test show blank report for my website test - guitarcontrol.com. Why??? Please suggest.
Moz Pro | | zoe.wilson170 -
Find a 4xx or 5xx link referenced in an SEO Crawl Report
So I just got the Crawl Diagnostics report for a client site and it came back with a number of 4xx errors and even 1 5xx error. So while I can find the URL that has the problem, I cannot find the pages that have the links pointing to these non-existent or problematic pages. Normally I would just search the database for the site, but in this case I don't have access to it as the site is on a proprietary platform with no access other than to the CMS. Is there anyway to get the linking URL from the report? Thanks!
Moz Pro | | farlandlee0 -
Duplicate content error?
I am getting a duplicate content error for the following pages: http://www.bluelinkerp.com/products/accounting/index.asp http://www.bluelinkerp.com/products/accounting/ But, of course, the 2nd link is just an automatic redirect to the index file, is it not? Why is it thinking it is a different URL? See image. NJfxA.png
Moz Pro | | BlueLinkERP0 -
Crawl Test - Taking too long
The last crawl test I invoked seems to be in progress for over 24 hours. The one before that completed in a few hours. Wish there was a progress indicator or an option to cancel. The crawl (from Tool > Crawl Test) should not take this long. Any ideas or suggestions? Also, the keyword research tool (plus a few others) have been down ever since I signed up. Is this a normal?
Moz Pro | | MomoMasta0 -
Was there another issue crawling rankings on 9th October?
Again this month I don't appear to have rankings logged for 9th October. Was there an issue earlier in the month?
Moz Pro | | NathanP0 -
SEOMoz Crawling Only 1 Page
I entered a new site into my dashboard 2 days ago - everything looked kosher, there were a few hundred pages crawled and a whole bunch of errors. I came back this morning to start work on the site and SEOMoz has crawled the site again, this time returning only 1 page and 0 errors. I haven't even logged in to the site since the first crawl, so I couldn't have broken anything. Has anyone seen this before?
Moz Pro | | Junction0 -
Has anyone else not had an SEOmoz crawl since Dec 22?
Before the holidays, I completed a website redesign on an eCommerce website. One of the reasons for this was duplicate content. The new design has removed all duplicate content. (Product descriptions on 2 pages) I took a look at my Crawl Diagnostics Summary this morning and this is what I saw: Last Crawl Completed: Dec. 15th, 2011 Next Crawl Starts: Dec. 22nd, 2011 Thinking it might have something to do with the holidays. Although, I would like to see this data as soon as possible. Is there a way I can request a crawl from seomoz ? Thanks, John Parker
Moz Pro | | JohnParker27920 -
Crawl test. Bot crawled only 200 or so links when it should have crawled thousands
Hi everyone, I just recieved my crawl test report and its only given me 200 or so URL's when my site has thousands, any thoughts?
Moz Pro | | Ev840