Crawl Errors Confusing Me
-
The SEOMoz crawl tool is telling me that I have a slew of crawl errors on the blog of one domain. All are related to the MSNbot. And related to trackbacks (which we do want to block, right?) and attachments (makes sense to block those, too) ... any idea why these are crawl issues with MSNbot and not Google? My robots.txt is here: http://www.wevegotthekeys.com/robots.txt.
Thanks, MJ
-
I'm a little late to the party, but I want to summarize what I see as the answer.
1. The "Search Engine Blocked by Robots.txt" is only a warning, and not an error. If you intend for these pages not to get crawled (and it does seem like you have a good reason for this), then there is nothing to worry about.
2. The reason the warning appears for MSNbot and not Google is that currently, your robots.txt allows Google to crawl those files. As Daniel pointed out, you would need to add the identical directives to your robots.txt file to make this happen. Does that make sense? Or you could just add all of these files under the * directive to apply to all robots.
-
Yes, I thought that's what you meant ... thanks!
-
I am saying this:
User-agent: Googlebot Noindex: /key-west-blog/*?* Noindex: /key-west-blog/*.rss Noindex: /key-west-blog/*feed Noindex: /key-west-blog/*trackback Noindex: /key-west-blog/*wp- Noindex: /key-west-blog/tag/ Noindex: /key-west-blog/search/ Noindex: /key-west-blog/archives/ Noindex: /key-west-blog/category/ Noindex: /key-west-blog/2009 Noindex: /key-west-blog/2010 and this:
User-agent: Googlebot-Mobile
Noindex: /key-west-blog/?
Noindex: /key-west-blog/*.rss
Noindex: /key-west-blog/*feed
Noindex: /key-west-blog/*trackback
Noindex: /key-west-blog/*wp-
Noindex: /key-west-blog/tag/
Noindex: /key-west-blog/search/
Noindex: /key-west-blog/archives/
Noindex: /key-west-blog/category/
Noindex: /key-west-blog/2009
Noindex: /key-west-blog/2010They use Noindex which is a syntax I am unfamiliar with in robots.txt. So you can check out http://www.robotstxt.org/robotstxt.html for more info on robots.txt and proper syntaxt. I would change Noindex: to Disallow: and that should fix the error in the robots.txt file.
-
The robots.txt file DOES contain
User-agent: Msnbot Crawl-delay: 120 Disallow: /key-west-blog/*?* Disallow: /key-west-blog/*.rss Disallow: /key-west-blog/*feed Disallow: /key-west-blog/*trackback Disallow: /key-west-blog/*wp- Disallow: /key-west-blog/*login.php Disallow: /key-west-blog/tag/ Disallow: /key-west-blog/search/ Disallow: /key-west-blog/archives/ Disallow: /key-west-blog/category/ Disallow: /key-west-blog/2009 Disallow: /key-west-blog/2010 But you are saying I should remove the lines with noindex?
-
In your robots.txt file, you have the Disallow: command under MSNbot and Noindex: under Googlebot. Noindex is not a robots.txt command. Change Noindex: to Disallow: and those pages will be blocked for all bots. Not sure if that is what is causing the issue, but that would explain the discrepancy. If you want to noindex a page, you do it with a meta tag like this:
You can change follow to nofollow if you want, really doesn't matter much.
-
I have the same problem looks like MSN bot is disallowed from accessing wordpress content. So pages show up as ?page=111 so from what I understand so far anything that shows as below is blocked from MSNbot. I don't have a definite answer for you as to what to do, but I can tell you will need to "allow" msn bot the googlebot is.
Disallow: /key-west-blog/*?*
Got a burning SEO question?
Subscribe to Moz Pro to gain full access to Q&A, answer questions, and ask your own.
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
Can someone kindly explain what 'Crawl Issue Found: No rel="canonical" Tags' means? Is this a critical error and how can it be rectified?
Can someone kindly explain what 'Crawl Issue Found: No rel="canonical" Tags' means? Is this a critical error and how can it be rectified?
Moz Pro | | JoshMcLean0 -
Error in Moz duplicate content reports
Hi - I've run the Moz campaign on a client's site. Moz is saying that there are duplicate content errors, and when I look at the errors it is showing that they are all to do with the non-www URLs having being duplicated in the www form of the URLs. However this is not the case - all the non-www URLs are all 301 redirected to the www URLs. Is this an error in the Moz tool? Has anybody experienced something similar?
Moz Pro | | rorynatkiel0 -
5xx Server Errors
Hey all, Recently my site has seen a rise in 5xx errors which have all popped up since the first of the year. Our company made a big announcement on 1/4 which drove a lot of traffic to the site, which I thought may be a reason for the sudden errors. But the errors have since continued. Also, the crawl from last week showed no errors at all (which I'm sure was a fluke) Taking a closer look in GA, the pages that are receiving errors have not been visited for months. Is this a random occurrence or should I consult someone to investigate our server? Thanks,
Moz Pro | | Clayco
Chris0 -
Help with URL parameters in the SEOmoz crawl diagnostics Error report
The crawl diagnostics error report is showing tons of duplicate page titles for my pages that have filtering parameters. These parameters are blocked inside Google and Bing webmaster tools. I do I block them within the SEOmoz crawl diagnostics report?
Moz Pro | | SunshineNYC0 -
How does SEOMoz crawl sites? Does it follow the sitemap?
I only ask because in my report it returned a few "no follow" links. Was wondering how SEOMoz operates.
Moz Pro | | Pikimal0 -
Why am I getting 400 client errors on pages that work?
Hi, I just done the initial crawl on y domain and I sem to have 80 400 client errors. However when I visit the urls the pages are loading fine. Any ideas on why this is happening and how I can resolve the problem?
Moz Pro | | moesian0 -
Broken Links and Duplicate Content Errors?
Hello everybody, I’m new to SEOmoz and I have a few quick questions regarding my error reports: In the past, I have used IIS as a tool to uncover broken links and it has revealed a large amount of varying types of "broken links" on our sites. For example, some of them were links on my site that went to external sites that were no longer available, others were missing images in my CSS and JS files. According to my campaign in SEOmoz, however, my site has zero broken links (4XX). Can anyone tell me why the IIS errors don’t show up in my SEOmoz report, and which of these two reports I should really be concerned about (for SEO purposes)? 2. Also in the "errors" section, I have many duplicate page titles and duplicate page content errors. Many of these "duplicate" content reports are actually showing the same page more than once. For example, the report says that "http://www.cylc.org/" has the same content as "http://www.cylc.org/index.cfm" and that, of course, is because they are the same page. What is the best practice for handling these duplicate errors--can anyone recommend an easy fix for this?
Moz Pro | | EnvisionEMI0 -
How come when I export a error list I can only export the first page?
I am working on fixing the 4xx errors. I have found the easiest way to do this would be to export the list, print it out, and check off the ones i've fixed. The site only lets me export the first page. We'll appreciate any help. Thanks, Ryan D. Gran --Not sure what category this question belongs in so selected SEOmoz Tools--
Moz Pro | | dggusmc0