Crawl Errors Confusing Me
-
The SEOMoz crawl tool is telling me that I have a slew of crawl errors on the blog of one domain. All of them are related to MSNbot, and all involve trackbacks (which we do want to block, right?) and attachments (makes sense to block those, too). Any idea why these are crawl issues with MSNbot and not Google? My robots.txt is here: http://www.wevegotthekeys.com/robots.txt.
Thanks, MJ
-
I'm a little late to the party, but I want to summarize what I see as the answer.
1. The "Search Engine Blocked by Robots.txt" is only a warning, and not an error. If you intend for these pages not to get crawled (and it does seem like you have a good reason for this), then there is nothing to worry about.
2. The reason the warning appears for MSNbot and not Google is that your robots.txt currently allows Google to crawl those files. As Daniel pointed out, you would need to add the identical directives for Googlebot if you want Google blocked from them as well. Does that make sense? Or you could just put all of these rules under the User-agent: * group so they apply to all robots.
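To see why the two crawlers are treated differently, here is a minimal Python sketch using the standard library's urllib.robotparser. The sample rules are a trimmed-down stand-in for the real file (one path instead of the full list), and the helper name can_crawl and the test path are made up for illustration. Note that this parser only does plain prefix matching and does not understand the * wildcards in rules like /key-west-blog/*trackback, so only a prefix rule is used here.

```python
from urllib.robotparser import RobotFileParser

# Trimmed-down stand-in for the file discussed in this thread (assumption:
# one rule per group is enough to show the effect).
CURRENT_RULES = """\
User-agent: Googlebot
Noindex: /key-west-blog/tag/

User-agent: Msnbot
Disallow: /key-west-blog/tag/
"""

# The same rule placed under the wildcard group so it applies to every robot.
WILDCARD_RULES = """\
User-agent: *
Disallow: /key-west-blog/tag/
"""


def can_crawl(robots_txt: str, user_agent: str, path: str) -> bool:
    """Return True if user_agent may fetch path under the given robots.txt text."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, path)


# Hypothetical blog URL path used only for this demonstration.
SAMPLE_PATH = "/key-west-blog/tag/fishing/"
```

With the first set of rules, can_crawl(CURRENT_RULES, "Googlebot", SAMPLE_PATH) comes back True because the parser skips the unrecognized Noindex line, while the same call for "Msnbot" comes back False. With the wildcard version, both are False.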
-
Yes, I thought that's what you meant ... thanks!
-
I am saying this:
User-agent: Googlebot
Noindex: /key-west-blog/*?*
Noindex: /key-west-blog/*.rss
Noindex: /key-west-blog/*feed
Noindex: /key-west-blog/*trackback
Noindex: /key-west-blog/*wp-
Noindex: /key-west-blog/tag/
Noindex: /key-west-blog/search/
Noindex: /key-west-blog/archives/
Noindex: /key-west-blog/category/
Noindex: /key-west-blog/2009
Noindex: /key-west-blog/2010
and this:
User-agent: Googlebot-Mobile
Noindex: /key-west-blog/?
Noindex: /key-west-blog/*.rss
Noindex: /key-west-blog/*feed
Noindex: /key-west-blog/*trackback
Noindex: /key-west-blog/*wp-
Noindex: /key-west-blog/tag/
Noindex: /key-west-blog/search/
Noindex: /key-west-blog/archives/
Noindex: /key-west-blog/category/
Noindex: /key-west-blog/2009
Noindex: /key-west-blog/2010
They use Noindex, which is a syntax I am unfamiliar with in robots.txt. You can check out http://www.robotstxt.org/robotstxt.html for more info on robots.txt and proper syntax. I would change Noindex: to Disallow:, and that should fix the error in the robots.txt file.
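If you'd rather not edit each line by hand, a rough Python sketch of that change might look like this. The function name noindex_to_disallow is hypothetical, and it assumes each directive sits on its own line in the file.

```python
import re


def noindex_to_disallow(robots_txt: str) -> str:
    """Rewrite every 'Noindex:' directive as 'Disallow:', leaving other lines untouched."""
    # (?im): case-insensitive, and ^ matches at the start of every line.
    # Leading spaces/tabs and any space before the colon are preserved.
    return re.sub(r"(?im)^([ \t]*)noindex([ \t]*):", r"\1Disallow\2:", robots_txt)


# Two of the rules quoted above, used as sample input.
SAMPLE = (
    "User-agent: Googlebot\n"
    "Noindex: /key-west-blog/tag/\n"
    "Noindex: /key-west-blog/2009\n"
)
FIXED = noindex_to_disallow(SAMPLE)
```

Only lines that start with Noindex are touched, so the User-agent and any existing Disallow lines pass through unchanged.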
-
The robots.txt file DOES contain
User-agent: Msnbot
Crawl-delay: 120
Disallow: /key-west-blog/*?*
Disallow: /key-west-blog/*.rss
Disallow: /key-west-blog/*feed
Disallow: /key-west-blog/*trackback
Disallow: /key-west-blog/*wp-
Disallow: /key-west-blog/*login.php
Disallow: /key-west-blog/tag/
Disallow: /key-west-blog/search/
Disallow: /key-west-blog/archives/
Disallow: /key-west-blog/category/
Disallow: /key-west-blog/2009
Disallow: /key-west-blog/2010
But you are saying I should remove the lines with noindex?
-
In your robots.txt file, you have the Disallow: command under MSNbot and Noindex: under Googlebot. Noindex is not a robots.txt command. Change Noindex: to Disallow: and those pages will be blocked for Googlebot as well. Not sure if that is what is causing the issue, but that would explain the discrepancy. If you want to noindex a page, you do it with a meta tag like this:
<meta name="robots" content="noindex, follow">
You can change follow to nofollow if you want; it really doesn't matter much.
-
I have the same problem: it looks like MSNbot is disallowed from accessing WordPress content. My pages show up as ?page=111, so from what I understand so far, anything matching the rule below is blocked from MSNbot. I don't have a definite answer for you as to what to do, but I can tell you that you will need to "allow" MSNbot the same way Googlebot is.
Disallow: /key-west-blog/*?*