Crawl Errors Confusing Me
-
The SEOmoz crawl tool is telling me that I have a slew of crawl errors on the blog of one domain. All of them are related to MSNbot, and to trackbacks (which we do want to block, right?) and attachments (it makes sense to block those, too). Any idea why these are crawl issues with MSNbot and not Google? My robots.txt is here: http://www.wevegotthekeys.com/robots.txt.
Thanks, MJ
-
I'm a little late to the party, but I want to summarize what I see as the answer.
1. The "Search Engine Blocked by Robots.txt" is only a warning, and not an error. If you intend for these pages not to get crawled (and it does seem like you have a good reason for this), then there is nothing to worry about.
2. The reason the warning appears for MSNbot and not Google is that your robots.txt currently allows Google to crawl those files. As Daniel pointed out, to block Google as well you would need to add the same Disallow directives to the Googlebot section of your robots.txt file. Does that make sense? Alternatively, you could list all of these rules under User-agent: * so they apply to all robots.
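To see what the "User-agent: *" suggestion does, here is a minimal sketch using Python's standard `urllib.robotparser`. The trimmed robots.txt, the example URL and the helper name `allowed_for` are hypothetical, and only plain path prefixes are used because `urllib.robotparser` does not understand `*` wildcards inside paths. Real crawlers use their own parsers, so treat this as an illustration of the standard grouping rules rather than of how msnbot or Googlebot actually behaved.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical, trimmed-down robots.txt: the blog rules sit under
# "User-agent: *", so any crawler without its own group falls back to them.
# Plain prefixes only: urllib.robotparser ignores '*' wildcard semantics.
ROBOTS_TXT = """\
User-agent: *
Disallow: /key-west-blog/tag/
Disallow: /key-west-blog/search/
Disallow: /key-west-blog/2009
"""

# Same rules plus a named msnbot group that only sets Crawl-delay.
# A crawler with its own group obeys that group and ignores "*" entirely.
MIXED = ROBOTS_TXT + """
User-agent: msnbot
Crawl-delay: 120
"""


def allowed_for(robots_txt, agents, url):
    """Return {agent: can_fetch} for one URL under the given robots.txt text."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return {agent: parser.can_fetch(agent, url) for agent in agents}


if __name__ == "__main__":
    # Hypothetical tag-archive URL on the blog.
    url = "http://www.wevegotthekeys.com/key-west-blog/tag/key-west/"
    print(allowed_for(ROBOTS_TXT, ["Googlebot", "msnbot"], url))
    # {'Googlebot': False, 'msnbot': False}
    print(allowed_for(MIXED, ["Googlebot", "msnbot"], url))
    # {'Googlebot': False, 'msnbot': True}
```

The second case is the design caveat: if a named group such as the existing Msnbot one stays in the file, that crawler follows only its own group, so the shared rules would need to be repeated there too.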
-
Yes, I thought that's what you meant ... thanks!
-
I am saying this:
User-agent: Googlebot
Noindex: /key-west-blog/*?*
Noindex: /key-west-blog/*.rss
Noindex: /key-west-blog/*feed
Noindex: /key-west-blog/*trackback
Noindex: /key-west-blog/*wp-
Noindex: /key-west-blog/tag/
Noindex: /key-west-blog/search/
Noindex: /key-west-blog/archives/
Noindex: /key-west-blog/category/
Noindex: /key-west-blog/2009
Noindex: /key-west-blog/2010
and this:
User-agent: Googlebot-Mobile
Noindex: /key-west-blog/?
Noindex: /key-west-blog/*.rss
Noindex: /key-west-blog/*feed
Noindex: /key-west-blog/*trackback
Noindex: /key-west-blog/*wp-
Noindex: /key-west-blog/tag/
Noindex: /key-west-blog/search/
Noindex: /key-west-blog/archives/
Noindex: /key-west-blog/category/
Noindex: /key-west-blog/2009
Noindex: /key-west-blog/2010
They use Noindex, which is a syntax I am unfamiliar with in robots.txt. You can check out http://www.robotstxt.org/robotstxt.html for more info on robots.txt and proper syntax. I would change Noindex: to Disallow:, and that should fix the error in the robots.txt file.
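A quick way to see the difference between Noindex: and Disallow: is to run a cut-down copy of the file through a standard parser. This sketch uses Python's `urllib.robotparser`; the excerpt, URL and helper name `load` are hypothetical, wildcard lines are left out because that parser only does prefix matching, and it shows only how a spec-following parser reads the file, not how Googlebot's own parser handled it.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical excerpt modelled on the file quoted above (wildcard lines
# omitted). Groups are separated by blank lines, as in a normal robots.txt.
BEFORE = """\
User-agent: Googlebot
Noindex: /key-west-blog/tag/
Noindex: /key-west-blog/2009

User-agent: msnbot
Crawl-delay: 120
Disallow: /key-west-blog/tag/
Disallow: /key-west-blog/2009
"""

# The suggested fix: the same file with every "Noindex:" turned into "Disallow:".
AFTER = BEFORE.replace("Noindex:", "Disallow:")


def load(robots_txt):
    """Parse robots.txt text with the standard-library parser."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


if __name__ == "__main__":
    url = "http://www.wevegotthekeys.com/key-west-blog/tag/key-west/"  # hypothetical
    for label, text in (("before", BEFORE), ("after", AFTER)):
        rp = load(text)
        print(label, rp.can_fetch("Googlebot", url), rp.can_fetch("msnbot", url))
    # before True False  (unknown "Noindex" lines are skipped, so Googlebot has no rules)
    # after False False
    print(load(BEFORE).crawl_delay("msnbot"))  # 120
```

The parser silently skips directives it does not recognise, which is why the Googlebot group ends up with no rules at all while the msnbot group (including its Crawl-delay: 120) is applied.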
-
The robots.txt file DOES contain this:
User-agent: Msnbot
Crawl-delay: 120
Disallow: /key-west-blog/*?*
Disallow: /key-west-blog/*.rss
Disallow: /key-west-blog/*feed
Disallow: /key-west-blog/*trackback
Disallow: /key-west-blog/*wp-
Disallow: /key-west-blog/*login.php
Disallow: /key-west-blog/tag/
Disallow: /key-west-blog/search/
Disallow: /key-west-blog/archives/
Disallow: /key-west-blog/category/
Disallow: /key-west-blog/2009
Disallow: /key-west-blog/2010
But you are saying I should remove the lines with noindex?
-
In your robots.txt file, you have the Disallow: command under MSNbot and Noindex: under Googlebot. Noindex is not a robots.txt command. Change Noindex: to Disallow: and those pages will be blocked for Googlebot and Googlebot-Mobile as well as MSNbot. I'm not sure that is what's causing the issue, but it would explain the discrepancy. If you want to noindex a page, you do it with a meta tag like this:
<meta name="robots" content="noindex, follow">
You can change follow to nofollow if you want; it doesn't really matter much.
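If you go the meta-tag route instead of robots.txt, it can help to check which robots directives a page actually sends. Below is a minimal sketch using only Python's standard `html.parser`; the class and function names are hypothetical, and it only looks at `<meta name="robots">`, not crawler-specific meta names.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collect the directives from <meta name="robots" content="..."> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        # HTMLParser already lower-cases tag and attribute names.
        if tag != "meta":
            return
        attributes = {name: (value or "") for name, value in attrs}
        if attributes.get("name", "").lower() != "robots":
            return
        # content="noindex, follow" -> {"noindex", "follow"}
        for token in attributes.get("content", "").split(","):
            token = token.strip().lower()
            if token:
                self.directives.add(token)


def robots_directives(html):
    """Return the set of robots meta directives found in an HTML document."""
    parser = RobotsMetaParser()
    parser.feed(html)
    parser.close()
    return parser.directives


if __name__ == "__main__":
    page = '<html><head><meta name="robots" content="noindex, follow"></head></html>'
    print(sorted(robots_directives(page)))  # ['follow', 'noindex']
```

Self-closing tags such as `<meta ... />` also work, because `HTMLParser` routes them through `handle_starttag`.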
-
I have the same problem. It looks like MSNbot is disallowed from accessing WordPress content, so pages show up as ?page=111. From what I understand so far, anything matching the rule below is blocked for MSNbot. I don't have a definite answer for you as to what to do, but I can tell you that you will need to "allow" MSNbot the same way Googlebot is.
Disallow: /key-west-blog/*?*
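The rule above relies on wildcards. Under the semantics Google documents (and that RFC 9309 later standardised), `*` matches any run of characters, a trailing `$` anchors the end of the URL, and `?` is just a literal question mark; the original robots.txt convention had no wildcards, so support varies by crawler. Here is a minimal sketch, with hypothetical function names and example paths, that reports which of the Msnbot rules quoted in this thread would match a given URL path.

```python
import re

# The Msnbot Disallow rules quoted earlier in the thread.
MSNBOT_RULES = (
    "/key-west-blog/*?*",
    "/key-west-blog/*.rss",
    "/key-west-blog/*feed",
    "/key-west-blog/*trackback",
    "/key-west-blog/*wp-",
    "/key-west-blog/*login.php",
    "/key-west-blog/tag/",
    "/key-west-blog/search/",
    "/key-west-blog/archives/",
    "/key-west-blog/category/",
    "/key-west-blog/2009",
    "/key-west-blog/2010",
)


def rule_to_regex(rule):
    """Compile a robots.txt path rule using Google-style wildcards.

    '*' matches any run of characters, a trailing '$' anchors the end, and
    everything else (including '?') is literal. re.match anchors the start.
    """
    anchored = rule.endswith("$")
    if anchored:
        rule = rule[:-1]
    body = ".*".join(re.escape(part) for part in rule.split("*"))
    return re.compile(body + ("$" if anchored else ""))


def blocking_rules(path_and_query, rules=MSNBOT_RULES):
    """Return the rules that match a URL path (including any query string)."""
    return [rule for rule in rules if rule_to_regex(rule).match(path_and_query)]


if __name__ == "__main__":
    for path in (
        "/key-west-blog/?page=111",                     # hypothetical paged URL
        "/key-west-blog/wp-content/uploads/photo.jpg",  # hypothetical attachment
        "/key-west-blog/some-post/",                    # hypothetical post
    ):
        print(path, "->", blocking_rules(path))
```

With these rules, any blog URL containing a query string (such as ?page=111) matches `/key-west-blog/*?*`, and attachment files under wp-content match `/key-west-blog/*wp-`.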
Related Questions
-
Error Code 902 & 403
Several thousand of these popped up on my Crawl Report, and the links appear to be searches, e.g.:
902: http://thespacecollective.com/index.php?route=product/search&tag=nasa+ma-1+jacket%2F%2F%2F%2F%2F%2F%2F%2F%2F%2F%2F
403: http://thespacecollective.com/index.php?route=product/search&tag=periodic+table+tshirt%2F%2F%2F%2F%2F%2F%2F%2F%2F%2F%2F
I don't want Moz, let alone Google, finding this kind of nonsensical link, but I don't know exactly what the problem is or how to fix it. Am I right in thinking these are pages people have searched for? Can anyone shed light on this, please?
Moz Pro | | moon-boots0 -
In Crawl Diagnostics, length of title element is incorrect
Hey all, it appears the Moz crawler is misreading the number of characters in my website's page titles. It shows 72 characters for the following page's title element: http://giavan.com/products/orange-crystal-chain-necklace-with-drop The page title for this web page is "Orange Crystal Chain Necklace with Drop | Giavan", which is 48 characters. As it stands, this page title is displayed at 48 characters in Google SERPs. I am getting the "This Element is Too Long" issue on 925 pages, which is just about the entire site. These issues appeared after I added additional Shopify (Liquid) code to the page title. If you inspect the code, you will see the title element looks a bit odd, with extra spacing and line breaks. What I'd like to know is whether it's necessary to rewrite the Shopify code for SEM purposes. My feeling is that it's okay, because the page titles look fine in SERPs, but those 925 Moz crawl errors are kind of scary. Thanks for your help!
Moz Pro | | RichAlbanese0 -
Has anyone else experienced a spike in crawl errors?
Hi, since the last time our sites were crawled in SEOmoz, they are all showing a spike in errors (mainly duplicate page titles and duplicate content). We haven't changed anything in the structure of the sites, but they all use the same content management system. The image is an example of what we are seeing for all our sites based on that system. Is anyone else experiencing anything similar? Or does anyone know of any changes SEOmoz has implemented that may be affecting this? Thanks in advance, Anthony.
Moz Pro | | BallyhooLtd1 -
Was there another issue crawling rankings on 9th October?
Again this month I don't appear to have rankings logged for 9th October. Was there an issue earlier in the month?
Moz Pro | | NathanP0 -
Third crawl of my sites back to 250 pages
Hi all, I've been waiting some days for the third crawl of my sites, but SEOmoz only crawled 277 pages. The following appeared on my crawl report: "Pages Crawled: 277 | Limit: 250". My last two crawls had a limit of about 10K. Any idea? Kind regards, Simon.
Moz Pro | | Aureka0 -
Why won't scheduled crawl of my site begin?
I currently have a campaign that has been running on SEOmoz for over a month. It has been showing that a crawl was scheduled to start on 12/21. Now it's 12/23, there has not been a new crawl, and it still says it is scheduled for 12/21. Does anyone know why this is happening or how to fix it? Thanks
Moz Pro | | Prime850 -
How long does the SEOmoz crawl take?
It's been doing its thing for over 48 hours now, and I've got fewer than 350 pages. Is this normal? It's NOT the first crawl.
Moz Pro | | borderbound0 -
Getting an error most of the time
Hi, since last month I have been getting this error most of the time in Linkscape: "Sorry dude, no inlinks found matching this criteria." Please advise: is this a bug? The sites I am trying to use Linkscape for previously had a lot of pages crawled by SEOmoz. Thanks, Preet
Moz Pro | | PreetSibia0