Crawl Errors Confusing Me
-
The SEOMoz crawl tool is telling me that I have a slew of crawl errors on the blog of one domain. All are related to the MSNbot. And related to trackbacks (which we do want to block, right?) and attachments (makes sense to block those, too) ... any idea why these are crawl issues with MSNbot and not Google? My robots.txt is here: http://www.wevegotthekeys.com/robots.txt.
Thanks, MJ
-
I'm a little late to the party, but I want to summarize what I see as the answer.
1. The "Search Engine Blocked by Robots.txt" is only a warning, and not an error. If you intend for these pages not to get crawled (and it does seem like you have a good reason for this), then there is nothing to worry about.
2. The reason the warning appears for MSNbot and not Google is that currently, your robots.txt allows Google to crawl those files. As Daniel pointed out, you would need to add the identical directives to your robots.txt file to make this happen. Does that make sense? Or you could just add all of these files under the * directive to apply to all robots.
-
Yes, I thought that's what you meant ... thanks!
-
I am saying this:
User-agent: Googlebot Noindex: /key-west-blog/*?* Noindex: /key-west-blog/*.rss Noindex: /key-west-blog/*feed Noindex: /key-west-blog/*trackback Noindex: /key-west-blog/*wp- Noindex: /key-west-blog/tag/ Noindex: /key-west-blog/search/ Noindex: /key-west-blog/archives/ Noindex: /key-west-blog/category/ Noindex: /key-west-blog/2009 Noindex: /key-west-blog/2010 and this:
User-agent: Googlebot-Mobile
Noindex: /key-west-blog/?
Noindex: /key-west-blog/*.rss
Noindex: /key-west-blog/*feed
Noindex: /key-west-blog/*trackback
Noindex: /key-west-blog/*wp-
Noindex: /key-west-blog/tag/
Noindex: /key-west-blog/search/
Noindex: /key-west-blog/archives/
Noindex: /key-west-blog/category/
Noindex: /key-west-blog/2009
Noindex: /key-west-blog/2010They use Noindex which is a syntax I am unfamiliar with in robots.txt. So you can check out http://www.robotstxt.org/robotstxt.html for more info on robots.txt and proper syntaxt. I would change Noindex: to Disallow: and that should fix the error in the robots.txt file.
-
The robots.txt file DOES contain
User-agent: Msnbot Crawl-delay: 120 Disallow: /key-west-blog/*?* Disallow: /key-west-blog/*.rss Disallow: /key-west-blog/*feed Disallow: /key-west-blog/*trackback Disallow: /key-west-blog/*wp- Disallow: /key-west-blog/*login.php Disallow: /key-west-blog/tag/ Disallow: /key-west-blog/search/ Disallow: /key-west-blog/archives/ Disallow: /key-west-blog/category/ Disallow: /key-west-blog/2009 Disallow: /key-west-blog/2010 But you are saying I should remove the lines with noindex?
-
In your robots.txt file, you have the Disallow: command under MSNbot and Noindex: under Googlebot. Noindex is not a robots.txt command. Change Noindex: to Disallow: and those pages will be blocked for all bots. Not sure if that is what is causing the issue, but that would explain the discrepancy. If you want to noindex a page, you do it with a meta tag like this:
You can change follow to nofollow if you want, really doesn't matter much.
-
I have the same problem looks like MSN bot is disallowed from accessing wordpress content. So pages show up as ?page=111 so from what I understand so far anything that shows as below is blocked from MSNbot. I don't have a definite answer for you as to what to do, but I can tell you will need to "allow" msn bot the googlebot is.
Disallow: /key-west-blog/*?*
Got a burning SEO question?
Subscribe to Moz Pro to gain full access to Q&A, answer questions, and ask your own.
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
I got an 803 error yesterday on the Moz crawl for most of my pages. The page loads normally in the browser. We are hosted on shopify
I got an 803 error yesterday on the Moz crawl for most of my pages. The page loads normally in the browser. We are hosted on shopify, the url is www.solester.com please help us out
Moz Pro | | vasishta0 -
Crawl test from tools
Hi, I notice that the crawl test which is from the Research Tools doesn't really get a new crawl even though there is 2 crawl per day. It will only provide the data which was acquire from the crawl diagnostics in my pro account. There is no point for me to get the data which I get from my crawl diagnostic isn't it? Even seomoz provided with more than 2 crawl per day also useless in this case. This whole thing doesn't make sense as the crawl diagnostics will only perform a full crawl test once every week. but even the crawl test also not helping any thing out for me.
Moz Pro | | hanzoz0 -
Is it possible to re-run crawl in seomoz pro campaign?
Hello, for my campaign designzzz.com the crawl ran today, and next crawl is after a week. but unfortunately when the crawling was in progress i had a little issue on site which added lots of 500 server errors and other notifications. So is it possible to re-run the crawl ?
Moz Pro | | wickedsunny10 -
Error on SEOMoz When Trying to Track Website. Please Advise
Hi, I'm trying to start a new campaign for a root domain, but I'm getting the "Roger found an error" and am not sure what to make of it. Error #1: "You've decided to set up a root domain campaign, but entered the subdomain path: www.siteurl.com. Don't worry, we'll switch that for you and crawl everything on the subdomain: www.siteurl.com. If you meant to set this up to only crawl pages in the root domain, click 'Go back and Change" and enter a root domain URL in step 1." Error #2: "Oops! The root domain siteurl.com redirects to a domain that is not within the specified root domain (www.siteurl.com). This will cause us to stop crawling as the first discovered page falls outside of the root domain you've defined. Please make sure you enter a root domain that resolves to a page that is under the root domain." What does this mean? Is there something I am doing wrong? The first error is what returned when I input www.siteurl.com. The second was returned when I put just siteurl.com. I didn't put up the exact URL for privacy reasons, but if you really do want to help me out, PM me and I can give you the real URL. Thanks in advance!
Moz Pro | | locallyrank0 -
Crawl Diagnostics Warnings - Duplicate Content
Hi All, I am getting a lot of warnings about duplicate page content. The pages are normally 'tag' pages. I have some news stories or blog posts tagged with multiple 'tags'. Should I ask google not to index the tag pages? Does it really affect my site? Thanks
Moz Pro | | skehoe0 -
What does this error mean when starting a new campaign
hi just about to start a new campaign to i can monitor my website and i get this message up but not sure what this means. Roger has detected a problem: We have detected that the domain www.headlineplus.com and the domain headlineplus.com both respond to web requests and do not redirect. Having two "twin" domains that both resolve forces them to battle for SERP positions, making your SEO efforts less effective. We suggest redirecting one, then entering the other here. can anyone please let me know why this message comes up please
Moz Pro | | ClaireH-1848860 -
Why won't scheduled crawl of my site begin?
I currently have a campaign running on SEOMoz for over a month. It has been showing that a crawl was scheduled to start on 12/21. Now it's 12/23 and there has not been a new crawl, and it still says scheduled for 12/21.. Anyone know why this is happening or how to fix it? Thanks
Moz Pro | | Prime850 -
Crawl Diagnostic Errors
Hi there, Seeing a large number of errors in the SEOMOZ Pro crawl results. The 404 errors are for pages that look like this: http://www.example.com/2010/07/blogpost/http:%2F%2Fwww.example.com%2F2010%2F07%2Fblogpost%2F I know that t%2F represents the two slashes, but I'm not sure why these addresses are being crawled. The site is a wordpress site. Anyone seen anything like this?
Moz Pro | | rosstaylor0