Why is Roger crawling pages that are disallowed in my robots.txt file?
-
I have specified the following in my robots.txt file:
Disallow: /catalog/product_compare/
Yet Roger is crawling these pages, which results in 1,357 errors.
Is this a bug or am I missing something in my robots.txt file?
Here's one of the URLs that Roger pulled:
Please let me know if my problem is in robots.txt or if Roger spaced this one. Thanks!
-
Digging in further I discovered that rogerbot had blocked a portion of these URL variations, but 2/3 slipped through. I sent an email to support. Thanks for the suggestion.
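One way to see which URL variations a prefix rule like the one in the question covers is to run them through a robots.txt parser. The sketch below uses Python's standard urllib.robotparser, which is not rogerbot's own matcher and may behave differently from it. The example.com URLs are hypothetical stand-ins for the URLs in the crawl report, the catch-all User-agent group is assumed from the replies in this thread, and blocked_urls is just an illustrative helper name.

```python
from urllib.robotparser import RobotFileParser

# The Disallow rule from the question, placed under a catch-all group
# (assumption: the real file uses "User-agent: *").
ROBOTS_TXT = """\
User-agent: *
Disallow: /catalog/product_compare/
"""

# Hypothetical URL variations; substitute the URLs from your crawl report.
CANDIDATE_URLS = [
    "http://www.example.com/catalog/product_compare/index/",
    "http://www.example.com/index.php/catalog/product_compare/index/",
    "http://www.example.com/Catalog/Product_Compare/index/",
]


def blocked_urls(robots_txt, user_agent, urls):
    """Return the URLs that the given robots.txt text disallows for user_agent."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [url for url in urls if not parser.can_fetch(user_agent, url)]


if __name__ == "__main__":
    blocked = blocked_urls(ROBOTS_TXT, "rogerbot", CANDIDATE_URLS)
    for url in CANDIDATE_URLS:
        print("blocked" if url in blocked else "ALLOWED", url)
```

With this parser, only the first URL is reported as blocked. The rule is a case-sensitive prefix match on the path, so a variation that puts anything in front of /catalog/ or changes the casing is not covered by it.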
-
Digging back through the Q&A, I'm seeing several posts reporting this sort of thing.
http://www.seomoz.org/dp/rogerbot
Perhaps you could try specifically blocking rogerbot? If that doesn't work, an email to the SEOmoz team may do the trick.
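As a rough sketch of what "specifically blocking rogerbot" could look like, the robots.txt text below repeats the rule from the question under a group that names rogerbot, and then checks it with Python's standard urllib.robotparser. This is an assumption about how the file might be laid out, not a confirmed fix, and the Python parser is not guaranteed to match rogerbot's own behavior. The example.com URLs and the can_crawl helper are hypothetical.

```python
from urllib.robotparser import RobotFileParser

# Assumed robots.txt layout: the rule from the question under a group that
# names rogerbot explicitly, plus the same rule for every other crawler.
ROBOTS_TXT = """\
User-agent: rogerbot
Disallow: /catalog/product_compare/

User-agent: *
Disallow: /catalog/product_compare/
"""


def can_crawl(robots_txt, user_agent, url):
    """Return True if user_agent may fetch url under the given robots.txt text."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    # Hypothetical URLs, used only to exercise the rule.
    compare_url = "http://www.example.com/catalog/product_compare/index/"
    catalog_url = "http://www.example.com/catalog/"
    print("compare page allowed for rogerbot:", can_crawl(ROBOTS_TXT, "rogerbot", compare_url))
    print("catalog page allowed for rogerbot:", can_crawl(ROBOTS_TXT, "rogerbot", catalog_url))
```

If a check like this reports the compare page as disallowed but the crawler still fetches it, that points toward the crawler rather than the file, which is when an email to support makes sense.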
-
Yes, the rule applies to all user agents: User-agent: *
-
Have you specified a User-Agent?
Related Questions
-
Site Crawl 4xx Errors?
Hello! When I check our website's critical crawler issues with Moz Site Crawler, I'm seeing over 1000 pages with a 4xx error. All of the pages that are showing to have a 4xx error appear to be the brand and product pages we have on our website, but with /URL at the end of each permalink. For example, we have a page on our site for a brand called Davinci. The URL is https://kannakart.com/davinci/. In the site crawler, I'm seeing the 4xx for this URL: https://kannakart.com/davinci/URL. Could this be a plugin on our site that is generating these URLs? If they're going to be an issue, I'd like to remove them. However, I'm not sure exactly where to begin. Thanks in advance for the help, -Andrew
Moz Pro | | mostcg0 -
Duplicate Pages
Hello, we have an issue which I'm hoping someone can help with. Our Moz system is saying that this page, http://www.indigolittle.com/fees/, is a duplicate page. We use this page purely for mobiles and we have added code to say This has been on for over a month now; however, Moz is still picking the page up as a High Priority Issue.
Moz Pro | | popcreativeltd0 -
Duplicate page content and title
hi,
Moz Pro | | solutionforweb
I have a serious issue with my site. My website contains 21 pages, but in my weekly report Moz found 84 errors: 42 for duplicate page content and 42 for duplicate page title. When I look at the errors in detail, all 21 of my links are displayed twice. For example:
http://domain.com/
http://domain.com/page1.html
http://domain.com/page2.html
and
http://www.domain.com/
http://www.domain.com/page1.html
http://www.domain.com/page2.html
So the same link is repeated twice, once with www and once without. How do I resolve this error? Any help would be appreciated. 0 -
Unable to crawl pages
Hi, I am trying to set up a campaign for our website - www.salvationarmy.org.au however, I can't seem to get a scan of more than three pages. I have tried the following: www.salvationarmy.org.au (only 2 pages) www.salvationarmy.org.au/home (only 1 page) salvationarmy.org.au (only 3 pages) There is a geo IP redirect on www.salvationarmy.org.au but the second domain listed above should resolve the full site. I'm a newbie to SEOmoz so any help would be appreciated! Thanks, Mel
Moz Pro | | KingPings0 -
CSV file messed up
I cannot convert my exported CSV file to a proper Excel sheet. The data is mixed up, so converting doesn't work. Some rows have all their data in the first cell (column), and some rows have data in the first AND second cells. Does anyone have a solution?
Moz Pro | | nans2 -
Campaign Crawl Report
Hello, just a quick one: is there any way I can run a crawl report for something in a campaign so I can compare the changes? I know you can do a separate crawl test, but it won't show the differences, and the next crawl date isn't until the 28th.
Moz Pro | | Prestige-SEO0 -
Robots review
Anything in this that would have caused Rogerbot to stop indexing my site? It only saw 34 of 5000+ pages on the last pass. It had no problems seeing the whole site before.
User-agent: Rogerbot
Disallow: /default.aspx?*
//Keep from crawling the CMS urls default.aspx?Tabid=234. Real home page is home.aspx
Disallow: /ctl/
// Keep from indexing the admin controls
Disallow: ArticleAdmin
// Keep from indexing article admin page
Disallow: articleadmin
// same in lower case
Disallow: /images/
// Keep from indexing CMS images
Disallow: captcha
// keep from indexing the captcha image which appears to be a page to crawls.
general rules lacking wildcards
User-agent: *
Disallow: /default.aspx
Disallow: /images/
Disallow: /DesktopModules/DnnForge - NewsArticles/Controls/ImageChallenge.captcha.aspx
Moz Pro | | sprynewmedia0 -
2nd Crawl taking too long?
Hi, I've added a campaign to my account, with the first crawl taking around a week. The 2nd crawl started 3 days 17 hours ago and is still running. Is this something that others have experienced? The campaign is tracking 5 keywords and has 17 pages on the site. Steve
Moz Pro | | stevecounsell0