Why is Roger crawling pages that are disallowed in my robots.txt file?
-
I have specified the following in my robots.txt file:
Disallow: /catalog/product_compare/
Yet Roger is crawling these pages, which is producing 1,357 errors.
Is this a bug or am I missing something in my robots.txt file?
Here's one of the URLs that Roger pulled:
Please let me know if my problem is in robots.txt or if Roger spaced this one. Thanks!
-
Digging in further I discovered that rogerbot had blocked a portion of these URL variations, but 2/3 slipped through. I sent an email to support. Thanks for the suggestion.
-
Digging back through the Q&A... I'm seeing several posts reporting this sort of thing.
http://www.seomoz.org/dp/rogerbot
Perhaps you could try specifically blocking rogerbot? If that doesn't work, an email to the SEOmoz team may do the trick.
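If it helps to sanity-check the rules before emailing support, here is a minimal sketch using Python's standard-library urllib.robotparser. It only shows how Python's parser reads the file (simple path-prefix matching), not how rogerbot itself interprets it. The example.com URLs, the `is_allowed` helper, and the rogerbot-specific group are assumptions for illustration; the original post only quotes the Disallow line.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: the Disallow path comes from the question,
# the explicit rogerbot group is the suggestion being tried out.
ROBOTS_TXT = """\
User-agent: *
Disallow: /catalog/product_compare/

User-agent: rogerbot
Disallow: /catalog/product_compare/
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if Python's parser says user_agent may fetch url."""
    parser = RobotFileParser()
    # parse() takes an iterable of lines, so no network request is made.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    # example.com stands in for the real store domain (assumption).
    blocked = "http://www.example.com/catalog/product_compare/index/"
    open_page = "http://www.example.com/catalog/category/view/"
    print(is_allowed(ROBOTS_TXT, "rogerbot", blocked))    # expected: False
    print(is_allowed(ROBOTS_TXT, "rogerbot", open_page))  # expected: True
```

If a URL that Roger reported comes back as allowed here, the rule probably does not match that URL variation; if it comes back as disallowed, that is a reasonable thing to include in the email to support.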
-
Yes, I'm blocking all user agents: User-agent: *
-
Have you specified a User-Agent?
Related Questions
-
Keyword Stuffing - MOZ On-Page Grader
We sell a great number of insulation products, many of which are produced by individual manufacturers. On the page identified below, the keyword "Kingspan" is repeated numerous times because these items are included in our online shop. However, the many mentions of Kingspan are recorded in the HTML5 source code rather than in an external database. When I used the Moz On-Page Grader with the keyword "Kingspan", I was surprised to achieve an "A" grade! I know I shouldn't be complaining, but I am wondering why the significant repetition of the word "Kingspan" has not negatively impacted my score. http://www.just-insulation.com/001-eshop/buy-kingspan-thermapitch-thermawall-thermafloor-insulation-boards.html
Moz Pro | | JustInsulation0 -
Crawl Diagnostics Summary Problem
We added a robots.txt file to our website, and some pages are blocked by it. However, the Crawl Diagnostics Summary page shows that no pages are blocked by robots.txt. Why?
Moz Pro | | iskq0 -
Not all pages are being crawled
I am set up on the PRO plan, and I was under the impression that it would crawl up to 10,000 pages. My site has just over 200 pages, but whenever I am crawled it only crawls 121 pages. Is this normal? It's hard to know how reliable my data is because a significant number of pages are missing.
Moz Pro | | KristinHarding0 -
Crawl credits how to buy more?
Just wondering if there is a way of increasing my limit of 2 crawl credits per day?
Moz Pro | | aussieseoguy0 -
SEOMOZ Crawl Test
Guys, I have an issue that I know I have but cannot see, if that makes sense. Three months ago I did a site-wide 301 from economyleasinguk.co.uk to www.economy-car-leasing.co.uk. Everything looks good: I get all the correct header responses, all canonicals work perfectly, Google Webmaster Tools is updated, and Fetch as Googlebot shows the old site is 301'd.
I tried the SEOmoz crawl test today on the old domain and got this message: "Oh no! Looks like the page you were trying to access is temporarily down." At first I thought that was OK: because the site is no longer there, the tool won't crawl an old 301'd domain. However, I then tried it on a domain I know has just been 301'd and got this message:
The URL http://www.site1.com/ redirects to http://site2.com/. Do you want to crawl http://site2.com/ instead?
Would you like to:
Continue with www.site1.com
Continue with site2.com
I really do not know what to do. Either the redirect script is missing something (although it is doing what it should) or the server is the problem (but again, it is doing what it should), so why would SEOmoz not be able to crawl the old URL the way it does for the example site above? The strange thing is that Open Site Explorer does see the 301 and asks if I want to check the new URL instead.
PS: the redirect is done using a PHP redirect, which I am asking him to change to .htaccess as the site is now on an Apache server, and I was wondering if this could be the issue. All pages go to the correct pages as requested. Thanks in advance.
Moz Pro | | kellymandingo1 -
Crawl test tool from SEOmoz - which URLs does it actually crawl?
I am using the crawl test tool from SEOmoz for the first time and I do not really understand which URLs the tool is going to crawl. First, it says "enter any subdomain" --> why can't I do the crawl for the root domain? Second, it says "we'll crawl up to 3,000 linked-to pages" --> does that mean that the tool crawls all internal links that it can find on the given domain? Thanks for your help!
Moz Pro | | Elke.GetApp0 -
Rogerbot Ignoring Robots.txt?
Hi guys,
We're trying to block Rogerbot from spending 8000-9000 of our 10000 pages per week for our site crawl on our zillions of PhotoGallery.asp pages. Unfortunately our e-commerce CMS isn't tremendously flexible, so the only way we believe we can block rogerbot is in our robots.txt file. Rogerbot keeps crawling all these PhotoGallery.asp pages, so it's making our crawl diagnostics really useless. I've contacted the SEOmoz support staff and they claim the problem is on our side. This is the robots.txt we are using:
User-agent: rogerbot
Disallow:/PhotoGallery.asp
Disallow:/pindex.asp
Disallow:/help.asp
Disallow:/kb.asp
Disallow:/ReviewNew.asp
User-agent: *
Disallow:/cgi-bin/
Disallow:/myaccount.asp
Disallow:/WishList.asp
Disallow:/CFreeDiamondSearch.asp
Disallow:/DiamondDetails.asp
Disallow:/ShoppingCart.asp
Disallow:/one-page-checkout.asp
Sitemap: http://store.jrdunn.com/sitemap.xml
For some reason the WYSIWYG editor is entering extra spaces, but those are all single spaced. Any suggestions? The only other thing I thought of to try is something like "Disallow:/PhotoGallery.asp*" with a wildcard.
Moz Pro | | kellydallen0 -
Page and Domain Authority and other bits
Hi, I am in the process of finding blogs to publish a few articles with a couple of links in each. The articles will all be unique, relevant to the link I drop in, and relevant in some way to the reader. However, I have a few questions. My site is a designer menswear site, so I have picked fashion and sports sites first and foremost to have the articles published on. Now, I have a guy who owns about 30 different websites; 2 of them are sports based and about 10 are fashion based. He charges around $10-$15 an article. I have run them all through the Open Site Explorer tool and picked out the best-ranked ones. My problem is: how do I know if it's a good site not only to list an article on, but to pay for as well? The sites' page ranks are around the 30-45 range and the domains are around the 35-45 range. What is a good range to have? I know the higher the better, but is 30-45 good enough to pay for? (I don't mind paying the $10 (£7 in my money) for each one.) Also, as he has quoted me in dollars, I assume they're all USA based, so the majority of users are USA based. I am UK based and only ship to the UK. Will this matter as much if I am trying to gain backlinks? Obviously a UK-based site would be ideal, but is it a case of getting more external links on the web for Google to find, as long as they are relevant to the user? Any help would be great. Thanks, Will
Moz Pro | | WillBlackburn0