Rogerbot getting cheeky?
-
Hi SeoMoz,
From time to time my server crashes during Rogerbot's crawling escapades, even though my robots.txt file sets a crawl-delay of 10, which I have just increased to 20.
I looked at the Apache log and noticed Roger hitting me from 4 different addresses: 216.244.72.3, 216.244.72.11, 216.244.72.12 and 216.176.191.201. Most of the time the requests from each separate address were 10 seconds apart, but ALL 4 addresses would hit 4 different pages simultaneously (example 2). At other times, it wasn't respecting robots.txt at all (see example 1 below).
I wouldn't call this 'respecting the crawl-delay' entry in robots.txt, which is what your answers to other questions here say Rogerbot does. 4 simultaneous page requests within 1 sec from Rogerbot is not what should be happening IMHO.
example 1
216.244.72.12 - - [05/Sep/2012:15:54:27 +1000] "GET /store/product-info.php?mypage1.html" 200 77813
216.244.72.12 - - [05/Sep/2012:15:54:27 +1000] "GET /store/product-info.php?mypage2.html HTTP/1.1" 200 74058
216.244.72.12 - - [05/Sep/2012:15:54:28 +1000] "GET /store/product-info.php?mypage3.html HTTP/1.1" 200 69772
216.244.72.12 - - [05/Sep/2012:15:54:37 +1000] "GET /store/product-info.php?mypage4.html HTTP/1.1" 200 82441
example 2
216.244.72.12 - - [05/Sep/2012:15:46:15 +1000] "GET /store/mypage1.html HTTP/1.1" 200 70209
216.244.72.11 - - [05/Sep/2012:15:46:15 +1000] "GET /store/mypage2.html HTTP/1.1" 200 82384
216.244.72.12 - - [05/Sep/2012:15:46:15 +1000] "GET /store/mypage3.html HTTP/1.1" 200 83683
216.244.72.3 - - [05/Sep/2012:15:46:15 +1000] "GET /store/mypage4.html HTTP/1.1" 200 82431
216.244.72.3 - - [05/Sep/2012:15:46:16 +1000] "GET /store/mypage5.html HTTP/1.1" 200 82855
216.176.191.201 - - [05/Sep/2012:15:46:26 +1000] "GET /store/mypage6.html HTTP/1.1" 200 75659
Please advise.
-
Hi BM7,
I'm going to open up a ticket on this to have our engineers take a closer look at your site. Once we have an overall response, I'll post it here for other community members to view.
Cheers!
-
Thanks Megan for your reply,
I will give that a try, and I have blocked 2 addresses so you are reduced to 2 crawler sessions. These two measures should reduce the load considerably, as long as Rogerbot respects the 7 second delay.
IMHO ignoring the Crawl-Delay set by the webmaster of the site you are crawling, which crawlers are supposed to respect, is wrong. I got a nasty notice from Google WMT for being down 5 hours due to Rogerbot: it was the middle of the night, so the server only got restarted in the morning.
Also, my site has around 600 discrete pages of which you crawl about 500, so even at the original 10 seconds crawl delay you could do my whole site in less than 1.5 hours, which is only required once a week. So in my mind that suggests there is no need to overrule my settings in robots.txt 'so he (Roger) can complete the crawl'.
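The arithmetic behind that estimate can be checked with a few lines of Python. This is only a rough sketch: it assumes one crawler fetching pages one after another and ignores the time each download takes, and `crawl_hours` is a made-up helper name.

```python
def crawl_hours(pages, crawl_delay_seconds):
    """Hours to fetch `pages` pages one at a time with a fixed delay between requests."""
    return pages * crawl_delay_seconds / 3600


# About 500 crawled pages at the three delays mentioned in this thread.
for delay in (7, 10, 20):
    print(delay, round(crawl_hours(500, delay), 2))
```

At a 10 second delay, 500 pages take about 1.39 hours, which is under the 1.5 hours quoted above. Even at a 20 second delay the crawl would finish in under 3 hours.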
Regards,
-
Hi there,
This is Megan from the SEOmoz Help Team. I'm so sorry Rogerbot is causing you grief! This might be happening because your crawl delay is too long, so Rogerbot ends up ignoring it so he can complete the crawl. If you set your crawl delay to a maximum of 7, that should solve your problem. If you're still running into issues, though, please send us a message at help@seomoz.org and we'll check it out asap!
Cheers!
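As a sketch of what that suggestion could look like in practice, the Python snippet below holds a sample robots.txt and reads the delays back with the standard library's `urllib.robotparser`. The user-agent token `rogerbot` and the 10 second fallback for all other bots are assumptions for illustration, not settings confirmed in this thread. The snippet only shows how a parser reads the file and says nothing about how Rogerbot itself behaves.

```python
from urllib.robotparser import RobotFileParser

# Illustrative robots.txt: 7 seconds for rogerbot (the suggested maximum),
# 10 seconds for every other crawler (an assumed fallback).
ROBOTS_TXT = """\
User-agent: rogerbot
Crawl-delay: 7

User-agent: *
Crawl-delay: 10
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

rogerbot_delay = parser.crawl_delay("rogerbot")
other_delay = parser.crawl_delay("SomeOtherBot")  # hypothetical crawler name
print(rogerbot_delay, other_delay)
```

Checking the file this way before uploading it helps catch typos, such as a non-numeric delay, which this parser would silently ignore.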