Rogerbot getting cheeky?
-
Hi SeoMoz,
From time to time my server crashes during Rogerbot's crawling escapades, even though I have a robots.txt file with a crawl-delay of 10, which I have just increased to 20.
I looked at the Apache log and noticed Roger hitting me from 4 different addresses: 216.244.72.3, 216.244.72.11, 216.244.72.12 and 216.176.191.201. Most of the time each individual address kept its own requests 10 seconds apart, but all 4 addresses would hit 4 different pages simultaneously (example 2). At other times, it wasn't respecting robots.txt at all (see example 1 below).
I wouldn't call this situation 'respecting the crawl-delay' entry in robots.txt, as your answers to other questions here have claimed. 4 simultaneous page requests within 1 second from Rogerbot are not what should be happening, IMHO.
example 1
216.244.72.12 - - [05/Sep/2012:15:54:27 +1000] "GET /store/product-info.php?mypage1.html HTTP/1.1" 200 77813
216.244.72.12 - - [05/Sep/2012:15:54:27 +1000] "GET /store/product-info.php?mypage2.html HTTP/1.1" 200 74058
216.244.72.12 - - [05/Sep/2012:15:54:28 +1000] "GET /store/product-info.php?mypage3.html HTTP/1.1" 200 69772
216.244.72.12 - - [05/Sep/2012:15:54:37 +1000] "GET /store/product-info.php?mypage4.html HTTP/1.1" 200 82441
example 2
216.244.72.12 - - [05/Sep/2012:15:46:15 +1000] "GET /store/mypage1.html HTTP/1.1" 200 70209
216.244.72.11 - - [05/Sep/2012:15:46:15 +1000] "GET /store/mypage2.html HTTP/1.1" 200 82384
216.244.72.12 - - [05/Sep/2012:15:46:15 +1000] "GET /store/mypage3.html HTTP/1.1" 200 83683
216.244.72.3 - - [05/Sep/2012:15:46:15 +1000] "GET /store/mypage4.html HTTP/1.1" 200 82431
216.244.72.3 - - [05/Sep/2012:15:46:16 +1000] "GET /store/mypage5.html HTTP/1.1" 200 82855
216.176.191.201 - - [05/Sep/2012:15:46:26 +1000] "GET /store/mypage6.html HTTP/1.1" 200 75659
Please advise.
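For reference, a robots.txt entry matching the setup described above (delay now raised to 20) would look something like this. This is a sketch only; rogerbot is the user-agent token Moz's crawler responds to:

```
User-agent: rogerbot
Crawl-delay: 20
```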
-
Hi BM7,
I'm going to open up a ticket on this to have our engineers take a closer look at your site. Once we have an overall response, I'll post it here for other community members to view.
Cheers!
-
Thanks Megan for your reply,
I'll give that a try, and I have blocked 2 of the addresses, so you are reduced to 2 crawler sessions. These two measures should reduce the load considerably, as long as Rogerbot respects the 7-second delay.
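For anyone wanting to block crawler addresses the same way, the Apache side might look roughly like the sketch below (Apache 2.2 Order/Allow/Deny syntax; the directory path and the choice of which two addresses to block are illustrative assumptions, not stated in the post):

```
# Illustrative sketch: deny two of the crawler's source addresses
# (Apache 2.2 mod_authz_host syntax; the path is a placeholder)
<Directory "/var/www/html">
    Order allow,deny
    Allow from all
    Deny from 216.244.72.3
    Deny from 216.244.72.11
</Directory>
```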
IMHO, ignoring the Crawl-delay set by the webmaster of the site you are crawling, which crawlers are supposed to respect, is wrong. I got a nasty warning in Google WMT for being down 5 hours due to Rogerbot; it happened in the middle of the night, so the server only got restarted in the morning.
Also, my site has around 600 discrete pages, of which you crawl about 500, so even at the original 10-second crawl delay you could do my whole site in less than 1.5 hours, and that's only required once a week. To my mind that suggests there is no need to overrule my settings in robots.txt 'so he (Roger) can complete the crawl'.
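As a quick sanity check on that claim, here is the arithmetic spelled out (a sketch using only the figures quoted in the post):

```python
# Rough crawl-time estimate from the figures quoted above.
pages_crawled = 500   # pages Rogerbot actually fetches
crawl_delay = 10      # seconds, the original robots.txt setting

total_hours = pages_crawled * crawl_delay / 3600
print(f"Full crawl at a {crawl_delay}s delay: about {total_hours:.2f} hours")
# About 1.39 hours, comfortably under 1.5 hours
```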
Regards,
-
Hi there,
This is Megan from the SEOmoz Help Team. I'm so sorry Rogerbot is causing you grief! This might actually be happening because your crawl delay is too long, so rogerbot just ends up ignoring it so he can complete the crawl. If you set your crawl delay to a maximum of 7 seconds, that should solve your problem. If you're still running into issues, though, please send a message to help@seomoz.org and we'll check it out asap!
Cheers!