Is there a whitelist of the RogerBot IP Addresses?
-
I'm all for letting Roger crawl my site, but it's not uncommon for malicious spiders to spoof the User-Agent string. Having a whitelist of Roger's IP addresses would be immensely useful!
-
Samantha (of the Moz team) suggested I have my client whitelist Rogerbot - so you are saying simply whitelist Rogerbot as a useragent? Is there any other information I need to provide?
-
Gotcha thanks for the response, Aaron.
-
Hey Kalen! Rogerbot is the crawler we use to gather data on websites for Moz Analytics and the Mozscape link index. Here's his info: http://moz.com/help/pro/what-is-rogerbot-.
I wish I could give you IP addresses, but they change all the time since we host Roger in the cloud. There's not even a reliable range of IPs to give you. You can totally whitelist the useragent rogerbot, but that's the only reliable information about the crawler you can go off of. I hope that helps but let me know if there's any other solution you can think of. Thank you!
-
Hi Aaron,
I'm not totally sure what RogerBot is, but I was also interested in a list of IPs to white list. We just completed a search crawl and are checking out the Crawl Diagnostics. It's hit some 503 errors b/c it's triggering our DoS filter.
Is there a way to get the IP addresses behind this crawl in order to white list them?
Thanks,
Kalen -
Hey there Outside!
I totally understand your concerns, but unfortunately we don't have a static IP we can give you for Rogerbot. He's crawling from the cloud so his IP address changes all the time! As you know, you can allow him in Robots.txt but that's the only way to do it for now. We have a recent post about why this may be risky business: http://www.seomoz.org/blog/restricting-robot-access-for-improved-seo
Hope that helps!
-
Personally, I've run across spiders that search for entry points and exploits in common CMS, e-commerce, and CRM web applications. For example, there was a recent Wordpress bug that could be exploited to serve malicious content (read: virus) to visiting users.
Spoofing the User-Agent string is elementary at best, and wouldn't fool any sys admin worth a salt. All you have to do is a WHOIS on the requested IP to help identify it's origin.
I'm a bit of a data geek, so I like to grep through log files to see things that won't show up in Analytics that require Javascript.
-
Out of curiosity (and because I don't know), what is the advantage for a malicious spider to spoof the User-Agent string? I mean, I understand this hides its identity, but why does a spider need to hide its identity? And what can a malicious spider do that a browsing human can't do? I haven't taken any action to prevent robots from anything on my site. Should I?
Got a burning SEO question?
Subscribe to Moz Pro to gain full access to Q&A, answer questions, and ask your own.
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
Unsolved ip address
hi, im currently running my web without domain, can i use moz with my web ip address. thanks
Moz Pro | | ikuraa0 -
Rogerbot did not crawl my site ! What might be the problem?
When I saw the new crawl for my site I wondered why there are no errors, no warning and 0 notices anymore. Then I saw that only 1 page was crawled. There are no Error Messages or webmasters Tools also did not report anything about crawling problems. What might be the problem? thanks for any tips!
Moz Pro | | inlinear
Holger rogerbot-did-not-crawl.PNG0 -
Data Update for RogerBot
Hi, I noticed that rogerbot still give me 404 for http://www.salustore.com/capelli/nanogen-acquamatch.html refferal form http://www.salustore.com/protocollo-nanogen even I made changes since a couple of week. Same error with one "Title Element Too Short" on our site. Any suggestion on how to refresh it? Best Regards n.
Moz Pro | | nicolobottazzi0 -
SEO Web Crawler IP addresses
What are the IP addresses for the SEO Web Crawler? There is a firewall on my clients website before it goes live, I would like to crawl the site before it goes live, but need to provide the web crawlers IP addreses. Thank you for your time
Moz Pro | | sfchronicle1 -
Rogerbot getting cheeky?
Hi SeoMoz, From time to time my server crashes during Rogerbot's crawling escapades, even though I have a robots.txt file with a crawl-delay 10, now just increased to 20. I looked at the Apache log and noticed Roger hitting me from from 4 different addresses 216.244.72.3, 72.11, 72.12 and 216.176.191.201, and most times whilst on each separate address, it was 10 seconds apart, ALL 4 addresses would hit 4 different pages simultaneously (example 2). At other times, it wasn't respecting robots.txt at all (see example 1 below). I wouldn't call this situation 'respecting the crawl-delay' entry in robots.txt as other question answered here by you have stated. 4 simultaneous page requests within 1 sec from Rogerbot is not what should be happening IMHO. example 1
Moz Pro | | BM7
216.244.72.12 - - [05/Sep/2012:15:54:27 +1000] "GET /store/product-info.php?mypage1.html" 200 77813
216.244.72.12 - - [05/Sep/2012:15:54:27 +1000] "GET /store/product-info.php?mypage2.html HTTP/1.1" 200 74058
216.244.72.12 - - [05/Sep/2012:15:54:28 +1000] "GET /store/product-info.php?mypage3.html HTTP/1.1" 200 69772
216.244.72.12 - - [05/Sep/2012:15:54:37 +1000] "GET /store/product-info.php?mypage4.html HTTP/1.1" 200 82441 example 2
216.244.72.12 - - [05/Sep/2012:15:46:15 +1000] "GET /store/mypage1.html HTTP/1.1" 200 70209
216.244.72.11 - - [05/Sep/2012:15:46:15 +1000] "GET /store/mypage2.html HTTP/1.1" 200 82384
216.244.72.12 - - [05/Sep/2012:15:46:15 +1000] "GET /store/mypage3.html HTTP/1.1" 200 83683
216.244.72.3 - - [05/Sep/2012:15:46:15 +1000] "GET /store/mypage4.html HTTP/1.1" 200 82431
216.244.72.3 - - [05/Sep/2012:15:46:16 +1000] "GET /store/mypage5.html HTTP/1.1" 200 82855
216.176.191.201 - - [05/Sep/2012:15:46:26 +1000] "GET /store/mypage6.html HTTP/1.1" 200 75659 Please advise.1 -
Rogerbot not showing in logs
Hi All Rogerbot has recently thrown up 403 errors for all our pages - no changes had been made to the site so I asked our ISP for assistance. They wanted to have a look at what rogerbot was doing and so went to the logs but rogerbot was not listed anywhere in the logs by name - any ideas why? Regards Craig
Moz Pro | | CraigWiltshire0 -
Is having the company's address in the footer (or header) of each webpage important for SEO?
Is having the company/office's address in the footer (or header) of each webpage important for SEO? Your page for the Geotarget tool says that having the address in this element helps search engines find your location. My question is, how important or relevant is this to SEO? How does knowing the address influence SEO? Is it best SEO practice to put the address in the footer of every webpage? http://www.seomoz.org/geotarget
Moz Pro | | richardstrange0 -
IP Location at a glance - Moz toolbar
I am using the latest version of the moz toolbar in my firefox browser. The IP location feature is not working for me as demonstrated in the article. "You will now notice a new button with a flag that shows up in the toolbar as you surf the web. This new toolbar addition shows the country where this site is hosted. Click on the flag to see more details about the location and IP address. If you want to learn more, click on the IP to access WhoIs information." My flag icon is depressed and looks like a US flag, even when I browse European sites. Also, I never see the location nor IP address of other sites. Am I missing a step or is the tool not working correctly?
Moz Pro | | RyanKent0