Is there a whitelist of the RogerBot IP Addresses?
-
I'm all for letting Roger crawl my site, but it's not uncommon for malicious spiders to spoof the User-Agent string. Having a whitelist of Roger's IP addresses would be immensely useful!
-
Samantha (of the Moz team) suggested I have my client whitelist Rogerbot. So you are saying to simply whitelist Rogerbot as a user agent? Is there any other information I need to provide?
-
Gotcha, thanks for the response, Aaron.
-
Hey Kalen! Rogerbot is the crawler we use to gather data on websites for Moz Analytics and the Mozscape link index. Here's his info: http://moz.com/help/pro/what-is-rogerbot-.
I wish I could give you IP addresses, but they change all the time since we host Roger in the cloud. There's not even a reliable range of IPs to give you. You can totally whitelist the user agent rogerbot, but that's the only reliable information about the crawler you can go off of. I hope that helps, but let me know if there's any other solution you can think of. Thank you!
-
Hi Aaron,
I'm not totally sure what RogerBot is, but I was also interested in a list of IPs to whitelist. We just completed a search crawl and are checking out the Crawl Diagnostics. The crawl hit some 503 errors because it's triggering our DoS filter.
Is there a way to get the IP addresses behind this crawl in order to whitelist them?
Thanks,
Kalen
-
Hey there Outside!
I totally understand your concerns, but unfortunately we don't have a static IP we can give you for Rogerbot. He's crawling from the cloud, so his IP address changes all the time! As you know, you can allow him in robots.txt, but that's the only way to do it for now. We have a recent post about why this may be risky business: http://www.seomoz.org/blog/restricting-robot-access-for-improved-seo
Hope that helps!
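For anyone who wants to sanity-check a robots.txt before relying on it, here is a minimal sketch using Python's standard `urllib.robotparser`. The robots.txt content, the `is_allowed` helper, and the example.com URLs are all hypothetical and only for illustration; this is not Moz's own tooling. Keep in mind that robots.txt is advisory, so it only controls crawlers that choose to honor it.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: rogerbot may crawl everything except /private/,
# while every other crawler is asked to stay out entirely.
ROBOTS_TXT = """\
User-agent: rogerbot
Disallow: /private/

User-agent: *
Disallow: /
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for agent, url in [
        ("rogerbot", "https://example.com/products"),
        ("rogerbot", "https://example.com/private/report"),
        ("SomeOtherBot", "https://example.com/products"),
    ]:
        print(agent, url, is_allowed(ROBOTS_TXT, agent, url))
```

Matching here is done on the user agent name only, which is exactly why a spoofed User-Agent string would pass the same check.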
-
Personally, I've run across spiders that search for entry points and exploits in common CMS, e-commerce, and CRM web applications. For example, there was a recent WordPress bug that could be exploited to serve malicious content (read: viruses) to visiting users.
Spoofing the User-Agent string is elementary at best, and wouldn't fool any sysadmin worth their salt. All you have to do is run a WHOIS on the requesting IP to help identify its origin.
I'm a bit of a data geek, so I like to grep through log files to see things that won't show up in Analytics, which requires JavaScript.
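As a rough sketch of that kind of log digging, the Python below counts requests per client IP for any request whose User-Agent claims to be rogerbot, so the resulting IPs can then be looked up with WHOIS. It assumes the Apache/Nginx "combined" log format; the `ips_claiming_agent` helper and the sample log lines (using documentation-only IP ranges) are made up for illustration.

```python
import re
from collections import Counter

# Assumes the "combined" access log format; adjust the pattern for other formats.
LOG_LINE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[[^\]]+\] "[^"]*" \d{3} \S+ "[^"]*" "(?P<agent>[^"]*)"'
)


def ips_claiming_agent(lines, agent_substring="rogerbot"):
    """Count requests per client IP whose User-Agent contains agent_substring."""
    counts = Counter()
    needle = agent_substring.lower()
    for line in lines:
        match = LOG_LINE.match(line)
        if match and needle in match.group("agent").lower():
            counts[match.group("ip")] += 1
    return counts


# Hypothetical sample lines; real input would come from the server's access log.
SAMPLE_LOG = [
    '203.0.113.7 - - [10/Oct/2013:13:55:36 +0000] "GET / HTTP/1.1" 200 2326 "-" "rogerbot"',
    '203.0.113.7 - - [10/Oct/2013:13:55:40 +0000] "GET /about HTTP/1.1" 200 1024 "-" "rogerbot"',
    '198.51.100.23 - - [10/Oct/2013:13:56:01 +0000] "GET /wp-login.php HTTP/1.1" 404 512 "-" "rogerbot"',
    '192.0.2.55 - - [10/Oct/2013:13:56:09 +0000] "GET / HTTP/1.1" 200 2326 "-" "Mozilla/5.0"',
]

if __name__ == "__main__":
    for ip, hits in ips_claiming_agent(SAMPLE_LOG).most_common():
        print(ip, hits)
```

The count alone does not prove anything about who is behind an IP; it just narrows down which addresses are worth checking by hand.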
-
Out of curiosity (and because I don't know), what is the advantage for a malicious spider to spoof the User-Agent string? I mean, I understand this hides its identity, but why does a spider need to hide its identity? And what can a malicious spider do that a browsing human can't do? I haven't taken any action to prevent robots from anything on my site. Should I?