Is there a whitelist of the RogerBot IP Addresses?
-
I'm all for letting Roger crawl my site, but it's not uncommon for malicious spiders to spoof the User-Agent string. Having a whitelist of Roger's IP addresses would be immensely useful!
-
Samantha (of the Moz team) suggested I have my client whitelist Rogerbot - so you are saying simply whitelist Rogerbot as a useragent? Is there any other information I need to provide?
-
Gotcha thanks for the response, Aaron.
-
Hey Kalen! Rogerbot is the crawler we use to gather data on websites for Moz Analytics and the Mozscape link index. Here's his info: http://moz.com/help/pro/what-is-rogerbot-.
I wish I could give you IP addresses, but they change all the time since we host Roger in the cloud. There's not even a reliable range of IPs to give you. You can totally whitelist the useragent rogerbot, but that's the only reliable information about the crawler you can go off of. I hope that helps but let me know if there's any other solution you can think of. Thank you!
-
Hi Aaron,
I'm not totally sure what RogerBot is, but I was also interested in a list of IPs to white list. We just completed a search crawl and are checking out the Crawl Diagnostics. It's hit some 503 errors b/c it's triggering our DoS filter.
Is there a way to get the IP addresses behind this crawl in order to white list them?
Thanks,
Kalen -
Hey there Outside!
I totally understand your concerns, but unfortunately we don't have a static IP we can give you for Rogerbot. He's crawling from the cloud so his IP address changes all the time! As you know, you can allow him in Robots.txt but that's the only way to do it for now. We have a recent post about why this may be risky business: http://www.seomoz.org/blog/restricting-robot-access-for-improved-seo
Hope that helps!
-
Personally, I've run across spiders that search for entry points and exploits in common CMS, e-commerce, and CRM web applications. For example, there was a recent Wordpress bug that could be exploited to serve malicious content (read: virus) to visiting users.
Spoofing the User-Agent string is elementary at best, and wouldn't fool any sys admin worth a salt. All you have to do is a WHOIS on the requested IP to help identify it's origin.
I'm a bit of a data geek, so I like to grep through log files to see things that won't show up in Analytics that require Javascript.
-
Out of curiosity (and because I don't know), what is the advantage for a malicious spider to spoof the User-Agent string? I mean, I understand this hides its identity, but why does a spider need to hide its identity? And what can a malicious spider do that a browsing human can't do? I haven't taken any action to prevent robots from anything on my site. Should I?
Got a burning SEO question?
Subscribe to Moz Pro to gain full access to Q&A, answer questions, and ask your own.
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
Restrict rogerbot for few days
Hi Team, I have a subdomain that built in Zendesk's CRM system. Now, I want to restrict Moz crawler (rogerbot) for crawling this complete subdomain for a few days, but I am not able to edit the robots.txt file of the subdomain, because this is a shared file and Zendesk is not allowing to edit it. Could you please let me know the alternative way to restrict rogerbot to crawl this subdomain? I am eagerly awaiting your quick response. Thanks
Moz Pro | | Adeptia0 -
Data Update for RogerBot
Hi, I noticed that rogerbot still give me 404 for http://www.salustore.com/capelli/nanogen-acquamatch.html refferal form http://www.salustore.com/protocollo-nanogen even I made changes since a couple of week. Same error with one "Title Element Too Short" on our site. Any suggestion on how to refresh it? Best Regards n.
Moz Pro | | nicolobottazzi0 -
Does Rogerbot recognize rel="alternate" hreflang="x"?
Rogerbot just completed its first crawl and is reporting all kinds of duplicate content - both page content and meta title/description. The pages it is calling duplicate are used with rel="alternate" hreflang="x", but are still being labeled as dupes. The title and descriptions are usually exactly the same, so I am working on getting at least those translated into different languages. I think its getting tripped up because the product page its crawling are only in English, but the chrome of the site is in the translated languages. The URLs look like so: Original: site.com/product Detected duplicates: site.com/fr/product, site.com/de/product, site.com/zh-hans/product
Moz Pro | | sedwards0 -
SEO Web Crawler IP addresses
What are the IP addresses for the SEO Web Crawler? There is a firewall on my clients website before it goes live, I would like to crawl the site before it goes live, but need to provide the web crawlers IP addreses. Thank you for your time
Moz Pro | | sfchronicle1 -
RogerBot does not respect some rules??
Hello; Every week when I see my stats I notice that RogerBot has crawled 10000 form my website, even pages with a no index or not allowed in the robots.txt. Is it possible to avoid him from crawling the these pages? They are form pages in my site, with are not indexed by google, they have a noindex and they are not allowed for crawling in the robots.txt. Thanks everyone for your help!!!
Moz Pro | | jgomes0 -
Blocking all robots except rogerbot
I'm in the process of working with a site under development and wish to run the SEOmoz crawl test before we launch it publicly. Unfortunately rogerbot is reluctant to crawl the site. I've set my robots.txt to disallow all bots besides rogerbot. Currently looks like this: User-agent: * Disallow: / User-agent: rogerbot Disallow: All pages within the site are meta tagged index,follow. Crawl report says: Search Engine blocked by robots.txt Yes Am I missing something here?
Moz Pro | | ignician0 -
Can't find email address or contact form on website I want link from
Hi, How do you use whois or other resources to find an email address for a site that doesn't list their email address or have a contact form? Our competitors look like they have contacted the site but I don't know how to contact the site myself. They have a facebook fan page and a twitter account. Thanks.
Moz Pro | | BobGW0 -
Is there a recommended format when placing a business address on a webpage?
Hi All, I ask the question as I was trying to GeoTarget tool which happened to not recognise a business address I place in the footer on one of my sites. The tool states that including the address on page helps the search engines identify your location, so I'm curious whether a specific format works best when optimizing for local search? Thanks.
Moz Pro | | davebrown19750