Is there a whitelist of the RogerBot IP Addresses?
-
I'm all for letting Roger crawl my site, but it's not uncommon for malicious spiders to spoof the User-Agent string. Having a whitelist of Roger's IP addresses would be immensely useful!
-
Samantha (of the Moz team) suggested I have my client whitelist Rogerbot - so you are saying simply whitelist Rogerbot as a useragent? Is there any other information I need to provide?
-
Gotcha thanks for the response, Aaron.
-
Hey Kalen! Rogerbot is the crawler we use to gather data on websites for Moz Analytics and the Mozscape link index. Here's his info: http://moz.com/help/pro/what-is-rogerbot-.
I wish I could give you IP addresses, but they change all the time since we host Roger in the cloud. There's not even a reliable range of IPs to give you. You can totally whitelist the useragent rogerbot, but that's the only reliable information about the crawler you can go off of. I hope that helps but let me know if there's any other solution you can think of. Thank you!
-
Hi Aaron,
I'm not totally sure what RogerBot is, but I was also interested in a list of IPs to white list. We just completed a search crawl and are checking out the Crawl Diagnostics. It's hit some 503 errors b/c it's triggering our DoS filter.
Is there a way to get the IP addresses behind this crawl in order to white list them?
Thanks,
Kalen -
Hey there Outside!
I totally understand your concerns, but unfortunately we don't have a static IP we can give you for Rogerbot. He's crawling from the cloud so his IP address changes all the time! As you know, you can allow him in Robots.txt but that's the only way to do it for now. We have a recent post about why this may be risky business: http://www.seomoz.org/blog/restricting-robot-access-for-improved-seo
Hope that helps!
-
Personally, I've run across spiders that search for entry points and exploits in common CMS, e-commerce, and CRM web applications. For example, there was a recent Wordpress bug that could be exploited to serve malicious content (read: virus) to visiting users.
Spoofing the User-Agent string is elementary at best, and wouldn't fool any sys admin worth a salt. All you have to do is a WHOIS on the requested IP to help identify it's origin.
I'm a bit of a data geek, so I like to grep through log files to see things that won't show up in Analytics that require Javascript.
-
Out of curiosity (and because I don't know), what is the advantage for a malicious spider to spoof the User-Agent string? I mean, I understand this hides its identity, but why does a spider need to hide its identity? And what can a malicious spider do that a browsing human can't do? I haven't taken any action to prevent robots from anything on my site. Should I?
Got a burning SEO question?
Subscribe to Moz Pro to gain full access to Q&A, answer questions, and ask your own.
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
New URL, new physical address, New Name. 30 point drop in Domain Authority. Yikes.
I have a client who is asking for SEO help after renaming their business, getting a new URL, and somehow having an address change (without moving to a new location...weird...I know). This has set them back big time in terms of their domain authority (they went from a 46 to a 15 in DA). The web developers they work with put a 302 redirect in place from their old URL (home page), which had 10,477 links from 52 root domains, to their new URL's home page. Open site explorer shows that they now have 5 links! We can improve some of the local search set backs from the name and address change with a citation audit and clean up, but the domain name change is a killer. So here's my question or questions, really: Do we need to manually rebuild links with partner websites? I know there is debate around the actual link juice passed along from a 302 vs a 301 redirect (despite what has been publicly stated by Google). Or is this just a waiting game while old links get recrawled?
Moz Pro | | TheKatzMeow1 -
Ajax4SEO and rogerbot crawling
Has anyone had any experience with seo4ajax.com and moz? The idea is that it points a bot to a html version of an ajax page (sounds good) without the need for ugly urls. However, I don't know how this will work with rogerbot and whether moz can crawl this. There's a section to add in specific user agents and I've added "rogerbot". Does anyone know if this will work or not? Otherwise, it's going to create some complications. I can't currently check as the site is in development and the dev version is noindexed currently. Thanks!
Moz Pro | | LeahHutcheon0 -
Rogerbot does not catch all existing 4XX Errors
Hi I experienced that Rogerbot after a new Crawl presents me new 4XX Errors, so why doesn't he tell me all at once? I have a small static site and had 9 crawls ago 10 4XX Errors, so I tried to fix them all.
Moz Pro | | inlinear
The next crawl Rogerbot fount still 5 Errors so I thought that I did not fix them all... but this happened now many times so that I checked before the latest crawl if I really fixed all the errors 101%. Today, although I really corrected 5 Errors, Rogerbot digs out 2 "new" Errors. So does Rogerbot not catch all the errors that have been on my site many weeks before? Pls see the screenshot how I was chasing the errors 😉 404.png0 -
Rogerbot did not crawl my site ! What might be the problem?
When I saw the new crawl for my site I wondered why there are no errors, no warning and 0 notices anymore. Then I saw that only 1 page was crawled. There are no Error Messages or webmasters Tools also did not report anything about crawling problems. What might be the problem? thanks for any tips!
Moz Pro | | inlinear
Holger rogerbot-did-not-crawl.PNG0 -
Should I be running my crawl on our www address or our non-www address?
I currently run our crawl on oursitename.com, but am wondering if it should be run on www.oursitename.com instead.
Moz Pro | | THMCC0 -
RogerBot does not respect some rules??
Hello; Every week when I see my stats I notice that RogerBot has crawled 10000 form my website, even pages with a no index or not allowed in the robots.txt. Is it possible to avoid him from crawling the these pages? They are form pages in my site, with are not indexed by google, they have a noindex and they are not allowed for crawling in the robots.txt. Thanks everyone for your help!!!
Moz Pro | | jgomes0 -
Can't find email address or contact form on website I want link from
Hi, How do you use whois or other resources to find an email address for a site that doesn't list their email address or have a contact form? Our competitors look like they have contacted the site but I don't know how to contact the site myself. They have a facebook fan page and a twitter account. Thanks.
Moz Pro | | BobGW0 -
IP Location at a glance - Moz toolbar
I am using the latest version of the moz toolbar in my firefox browser. The IP location feature is not working for me as demonstrated in the article. "You will now notice a new button with a flag that shows up in the toolbar as you surf the web. This new toolbar addition shows the country where this site is hosted. Click on the flag to see more details about the location and IP address. If you want to learn more, click on the IP to access WhoIs information." My flag icon is depressed and looks like a US flag, even when I browse European sites. Also, I never see the location nor IP address of other sites. Am I missing a step or is the tool not working correctly?
Moz Pro | | RyanKent0