Omega8.cc decided to block rogerbot
-
My host decided to block rogerbot because "it's too aggressive... and doesn't follow the Crawl-limit... so we blocked them". And now I can't get crawl reports on my site. Any advice?
-
Your best bet may actually be to run a Crawl Test using this tool. It'll crawl up to 3000 pages of any given domain, and report back in much the same way as the weekly crawl. That one should still run as scheduled.
-
Hello Matt,
Thank you for your help with this. My host didn't have an old log stored. I've convinced my host to unblock rogerbot, and if he sees funny business he's going to send me the log. I don't know if asking you to trigger a crawl on my sites would be a good or bad idea at this point. But I'd like my reports.
Jay
-
Hi Jay,
I definitely sympathize, and I'm sorry you're dealing with this. I'm aware that a small subset of hosts feels our crawlers are too aggressive, and that yours isn't the only one. (As it happens, though, both of those posts are regarding Dotbot, the Mozscape Index crawler, and the first response to the Q&A post indicates that the host has no issues with Rogerbot.)
Our challenge in this area centers around our need to accommodate a vast and diverse customer base. We have customers with sites spanning millions of pages, and we're obligated to meet our service-level agreement with them to provide data in a timely manner. As it is, many of our larger customers receive crawl updates only once per month in order to prevent Rogerbot from having to crawl too aggressively. We've found that the rate at which Rogerbot crawls is acceptable to the vast majority of hosts, and that the few who would prefer a less-aggressive crawl are almost always willing to apply a crawl limit.
This is especially true given that Rogerbot only crawls sites on-demand, either as part of an ongoing Moz campaign or in a Crawl Test. Since having a site crawled by Rogerbot is voluntary, it generally falls on that few to adjust their crawl limits accordingly. We simply can't adjust our crawl rate to suit the requirements of that few. This is the case with all of our competitors, as well.
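The "crawl limit" discussed above is the robots.txt Crawl-delay directive. As a rough illustration (this is not Rogerbot's actual implementation, and the fetch helper is hypothetical), a polite crawler reads that directive and sleeps between requests:

```python
import time
import urllib.robotparser

# The robots.txt content under discussion (any robots.txt string works here).
ROBOTS_TXT = """User-agent: *
Crawl-delay: 10
"""

def crawl_delay_for(robots_txt: str, user_agent: str) -> float:
    """Return the Crawl-delay (in seconds) a polite crawler should honor."""
    rp = urllib.robotparser.RobotFileParser()
    rp.parse(robots_txt.splitlines())
    delay = rp.crawl_delay(user_agent)
    # If the site specifies no delay, fall back to a default pause.
    return float(delay) if delay is not None else 1.0

# A polite fetch loop then waits between requests:
# for url in urls:
#     fetch(url)  # hypothetical fetch helper
#     time.sleep(crawl_delay_for(ROBOTS_TXT, "rogerbot"))
```

Because the rule above is declared for `User-agent: *`, it applies to any crawler that doesn't have a more specific entry of its own.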
That said, I'd still love to show a server log from one of those old crawls to our engineering team. If something _is_ amiss with our crawler, we absolutely want to make sure it's addressed. I understand you've been in touch with our Help team, so you can go ahead and send it over to them.
-
Hi Matt,
My host isn't the only one complaining about rogerbot on the internet (another rogerbot blockage report). And here is my host's (presumably last) response...
We have blocked this [explicit removed] bot on all our machines in all datacenters and it was not related to your sites at all, but to their bot's [explicit removed] behaviour we have experienced on many systems.
We could consider removing it from the global blacklist only if they will adjust this on their end so it will not require any special settings in the robots.txt
Again, it is their job to fix their bot behaviour and not our job to keep fighting to stop the bot from too aggressive crawling per site -- it is an insane proposition from their end we simply can't accept.
Configuring the crawler to use some sane intervals is unbelievably simple, if they really want to fix this on their end. And they don't use any intervals at all, they just flood the servers like crazy. Just stop flooding the server like crazy and we can then unblock them.
Kind Regards,
That sucks, because if you two can't resolve your issues and get along with each other, I'll probably have to break up with Moz, and I love Moz. Hopefully there is a good alternative...
Jay
-
I'm sorry that Rogerbot crawls too aggressively for your host. It's designed to crawl as aggressively as necessary to complete the crawl in a reasonable amount of time, so you aren't left waiting too long for your campaign data.
Since Rogerbot was already blocked by the time the new robots.txt was implemented, you may want to see if your host would un-ban the crawler and test whether it follows the limit.
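One hypothetical way to check whether the limit was honored, given a server access log, is to measure the gaps between consecutive Rogerbot requests and compare them to the Crawl-delay (this is a sketch, not an official Moz tool; it assumes common-log-format timestamps):

```python
from datetime import datetime

def min_gap_seconds(timestamps):
    """Smallest interval, in seconds, between consecutive requests.
    Timestamps use the common log format, e.g. '12/Mar/2014:10:15:32'."""
    fmt = "%d/%b/%Y:%H:%M:%S"
    times = sorted(datetime.strptime(t, fmt) for t in timestamps)
    gaps = [(b - a).total_seconds() for a, b in zip(times, times[1:])]
    return min(gaps) if gaps else None

# Example: three hypothetical Rogerbot hits pulled from an access log.
hits = ["12/Mar/2014:10:15:32", "12/Mar/2014:10:15:33", "12/Mar/2014:10:15:45"]
print(min_gap_seconds(hits))
```

If the smallest gap comes out well under the declared Crawl-delay (10 seconds in this thread), that would support the host's complaint.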
All that said, if Rogerbot _is_ crawling in a way we don't intend, we'd like to check it out. If there's any way you could send me a server log from one of those old crawls, we could investigate.
-
My host feels that we shouldn't have to specify the crawl limit and that the bot should automatically respect a reasonable crawl speed without being asked to. My host also thinks that Rogerbot doesn't respect the specified crawl limit, for whatever reason (I can't say whether he is right or wrong). They blocked the crawler a few weeks ago, which is when I realized none of my crawls were working.
The robots.txt files on my sites were set to something strange for a while; they were serving my home page. When the crawler stopped crawling I started digging and rebuilt the robots.txt. Now it's set to the default robots.txt. I didn't notice anything about User-agent: Rogerbot, but somehow my host blocked him.
Edit: I should also note that my robots.txt already contains "User-agent: *" followed by "Crawl-delay: 10".
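For reference, a robots.txt that keeps the wildcard rule and also names Rogerbot explicitly (an illustrative fragment, in case the host wants the bot singled out) would look something like this:

```
User-agent: *
Crawl-delay: 10

User-agent: rogerbot
Crawl-delay: 10
```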
Something I should note is that I'm a Drupal fan because it allows me to create sites that are very content-centric and gives me creative freedom in content-oriented designs and layouts that I never had with Joomla or WordPress (I admit, I could use some brush-up on my graphics skills). Omega8.cc is a great host for Drupal because it runs on the Aegir platform, which was designed for Drupal. Moving back to a cPanel host would be less than ideal.
-
It's never really been a problem, and Rogerbot SHOULD respect the crawl limit. May I ask when your host blocked the crawler?
EDIT: Also, it looks like the robots.txt file was massively changed pretty recently on at least one of your campaign sites. Do you know when that was? Feel free to private message me.
-
Hello, and thanks for the responses. They are complaining that Rogerbot is too aggressive. I've seen other complaints online about the same issue with Rogerbot making too many page requests. What are the chances my host is right? Does Rogerbot need a chill pill?
-
I'm afraid Jonathan is right. Rogerbot is the crawler that gathers data for the crawl reports, so if he can't crawl your site, you won't get crawl data. Your only other option would be to use a crawl tool other than Moz.
-
I would suggest you find a new host. I know that's the obvious answer, but it's probably the easiest. If you have cPanel, many hosts will transfer everything across for you.