Omega8.cc decided to block rogerbot
-
My host decided to block rogerbot because "it's too aggressive... and doesn't follow the Crawl-limit... so we blocked them". Now I can't get crawl reports on my site. Any advice?
-
Your best bet may actually be to run a Crawl Test using this tool. It'll crawl up to 3000 pages of any given domain, and report back in much the same way as the weekly crawl. That one should still run as scheduled.
-
Hello Matt,
Thank you for your help with this. My host didn't have an old log stored. I've convinced my host to unblock rogerbot, and if he sees funny business he's going to send me the log. I don't know if asking you to trigger a crawl on my sites would be a good or bad idea at this point. But I'd like my reports.
Jay
-
Hi Jay,
I definitely sympathize, and I'm sorry you're dealing with this. I'm aware that there is a small subset of hosts that feels our crawlers are too aggressive, and that yours isn't the only one. (As it happens, though, both of those posts are regarding Dotbot, the Mozscape Index crawler, and the first response to the Q&A post indicates that the host has no issues with Rogerbot.)
Our challenge in this area centers around our need to accommodate a vast and diverse customer base. We have customers with sites spanning millions of pages, and we're obligated to meet our service-level agreement with them to provide data in a timely manner. As it is, many of our larger customers receive crawl updates only once per month in order to prevent Rogerbot from having to crawl too aggressively. We've found that the rate at which Rogerbot crawls is acceptable to the vast majority of hosts, and that the few who would prefer a less-aggressive crawl are almost always willing to apply a crawl limit.
This is especially true given that Rogerbot only crawls sites on demand, either as part of an ongoing Moz campaign or in a Crawl Test. Since having a site crawled by Rogerbot is voluntary, it generally falls on those few hosts to adjust their crawl limits accordingly. We simply can't adjust our crawl rate to suit the requirements of those few. This is the case with all of our competitors, as well.
That said, I'd still love to show a server log from one of those old crawls to our engineering team. If something _is_ amiss with our crawler, we absolutely want to make sure it's addressed. I understand you've been in touch with our Help team, so you can go ahead and send it over to them.
-
Hi Matt,
My host isn't the only one complaining about rogerbot on the internet (another rogerbot blockage report). And here is my host's (presumably last) response...
We have blocked this [explicit removed] bot on all our machines in all datacenters and it was not related to your sites at all, but to their bot's [explicit removed] behaviour we have experienced on many systems.
We could consider removing it from the global blacklist only if they will adjust this on their end so it will not require any special settings in the robots.txt
Again, it is their job to fix their bot behaviour and not our job to keep fighting to stop the bot from too aggressive crawling per site -- it is an insane proposition from their end we simply can't accept.
Configuring the crawler to use some sane intervals is unbelievably simple, if they really want to fix this on their end. And they don't use any intervals at all, they just flood the servers like crazy. Just stop flooding the server like crazy and we can then unblock them.
Kind Regards,
That sucks because if you two can't resolve your issues and get along with each other, I'll probably have to break up with Moz, and I love Moz. Hopefully there is a good alternative...
Jay
-
I'm sorry that Rogerbot crawls too aggressively for your host. It's designed to crawl as aggressively as necessary in order to complete the crawl in a reasonable amount of time to keep you from having to wait too long for your campaign data.
Since Rogerbot was already blocked by the time the new robots.txt was implemented, you may want to see if your host would un-ban the crawler and test whether it follows the limit.
All that said, if Rogerbot _is_ crawling in a way we don't intend, we'd like to check it out. If there's any way you could send me a server log from one of those old crawls, we could investigate.
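For anyone who does get hold of such a log, here is a rough sketch of how the gaps between Rogerbot's requests could be measured. It assumes the host writes the common Apache/Nginx "combined" log format, and the sample lines (including the user-agent text and IP addresses) are made-up placeholders, not real crawl data. The function name `request_gaps` is just for illustration.

```python
import re
from datetime import datetime

# Assumption: "combined" access log format. A host with a custom format
# would need a different pattern.
LOG_LINE = re.compile(
    r'\[(?P<ts>[^\]]+)\] "(?P<req>[^"]*)" \d{3} \S+ "[^"]*" "(?P<ua>[^"]*)"'
)

def request_gaps(lines, agent_substring="rogerbot"):
    """Return the seconds between consecutive requests from a matching user agent."""
    times = []
    for line in lines:
        match = LOG_LINE.search(line)
        if match and agent_substring in match.group("ua").lower():
            times.append(
                datetime.strptime(match.group("ts"), "%d/%b/%Y:%H:%M:%S %z")
            )
    times.sort()
    return [(later - earlier).total_seconds()
            for earlier, later in zip(times, times[1:])]

# Made-up sample lines; the user-agent strings are placeholders.
SAMPLE = [
    '203.0.113.5 - - [10/Oct/2014:13:55:36 +0000] "GET /a HTTP/1.1" 200 2326 "-" "rogerbot (placeholder UA)"',
    '203.0.113.5 - - [10/Oct/2014:13:55:37 +0000] "GET /b HTTP/1.1" 200 1200 "-" "rogerbot (placeholder UA)"',
    '198.51.100.7 - - [10/Oct/2014:13:55:40 +0000] "GET /a HTTP/1.1" 200 2326 "-" "SomeBrowser/1.0"',
    '203.0.113.5 - - [10/Oct/2014:13:55:47 +0000] "GET /c HTTP/1.1" 200 900 "-" "rogerbot (placeholder UA)"',
]

gaps = request_gaps(SAMPLE)
too_fast = sum(gap < 10 for gap in gaps)  # gaps shorter than a 10-second delay
print(gaps, too_fast)
```

Comparing the gaps against the delay set in robots.txt would give both the host and the engineering team something concrete to look at, instead of arguing from impressions.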
-
My host feels like we shouldn't have to specify the crawl limit and that the bot should automatically respect a reasonable crawl speed without being asked to do so. My host also thinks that Rogerbot doesn't respect the specified crawl-limit, for whatever reason (I can't say whether he is right or wrong). They blocked the crawler a few weeks ago, which is when I realized none of my crawls were working.
The robots.txt files on my sites were set to something strange for a while (it was my home page). Then when the crawler stopped crawling I started digging, and rebuilt the robots.txt. Now it's set to the default robots.txt. I didn't notice anything about user-agent: Rogerbot, but somehow my host blocked him.
Edit: I also note that my robots.txt already has "User-agent: *" with "Crawl-delay: 10".
Something that I should note is that I'm a Drupal fan because it allows me to create sites that are very content-centric and gives me creative freedom in content-oriented designs and layouts that I never had with Joomla or WordPress (I admit, I could use some brushup on my graphics skills). Omega8.cc is a great host for Drupal because it runs on the Aegir platform, which was designed for Drupal. Moving back to a cPanel host would be less than ideal.
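As a side note on the Crawl-delay rule mentioned in the edit above, here is a small sketch of how to check what delay a robots.txt file actually requests for a given crawler, using Python's standard `urllib.robotparser`. The first file mirrors the wildcard rule quoted in this post. The second, with a rogerbot-specific group and a delay of 5, is a hypothetical example and not something taken from the site. This only shows what the file asks for; whether a crawler honors it is a separate question.

```python
from urllib.robotparser import RobotFileParser

# Mirrors the rule quoted in the post: one wildcard group with a 10-second delay.
WILDCARD_ONLY = """\
User-agent: *
Crawl-delay: 10
"""

# Hypothetical variant with a crawler-specific group (values are illustrative).
WITH_ROGERBOT_GROUP = """\
User-agent: rogerbot
Crawl-delay: 5

User-agent: *
Crawl-delay: 10
"""

def crawl_delay_for(robots_txt, user_agent):
    """Return the Crawl-delay (in seconds) that robots_txt requests for user_agent."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.crawl_delay(user_agent)

print(crawl_delay_for(WILDCARD_ONLY, "rogerbot"))        # falls back to the * group
print(crawl_delay_for(WITH_ROGERBOT_GROUP, "rogerbot"))  # uses the rogerbot group
```

With only the wildcard group, a crawler named rogerbot is covered by the default rule, so a separate rogerbot group is needed only when it should get a different delay.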
-
It's never really been a problem, and Rogerbot SHOULD respect crawl limit. May I ask, when did your host block the crawler?
EDIT: Also, it looks like the robots.txt file was massively changed pretty recently on at least one of your campaign sites. Do you know when that was? Feel free to private message me.
-
Hello and thanks for the responses. They are complaining that Rogerbot is too aggressive. I've seen other complaints online about the same issue with Rogerbot doing too many page requests. What are the chances my host is right? Does Rogerbot need a chill pill?
-
I'm afraid Jonathan is right: Rogerbot is the crawler that gathers data for the crawl reports, so if he can't crawl your site you won't get crawl data. Your only other option would be to use a crawl tool other than Moz.
-
I would suggest you find a new host. I know that is the obvious answer, but probably the easiest. If you have cPanel, many hosts will transfer everything across for you.