RogerBot does not respect some rules?
-
Hello,
Every week when I check my stats, I notice that RogerBot has crawled 10,000 pages from my website, including pages that have a noindex tag or are disallowed in robots.txt.
Is it possible to stop him from crawling these pages? They are form pages on my site that are not indexed by Google: they have a noindex tag and they are disallowed for crawling in robots.txt.
Thanks, everyone, for your help!
-
If Roger is still not listening to you, send an email to help@seomoz.org and open a ticket with the help desk. They'll try to figure out why he's misbehaving and how to get him to listen to you again.
-
Hi Jorge,
Yes, this is possible. Rogerbot is also the user agent name of the crawler, so in your robots.txt you can tell Roger which pages you don't want him to crawl. More information about this can be found on this page about Roger himself.
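To double-check that your robots.txt really says what you intend for Roger, here is a minimal sketch using Python's standard `urllib.robotparser`. The robots.txt content, the `/forms/` path, the `example.com` URLs, and the `is_allowed` helper are all assumptions for illustration; substitute your own rules and paths. It only shows how a standards-following parser reads the rules, not what rogerbot itself will do.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content. The /forms/ path is an assumption
# for illustration, not taken from the site discussed in this thread.
ROBOTS_TXT = """\
User-agent: rogerbot
Disallow: /forms/

User-agent: *
Disallow:
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt text permits user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    # Check one blocked path and one open path for rogerbot.
    for path in ("/forms/contact", "/products/"):
        url = "https://www.example.com" + path
        print(path, is_allowed(ROBOTS_TXT, "rogerbot", url))
```

If a form URL comes back as allowed here, the rule is probably not matching the path you think it is, and that is worth fixing before opening a ticket.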
Hopefully this answers your question.
Related Questions
-
Unsolved Rogerbot blocked by cloudflare and not display full user agent string.
Hi, we're trying to get Moz to crawl our site, but when we Create Your Campaign we get the error: "Ooops. Our crawlers are unable to access that URL - please check to make sure it is correct. If the issue persists, check out this article for further help." Our robots.txt is fine, and we can actually see that Cloudflare is blocking the crawler with Bot Fight Mode. We've added some rules to allow rogerbot, but these seem to be ignored. If we use a robots.txt test tool (https://technicalseo.com/tools/robots-txt/) with rogerbot as the user agent, it gets through fine and we can see that our rule has allowed it. In the Cloudflare activity log (attached), Create Your Campaign appears to crawl the site with the user agent set simply to rogerbot 1.2, whereas the robots.txt testing tool uses the full user agent string rogerbot/1.0 (http://moz.com/help/pro/what-is-rogerbot-, rogerbot-crawler+shiny@moz.com), albeit at version 1.0. So it seems Cloudflare doesn't like the simple user agent. Is it correct that when Moz crawls the site it now uses the simple string of just rogerbot 1.2? Thanks, Ben
[Attachment: Cloudflare activity log, showing differences in user agent strings (2022-07-01_13-05-59.png)]
Moz Pro | | BB_NPG0 -
Restrict rogerbot for few days
Hi Team, I have a subdomain that is built on Zendesk's CRM system. I want to stop the Moz crawler (rogerbot) from crawling this entire subdomain for a few days, but I am not able to edit the subdomain's robots.txt file, because it is a shared file and Zendesk does not allow editing it. Could you please let me know an alternative way to stop rogerbot from crawling this subdomain? I am eagerly awaiting your quick response. Thanks
Moz Pro | | Adeptia0 -
Rogerbot crawls my site and causes error as it uses urls that don't exist
Whenever rogerbot comes back to my site for a crawl, it seems to want to crawl URLs that don't exist, which causes errors to be reported. Example: the correct URL is as follows: /vw-baywindow/cab_door_slide_door_tailgate_engine_lid_parts/cab_door_seals/genuine_vw_brazil_cab_door_rubber_68-79_10330/ But it seems to want to crawl the following: /vw-baywindow/cab_door_slide_door_tailgate_engine_lid_parts/cab_door_seals/genuine_vw_brazil_cab_door_rubber_68-79_10330/?id=10330 This format doesn't exist anywhere and never has, so I have no idea where it's getting this URL format from. The user agent details I get are as follows:
IP ADDRESS: 107.22.107.114
USER AGENT: rogerbot/1.0 (http://moz.com/help/pro/what-is-rogerbot-, rogerbot-crawler+pr1-crawler-17@moz.com)
Moz Pro | | spiralsites0 -
Allow only Rogerbot, not googlebot nor undesired access
I'm in the middle of site development and wanted to start crawling my site with Rogerbot, while preventing googlebot and similar crawlers from crawling it. Currently my site is protected with a login (basic Joomla offline site, user and password required), so I thought a good solution would be to remove that limitation and use .htaccess to password-protect the site for all users except Rogerbot. Reading here and there, it seems this practice is not recommended, as it could lead to security holes: any other user could see the allowed agents and emulate them. OK, maybe it takes a hacker/cracker, or an experienced developer, to get that info, but I was not able to find clear information on how to proceed in a secure way. The other solution was to continue using Joomla's access limitation for everyone, again except Rogerbot. I'm still not sure how feasible that would be. Mostly, my question is: how do you work on your site before you want it indexed by Google or similar, regardless of whether you use a CMS? Is there some other way to do it? I would love to have my site ready and crawled before launching it and avoid fixing issues afterwards... Thanks in advance.
Moz Pro | | MilosMilcom0 -
Data Update for RogerBot
Hi, I noticed that rogerbot still gives me a 404 for http://www.salustore.com/capelli/nanogen-acquamatch.html, referred from http://www.salustore.com/protocollo-nanogen, even though I made changes a couple of weeks ago. I get the same stale error for one "Title Element Too Short" on our site. Any suggestions on how to refresh it? Best Regards n.
Moz Pro | | nicolobottazzi0 -
Does anyone know of a crawler similar to SEOmoz's RogerBot?
As you probably know SEOmoz had some hosting and server issues recently, and this came at a terrible time for me... We are in the middle of battling some duplicate content and crawl errors and need to get a fresh crawl of some sites to test things out before we are hit with the big one? Before I get a million thumbs downs- I love and will continue to use SEOmoz, just need something to get me through this week ( or until Roger is back! )!
Moz Pro | | AaronSchinke1 -
Ruling out subfolders in pro tool crawl
Is there a way to "rule out" a subfolder in the pro dashboard site crawl? We're working on a site that has 500,000+ pages in the forums, but it's the CMS pages we're optimizing, and we don't want to spend the 10k limit on forum pages.
Moz Pro | | DeepRipples0 -
What is the full User Agent of Rogerbot?
What's the exact string that Rogerbot sends out as his User-Agent within the HTTP request? Does it ever differ?
Moz Pro | | rightmove0