Rogerbot crawls my site and causes errors because it requests URLs that don't exist
-
Whenever rogerbot comes back to my site for a crawl, it seems to want to crawl URLs that don't exist, which causes errors to be reported.
Example: the correct URL is as follows:
/vw-baywindow/cab_door_slide_door_tailgate_engine_lid_parts/cab_door_seals/genuine_vw_brazil_cab_door_rubber_68-79_10330/
But it seems to want to crawl the following:
/vw-baywindow/cab_door_slide_door_tailgate_engine_lid_parts/cab_door_seals/genuine_vw_brazil_cab_door_rubber_68-79_10330/?id=10330
This format doesn't exist anywhere and never has, so I have no idea where it's getting this URL format from.
The user agent details I get are as follows:
IP ADDRESS: 107.22.107.114
USER AGENT: rogerbot/1.0 (http://moz.com/help/pro/what-is-rogerbot-, rogerbot-crawler+pr1-crawler-17@moz.com)
-
The first thing I would do is download the crawl report as an Excel sheet. You can do this from your crawl report page.
From there, sort by the 404 error column, bringing "True" to the top. The broken URLs are now at the top of the list. One of the very last columns on the right is the "referrer" column. This will show you the page where Roger is getting the bad link from.
Make sense?
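If the report is large, the same sort-and-look-up can be scripted. Here is a rough Python sketch, assuming the report has been saved as CSV. The column names "URL", "404" and "Referrer" are guesses on my part, so check them against the header row of your own export. The sample rows are made up purely for illustration.

```python
import csv
import io


def broken_links(report, url_col="URL", error_col="404", referrer_col="Referrer"):
    """Return (broken_url, referrer) pairs for rows flagged as 404.

    The column names are assumptions; pass the ones your export actually uses.
    """
    rows = csv.DictReader(report)
    return [
        (row[url_col], row[referrer_col])
        for row in rows
        if row[error_col].strip().lower() == "true"
    ]


# Made-up sample standing in for the exported crawl report.
sample = io.StringIO(
    "URL,404,Referrer\n"
    "/good-page/,False,/\n"
    "/product_10330/?id=10330,True,/category/\n"
)

for url, referrer in broken_links(sample):
    print(f"{url} <- linked from {referrer}")
```

For a real export, replace the sample with something like `open("crawl_report.csv", newline="")` (the filename is hypothetical). Each referrer it prints is a page worth inspecting for the link that adds `?id=`.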