Rogerbot Ignoring Robots.txt?
-
Hi guys,
We're trying to block Rogerbot from spending 8000-9000 of our 10000 pages per week for our site crawl on our zillions of PhotoGallery.asp pages. Unfortunately our e-commerce CMS isn't tremendously flexible so the only way we believe we can block rogerbot is in our robots.txt file.
Rogerbot keeps crawling all these PhotoGallery.asp pages so it's making our crawl diagnostics really useless.
I've contacted the SEOMoz support staff and they claim the problem is on our side. This is the robots.txt we are using:
User-agent: rogerbot
Disallow:/PhotoGallery.asp
Disallow:/pindex.asp
Disallow:/help.asp
Disallow:/kb.asp
Disallow:/ReviewNew.asp
User-agent: *
Disallow:/cgi-bin/
Disallow:/myaccount.asp
Disallow:/WishList.asp
Disallow:/CFreeDiamondSearch.asp
Disallow:/DiamondDetails.asp
Disallow:/ShoppingCart.asp
Disallow:/one-page-checkout.asp
Sitemap: http://store.jrdunn.com/sitemap.xml
For some reason the Wysiwyg edit is entering extra spaces but those are all single spaced.
Any suggestions? The only other thing I thought of to try is to something like "Disallow:/PhotoGallery.asp*" with a wildcard.
-
I have just encountered an interesting thing about Moz Link Search and its bot: if you do a search for Domains linking to Google.com , you find a list of about 900 000 domains, among which I was surprised to find webcache.googleusercontent.com
See the proof below in attache screen shot.
At the same time, the webcache.googleusercontent.com policy for robots is as shown in the second attachment.
In my opinion, there is only one possible explanation: Moz Bot does ignore robots.txt files...
-
Thanks Cyrus,
No, for some reason the editor double-spaced the file when I pasted. Other than that, it's the same though.
Yes, I actually tried ordering the exclusions both ways. Neither works.
The robots.txt checkers report no errors. I had actually checked them before posting.
Before I posted this, I was pretty convinced the problem wasn't in our robots.txt but the Seomoz support staff says essentially, "We don't think the problem is with Rogerbot, so it must be in your robots.txt file, but we can't look at that, so if by some chance your robots.txt file is fine, then there's nothing we can do for you because we're just going to assume the problem is on your side."
I figured, with everything I've already tried, and if the fabulous SEOMoz community can't come up with a solution, that'll be the best I can do.
-
Hi Kelly,
Thanks for letting us know. Could be a couple of things right off the bat. Is this your exact robots.txt file? If so, it's missing some formatting like proper spacing to be perfectly compliant. You can run a check of your robots.txt file at serveral places.
http://tool.motoricerca.info/robots-checker.phtml
http://www.searchenginepromotionhelp.com/m/robots-text-tester/robots-checker.php
http://www.sxw.org.uk/computing/robots/check.html
Also, it's generally a good idea to put specific inclusions towards the bottom, so I might flip the order and put the rogerbot directives last and the User-agent: * first.
Hope this helps. Let us know if any of this points in the right direction.
-
Thanks so much for the tip. Unfortunately still unsuccessful. (shrug)
-
Try
Disallow: /PhotoGallery.asp
I put wild cards all over usually just to be sure and had no issues so far.
Got a burning SEO question?
Subscribe to Moz Pro to gain full access to Q&A, answer questions, and ask your own.
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
Robots.txt blocking Moz
Moz are reporting the robots.txt file is blocking them from crawling one of our websites. But as far as we can see this file is exactly the same as the robots.txt files on other websites that Moz is crawling without problems. We have never come up against this before, even with this site. Our stats show Rogerbot attempting to crawl our site, but it receives a 404 error. Can anyone enlighten us to the problem please? http://www.wychwoodflooring.com -Christina
Moz Pro | | ChristinaRadisic0 -
Meta Robots query
Hi guys, I was ranking really well on my home page for certain keywords which has all dropped pretty dramatically over the last 3/4 weeks - I think the issue is since since the configuration of Yoast SEO Wordpress plugin. In March (when my rankings were strong) my crawl test showed the top data in the attached image, and in May (now the rankings have dropped severly) they show the bottom data. I don't fully understand canonical and Meta Robots so I am hoping someone can shed some light on the following points. 1. Will the change result in my loss of rankings.
Moz Pro | | RocketStats
2. How can I put it back to how it was in March? PS. I haven't had any Google penalties. Thanks,
Joshua RfTar0 -
Website blocked by Robots.txt in OSE
When viewing my client's website in OSE under the Top Pages tab, it shows that ALL pages are blocked by Robots.txt. This is extremely concerning because Google Webmaster Tools is showing me that all pages are indexed and OK. No crawl errors, no messages, no nothing. I did a "site:website.com" in Google and all of the pages of the website returned. Any thoughts? Where is OSE picking up this signal? I cannot find a blocked robots tag in the code or anything.
Moz Pro | | ConnellyPartners0 -
Rogerbot not showing in logs
Hi All Rogerbot has recently thrown up 403 errors for all our pages - no changes had been made to the site so I asked our ISP for assistance. They wanted to have a look at what rogerbot was doing and so went to the logs but rogerbot was not listed anywhere in the logs by name - any ideas why? Regards Craig
Moz Pro | | CraigWiltshire0 -
Seomoz bar: No Follow and Robots.txt
Should the Mozbar pickup 'nofollow" links that are handled in robots.txt ? the robots.tx blocks categories, but is still show as a followed (green) link when using the mozbar. Thanks! Holly ETA: I'm assuming that- disallow: myblog.com/category/ - is comparable to the nofollow tag on catagory?
Moz Pro | | squareplug0 -
Blocking all robots except rogerbot
I'm in the process of working with a site under development and wish to run the SEOmoz crawl test before we launch it publicly. Unfortunately rogerbot is reluctant to crawl the site. I've set my robots.txt to disallow all bots besides rogerbot. Currently looks like this: User-agent: * Disallow: / User-agent: rogerbot Disallow: All pages within the site are meta tagged index,follow. Crawl report says: Search Engine blocked by robots.txt Yes Am I missing something here?
Moz Pro | | ignician0 -
To block with robots.txt or canonicalize?
I'm working with an apt community with a large number of communities across the US. I'm running into dup content issues where each community will have a page such as "amenities" or "community-programs", etc that are nearly identical (if not exactly identical) across all communities. I'm wondering if there are any thoughts on the best way to tackle this. The two scenarios I came up with so far are: Is it better for me to select the community page with the most authority and put a canonical on all other community pages pointing to that authoritative page? or Should i just remove the directory all-together via robots.txt to help keep the site lean and keep low quality content from impacting the site from a panda perspective? Is there an alternative I'm missing?
Moz Pro | | JonClark150 -
What's name of SEOmoz and Open Site Explorer robots?!
I would like to exclude in robots.txt SEOmoz and Open Site Explorer bots to don't let them index my sites… what's their names?
Moz Pro | | cezarylech0