Our crawler was not able to access the robots.txt file on your site
-
Hello Mozzers!
I've received an error message saying the site can't be crawled because Moz is unable to access the robots.txt. I've spoken to the webmaster and he can't understand why the robot.txt can't be accessed in Moz.
https://www.thefurnshop.co.uk/robots.txt
and Google isn't flagging anything up to us.
Does anyone know how to solve this problem?
Thanks
-
@LoganRay This was our issue. Didn't know Moz tries to retrieve the HTTP robots.txt first. Our HTTPS redirect was not working on static files only, so the HTTP path to the robots.txt was failing. We did not notice it because the HSTS policy was forcing the browser to redirect.
-
Wanted to jump back in on this topic as I've just confirmed my initial suspicion.
I just added a new client to our Moz account and had the exact same issue, crawler unable to access the robots.txt file. It's a secure site and was configured in Moz without the HTTPS. When I go to the robots.txt file without https://www, it redirects to the same thing as yours where the / between the TLD and page path gets removed.
Reconfigure your site and it should begin to work.
-
There are 2 parts of your robots.txt that could be causing this, and it all just depends on how each bot is reading regular expressions in your robots.txt:
First, your Disallow: /? can be read as Disallow all paths starting with "/" with 0 to infinity characters "" and one character "?". Try replacing this part with Disallow: /*? to make it not crawl anything with a query string (which is what I believe you were going for).
Second, you have a open Disallow followed by the User-agent: rogerbot and while this should not be read this way, once again it all depends on how each bot reads the commands. To fix this you should change your Disallow following your Googlebot-Image as Disallow: /
-
Hi there,
There's something odd going on when I try to access your robots.txt file without the www. The www gets added back on, but when it does, the slash between the TLD and page path gets deleted, see below. I'm guessing your domain in Moz is configured without the www, which means RogerBot is getting redirected to this slash-less version of the file.
Got a burning SEO question?
Subscribe to Moz Pro to gain full access to Q&A, answer questions, and ask your own.
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
Moz not able to crawl our site - any advice?
When I try and crawl our site through Moz it gives this message: Moz was unable to crawl your site on Aug 7, 2019. Our crawler was banned by a page on your site, either through your robots.txt, the X-Robots-Tag HTTP header, or the meta robots tag. Update these tags to allow your page and the rest of your site to be crawled. If this error is found on any page on your site, it prevents our crawler (and some search engines) from crawling the rest of your site. Typically errors like this should be investigated and fixed by the site webmaster. I have been through all the help and doesn't seem to be any issues. You can check the site and robots.txt here: https://myfamilyclub.co.uk/robots.txt. Anyone got any advice on where I could go to get this sorted?
Getting Started | | MyFamilClubLtd1 -
Standard Syntax in robots.txt doesn't prevent Moz bot from crawling
A client is getting many false positive site crawl errors for things like duplicate titles and duplicate content on pages that include /tag/ in the URL. An example is https://needquest.com/place_tag/autism-spectrum-disorder/page/4/ To resolve this we have set up a disallow statement in the robots.txt file that says
Getting Started | | btreloar
Disallow: /page/ For some reason this appears not to work, as the site crawl errors continue to list pages like this. Does anyone understand why that would be and what we need to do to properly disallow crawling these pages?0 -
Can't track my site, keep getting "Ooops. Our crawlers are unable to access that URL"
Hello, So i keep getting this message and I went to hurl.it and I get 200 response. But it appears its not my actual homepage bc it says the body is empty and in the title it says "COMING SOON" which is not what my actual homepage says. Does anyone know what this means?? Thank you in advance! Rena
Getting Started | | Palila-Studio0 -
Open site explorer
Excuse my ignorance but I am very new to all this. I have a new site www.sassandgrace.co.uk and am using MOZ to try and get it right first time. On OSE I am seeing different results for http://www.sassandgrace.co.uk/ and http://www.sassandgrace.co.uk i.e with an without / ) and different results again for http://sassandgrace.co.uk These differences relate to the social page metrics but I would be keen to get one consistent reading. What do I need to do? Also, I know that I have links to the site but none are showing at all. On google console there are a couple showing but I know that there are many more. Is it just a matter of time before they are 'found' or is there something that I should be doing? Sorry for what are probably very basic questions but any help is appreciated.
Getting Started | | Sassandgrace0 -
How to authenticate Moz crawler so that others don't use Rogerbot useragent to scrape data from our site?
Is there any way to authenticate genuine Moz crawler. Because, our website keeps getting scrapping attacks and if there is no way to authenticate Moz crawler, then, any scraper can just set user agent as Rogerbot and scrape all our pages. Is there a fixed IP that can be used or any other customization that will help us authenticate and allow only Moz crawler to crawl our site. Looking forward to a solution to this problem. We haven't been able to use Moz crawler due to this issue.
Getting Started | | longclimber0 -
Can not create new campain with my site: edunet.com.vn
I can not create new campain because it always warning my site is not a right URL. I don't understand, please tell what should I do. My site is: edunet.com.vn. (When I try to use "Grade a page for keyword" for URL: edunet.com.vn or edunet.com.vn/thong-tin-tuyen-sinh, it returns "Sorry, but that URL is inaccessible".) Thank you so much. Minh Tam.
Getting Started | | toosol0 -
What is the "other" group in traffic to your site on the moz dashboard ?
What is the "other" group in traffic to your site on the moz analytics dashboard ? "Not provided" is in search - organic search history so it's not that.
Getting Started | | Crocodesign0