Website cannot be crawled
-
I have received the following message from Moz on a few of our websites now:
Our crawler was not able to access the robots.txt file on your site. This often occurs because of a server error from the robots.txt. Although this may have been caused by a temporary outage, we recommend making sure your robots.txt file is accessible and that your network and server are working correctly. Typically errors like this should be investigated and fixed by the site webmaster.
I have spoken with our webmaster and they have advised the below:
The robots.txt file is definitely there on all of the sites, and Google is able to crawl these files. Moz, however, is having some difficulty finding the files when a particular redirect is in place.
For example, threecounties.co.uk/ currently redirects to https://www.threecounties.co.uk/, and when this happens the Moz crawler cannot find the robots.txt on the first URL, which generates the reports you have been receiving. From what I understand, this is a flaw in the Moz software and not something we can fix from our end.
_Going forward, one option would be to remove these rewrite rules to www., but these are useful redirects and removing them would likely have SEO implications._
Has anyone else had this issue, and is there anything we can do to rectify it, or should we leave it as is?
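A crawler that starts from the bare domain looks for robots.txt at the root of that exact host, so its first request goes to the bare-domain robots.txt and only reaches the www file if it follows the redirect. The sketch below, assuming Python 3 and only the standard library, derives the robots.txt URL for any page and records each redirect hop one request at a time. `robots_url`, `get_no_follow` and `trace_redirects` are hypothetical helper names, and the exact status codes your server returns are not confirmed anywhere in this thread.

```python
import http.client
from typing import Optional, Tuple
from urllib.parse import urljoin, urlsplit

REDIRECT_CODES = {301, 302, 303, 307, 308}


def robots_url(page_url: str) -> str:
    """robots.txt is looked up at the root of the exact scheme + host of a URL."""
    parts = urlsplit(page_url)
    return f"{parts.scheme}://{parts.netloc}/robots.txt"


def get_no_follow(url: str, timeout: float = 10.0) -> Tuple[int, Optional[str]]:
    """Send one GET without following redirects; return (status, Location header)."""
    parts = urlsplit(url)
    conn_cls = (http.client.HTTPSConnection if parts.scheme == "https"
                else http.client.HTTPConnection)
    conn = conn_cls(parts.netloc, timeout=timeout)
    path = parts.path or "/"
    if parts.query:
        path += "?" + parts.query
    try:
        conn.request("GET", path)
        resp = conn.getresponse()
        return resp.status, resp.getheader("Location")
    finally:
        conn.close()


def trace_redirects(url: str, fetch=get_no_follow, max_hops: int = 5):
    """Return (url, status) for each request made, stopping at a non-redirect."""
    hops = []
    for _ in range(max_hops):
        status, location = fetch(url)
        hops.append((url, status))
        if status not in REDIRECT_CODES or not location:
            break
        url = urljoin(url, location)  # Location may be relative
    return hops
```

For example, `trace_redirects(robots_url("http://threecounties.co.uk/"))` (this one needs network access) lists every hop; a single 301 to the www robots.txt followed by a 200 would suggest the redirect itself is working, and that any remaining problem lies in how a particular crawler handles it.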
-
OK, I ran a quick test of your robots.txt file and it looks fine:
https://www.threecounties.co.uk/robots.txt
Then I used https://httpstatus.io/ to check the status code of your robots.txt file, and it returned a 200 status code (so that's fine). Also, you need to make sure that your robots.txt file is accessible to Rogerbot (the Moz crawler).
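Whether the robots.txt rules allow Rogerbot can be checked offline with Python's standard `urllib.robotparser`. The file content below is a made-up example, not the real threecounties.co.uk file; paste in your own file's text to test it. `allowed` is a hypothetical helper name. This only evaluates the robots.txt rules and says nothing about firewall or hosting-level blocking.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content for illustration only (NOT the real file).
SAMPLE_ROBOTS = """\
User-agent: *
Disallow: /admin/

User-agent: rogerbot
Disallow: /
"""


def allowed(robots_text: str, user_agent: str, url: str) -> bool:
    """Return True if robots_text permits user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_text.splitlines())  # parse() also marks the rules as loaded
    return parser.can_fetch(user_agent, url)
```

With this sample, `allowed(SAMPLE_ROBOTS, "rogerbot", "https://www.example.com/page")` is `False` because of the rogerbot-specific `Disallow: /`, while Googlebot falls back to the `*` group and is only kept out of `/admin/`.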
These days, hosting providers have become very strict with third-party crawlers. This includes Moz, Majestic SEO, Semrush and Ahrefs. Here you can find all the possible sources of the problem and the recommended solutions:
https://moz.com/help/guides/moz-pro-overview/site-crawl/unable-to-crawl
Regards
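Because hosting providers may block third-party crawlers at the server level, one quick and incomplete check is to request robots.txt with different User-Agent headers and compare the status codes. A minimal sketch, assuming Python 3's `urllib.request`; the helper names are hypothetical, the agent strings you pass in are placeholders rather than the exact string Rogerbot sends (check Moz's help pages for that), and blocks based on IP address rather than User-Agent will not show up this way.

```python
import urllib.error
import urllib.request


def fetch_status(url: str, user_agent: str, timeout: float = 10.0) -> int:
    """GET url with the given User-Agent and return the final HTTP status code.

    urlopen follows redirects itself, so this is the status after any redirects.
    Network failures (DNS, timeouts) raise URLError and are not handled here.
    """
    req = urllib.request.Request(url, headers={"User-Agent": user_agent})
    try:
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        return err.code  # 4xx/5xx responses arrive as HTTPError


def status_by_agent(url, agents, fetch=fetch_status):
    """Map each User-Agent string to the status code it received."""
    return {agent: fetch(url, agent) for agent in agents}


def agents_treated_differently(statuses, baseline):
    """List agents whose status code differs from the baseline agent's."""
    base = statuses[baseline]
    return [agent for agent, code in statuses.items()
            if agent != baseline and code != base]
```

For example, if `status_by_agent("https://www.threecounties.co.uk/robots.txt", ["Mozilla/5.0", "rogerbot"])` returned 200 for the browser-like agent and 403 for the other, that would point at User-Agent filtering worth raising with the host.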
Related Questions
-
What to do with a site of >50,000 pages vs. crawl limit?
What happens if you have a site in your Moz Pro campaign that has more than 50,000 pages? Would it be better to choose a sub-folder of the site to get a thorough look at that sub-folder?
I have a few different large government websites that I'm tracking to see how they are faring in rankings and SEO. They are not my own websites. I want to see how these agencies are doing compared to what the public searches for on technical topics and social issues that the agencies manage. I'm an academic looking at science communication.
I am in the process of setting up my campaigns again to get better data than I have been getting. I am a newbie to SEO, and the campaigns I slapped together a few months ago need to be set up better: all on the same day, making sure I've set whether to include www or not for what ranks, refining my keywords, etc. I am stumped on what to do about the agency websites being really huge, and what all the options are to get good data in light of the 50,000-page crawl limit.
Here is an example of what I mean: to see how EPA is doing in searches related to air quality, ideally I'd track all of EPA's web presence. www.epa.gov has 560,000 pages. If I put in www.epa.gov for a campaign, what happens with the site having so many more pages than the 50,000 crawl limit? What do I miss out on? Can I "trust" what I get?
www.epa.gov/air has only 1,450 pages, so if I choose this for what I track in a campaign, the crawl will cover that sub-folder completely, and I am getting a complete picture of this air-focused sub-folder... but (1) I'll miss out on air-related pages in other sub-folders of www.epa.gov, and (2) it seems like I have so much of the 50,000-page crawl limit that I'm not using and could be using. (However, maybe that's not quite true: I'd also be tracking other sites as competitors, e.g. non-profits that advocate on air quality and industry air quality sites, and maybe those competitors count towards the 50,000-page crawl limit and would get me up to the limit? How do the competitors you choose figure into the crawl limit?)
Any opinions on what I should do in general in this kind of situation? The small sub-folder vs. the full humongous site, or is there some other way to go here that I'm not thinking of?
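To reason about what a page cap does, here is a toy breadth-first crawl with a hard page limit and an optional URL-prefix scope. This is a generic sketch under stated assumptions, not Moz's actual crawl order or its rules for competitors; `capped_crawl`, the link graph and the paths are made up. It illustrates the two trade-offs in the question: with a cap, pages the crawler reaches late are never seen, and with a prefix such as /air/, air-related pages living in other folders are skipped.

```python
from collections import deque


def capped_crawl(start, links, limit, prefix=""):
    """Breadth-first crawl from start, visiting at most `limit` pages.

    links maps each page to the pages it links to; only pages whose path
    starts with `prefix` are queued (the start page is always visited).
    Returns pages in the order they were visited.
    """
    seen = {start}
    visited = []
    queue = deque([start])
    while queue and len(visited) < limit:
        page = queue.popleft()
        visited.append(page)
        for nxt in links.get(page, []):
            if nxt not in seen and nxt.startswith(prefix):
                seen.add(nxt)
                queue.append(nxt)
    return visited
```

With `links = {"/": ["/air/", "/water/"], "/air/": ["/air/a", "/air/b"], "/water/": ["/water/a", "/air/c"]}`, a limit of 3 from `/` stops at `["/", "/air/", "/water/"]`, and a crawl scoped to `/air/` visits `/air/`, `/air/a` and `/air/b` but never finds `/air/c`, which is only linked from the water section.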
Moz Pro | | scienceisrad0 -
Possible Crawling Problem with Screaming Frog and Moz Crawlers
So I'm not sure whether what I'm seeing is a problem or not. As of about two weeks ago, the Moz crawler has only been able to see www.mysite.com, and none of the links, content, title, etc. associated with the page. Essentially the report has one line, for what should be the homepage; it's not able to pull any information from the page, but it does show a 200 HTTP status code. The report shows nothing blocked by robots and no errors. When I use Screaming Frog to crawl the site, about 75% of the time it reports just one line, www.mysite.com, with a 200 status code, but again the crawler is not able to actually see the HTML. The other 25% of the time it works perfectly fine: it crawls all pages and sees all meta info and content. There are no errors in Google WMT, and everything looks OK there. We have seen a traffic drop over the last two weeks, but I don't know if this is the reason for it. I can't publicly post the page, but if someone has an idea of what might be going on, I'd be happy to PM them. Thanks
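One way to quantify the "200 status but no HTML" symptom is to save several raw responses for the homepage and check each for a title and links. A minimal sketch using Python's standard `html.parser`; `PageSummary`, `summarize` and `empty_share` are hypothetical names, and collecting the response bodies (for example by repeating the request with the User-Agent the crawler uses) is left out. It only measures how often pages come back empty; it does not diagnose why.

```python
from html.parser import HTMLParser


class PageSummary(HTMLParser):
    """Collect the <title> text and count <a> tags that carry an href."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.links = 0
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._in_title = True
        elif tag == "a" and any(name == "href" for name, _ in attrs):
            self.links += 1

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


def summarize(html: str):
    """Return (title, number of links) for one response body."""
    parser = PageSummary()
    parser.feed(html)
    parser.close()
    return parser.title.strip(), parser.links


def empty_share(bodies):
    """Fraction of response bodies with no title and no links."""
    if not bodies:
        raise ValueError("need at least one response body")
    empties = sum(1 for body in bodies if summarize(body) == ("", 0))
    return empties / len(bodies)
```

A result near 0.75 over many saved responses would match the Screaming Frog pattern described above and would be worth showing to the host along with the response headers.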
Moz Pro | | CJ50 -
Crawl Report Redirect Notice?
Just trying to understand whether this is bad or not. The crawl report has picked up that my website is redirecting (301) from http://mysite.com to http://www.mysite.com, listed under Crawl Notices (blue section). Is this the wrong way to do it, given that we wanted the www version of the domain? Is that why SEOMoz has flagged it?
Moz Pro | | Ubique0 -
Crawl Diagnosis only crawling 250 pages, not 10,000
My crawl diagnosis has suddenly dropped from 10,000 pages to just 250. I've been tracking and working on an ecommerce website with 102,000 pages (www.heatingreplacementparts.co.uk) and the history for this was showing some great improvements. Suddenly the CD report today is showing only 250 pages! What has happened? Not only is this frustrating to work with as I was chipping away at the errors and warnings, but also my graphs for reporting to my client are now all screwed up. I have a pro plan and nothing has (or should have!) changed.
Moz Pro | | eseyo0 -
Crawl Test has taken over 5 days and still has yet to complete
I am running some crawls on some sites and I have a number still pending. I have one from 7 days ago, a couple from 6 days ago, and 1 from 5 days ago. The confusing thing is that I have run a few others in that same period that have finished already. Do I need to restart the crawls or cancel them and start over?
Moz Pro | | DRSearchEngOpt0 -
Recent SEOMoz Crawl = Strange Results
Did anyone else get some really strange results in their weekly crawls this week with the campaign tool? Either my ranks skyrocketed across three different sites or the tool is busted. It went from something like 4 pages ranking in the top 30 to 15-16 pages ranking in the top 30. I'd love to find out it is just all the hard work paying off, but I am worried it is the latter. Regards - Kyle
Moz Pro | | kchandler0 -
Why is the SEOmoz crawler crawling the old version of our website?
Hello, I'm a new SEOmoz member. On Dec. 2nd, after completely redesigning our website, we migrated to a new hosting company by switching our DNS to the new server. The vast majority of the URLs have changed, and we configured redirects from the old URLs to the new ones, although this task is not yet complete. After the migration, I created an SEOmoz account to track our progress and find the issues we need to fix to optimize our SEO. For some reason, it is the old URLs that show up in the SEOmoz reports. Unless the crawler does not actually crawl the pages and only uses the indexed pages to generate its report, I don't understand how this is possible. Does anyone have a clue? When will the new URLs be indexed by SEOmoz and the major search engines? Thanks for your help!
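For a migration where the redirects are still being configured, a rough check of the redirect mapping itself can catch gaps before waiting on recrawls. The sketch below assumes the old URLs and the configured redirects are available as a Python list and dict (a hypothetical export, not an SEOmoz feature; `audit_redirects` is an illustrative name) and flags old URLs with no redirect yet, redirect chains of two or more hops, and loops.

```python
def audit_redirects(old_urls, redirect_map, max_hops=10):
    """Classify old URLs as missing a redirect, redirect chains, or loops."""
    report = {"missing": [], "chains": [], "loops": []}
    for url in old_urls:
        if url not in redirect_map:
            report["missing"].append(url)  # no redirect configured yet
            continue
        current, hops, seen = url, 0, {url}
        while current in redirect_map:
            current = redirect_map[current]
            hops += 1
            if current in seen or hops > max_hops:
                report["loops"].append(url)  # loop, or chain longer than max_hops
                break
            seen.add(current)
        else:
            if hops > 1:
                report["chains"].append(url)  # reaches a final URL via 2+ hops
    return report
```

For example, with a map where `/b` goes to `/tmp-b` and then `/new-b`, and `/d` and `/e` point at each other, `/b` is reported as a chain and `/d` as a loop; chains are typically flattened so each old URL points straight at its final new URL.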
Moz Pro | | Gestisoft-Qc0 -
Why did last night's crawl not show the same results I see in Google?
Last night my keywords were crawled, and the report shows that one keyword is ranked 14. For 3 days now it has been ranked 4 or 5. Is there a reason this is not accurate? I have not checked the rest of my keywords, so I am not sure about those. Thanks
Moz Pro | | tom14cat140