Robots.txt error
-
Moz Crawler is not able to access the robots.txt due to server error. Please advice on how to tackle the server error.
-
Hello Shanidel,
Jo from the Moz help team here.
I've had a look at your site and I've not been able to access your robot.txt file, this is what I'm seeing in the browser
https://screencast.com/t/JjQI1WTH3ni
I'm also seeing this error when I check your robots.txt file through a third party tool
https://screencast.com/t/pxsP9pL5
So it looks to me like may be some intermittent issues with your robots.txt file. I would advise reaching out to your web developer to see if they can check your robots.txt file and make sure it's accessible.
If you're still having trouble please let us know at help@moz.com
Best of luck!
Jo
-
Hi,
I'm still having this problem. Moz is unable to crawl the site saying there is a problem with the robots.txt file.
Sorry.
-
happy to been useful
-
Below is the exact message that i received:
**Moz was unable to crawl your site on Aug 29, 2017. **Our crawler was not able to access the robots.txt file on your site. This often occurs because of a server error from the robots.txt. Although this may have been caused by a temporary outage, we recommend making sure your robots.txt file is accessible and that your network and server are working correctly. Typically errors like this should be investigated and fixed by the site webmaster.
-
yoursite.com/robot.txt -----> this is how your robot.txt file should be, so first I will recommend you test your robot.txt file to see if everything is ok, if dont there is an explanation about how to create a robot.txt
How to create a /robots.txt file
Where to put it
The short answer: in the top-level directory of your web server.
The longer answer:
When a robot looks for the "/robots.txt" file for URL, it strips the path component from the URL (everything from the first single slash), and puts "/robots.txt" in its place.
For example, for "http://www.example.com/shop/index.html, it will remove the "/shop/index.html", and replace it with "/robots.txt", and will end up with "http://www.example.com/robots.txt".
So, as a web site owner you need to put it in the right place on your web server for that resulting URL to work. Usually that is the same place where you put your web site's main "index.html" welcome page. Where exactly that is, and how to put the file there, depends on your web server software.
Remember to use all lower case for the filename: "robots.txt", not "Robots.TXT.
See also:
-
Hi,
Can you please share the message you're receiving ? Also, did you check your Google Search Console to see if Google can access to your website ? Knowing the type of errors is the key to advice you.
Got a burning SEO question?
Subscribe to Moz Pro to gain full access to Q&A, answer questions, and ask your own.
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
Adding your sitemap to robots.txt
Hi everyone, Best practice question: When adding your sitemap to your robots.txt file, do you add the whole sitemap at once or do you add different subcategories (products, posts, categories,..) separately? I'm very curious to hear your thoughts!
Technical SEO | | WeAreDigital_BE0 -
HREFLANG No Return Tag Error
Keep getting WMT no return tag error. Also got an email today on this issue. Here are a couple pages showing up in the error report: Originating URL: /hawaii/kauai-real-estate/ Alternate URL: /jp/hawaii/kauai-real-estate/ Here are the hreflang tags for each page: /hawaii/kauai-real-estate/ /jp/hawaii/kauai-real-estate/ The only thing I can see is the hreflang= is at the end of the snippet but doesn't seem like that would matter. Any thoughts would be greatly appreciated. Thank you.
Technical SEO | | SoulSurfer80 -
How i can remove 404 redirect error (Wordpress)
Hello ,
Technical SEO | | mayankebabu
I am getting 404 error in some pages of my wordpress site http://engineerbabu.com/ .
Those pages are permanently removed. Is there any plugin to fix this prob or anyway so that google will not crawl these pages.0 -
"Extremely high number of URLs" warning for robots.txt blocked pages
I have a section of my site that is exclusively for tracking redirects for paid ads. All URLs under this path do a 302 redirect through our ad tracking system: http://www.mysite.com/trackingredirect/blue-widgets?ad_id=1234567 --302--> http://www.mysite.com/blue-widgets This path of the site is blocked by our robots.txt, and none of the pages show up for a site: search. User-agent: * Disallow: /trackingredirect However, I keep receiving messages in Google Webmaster Tools about an "extremely high number of URLs", and the URLs listed are in my redirect directory, which is ostensibly not indexed. If not by robots.txt, how can I keep Googlebot from wasting crawl time on these millions of /trackingredirect/ links?
Technical SEO | | EhrenReilly0 -
When Should I Ignore the Error Crawl Report
I have a handful of pages listed in the Error Crawl Report, but the report isn't actually showing anything wrong with these pages. I am double checking the code on the site and also can't find anything. Should I just move on and ignore the Error Crawl Report for these few pages?
Technical SEO | | ChristinaRadisic0 -
Increase in authorization permission errors error after site switch
We launched our new site 2 days ago , since site was down for 12 hours for maintenance, we saw google webmaster tool shows this error . Since then google hasnt crawled, its been 36 hours. Do we need to do anyting? We have close to a million page google crawled before and I am wondering if this will effect anything.
Technical SEO | | tpt.com0 -
How does robots.txt affect aliased domains?
Several of my sites are aliased (hosted in subdirectories off the root domain on a single hosting account, but visible at www.theSubDirectorySite.com) Not ideal, I know, but that's a different issue. I want to block bots from viewing those files that are accessible in subdirectories on the main hosting account, www.RootDomain.com/SubDirectorySite/, and force the bots to look at www.SubDirectorySite.com instead. I utilized the canonical meta tag to point bots away from the sub directory site, but I am wondering what will happen if I use robots.txt to block those files from within the root domain. Will the bots, specifically Google bot, still index the site at its own URL, www.AnotherSite.com even if I've blocked that directory with Disallow: /AnotherSite/ ? THANK YOU!!!
Technical SEO | | michaelj_me0 -
Robots.txt blocking site or not?
Here is the robots.txt from a client site. Am I reading this right --
Technical SEO | | 540SEO
that the robots.txt is saying to ignore the entire site, but the
#'s are saying to ignore the robots.txt command? See http://www.robotstxt.org/wc/norobots.html for documentation on how to use the robots.txt file To ban all spiders from the entire site uncomment the next two lines: User-Agent: * Disallow: /0