Robots.txt gone wild
-
Hi guys, a site we manage, http://hhhhappy.com, received an alert through Webmaster Tools yesterday that it can't be crawled. No changes were made to the site.
I don't know a huge amount about robots.txt configuration, except that Yoast by default sets it to disallow the wp-admin folder and nothing else. I checked this against all our other sites and the settings are the same. And yet, 12 hours after the issue appeared, Happy is still not being crawled and meta data is not showing in search results. Any ideas what may have triggered this?
-
Hi Radi!
Have Matt and/or Martijn answered your question? If so, please mark one or both of their responses "Good Answer."
Otherwise, what's still tripping you up?
-
Have you checked the site's recent downtime? Sometimes Google isn't able to reach your robots.txt file, and when that happens it will stop crawling your site temporarily.
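If it helps, here's a rough sketch (assuming Python 3 is available) of checking what Google gets back when it asks for the file. A 200 or a 404 here is fine; repeated 5xx responses or timeouts are what make Google pause crawling:
<code>
import urllib.error
import urllib.request

ROBOTS_URL = "http://hhhhappy.com/robots.txt"  # the site from the question

req = urllib.request.Request(ROBOTS_URL, headers={"User-Agent": "Googlebot/2.1"})
try:
    with urllib.request.urlopen(req, timeout=10) as resp:
        print("Status:", resp.status)
        print(resp.read().decode("utf-8", errors="replace"))
except urllib.error.HTTPError as err:
    # A 404 is harmless (Google assumes no restrictions); 5xx is the worrying case.
    print("HTTP error:", err.code)
except urllib.error.URLError as err:
    print("Could not reach the file at all:", err.reason)
</code>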
-
Are you getting the message in Search Console that there were errors crawling your page?
This typically means that your host was temporarily down when Google landed on your page. These types of things happen all the time and are no big deal.
Your homepage cache shows a crawl date of today so I'm assuming things are working properly ... if you really want to find out, try doing a "Fetch" of your site in Search Console.
Crawl > Fetch as Google > Fetch (big red button)
You should get a status of "Complete." If you get anything else there should be an error message with it. If so, paste that here.
I have checked the site's headers, cache, and crawlability with Screaming Frog, and everything is fine. This seems like one of those temporary messages, but if the problem persists definitely let us know!
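If you'd like a second opinion outside Screaming Frog and Search Console, Python's standard library can load the live robots.txt and report what Googlebot is allowed to fetch. A quick sanity-check sketch, assuming Python 3:
<code>
from urllib.robotparser import RobotFileParser

parser = RobotFileParser()
parser.set_url("http://hhhhappy.com/robots.txt")
parser.read()  # fetch and parse the live robots.txt

# With the default Yoast rules you'd expect True for the homepage
# and False only for URLs under /wp-admin/.
print(parser.can_fetch("Googlebot", "http://hhhhappy.com/"))
print(parser.can_fetch("Googlebot", "http://hhhhappy.com/wp-admin/options.php"))
</code>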
-
Our host has just offered this response, which does not get me any closer:
Hi Radi,
It looks like your site has its own robots.txt file, which is not blocking any user agents. The only thing it's doing is blocking bots from indexing your admin area:
<code>
User-agent: *
Disallow: /wp-admin/
</code>
This is a standard robots.txt file, and you shouldn't be having any issues with Google indexing your site from a hosting standpoint. To test this, I curled the site as Googlebot and received a 200 OK response:
<code>curl -A "Googlebot/2.1" -IL [hhhhappy.com](http://hhhhappy.com) HTTP/1.1 200 OK Date: Sat, 05 Mar 2016 22:17:26 GMT Content-Type: text/html; charset=UTF-8 Connection: keep-alive Set-Cookie: __cfduid=d3177a1baa04623fb2573870f1d4b4bac1457216246; expires=Sun, 05-Mar-17 22:17:26 GMT; path=/; domain=.[hhhhappy.com](http://hhhhappy.com); HttpOnly X-Cacheable: bot Cache-Control: max-age=10800, must-revalidate X-Cache: HIT: 17 X-Cache-Group: bot X-Pingback: [http://hhhhappy.com/xmlrpc.php](http://hhhhappy.com/xmlrpc.php) Link: <[http://hhhhappy.com/](http://hhhhappy.com/)>; rel=shortlink Expires: Thu, 19 Nov 1981 08:52:00 GMT X-Type: default X-Pass-Why: Set-Cookie: X-Mapping-fjhppofk=2C42B261F74DA203D392B5EC5BF07833; path=/ Server: cloudflare-nginx CF-RAY: 27f0f02445920f09-IAD</code>
I didn't see any plugins on your site that looked like they would overwrite robots.txt, but I urge you to take another look at them, and then dive into your site's settings for the meta value that Googlebot would pick up. Everything on our end seems to be giving the green light.
Please let us know if you have any other questions or issues in the meantime.
Cheers,
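For reference, the "meta value that Googlebot would pick up" can be checked quickly from your own machine as well. A rough sketch (assuming Python 3) that fetches the homepage with a Googlebot user agent and looks in the two places a stray noindex usually hides, the X-Robots-Tag response header and a robots meta tag in the HTML:
<code>
import urllib.request

URL = "http://hhhhappy.com/"  # the site from the question

req = urllib.request.Request(URL, headers={"User-Agent": "Googlebot/2.1"})
with urllib.request.urlopen(req, timeout=10) as resp:
    status = resp.status
    x_robots = resp.headers.get("X-Robots-Tag", "not set")
    html = resp.read().decode("utf-8", errors="replace")

print("Status:", status)
print("X-Robots-Tag:", x_robots)

# Very rough check for a robots meta tag; a real audit would parse the HTML properly.
for line in html.splitlines():
    if 'name="robots"' in line.lower():
        print("Robots meta tag:", line.strip())
</code>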
Related Questions
-
When serving a 410 for page gone, should I serve an error page?
I'm removing a bunch of old and rubbish pages and was going to serve a 410 to tell Google they're gone (my understanding is it'll get them out of the index a bit quicker than a 404). I should still serve an error page though, right? Similar to a 404 page. That doesn't muddy the "gone" message I'm giving Google? There's no need to 410 and die?
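Serving an error page alongside a 410 is the same pattern as a custom 404 page, just with a different status code; the status code is what Google reads, not the page body. A minimal sketch of the idea in Flask (a hypothetical stack, since the question doesn't say what the site actually runs on):
<code>
from flask import Flask

app = Flask(__name__)

# Hypothetical list of removed URLs that should be reported as permanently gone.
GONE_PATHS = {"/old-promo", "/rubbish-page"}

@app.route("/<path:page>")
def serve(page):
    if "/" + page in GONE_PATHS:
        # A human-friendly error page; the 410 status still tells Google the
        # page is gone, so the "gone" message isn't muddied.
        return "<h1>Sorry, this page has been removed for good.</h1>", 410
    return "<h1>Normal content</h1>", 200

if __name__ == "__main__":
    app.run()
</code>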
Intermediate & Advanced SEO | HSDOnline
-
SEO website migration gone wrong - noticed too late?
I have just been contacted by a company whose website has lost nearly all of its traffic. The web developers appeared to know nothing about the SEO aspects of migrating a website (the change took place in the first week of August), and traffic has gone from 7,000 sessions to 200 sessions a month. I can work through the usual SEO migration steps to help recover performance, yet normally I get employed on this kind of project as soon as the traffic loss is noticed. This time the loss kicked in nearly two months ago, so what are the implications of such a time lag for SEO recovery?
Intermediate & Advanced SEO | McTaggart
-
Block subdomain directory in robots.txt
Instead of blocking an entire sub-domain (fr.sitegeek.com) with robots.txt, we would like to block one directory (fr.sitegeek.com/blog).
'fr.sitegeek.com/blog' and 'www.sitegeek.com/blog' contain the same articles in one language; only the labels are changed for the 'fr' version, and we assume the duplicate content causes an SEO problem. We would like the 'www.sitegeek.com/blog' articles to be crawled and indexed, but not 'fr.sitegeek.com/blog'. So how can we block a single sub-domain directory (fr.sitegeek.com/blog) with robots.txt? This applies only to the blog directory of the 'fr' version; all other directories and pages of the 'fr' version should still be crawled and indexed. Thanks,
Rajiv
Intermediate & Advanced SEO | gamesecure
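For what it's worth, robots.txt is read per hostname, so a file served at fr.sitegeek.com/robots.txt only governs that subdomain and leaves www.sitegeek.com untouched. A small sketch of how a one-directory block on the 'fr' host would behave, using hypothetical rules and Python's standard robotparser:
<code>
from urllib.robotparser import RobotFileParser

# Hypothetical content that would be served at fr.sitegeek.com/robots.txt only;
# www.sitegeek.com keeps its own, separate robots.txt.
fr_rules = [
    "User-agent: *",
    "Disallow: /blog/",
]

fr_parser = RobotFileParser()
fr_parser.parse(fr_rules)

# Blog URLs on the fr host are blocked; everything else stays crawlable.
print(fr_parser.can_fetch("Googlebot", "http://fr.sitegeek.com/blog/some-article/"))  # False
print(fr_parser.can_fetch("Googlebot", "http://fr.sitegeek.com/pricing/"))            # True
</code>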
-
Robots.txt help
Hi Moz Community, Google is indexing some developer pages from a previous website where I currently work: ddcblog.dev.examplewebsite.com/categories/sub-categories. I was wondering how I include these in a robots.txt file so they no longer appear on Google. Can I do it under our homepage GWT account, or do I have to have a separate account set up for these URL types? As always, your expertise is greatly appreciated, -Reed
Intermediate & Advanced SEO | IceIcebaby
-
Robots.txt Blocked Most Site URLs Because of Canonical
Had a bit of a "Gotcha" in Magento. We had Yoast Canonical Links extension which worked well , but then we installed Mageworx SEO Suite.. which broke Canonical Links. Unfortunately it started putting www.mysite.com/catalog/product/view/id/516/ as the Canonical Link - and all URLs with /catalog/productview/* is blocked in Robots.txt So unfortunately We told Google that the correct page is also a blocked page. they haven't been removed as far as I can see but traffic has certainly dropped. We have also , at the same time had some Site changes grouping some pages & having 301 redirects. Resubmitted site map & did a fetch as google. Any other ideas? And Idea how long it will take to become unblocked?
Intermediate & Advanced SEO | s_EOgi_Bear
-
Robots.txt
What would be a perfect robots.txt file? My site is propdental.es. Can I just place:
User-agent: *
Or should I write something more?
Intermediate & Advanced SEO | maestrosonrisas
-
Robot.txt help
Hi, we have a blog that is killing our SEO. We need to disallow the following paths:
Disallow: /Blog/?tag*
Disallow: /Blog/?page*
Disallow: /Blog/category/*
Disallow: /Blog/author/*
Disallow: /Blog/archive/*
Disallow: /Blog/Account/.
Disallow: /Blog/search*
Disallow: /Blog/search.aspx
Disallow: /Blog/error404.aspx
Disallow: /Blog/archive*
Disallow: /Blog/archive.aspx
Disallow: /Blog/sitemap.axd
Disallow: /Blog/post.aspx
But we want to allow everything below /Blog/Post. The disallow list seems to keep growing as we find issues, so rather than adding every problem area to our robots.txt, is there a way to simply say Allow /Blog/Post and ignore the rest? How do we do that in robots.txt? Thanks
Intermediate & Advanced SEO | Studio33
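One way to express that idea is a broad disallow with a narrower allow: robots.txt supports Allow lines, and Google gives the more specific (longer) path precedence. A rough sketch with hypothetical rules, checked with Python's standard robotparser (which, note, does not understand Google-style wildcards, so plain path prefixes are used here):
<code>
from urllib.robotparser import RobotFileParser

# Hypothetical rules: block the whole blog, then carve /Blog/post back out.
rules = [
    "User-agent: *",
    "Allow: /Blog/post",
    "Disallow: /Blog/",
]

parser = RobotFileParser()
parser.parse(rules)

print(parser.can_fetch("Googlebot", "http://example.com/Blog/post.aspx?id=42"))  # True
print(parser.can_fetch("Googlebot", "http://example.com/Blog/category/news/"))   # False
</code>
-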
Block all but one URL in a directory using robots.txt?
Is it possible to block all but one URL with robots.txt? For example, with domain.com/subfolder/example.html: if we block the /subfolder/ directory, we want all URLs except the exact-match URL domain.com/subfolder to be blocked.
Intermediate & Advanced SEO | nicole.healthline