Robots.txt gone wild
-
Hi guys, a site we manage, http://hhhhappy.com received an alert through web master tools yesterday that it can't be crawled. No changes were made to the site.
Don't know a huge amount about the robots.txt configuration expect that using Yoast by default it sets it not to crawl wp admin folder and nothing else. I checked this against all other sites and the settings are the same. And yet 12 hours later after the issue Happy is still not being crawled and meta data is not showing in search results. Any ideas what may have triggered this?
-
Hi Radi!
Have Matt and/or Martijn answered your question? If so, please mark one or both of their responses "Good Answer."
Otherwise, what's still tripping you up?
-
Have you checked the downtime of the site recently? Sometimes it could be that Google isn't able to reach your robots.txt file and because of that they'll stop crawling your site temporarily.
-
Are you getting the message in Search Console that there were errors crawling your page?
This typically means that your host was temporarily down when Google landed on your page. These types of things happen all the time and are no big deal.
Your homepage cache shows a crawl date of today so I'm assuming things are working properly ... if you really want to find out, try doing a "Fetch" of your site in Search Console.
Crawl > Fetch as Google > Fetch (big red button)
You should get a status of "Complete." If you get anything else there should be an error message with it. If so, paste that here.
I have checked the site headers, cache, crawlability with Screaming Frog, and everything is fine. This seems like one of those temporary messages but if the problem persists definitely let us know!
-
Our host has just offered this response which does not get me any closer:
Hi Radi,
It looks like your site has its own robots.txt file, which is not blocking any user agents. The only thing it's doing is blocking bots from indexing your admin area:
<code>User-agent: * Disallow: /wp-admin/</code>
This is a standard robots.txt file, and you shouldn't be having any issues with Google indexing your site from a hosting standpoint. To test this, I curled the site as Googlebot and received a 200OK response:
<code>curl -A "Googlebot/2.1" -IL [hhhhappy.com](http://hhhhappy.com) HTTP/1.1 200 OK Date: Sat, 05 Mar 2016 22:17:26 GMT Content-Type: text/html; charset=UTF-8 Connection: keep-alive Set-Cookie: __cfduid=d3177a1baa04623fb2573870f1d4b4bac1457216246; expires=Sun, 05-Mar-17 22:17:26 GMT; path=/; domain=.[hhhhappy.com](http://hhhhappy.com); HttpOnly X-Cacheable: bot Cache-Control: max-age=10800, must-revalidate X-Cache: HIT: 17 X-Cache-Group: bot X-Pingback: [http://hhhhappy.com/xmlrpc.php](http://hhhhappy.com/xmlrpc.php) Link: <[http://hhhhappy.com/](http://hhhhappy.com/)>; rel=shortlink Expires: Thu, 19 Nov 1981 08:52:00 GMT X-Type: default X-Pass-Why: Set-Cookie: X-Mapping-fjhppofk=2C42B261F74DA203D392B5EC5BF07833; path=/ Server: cloudflare-nginx CF-RAY: 27f0f02445920f09-IAD</code>
I didn't see any plugins on your site that looked like they would overwrite robots.txt, but I urge you to take another look at them, and then dive into your site's settings for the meta value that Googlebot would pick up. Everything on our end seems to be giving the green light.
Please let us know if you have any other questions or issues in the meantime.
Cheers,
Got a burning SEO question?
Subscribe to Moz Pro to gain full access to Q&A, answer questions, and ask your own.
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
Search Results Pages Blocked in Robots.txt?
Hi I am reviewing our robots.txt file. I wondered if search results pages should be blocked from crawling? We currently have this in the file /searchterm* Is it a good thing for SEO?
Intermediate & Advanced SEO | | BeckyKey0 -
Will disallowing URL's in the robots.txt file stop those URL's being indexed by Google
I found a lot of duplicate title tags showing in Google Webmaster Tools. When I visited the URL's that these duplicates belonged to, I found that they were just images from a gallery that we didn't particularly want Google to index. There is no benefit to the end user in these image pages being indexed in Google. Our developer has told us that these urls are created by a module and are not "real" pages in the CMS. They would like to add the following to our robots.txt file Disallow: /catalog/product/gallery/ QUESTION: If the these pages are already indexed by Google, will this adjustment to the robots.txt file help to remove the pages from the index? We don't want these pages to be found.
Intermediate & Advanced SEO | | andyheath0 -
Robots txt is case senstive? Pls suggest
Hi i have seen few urls in the html improvements duplicate titles Can i disable one of the below url in the robots.txt? /store/Solar-Home-UPS-1KV-System/75652
Intermediate & Advanced SEO | | Rahim119
/store/solar-home-ups-1kv-system/75652 if i disable this Disallow: /store/Solar-Home-UPS-1KV-System/75652 will the Search engines scan this /store/solar-home-ups-1kv-system/75652 im little confused with case senstive.. Pls suggest go ahead or not in the robots.txt0 -
Robots.txt - Googlebot - Allow... what's it for?
Hello - I just came across this in robots.txt for the first time, and was wondering why it is used? Why would you have to proactively tell Googlebot to crawl JS/CSS and why would you want it to? Any help would be much appreciated - thanks, Luke User-Agent: Googlebot Allow: /.js Allow: /.css
Intermediate & Advanced SEO | | McTaggart0 -
Google Indexing Duplicate URLs : Ignoring Robots & Canonical Tags
Hi Moz Community, We have the following robots command that should prevent URLs with tracking parameters being indexed. Disallow: /*? We have noticed google has started indexing pages that are using tracking parameters. Example below. http://www.oakfurnitureland.co.uk/furniture/original-rustic-solid-oak-4-drawer-storage-coffee-table/1149.html http://www.oakfurnitureland.co.uk/furniture/original-rustic-solid-oak-4-drawer-storage-coffee-table/1149.html?ec=affee77a60fe4867 These pages are identified as duplicate content yet have the correct canonical tags: https://www.google.co.uk/search?num=100&site=&source=hp&q=site%3Ahttp%3A%2F%2Fwww.oakfurnitureland.co.uk%2Ffurniture%2Foriginal-rustic-solid-oak-4-drawer-storage-coffee-table%2F1149.html&oq=site%3Ahttp%3A%2F%2Fwww.oakfurnitureland.co.uk%2Ffurniture%2Foriginal-rustic-solid-oak-4-drawer-storage-coffee-table%2F1149.html&gs_l=hp.3..0i10j0l9.4201.5461.0.5879.8.8.0.0.0.0.82.376.7.7.0....0...1c.1.58.hp..3.5.268.0.JTW91YEkjh4 With various affiliate feeds available for our site, we effectively have duplicate versions of every page due to the tracking query that Google seems to be willing to index, ignoring both robots rules & canonical tags. Can anyone shed any light onto the situation?
Intermediate & Advanced SEO | | JBGlobalSEO0 -
Whole site blocked by robots in webmaster tools
My URL is: www.wheretobuybeauty.com.auThis new site has been re-crawled over last 2 weeks, and in webmaster tools index status the following is displayed:Indexed 50,000 pagesblocked by robots 69,000Search query 'site:wheretobuybeauty.com.au' returns 55,000 pagesHowever, all pages in the site do appear to be blocked and over the 2 weeks, the google search query site traffic declined from significant to zero (proving this is in fact the case ).This is a Linux php site and has the following: 55,000 URLs in sitemap.xml submitted successfully to webmaster toolsrobots.txt file existed but did not have any entries to allow or disallow URLs - today I have removed robots.txt file completely URL re-direction within Linux .htaccess file - there are many rows within this complex set of re-directions. Developer has double checked this file and found that it is valid.I have read everything that google and other sources have on this topic and this does not help. Also checked webmaster crawl errors, crawl stats, malware and there is no problem there related to this issue.Is this a duplicate content issue - this is a price comparison site where approx half the products have duplicate product descriptions - duplicated because they are obtained from the suppliers through an XML data file. The suppliers have the descriptions from the files in their own sites.Help!!
Intermediate & Advanced SEO | | rrogers0 -
My site was number 1 for competetive keyword on Google now completely gone- what should I do?
WHat is the best way to determine if I have somehow been delisted from Google?
Intermediate & Advanced SEO | | TBKO0 -
Block all but one URL in a directory using robots.txt?
Is it possible to block all but one URL with robots.txt? for example domain.com/subfolder/example.html, if we block the /subfolder/ directory we want all URLs except for the exact match url domain.com/subfolder to be blocked.
Intermediate & Advanced SEO | | nicole.healthline0