Robot.txt File Not Appearing, but seems to be working?
-
Hi Mozzers,
I am conducting a site audit for a client, and I am confused with what they are doing with their robot.txt file. It shows in GWT that there is a file and it is blocking about 12K URLs (image attached). It also shows in GWT that the file was downloaded 10 hours ago successfully. However, when I go to the robot.txt file link, the page is blank.
Would they be doing something advanced to be blocking URLs to hide it it from users? It appears to correctly be blocking log-ins, but I would like to know for sure that it is working correctly. Any advice on this would be most appreciated. Thanks!
Jared
-
There is an old webmaster world thread that explains how to hide the robots.txt file from browsers.... not sure why one would do this however....
http://www.webmasterworld.com/forum93/74.htm
Perhaps they are doing something like this?
-
I verified that I was checking /robots.txt. I had trouble verifying if it was under the non-www because everything redirects to the www. I also checked to see if it was being blocked, and it is not.
I went to Archive.org (Wayback Machine), and I can see the robot.txt file in previous versions of the site. I cannot, however, view it online, even though Google says they are downloading it successfully, and the robots.txt file is successfully blocking URLs from the search index.
-
Be sure you are visiting /robots.txt In all of your copy above, you are referencing robot.txt
Also, check to see if it possibly is only showing up on the www. version or the site or the non-www version of the site.
To be sure if it's working, you can test URLs of your website within Google Webmaster Tools. Go to Crawl->Blocked URLs and scroll down to the bottom.
Got a burning SEO question?
Subscribe to Moz Pro to gain full access to Q&A, answer questions, and ask your own.
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
Pages blocked by robots
**yazılım sürecinde yapılan bir yanlışlıktı.** Sorunu hızlı bir şekilde nasıl çözebilirim? bana yardım et. ```[XTRjH](https://imgur.com/a/XTRjH)
Intermediate & Advanced SEO | | mihoreis0 -
Google disavow file
Does anybody have any idea how often Google reads the disavow file?
Intermediate & Advanced SEO | | seoman100 -
Blacklisted website no longer blacklisted, but will not appear on Google's search engine.
We have a client who before us, had a website that was blacklisted by Google. After we created their new website, we submitted an appeal through Google's Webmaster Tools, and it was approved. One year later, they are still unable to rank for anything on Google. The keyword we are attempting to rank for on their home page is "Day in the Life Legal Videos" which shouldn't be too difficult to rank for after a year. But their website cannot be found. What else can we do to repair this previously blacklisted website after we're already been approved by Google? After doing a link audit, we found only one link with a spam score of 7, but I highly doubt that is what is causing this website to no longer appear on Google. Here is the website in question: https://www.verdictvideos.com/
Intermediate & Advanced SEO | | rodneywarner0 -
How long should I keep the 301 redirect file
We've setup an new site and many pages don't exist anymore (clean up done). But for many of them we have new pages with new url's. We've monitored the 404 and have now many URL's redirected with 301 (apache file). How long should we keep this in place? Checking all links manually to see of new url is in place of the old url (in google) is too much work. tx!
Intermediate & Advanced SEO | | KBC0 -
Robots.txt vs noindex
I recently started working on a site that has thousands of member pages that are currently robots.txt'd out. Most pages of the site have 1 to 6 links to these member pages, accumulating into what I regard as something of link juice cul-d-sac. The pages themselves have little to no unique content or other relevant search play and for other reasons still want them kept out of search. Wouldn't it be better to "noindex, follow" these pages and remove the robots.txt block from this url type? At least that way Google could crawl these pages and pass the link juice on to still other pages vs flushing it into a black hole. BTW, the site is currently dealing with a hit from Panda 4.0 last month. Thanks! Best... Darcy
Intermediate & Advanced SEO | | 945010 -
How to leverage browser cache a specific file
Hello all,
Intermediate & Advanced SEO | | asbchris
I am trying to figure out how to add leverage browser caching to these items. http://maps.googleapis.com/maps/api/js?v=3.exp&sensor=false&language=en http://ajax.googleapis.com/ajax/libs/webfont/1/webfont.js http://www.google-analytics.com/analytics.js Whats hard is I understand the purpose, but unlike a css file, how do you specify an expiration on an actual direct path file? Any help or link to get help is appreciated. Chris0 -
Negative impact on crawling after upload robots.txt file on HTTPS pages
I experienced negative impact on crawling after upload robots.txt file on HTTPS pages. You can find out both URLs as follow. Robots.txt File for HTTP: http://www.vistastores.com/robots.txt Robots.txt File for HTTPS: https://www.vistastores.com/robots.txt I have disallowed all crawlers for HTTPS pages with following syntax. User-agent: *
Intermediate & Advanced SEO | | CommercePundit
Disallow: / Does it matter for that? If I have done any thing wrong so give me more idea to fix this issue.0 -
There seems to be something obvious stopping our product pages from being listed in organic results - help!
Hello All Firstly new to SEO MOZ but what a fantastic resource, good work! I help run a platform at ethical community (dot) com (have phrased it like that so google doesn't pick up this thread hope thats ok). We seem to have something glaringly obvious with the SEO ability of our product pages. We now have over 7000 products on the site and would like to think we have done a pretty good job in terms of optimisning them, lots of nice keywords, relevant page titles, good internal links, and even recently have reduced the loading speeds a fair amount. We have a sitemap set up feeding in URLS to Google and some of them are now nearly a year old. The problem, when doing an EXACT google search on a product title the product pages dont show up for the majority of the 7000 products. HOWEVER.... we get fantastic ranking in google products, and get sales through other areas of the site, which seems even more odd. For example, if you type in "segway" you'll see us ranking on the first page of google in google products, but the product page itself is nowhere to be seen. For example "DARK CHOCOLATE STRANDS 70G CAKE DECORATION" gets no results on google (aside from google products) when we have this page at OURDOMAIN/eco-shop/food/dark-chocolate-strands-70g-cake-decoration-5592 Can anyone help identify if there is a major bottleneck here our gut feeling is there is one major factor that is causing this.
Intermediate & Advanced SEO | | ethicalcommunity0