Robot.txt File Not Appearing, but seems to be working?
-
Hi Mozzers,
I am conducting a site audit for a client, and I am confused with what they are doing with their robot.txt file. It shows in GWT that there is a file and it is blocking about 12K URLs (image attached). It also shows in GWT that the file was downloaded 10 hours ago successfully. However, when I go to the robot.txt file link, the page is blank.
Would they be doing something advanced to be blocking URLs to hide it it from users? It appears to correctly be blocking log-ins, but I would like to know for sure that it is working correctly. Any advice on this would be most appreciated. Thanks!
Jared
-
There is an old webmaster world thread that explains how to hide the robots.txt file from browsers.... not sure why one would do this however....
http://www.webmasterworld.com/forum93/74.htm
Perhaps they are doing something like this?
-
I verified that I was checking /robots.txt. I had trouble verifying if it was under the non-www because everything redirects to the www. I also checked to see if it was being blocked, and it is not.
I went to Archive.org (Wayback Machine), and I can see the robot.txt file in previous versions of the site. I cannot, however, view it online, even though Google says they are downloading it successfully, and the robots.txt file is successfully blocking URLs from the search index.
-
Be sure you are visiting /robots.txt In all of your copy above, you are referencing robot.txt
Also, check to see if it possibly is only showing up on the www. version or the site or the non-www version of the site.
To be sure if it's working, you can test URLs of your website within Google Webmaster Tools. Go to Crawl->Blocked URLs and scroll down to the bottom.
Got a burning SEO question?
Subscribe to Moz Pro to gain full access to Q&A, answer questions, and ask your own.
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
After hack and remediation, thousands of URL's still appearing as 'Valid' in google search console. How to remedy?
I'm working on a site that was hacked in March 2019 and in the process, nearly 900,000 spam links were generated and indexed. After remediation of the hack in April 2019, the spammy URLs began dropping out of the index until last week, when Search Console showed around 8,000 as "Indexed, not submitted in sitemap" but listed as "Valid" in the coverage report and many of them are still hack-related URLs that are listed as being indexed in March 2019, despite the fact that clicking on them leads to a 404. As of this Saturday, the number jumped up to 18,000, but I have no way of finding out using the search console reports why the jump happened or what are the new URLs that were added, the only sort mechanism is last crawled and they don't show up there. How long can I expect it to take for these remaining urls to also be removed from the index? Is there any way to expedite the process? I've submitted a 'new' sitemap several times, which (so far) has not helped. Is there any way to see inside the new GSC view why/how the number of valid URLs in the indexed doubled over one weekend?
Intermediate & Advanced SEO | | rickyporco0 -
SEO's Structuring Your Work Week
Hi I wanted some feedback on how other SEO's structure their time. I feel as though I'm falling into the trap of fire fighting with tasks rather than working on substantial projects... I don't feel as though I'm being as effective as I could be. Here's our set up - Ecommerce site selling thousands of products - more of a generalist with 5 focus areas. 2 x product/merchandising teams - bring in new products, write content/merchandise products Web team - me (SEO), Webmaster, Ecommcerce manager Studio - Print/Email marketing/creative/photography. A lot of my time is split between working for the product teams doing KWD research, briefing them on keywords to use, checking meta. SEO Tasks - Site audits/craws, reporting Blogs - I try and do a bit as I need it so much for SEO, so I've put a content/social plan together but getting a lot of things actioned is hard... I'm trying to coordinate this across teams Inbetween all that, I don't have much time to work on things I know are crucial like a backlink/outreach plan, blog/user guide/content building etc. How do you plan your time as an SEO? Big projects? Soon I'm going to pull back from the product optimisation & try focussing on category pages, but for an Ecommerce site they are extremely difficulty to promote. Just asking for opinions and advice 🙂
Intermediate & Advanced SEO | | BeckyKey3 -
Links via Commenting on Popular website. Does it work?
Getting links from popular website using real name or brand name as comment still works? I know that anchor tags in name while commenting on other sites would directly go to spam.
Intermediate & Advanced SEO | | welcomecure0 -
Block subdomain directory in robots.txt
Instead of block an entire sub-domain (fr.sitegeek.com) with robots.txt, we like to block one directory (fr.sitegeek.com/blog).
Intermediate & Advanced SEO | | gamesecure
'fr.sitegeek.com/blog' and 'wwww.sitegeek.com/blog' contain the same articles in one language only labels are changed for 'fr' version and we suppose that duplicate content cause problem for SEO. We would like to crawl and index 'www.sitegee.com/blog' articles not 'fr.sitegeek.com/blog'. so, suggest us how to block single sub-domain directory (fr.sitegeek.com/blog) with robot.txt? This is only for blog directory of 'fr' version even all other directories or pages would be crawled and indexed for 'fr' version. Thanks,
Rajiv0 -
Huge increase in server errors and robots.txt
Hi Moz community! Wondering if someone can help? One of my clients (online fashion retailer) has been receiving huge increase in server errors (500's and 503's) over the last 6 weeks and it has got to the point where people cannot access the site because of server errors. The client has recently changed hosting companies to deal with this, and they have just told us they removed the DNS records once the name servers were changed, and they have now fixed this and are waiting for the name servers to propagate again. These errors also correlate with a huge decrease in pages blocked by robots.txt file, which makes me think someone has perhaps changed this and not told anyone... Anyone have any ideas here? It would be greatly appreciated! 🙂 I've been chasing this up with the dev agency and the hosting company for weeks, to no avail. Massive thanks in advance 🙂
Intermediate & Advanced SEO | | labelPR0 -
Whole site blocked by robots in webmaster tools
My URL is: www.wheretobuybeauty.com.auThis new site has been re-crawled over last 2 weeks, and in webmaster tools index status the following is displayed:Indexed 50,000 pagesblocked by robots 69,000Search query 'site:wheretobuybeauty.com.au' returns 55,000 pagesHowever, all pages in the site do appear to be blocked and over the 2 weeks, the google search query site traffic declined from significant to zero (proving this is in fact the case ).This is a Linux php site and has the following: 55,000 URLs in sitemap.xml submitted successfully to webmaster toolsrobots.txt file existed but did not have any entries to allow or disallow URLs - today I have removed robots.txt file completely URL re-direction within Linux .htaccess file - there are many rows within this complex set of re-directions. Developer has double checked this file and found that it is valid.I have read everything that google and other sources have on this topic and this does not help. Also checked webmaster crawl errors, crawl stats, malware and there is no problem there related to this issue.Is this a duplicate content issue - this is a price comparison site where approx half the products have duplicate product descriptions - duplicated because they are obtained from the suppliers through an XML data file. The suppliers have the descriptions from the files in their own sites.Help!!
Intermediate & Advanced SEO | | rrogers0 -
We recently fixed a Meta Refresh that was affecting our home page - But something still seems wrong. Any suggestions?
We recently fixed a meta refresh issue on our home page. Our store URL: http://www.ccisolutions.com had a meta refresh on it that was going to: www.ccisolutions.com/StoreFront/IAFDispatcher?iafAction=showMain The meta refresh is now gone, however there still seem to be some problems: Our IT Director has not been successful in trying to make www.ccisolutions.com/StoreFront/IAFDispatcher?iafAction=showMain 301 redirect to http://www.ccisolutions.com - so I believe we now have a duplicate content issue If you look at both of these URLs in OSE, you will see that www.ccisolutions.com/StoreFront/IAFDispatcher?iafAction=showMain is getting credit for almost all of the Internal Followed Links, while http://www.ccisolutions.com is getting all the credit for External Followed links. Why doesn't http://www.ccisolutions.com show the same number of Internal Followed Links? I realize this is more of a developer/webmaster question and would be very appreciative of any suggestions or advice. Thanks!
Intermediate & Advanced SEO | | danatanseo0 -
Page HTML great for humans, but seems to be very bad for bots?
We recently switched platforms and use Joomla for our website. Our product page underwent a huge transformation and it seems to be user friendly for a human, but when you look at one of our product pages in SEOBrowser it seems that we are doing a horrible job optimizing the page and our html almost makes us look spammy. Here is an example or a product page on our site: http://urbanitystudios.com/custom-invitations-and-announcements/shop-by-event/cocktail/beer-mug And, if you take a look in something like SEObrowser, it makes us look not so good. For example, all of our footer and header links show up. Our color picker is a bunch of pngs (over 60 to be exact), our tabs are the same (except for product description and reviews) on every single product page... In thinking about the bots: 1-How do we handle all of the links from footer, header and the same content in the tabs 2-How do we signal to them that all that is important on the page is the description of the product? 3-We installed schema for price and product image, etc but can we take it further? 4-How do we handle the "attribute" section (i.e. our color picker, our text input, etc). Any clarification I need to provide, please let me know.
Intermediate & Advanced SEO | | UrbanityStudios0