Invisible robots.txt?
-
So here's a weird one...
Client comes to me for some simple changes, turns out there are some major issues with the site, one of which is that none of the correct content pages are showing up in Google, just ancillary (outdated) ones. Looks like an issue because even the main homepage isn't showing up with a "site:domain.com"
So, I add to Webmaster Tools and, after an hour or so, I get the red bar of doom, "robots.txt is blocking important pages." I check it out in Webmasters and, sure enough, it's a "User agent: * Disallow /" ACK!
But wait... there's no robots.txt to be found on the server. I can go to domain.com/robots.txt and see it but nothing via FTP. I upload a new one and, thankfully, that is now showing but I've never seen that before.
Question is: can a robots.txt file be stored in a way that can't be seen?
Thanks!
-
Hi Josh
Did you ever find out how this was happening?
I've got the same issue with a wordpress site.. no robots.txt visible in FTP but it is accessible in a browser to view. -
I'm seeing the meta tag that's added for the first option:
<meta name="robots" content="index, follow" />
... but I could actually access a file at domain.com/robots.txt that had the content mentioned above. When I logged in via FTP, it wasn't there. I added an actual file there with the correct information and reloaded it to make sure it was showing the correct information.
I tested it on my local install and I'm not seeing a robots file being generated.
Very odd!
-
Yes, you probably answered your own question. In WordPress, there are two different settings under Settings > Privacy:
-
I would like my site visible to everyone, including search engines and archivers.
-
I would like to block search engines, but allow normal visitors
If option #2 was selected, WordPress doesn't create a robots.txt file for you but instead it automatically generates a tag on every single page.
I hope that helps!
-
-
Just make sure you don't set that Privacy setting in a live directory. It takes weeks/months to fully recover.
-
This is interesting. I am currently working on the robots.txt and testing it for different purposes. I also thought to do some test with wordpress websites as well so thanks for the update I’ll keep that in mind before actually testing different stuff.
Thanks!
-
I should mention that this is a WordPress site and, with that, I may have answered my own question. Perhaps WordPress generates a robots.txt dynamically when the setting is active at Settings > Privacy?
Got a burning SEO question?
Subscribe to Moz Pro to gain full access to Q&A, answer questions, and ask your own.
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
Robots.txt on http vs. https
We recently changed our domain from http to https. When a user enters any URL on http, there is an global 301 redirect to the same page on https. I cannot find instructions about what to do with robots.txt. Now that https is the canonical version, should I block the http-Version with robots.txt? Strangely, I cannot find a single ressource about this...
Technical SEO | | zeepartner0 -
How to solve the meta : A description for this result is not available because this site's robots.txt. ?
Hi, I have many URL for commercialization that redirects 301 to an actual page of my companies' site. My URL provider say that the load for those request by bots are too much, they put robots text on the redirection server ! Strange or not? Now I have a this META description on all my URL captains that redirect 301 : A description for this result is not available because this site's robots.txt. If you have the perfect solutions could you share it with me ? Thank You.
Technical SEO | | Vale70 -
Should I block robots from URLs containing query strings?
I'm about to block off all URLs that have a query string using robots.txt. They're mostly URLs with coremetrics tags and other referrer info. I figured that search engines don't need to see these as they're always better off with the original URL. Might there be any downside to this that I need to consider? Appreciate your help / experiences on this one. Thanks Jenni
Technical SEO | | ShearingsGroup0 -
Robots.txt query
Quick question, if this appears in a clients robots.txt file, what does it mean? Disallow: /*/_/ Does it mean no pages can be indexed? I have checked and there are no pages in the index but it's a new site too so not sure if this is the problem. Thanks Karen
Technical SEO | | Karen_Dauncey0 -
Robots.txt usage
Hey Guys, I am about make an important improvement to our site's robots.txt we have large number of properties on our site and we have different views for them. List, gallery and map view. By default list view shows up and user can navigate through gallery view. We donot want gallery pages to get indexed and want to save our crawl budget for more important pages. this is one example of our site: http://www.holiday-rentals.co.uk/France/r31.htm When you click on "gallery view" URL of this site will remain same in your address bar: but when you mouse over the "gallery view" tab it will show you URL with parameter "view=g". there are number of parameters: "view=g, view=l and view=m". http://www.holiday-rentals.co.uk/France/r31.htm?view=l http://www.holiday-rentals.co.uk/France/r31.htm?view=g http://www.holiday-rentals.co.uk/France/r31.htm?view=m Now my question is: I If restrict bots by adding "Disallow: ?view=" in our robots.txt will it effect the list view too? Will be very thankful if yo look into this for us. Many thanks Hassan I will test this on some other site within our network too before putting it to important one's. to measure the impact but will be waiting for your recommendations. Thanks
Technical SEO | | holidayseo0 -
Confused about robots.txt
There is a lot of conflicting and/or unclear information about robots.txt out there. Somehow, I can't make out what's the best way to use robots even after visiting the official robots website. For example I have the following format for my robots. User-agent: * Disallow: javascript.js Disallow: /images/ Disallow: /embedconfig Disallow: /playerconfig Disallow: /spotlightmedia Disallow: /EventVideos Disallow: /playEpisode Allow: / Sitemap: http://www.example.tv/sitemapindex.xml Sitemap: http://www.example.tv/sitemapindex-videos.xml Sitemap: http://www.example.tv/news-sitemap.xml Is this correct and/or recommended? If so, then how come I see a list of over 200 or so links blocked by robots when Im checking out Google Webmaster Tools! Help someone, anyone! Can't seem to understand this robotic business! Regards,
Technical SEO | | Netpace0 -
Blocking other engines in robots.txt
If your primary target of business is not in China is their any benefit to blocking Chinese search robots in robots.txt?
Technical SEO | | Romancing0 -
Using robots.txt to deal with duplicate content
I have 2 sites with duplicate content issues. One is a wordpress blog. The other is a store (Pinnacle Cart). I cannot edit the canonical tag on either site. In this case, should I use robots.txt to eliminate the duplicate content?
Technical SEO | | bhsiao0