What can I do if Google Webmaster Tools doesn't recognize the robots.txt file?
-
I'm working on a recently hacked site for a client and and in trying to identify how exactly the hack is running I need to use the fetch as Google bot feature in GWT.
I'd love to use this but it thinks the robots.txt is blocking it's acces but the only thing in the robots.txt file is a link to the sitemap.
Unde the Blocked URLs section of the GWT it shows that the robots.txt was last downloaded yesterday but it's incorrect information. Is there a way to force Google to look again?
-
No, but they might write to it, modify it, or do all sorts of other nasty stuff I've seen hackers do when they get a hold of any writeable file on a system.
-
lol it's a robots text file. what are they going to do. Steal it? I should have clarified do a 777 to make sure that is not your problem, then yes change the permission to be tighter
-
Eesh I don't recommend 777. 644 or, if you're going to change it right back, 755 at most.
-
File permission maybe? Change it to 777 and try it again
-
If you have shell access on Linux you can use wget or GET or run lynx.
If google is getting the wrong robots file then your web server must be sending out something other than what you think is the robots file.
What happens if you do this in your browser:
-
Looking in my log files, Google hits robots.txt just about every time it crawls our site.
What are you trying to accomplish using fetch as Googlebot? Any chance CURL could do the job for you, or another tool that ignores robots.txt?
Got a burning SEO question?
Subscribe to Moz Pro to gain full access to Q&A, answer questions, and ask your own.
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
No descripton on Google/Yahoo/Bing, updated robots.txt - what is the turnaround time or next step for visible results?
Hello, New to the MOZ community and thrilled to be learning alongside all of you! One of our clients' sites is currently showing a 'blocked' meta description due to an old robots.txt file (eg: A description for this result is not available because of this site's robots.txt) We have updated the site's robots.txt to allow all bots. The meta tag has also been updated in WordPress (via the SEO Yoast plugin) See image here of Google listing and site URL: http://imgur.com/46wajJw I have also ensured that the most recent robots.txt has been submitted via Google Webmaster Tools. When can we expect these results to update? Is there a step I may have overlooked? Thank you,
Technical SEO | | adamhdrb
Adam 46wajJw0 -
Webmaster Tools stopped updating my sites
On Feb 6th WMT stopped updating the stats for my two main sites. I have seen other people complain about this but it seems like it's not known what causes it or if it's by design. Does anyone here have any insight on this issue? Thanks, Greg Davis AntiqueBanknotes
Technical SEO | | Banknotes0 -
Google Disavow Tool
Some background: My rankings have been wildly fluctuating for the past few months for no apparent reason. When I inquired about this, many people said that even though I haven't received any penalty notice, I was probably affected by penguin. (http://moz.com/community/q/ranking-fluctuations) I recently did a link detox by LinkRemovalTools and it gave me a list of all my links, 2% were toxic and 51% were suspiscious. Should I simply disavow the 2%? There are many sites where is no contact info.
Technical SEO | | EcomLkwd0 -
Webmaster tools doesn't pick up 301 redirect
I had a few hundred URLs that died on my site. Google Webmaster Tools notified me about the increase in 404 errors. I fixed all of them by 301 redirecting them to the most relevant page and did multiple header checks to ensure that the 301 has been implemented correctly. Now a few weeks later, Google is giving me the exact same message in Google Webmaster Tools but they are all still 301 redirected. WTF?
Technical SEO | | DROIDSTERS0 -
Difference between SEOMOZ and Webmaster Tools information
Hello, There is an issue that confuses me and I thought perhaps you will be able to help me shed some light on it. I have a website which shows 2,549 crawled pages on SEOMOZ and 24,542 pages on webmaster tools! Obviously there is some technical issue with the site, but my question is: why the vast difference between what the SEOMOZ crawl report and webmaster tools report show? Thanks! Guy Cizner
Technical SEO | | ciznerguy0 -
How can I tell Google, that a page has not changed?
Hello, we have a website with many thousands of pages. Some of them change frequently, some never. Our problem is, that googlebot is generating way too much traffic. Half of our page views are generated by googlebot. We would like to tell googlebot, to stop crawling pages that never change. This one for instance: http://www.prinz.de/party/partybilder/bilder-party-pics,412598,9545978-1,VnPartypics.html As you can see, there is almost no content on the page and the picture will never change.So I am wondering, if it makes sense to tell google that there is no need to come back. The following header fields might be relevant. Currently our webserver answers with the following headers: Cache-Control: no-cache, must-revalidate, post-check=0, pre-check=0, public
Technical SEO | | bimp
Pragma: no-cache
Expires: Thu, 19 Nov 1981 08:52:00 GMT Does Google honor these fields? Should we remove no-cache, must-revalidate, pragma: no-cache and set expires e.g. to 30 days in the future? I also read, that a webpage that has not changed, should answer with 304 instead of 200. Does it make sense to implement that? Unfortunatly that would be quite hard for us. Maybe Google would also spend more time then on pages that actually changed, instead of wasting it on unchanged pages. Do you have any other suggestions, how we can reduce the traffic of google bot on unrelevant pages? Thanks for your help Cord0 -
Robots.txt question
Hello, What does the following command mean - User-agent: * Allow: / Does it mean that we are blocking all spiders ? Is Allow supported in robots.txt ? Thanks
Technical SEO | | seoug_20050 -
Robots.txt and robots meta
I have an odd situation. I have a CMS that has a global robots.txt which has the generic User-Agent: *
Technical SEO | | Highland
Allow: / I also have one CMS site that needs to not be indexed ever. I've read in various pages (like http://www.jesterwebster.com/robots-txt-vs-meta-tag-which-has-precedence/22 ) that robots.txt always wins over meta, but I have also read that robots.txt indicates spiderability whereas meta can control indexation. I just want the site to not be indexed. Can I leave the robots.txt as is and still put NOINDEX in the robots meta?0