Googlebot Can't Access My Sites After I Repair My Robots File
-
Hello Mozzers,
A colleague and I have been collectively managing about 12 brands for the past several months, and we have recently received a number of messages in the sites' Webmaster Tools telling us that 'Googlebot was not able to access your site due to some errors with your robots.txt file.'
My colleague and I, in turn, created new robots.txt files with the intention of preventing the spider from crawling our 'cgi-bin' directory as follows:
User-agent: *
Disallow: /cgi-bin/
After creating the robots file and manually re-submitting it in Webmaster Tools (and receiving the green checkbox), I received the same message about Googlebot not being able to access the site; the only difference was that this time it was for a different site that I manage.
I repeated the process, and everything looked correct, yet I continued receiving these messages for each of the other sites I manage, on a daily basis, for roughly a 10-day period.
Do any of you know why I may be receiving this error? Is it not possible for me to block Googlebot from crawling the 'cgi-bin' directory?
Any and all advice/insight is very much welcome, I hope I'm being descriptive enough!
-
Oleg gave a great answer.
Still, I would add two things here:
1. Go to GWMT and, under "Health", run a "Fetch as Googlebot" test. This will tell you which pages are reachable.
2. I've seen some occasions of server-level Googlebot blockage. If your robots.txt is fine and your pages contain no "noindex" tags, yet you are still getting an error message while fetching, you should get hold of your access logs and check them for Googlebot user-agents to see if (and when) you were last visited. This will help you pinpoint the issue when talking to your hosting provider (or 3rd-party security vendor).
If unsure, you can find Googlebot information (user agents and IPs) at Botopedia.org.
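Checking the access logs for Googlebot can be scripted. A minimal sketch, assuming an Apache/Nginx "combined" log format with the user-agent as the last quoted field (the field order is an assumption; adjust the pattern to your server's format):

```python
import re

# Assumes the "combined" log format; adjust the pattern if your server differs.
LOG_RE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" \d+ \S+ "[^"]*" "(?P<agent>[^"]*)"'
)

def googlebot_hits(log_lines):
    """Return (ip, time, request) for lines whose user-agent claims to be Googlebot."""
    hits = []
    for line in log_lines:
        m = LOG_RE.match(line)
        if m and "Googlebot" in m.group("agent"):
            # Anyone can fake the user-agent string; a claimed Googlebot IP
            # should be verified with a reverse DNS lookup (*.googlebot.com).
            hits.append((m.group("ip"), m.group("time"), m.group("request")))
    return hits
```

The most recent hit (or its absence) tells you when Googlebot last reached the server, which is exactly what you want in hand when talking to the host.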
-
A great answer
-
Maybe the spacing is off when you posted it here, but blank lines can affect robots.txt files. Try:
User-agent: *
Disallow: /cgi-bin/
#End Robots#
Also, check for robot-blocking meta tags on the individual pages.
You can test whether Google can access specific pages through GWT > Health > Blocked URLs (you should see your robots.txt file contents in the top text area; enter the URLs to test in the second text area, then press "Test" at the bottom. Test results will appear at the bottom of the page).
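Outside of GWT, the same robots.txt can be sanity-checked locally. A quick sketch with Python's standard-library parser, using the rules from the question:

```python
from urllib.robotparser import RobotFileParser

# The exact rules from the question, parsed directly instead of fetched.
rules = """\
User-agent: *
Disallow: /cgi-bin/
"""

parser = RobotFileParser()
parser.parse(rules.splitlines())

# Googlebot falls under the "*" group here, so /cgi-bin/ is blocked
# while ordinary pages remain crawlable.
print(parser.can_fetch("Googlebot", "/cgi-bin/script.pl"))  # False
print(parser.can_fetch("Googlebot", "/products/"))          # True
```

If this says ordinary pages are crawlable but GWT still reports errors, the problem is almost certainly not the file's contents.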
Related Questions
-
My keywords aren't performing for my Umbraco site
A client of mine has just redesigned their site, which was pretty small (homepage, about us page and contact us page), and now it includes a homepage, an about us page, 3 services pages, 5 blog posts and a contact us page. Their domain authority is 5, so that gives you an idea of their size. We updated their key pages with keyword-optimised content and added the keyword to their meta title and meta description. They're in the process of adding the alt tags, and they also need to enable meta tags for the blog posts. Everything is quite in process at the moment and their organic traffic is low. But I believe that some of the keywords should start moving up places for the pages that have been optimised, and they haven't. Is there any reason for this? I believe the services pages which have meta tags should have started ranking, at least in a very low position, for the selected keywords. Is there something I'm missing? Thank you!
Intermediate & Advanced SEO | Chris_Wright
How to remove my site's pages in search results?
I have tested hundreds of pages to see if Google will properly crawl, index and cache them. Now I want these pages to be removed from Google search, except for the homepage. What should the rule in robots.txt be? I use this rule, but I am not sure if Google will remove the hundreds of pages (for my testing):
User-agent: *
Disallow: /
Allow: /$
Intermediate & Advanced SEO | esiow2013
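One caveat with this rule: as far as I know, Python's stdlib robotparser treats patterns as plain prefixes, so the `Allow: /$` trick depends on Google's own wildcard and longest-match-wins semantics. A hand-rolled sketch of that matching logic (my own illustration, not Google's code):

```python
import re

RULES = [("allow", "/$"), ("disallow", "/")]  # the rules from the question

def google_match(pattern, path):
    # Google-style wildcards: "*" matches any run of characters,
    # "$" anchors the end of the URL path.
    regex = "^" + re.escape(pattern).replace(r"\*", ".*").replace(r"\$", "$")
    return re.match(regex, path) is not None

def allowed(path):
    # The most specific (longest) matching rule wins; Allow wins ties.
    matches = [(len(p), kind == "allow") for kind, p in RULES if google_match(p, path)]
    return max(matches)[1] if matches else True

print(allowed("/"))                # True  - only the bare homepage survives
print(allowed("/test-page.html"))  # False - everything else is disallowed
```

Also note that robots.txt only stops crawling; for pages Google has already indexed, the URL removal tool or a noindex tag is the more reliable way to get them dropped.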
301s, Mixed-Case URLs, and Site Migration Disaster
Hello Moz Community,
After placing trust in a developer to build & migrate our site, the site launched 9 weeks ago and has been one disaster after another. Sadly, after 16 months of development, we are building again; this time we are leveled-up and doing it in-house with our people. I have 1 topic I need advice on, and that is 301s.
Here's the deal. The newbie developer used a mixed-case version for our URL structure, so what should have been /example-url became /Example-Url on all URLs. Awesome, right? It was a duplicate-content nightmare upon launch (among other things). We are re-building now.
My question is this: do we bite the bullet for all URLs and 301 them to a proper lower-case URL structure? We've already lost a lot of link equity from 301ing the site the first time around. We were a PR 4 for the last 5 years on our homepage; now we are a PR 3. That is a substantial loss. For our primary keywords, we were on the first page for the big ones for the last decade. Now we are just barely cleaving to the second page, and many are 3rd page.
I am afraid that if we 301 all the URLs again, a 15% reduction in link equity per page is really going to hurt us, again. However, keeping the mixed-case URL structure is also a whammy. Building a brand new site, again, it seems like we should do it correctly and right all the previous wrongs. But on the other hand, another PR demotion and we'll be in line at the soup kitchen. What would you do?
Intermediate & Advanced SEO | yogitrout1
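For what it's worth, the lowercase mapping itself is a one-line rewrite. A minimal sketch of the 301 logic (function name and return shape are my own, not from the post):

```python
def lowercase_redirect(path):
    """Return a (status, location) pair for mixed-case paths, or None if already canonical."""
    lower = path.lower()
    if lower != path:
        # One permanent redirect straight to the final URL; avoid chaining
        # through the old mixed-case 301s, which leaks more equity per hop.
        return ("301 Moved Permanently", lower)
    return None

print(lowercase_redirect("/Example-Url"))   # ('301 Moved Permanently', '/example-url')
print(lowercase_redirect("/example-url"))   # None
```

Whatever you decide, update the first round of 301s to point directly at the final lowercase URLs, so no visitor or bot ever passes through two redirects.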
Can we retrieve all 404 pages of my site?
Hi, can we retrieve all 404 pages of my site? Is there any syntax I can use in Google search to list just the pages that give a 404? Or a tool/site that can scan all pages in the Google index and give me this report? Thanks
Intermediate & Advanced SEO | mtthompsons
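Google search won't filter by status code, but a crawl of your own URL list can. A small sketch using only the standard library (the helper names are mine; feed it URLs pulled from your sitemap or server logs):

```python
from urllib import error, request

def head_status(url, timeout=10):
    """HTTP status of a HEAD request; 4xx/5xx responses arrive as HTTPError."""
    try:
        return request.urlopen(request.Request(url, method="HEAD"), timeout=timeout).status
    except error.HTTPError as e:
        return e.code

def find_404s(urls, status_of=head_status):
    """Return the urls that respond 404; status_of is injectable for testing."""
    return [u for u in urls if status_of(u) == 404]
```

Point it at your full URL list and the result is exactly the 404 report the question asks for, without waiting on Google's index.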
Can't seem to get traffic back post Panda / Penguin. WHY?
I have done, and am doing, everything I can think of to bring back lost traffic after the late 2012 updates from Google hit us. It just is not working. We had some issues with our out-of-house web developers, which screwed up our site in 2012, and after taking it in-house we have been doing damage control for months now. We think we have fixed pretty much everything:
- URL structure
- filling up with good unique content (under way; lots still to do)
- making better category descriptions
- redesigned homepage
- updated product pages (the CMS is holding things back on that part, otherwise they would be better; new CMS under construction)
- started more link building (it's a real weak spot in our SEO as far as I can see)
- audited bad links from dodgy, irrelevant sites
- hired writers to create content and link-bait articles
- begun making high-quality videos for both YouTube (brand awareness and viral) and on-site hosting (link building and conversions) (in the pipeline, not online yet)
- flattened out site architecture
- optimised internal link flow (got this wrong by using nofollows; in the process of thinking of a better way by reducing unwanted nav links on the page)
I realise it's not all done, but I have been working ever since the drop in traffic and I'm just seeing no increase at all. I have been asking a few questions on here for the past few days but still can't put my finger on the issue. Am I just impatient and need to wait on the traffic, as I am doing all the correct things? Or have I missed something that needs fixing? If anyone would like to have a quick look at my site and see if there is an obvious issue I have missed, it would be great, as I have been tearing my hair out trying to find the issues with my site. It's www.centralsaddlery.co.uk. Criticism would be much appreciated.
Intermediate & Advanced SEO | mark_baird
How long does a robots.txt file take to take effect?
On my web site there is also a demo site, which is indexed in Google but is not needed now. So I created a robots file and uploaded it to the server yesterday. In the demo folder there are some html files, and I want to remove everything in the demo folder from Google, but Webmaster Tools is still showing it:
User-agent: *
Disallow: /demo/
How long will this take to be removed from Google? And are there any alternative ways of doing that?
Intermediate & Advanced SEO | innofidelity
Is there some tool or person that can review my site and tell me what to improve on?
I'm using the SEO on-page tool from this site, but I get no help on how to fix the problems. I'm looking for the all-in-one tool that tells me everything about my site (description, keyword density, what to improve, what to add, what to remove, etc.). If someone would be kind enough to review it for me, that would be great!
Intermediate & Advanced SEO | 678648631264
Robots.txt is blocking Wordpress Pages from Googlebot?
I have a robots.txt file on my server which I did not develop; it was done by the web designer at the company before me. Then there is a WordPress plugin that generates a robots.txt file. How do I unblock all the WordPress pages from Googlebot?
Intermediate & Advanced SEO | ENSO