Googlebot Can't Access My Sites After I Repair My Robots File
-
Hello Mozzers,
A colleague and I have been collectively managing about 12 brands for the past several months and we have recently received a number of messages in the sites' webmaster tools instructing us that 'Googlebot was not able to access our site due to some errors with our robots.txt file'
My colleague and I, in turn, created new robots.txt files with the intention of preventing the spider from crawling our 'cgi-bin' directory as follows:
User-agent: *
Disallow: /cgi-bin/
After creating the robots and manually re-submitting it in Webmaster Tools (and receiving the green checkbox), I received the same message about Googlebot not being able to access the site, only difference being that this time it was for a different site that I manage.
I repeated the process and everything, aesthetically looked correct, however, I continued receiving these messages for each of the other sites I manage on a daily-basis for roughly a 10-day period.
Do any of you know why I may be receiving this error? is it not possible for me to block the Googlebot from crawling the 'cgi-bin'?
Any and all advice/insight is very much welcome, I hope I'm being descriptive enough!
-
Oleg gave a great answer.
Still I would add 2 things here:
1. Go to GWMT and under "Health" do a "Fetch as Googlebot" test.
This will tell you what pages are reachable.2. I`ve saw some occasions of server-level Googlebot blockage.
If your robots.txt is fine and your page contains no "no-index" tags, and yet you still getting an error message while fetching, you should get a hold on your access logs and check it for Googlebot user-agents to see if (and when) you were last visited.This will help you pin-point the issue, when talking to your hosting provider (or 3rd party security vendor).
If unsure, you can find Googlebot information (user agent and IPs ) at Botopedia.org.
-
A great answer
-
Maybe the spacing is off when you posted it here, but blank lines can affect robots.txt files. Try code:
User-agent: *
Disallow: /cgi-bin/
#End Robots#Also, check for robot blocking meta tags on the individual pages.
You can test to see if Google can access specific pages through GWT > Health > Blocked URLs (should see your robots.txt file contents int he top text area, enter the urls to test in the 2nd text area, then press "Test" at the bottom - test results will appear at the bottom of the page)
Got a burning SEO question?
Subscribe to Moz Pro to gain full access to Q&A, answer questions, and ask your own.
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
Can’t put a finger on, what is causing 12 year domain, SEO optimized and decent link profile to rank lower than other less superior domains.
Can’t put a finger on, what is causing 12 year domain, SEO optimized and decent link profile to rank lower than other less superior domains. I have dissected the site and link, content, etc profile using ahrefs tools, still no luck, and unfortunately they do not have a community to ask anyone opinion. Hoping someone on Moz will be able to provide me with a secondary opinion or something I obviously missing here. Looking for any constructive feedback/professional opinion with fresh look on what maybe the cause of our down rankings and what may be a cause of it. Any feedback is very much appreciated. Search Term: 3030 aventura condos / One of our link samples (SE Position #6): https://goo.gl/FbYj4V Competing Domains (SE Position #1): https://goo.gl/fLPKX5 Competing Domains (SE Position #2): https://goo.gl/GqXGse
Intermediate & Advanced SEO | | Im_Jake0 -
Open Site Explorer - Top Pages that don't exist / result of a hack(?)
Hi all, Last year, a website I monitor, got hacked, or infected with malware, I’m not sure which. The result that I got to see is 100’s of ‘not found’ entries in Google Search Console / Crawl Errors for non-existent pages relating to / variations of ‘Canada Goose’. And also, there's a couple of such links showing up in SERPs. Here’s an example of the page URLs: ourdomain.com/canadagoose.php ourdomain.com/replicacanadagoose.php I looked for advice on the webmaster forums, and was recommended to just keep marking them as ‘fixed’ in the console. Sooner or later they’ll disappear. Still, a year after, they appear. I’ve just signed up for a Moz trail and, in Open Site Explorer->Top Pages, the top 2-5 pages are relating to these non-existent pages: URLs that are the result of this ‘canada goose’ spam attack. The non-existent pages each have around 10 Linking Root Domains, with around 50 Inbound Links. My question is: Is there a more direct action I should take here? For example, informing Google of the offending domains with these backlinks. Any thoughts appreciated! Many thanks
Intermediate & Advanced SEO | | macthing1 -
Linking to one of my own sites, from my site
Hi experts, I own a site for castingjobs (Site1) and a site for selling paintings (Site2). In a long time, I've had a link at the bottom of Site1, linking to Site 2. (Basicaly: Partnerlink: Link site 2). Site1 is for me the the only important site, since it's where Im making my monthly revenue. I added the link like 5 years ago or so, to try to boost site 2. My question is:
Intermediate & Advanced SEO | | KasperGJ
1. Is it somehow bad for SEO for site 1, since the two sites have nothing to do with each other, they are basically just owned by me.
2. Would it make sense to link from Site 2 to Site 1 indstead?0 -
When Mobile and Desktop sites have the same page URLs, how should I handle the 'View Desktop Site' link on a mobile site to ensure a smooth crawl?
We're about to roll out a mobile site. The mobile and desktop URLs are the same. User Agent determines whether you see the desktop or mobile version of the site. At the bottom of the page is a 'View Desktop Site' link that will present the desktop version of the site to mobile user agents when clicked. I'm concerned that when the mobile crawler crawls our site it will crawl both our entire mobile site, then click 'View Desktop Site' and crawl our entire desktop site as well. Since mobile and desktop URLs are the same, the mobile crawler will end up crawling both mobile and desktop versions of each URL. Any tips on what we can do to make sure the mobile crawler either doesn't access the desktop site, or that we can let it know what is the mobile version of the page? We could simply not show the 'View Desktop Site' to the mobile crawler, but I'm interested to hear if others have encountered this issue and have any other recommended ways for handling it. Thanks!
Intermediate & Advanced SEO | | merch_zzounds0 -
Any reasons why social media properties are ranking higher than the site's own name?
The site below has social media properties and other sites coming up before it's own listing even for the exact search of the site name. Any ideas why this is happening? Link Any input is appreciated.
Intermediate & Advanced SEO | | SEO5Team0 -
Do you add 404 page into robot file or just add no index tag?
Hi, got different opinion on this so i wanted to double check with your comment is. We've got /404.html page and I was wondering if you would add this page to robot text so it wouldn't be indexed or would you just add no index tag? What would be the best approach? Thanks!
Intermediate & Advanced SEO | | Rubix0 -
What if you can't navigate naturally to your canonicalized URL?
Assume this situation for a second... Let's say you place a rel= canonical tag on a page and point to the original/authentic URL. Now, let's say that that original/authentic URL is also populated into your XML sitemap... So, here's my question... Since you can't actually navigate to that original/authentic URL (it still loads with a 200, it's just not actually linkded to from within the site itself), does that create an issue for search engines? Last consideration... The bots can still access those pages via the canonical tag and the XML sitemap, it's just that the user wouldn't be able to access those original/authentic pages in their natural site navigation. Thanks, Rodrigo
Intermediate & Advanced SEO | | AlgoFreaks0 -
On-Site Optimization Tips for Job site?
I am working on a job site that only ranks well for the homepage with very low ranking internal pages. My job pages do not rank what so ever and are database driven and often times turn to 404 pages after the job has been filled. The job pages have to no content either. Anybody have any technical on-site recommendations for a job site I am working on especially regarding my internal pages? (Cross Country Allied.com)
Intermediate & Advanced SEO | | Melia0