Manipulate Googlebot
-
**Problem: I have found something weird in my server log (below). Googlebot is visiting folders and files that do not exist at all. There is no photo folder on the server, but Googlebot requests files inside a photo folder and gets a 404 error.**
I wonder whether these are SEO hacking attempts, and how someone could manage to manipulate Googlebot.
==================================================
66.249.71.200 - - [22/Aug/2012:02:31:53 -0400] "GET /robots.txt HTTP/1.0" 200 2255 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
66.249.71.25 - - [22/Aug/2012:02:36:55 -0400] "GET /photo/pic24.html HTTP/1.1" 404 - "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
66.249.71.26 - - [22/Aug/2012:02:37:03 -0400] "GET /photo/pic20.html HTTP/1.1" 404 - "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
66.249.71.200 - - [22/Aug/2012:02:37:11 -0400] "GET /photo/pic22.html HTTP/1.1" 404 - "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
66.249.71.200 - - [22/Aug/2012:02:37:28 -0400] "GET /photo/pic19.html HTTP/1.1" 404 - "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
66.249.71.26 - - [22/Aug/2012:02:37:36 -0400] "GET /photo/pic17.html HTTP/1.1" 404 - "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
66.249.71.200 - - [22/Aug/2012:02:37:44 -0400] "GET /photo/pic21.html HTTP/1.1" 404 - "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
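The log entries above follow Apache's "combined" log format. Below is a minimal Python sketch, assuming that format, that pulls out the 404 paths requested by clients whose user agent *claims* to be Googlebot. `LOG_PATTERN` and `googlebot_404s` are hypothetical names for illustration, and the user-agent string is self-reported, so a match proves nothing about who actually sent the request.

```python
import re
from collections import Counter

# Apache "combined" log format (an assumption based on the entries above):
# host ident user [time] "METHOD path protocol" status size "referer" "user-agent"
LOG_PATTERN = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+)[^"]*" (?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)


def googlebot_404s(lines):
    """Count 404 paths whose requester *claims* to be Googlebot.

    The user-agent string is self-reported, so this does not prove the
    requests really came from Google.
    """
    hits = Counter()
    for line in lines:
        match = LOG_PATTERN.match(line)
        if match is None:
            continue  # skip lines that are not in combined format
        if match["status"] == "404" and "Googlebot" in match["agent"]:
            hits[match["path"]] += 1
    return hits


if __name__ == "__main__":
    sample = [
        '66.249.71.200 - - [22/Aug/2012:02:31:53 -0400] "GET /robots.txt HTTP/1.0" 200 2255 "-" '
        '"Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"',
        '66.249.71.25 - - [22/Aug/2012:02:36:55 -0400] "GET /photo/pic24.html HTTP/1.1" 404 - "-" '
        '"Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"',
    ]
    for path, count in googlebot_404s(sample).items():
        print(path, count)
```

For a real log, pass an open file object instead of the sample list; the location of the access log depends on your server setup.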
-
Hi
This is a valid concern.
As Mat correctly stated, Googlebot is not easily manipulated.
Having said that, Googlebot impersonation is a sad fact. We recently released a Fake Googlebot study, in which we found that 21% of all Googlebot visits are made by impersonators: fairly "innocent" SEO tools used for competitor check-ups, various spammers, and even malicious scanners that use the Googlebot user agent to slip through the cracks and pave the way for a more serious attack (DDoS, IRA, etc.).
To identify your visitor, you can use Botopedia's "IP check tool"; it cross-verifies the IP and helps reveal most fake bots.
(I've already looked up 66.249.71.25 and it's legitimate; see the attached image.) Still, IPs can be spoofed.
So, if in doubt, I would take a "better safe than sorry" approach and advise you to look into free bad-bot protection services (there are several good ones). Good luck!
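Besides third-party tools, Google's own documentation describes a DNS-based way to verify Googlebot: run a reverse DNS lookup on the visiting IP, check that the host name is in googlebot.com or google.com, then run a forward lookup on that name and confirm it returns the same IP. Below is a minimal Python sketch of that check using the standard-library resolver. `is_real_googlebot` is a hypothetical helper, not part of any library, and the sketch handles IPv4 only because `socket.gethostbyname_ex` does not return IPv6 addresses.

```python
import socket

# Reverse-DNS suffixes Google documents for Googlebot hosts.
GOOGLEBOT_SUFFIXES = (".googlebot.com", ".google.com")


def is_real_googlebot(ip, reverse_lookup=socket.gethostbyaddr,
                      forward_lookup=socket.gethostbyname_ex):
    """Reverse-then-forward DNS check for an IPv4 visitor claiming to be Googlebot.

    The lookup functions are parameters only so the check can be tested
    without network access; by default they use the system resolver.
    """
    try:
        hostname, _aliases, _addrs = reverse_lookup(ip)
    except OSError:
        return False  # no PTR record, so the IP cannot be verified
    hostname = hostname.lower().rstrip(".")
    if not hostname.endswith(GOOGLEBOT_SUFFIXES):
        return False  # the PTR name is not in a Google crawler domain
    try:
        _name, _aliases, forward_ips = forward_lookup(hostname)
    except OSError:
        return False
    # Anyone can put "googlebot.com" in the PTR record of their own IP, so the
    # name must also resolve forward to the original address.
    return ip in forward_ips


# Usage (performs live DNS lookups; the answer depends on current DNS data):
#   is_real_googlebot("66.249.71.25")
```

The forward lookup is the important half: a reverse record alone is controlled by whoever owns the IP block, so it can claim any name.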
-
If anyone did manage to take control of Googlebot, they could find better uses for it than that.
It is much more likely that there are links to those URLs somewhere, quite possibly on someone else's site. Google follows each link to see what is there, then finds nothing. Because it works on a file-by-file basis rather than by directory, this can happen quite a lot.
If you want to stop it clogging up your error logs (and make sure Googlebot spends its crawling time on better content), just block that directory in your robots.txt file.
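A minimal sketch of that robots.txt rule, checked with Python's standard `urllib.robotparser`. The `/photo/` rule mirrors the folder from the question, and `build_parser` is a hypothetical helper. Keep in mind that robots.txt only stops crawlers that honour it; an impersonator using the Googlebot user agent can simply ignore it.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for the site in the question: block the
# non-existent /photo/ directory for every crawler, allow everything else.
ROBOTS_TXT = """\
User-agent: *
Disallow: /photo/
"""


def build_parser(robots_txt):
    """Parse robots.txt text without fetching it over the network."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


if __name__ == "__main__":
    rules = build_parser(ROBOTS_TXT)
    for path in ("/photo/pic24.html", "/index.html"):
        # can_fetch(user_agent, url) applies the parsed rules to one URL
        print(path, rules.can_fetch("Googlebot", path))
```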
Related Questions
-
Block Googlebot from submit button
Hi, I have a website where Googlebot runs many searches on our internal search engine. We can noindex the result pages, but we want to stop the bot from calling the AJAX search button (a GET form), because it passes a request to an external API that charges fees. So we want to stop the form button from being crawled, without noindexing the search page itself. The "nofollow" attribute doesn't seem to apply to a submit button. Any suggestions?
Intermediate & Advanced SEO | | Olivier_Lambert0 -
Received "Googlebot found an extremely high number of URLs on your site:" but most of the example URLs are noindexed.
An example URL can be found here: http://symptom.healthline.com/symptomsearch?addterm=Neck%20pain&addterm=Face&addterm=Fatigue&addterm=Shortness%20Of%20Breath A couple of questions: Why is Google reporting an issue with these URLs if they are marked as noindex? What is the best way to fix the issue? Thanks in advance.
Intermediate & Advanced SEO | | nicole.healthline0 -
Googlebot Can't Access My Sites After I Repair My Robots File
Hello Mozzers, a colleague and I have been jointly managing about 12 brands for the past several months, and we have recently received a number of messages in the sites' Webmaster Tools telling us that 'Googlebot was not able to access our site due to some errors with our robots.txt file'. My colleague and I, in turn, created new robots.txt files intended to prevent the spider from crawling our 'cgi-bin' directory, as follows: `User-agent: *` followed by `Disallow: /cgi-bin/`. After creating the robots.txt file and manually resubmitting it in Webmaster Tools (and receiving the green checkmark), I received the same message about Googlebot not being able to access the site; the only difference was that this time it was for a different site that I manage. I repeated the process and everything looked correct; however, I kept receiving these messages for each of the other sites I manage on a daily basis for roughly 10 days. Does anyone know why I may be receiving this error? Is it not possible for me to block Googlebot from crawling the 'cgi-bin'? Any advice or insight is very welcome; I hope I'm being descriptive enough!
Intermediate & Advanced SEO | | NiallSmith1 -
Robots.txt is blocking WordPress Pages from Googlebot?
I have a robots.txt file on my server that I did not create; it was set up by the company's previous web designer. There is also a WordPress plugin that generates a robots.txt file. How do I unblock all the WordPress pages for Googlebot?
Intermediate & Advanced SEO | | ENSO0 -
Why specify robots instead of googlebot for a Panda-affected site?
Daniweb is the poster child for sites that have recovered from Panda. I know one strategy she mentioned was de-indexing all of her tagged content, for example: http://www.daniweb.com/tags/database
Why do you think more Panda-affected sites aren't specifying 'googlebot' rather than 'robots', so that they keep capturing traffic from Bing & Yahoo?
Intermediate & Advanced SEO | | nicole.healthline0 -
How to find what Googlebot actually sees on a page?
1. When I disable JavaScript in Firefox and load our home page, the entire middle section is missing.
2. The global nav dropdown menu also does not display at all (with JavaScript disabled). I believe this is not good.
3. But when I type [website name] into Google search, click the cached version of the home page, and then click the text-only version, the global nav links display fine.
4. When I switch the user agent to Googlebot (using the Firefox plugin "User Agent Switcher"), the home page and global nav display fine.
Should I be worried about #1 and #2, then? How can I find out what Googlebot actually sees on a page? (I have tried "Fetch as Googlebot" in GWT; it displays the source code.) Thanks for the help! Supriya.
Intermediate & Advanced SEO | | Amjath0 -
Googlebot crawling partial URLs
Hi guys, I checked my email this morning and found a number of 404 errors from over the weekend, where Google tried to crawl some of my existing pages but didn't request the full URL. Instead of hitting 'domain.com/folder/complete-pagename.php', it hit 'domain.com/folder/comp'. This is definitely Googlebot/2.1; http://www.google.com/bot.html (66.249.72.53), but I can't find where it could have found the partial URL. It certainly wasn't on the domain it's crawling, and I can't find any links from external sites pointing to us with the incorrect URL. Googlebot is doing the same thing across a single domain, but in different sub-folders. Having checked Webmaster Tools, there aren't any hard 404s, and the soft ones aren't related and haven't occurred since August. I'm really confused as to how this is happening. Thanks!
Intermediate & Advanced SEO | | panini0 -
Googlebot HTTP 204 Status Code Handling?
If a user runs a search that returns no results and the server returns a 204 (No Content), will Googlebot treat that as the rough equivalent of a 404 or a noindex? If not, it seems one would want to noindex the page to avoid low-quality penalties, but that might require more back-and-forth with the server, which isn't ideal. Kurus
Intermediate & Advanced SEO | | kurus0