Manipulate Googlebot
-
**Problem: I have found something weird in the server log, shown below. Googlebot is visiting folders and files that do not exist at all. There is no /photo folder on the server, yet Googlebot requests files inside it and gets 404 errors.**
I wonder whether these are SEO hacking attempts, and how someone could manage to manipulate Googlebot.
==================================================
66.249.71.200 - - [22/Aug/2012:02:31:53 -0400] "GET /robots.txt HTTP/1.0" 200 2255 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
66.249.71.25 - - [22/Aug/2012:02:36:55 -0400] "GET /photo/pic24.html HTTP/1.1" 404 - "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
66.249.71.26 - - [22/Aug/2012:02:37:03 -0400] "GET /photo/pic20.html HTTP/1.1" 404 - "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
66.249.71.200 - - [22/Aug/2012:02:37:11 -0400] "GET /photo/pic22.html HTTP/1.1" 404 - "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
66.249.71.200 - - [22/Aug/2012:02:37:28 -0400] "GET /photo/pic19.html HTTP/1.1" 404 - "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
66.249.71.26 - - [22/Aug/2012:02:37:36 -0400] "GET /photo/pic17.html HTTP/1.1" 404 - "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
66.249.71.200 - - [22/Aug/2012:02:37:44 -0400] "GET /photo/pic21.html HTTP/1.1" 404 - "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
-
Hi
This is a valid concern.
As Mat correctly stated, Googlebot is not easily manipulated.
Having said that, Googlebot impersonation is a sad fact. Recently we released a Fake Googlebot study in which we found that 21% of all Googlebot visits are made by impersonators: fairly "innocent" SEO tools used for competitor check-ups, various spammers, and even malicious scanners that use the Googlebot user-agent to slip through the cracks and lay the path for a more serious attack to come (DDoS, IRA, etc.).
To identify your visitors, you can use Botopedia's "IP check tool"; it will cross-verify the IP and help reveal most fake bots.
(I've already searched for 66.249.71.25 and it's legit; see attached image.) Still, IPs can be spoofed.
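If you want to double-check an IP yourself without any third-party tool, Google's documented verification method is a reverse DNS lookup followed by a forward lookup. Here is a minimal Python sketch of that check (illustrative only; the IP comes from the log above):

```python
import socket

def is_real_googlebot(ip: str) -> bool:
    """Reverse/forward DNS check: genuine Googlebot IPs resolve to a
    *.googlebot.com or *.google.com hostname, and that hostname resolves
    back to the same IP. Anything else is an impersonator."""
    try:
        host = socket.gethostbyaddr(ip)[0]        # reverse DNS lookup
    except OSError:
        return False
    if not host.endswith((".googlebot.com", ".google.com")):
        return False
    try:
        return socket.gethostbyname(host) == ip   # forward-confirm the hostname
    except OSError:
        return False

# IP taken from the access log in the question above
print(is_real_googlebot("66.249.71.25"))
```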
So, if in doubt, I would take a "better safe than sorry" approach and advise you to look into free bad-bot protection services (there are several good ones). GL
-
If anyone did manage to get control of Googlebot, they could find better uses to put it to than that.
Much more likely is that there are links to those URLs somewhere; they may well be on someone else's site. Google is following each link to see what is there, then finding nothing. However, it works on a file-by-file basis rather than by directory, so it can happen quite a bit.
If you want to stop it clogging up your error logs (and ensure that Googlebot's crawl cycles are spent indexing better stuff), just block that directory in your robots.txt file.
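For example, a minimal robots.txt rule that keeps compliant crawlers out of the phantom folder would look like this (adjust the path to whatever actually shows up in your logs):

```
User-agent: *
Disallow: /photo/
```

This only stops the crawling; the stray links on other sites will still exist, but well-behaved bots such as Googlebot will no longer follow them into the missing directory.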
Related Questions
-
Crawl and Indexation Error - Googlebot can't/doesn't access specific folders on microsites
Hi, my first time posting here. I am just looking for some feedback on an indexation issue we have with a client, and on possible next steps or items I may have overlooked. To give some background, our client operates a website for the core brand and also a number of microsites based on specific business units, so you have corewebsite.com along with bu1.corewebsite.com, bu2.corewebsite.com. The content structure isn't ideal, as each microsite follows a structure of bu1.corewebsite.com/bu1/home.aspx, bu2.corewebsite.com/bu2/home.aspx and so on. In addition, each microsite has duplicate folders from the other microsites, so bu1.corewebsite.com has the indexable folder bu1.corewebsite.com/bu1/home.aspx but also bu1.corewebsite.com/bu2/home.aspx; the same with bu2.corewebsite.com, which has bu2.corewebsite.com/bu2/home.aspx but also bu2.corewebsite.com/bu1/home.aspx. There are 5 different business units, so you have this duplicate content scenario across all microsites. This situation is being addressed in the medium-term development roadmap and will be rectified in the next iteration of the site, but that is still a ways out.

The issue: About 6 weeks ago we noticed a drop-off in search rankings for two of our microsites (bu1.corewebsite.com and bu2.corewebsite.com). Over a period of 2-3 weeks pretty much all our terms dropped out of the rankings and search visibility dropped to essentially 0. I can see that pages from the websites are still indexed, but oddly it is the duplicate content pages, so bu1.corewebsite.com/bu3/home.aspx or bu1.corewebsite.com/bu4/home.aspx is still indexed; similarly, on the bu2.corewebsite microsite, bu2.corewebsite.com/bu3/home.aspx and bu4.corewebsite.com/bu3/home.aspx are indexed, but no pages from the BU1 or BU2 content directories seem to be indexed under their own microsites. Logging into webmaster tools I can see there is a "Google couldn't crawl your site because we were unable to access your site's robots.txt file." error. This was a bit odd as there was no robots.txt in the root directory, but I got some weird results when I checked the BU1/BU2 microsites in technicalseo.com's robots.txt tool. Also, because there is a redirect from bu1.corewebsite.com/ to bu1.corewebsite.com/bu4.aspx, I thought maybe there could be something there, so we removed the redirect and added a basic robots.txt to the root directory for both microsites. After this we saw a small pickup in site visibility; a few terms popped into our Moz campaign rankings but dropped out again pretty quickly. Also, the error message in GSC persisted.

Steps taken so far after that:
1. In Google Search Console, I confirmed there are no manual actions against the microsites.
2. Confirmed there are no instances of noindex on any of the pages for BU1/BU2.
3. A number of the main links from the root domain to microsites BU1/BU2 have a rel="noopener noreferrer" attribute, but we looked into this and found it has no impact on indexation.
4. Looking into this issue we saw some people had similar issues when using Cloudflare, but our client doesn't use this service.
5. Using a response/redirect header checker tool, we noticed a timeout when trying to mimic Googlebot accessing the site.
6. Following on from point 5, we got hold of a week of server logs from the client, and I can see Googlebot successfully pinging the site and not getting 500 response codes from the server... but couldn't see any instance of it trying to index microsite BU1/BU2 content.

So it seems to me that the issue could be something server-side, but I'm at a bit of a loss for next steps to take. Any advice at all is much appreciated!
Intermediate & Advanced SEO | ImpericMedia
-
How do I know if I am correctly solving an uppercase url issue that may be affecting Googlebot?
We have a large e-commerce site (10k+ SKUs), https://www.flagandbanner.com. As I have begun analyzing how to improve it, I have discovered that we have thousands of URLs containing uppercase characters, for instance: https://www.flagandbanner.com/Products/patriotic-paper-lanterns-string-lights.asp. This is inconsistently applied throughout the site. I directed our website vendor to fix the issue and they placed 301 redirects via a rule in the web.config file. Any URL that contains an uppercase character now redirects to a lowercase version. However, as I use Screaming Frog to monitor our site, I see all these 301 redirects, thousands of them. The XML sitemap still shows the uppercase versions. We have had indexing issues as well. So I'm wondering, what is the most effective way to make sure that I'm not placing an extra burden on Googlebot when it indexes our site? Should I have just not cared about the uppercase issue and left it alone?
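For reference, the kind of web.config rule a vendor typically adds for this is the standard IIS URL Rewrite lowercase pattern. The sketch below is illustrative only, not the vendor's actual rule:

```xml
<!-- Illustrative sketch: common IIS URL Rewrite rule for forcing lowercase URLs -->
<system.webServer>
  <rewrite>
    <rules>
      <rule name="Redirect to lowercase" stopProcessing="true">
        <!-- Match any requested path that contains an uppercase letter -->
        <match url=".*[A-Z].*" ignoreCase="false" />
        <!-- 301-redirect to the lowercased equivalent of the matched path -->
        <action type="Redirect" url="{ToLower:{R:0}}" redirectType="Permanent" />
      </rule>
    </rules>
  </rewrite>
</system.webServer>
```

{ToLower:} is a built-in function of the IIS URL Rewrite module, and {R:0} is the full matched path.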
Intermediate & Advanced SEO | webrocket
-
Https Homepage Redirect & Issue with Googlebot Access
Hi all, I have a question about Google correctly accessing a site that has a 301 redirect to https on the homepage. Here's an overview of the situation, and I'd really appreciate any insight from the community on what the issue might be.

Background info: My homepage is set up as a 301 redirect to an https version of the homepage (some users log in, so we need the SSL). Only 2 pages on the site are under SSL and the rest of the site is http. We switched to the SSL in July but have not seen any change in our rankings despite efforts increasing backlinks and output of content. Even though Google has indexed the SSL page of the site, it appears that it is not linking up the SSL page with the rest of the site in its search and tracking.

Why do we think this is the case? The diagnosis:

1) When we do a Google Fetch on our http homepage, it appears that Google is only reading the 301 redirect instructions (as shown below) and is not finding its way over to the SSL page, which has all the correct page title and meta information.

```
HTTP/1.1 301 Moved Permanently
Date: Fri, 08 Nov 2013 17:26:24 GMT
Server: Apache/2.2.16 (Debian)
Location: https://mysite.com/
Vary: Accept-Encoding
Content-Encoding: gzip
Content-Length: 242
Keep-Alive: timeout=15, max=100
Connection: Keep-Alive
Content-Type: text/html; charset=iso-8859-1

<title>301 Moved Permanently</title>
<h1>Moved Permanently</h1>
<p>The document has moved <a href="https://mysite.com/">here</a>.</p>
<hr>
<address>Apache/2.2.16 (Debian) Server at mysite.com</address>
```

2) When we view a list of external backlinks to our homepage, it appears that the backlinks built after we switched to the SSL homepage have been separated from the backlinks built before the SSL. Even on Open Site, we are only seeing the backlinks that were achieved before we switched to the SSL, and we are not able to track any backlinks that have been added after the SSL switch. This leads us to believe that the new links are not adding any value to our search rankings.

3) When viewing Google Webmaster Tools, we receive no information about our homepage, only all the non-https pages. I added an https account to Google Webmaster Tools, and in that version we ONLY receive the information about our homepage (and the other SSL page on the site).

What is the problem? My concern is that we need to do something specific with our sitemap or with the 301 redirect itself in order for Google to read the whole site as one entity and attribute the reporting/backlinks to one site. Again, Google is indexing all of our pages, but it seems to be doing so in a disjointed way that is breaking down the link juice and value being built up by our SSL homepage. Can anybody help? Thank you for any advice or input you might be able to offer. -Greg
Intermediate & Advanced SEO | G.Anderson
-
Received "Googlebot found an extremely high number of URLs on your site:" but most of the example URLs are noindexed.
An example URL can be found here: http://symptom.healthline.com/symptomsearch?addterm=Neck%20pain&addterm=Face&addterm=Fatigue&addterm=Shortness%20Of%20Breath A couple of questions: Why is Google reporting an issue with these URLs if they are marked as noindex? What is the best way to fix the issue? Thanks in advance.
Intermediate & Advanced SEO | nicole.healthline
-
Best way to view Global Navigation bar from GoogleBot's perspective
Hi, links in the global navigation bar of our website do not show up when we look at the Google cache (text-only version) of the page. These links use style="display:none;" when we look at the HTML source. But if I use the "User Agent Switcher" add-on in Firefox and set it to Googlebot, the links in the global nav are displayed. I am wondering what is the best way to find out whether Google can or cannot see the links. Thanks for the help! Supriya.
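One low-tech way to compare what the server returns to a normal browser versus a Googlebot user-agent is to fetch the page with each user-agent string and compare the HTML. A minimal Python sketch follows (the URL is a placeholder, and this only shows the raw HTML the server sends, not how Google renders or executes it):

```python
import urllib.request

URL = "https://www.example.com/"  # placeholder: the page you want to test

UA_GOOGLEBOT = "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
UA_BROWSER = "Mozilla/5.0 (Windows NT 10.0; Win64; x64)"

def fetch(url: str, user_agent: str) -> str:
    """Return the raw HTML the server sends for a given User-Agent."""
    req = urllib.request.Request(url, headers={"User-Agent": user_agent})
    with urllib.request.urlopen(req) as resp:
        return resp.read().decode("utf-8", errors="replace")

as_googlebot = fetch(URL, UA_GOOGLEBOT)
as_browser = fetch(URL, UA_BROWSER)

# If the nav markup appears in one response but not the other, the server is
# serving different HTML per user-agent, which is worth investigating.
print("display:none in Googlebot response:", "display:none" in as_googlebot)
print("display:none in browser response:  ", "display:none" in as_browser)
```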
Intermediate & Advanced SEO | SShiyekar
-
Googlebot on paywall made with cookies and local storage
My question is about paywalls made with cookies and local storage. We are changing a website with free content to an open paywall with a 5-article weekly view limit. The paywall is made to work with cookies and local storage. The article views are stored in local storage, but you have to have cookies enabled so that you can read the free articles. If you don't have cookies enabled, we serve an error page (otherwise the paywall would be easy to bypass). Can you say how this affects SEO? We would still like Google to index all the article pages that it does now. Would it be cloaking if we treated Googlebot differently, so that when it does not have cookies enabled it would still be able to index the page?
Intermediate & Advanced SEO | OPU
-
Googlebot found an extremely high number of URLs on your site
I keep getting the "Googlebot found an extremely high number of URLs on your site" message in GWMT for one of the sites that I manage. The error is as below:

"Googlebot encountered problems while crawling your site. Googlebot encountered extremely large numbers of links on your site. This may indicate a problem with your site's URL structure. Googlebot may unnecessarily be crawling a large number of distinct URLs that point to identical or similar content, or crawling parts of your site that are not intended to be crawled by Googlebot. As a result Googlebot may consume much more bandwidth than necessary, or may be unable to completely index all of the content on your site."

I understand the nature of the message; the site uses faceted navigation and is genuinely generating a lot of duplicate pages. However, in order to stop this from becoming an issue we do the following (examples of both are shown below):

- No-index a large number of pages using the on-page meta tag.
- Use a canonical tag where it is appropriate.

But we still get the error, and a lot of the example pages that Google suggests are affected by the issue are actually pages with the no-index tag. So my question is: how do I address this problem? I'm thinking that as it's a crawling issue, the solution might involve the no-follow meta tag. Any suggestions appreciated.
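For reference, the two measures described above typically look like this in a page's head (the URL is a placeholder):

```html
<!-- Illustrative only: the noindex meta tag and canonical link described above -->
<head>
  <!-- Tells crawlers not to index this faceted page (they can still crawl it) -->
  <meta name="robots" content="noindex, follow">
  <!-- Points duplicate variations at the preferred version of the page -->
  <link rel="canonical" href="https://www.example.com/category/widgets/">
</head>
```

Note that a crawler has to fetch a page before it can see its noindex tag, so the tag by itself does not reduce how many URLs Googlebot discovers and crawls.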
Intermediate & Advanced SEO | BenFox
-
How to find what Googlebot actually sees on a page?
1. When I disable JavaScript in Firefox and load our home page, it is missing the entire middle section.
2. Also, the global nav dropdown menu does not display at all (with JavaScript disabled). I believe this is not good.
3. But when I type the website name into Google search, click on the cached version of the home page, and then click on the text-only version, it displays the global nav links fine.
4. When I switch the user agent to Googlebot (using the Firefox plugin "User Agent Switcher"), the home page and global nav display fine.

Should I be worried about #1 and #2 then? How do I find what Googlebot actually sees on a page? (I have tried "Fetch as Googlebot" from GWT. It displays source code.) Thanks for the help! Supriya.
Intermediate & Advanced SEO | Amjath