Manipulate Googlebot
-
**Problem: I have found something weird in the server log, shown below. Googlebot is visiting folders and files that do not exist at all. There is no photo folder on the server, yet Googlebot requests files inside the photo folder and gets 404 errors.**
I wonder if this is an SEO hacking attempt, and how someone could manage to manipulate Googlebot.
==================================================
**66.249.71.200 - - [22/Aug/2012:02:31:53 -0400] "GET /robots.txt HTTP/1.0" 200 2255 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"**
**66.249.71.25 - - [22/Aug/2012:02:36:55 -0400] "GET /photo/pic24.html HTTP/1.1" 404 - "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"**
**66.249.71.26 - - [22/Aug/2012:02:37:03 -0400] "GET /photo/pic20.html HTTP/1.1" 404 - "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"**
**66.249.71.200 - - [22/Aug/2012:02:37:11 -0400] "GET /photo/pic22.html HTTP/1.1" 404 - "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"**
**66.249.71.200 - - [22/Aug/2012:02:37:28 -0400] "GET /photo/pic19.html HTTP/1.1" 404 - "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"**
**66.249.71.26 - - [22/Aug/2012:02:37:36 -0400] "GET /photo/pic17.html HTTP/1.1" 404 - "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"**
**66.249.71.200 - - [22/Aug/2012:02:37:44 -0400] "GET /photo/pic21.html HTTP/1.1" 404 - "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"**
-
Hi
This is a valid concern.
As Mat correctly stated, Googlebot is not easily manipulated.
Having said that, Googlebot impersonation is a sad fact. Recently we released a Fake Googlebot study in which we found that 21% of all Googlebot visits are made by impersonators - fairly "innocent" SEO tools used for competition check-ups, various spammers, and even malicious scanners that use the Googlebot user-agent to try to slip in between the cracks and lay a path for a more serious attack to come (DDoS, IRA, etc.).
To identify your visitors you can use Botopedia's "IP check tool" - it will cross-verify the IP and help reveal most fake bots.
(I've already searched for 66.249.71.25 and it's legit - see attached image.) Still, IPs can be spoofed.
So, if in doubt, I would promote a "better safe than sorry" approach and advise you to look into free bad-bot protection services (there are several good ones). GL
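For completeness: Google's documented way to verify a Googlebot visit is a reverse DNS lookup followed by a forward confirmation. Below is a minimal Python sketch of that check, using the 66.249.71.25 address from the log above; a production version would want caching and timeout handling:

```python
import socket

def is_real_googlebot(ip):
    """Verify a claimed Googlebot IP via reverse + forward DNS."""
    try:
        # Reverse lookup, e.g. 66.249.71.25 -> crawl-66-249-71-25.googlebot.com
        host = socket.gethostbyaddr(ip)[0]
    except socket.herror:
        return False  # no reverse DNS record at all
    if not host.endswith((".googlebot.com", ".google.com")):
        return False  # hostname doesn't belong to Google
    try:
        # Forward lookup must map back to the original IP
        return socket.gethostbyname(host) == ip
    except socket.gaierror:
        return False

print(is_real_googlebot("66.249.71.25"))  # True for the genuine crawler
```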
-
If anyone did manage to get control of Googlebot, they could find better uses to put it to than that.
Much more likely is that there are links to those URLs somewhere - they may well be on someone else's site. Google is following the links to see what is there, then finding nothing. However, it works on a file-by-file basis rather than by directory, so this can happen quite a bit.
If you want to stop it clogging up your error logs (and ensure that Googlebot's crawl cycles are spent indexing better stuff), just block that directory in your robots.txt file.
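For the log above, that block is only a couple of lines; a minimal sketch, assuming the phantom requests all target the /photo/ directory as in the excerpt:

```
User-agent: *
Disallow: /photo/
```

This stops the crawl noise; since the URLs return 404 anyway, there is nothing indexed that needs removing.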
Related Questions
-
Googlebot being redirected but not users?
Hi, We seem to have a slightly odd issue. We noticed that a number of our location category pages were slipping off page 1 and onto page 2 in our niche. On inspection, we noticed that our Arizona page had started ranking in place of a number of other location pages - Cali, Idaho, NJ, etc. Weirdly, the pages they had replaced were no longer indexed, and would remain so despite being fetched, tweeted, etc. One test was to see when the dropped-out pages had last been crawled, or at least cached. When running 'cache:domain.com/category/location' on these pages, we were getting 301 redirected to, you guessed it, the Arizona page. Very odd. However, the dropped-out pages were serving 200 OK when run through header checker tools, Screaming Frog, etc. On the face of it, it would seem Googlebot is getting redirected when it hits a number of our key location pages, but users are not (a user-agent comparison like the one sketched after this question can confirm that). Has anyone experienced anything like this? The theming of the pages is quite different in terms of content, meta, etc. Thanks.
Intermediate & Advanced SEO | Sayers
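A quick way to test the scenario described above is to request the same URL with a browser user-agent and with Googlebot's user-agent string and compare the responses. A minimal Python sketch using the requests library; the URL is a hypothetical stand-in for one of the affected location pages:

```python
import requests

# Hypothetical URL standing in for one of the affected location pages
url = "https://example.com/category/idaho"

user_agents = {
    "browser": "Mozilla/5.0 (Windows NT 10.0; Win64; x64)",
    "googlebot": "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)",
}

for name, ua in user_agents.items():
    # allow_redirects=False surfaces a 301/302 instead of silently following it
    resp = requests.get(url, headers={"User-Agent": ua}, allow_redirects=False)
    print(name, resp.status_code, resp.headers.get("Location", "-"))
```

If the redirect keys on Googlebot's real IP ranges rather than the user-agent string, this test will come back clean even though Googlebot is still being redirected; in that case, server logs or Fetch as Google are the next stop.
-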
How does Googlebot evaluate performance/page speed on Isomorphic/Single Page Applications?
I'm curious how Google evaluates pagespeed for SPAs. Initial payloads are inherently large (resulting in 5+ second load times), but subsequent requests are lightning fast, as these requests are handled by JS fetching data from the backend. Does Google evaluate pages on a URL-by-URL basis, looking at the initial payload (and "slow"-ish load time) for each? Or do they load the initial JS+HTML and then continue to crawl from there? Another way of putting it: is Googlebot essentially "refreshing" for each page and therefore associating each URL with a higher load time? Or will pages that are crawled after the initial payload benefit from the speedier load time? Any insight (or speculation) would be much appreciated.
Intermediate & Advanced SEO | mothner
-
Do CTR manipulation services actually work to improve rankings?
I've seen a variety of services on the fringe of the SEO world that send a flow of (fake) traffic to your website via Google, to drive up your SERP CTR and site engagement. Seems gray hat, but I'm curious as to whether it actually works. The latest data I've seen from trustworthy sources (example and example 2) seems mixed on whether CTR has a direct impact on search rankings. Google claims it doesn't. I think it's possible it directly impacts rankings, or it's possible Google is using some other metric to reward high-engagement pages and CTR correlates with that. Any insight on whether CTR manipulation services actually work?
Intermediate & Advanced SEO | AdamThompson
-
After Receiving a "Googlebot can't access your site" would this stop your site from being crawled?
Hi Everyone, A few weeks ago I received a "Googlebot can't access your site..... connection failure rate is 7.8%" message from Webmaster Tools. I have since fixed the majority of these issues, but I've noticed that all pages except the main home page now have a PageRank of N/A, while the home page still has a PageRank of 5. Have these connectivity issues reduced the PageRanks to N/A, or is it something else I'm missing? Thanks in advance.
Intermediate & Advanced SEO | AMA-DataSet
-
Re-Direct Users But Don't Affect Googlebot
This is a fairly technical question... I have a site which has 4 subdomains, all targeting a specific language. The brand owners don't want German users to see the prices on the French subdomain, and are forcing users onto the relevant subdomain based on their IP address. If a user comes from a different country (i.e. the US), they are forced onto the UK subdomain. The client is insistent on keeping control of who sees what (I know that's a debate in its own right), but the redirects we're implementing to make that happen are really making it difficult to get all the subdomains indexed, as I think Googlebot is also getting redirected and is failing to do its job. Is there a way of redirecting users, but not Googlebot?
Intermediate & Advanced SEO | eventurerob
-
Googlebot Can't Access My Sites After I Repair My Robots File
Hello Mozzers, A colleague and I have been collectively managing about 12 brands for the past several months, and we have recently received a number of messages in the sites' Webmaster Tools instructing us that 'Googlebot was not able to access our site due to some errors with our robots.txt file'. My colleague and I, in turn, created new robots.txt files with the intention of preventing the spider from crawling our 'cgi-bin' directory, as follows: User-agent: * Disallow: /cgi-bin/ After creating the robots.txt and manually re-submitting it in Webmaster Tools (and receiving the green checkbox), I received the same message about Googlebot not being able to access the site, the only difference being that this time it was for a different site that I manage. I repeated the process, and everything aesthetically looked correct; however, I continued receiving these messages for each of the other sites I manage on a daily basis for roughly a 10-day period. Do any of you know why I may be receiving this error? Is it not possible for me to block Googlebot from crawling the 'cgi-bin'? Any and all advice/insight is very much welcome - I hope I'm being descriptive enough! (A way to sanity-check a robots.txt like this is sketched after this question.)
Intermediate & Advanced SEO | NiallSmith
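As a side note for anyone debugging the same message: a robots.txt like the two-line one quoted in this question can be sanity-checked with Python's standard-library robots parser. A minimal sketch, with example.com as a hypothetical placeholder domain:

```python
from urllib.robotparser import RobotFileParser

# example.com is a placeholder; point this at the real site's robots.txt
rp = RobotFileParser()
rp.set_url("https://example.com/robots.txt")
rp.read()

# With "User-agent: *" / "Disallow: /cgi-bin/" the expectations are:
print(rp.can_fetch("Googlebot", "https://example.com/"))                  # True
print(rp.can_fetch("Googlebot", "https://example.com/cgi-bin/test.cgi"))  # False
```
-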
Lots of incorrect URLs indexed - Googlebot found an extremely high number of URLs on your site
Hi, Any assistance would be greatly appreciated. Basically, our rankings and traffic have been dropping massively recently, and Google sent us a message stating "Googlebot found an extremely high number of URLs on your site". This first highlighted the problem: for some reason our eCommerce site has recently generated loads (potentially thousands) of rubbish URLs, giving us duplication everywhere, which Google is obviously penalizing us for in terms of dropping rankings. Our developer is trying to find the root cause of this, but my concern is: how do we get rid of all these bogus URLs? If we use GWT to remove URLs, it's going to take years. We have just amended our robots.txt file to exclude them going forward, but they have already been indexed, so I need to know: do we put a 301 redirect on them, or an HTTP 404 code to tell Google they don't exist? Do we also put a noindex on the pages, or what? What is the best solution? A couple of examples of our problems are here: In Google, type - site:bestathire.co.uk inurl:"br" You will see 107 results. This is one of many lots we need to get rid of. Also - site:bestathire.co.uk intitle:"All items from this hire company" shows 25,300 indexed pages we need to get rid of. Another thing to help tidy this mess up going forward is to improve our pagination work. Our site uses rel=next and rel=prev but no canonical. As a belt-and-braces approach, should we also put canonical tags on our category pages where there is more than one page? I was thinking of doing it on page 1 of our most important pages, or the view-all page, or both. What's the general consensus? Any advice on both points greatly appreciated. Thanks, Sarah.
Intermediate & Advanced SEO | SarahCollins
-
How to find what Googlebot actually sees on a page?
1. When I disable JavaScript in Firefox and load our home page, it is missing the entire middle section. 2. Also, the global nav dropdown menu does not display at all (with JavaScript disabled). I believe this is not good. 3. But when I type the website name into Google search and click on the cached version of the home page, and then click on the text-only version, it displays the global nav links fine. 4. When I switch the user agent to Googlebot (using the Firefox plugin "User Agent Switcher"), the home page and global nav display fine. Should I be worried about #1 and #2 then? How do I find what Googlebot actually sees on a page? (I have tried "Fetch as Googlebot" from GWT. It displays source code.) Thanks for the help! Supriya.
Intermediate & Advanced SEO | Amjath