Manipulate Googlebot
-
**Problem: I have found something weird in the server log, shown below. Googlebot is visiting folders and files that do not exist at all. There is no photo folder on the server, yet Googlebot requests files inside the photo folder and gets 404 errors.**
I wonder if these are SEO hacking attempts, and how someone could manage to manipulate Googlebot.
```
66.249.71.200 - - [22/Aug/2012:02:31:53 -0400] "GET /robots.txt HTTP/1.0" 200 2255 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
66.249.71.25 - - [22/Aug/2012:02:36:55 -0400] "GET /photo/pic24.html HTTP/1.1" 404 - "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
66.249.71.26 - - [22/Aug/2012:02:37:03 -0400] "GET /photo/pic20.html HTTP/1.1" 404 - "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
66.249.71.200 - - [22/Aug/2012:02:37:11 -0400] "GET /photo/pic22.html HTTP/1.1" 404 - "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
66.249.71.200 - - [22/Aug/2012:02:37:28 -0400] "GET /photo/pic19.html HTTP/1.1" 404 - "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
66.249.71.26 - - [22/Aug/2012:02:37:36 -0400] "GET /photo/pic17.html HTTP/1.1" 404 - "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
66.249.71.200 - - [22/Aug/2012:02:37:44 -0400] "GET /photo/pic21.html HTTP/1.1" 404 - "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
```
-
Hi
This is a valid concern.
As Mat correctly stated, Googlebot is not easily manipulated.
Having said that, Googlebot impersonation is a sad fact. We recently released a Fake Googlebot study in which we found that 21% of all Googlebot visits are made by impersonators - fairly "innocent" SEO tools used for competitor check-ups, various spammers, and even malicious scanners that use the Googlebot user-agent to try to slip in between the cracks and lay a path for a more serious attack to come (DDoS, IRA, etc.).
To identify your visitors, you can use Botopedia's "IP check tool" - it will cross-verify the IP and help reveal most fake bots.
(I've already searched for 66.249.71.25 and it's legit - see attached image.) Still, IPs can be spoofed.
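Because the user-agent string is trivially spoofed, the reliable test is Google's documented two-step DNS check: reverse-resolve the IP, confirm the hostname ends in googlebot.com or google.com, then forward-resolve that hostname and confirm it maps back to the same IP. A minimal Python sketch of that check, using one of the IPs from the log above:

```python
import socket

def is_real_googlebot(ip: str) -> bool:
    """Google's documented reverse-DNS / forward-DNS double check."""
    try:
        # Reverse lookup: a genuine Googlebot IP resolves to a
        # *.googlebot.com or *.google.com hostname.
        hostname, _, _ = socket.gethostbyaddr(ip)
        if not hostname.endswith((".googlebot.com", ".google.com")):
            return False
        # Forward lookup: the hostname must resolve back to the same IP,
        # otherwise the PTR record itself was forged.
        return ip in socket.gethostbyname_ex(hostname)[2]
    except (socket.herror, socket.gaierror):
        return False

print(is_real_googlebot("66.249.71.25"))  # expect True for a genuine crawl IP
```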
So, if in doubt, I would promote a "better safe than sorry" approach and advise you to look into free bad-bot protection services (there are several good ones). GL
-
If anyone did manage to get control of Googlebot, they could find better uses to put it to than that.
Much more likely is that there are links to those URLs somewhere - they may well be on someone else's site. Google is following each link to see what is there, then finding nothing. It works on a file-by-file basis rather than by directory, so this can happen quite a bit.
If you want to stop it clogging up your error logs (and ensure that Googlebot's crawl cycles are spent indexing better stuff), just block that directory in your robots.txt file, as in the sketch below.
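For example, assuming the phantom requests all land under /photo/ as in the log above, a minimal robots.txt rule would be:

```
User-agent: *
Disallow: /photo/
```

Rogue bots that fake the Googlebot user-agent will ignore this, but the real Googlebot respects it and will stop requesting the directory.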
Related Questions
-
Suggested Screaming Frog configuration to mirror default Googlebot crawl?
Hi All, Does anyone have a suggested Screaming Frog (SF) configuration to mirror a default Googlebot crawl? I want to test my site and see if it will return 429 "Too Many Requests" to Google. I have set the User Agent as Googlebot (Smartphone). Is the default SF Menu > Configuration > Speed > Max Threads 5 and Max URLs 2.0 comparable to Googlebot? Context: I had tried NetPeak SEO Spider, which did a nice job and had a cool feature that would pause a crawl if it got too many 429s. Long story short, the B2B site threw 429 errors when there should have been no load, on a holiday weekend at 1:00 AM.
Intermediate & Advanced SEO | gravymatt-se
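Outside of Screaming Frog, the 429 test described above can also be scripted. Here is a minimal Python sketch - the URL list is a hypothetical stand-in, and the Googlebot Smartphone user-agent string may differ from Google's current one - that fetches pages on 5 concurrent threads and flags rate-limited responses:

```python
# Minimal sketch of the 429 test: request a batch of URLs with 5 worker
# threads (mirroring the Max Threads setting above) while presenting a
# Googlebot Smartphone user-agent, and report any 429 responses.
from concurrent.futures import ThreadPoolExecutor
import requests

UA = ("Mozilla/5.0 (Linux; Android 6.0.1; Nexus 5X Build/MMB29P) "
      "AppleWebKit/537.36 (KHTML, like Gecko) Chrome/41.0.2272.96 "
      "Mobile Safari/537.36 (compatible; Googlebot/2.1; "
      "+http://www.google.com/bot.html)")

urls = [f"https://www.example.com/page-{i}" for i in range(1, 101)]  # hypothetical

def fetch(url: str):
    r = requests.get(url, headers={"User-Agent": UA}, timeout=10)
    return url, r.status_code

with ThreadPoolExecutor(max_workers=5) as pool:
    for url, status in pool.map(fetch, urls):
        if status == 429:
            print("Rate limited:", url)
```
-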
What IP Address does Googlebot use to read your site when coming from an external backlink?
Hi All, I'm trying to find more information on what IP address Googlebot would use when arriving to crawl your site from an external backlink. I'm under the impression Googlebot uses international signals to determine the best IP address to use when crawling (US / non-US) and then carries on with that IP when it arrives at your website. E.g. Googlebot finds www.example.co.uk. Due to the ccTLD, it decides to crawl the site with a UK IP address rather than a US one. As it crawls this UK site, it finds a subdirectory backlink to your website and continues to crawl your website with the aforementioned UK IP address. Is this a correct assumption, or does Googlebot look at altering the IP address as it enters a backlink / new domain? Also, are ccTLDs the main signal used to determine the possibility of Google switching to an international IP address to crawl, rather than the standard US one? Am I right in saying that hreflang tags don't apply here at all, as their purpose is to be used in SERPs, helping Google determine which page to serve to users based on their IP etc.? If anyone has any insight this would be great.
Intermediate & Advanced SEO | MattBassos
-
Do CTR manipulation services actually work to improve rankings?
I've seen a variety of services on the fringe of the SEO world that send a flow of (fake) traffic to your website via Google, to drive up your SERP CTR and site engagement. Seems gray hat, but I'm curious as to whether it actually works. The latest data I've seen from trustworthy sources (example and example 2) seems mixed on whether CTR has a direct impact on search rankings. Google claims it doesn't. I think it's possible it directly impacts rankings, or it's possible Google is using some other metric to reward high-engagement pages and CTR correlates with that. Any insight on whether CTR manipulation services actually work?
Intermediate & Advanced SEO | AdamThompson
-
Mobile Googlebot vs Desktop Googlebot - GWT reports - Crawl errors
Hi Everyone, I have a very specific SEO question. I am doing a site audit and one of the crawl reports is showing tons of 404s for the "smartphone" bot, with very recent crawl dates. If our website is responsive and we do not have a mobile version of the website, I do not understand why the desktop version of the report has tons of 404s and yet the smartphone one does not. I think I am not understanding something conceptually. I think it has something to do with this little message in the Mobile crawl report: "Errors that occurred only when your site was crawled by Googlebot (errors didn't appear for desktop)." If I understand correctly, the "smartphone" report will only show URLs that are not on the desktop report. Is this correct?
Intermediate & Advanced SEO | Carla_Dawson
-
Why would our server return a 301 status code when Googlebot visits from one IP, but a 200 from a different IP?
I have begun a daily process of analyzing a site's Web server log files and have noticed something that seems odd. There are several IP addresses from which Googlebot crawls that our server returns a 301 status code for every request, consistently, day after day. In nearly all cases, these are not URLs that should 301. When Googlebot visits from other IP addresses, the exact same pages are returned with a 200 status code. Is this normal? If so, why? If not, why not? I am concerned that our server returning an inaccurate status code is interfering with the site being effectively crawled as quickly and as often as it might be if this weren't happening. Thanks guys!
Intermediate & Advanced SEO | danatanseo
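For the daily log analysis described above, a small script makes the per-IP pattern easy to see. A rough Python sketch - access.log is a hypothetical path, and the regex assumes Apache/nginx combined log format:

```python
# Scan an access log, keep only requests whose user-agent claims to be
# Googlebot, and count status codes per client IP so that "IP X always
# gets a 301, IP Y always gets a 200" jumps out.
import re
from collections import Counter, defaultdict

# combined format: ip ident user [time] "request" status bytes "referer" "user-agent"
LOG_LINE = re.compile(r'^(\S+) \S+ \S+ \[.*?\] "[^"]*" (\d{3}) .*"([^"]*)"$')

status_by_ip = defaultdict(Counter)

with open("access.log") as log:  # hypothetical path - point this at the real log
    for line in log:
        m = LOG_LINE.match(line)
        if m and "Googlebot" in m.group(3):
            status_by_ip[m.group(1)][m.group(2)] += 1

for ip, counts in sorted(status_by_ip.items()):
    print(ip, dict(counts))  # e.g. 66.249.71.25 {'200': 312, '301': 4}
```
-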
HTTPS Homepage Redirect & Issue with Googlebot Access
Hi All, I have a question about Google correctly accessing a site that has a 301 redirect to https on the homepage. Here's an overview of the situation, and I'd really appreciate any insight from the community on what the issue might be.
**Background info:** My homepage is set up as a 301 redirect to an https version of the homepage (some users log in, so we need the SSL). Only 2 pages on the site are under SSL and the rest of the site is http. We switched to the SSL in July but have not seen any change in our rankings, despite efforts increasing backlinks and output of content. Even though Google has indexed the SSL page of the site, it appears that it is not linking up the SSL page with the rest of the site in its search and tracking. Why do we think this is the case?
**The diagnosis:**
1) When we do a Google Fetch on our http homepage, it appears that Google is only reading the 301 redirect instructions (as shown below) and is not finding its way over to the SSL page, which has all the correct page title and meta information.
```
HTTP/1.1 301 Moved Permanently
Date: Fri, 08 Nov 2013 17:26:24 GMT
Server: Apache/2.2.16 (Debian)
Location: https://mysite.com/
Vary: Accept-Encoding
Content-Encoding: gzip
Content-Length: 242
Keep-Alive: timeout=15, max=100
Connection: Keep-Alive
Content-Type: text/html; charset=iso-8859-1

<title>301 Moved Permanently</title>
<h1>Moved Permanently</h1>
<p>The document has moved <a href="https://mysite.com/">here</a>.</p>
<hr>
<address>Apache/2.2.16 (Debian) Server at mysite.com</address>
```
2) When we view a list of external backlinks to our homepage, it appears that the backlinks built after we switched to the SSL homepage have been separated from the backlinks built before the SSL. Even on Open Site, we are only seeing the backlinks that were achieved before we switched to the SSL, and not getting to track any backlinks that have been added after the SSL switch. This leads us to believe that the new links are not adding any value to our search rankings.
3) When viewing Google Webmaster, we receive no information about our homepage, only all the non-https pages. I added an https account to Google Webmaster, and in that version we ONLY receive the information about our homepage (and the other SSL page on the site).
**What is the problem?** My concern is that we need to do something specific with our sitemap or with the 301 redirect itself in order for Google to read the whole site as one entity and receive the reporting/backlinks as one site. Again, Google is indexing all of our pages, but it seems to be doing so in a disjointed way that is breaking down the link juice and value being built up by our SSL homepage. Can anybody help? Thank you for any advice or input you might be able to offer. -Greg
Intermediate & Advanced SEO | G.Anderson
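As a side note on reproducing that Fetch result: each hop of the redirect chain can be traced with a few lines of Python. A minimal sketch, with mysite.com being the placeholder domain from the post above:

```python
# Trace each hop of the homepage redirect chain and print its status,
# mirroring what Fetch as Google reported above. "mysite.com" is the
# placeholder domain from the question - substitute the real one.
import requests

resp = requests.get("http://mysite.com/", allow_redirects=True, timeout=10)
for hop in resp.history:
    print(hop.status_code, hop.url, "->", hop.headers.get("Location"))
print(resp.status_code, resp.url)  # final destination after all redirects
```
-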
Lots of incorrect URLs indexed - Googlebot found an extremely high number of URLs on your site
Hi, Any assistance would be greatly appreciated. Basically, our rankings and traffic etc. have been dropping massively recently, and Google sent us a message stating "Googlebot found an extremely high number of URLs on your site". This first highlighted the problem to us: for some reason our eCommerce site has recently generated loads (potentially thousands) of rubbish URLs, giving us duplication everywhere, which Google is obviously penalizing us for in terms of rankings dropping etc.
Our developer is trying to find the root cause of this, but my concern is: how do we get rid of all these bogus URLs? If we use GWT to remove URLs it's going to take years. We have just amended our robots.txt file to exclude them going forward, but they have already been indexed, so I need to know: do we put a 301 redirect on them, or return an HTTP 404 code to tell Google they don't exist? Do we also put a noindex on the pages, or what? What is the best solution?
A couple of examples of our problems are here: In Google, type site:bestathire.co.uk inurl:"br" and you will see 107 results. This is one of many lots we need to get rid of. Also, site:bestathire.co.uk intitle:"All items from this hire company" shows 25,300 indexed pages we need to get rid of.
Another thing to help tidy this mess up going forward is to improve on our pagination work. Our site uses rel=next and rel=prev but no canonical. As a belt-and-braces approach, should we also put canonical tags on our category pages where there is more than 1 page? I was thinking of doing it on page 1 of our most important pages, or the view-all page, or both. What's the general consensus? Any advice on both points greatly appreciated. Thanks, Sarah.
Intermediate & Advanced SEO | SarahCollins
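One hedged note on the 301-vs-404 question above: if the junk URLs follow a recognizable pattern and the site runs on Apache, serving them a "410 Gone" is a common way to tell Google the pages are permanently dead rather than redirecting them anywhere. A hypothetical .htaccess sketch - the ^br/ pattern is a stand-in and would need to match the real bogus URL structure:

```apache
# Hypothetical .htaccess rule (Apache mod_rewrite): answer a pattern of
# junk URLs with "410 Gone" so Google drops them from the index.
# "^br/" is a stand-in - replace it with the real bogus URL structure.
RewriteEngine On
RewriteRule ^br/ - [G]
```
-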
We would like to know where Googlebot searches from - IP location?
We have a client who has two sites for different countries, 1 US, 1 UK, and redirects visitors based on IP. In order to make sure that the English site is crawlable, we need to know where Googlebot searches from. Is this a US IP or a UK IP for a UK site / server?
Intermediate & Advanced SEO | AxonnMedia