How to Safely Scrape Google Results?
-
I've built a couple of small tools that I use personally, maybe 2 or 3 times per day.
Both tools scrape the top 10 results from Google and provide more details about each domain (like the SEOMoz Keyword Difficulty Tool).
Google seems to have banned my IP address for automated searches. Can anyone tell me a safe way of scraping the Google results? Is there a suitable API for this?
How does SEOmoz do this on such a huge scale?
-
I doubt the APIs have improved considerably since this blog post: http://www.seomoz.org/blog/the-nasty-problem-with-scraping-results-from-the-engines. So scraping Google is still a big issue, and a necessary one for our daily SEO work.
Scraping safely only works if you convince Google that you're a "natural" user and not a scraping robot. How can you do that?
- Search with alternating IPs from different locations, using proxies in the countries you'd like to scrape from
- Don't send too many requests at once from the same source
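The second point, not sending too many requests at once from one source, can be sketched as a small client-side throttle that enforces a minimum gap between requests. This is a minimal sketch, assuming a single-threaded script; `Throttle` and `min_interval` are hypothetical names, not part of any library, and the actual interval is something you would have to choose yourself. The clock and sleep functions are injectable so the logic can be checked without really waiting.

```python
import time
from typing import Callable, Optional


class Throttle:
    """Hypothetical helper: enforce a minimum gap between successive requests."""

    def __init__(
        self,
        min_interval: float,
        clock: Callable[[], float] = time.monotonic,
        sleep: Callable[[float], None] = time.sleep,
    ) -> None:
        self.min_interval = min_interval  # seconds that must pass between requests
        self._clock = clock
        self._sleep = sleep
        self._last: Optional[float] = None  # time of the previous request, if any

    def wait(self) -> float:
        """Block until min_interval has passed since the last call; return seconds slept."""
        now = self._clock()
        slept = 0.0
        if self._last is not None:
            remaining = self.min_interval - (now - self._last)
            if remaining > 0:
                self._sleep(remaining)
                slept = remaining
                now = self._clock()
        self._last = now
        return slept
```

Call `wait()` immediately before each request: the first call returns at once, and later calls sleep only for whatever part of the interval hasn't already elapsed. `time.monotonic` is used rather than `time.time` so that system clock adjustments can't shorten or lengthen the gap.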
Keep in mind that when you request a URL, your browser sends the server various pieces of information, such as your operating system, browser version and referrer. Every one of these can, and should, be changed to give you a new virtual identity for each new search.
- Change browsers, browser versions, operating system information, etc.
- Take care when changing browser localization values (en-GB and en-US probably won't return the same results)
- Have a good network of proxy servers ready to send the requests under your different identities
Related Questions
-
Continuous drop in rankings on Google.co.uk
I have a website that ranked in the top 6 for the most competitive keywords on Google.co.uk, but it suddenly dropped to page 2 after 21 January 2013, later came back into the top 10, and has now dropped to pages 2 to 5 for most terms. I initially thought the transition rank was causing these fluctuations. The strange thing is that I haven't done anything wrong in terms of backlinking. My site is content-rich and offers real value to users; its PageRank recently increased from 3 to 4, and its SEOmoz rank, PA, DA and MozTrust are higher than those of the top-ranked websites above mine. Any suggestions? Kindly help.
Competitive Research | | ayo10 -
Majestic gives me a Citation Flow of 24 and a Trust Flow of 24, but SEOmoz gives a total of just 7. Why the difference? My ranking is still bad, so is Majestic crawling faster than Google?
Hi, my total domain value on SEOmoz is 7. In Majestic it is 24 Citation Flow and 24 Trust Flow. My ranking is still bad (page 2), and my competitors have lower Trust/Citation Flow in Majestic, but in SEOmoz they're better. Is the conclusion that Majestic is more up to date than Google itself, and that SEOmoz is more in line with Google's crawling? I ask because Majestic doesn't reflect my ranking. (P.S. I only started with the domain a month ago, and it has only some registration history.)
Competitive Research | | remkoallertz0 -
How much keyword density for Google?
I have several pages on one site that have dropped over the past few months. The keyword density on those pages, which is not unnatural, pleased Google for many years, and it still pleases Bing, but Google now seems very picky. Based on your experience, what is the ideal keyword density (%) for two- and three-word phrases, and should they be left out of alt tags even where it's appropriate to include them? While Google dominates, we don't want to alienate Bing/Yahoo. It's a huge mystery, and experimenting with more non-keyword-related text hasn't borne any fruit so far. Thank you, GH
Competitive Research | | gheh20130 -
Sending automated queries to Google hurting SEO?
Does anyone know whether a site might get penalized if it sends automated queries to Google (i.e., to check rankings)? I was reading the recently updated Google Webmaster Guidelines and saw that the section "Quality guidelines - specific guidelines" mentions sending automated queries to Google. I'm just wondering what the chances are that Google will actually penalize a site that sends automated queries (if it can identify which site is doing so in the first place).
Competitive Research | | globalsources.com0 -
Why does the difficulty of a keyword differ between Google Spain and Google Mexico?
In your opinion, what are the main reasons for this difference?
Competitive Research | | BorjaUrreta910 -
How come the results in Google vary with domains
Hello, how is everyone doing? My question is about Google's search results pages. How come some results have www. in front of them and some don't? Also, what are the SEO implications of having www. in front of your search results or not? Does this have something to do with canonicalization? I have included a screenshot so you can see what I mean. One result is www.gearyi.com, and the result without the www. is ingenexdigital.com.
Competitive Research | | digitalops0 -
When providing search results for SEO purposes, do you use the exact-match results in Google AdWords?
Hi Mozzers, just a quick question. When an SEO company supplies its testimonials, for example:
Keyword search term has 33,000 visits a month
Keyword is in position 1
But the search volume they are showing is broad match. I was always taught to research exact-match volumes unless I'm using the research for a PPC campaign. Has anyone got any ideas? Should I be looking at broad or exact? Many thanks, Matt
Competitive Research | | mcliddy0 -
How does a site get to no. 3 in Google with no keyword in its links?!
Hello everyone, my first post (ahhh!). I'm investigating a niche, and there is a site that in my view has no right being there. It's no. 3 on Google UK for 'company formation', yet it's a small site with 65 weak links from only 7 domains, hosted in the US. More importantly, Open Site Explorer says not a single link uses that term in its anchor text. I find this crazy, and it makes me suspicious. But before I go back to my client saying "oh, they must be black hat", I'd like your expert views. I'm not sure whether to tut or congratulate them, and for the first time I'm not sure what reasons to give for their amazing performance! What are your views?
Competitive Research | | GOYMedia480