How to Safely Scrape Google Results?
-
I've built a couple of small tools that I use personally, maybe 2 or 3 times per day.
Both tools scrape the top 10 results from Google and provide more details about each domain (like the SEOMoz Keyword Difficulty Tool).
Google seems to have banned my IP address for automated searches... can anyone tell me a safe way of scraping the Google results? Is there a suitable API for this?
How does SEOmoz do this on such a huge scale?
-
I doubt that the APIs have improved considerably since this blog post: http://www.seomoz.org/blog/the-nasty-problem-with-scraping-results-from-the-engines. Scraping Google is still a big issue, and still necessary for our daily SEO work.
Scraping safely can only work if you succeed in convincing Google that you're a "natural" user and not a scraping robot. How can you do that?
- Search from alternating IPs in different locations, using proxies in the countries you'd like to scrape results for
- Don't send too many requests at once from the same source
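The two points above can be sketched as a small scheduler. This is only a rough illustration: the proxy addresses, the function name `plan_requests`, and the `min_gap` and `jitter` values are all assumptions made up for the example, not known safe thresholds. It only plans when and through which proxy each query would go; it doesn't send anything.

```python
import random
from collections import defaultdict

# Hypothetical proxy endpoints (placeholders, not real servers).
PROXIES = [
    "http://proxy-uk.example:8080",
    "http://proxy-us.example:8080",
    "http://proxy-de.example:8080",
]


def plan_requests(queries, proxies, min_gap=30.0, jitter=15.0, rng=None):
    """Assign each query a proxy and a send time (seconds from start).

    Proxies are used round-robin, and the same proxy is never scheduled
    twice within `min_gap` seconds. A random extra delay of up to
    `jitter` seconds keeps the timing from looking mechanical.
    The default numbers are guesses for illustration only.
    """
    rng = rng or random.Random()
    next_free = defaultdict(float)  # proxy -> earliest allowed send time
    plan = []
    for i, query in enumerate(queries):
        proxy = proxies[i % len(proxies)]  # alternate the source IP
        send_at = next_free[proxy] + rng.uniform(0, jitter)
        next_free[proxy] = send_at + min_gap
        plan.append((send_at, proxy, query))
    return sorted(plan)  # ordered by send time


if __name__ == "__main__":
    for send_at, proxy, query in plan_requests(["kayak fishing", "travel to france"], PROXIES):
        print(f"{send_at:7.1f}s  {proxy}  {query}")
```

For a tool that runs 2 or 3 times a day, a plan like this spreads even a handful of searches across sources instead of firing them all from one IP at once.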
Keep in mind that when it requests a URL, the browser sends the server various pieces of information, such as your operating system, browser version, and referer. Each of these can and should be varied so that every new search appears to come from a different identity.
- Change browsers, browser versions, operating system information, etc.
- Take care when changing browser localization values (en-GB and en-US probably don't return the same results)
- Have a good network of proxy servers ready, so you can send the different requests with your different identities through them
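As a minimal sketch of what "changing your identity" could look like in code, using only Python's standard library: the identities, the proxy address, and the helper names below are made up for illustration. Each identity keeps a fixed locale, so that switching browsers doesn't accidentally switch between en-GB and en-US results. The code only builds the request and the proxy-aware opener; actually sending it is left out.

```python
import random
import urllib.request
from urllib.parse import urlencode

# Illustrative identities only; real browsers send more headers than these.
IDENTITIES = [
    {"locale": "en-GB", "headers": {
        "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:115.0) Gecko/20100101 Firefox/115.0",
        "Accept-Language": "en-GB,en;q=0.9"}},
    {"locale": "en-GB", "headers": {
        "User-Agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/16.5 Safari/605.1.15",
        "Accept-Language": "en-GB,en;q=0.9"}},
    {"locale": "en-US", "headers": {
        "User-Agent": "Mozilla/5.0 (X11; Linux x86_64; rv:115.0) Gecko/20100101 Firefox/115.0",
        "Accept-Language": "en-US,en;q=0.9"}},
]


def pick_identity(locale, rng=None):
    """Pick a random identity, but only among those matching the locale."""
    rng = rng or random.Random()
    candidates = [ident for ident in IDENTITIES if ident["locale"] == locale]
    if not candidates:
        raise ValueError(f"no identity configured for locale {locale!r}")
    return rng.choice(candidates)


def build_request(query, identity, base_url="https://www.google.com/search"):
    """Build (but do not send) a search request carrying the identity's headers."""
    url = base_url + "?" + urlencode({"q": query})
    return urllib.request.Request(url, headers=identity["headers"])


def build_opener(proxy_url):
    """Return an opener that would route its requests through the given proxy."""
    handler = urllib.request.ProxyHandler({"http": proxy_url, "https": proxy_url})
    return urllib.request.build_opener(handler)


if __name__ == "__main__":
    identity = pick_identity("en-GB")
    request = build_request("driving test cancellations", identity)
    opener = build_opener("http://proxy-uk.example:8080")  # hypothetical proxy
    print(request.full_url)
    print(request.header_items())
```

Pairing each identity with a proxy in the matching country would keep the IP location and the localization headers consistent with each other.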