Ways to analyze a 1M rows dataset of search queries
-
Hi,
I have this large dataset, about 1 million search queries with visits, bounce rate and a few other metrics. I'm trying to explore this data to find keyword "buckets" (such as include product name, location name, transactional objective, informational, etc.), as well as explore the density of certain keywords (keywords as in instances of a single word amongst all queries)
My idea was to use Excel and a macro to split all queries in separate words (also clearing punctuation and uppercase/lowercase), then storing this word in a new worksheet, adding to another column the visit counts from the row where the word was extracted (as to give a sense of weight). Before adding the word to the new worksheet, the script will look if the word already existed, if so it would just add the current value of visits to the existing visit counts etc.
In the end it will create sort of a "dictionary" of all the keywords in all search queries ranked by weight (= visits from search query including this keyword)
This would help me get started I believe, because I can't segment and analyze 1M raw search queries...
My issue is: this VBA has been running on my (fast) PC for the last 24hr and it doesn't seem to get to an end. Obviously excel+VBA is not the best way to do text mining and manipulation in such a large dataset (although it's just a 30mb file)
What would you do if you had this dataset and would like to mine the text/semantic as I am doing? Any idea of tools? process?
I'm considering dumping this data into a MySQL db and doing the processing through PHP (the only backend language I'm versed in), and getting the "summified" data stored into another table, which I'll then be able to export to a Excel for analysis. But I'm afraid that I'll be facing memory limit issues and such...
In the meantime, I'm definitely interested into knowing what you guys would do if you had this data and wanted to simply start exploring its constituencies
Thanks!
-
Yeah, Access can process any number of rows. It's Microsoft's database program. You can upload data, and then create queries. They have a design view where you can construct queries in a WYSIWYG fashion, or if you want, you can write your own SQL.
-
Thanks a lot John!
I'm going to try this out tonight!
So, I assume, Access won't have the same processing limitations with 1 million rows, will it?
Once I'll be done with the "discovery phase" I'm going through with this keyword list, I'll definitely use Advanced filters (in Excel) as you recommend to understand keyword groups in details
-
I had a similar problem going through my search query reports. If you're already familiar with VB you could do this with a Microsoft Access database rather than setting up a MySQL one w/PHP. I've been working on creating an Access database that I can import my data into, and have it spit out all sorts of useful info (for example negative keywords and placements), but it's only in its early stages right now.
If you just want to see it for a few terms and don't mind doing it one at a time, in the past I've filtered data like this in Excel without VB using advanced filters. I found that using advanced filters rather than VB sped up the process quite a bit; I'd imagine because it's an innate Excel function. Using 4 filters you can match whole words in the queries. For example, to find queries with "blah", you'd set a filter for "blah", "* blah", "blah " and " blah *". Then you can use the Subtotal command to do calculations over the visible rows and calculate the data.
More about advanced filters: http://office.microsoft.com/en-us/excel-help/filter-by-using-advanced-criteria-HP005200178.aspx
Got a burning SEO question?
Subscribe to Moz Pro to gain full access to Q&A, answer questions, and ask your own.
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
UTM source errors in google search console
Dear Friends, I need help with UTM source and UTM medium errors. There are 300 such errors on my site which is affecting the site i think, The URL appended at the end is utm_source=rss&utm_medium=rss&utm_campaign= How do i resolve this? Please help me with it.Thanks ccEpFDn.png ccEpFDn.png
Reporting & Analytics | | marketing910 -
Search Keywords, Meta Keywords and Meta Descriptions; Keeping Webmaster Tools Current
What are Search Keywords, Meta Keywords and Meta Descriptions? What exactly is the difference between them and which one is more important? In regards to Webmaster Tools, if we delete a page or a product, it still shows up in Search Analytics. How can we update Webmaster Tools so as to keep it current with our website? Lastly, again in regards to Webmaster Tools, in Search Analytics. At the moment we put relevant queries into the Meta Description of low ranking pages, in order to raise the position of the page. Is this the right way to handle queries? Should we be putting the queries into the Meta Description or the Meta Keywords?
Reporting & Analytics | | CostumeD0 -
Can not divide in different properties a domain in Search Console (Webmaster Tools)
Dear Moz Community, I hope you can give me a hand with the following questions. Im in charge of SEO of an ecommerce site in LATAM. It´s service is available in several countries, therefore each country has it subdirectory Eg. /ar /pe /co /bo /cl /br,etc... (in the future we will move to differente ccTLDs). I have been recomended to split or create different Search Console or Webmaster Tools properties (one for each subdirectory) but when Im creating a new property with a subdirectory, lets say www.domain.com/ar, Webmaster tools starts creating a property for www.domain.com/ar/ (NOTICE THE LAST SLASH) and it returns since that page doesn´t exist, what do you recomend me to do? Best wishes, Pablo Lòpez C
Reporting & Analytics | | pablo_carrara0 -
What are all the 5's in SEO Queries in Analytics?
Every small business client has the same thing. 5 impressions for keywords, row after row, every single month. Why exactly 5 and why month after month the same thing? I see this in every local business I work in - and for very important phrases! It's gotten to the point that I think those are fake and I just look at the impressions that have numbers great than 5. Obviously I have to get their impressions up, but what am I to believe about these?
Reporting & Analytics | | katandmouse0 -
What's the best way to figure out which keywords are the highest converting?
We have a client using Google Analytics. They currently have 3 goals set up to track when website visitors fill out 3 forms: Form A, Form B, Form C. I can easily figure out what traffic sources have driven the highest number of conversions on each form (Search for Form A, for instance, or Referrals for Form B), but of course, when I try to drill down on search terms that have driven conversions to each form, I get stuck in "not provided" territory. I'd like to know what people are searching for when they ultimately fill out each form. This will answer questions like: are people familiar with us already when they convert, or did they randomly find our website when searching for something we sell? It seems like there must be a way, using Google Webmaster Tools, Analytics, or another third-party app, to answer the question: what keyword searches are responsible for the highest number of conversions? Especially on a website that has traffic of 10,000+/month and a healthy dose of search traffic. Right? Where am I missing this information?
Reporting & Analytics | | timfrick1 -
What is the best way to track mobile sites in Google Analytics?
Hello! I am wondering what the pros and cons of using the regular Google Analytics tracking code on a mobile site versus the tracking documentation from Google specifically on it found at http://code.google.com/mobile/analytics/docs/web/ which is still in labs mode. Does the mobile specific tracking have the same features as the regular one to be able to track events and report the same statistics? Thanks for the help on this one!
Reporting & Analytics | | CabbageTree0 -
Google encryption of search results
Hello Is there any definitive information on whether the recent changes to the way Google encrypts search results for people logged into Google affects the traffic present via the Google Adwords Keyword tool? Plus, how does SEOmoz ensure/minimise the affect of personalisation/localisation etc. on the rankings data provided via the pro tool. Thanks in advance for your assistance. Kind regards Neil
Reporting & Analytics | | mccormackmorrison0 -
Using Clients GA account for their over all website to Optimize Their Blog? or is there a better way to compare apples to apples
I would like to optimize my clients blog. If I am using their GA account for their over all website but put in the sub folder for SEOMOz, will it detect that I only want to optimize the blog ex. http://xxx.com/blog or will I be seeing the GA for the entire site on SEOMOZ?
Reporting & Analytics | | CliffordC0