Ways to analyze a 1M rows dataset of search queries
-
Hi,
I have this large dataset, about 1 million search queries with visits, bounce rate and a few other metrics. I'm trying to explore this data to find keyword "buckets" (such as include product name, location name, transactional objective, informational, etc.), as well as explore the density of certain keywords (keywords as in instances of a single word amongst all queries)
My idea was to use Excel and a macro to split all queries in separate words (also clearing punctuation and uppercase/lowercase), then storing this word in a new worksheet, adding to another column the visit counts from the row where the word was extracted (as to give a sense of weight). Before adding the word to the new worksheet, the script will look if the word already existed, if so it would just add the current value of visits to the existing visit counts etc.
In the end it will create sort of a "dictionary" of all the keywords in all search queries ranked by weight (= visits from search query including this keyword)
This would help me get started I believe, because I can't segment and analyze 1M raw search queries...
My issue is: this VBA has been running on my (fast) PC for the last 24hr and it doesn't seem to get to an end. Obviously excel+VBA is not the best way to do text mining and manipulation in such a large dataset (although it's just a 30mb file)
What would you do if you had this dataset and would like to mine the text/semantic as I am doing? Any idea of tools? process?
I'm considering dumping this data into a MySQL db and doing the processing through PHP (the only backend language I'm versed in), and getting the "summified" data stored into another table, which I'll then be able to export to a Excel for analysis. But I'm afraid that I'll be facing memory limit issues and such...
In the meantime, I'm definitely interested into knowing what you guys would do if you had this data and wanted to simply start exploring its constituencies
Thanks!
-
Yeah, Access can process any number of rows. It's Microsoft's database program. You can upload data, and then create queries. They have a design view where you can construct queries in a WYSIWYG fashion, or if you want, you can write your own SQL.
-
Thanks a lot John!
I'm going to try this out tonight!
So, I assume, Access won't have the same processing limitations with 1 million rows, will it?
Once I'll be done with the "discovery phase" I'm going through with this keyword list, I'll definitely use Advanced filters (in Excel) as you recommend to understand keyword groups in details
-
I had a similar problem going through my search query reports. If you're already familiar with VB you could do this with a Microsoft Access database rather than setting up a MySQL one w/PHP. I've been working on creating an Access database that I can import my data into, and have it spit out all sorts of useful info (for example negative keywords and placements), but it's only in its early stages right now.
If you just want to see it for a few terms and don't mind doing it one at a time, in the past I've filtered data like this in Excel without VB using advanced filters. I found that using advanced filters rather than VB sped up the process quite a bit; I'd imagine because it's an innate Excel function. Using 4 filters you can match whole words in the queries. For example, to find queries with "blah", you'd set a filter for "blah", "* blah", "blah " and " blah *". Then you can use the Subtotal command to do calculations over the visible rows and calculate the data.
More about advanced filters: http://office.microsoft.com/en-us/excel-help/filter-by-using-advanced-criteria-HP005200178.aspx
Got a burning SEO question?
Subscribe to Moz Pro to gain full access to Q&A, answer questions, and ask your own.
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
Search Console Crawl Errors/Not Found - Strange URLs
Hello, In Google Search Console under Crawl > Crawl Errors > Not found I have strange URLs like the following: https://www.domain.com//UbaOZ/
Reporting & Analytics | | chuck-layton
https://www.domain.com//UPhXZ/
https://www.domain.com//KaUpZ/WYdhZ/SnQZZ/MOcUZ/ There is no info in Linked From tab. Have you seen this type of error??
Does anyone know whats causing it??
How should it be fixed?? Thanks for reading and the help!0 -
Google Search Console
To the Moz Community, Should we be considering the information that Google Search Console is telling us? It is showing a dramatic drop in our SEO and our pages are not being indexed, however it is showing differently in our Moz Analytics section. Any clarification will be greatly appreciated. Many thanks Dawn
Reporting & Analytics | | DawnQ0 -
Google search console
Checking our links to our site on GSC and we've dropped from 515 to 324 over night - is this normal?
Reporting & Analytics | | RayflexGroup0 -
UTM source errors in google search console
Dear Friends, I need help with UTM source and UTM medium errors. There are 300 such errors on my site which is affecting the site i think, The URL appended at the end is utm_source=rss&utm_medium=rss&utm_campaign= How do i resolve this? Please help me with it.Thanks ccEpFDn.png ccEpFDn.png
Reporting & Analytics | | marketing910 -
What is the difference between "Organic Traffic" and the "Non-Paid Search Traffic" default segment in Google Analytics?
These two filtering options ("organic traffic" in the left sidebar and "non-paid search traffic" in the advanced segments) give me slightly different numbers. Any idea why this would be the case?
Reporting & Analytics | | FPD_NYC1 -
(Not provided) organic search results in Analytics
Hi, So from what I've read, (not provided) organic search results in Analytics is a result of the user being logged into Google. The problem I have is about 20% of the organic search results are (Not provided), so what is mainly left are branded search terms. I think I'm clutching at straws, but is there anything I can do to see these organic traffic sources?
Reporting & Analytics | | JuiceBoxOM0 -
Google Search Results inconsistent from different computers
Recently after some optimization activities - I do not see much movement in search rankings - my client is seeing the results on page 1 position 3 and I see page 2 for the same keyword. How does Google change ranking based on past searches and how can I get an accurate picture of what the actual rank is?
Reporting & Analytics | | devonkrusich0 -
Nofollow page is being reported as a landing page for organic search in Google Analytics
One of my client's websites includes a series of pages for an enrollment process. All of these pages are blocked by robots.txt. In Google Analytics these pages are showing data as landing pages for organic search traffic, and have been for quite some time. There was recently a surge of organic search traffic landing on one of these pages, coming from multiple search engines. The pages appear to be blocked and I'm not finding any of them in the search results for the keywords that are being reported in GA or by searching for the url. Does anyone have any insights into why this might be happening?
Reporting & Analytics | | rgibson1000