Ways to analyze a 1M-row dataset of search queries
-
Hi,
I have a large dataset: about 1 million search queries with visits, bounce rate and a few other metrics. I'm trying to explore this data to find keyword "buckets" (queries that include a product name or a location name, queries with a transactional or informational objective, etc.), and to explore the density of certain keywords (that is, how often a single word occurs across all queries).
My idea was to use Excel and a macro to split every query into separate words (stripping punctuation and normalizing uppercase/lowercase), then store each word in a new worksheet, with the visit count from the row it was extracted from in a second column (to give a sense of weight). Before adding a word to the new worksheet, the script checks whether the word already exists; if so, it just adds the current row's visits to the existing visit count.
In the end this creates a sort of "dictionary" of all the keywords in all search queries, ranked by weight (= visits from the search queries that include the keyword).
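For reference, the procedure described above fits in a few lines of a scripting language. This is only a minimal Python sketch, not the macro itself: the function name `word_weights` and the sample rows are made up for illustration, and it assumes that a word repeated inside a single query should count that query's visits only once.

```python
import re
from collections import Counter

def word_weights(rows):
    """Sum visits per distinct word across (query, visits) pairs."""
    weights = Counter()
    for query, visits in rows:
        # Lowercase, then keep runs of word characters; punctuation is dropped.
        words = re.findall(r"\w+", query.lower())
        # set() so a word repeated within one query adds that query's visits once.
        for word in set(words):
            weights[word] += visits
    return weights

# Hypothetical sample data standing in for the real export.
sample = [
    ("Buy red shoes, London", 120),
    ("red shoes review", 80),
    ("cheap flights london", 50),
]

weights = word_weights(sample)
for word, visits in weights.most_common(5):
    print(word, visits)
```

Because the lookup is a hash table rather than a scan down a worksheet column, a single pass like this over 1 million short queries should take seconds rather than hours on ordinary hardware.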
This would help me get started I believe, because I can't segment and analyze 1M raw search queries...
My issue is: this VBA macro has been running on my (fast) PC for the last 24 hours and shows no sign of finishing. Obviously Excel + VBA is not the best way to do text mining and manipulation on such a large dataset (although it's just a 30 MB file).
What would you do if you had this dataset and wanted to mine the text/semantics as I am doing? Any ideas for tools or a process?
I'm considering dumping this data into a MySQL database and doing the processing in PHP (the only backend language I'm versed in), then storing the summarized data in another table, which I can then export to Excel for analysis. But I'm afraid I'll run into memory limit issues and such...
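To make the database idea concrete, here is a rough sketch using Python's built-in `sqlite3` module as a stand-in for MySQL + PHP (that substitution is an assumption for illustration, as are the table names `queries`, `word_hits` and `word_weights` and the sample rows). Rows are read from a cursor one at a time, so the whole dataset never has to sit in memory, and the summing is left to SQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # a file path here would persist the data
conn.execute("CREATE TABLE queries (query TEXT, visits INTEGER)")
conn.execute("CREATE TABLE word_hits (word TEXT, visits INTEGER)")

# Hypothetical sample rows; the real data would be imported from the export.
conn.executemany(
    "INSERT INTO queries VALUES (?, ?)",
    [("red shoes", 120), ("red hat", 30), ("blue hat", 50)],
)

def split_rows(cursor):
    """Yield one (word, visits) pair per distinct word of each query row."""
    for query, visits in cursor:
        for word in set(query.lower().split()):
            yield word, visits

# Stream rows out of `queries` and write the word/visit pairs as we go.
reader = conn.execute("SELECT query, visits FROM queries")
conn.executemany("INSERT INTO word_hits VALUES (?, ?)", split_rows(reader))

# Let the database do the aggregation into the summary table.
conn.execute(
    "CREATE TABLE word_weights AS "
    "SELECT word, SUM(visits) AS visits FROM word_hits GROUP BY word"
)
top = conn.execute(
    "SELECT word, visits FROM word_weights ORDER BY visits DESC, word"
).fetchall()
print(top)
```

The summary table is small (one row per distinct word), so exporting it to Excel afterwards should not hit any row limit.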
In the meantime, I'm definitely interested in knowing what you would do if you had this data and simply wanted to start exploring what it's made of.
Thanks!
-
Yeah, Access can handle far more rows than Excel's limit of about 1 million per worksheet; its own limit is a 2 GB database file rather than a row count. It's Microsoft's database program. You can import data, and then create queries. It has a design view where you can construct queries in a WYSIWYG fashion, or if you want, you can write your own SQL.
-
Thanks a lot John!
I'm going to try this out tonight!
So, I assume, Access won't have the same processing limitations with 1 million rows, will it?
Once I'm done with the "discovery phase" I'm going through with this keyword list, I'll definitely use Advanced filters (in Excel), as you recommend, to understand keyword groups in detail.
-
I had a similar problem going through my search query reports. If you're already familiar with VB you could do this with a Microsoft Access database rather than setting up a MySQL one w/PHP. I've been working on creating an Access database that I can import my data into, and have it spit out all sorts of useful info (for example negative keywords and placements), but it's only in its early stages right now.
If you just want to see it for a few terms and don't mind doing it one at a time, in the past I've filtered data like this in Excel without VB using advanced filters. I found that using advanced filters rather than VB sped up the process quite a bit; I'd imagine because it's a built-in Excel function. Using 4 filters you can match whole words in the queries. For example, to find queries with "blah", you'd set a filter for "blah", "* blah", "blah *" and "* blah *". Then you can use the Subtotal command to do calculations over the visible rows only.
More about advanced filters: http://office.microsoft.com/en-us/excel-help/filter-by-using-advanced-criteria-HP005200178.aspx
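If it helps to see the logic of the four filters spelled out, here is a small Python sketch of the same whole-word test followed by a subtotal over the matching rows. The function name `matches_whole_word` and the sample rows are invented for illustration, and it assumes words in a query are separated by single spaces.

```python
def matches_whole_word(query, word):
    """True if `word` appears as a space-delimited word in `query`."""
    q = query.lower()  # compare case-insensitively
    w = word.lower()
    return (
        q == w                      # "blah"      the whole query
        or q.endswith(" " + w)      # "* blah"    last word
        or q.startswith(w + " ")    # "blah *"    first word
        or (" " + w + " ") in q     # "* blah *"  somewhere in the middle
    )

# Hypothetical (query, visits) rows.
rows = [
    ("blah", 10),
    ("buy blah", 20),
    ("blah online", 30),
    ("cheap blah store", 40),
    ("blahblah", 500),  # contains the letters but is not a whole-word match
]

# Equivalent of filtering, then running Subtotal over the visible rows.
visible = [(q, v) for q, v in rows if matches_whole_word(q, "blah")]
total_visits = sum(v for _, v in visible)
print(len(visible), total_visits)
```

The four conditions are needed because a plain "contains" test would also pick up rows like "blahblah", which would inflate the subtotal.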