Ways to analyze a 1M rows dataset of search queries
-
Hi,
I have this large dataset, about 1 million search queries with visits, bounce rate and a few other metrics. I'm trying to explore this data to find keyword "buckets" (such as include product name, location name, transactional objective, informational, etc.), as well as explore the density of certain keywords (keywords as in instances of a single word amongst all queries)
My idea was to use Excel and a macro to split all queries in separate words (also clearing punctuation and uppercase/lowercase), then storing this word in a new worksheet, adding to another column the visit counts from the row where the word was extracted (as to give a sense of weight). Before adding the word to the new worksheet, the script will look if the word already existed, if so it would just add the current value of visits to the existing visit counts etc.
In the end it will create sort of a "dictionary" of all the keywords in all search queries ranked by weight (= visits from search query including this keyword)
This would help me get started I believe, because I can't segment and analyze 1M raw search queries...
My issue is: this VBA has been running on my (fast) PC for the last 24hr and it doesn't seem to get to an end. Obviously excel+VBA is not the best way to do text mining and manipulation in such a large dataset (although it's just a 30mb file)
What would you do if you had this dataset and would like to mine the text/semantic as I am doing? Any idea of tools? process?
I'm considering dumping this data into a MySQL db and doing the processing through PHP (the only backend language I'm versed in), and getting the "summified" data stored into another table, which I'll then be able to export to a Excel for analysis. But I'm afraid that I'll be facing memory limit issues and such...
In the meantime, I'm definitely interested into knowing what you guys would do if you had this data and wanted to simply start exploring its constituencies
Thanks!
-
Yeah, Access can process any number of rows. It's Microsoft's database program. You can upload data, and then create queries. They have a design view where you can construct queries in a WYSIWYG fashion, or if you want, you can write your own SQL.
-
Thanks a lot John!
I'm going to try this out tonight!
So, I assume, Access won't have the same processing limitations with 1 million rows, will it?
Once I'll be done with the "discovery phase" I'm going through with this keyword list, I'll definitely use Advanced filters (in Excel) as you recommend to understand keyword groups in details
-
I had a similar problem going through my search query reports. If you're already familiar with VB you could do this with a Microsoft Access database rather than setting up a MySQL one w/PHP. I've been working on creating an Access database that I can import my data into, and have it spit out all sorts of useful info (for example negative keywords and placements), but it's only in its early stages right now.
If you just want to see it for a few terms and don't mind doing it one at a time, in the past I've filtered data like this in Excel without VB using advanced filters. I found that using advanced filters rather than VB sped up the process quite a bit; I'd imagine because it's an innate Excel function. Using 4 filters you can match whole words in the queries. For example, to find queries with "blah", you'd set a filter for "blah", "* blah", "blah " and " blah *". Then you can use the Subtotal command to do calculations over the visible rows and calculate the data.
More about advanced filters: http://office.microsoft.com/en-us/excel-help/filter-by-using-advanced-criteria-HP005200178.aspx
Got a burning SEO question?
Subscribe to Moz Pro to gain full access to Q&A, answer questions, and ask your own.
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
Can't see backlinks in Search Console
Hello, We have 7 sites and have noticed that for one site, we don't see any backlink info in Search Console even if it's been linked for over 8 months. Other tools show thousands of backlinks, but Search Console is still pending. I also see very little info in regards to Performance even if we are getting thousands of hits a day. Could this be a sign of a bigger problem? In summary, the site is up and running, getting hits, getting backlinks, but Search Console still looks like it's a new site with no activity several months after being installed.
Reporting & Analytics | | CJolicoeur0 -
Right way to delete and update old blog-posts?
Hi all, Rand Fishkin once tweeted about success story of Quickbooks blog as they deleted their old and outdated content to show only high quality content to their audience. We are planning to implement the same strategy to our blog which consists of 800+ blog-posts. I'm just wondering the best way to proceed on this and planning to follow as below. Please correct me if I'm wrong and if there are any better steps of follow: Get the list of blog-posts. Check the traffic of each blog-post. If the blog-post needs to be existed, update any info on the blog. If the blog no more needed, do I need to delete or noindex or redirect? What's the better way to measure the success? Thanks
Reporting & Analytics | | vtmoz1 -
Changes to cookies, August 24 or 25? (organic search not being cookied?)
Hi folks. We've seen a precipitous change in the referral data that we're able to gather from cookies. Specifically: On August 24 or 25, traffic to our site that was cookied as coming from organic search dropped by ~25%, and traffic coming in with no referrer data at all (i.e., it appears to be "direct") rose by roughly the same amount. As far as we can see, we haven't changed anything in our systems that would have caused this (we're not just mis-reading the cookie info), so I'm looking for external reasons. Has anyone else seen this? Or have any ideas why it would happen?
Reporting & Analytics | | RobM4160 -
Google Analytics - Organic Search Traffic & Queries -What caused the huge difference?
Our website traffic dropped a little bit during the last month, but it's getting better now, almost the same with previous period. But our conversion rate dropped by 50% for the last three weeks. What could cause this huge drop in conversion rate? In Google Analytics, I compared the Organic Search Traffic with previous period, the result is similar. But the Search Engine Optimization ->Queries shows that the clicks for last month is almost zero. What could be the cause of this huge differnce? e9sJNwD.png k4M8Fa5.png
Reporting & Analytics | | joony0 -
How to safely exclude search result pages from Google's index?
Hello everyone,
Reporting & Analytics | | llamb
I'm wondering what's the best way to prevent/block search result pages from being indexed by Google. The way search works on my site is that search form generates URLs like:
/index.php?blah-blah-search-results-blah I wanted to block everything of that sort, but how do I do it without blocking /index.php ? Thanks in advance and have a great day everyone!0 -
Google Making all searches secure - "Not provided" data to increase in Analytics
A lot of you might already be aware of the recent Google change at encrypting all search activity except for clicks on ads. Rand did a whiteboard session on this recently. How is everyone planning to adjust their research data to accommodate for this change?
Reporting & Analytics | | SEO5Team0 -
Tracing Google Analytics 'goal' back to original search phrase
I added Goals to my Google Analytics tracking. It's working; I get visitors who have completed Goals showing up in the reporting. My question is: Is it possible to trace backwards from a completed Goal to the original search phrase a user entered in Google to come to my site (for those who entered from Google.com via organic search result)? I'm trying to answer the question of which search phrases are resulting in completed Goals (as opposed to bouncing off the site or just any behaviour other than completing a Goal). It seems like this should be one of Analytics' default reports -- help identify which search phrases are converting well. It's probably there and I'm just not seeing it... Thanks.
Reporting & Analytics | | scanlin0 -
Using Clients GA account for their over all website to Optimize Their Blog? or is there a better way to compare apples to apples
I would like to optimize my clients blog. If I am using their GA account for their over all website but put in the sub folder for SEOMOz, will it detect that I only want to optimize the blog ex. http://xxx.com/blog or will I be seeing the GA for the entire site on SEOMOZ?
Reporting & Analytics | | CliffordC0