Ways to analyze a 1M rows dataset of search queries
-
Hi,
I have this large dataset, about 1 million search queries with visits, bounce rate and a few other metrics. I'm trying to explore this data to find keyword "buckets" (such as include product name, location name, transactional objective, informational, etc.), as well as explore the density of certain keywords (keywords as in instances of a single word amongst all queries)
My idea was to use Excel and a macro to split all queries in separate words (also clearing punctuation and uppercase/lowercase), then storing this word in a new worksheet, adding to another column the visit counts from the row where the word was extracted (as to give a sense of weight). Before adding the word to the new worksheet, the script will look if the word already existed, if so it would just add the current value of visits to the existing visit counts etc.
In the end it will create sort of a "dictionary" of all the keywords in all search queries ranked by weight (= visits from search query including this keyword)
This would help me get started I believe, because I can't segment and analyze 1M raw search queries...
My issue is: this VBA has been running on my (fast) PC for the last 24hr and it doesn't seem to get to an end. Obviously excel+VBA is not the best way to do text mining and manipulation in such a large dataset (although it's just a 30mb file)
What would you do if you had this dataset and would like to mine the text/semantic as I am doing? Any idea of tools? process?
I'm considering dumping this data into a MySQL db and doing the processing through PHP (the only backend language I'm versed in), and getting the "summified" data stored into another table, which I'll then be able to export to a Excel for analysis. But I'm afraid that I'll be facing memory limit issues and such...
In the meantime, I'm definitely interested into knowing what you guys would do if you had this data and wanted to simply start exploring its constituencies
Thanks!
-
Yeah, Access can process any number of rows. It's Microsoft's database program. You can upload data, and then create queries. They have a design view where you can construct queries in a WYSIWYG fashion, or if you want, you can write your own SQL.
-
Thanks a lot John!
I'm going to try this out tonight!
So, I assume, Access won't have the same processing limitations with 1 million rows, will it?
Once I'll be done with the "discovery phase" I'm going through with this keyword list, I'll definitely use Advanced filters (in Excel) as you recommend to understand keyword groups in details
-
I had a similar problem going through my search query reports. If you're already familiar with VB you could do this with a Microsoft Access database rather than setting up a MySQL one w/PHP. I've been working on creating an Access database that I can import my data into, and have it spit out all sorts of useful info (for example negative keywords and placements), but it's only in its early stages right now.
If you just want to see it for a few terms and don't mind doing it one at a time, in the past I've filtered data like this in Excel without VB using advanced filters. I found that using advanced filters rather than VB sped up the process quite a bit; I'd imagine because it's an innate Excel function. Using 4 filters you can match whole words in the queries. For example, to find queries with "blah", you'd set a filter for "blah", "* blah", "blah " and " blah *". Then you can use the Subtotal command to do calculations over the visible rows and calculate the data.
More about advanced filters: http://office.microsoft.com/en-us/excel-help/filter-by-using-advanced-criteria-HP005200178.aspx
Got a burning SEO question?
Subscribe to Moz Pro to gain full access to Q&A, answer questions, and ask your own.
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
Moz reports way fewer backlinks than google search?
My site is only 11 months old but has steadily (if not slowly) been gaining backlinks. My question, is why Moz shows me at 303 backlinks and Google search console is showing at 1,237? I am more than a little suspicious that this could highlight the reason Moz shows such an unfavorable DA ranking for our site at a DA12. Other competitors that rank for similar keywords to mine are DA 42, DA 65, DA 73, etc. If the largest ranking factor is links, and they have mine reported incorrectly - is this the issue with DA as it relates to sites like mine? Any answer from someone who has experienced similar, or has a definitive answer is more than welcome to chime in! Thanks, Kevin
Reporting & Analytics | | kvncrll0 -
Organic reports showing a URL that isn't in Search Ask Question
In the image I've attached you can see that I have pulled a source/medium > google organic report. I've also made "landing page" my secondary dimension. The first landing page that is showing up is /v3/?slug=fnl, that is this page (https://orders.freshnlean.com/v3/?slug=fnl). You can see that the page has 230 sessions from Sep 3 - 9 and 17 transactions during that same time frame. The only thing is, that landing page is nowhere to be found in the SERPs. So how is it showing up in this report as having received google organic visitors that converted if it's not even in search? 05OclDp
Reporting & Analytics | | tdastru0 -
Search my keyword on google
When I search my keyword on google I can see my website, but when I connect to a VPN and again search it, I'm not in the search result, what happens on my website?
Reporting & Analytics | | Pintapin0 -
How to safely exclude search result pages from Google's index?
Hello everyone,
Reporting & Analytics | | llamb
I'm wondering what's the best way to prevent/block search result pages from being indexed by Google. The way search works on my site is that search form generates URLs like:
/index.php?blah-blah-search-results-blah I wanted to block everything of that sort, but how do I do it without blocking /index.php ? Thanks in advance and have a great day everyone!0 -
Cross-Domain Tracking Urgent Query :-(
Hi Mozzers! One of my clients is having an issue with cross-domain tracking, in other words their own domain is seen as a huge referrer. When you land on their site, which is www.sunway.ie, and then choose a holiday to book it then takes you to another domain which is www.sunwayholidays.ie, during the booking process. I'm just wondering if there is a Google Analytics genius out there who may be able to take a quick look and let me know if there is any obvious solution to this within the Google Analytics code? Thanks in advance everyone! Gavin
Reporting & Analytics | | strategemilabs0 -
Difference between site: search and Total Indexed in Google Webmaster Tools.
This morning I did a search on Google for my site using the site: operator. I noticed that the number of results returned was significantly different than the "Total indexed" in Google Webmaster Tools. What is the difference and is it normal to have two very different numbers here?
Reporting & Analytics | | Gordian0 -
Is there any way to see how one my keywords ranked historically before I started tracking it?
Hello there! I'm in need of some historical data on some keywords as they relate to my site. Basically, I'd like to track which events have had the most impact on moving me through the rankings, but I wasn't using Moz at the time I had made some changes to my website... Is there any way to see how I ranked for a particular keyword at a given point in time? Thanks! Gene
Reporting & Analytics | | BGroup0 -
Google Secure Search Announcement: How do you think this will play out?
So, I read this post on the Google Blog and then this commentary about it. Wondering what some of you veterans think about how this will impact results. I always remain logged out for queries and use several different software tools to check when I run an analysis. Since social influences are becoming a bigger factor, do you think this is going to create some issues in data? What are your thoughts?
Reporting & Analytics | | TheARKlady0