Ways to analyze a 1M rows dataset of search queries
-
Hi,
I have this large dataset, about 1 million search queries with visits, bounce rate and a few other metrics. I'm trying to explore this data to find keyword "buckets" (such as include product name, location name, transactional objective, informational, etc.), as well as explore the density of certain keywords (keywords as in instances of a single word amongst all queries)
My idea was to use Excel and a macro to split all queries in separate words (also clearing punctuation and uppercase/lowercase), then storing this word in a new worksheet, adding to another column the visit counts from the row where the word was extracted (as to give a sense of weight). Before adding the word to the new worksheet, the script will look if the word already existed, if so it would just add the current value of visits to the existing visit counts etc.
In the end it will create sort of a "dictionary" of all the keywords in all search queries ranked by weight (= visits from search query including this keyword)
This would help me get started I believe, because I can't segment and analyze 1M raw search queries...
My issue is: this VBA has been running on my (fast) PC for the last 24hr and it doesn't seem to get to an end. Obviously excel+VBA is not the best way to do text mining and manipulation in such a large dataset (although it's just a 30mb file)
What would you do if you had this dataset and would like to mine the text/semantic as I am doing? Any idea of tools? process?
I'm considering dumping this data into a MySQL db and doing the processing through PHP (the only backend language I'm versed in), and getting the "summified" data stored into another table, which I'll then be able to export to a Excel for analysis. But I'm afraid that I'll be facing memory limit issues and such...
In the meantime, I'm definitely interested into knowing what you guys would do if you had this data and wanted to simply start exploring its constituencies
Thanks!
-
Yeah, Access can process any number of rows. It's Microsoft's database program. You can upload data, and then create queries. They have a design view where you can construct queries in a WYSIWYG fashion, or if you want, you can write your own SQL.
-
Thanks a lot John!
I'm going to try this out tonight!
So, I assume, Access won't have the same processing limitations with 1 million rows, will it?
Once I'll be done with the "discovery phase" I'm going through with this keyword list, I'll definitely use Advanced filters (in Excel) as you recommend to understand keyword groups in details
-
I had a similar problem going through my search query reports. If you're already familiar with VB you could do this with a Microsoft Access database rather than setting up a MySQL one w/PHP. I've been working on creating an Access database that I can import my data into, and have it spit out all sorts of useful info (for example negative keywords and placements), but it's only in its early stages right now.
If you just want to see it for a few terms and don't mind doing it one at a time, in the past I've filtered data like this in Excel without VB using advanced filters. I found that using advanced filters rather than VB sped up the process quite a bit; I'd imagine because it's an innate Excel function. Using 4 filters you can match whole words in the queries. For example, to find queries with "blah", you'd set a filter for "blah", "* blah", "blah " and " blah *". Then you can use the Subtotal command to do calculations over the visible rows and calculate the data.
More about advanced filters: http://office.microsoft.com/en-us/excel-help/filter-by-using-advanced-criteria-HP005200178.aspx
Got a burning SEO question?
Subscribe to Moz Pro to gain full access to Q&A, answer questions, and ask your own.
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
Google My Business app search report
What is the deal with the search reports in the Google My Business app? I downloaded this app so prospective customers could message my business, and when I look at the search reports on the app, the results seem nonsensical. According to google analytics my business receives pretty steady traffic every day. Why does the report say that I receive zero visitors one day and 400 the next? (See the screenshot below) 4yLRCHG
Reporting & Analytics | | RandyHT1 -
Did Analytics change the way to handle Google images searches on Dec 12?
Dear all, One of the sites I'm monitoring receives a lot of traffic from image searches or images that appear in universal search results. On Dec 12th, 2015, the bounce rate for these sessions went from around 30% the day before to around 87%. See screen shot below. Did anybody notice similar bounces in the bounce rate? Did Google change something in the way that image search is handled? Looking forward to your ideas! large?v=mpbl-1&px=999
Reporting & Analytics | | AABMarketing_Frank0 -
New GSC Search Analytics report: position mixes web and image
Dear all, I am auditing a site in Google Seach Console (GSC, formerly Google Webmaster Tools) and find the Position data in the new Search Analytics report very, very improbable. I suspect that even if you filter by "SearchType = web", the Position data does count the ranking of images in the Image search widget as a search position. Has anybody observed this as well? Here is the case: the site targets a quite broad search query in the bath room domain. I have made a number of searches with private browser sessions, different browsers, alternative IP address via a VPN, etc, and the look of the search result in the relevant geographical market is consistently the following. Three Adwords ads #1 organic result Images universal results widget #2-10 organic results The site’s first page ranks consistently around #15 of the organic results, hence on the second SERP. But it also consistently has an image in the Images universal results widget (usually #2 or #3). This is consistent with the data I have in Moz Analytics. Yet, the GSC Search Analytics report shows 2.2 as average position with the default SearchType=Web setting. I have done the search over and over, and never has a PAGE of the site ranked that high. Is there any public information how exactly the position is calculated? I mean, something more precise than the very general information on https://support.google.com/webmasters/answer/6155685?hl=en Is there any way to get the correct position/ranking? Thanks for sharing your experience!
Reporting & Analytics | | QRN0 -
Google Search Console - Why is my average mobile position better than my average desktop position?
I'm wondering why my average mobile position is much better than my average desktop position. I'm wondering if Google is comparing the same queries for both mobile vs desktop or if they're only showing me the top ranked for each type of search. Is it example 1 or 2? Example 1: Desktop may have 5,000 ranking queries that average to 21.6
Reporting & Analytics | | Pauly_Gigs
Mobile may have 1,500 ranking queries that average to 8.5 OR Example 2: example.com has 5,000 total ranking keywords, those queries' average ranking in a desktop search 21.6 and mobile search 8.5. I'm curious to know exactly what I'm seeing in Google's Search Console. https://08875344305734164866.googlegroups.com/attach/777ae98664ed418f/Mobile%20VS%20Desktop.png?part=0.1&view=1&vt=ANaJVrEHOjRLlPH43i00NnC8PxaG3ct7bsHum_TWnUoa7xVamCpRp8jrvRQJL-gz4n7Q0otqKcKxcAJA5z1VySs2naQU_Zy5tDps6bJhUSZsLRQq4uU-tJQ0 -
How many ways to use Event in Google Tag Manager for Event Tracking?
Hello Experts, How many ways to use Event in Google Tag Manager for Event Tracking? As per me there are 5 ways given below are they correct? 2nd thing if yes can you please please tell me procedure of using all or which is the best one to use? HTML 5 Data Attributes Classic Google Analytics Example: _gaq.push(['_trackPageview', '/downloads/pdfs/corporateBrief.pdf']);
Reporting & Analytics | | bkmitesh
3) Universal Analytics Example: ga(‘send’, ‘pageview’, ‘page path’); query string? or is it possible without any coding on website we can configure id's in google tag manager? Thanks! BK Mitesh0 -
Best way to handle ignored Rel=Canonical?
My Google Analytics is reporting organic traffic for URL's with a QueryString attached, even though there's a Canonical tag that points to the preferred (non-QueryStringed) version. Would the best way to handle this be the GWT URL Parameters Tool? I'm fairly unfamiliar with the tool, but after some research, it looks like this might be the best way to go. Does anyone have any good/bad advice for using the tool? Thanks!
Reporting & Analytics | | GalcoIndustrial1 -
Google Analytics Site Search to new sub-domain
Hi Mozzers, I'm setting up Google's Site Search on a website. However this isn't for search terms, this will be for people filling in a form and using the POST action to land on a results page. This is similar to what is outlined at http://support.google.com/analytics/bin/answer.py?hl=en&answer=1012264 ('<a class="zippy zippy-collapse">Setting Up Site Search for POST-Based Search Engines').</a> However my approach is different as my results appear on a sub-domain of the top level domain. Eg.. user is on www.domain.com/page.php user fills in form submits user gets taken to results.domain.com/results.php The issue is with the suggested code provided by Google as copied below.. Firstly, I don't use query strings on my results page so I would have to create an artificial page which shouldn't be a problem. But what I don't know is how the tracking will work across a sub-domain without the _gaq.push(['_setDomainName', '.domain.com']); code. Can this be added in? Can I also add Custom Variables? Does anyone have experience of using Site Search across a sub-domain perhaps to track quote form values? Many thanks!
Reporting & Analytics | | panini0 -
How can I capture search terms and forward it along with my contact form?
I would like to be able to capture the keywords when someone comes to my site with a PPC click or a keyword search. I can't imagine that this is an uncommon request, but I can't find the tools that would let me do that, or heck something that will help me build it myself. What is everyone using to capture search information from people who contact you?
Reporting & Analytics | | ciinc0