Ways to analyze a 1M rows dataset of search queries
-
Hi,
I have this large dataset, about 1 million search queries with visits, bounce rate and a few other metrics. I'm trying to explore this data to find keyword "buckets" (such as include product name, location name, transactional objective, informational, etc.), as well as explore the density of certain keywords (keywords as in instances of a single word amongst all queries)
My idea was to use Excel and a macro to split all queries in separate words (also clearing punctuation and uppercase/lowercase), then storing this word in a new worksheet, adding to another column the visit counts from the row where the word was extracted (as to give a sense of weight). Before adding the word to the new worksheet, the script will look if the word already existed, if so it would just add the current value of visits to the existing visit counts etc.
In the end it will create sort of a "dictionary" of all the keywords in all search queries ranked by weight (= visits from search query including this keyword)
This would help me get started I believe, because I can't segment and analyze 1M raw search queries...
My issue is: this VBA has been running on my (fast) PC for the last 24hr and it doesn't seem to get to an end. Obviously excel+VBA is not the best way to do text mining and manipulation in such a large dataset (although it's just a 30mb file)
What would you do if you had this dataset and would like to mine the text/semantic as I am doing? Any idea of tools? process?
I'm considering dumping this data into a MySQL db and doing the processing through PHP (the only backend language I'm versed in), and getting the "summified" data stored into another table, which I'll then be able to export to a Excel for analysis. But I'm afraid that I'll be facing memory limit issues and such...
In the meantime, I'm definitely interested into knowing what you guys would do if you had this data and wanted to simply start exploring its constituencies
Thanks!
-
Yeah, Access can process any number of rows. It's Microsoft's database program. You can upload data, and then create queries. They have a design view where you can construct queries in a WYSIWYG fashion, or if you want, you can write your own SQL.
-
Thanks a lot John!
I'm going to try this out tonight!
So, I assume, Access won't have the same processing limitations with 1 million rows, will it?
Once I'll be done with the "discovery phase" I'm going through with this keyword list, I'll definitely use Advanced filters (in Excel) as you recommend to understand keyword groups in details
-
I had a similar problem going through my search query reports. If you're already familiar with VB you could do this with a Microsoft Access database rather than setting up a MySQL one w/PHP. I've been working on creating an Access database that I can import my data into, and have it spit out all sorts of useful info (for example negative keywords and placements), but it's only in its early stages right now.
If you just want to see it for a few terms and don't mind doing it one at a time, in the past I've filtered data like this in Excel without VB using advanced filters. I found that using advanced filters rather than VB sped up the process quite a bit; I'd imagine because it's an innate Excel function. Using 4 filters you can match whole words in the queries. For example, to find queries with "blah", you'd set a filter for "blah", "* blah", "blah " and " blah *". Then you can use the Subtotal command to do calculations over the visible rows and calculate the data.
More about advanced filters: http://office.microsoft.com/en-us/excel-help/filter-by-using-advanced-criteria-HP005200178.aspx
Got a burning SEO question?
Subscribe to Moz Pro to gain full access to Q&A, answer questions, and ask your own.
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
404 Status Codes in Google Search Console
Hi all, I've noticed in Google Search Console under 'Crawl errors' - 1. Why does the status code '410' come up as an 'error' in the crawl report? 2. Why are some articles labelled as '404' error when they have been completely deleted and should be a '410' - there are roughly around 1000-2000 of these. Thanks!
Reporting & Analytics | | lucwiesman0 -
Does the new Google Analytics Search Console Beta tool use API to pull more data?
So my client has been asking for definitive proof of why the search query data provided on Google Search Console does not exactly match up the data presented directly in the Search Console itself. The simple answer is that the Google Search Console is limited to 1000 rows of data. However our client is requesting a Google article/documentation of why the new Search Console beta tool has no row limit (hence much more data for a big website). I know that the Google Search Console API was available before Google announced the new Search Console Beta tool in Google Analytics. I also know this API could pull in more data than the 1000 row limit. However is there any article available (preferably from Google) that Google Analytics is pulling this Search Console data via API? Thanks!
Reporting & Analytics | | RosemaryB0 -
In Google Search Console: Total Clicks differ from the sum clicks on search queries
Hi.This may be a google technical question, but I've searched a lot and I couldn't find any certain information about that.The problem appears when you look at two stats in Search Console ( WebMasters Tools, some months ago), particulary in the Search Analysis.1- Total clicks2- Sum of clicks at every search query.I attached an image to make me clear.8IZsxs6.png
Reporting & Analytics | | NachoRetta2 -
Need help understanding what happened to our organic search.
I help run an ecommerce business that mainly runs on Google organic search (yes, I realize this is not a good place to be). Last April, we saw around 25-30% of our organic search cut. I’m pretty sure it was due to the mobile update but we had some changes in the past month or two before that. I’m looking for someone to review my analytics account and see what happened. Possibly this is called an audit? Also, we’re looking to ask some questions about long term strategy as we are thinking about a redesign and switching to a new platform. Maybe more content? Maybe more social?
Reporting & Analytics | | kirbyf0 -
Are organic search visitors always seen as organic in origin, even if their return to the site is direct?
Many of our conversions occur in a customers second visit to the site. Often, a customer will arrive at our site, submit a finance application, leave, and return at a later date to checkout. We are interested in tracking how many of our checkouts come from customers who originally found our site through an organic search result. If a customer enters the site through organic search, leaves, and returns later through an email link or directly entering our URL, will G analytics show that customer as direct or organic origin? Cheers, Ben
Reporting & Analytics | | WSPL0 -
Standard % of nonbranded organic search traffic?
I am working on developing goals for 2012. I will be doing several SEO projects, including one on a website for study abroad programs. Currently, nonbranded organic search is 15.49% of our visits and in 2010, it was 15.02%. I am wondering if anyone has any advice or knows of a benchmark percentage that could help me set an attainable goal. Thank you!!
Reporting & Analytics | | CIEEwebTeam0 -
Can you get local search numbers/traffic out of Google Analytics?
With Google's new local search I am more curious as to market penetration on keywords that are now localized to my different US cities. I understand that you can separate out Google traffic based on regional Google domains, but I am curious if there is an effective way to separate out searches and keywords based on a my local US Metros? If google cannot do this, any recommendations on products that can? Thanks.
Reporting & Analytics | | Thos0031 -
Google Analytics - April Search Report Data Gone For All Clients - Anyone Else Seeing This?
Hey everyone, In the last hour I was doing some SEO referral reportingin Google Analytics and discovered that my April data was completely gone (flat lined to zero). Specifically, if i select a date range for any point in april and select search from the advanced segments area I am seeing this issue. This is happening for ALL of my active clients... anyone else seeing this?
Reporting & Analytics | | dpeddle0