Ways to analyze a 1M rows dataset of search queries
-
Hi,
I have this large dataset, about 1 million search queries with visits, bounce rate and a few other metrics. I'm trying to explore this data to find keyword "buckets" (such as include product name, location name, transactional objective, informational, etc.), as well as explore the density of certain keywords (keywords as in instances of a single word amongst all queries)
My idea was to use Excel and a macro to split all queries in separate words (also clearing punctuation and uppercase/lowercase), then storing this word in a new worksheet, adding to another column the visit counts from the row where the word was extracted (as to give a sense of weight). Before adding the word to the new worksheet, the script will look if the word already existed, if so it would just add the current value of visits to the existing visit counts etc.
In the end it will create sort of a "dictionary" of all the keywords in all search queries ranked by weight (= visits from search query including this keyword)
This would help me get started I believe, because I can't segment and analyze 1M raw search queries...
My issue is: this VBA has been running on my (fast) PC for the last 24hr and it doesn't seem to get to an end. Obviously excel+VBA is not the best way to do text mining and manipulation in such a large dataset (although it's just a 30mb file)
What would you do if you had this dataset and would like to mine the text/semantic as I am doing? Any idea of tools? process?
I'm considering dumping this data into a MySQL db and doing the processing through PHP (the only backend language I'm versed in), and getting the "summified" data stored into another table, which I'll then be able to export to a Excel for analysis. But I'm afraid that I'll be facing memory limit issues and such...
In the meantime, I'm definitely interested into knowing what you guys would do if you had this data and wanted to simply start exploring its constituencies
Thanks!
-
Yeah, Access can process any number of rows. It's Microsoft's database program. You can upload data, and then create queries. They have a design view where you can construct queries in a WYSIWYG fashion, or if you want, you can write your own SQL.
-
Thanks a lot John!
I'm going to try this out tonight!
So, I assume, Access won't have the same processing limitations with 1 million rows, will it?
Once I'll be done with the "discovery phase" I'm going through with this keyword list, I'll definitely use Advanced filters (in Excel) as you recommend to understand keyword groups in details
-
I had a similar problem going through my search query reports. If you're already familiar with VB you could do this with a Microsoft Access database rather than setting up a MySQL one w/PHP. I've been working on creating an Access database that I can import my data into, and have it spit out all sorts of useful info (for example negative keywords and placements), but it's only in its early stages right now.
If you just want to see it for a few terms and don't mind doing it one at a time, in the past I've filtered data like this in Excel without VB using advanced filters. I found that using advanced filters rather than VB sped up the process quite a bit; I'd imagine because it's an innate Excel function. Using 4 filters you can match whole words in the queries. For example, to find queries with "blah", you'd set a filter for "blah", "* blah", "blah " and " blah *". Then you can use the Subtotal command to do calculations over the visible rows and calculate the data.
More about advanced filters: http://office.microsoft.com/en-us/excel-help/filter-by-using-advanced-criteria-HP005200178.aspx
Got a burning SEO question?
Subscribe to Moz Pro to gain full access to Q&A, answer questions, and ask your own.
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
Google My Business app search report
What is the deal with the search reports in the Google My Business app? I downloaded this app so prospective customers could message my business, and when I look at the search reports on the app, the results seem nonsensical. According to google analytics my business receives pretty steady traffic every day. Why does the report say that I receive zero visitors one day and 400 the next? (See the screenshot below) 4yLRCHG
Reporting & Analytics | | RandyHT1 -
Google Search Console All Verfication Methods Failed
Hi Guys, We are trying to get google webmaster tools verified for this site: https://www.nookie.com.au/ We have implemented the recommended: HTML tag from Search Console which is: Which is installed see: http://prntscr.com/g66cle However, verification still failed. We then tried Google Analytics (failed), alternative methods (HTML file upload failed) Google Tag Manager. All failed. I have no idea why this has happened, any ideas? Cheers.
Reporting & Analytics | | jaynamarino0 -
In Google Search Console: Total Clicks differ from the sum clicks on search queries
Hi.This may be a google technical question, but I've searched a lot and I couldn't find any certain information about that.The problem appears when you look at two stats in Search Console ( WebMasters Tools, some months ago), particulary in the Search Analysis.1- Total clicks2- Sum of clicks at every search query.I attached an image to make me clear.8IZsxs6.png
Reporting & Analytics | | NachoRetta2 -
Can not divide in different properties a domain in Search Console (Webmaster Tools)
Dear Moz Community, I hope you can give me a hand with the following questions. Im in charge of SEO of an ecommerce site in LATAM. It´s service is available in several countries, therefore each country has it subdirectory Eg. /ar /pe /co /bo /cl /br,etc... (in the future we will move to differente ccTLDs). I have been recomended to split or create different Search Console or Webmaster Tools properties (one for each subdirectory) but when Im creating a new property with a subdirectory, lets say www.domain.com/ar, Webmaster tools starts creating a property for www.domain.com/ar/ (NOTICE THE LAST SLASH) and it returns since that page doesn´t exist, what do you recomend me to do? Best wishes, Pablo Lòpez C
Reporting & Analytics | | pablo_carrara0 -
Rank #1 for a 110,000/month query search, but barely any traffic?
Hi guys, As it says in the title, we've recently reached the absolute #1 position for a certain key phrase in the travel industry which the Google Keyword Tool tells me averages 110,000 local (165,000 global) searches a month... however we have received barely any traffic at all over the past TWO months for it and I'm trying my best to determine why. We've checked on multiple different devices with all forms of personalisation off, different browsers, 3G connections as opposed to office Wi-FI etc. and it still returns us as the #1 rank. Meta descriptions and title tags are pretty much pristine if I don't say so myself, however what should be a very lucrative key phrase is currently returning little to no traffic results. Has anyone had experience in a similar situation to this? Any possible causes that I might be missing? Would greatly appreciate any help. Thanks.
Reporting & Analytics | | ExperienceOz0 -
Blocking our IP's but wondering if Google still uses our search data?
The company owner here has our (company) website as his home page. I excluded our static IP’s on Google Analytics, but is that good enough to keep Google from using his search traffic as an indicator of anything negative. Does Google still take into account his activity, but simply block it from my reporting? Finally, does one person actually have that kind of influence as far as time on site, bounce rates, etc. Should I convince him to find a new home page?
Reporting & Analytics | | Ticket_King0 -
Google Analytics internal Site Search - Destination pages dispaly Search results
Hi, Im having a bit of an issue with Google Analytics internal site search, I am able to currently track the search terms through my website internal search but when I click onto destination pages I just get the search result page. When clicking destination pages I would expect to get the pages on which the user ended up after the results page, instead I just get the results page which is pretty much useless ?submitsearchXXXXXX hope you can help, look forward to your response. Thanks,
Reporting & Analytics | | Tug-Agency1 -
Site: Query Question
Hi All, Question around the site: query you can execute on Google for example. Now I know it has lots of inaccuracies, but I like to keep a high level sight of it over time. I was using it to also try and get a high level view of how many product pages were indexed vs. the total number of pages. What is interesting is when I do a site: query for say www.newark.com I get ~748,000 results returned. When I do a query for www.newark.com "/dp/" I get ~845,000 results returned. Either I am doing something stupid or these numbers are completely backwards? Any thoughts? Thanks, Ben
Reporting & Analytics | | BenRush0