Regular Expressions for Filtering BOT Traffic?
-
I've set up a filter to remove bot traffic from Analytics. I relied on regular expressions posted in an article that eliminates what appears to be most of them.
However, there are other bots I would like to filter but I'm having a hard time determining the regular expressions for them.
How do I determine what the regular expression is for additional bots so I can apply them to the filter?
I read an Analytics "how to" but its over my head and I'm hoping for some "dumbed down" guidance.
-
No problem, feel free to reach out if you have any other RegEx related questions.
Regards,
Chris
-
I will definitely do that for Rackspace bots, Chris.
Thank you for taking the time to walk me through this and tweak my filter.
I'll give the site you posted a visit.
-
If you copy and paste my RegEx, it will filter out the rackspace bots. If you want to learn more about Regular Expressions, here is a site that explains them very well, though it may not be quite kindergarten speak.
-
Crap.
Well, I guess the vernacular is what I need to know.
Knowing what to put where is the trick isn't it? Is there a dummies guide somewhere that spells this out in kindergarten speak?
I could really see myself botching this filtering business.
-
Not unless there's a . after the word servers in the name. The . is escaping the . at the end of stumbleupon inc.
-
Does it need the . before the )
-
Ok, try this:
^(microsoft corp|inktomi corporation|yahoo! inc.|google inc.|stumbleupon inc.|rackspace cloud servers)$|gomez
Just added rackspace as another match, it should work if the name is exactly right.
Hope this helps,
Chris
-
Agreed! That's why I suggest using it in combination with the variables you mentioned above.
-
rackspace cloud servers
Maybe my problem is I'm not looking in the right place.
I'm in audience>technology>network and the column shows "service provider."
-
How is it titled in the ISP report exactly?
-
For example,
Since I implemented the filter four days ago, rackspace cloud servers have visited my site 848 times, , visited 1 page each time, spent 0 seconds on the page and bounced 100% of the time.
What is the reg expression for rackspace?
-
Time on page can be a tricky one because sometimes actual visits can record 00:00:00 due to the way it is measured. I'd recommend using other factors like the ones I mentioned above.
-
"...a combination of operating system, location, and some other factors can do the trick."
Yep, combined with those, look for "Avg. Time on Page = 00:00:00"
-
Ok, can you provide some information on the bots that are getting through this that you want to sort out? If they are able to be filtered through the ISP organization as the ones in your current RegEx, you can simply add them to the list: (microsoft corp| ... ... |stumbleupon inc.|ispnamefromyourbots|ispname2|etc.)$|gomez
Otherwise, you might need to get creative and find another way to isolate them (a combination of operating system, location, and some other factors can do the trick). When adding to the list, make sure to escape special characters like . or / by using a \ before them, or else your RegEx will fail.
-
Sure. Here's the post for filtering the bots.
Here's the reg x posted: ^(microsoft corp|inktomi corporation|yahoo! inc.|google inc.|stumbleupon inc.)$|gomez
-
If you give me an idea of how you are isolating the bots I might be able to help come up with a RegEx for you. What is the RegEx you have in place to sort out the other bots?
Regards,
Chris
Got a burning SEO question?
Subscribe to Moz Pro to gain full access to Q&A, answer questions, and ask your own.
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
How will this affect the rankings and traffic of the new site once this happens?
Hi, we will be moving a clients’ site address from one domain to another and will of course be doing 301 redirects and notifying Google of the site address change in WMT. The problem is, that at some point in the future (say 3-6 months), the old domain will be going live with a new site as the current client does not own the domain and the owner will be wanting it back unfortunately. How will this affect the rankings and traffic of the new site (new domain) once this (old domain with new site) happens? Will the site address change be enough to keep the rankings but it will lose backlink traffic? Or will rankings go down since the 301 redirects will in essence no longer be in affect? Many thanks for your help in advance.
Intermediate & Advanced SEO | | WSIDW0 -
New site. How important is traffic for a new site? And what about domain age?
Hi guys. I've been building a new site because i've seen a real SEO opportunity out there. I'm a mixing professional by trade and so I wanted to take advantage of SEO to help gain more work. Here's the site: www.signalchainstudios.co.uk I'm curious about domain age. This site fairly well optimised for my keywords, and my site got pretty good content on it (i think so anyway). But it's no where to be seen on the SERP's (link at all). Is this just a domain age issue? I'd have though it might be in the top 50 because my site's services are not hard to rank for at all! Also what about traffic? Does Google want to see an 'active' site before it considers 'promoting' it up the ranks? Or are back links and good content the main factor in the equation? Thanks in advance. I love this community to bits 🙂 Isaac.
Intermediate & Advanced SEO | | isaac6631 -
How can I track traffic source for each user?
We received an enquiry on one of our landing pages and I am trying to track down where that user come from? Whether he came from social networks or search engines and if it is from search engine which keywords he used etc.. Does anyone know if there is any way I could see that?
Intermediate & Advanced SEO | | Rubix0 -
Should I noindex the site search page? It is generating 4% of my organic traffic.
I read about some recommendations to noindex the URL of the site search.
Intermediate & Advanced SEO | | lcourse
Checked in analytics that site search URL generated about 4% of my total organic search traffic (<2% of sales). My reasoning is that site search may generate duplicated content issues and may prevent the more relevant product or category pages from showing up instead. Would you noindex this page or not? Any thoughts?0 -
I need help with a local tax lawyer website that just doesn't get traffic
We've been doing a little bit of linkbuilding and content development for this site on and off for the last year or so: http://www.olsonirstaxattorney.com/ We're trying to rank her for "Denver tax attorney," but in all honesty we just don't have the budget to hit the first page for that term, so it doesn't surprise me that we're invisible. However, my problem is that the site gets almost NO traffic. There are days when Google doesn't send more than 2-3 visitors (yikes). Every site in our portfolio gets at least a few hundred visits a month, so I'm thinking that I'm missing something really obvious on this site. I would expect that we'd get some type of traffic considering the amount of content the site has, (about 100 pages of unique content, give or take) and some of the basic linkbuilding work we've done (we just got an infographic published to a few decent quality sites, including a nice placement on the lawyer.com blog). However, we're still getting almost no organic traffic from Google or Bing. Any ideas as to why? GWMT doesn't show a penalty, doesn't identify any site health issues, etc. Other notes: Unbeknownst to me, the client had cut and pasted IRS newsletters as blog posts. I found out about all this duplicate content last November, and we added "noindex" tags to all of those duplicated pages. The site has never been carefully maintained by the client. She's very busy, so adding content has never been a priority, and we don't have a lot of budget to justify blogging on a regular basis AND doing some of the linkbuilding work we've done (guest posts and infographic).
Intermediate & Advanced SEO | | JasonLancaster0 -
When you provide traffic estimates, do you factor in CTR?
There are several studies that show CTR based on position. When a client asks for traffic estimates do you multiply CTR by estimated search volume? Why or why not?
Intermediate & Advanced SEO | | nicole.healthline0 -
Whats your regular routine ? Can we learn new things from each other
I tend to work on the on page changes first of all following keyword research. Then take a look at some internal linking, Setup a wordpress blog on /blog or sub domain and get my copywriter to start adding regular content . Next stage is link building Old fashioned emails requests, blog comments taking a look through existing sites we own for relevant places. On going analysis once positions change.
Intermediate & Advanced SEO | | onlinemediadirect0