Regular Expressions for Filtering BOT Traffic?
-
I've set up a filter to remove bot traffic from Analytics. I relied on regular expressions posted in an article that eliminates what appears to be most of them.
However, there are other bots I would like to filter but I'm having a hard time determining the regular expressions for them.
How do I determine what the regular expression is for additional bots so I can apply them to the filter?
I read an Analytics "how to" but its over my head and I'm hoping for some "dumbed down" guidance.
-
No problem, feel free to reach out if you have any other RegEx related questions.
Regards,
Chris
-
I will definitely do that for Rackspace bots, Chris.
Thank you for taking the time to walk me through this and tweak my filter.
I'll give the site you posted a visit.
-
If you copy and paste my RegEx, it will filter out the rackspace bots. If you want to learn more about Regular Expressions, here is a site that explains them very well, though it may not be quite kindergarten speak.
-
Crap.
Well, I guess the vernacular is what I need to know.
Knowing what to put where is the trick isn't it? Is there a dummies guide somewhere that spells this out in kindergarten speak?
I could really see myself botching this filtering business.
-
Not unless there's a . after the word servers in the name. The . is escaping the . at the end of stumbleupon inc.
-
Does it need the . before the )
-
Ok, try this:
^(microsoft corp|inktomi corporation|yahoo! inc.|google inc.|stumbleupon inc.|rackspace cloud servers)$|gomez
Just added rackspace as another match, it should work if the name is exactly right.
Hope this helps,
Chris
-
Agreed! That's why I suggest using it in combination with the variables you mentioned above.
-
rackspace cloud servers
Maybe my problem is I'm not looking in the right place.
I'm in audience>technology>network and the column shows "service provider."
-
How is it titled in the ISP report exactly?
-
For example,
Since I implemented the filter four days ago, rackspace cloud servers have visited my site 848 times, , visited 1 page each time, spent 0 seconds on the page and bounced 100% of the time.
What is the reg expression for rackspace?
-
Time on page can be a tricky one because sometimes actual visits can record 00:00:00 due to the way it is measured. I'd recommend using other factors like the ones I mentioned above.
-
"...a combination of operating system, location, and some other factors can do the trick."
Yep, combined with those, look for "Avg. Time on Page = 00:00:00"
-
Ok, can you provide some information on the bots that are getting through this that you want to sort out? If they are able to be filtered through the ISP organization as the ones in your current RegEx, you can simply add them to the list: (microsoft corp| ... ... |stumbleupon inc.|ispnamefromyourbots|ispname2|etc.)$|gomez
Otherwise, you might need to get creative and find another way to isolate them (a combination of operating system, location, and some other factors can do the trick). When adding to the list, make sure to escape special characters like . or / by using a \ before them, or else your RegEx will fail.
-
Sure. Here's the post for filtering the bots.
Here's the reg x posted: ^(microsoft corp|inktomi corporation|yahoo! inc.|google inc.|stumbleupon inc.)$|gomez
-
If you give me an idea of how you are isolating the bots I might be able to help come up with a RegEx for you. What is the RegEx you have in place to sort out the other bots?
Regards,
Chris
Got a burning SEO question?
Subscribe to Moz Pro to gain full access to Q&A, answer questions, and ask your own.
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
Redirecting traffic to https
Hey! i was wondering, should i force all traffic to https address? i know that overall a better secured website will rank better since it earns more trust from users which means less bounce rate and the list of benefits is endless..
Intermediate & Advanced SEO | | SharonEKG
but should i FORCE ALL traffic to a https? or maybe only force a http to https? or not at all?2 -
My direct traffic went up and my organic traffic went down. Help!
So on Oct. 21, our direct traffic increased 3x and our organic traffic decreased 3x. And it has been that way ever since. Almost like they flip flopped. Additionally, that was the same day I started retargeting to our site. I have tagged all the links from the ads and they're being counted as google paid clicks in GA. And our accounts are linked. I am just dumbfounded as to how this could happen.
Intermediate & Advanced SEO | | Eric_OWPP1 -
Traffic drop after Facebook push
Hi all, We experienced a strange phenonema after a Facebook push, it appears the Google organic traffic was all but dead for five days after. Totally not sure why! It has since returned to about 80% of previous levels. http://postimg.org/image/3n1b7m7hf/
Intermediate & Advanced SEO | | ScottOlson0 -
New site. How important is traffic for a new site? And what about domain age?
Hi guys. I've been building a new site because i've seen a real SEO opportunity out there. I'm a mixing professional by trade and so I wanted to take advantage of SEO to help gain more work. Here's the site: www.signalchainstudios.co.uk I'm curious about domain age. This site fairly well optimised for my keywords, and my site got pretty good content on it (i think so anyway). But it's no where to be seen on the SERP's (link at all). Is this just a domain age issue? I'd have though it might be in the top 50 because my site's services are not hard to rank for at all! Also what about traffic? Does Google want to see an 'active' site before it considers 'promoting' it up the ranks? Or are back links and good content the main factor in the equation? Thanks in advance. I love this community to bits 🙂 Isaac.
Intermediate & Advanced SEO | | isaac6631 -
How to get traffic from a particular Geographical region?
Our company is based out of India and has a web site with .in domain ; however our target customers are from North America and Australia.
Intermediate & Advanced SEO | | TPS2013
The problem is we get as high as 70% of organic traffic from India.
This 70% traffic from India has little use to us. Possibly because we have ”.in “ domain the Google local search is active.
How to reverse this situation; I mean we are looking for more traffic from across the globe except India.
Any suggestions ? P.S. Changing domain from .in to .com is not an option as its the part of our brand advertised for last 7 years1 -
E-commerce Site - Filter Pages
Hi, We have a client who has a fairly large e-commerce site that went live quite recently. The site is near enough fully indexed by Google, but one thing I've noticed is that filtered search results pages are being indexed, all with duplicate page titles. Obviously this is an issue that needs to be looked at ASAP. My questions is this - would we be better tweaking site settings so that page titles are constructed from the filters (brand/price/size) and therefore unique (and useful for searchers who are after a specific brand or size of a given item). Or should we rel=canonical the filtered pages so that they are eventually dropped from the index (the safer of the two options)? Thanks in advance for your help!
Intermediate & Advanced SEO | | jasarrow0 -
Canonical Tags & Search Bots
Does anyone know for sure if search engine bots still crawl links on a page whose canonical tags are set to a different page? So in short, would it be similar to a no-index follow? Thanks! -Margarita
Intermediate & Advanced SEO | | MargaritaS0 -
Ranking & Traffic drops in last month
Over the last month, our rankings have been in a slow slide - that is until this week, when they absolutely crashed. Here are some example phrases: Phrase 11-Mar 5-Mar bug shields 24 9
Intermediate & Advanced SEO | | ShawnHerrick
floor mats 25 14
nerf bars 23 12
running boards 61 14
snow plows 25 18 For the life of me, I can't see what would have caused such drastic changes. Our site is almost completely unique content. Some things, like Warranty & Install instructions, are from the manufacturer to protect us from liabilities. We come up with our own feature text, and we have custom written articles, blog posts, research guides, etc. We also appear to be the only one of our competitors being affected in this fashion. Any thoughts would be helpful. Domain is realtruck.com.0