Regular Expressions for Filtering BOT Traffic?
-
I've set up a filter to remove bot traffic from Analytics. I relied on regular expressions posted in an article that eliminates what appears to be most of them.
However, there are other bots I would like to filter but I'm having a hard time determining the regular expressions for them.
How do I determine what the regular expression is for additional bots so I can apply them to the filter?
I read an Analytics "how to" but its over my head and I'm hoping for some "dumbed down" guidance.
-
No problem, feel free to reach out if you have any other RegEx related questions.
Regards,
Chris
-
I will definitely do that for Rackspace bots, Chris.
Thank you for taking the time to walk me through this and tweak my filter.
I'll give the site you posted a visit.
-
If you copy and paste my RegEx, it will filter out the rackspace bots. If you want to learn more about Regular Expressions, here is a site that explains them very well, though it may not be quite kindergarten speak.
-
Crap.
Well, I guess the vernacular is what I need to know.
Knowing what to put where is the trick isn't it? Is there a dummies guide somewhere that spells this out in kindergarten speak?
I could really see myself botching this filtering business.
-
Not unless there's a . after the word servers in the name. The . is escaping the . at the end of stumbleupon inc.
-
Does it need the . before the )
-
Ok, try this:
^(microsoft corp|inktomi corporation|yahoo! inc.|google inc.|stumbleupon inc.|rackspace cloud servers)$|gomez
Just added rackspace as another match, it should work if the name is exactly right.
Hope this helps,
Chris
-
Agreed! That's why I suggest using it in combination with the variables you mentioned above.
-
rackspace cloud servers
Maybe my problem is I'm not looking in the right place.
I'm in audience>technology>network and the column shows "service provider."
-
How is it titled in the ISP report exactly?
-
For example,
Since I implemented the filter four days ago, rackspace cloud servers have visited my site 848 times, , visited 1 page each time, spent 0 seconds on the page and bounced 100% of the time.
What is the reg expression for rackspace?
-
Time on page can be a tricky one because sometimes actual visits can record 00:00:00 due to the way it is measured. I'd recommend using other factors like the ones I mentioned above.
-
"...a combination of operating system, location, and some other factors can do the trick."
Yep, combined with those, look for "Avg. Time on Page = 00:00:00"
-
Ok, can you provide some information on the bots that are getting through this that you want to sort out? If they are able to be filtered through the ISP organization as the ones in your current RegEx, you can simply add them to the list: (microsoft corp| ... ... |stumbleupon inc.|ispnamefromyourbots|ispname2|etc.)$|gomez
Otherwise, you might need to get creative and find another way to isolate them (a combination of operating system, location, and some other factors can do the trick). When adding to the list, make sure to escape special characters like . or / by using a \ before them, or else your RegEx will fail.
-
Sure. Here's the post for filtering the bots.
Here's the reg x posted: ^(microsoft corp|inktomi corporation|yahoo! inc.|google inc.|stumbleupon inc.)$|gomez
-
If you give me an idea of how you are isolating the bots I might be able to help come up with a RegEx for you. What is the RegEx you have in place to sort out the other bots?
Regards,
Chris
Got a burning SEO question?
Subscribe to Moz Pro to gain full access to Q&A, answer questions, and ask your own.
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
Does traffic for branded searches help a site rank for general terms?
A year or two ago we put up some websites which were specific to brands we own. Sure enough those sites (eg 'myBrand.com') started to rank pretty well for those brand terms eg 'mybrand curling tongs' (it's not curling tongs, btw, but you get the idea). We were getting a decent amount of traffic presumably from people who have bought or seen these products on our amazon/ebay stores. Before long, we see us starting to rank well for non branded searches eg 'curling tongs' even among decent competition. Next thing you know I'm getting told by the boss that we need to put up websites for all specific ranges, not just brands, because specificity is a bonus for ranking well. While there's probably a point that a site for MybrandCurlingTongs lends itself well to ranking for curling tongs, is there also an element that the branded searches we got (via making our brand known on amazon/ebay) helped the site gain recognition and authority? As such a new website about 'ionising hair dryers' would not rank well based on being specific, because it wouldn't be helped by a lot of branded traffic?
Intermediate & Advanced SEO | | HSDOnline2 -
Site under attack from Android SEO bots - expert help needed
For last 25 days, we are facing a weird attack on our site. We are getting 10x the normal mobile traffic - all from Android, searching for our name specifically. We are sure that this is not authentic traffic as the traffic is coming from Organic searches and bouncing off. Initially, we thought this was a DDoS attack, but that does not seem to be the case. It looks like someone is trying to damage our Google reputation by performing too many searches and bouncing off. Has any one else faced a similar issue before? What can be done to mitigate the impact on site. (FYI - we get ~2M visits month on month, 80% from Google organic searches). Any help would be highly appreciated.
Intermediate & Advanced SEO | | KJ_AV0 -
How can I stop spam Google Organic traffic?
Hey Moz, I'm a rather experienced SEO who just encountered a problem I have never faced. I am hoping to get some advice or be pointed in the right direction. I just started work for a new client. Really great client and website. Nicer than most design/content. They will need some rel canonical work but that is not the issue here. The traffic looked great at first glance 131k visits in April. Google Analytics Acquisition Overview showed 94% of the traffic as organic. When I dug deeper and looked at the organic source I saw that Google was 99.9% of it. Normal enough. Then I looked at the time on site and my jaw dropped. 118,454 Organic New Users for Google only stayed on the site for 3 seconds. There is no way that the traffic is real. It does not match what Google Webmaster tools, Moz, and Ahrefs are telling me. How do I stop a service that is sending fake organic Google traffic?
Intermediate & Advanced SEO | | placementLabs0 -
Decreased organic traffic but increased Webmaster Tool Queries
We have a client who has had a significant decrease in organic traffic this last month (about 20%) but in Webmaster tools it tells me there was an increase in impressions and clicks. How can these both be accurate?
Intermediate & Advanced SEO | | jfeitlinger0 -
95% of organic traffic lands in my homepage, despite having a 250 page website with a "seo optimized" hierarchical structure. Any suggestion as to what might be happening?
Challenging issue All the "usual suspects" have been discarded: all pages included in google index, no google penalties, metas optimized, kw's segregated by pages/cluster of pages to avoid cannibalization... BUT, we know we are missing something website is www.e-florex.com and is an e-commerce site based on magento Any ideas you might think are worth exploring? Thanks in advance for your help Juan
Intermediate & Advanced SEO | | juanmarn0 -
Url structure for multiple search filters applied to products
We have a product catalog with several hundred similar products. Our list of products allows you apply filters to hone your search, so that in fact there are over 150,000 different individual searches you could come up with on this page. Some of these searches are relevant to our SEO strategy, but most are not. Right now (for the most part) we save the state of each search with the fragment of the URL, or in other words in a way that isn't indexed by the search engines. The URL (without hashes) ranks very well in Google for our one main keyword. At the moment, Google doesn't recognize the variety of content possible on this page. An example is: http://www.example.com/main-keyword.html#style=vintage&color=blue&season=spring We're moving towards a more indexable URL structure and one that could potentially save the state of all 150,000 searches in a way that Google could read. An example would be: http://www.example.com/main-keyword/vintage/blue/spring/ I worry, though, that giving so many options in our URL will confuse Google and make a lot of duplicate content. After all, we only have a few hundred products and inevitably many of the searches will look pretty similar. Also, I worry about losing ground on the main http://www.example.com/main-keyword.html page, when it's ranking so well at the moment. So I guess the questions are: Is there such a think as having URLs be too specific? Should we noindex or set rel=canonical on the pages whose keywords are nested too deep? Will our main keyword's page suffer when it has to share all the inbound links with these other, more specific searches?
Intermediate & Advanced SEO | | boxcarpress0 -
SIte Redesign - Disaster for Organic Traffic
A client just redesigned their site and launched it around May 30. The organic traffic has had a MAJOR drop and has not returned yet. All of the old pages have been 301 redirected to the new pages. Any thoughts on what could be causing this to www.brickhousesecurity.com? In Google Webmaster Tools, before the redesign we were receiving about 300,000 impressions and 10-12,000 clicks. Now the impressions are only 100,000 with half as many clicks. Thanks!
Intermediate & Advanced SEO | | AlightAnalytics0 -
30-40% drops in organic Google traffic starting early/mid Feb to Now??
Hello All: I would really appreciate any insight that can be provided into this dilemma. I've ruled out bad links issues - we don't have them and never engaged in grey to black hat SEO linkbuilding. I cannot find the cause for this drop off. Maybe I'm missing something because I am so close to it? The site is markspace Thanks in advance for any and all help or insight!! 🙂
Intermediate & Advanced SEO | | holdtheonion0