Regular Expressions for Filtering BOT Traffic?
-
I've set up a filter to remove bot traffic from Analytics. I used a regular expression posted in an article, and it appears to eliminate most of the bots.
However, there are other bots I would like to filter but I'm having a hard time determining the regular expressions for them.
How do I work out the regular expression for additional bots so I can add them to the filter?
I read an Analytics "how to" but it's over my head and I'm hoping for some "dumbed down" guidance.
-
No problem, feel free to reach out if you have any other RegEx related questions.
Regards,
Chris
-
I will definitely do that for Rackspace bots, Chris.
Thank you for taking the time to walk me through this and tweak my filter.
I'll give the site you posted a visit.
-
If you copy and paste my RegEx, it will filter out the rackspace bots. If you want to learn more about Regular Expressions, here is a site that explains them very well, though it may not be quite kindergarten speak.
-
Crap.
Well, I guess the vernacular is what I need to know.
Knowing what to put where is the trick, isn't it? Is there a dummies guide somewhere that spells this out in kindergarten speak?
I could really see myself botching this filtering business.
-
Not unless there's a . after the word servers in the name. The \ is escaping the . at the end of stumbleupon inc., so that it matches a literal period instead of any character.
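To make the escaping point concrete, here is a minimal sketch in Python. It assumes Python's `re` module treats this simple pattern the same way the Analytics filter does, and the name "stumbleupon incx" is made up purely for illustration.

```python
import re

# Unescaped: "." matches any single character, so "incx" is accepted too.
loose = re.compile(r"^stumbleupon inc.$")

# Escaped: "\." matches only a literal period.
strict = re.compile(r"^stumbleupon inc\.$")

print(bool(loose.search("stumbleupon inc.")))   # True
print(bool(loose.search("stumbleupon incx")))   # True (probably not intended)
print(bool(strict.search("stumbleupon inc.")))  # True
print(bool(strict.search("stumbleupon incx")))  # False
```

A name with no period in it, such as rackspace cloud servers, needs no backslash at all.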
-
Does it need the \. before the )?
-
Ok, try this:
^(microsoft corp|inktomi corporation|yahoo! inc\.|google inc\.|stumbleupon inc\.|rackspace cloud servers)$|gomez
I just added rackspace as another match. It should work as long as the name is exactly as it appears in your report.
Hope this helps,
Chris
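If you want to check a pattern like this before pasting it into a filter, here is a rough Python sketch. It assumes Python's `re` module behaves like the Analytics filter for this pattern, that matching ignores case (handled here by lower-casing the name), and that a match anywhere in the name counts. Apart from rackspace cloud servers, the sample names are made up.

```python
import re

# The pattern from the post, with the periods escaped.
BOT_ISP = re.compile(
    r"^(microsoft corp|inktomi corporation|yahoo! inc\.|google inc\.|"
    r"stumbleupon inc\.|rackspace cloud servers)$|gomez"
)

def is_filtered(service_provider: str) -> bool:
    """Return True if the service provider name would be caught by the pattern."""
    return BOT_ISP.search(service_provider.lower()) is not None

# Hypothetical sample names, except the rackspace entry taken from the thread.
samples = [
    "rackspace cloud servers",      # exact match inside ^( ... )$
    "rackspace cloud servers inc",  # not exact, so the anchored part fails
    "gomez networks",               # caught by the unanchored "gomez"
    "example broadband",            # not in the list
]

for name in samples:
    print(f"{name!r}: {is_filtered(name)}")
```

The second sample shows why the name has to be exactly right: anything inside ^( ... )$ must match the whole service provider name, while gomez sits outside the anchors and matches anywhere.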
-
Agreed! That's why I suggest using it in combination with the variables you mentioned above.
-
rackspace cloud servers
Maybe my problem is I'm not looking in the right place.
I'm in Audience > Technology > Network and the column shows "Service Provider."
-
How is it titled in the ISP report exactly?
-
For example,
Since I implemented the filter four days ago, rackspace cloud servers have visited my site 848 times, visited 1 page each time, spent 0 seconds on the page and bounced 100% of the time.
What is the regular expression for rackspace?
-
Time on page can be a tricky one because sometimes actual visits can record 00:00:00 due to the way it is measured. I'd recommend using other factors like the ones I mentioned above.
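As a rough illustration of using several signals together instead of time on page alone, here is a Python sketch. The field names, the `min_visits` threshold and the second sample row are all assumptions made up for this example; only the rackspace numbers come from the thread.

```python
from dataclasses import dataclass

@dataclass
class ProviderRow:
    """One row of a service provider report (field names are hypothetical)."""
    service_provider: str
    visits: int
    pages_per_visit: float
    bounce_rate: float         # 0.0 to 1.0
    avg_time_on_page_s: float  # seconds

def looks_like_bot(row: ProviderRow, min_visits: int = 100) -> bool:
    """Flag a provider only when volume is high AND every signal agrees."""
    signals = [
        row.pages_per_visit <= 1.0,
        row.bounce_rate >= 0.99,
        row.avg_time_on_page_s == 0,
    ]
    return row.visits >= min_visits and all(signals)

rows = [
    ProviderRow("rackspace cloud servers", 848, 1.0, 1.0, 0.0),  # from the thread
    ProviderRow("example broadband", 3, 1.0, 1.0, 0.0),          # made-up row
]

for row in rows:
    print(row.service_provider, looks_like_bot(row))
```

The made-up second row has the same zero time on page but only a handful of visits, so it is not flagged. That is the kind of real visit a time-on-page rule alone could wrongly remove.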
-
"...a combination of operating system, location, and some other factors can do the trick."
Yep, combined with those, look for "Avg. Time on Page = 00:00:00"
-
Ok, can you provide some information on the bots that are getting through this that you want to sort out? If they can be filtered by ISP organization like the ones in your current RegEx, you can simply add them to the list: (microsoft corp| ... |stumbleupon inc\.|ispnamefromyourbots|ispname2|etc.)$|gomez
Otherwise, you might need to get creative and find another way to isolate them (a combination of operating system, location, and some other factors can do the trick). When adding to the list, make sure to escape special characters like . or / by putting a \ before them, or else your RegEx can fail or match more than you intend.
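If typing the escapes by hand feels error-prone, a small script can build the pattern from a plain list of names. This is only a sketch: `escape_name` and `build_filter` are hypothetical helpers written for this example, and the set of characters to escape is an assumption covering common regex metacharacters plus the / mentioned above.

```python
import re

# Characters that get a backslash in front (assumed set for this sketch).
SPECIAL = set(".^$*+?()[]{}|\\/")

def escape_name(name: str) -> str:
    """Put a backslash before each special character in a provider name."""
    return "".join("\\" + ch if ch in SPECIAL else ch for ch in name)

def build_filter(exact_names, contains=("gomez",)) -> str:
    """Build ^(name1|name2|...)$|word from plain-text names."""
    exact = "|".join(escape_name(n) for n in exact_names)
    return "|".join([f"^({exact})$", *(escape_name(c) for c in contains)])

pattern = build_filter(["microsoft corp", "google inc.", "rackspace cloud servers"])
print(pattern)
# ^(microsoft corp|google inc\.|rackspace cloud servers)$|gomez

# Quick check that the result behaves as expected.
print(bool(re.search(pattern, "google inc.")))  # True
print(bool(re.search(pattern, "google incx")))  # False
```

To add another bot, append its service provider name to the list exactly as it appears in the report and paste the printed pattern into the filter.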
-
Sure. Here's the post for filtering the bots.
Here's the regex posted: ^(microsoft corp|inktomi corporation|yahoo! inc\.|google inc\.|stumbleupon inc\.)$|gomez
-
If you give me an idea of how you are isolating the bots I might be able to help come up with a RegEx for you. What is the RegEx you have in place to sort out the other bots?
Regards,
Chris