Regular Expressions for Filtering BOT Traffic?
-
I've set up a filter to remove bot traffic from Analytics. I relied on a regular expression posted in an article, and it eliminates what appears to be most of them.
However, there are other bots I would like to filter but I'm having a hard time determining the regular expressions for them.
How do I determine what the regular expression is for additional bots so I can apply them to the filter?
I read an Analytics "how to" but it's over my head and I'm hoping for some "dumbed down" guidance.
-
No problem, feel free to reach out if you have any other RegEx related questions.
Regards,
Chris
-
I will definitely do that for Rackspace bots, Chris.
Thank you for taking the time to walk me through this and tweak my filter.
I'll give the site you posted a visit.
-
If you copy and paste my RegEx, it will filter out the rackspace bots. If you want to learn more about Regular Expressions, here is a site that explains them very well, though it may not be quite kindergarten speak.
-
Crap.
Well, I guess the vernacular is what I need to know.
Knowing what to put where is the trick, isn't it? Is there a dummies guide somewhere that spells this out in kindergarten speak?
I could really see myself botching this filtering business.
-
Not unless there's a . after the word servers in the name. The \ is escaping the . at the end of stumbleupon inc.
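To make the escaping point concrete, here is a minimal Python sketch. It assumes the dots in the filter are meant to be written as \. (a backslash before the dot), and it uses Python's `re` module only as a stand-in for the Analytics filter engine, which may differ in details.

```python
import re

# Escaped dot: matches only a literal "." at the end of the name.
escaped = re.compile(r"^stumbleupon inc\.$")
# Unescaped dot: "." matches any single character.
unescaped = re.compile(r"^stumbleupon inc.$")

def matches(pattern: re.Pattern, name: str) -> bool:
    """Return True when the pattern is found in the name."""
    return pattern.search(name) is not None

# Both patterns accept the real name.
print(matches(escaped, "stumbleupon inc."))
print(matches(unescaped, "stumbleupon inc."))

# Only the unescaped pattern accepts a name ending in some other character.
print(matches(escaped, "stumbleupon incx"))
print(matches(unescaped, "stumbleupon incx"))
```

A name with no dot in it, such as rackspace cloud servers, has nothing to escape, which is why no backslash is needed before the closing parenthesis.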
-
Does it need the \. before the )?
-
Ok, try this:
^(microsoft corp|inktomi corporation|yahoo! inc\.|google inc\.|stumbleupon inc\.|rackspace cloud servers)$|gomez
Just added rackspace as another match, it should work if the name is exactly right.
Hope this helps,
Chris
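One way to check a pattern like this before pasting it into a filter is to run it against a few service provider names. The sketch below is Python, not Analytics itself; the `is_bot_isp` helper, the sample names, and the case-insensitive flag are assumptions for illustration, and the dots are written as \. per the escaping advice in this thread.

```python
import re

# Pattern from the post above, with literal dots escaped as \.
# re.IGNORECASE is an assumption meant to mimic a case-insensitive filter.
BOT_ISP = re.compile(
    r"^(microsoft corp|inktomi corporation|yahoo! inc\.|google inc\."
    r"|stumbleupon inc\.|rackspace cloud servers)$|gomez",
    re.IGNORECASE,
)

def is_bot_isp(service_provider: str) -> bool:
    """Return True when the service provider name matches the pattern."""
    return BOT_ISP.search(service_provider) is not None

# Sample names are illustrative only, not real report data.
samples = [
    "rackspace cloud servers",      # exact name inside ^( ... )$
    "rackspace cloud servers inc",  # extra text, so the anchored part fails
    "gomez networks",               # contains "gomez" anywhere in the name
    "example isp",                  # unrelated name
]
for name in samples:
    print(f"{name!r}: {is_bot_isp(name)}")
```

The ^ and $ mean everything inside the parentheses must match the whole name, which is why the name has to be exactly right. The gomez part sits outside the anchors, so it matches wherever it appears.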
-
Agreed! That's why I suggest using it in combination with the variables you mentioned above.
-
rackspace cloud servers
Maybe my problem is I'm not looking in the right place.
I'm in audience>technology>network and the column shows "service provider."
-
How is it titled in the ISP report exactly?
-
For example,
Since I implemented the filter four days ago, rackspace cloud servers have visited my site 848 times, visited 1 page each time, spent 0 seconds on the page and bounced 100% of the time.
What is the reg expression for rackspace?
-
Time on page can be a tricky one because sometimes actual visits can record 00:00:00 due to the way it is measured. I'd recommend using other factors like the ones I mentioned above.
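As a rough illustration of combining factors instead of trusting time on page alone, here is a small Python sketch. The `ProviderRow` fields, the `looks_like_bot` helper, and the `min_visits` threshold are all hypothetical; the first sample row uses the Rackspace numbers quoted in this thread and the second is made up.

```python
from dataclasses import dataclass

@dataclass
class ProviderRow:
    """One row of a service provider report (field names are hypothetical)."""
    service_provider: str
    visits: int
    pages_per_visit: float
    avg_time_seconds: float
    bounce_rate: float  # 0.0 to 1.0

def looks_like_bot(row: ProviderRow, min_visits: int = 100) -> bool:
    """Flag a provider only when several signals agree, not on zero time alone."""
    signals = [
        row.pages_per_visit <= 1.0,
        row.avg_time_seconds == 0,
        row.bounce_rate >= 1.0,
    ]
    # min_visits is an arbitrary illustrative cutoff: a handful of real
    # single-page visits can also record 00:00:00.
    return row.visits >= min_visits and all(signals)

rows = [
    ProviderRow("rackspace cloud servers", 848, 1.0, 0, 1.0),  # from the thread
    ProviderRow("example isp", 3, 1.0, 0, 1.0),                # made-up row
]
for row in rows:
    print(row.service_provider, looks_like_bot(row))
```

Anything flagged this way is only a candidate; it is still worth checking operating system and location before adding the name to the filter.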
-
"...a combination of operating system, location, and some other factors can do the trick."
Yep, combined with those, look for "Avg. Time on Page = 00:00:00"
-
Ok, can you provide some information on the bots that are getting through this that you want to sort out? If they can be filtered by ISP organization in the same way as the ones in your current RegEx, you can simply add them to the list: (microsoft corp| ... ... |stumbleupon inc\.|ispnamefromyourbots|ispname2|etc.)$|gomez
Otherwise, you might need to get creative and find another way to isolate them (a combination of operating system, location, and some other factors can do the trick). When adding to the list, make sure to escape special characters like . by putting a \ before them, or else your RegEx may not match what you intend (an unescaped . matches any character).
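If the list gets long, the escaping can be automated. This is a minimal Python sketch, not anything Analytics provides; `escape_literal` and `build_isp_pattern` are hypothetical helpers, and the ISP names are the ones mentioned in this thread.

```python
import re

# Characters with special meaning in a regular expression.
SPECIAL = set(".^$*+?()[]{}|") | {"\\"}

def escape_literal(name: str) -> str:
    """Put a backslash before every special character in the name."""
    return "".join("\\" + ch if ch in SPECIAL else ch for ch in name)

def build_isp_pattern(exact_names, substrings=()):
    """Build ^(name1|name2|...)$ plus optional match-anywhere alternatives."""
    exact = "|".join(escape_literal(n) for n in exact_names)
    parts = [f"^({exact})$"] + [escape_literal(s) for s in substrings]
    return "|".join(parts)

names = [
    "microsoft corp",
    "inktomi corporation",
    "yahoo! inc.",
    "google inc.",
    "stumbleupon inc.",
    "rackspace cloud servers",
]
pattern = build_isp_pattern(names, substrings=["gomez"])
print(pattern)

# Sanity check: the generated pattern compiles and matches an exact name.
assert re.search(pattern, "google inc.") is not None
```

The helper escapes a fixed set of characters rather than calling `re.escape`, because `re.escape` also escapes spaces and the result is meant to be pasted into a filter field as-is.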
-
Sure. Here's the post for filtering the bots.
Here's the reg x posted: ^(microsoft corp|inktomi corporation|yahoo! inc\.|google inc\.|stumbleupon inc\.)$|gomez
-
If you give me an idea of how you are isolating the bots I might be able to help come up with a RegEx for you. What is the RegEx you have in place to sort out the other bots?
Regards,
Chris