Regular Expressions for Filtering BOT Traffic?
-
I've set up a filter to remove bot traffic from Analytics. I relied on a regular expression posted in an article, and it appears to eliminate most of the bots.
However, there are other bots I would like to filter, but I'm having a hard time determining the regular expressions for them.
How do I determine what the regular expression is for additional bots so I can add them to the filter?
I read an Analytics "how to" but it's over my head and I'm hoping for some "dumbed down" guidance.
-
No problem, feel free to reach out if you have any other RegEx related questions.
Regards,
Chris
-
I will definitely do that for Rackspace bots, Chris.
Thank you for taking the time to walk me through this and tweak my filter.
I'll give the site you posted a visit.
-
If you copy and paste my RegEx, it will filter out the rackspace bots. If you want to learn more about Regular Expressions, here is a site that explains them very well, though it may not be quite kindergarten speak.
-
Crap.
Well, I guess the vernacular is what I need to know.
Knowing what to put where is the trick, isn't it? Is there a dummies guide somewhere that spells this out in kindergarten speak?
I could really see myself botching this filtering business.
-
Not unless there's a . after the word servers in the name. The \ before the . at the end of stumbleupon inc\. escapes it, so it matches a literal period instead of any character.
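To see what the backslash changes, here is a minimal Python sketch. It assumes Python's `re` module behaves like the Analytics filter engine for a simple pattern like this, which may not hold in every detail, and the sample strings are made up.

```python
import re

# Unescaped: "." is a wildcard and matches any single character.
loose = re.compile(r"^stumbleupon inc.$")
# Escaped: "\." matches only a literal period.
strict = re.compile(r"^stumbleupon inc\.$")

def matches(pattern, name):
    """Return True if the pattern matches the service provider name."""
    return pattern.search(name) is not None

print(matches(loose, "stumbleupon inc."))   # real name, ends in a period
print(matches(loose, "stumbleupon incX"))   # wildcard also accepts the "X"
print(matches(strict, "stumbleupon inc."))  # real name still matches
print(matches(strict, "stumbleupon incX"))  # escaped period rejects the "X"
```

Both versions match the real name; the escaped one is simply stricter about what else it lets through.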
-
Does it need the \. before the )?
-
Ok, try this:
^(microsoft corp|inktomi corporation|yahoo! inc\.|google inc\.|stumbleupon inc\.|rackspace cloud servers)$|gomez
I just added rackspace as another match; it should work if the name matches exactly.
Hope this helps,
Chris
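
A quick way to check a pattern like this before saving the filter is to run it against a few service provider names. The sketch below uses Python's `re` module as a stand-in for the Analytics engine (an assumption), lowercases the input on the assumption that the report shows names in lowercase, and uses hypothetical sample names apart from rackspace cloud servers.

```python
import re

# The filter pattern, with each period escaped by a backslash.
BOT_ISPS = re.compile(
    r"^(microsoft corp|inktomi corporation|yahoo! inc\.|google inc\.|"
    r"stumbleupon inc\.|rackspace cloud servers)$|gomez"
)

def is_bot_isp(service_provider):
    """True if the filter pattern would match this service provider name."""
    return BOT_ISPS.search(service_provider.lower()) is not None

# Hypothetical values for the "service provider" column.
samples = [
    "rackspace cloud servers",      # exact name in the list
    "rackspace cloud servers inc",  # extra text after the name
    "gomez networks",               # contains "gomez"
    "comcast cable",                # not in the list
]
for name in samples:
    print(name, "->", is_bot_isp(name))
```

The names inside ^( ... )$ have to match the whole value, which is why the name must be exactly right. The gomez part sits outside the ^ and $, so it matches any provider whose name contains it.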
-
Agreed! That's why I suggest using it in combination with the variables you mentioned above.
-
rackspace cloud servers
Maybe my problem is I'm not looking in the right place.
I'm in Audience > Technology > Network and the column shows "Service Provider."
-
How is it titled in the ISP report exactly?
-
For example, since I implemented the filter four days ago, rackspace cloud servers have visited my site 848 times, viewed 1 page each time, spent 0 seconds on the page, and bounced 100% of the time.
What is the regular expression for rackspace?
-
Time on page can be a tricky one, because actual visits can sometimes record 00:00:00 due to the way it is measured (a visit that views only one page has no later hit to measure against). I'd recommend using other factors like the ones I mentioned above.
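
One way to act on this is to treat zero time on page as one signal among several instead of a test on its own. The sketch below is only an illustration: the rackspace figures come from this thread, the other rows are made up, and the `min_visits` threshold and field names are arbitrary assumptions.

```python
# Hypothetical rows from the service provider report; only the
# rackspace figures are taken from this thread.
rows = [
    {"provider": "rackspace cloud servers", "visits": 848,
     "pages_per_visit": 1.0, "avg_seconds": 0, "bounce_rate": 100.0},
    {"provider": "example broadband", "visits": 3,
     "pages_per_visit": 1.0, "avg_seconds": 0, "bounce_rate": 100.0},
    {"provider": "example cable", "visits": 500,
     "pages_per_visit": 3.2, "avg_seconds": 95, "bounce_rate": 41.0},
]

def looks_like_bot(row, min_visits=100):
    """Flag a provider only when several signals agree, not time on page alone."""
    return (
        row["visits"] >= min_visits
        and row["pages_per_visit"] <= 1.0
        and row["avg_seconds"] == 0
        and row["bounce_rate"] >= 100.0
    )

suspects = [row["provider"] for row in rows if looks_like_bot(row)]
print(suspects)
```

The low-volume provider has the same zero time on page but is not flagged, since a handful of single-page visits can easily be real people.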
-
"...a combination of operating system, location, and some other factors can do the trick."
Yep, combined with those, look for "Avg. Time on Page = 00:00:00"
-
Ok, can you provide some information on the bots that are getting through and that you want to filter out? If they can be filtered by ISP organization, like the ones in your current RegEx, you can simply add them to the list: (microsoft corp| ... ... |stumbleupon inc\.|ispnamefromyourbots|ispname2|etc.)$|gomez
Otherwise, you might need to get creative and find another way to isolate them (a combination of operating system, location, and some other factors can do the trick). When adding to the list, make sure to escape special characters like . or / by putting a \ before them; otherwise your RegEx may not match what you intend.
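
If escaping by hand feels error-prone, the list can be assembled from plain names. This is a minimal Python sketch; `escape_name` and `build_isp_pattern` are hypothetical helpers, and the set of characters treated as special is the usual regular expression set, assumed (not confirmed here) to cover what the Analytics filter needs.

```python
import re

# Characters with a special meaning in most regular expression engines.
SPECIAL = set(".^$*+?()[]{}|\\")

def escape_name(name):
    """Put a backslash before each special character in a plain ISP name."""
    return "".join("\\" + ch if ch in SPECIAL else ch for ch in name)

def build_isp_pattern(exact_names, contains=()):
    """Build ^(name1|name2|...)$|fragment from plain-text names."""
    exact = "|".join(escape_name(name) for name in exact_names)
    parts = ["^(" + exact + ")$"]
    parts.extend(escape_name(fragment) for fragment in contains)
    return "|".join(parts)

pattern = build_isp_pattern(
    ["google inc.", "stumbleupon inc.", "rackspace cloud servers"],
    contains=["gomez"],
)
print(pattern)
```

Names go in exactly as they appear in the report, and the helper adds the backslashes, so a new ISP is one more entry in the list.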
-
Sure. Here's the post for filtering the bots.
Here's the RegEx posted: ^(microsoft corp|inktomi corporation|yahoo! inc\.|google inc\.|stumbleupon inc\.)$|gomez
-
If you give me an idea of how you are isolating the bots I might be able to help come up with a RegEx for you. What is the RegEx you have in place to sort out the other bots?
Regards,
Chris