Regular Expressions for Filtering BOT Traffic?
-
I've set up a filter to remove bot traffic from Analytics. I relied on a regular expression posted in an article, and it eliminates what appears to be most of them.
However, there are other bots I would like to filter, and I'm having a hard time determining the regular expressions for them.
How do I determine what the regular expression is for additional bots so I can add them to the filter?
I read an Analytics "how to" but it's over my head, and I'm hoping for some "dumbed down" guidance.
-
No problem, feel free to reach out if you have any other RegEx related questions.
Regards,
Chris
-
I will definitely do that for Rackspace bots, Chris.
Thank you for taking the time to walk me through this and tweak my filter.
I'll give the site you posted a visit.
-
If you copy and paste my RegEx, it will filter out the Rackspace bots. If you want to learn more about regular expressions, here is a site that explains them very well, though it may not be quite kindergarten speak.
-
Crap.
Well, I guess the vernacular is what I need to know.
Knowing what to put where is the trick, isn't it? Is there a dummies' guide somewhere that spells this out in kindergarten speak?
I could really see myself botching this filtering business.
-
Not unless there's a . after the word servers in the name. The \ is escaping the . at the end of stumbleupon inc\.
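To see what escaping the final dot changes, here is a minimal Python sketch. It assumes Python's `re` module behaves like the Analytics filter for this simple case, and the name "stumbleupon incx" is made up purely for illustration.

```python
import re

# Unescaped: the final "." is a wildcard that matches any single character.
unescaped = re.compile(r"^stumbleupon inc.$")
# Escaped: "\." matches only a literal period.
escaped = re.compile(r"^stumbleupon inc\.$")

def matches(pattern: re.Pattern, name: str) -> bool:
    """Return True if the pattern matches the given provider name."""
    return pattern.search(name) is not None

print(matches(unescaped, "stumbleupon inc."))  # True
print(matches(unescaped, "stumbleupon incx"))  # True: the wildcard accepts "x"
print(matches(escaped, "stumbleupon inc."))    # True
print(matches(escaped, "stumbleupon incx"))    # False: only a real period is accepted
```

A name with no trailing period, such as rackspace cloud servers, has nothing to escape at the end.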
-
Does it need the \. before the )?
-
Ok, try this:
^(microsoft corp|inktomi corporation|yahoo! inc\.|google inc\.|stumbleupon inc\.|rackspace cloud servers)$|gomez
Just added rackspace as another match, it should work if the name is exactly right.
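If you'd like to check a pattern like this before saving the filter, a rough Python sketch follows. It uses the expression with the literal dots escaped as `\.`. Two assumptions: Python's `re` module stands in for the Analytics filter engine, and `re.IGNORECASE` approximates a case-insensitive filter. The sample provider names other than rackspace cloud servers are hypothetical.

```python
import re

# The ISP pattern with literal dots escaped.
# "^( ... )$" requires an exact match on one of the listed names;
# "|gomez" additionally matches any name that contains "gomez".
ISP_PATTERN = re.compile(
    r"^(microsoft corp|inktomi corporation|yahoo! inc\.|google inc\.|"
    r"stumbleupon inc\.|rackspace cloud servers)$|gomez",
    re.IGNORECASE,  # assumption: the filter is not case sensitive
)

def is_filtered(service_provider: str) -> bool:
    """Return True if the service provider name is caught by the pattern."""
    return ISP_PATTERN.search(service_provider) is not None

# Sample names for illustration only.
samples = [
    "rackspace cloud servers",
    "rackspace cloud servers inc",   # not exact, so not caught
    "gomez networks",                # caught by the unanchored "gomez"
    "comcast cable communications",
]
for name in samples:
    print(f"{name!r}: {is_filtered(name)}")
```

The second sample shows why the name has to be exactly right: anything inside `^( ... )$` must match the whole service provider string.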
Hope this helps,
Chris
-
Agreed! That's why I suggest using it in combination with the variables you mentioned above.
-
rackspace cloud servers
Maybe my problem is I'm not looking in the right place.
I'm in Audience > Technology > Network and the column shows "Service Provider."
-
How is it titled in the ISP report exactly?
-
For example,
Since I implemented the filter four days ago, rackspace cloud servers have visited my site 848 times, visited 1 page each time, spent 0 seconds on the page, and bounced 100% of the time.
What is the reg expression for rackspace?
-
Time on page can be a tricky one because sometimes actual visits can record 00:00:00 due to the way it is measured. I'd recommend using other factors like the ones I mentioned above.
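As a rough illustration of combining signals rather than relying on time on page alone, here is a small Python sketch. The `ProviderRow` type, the `looks_like_bot` helper, the `min_visits` threshold, and the "example isp" row are all hypothetical; only the rackspace figures come from this thread. It also uses visit counts, pages per visit, and bounce rate as the extra signals, which is just one possible combination.

```python
from dataclasses import dataclass

@dataclass
class ProviderRow:
    """One row of a service provider report (hypothetical structure)."""
    service_provider: str
    visits: int
    pages_per_visit: float
    avg_seconds_on_page: float
    bounce_rate: float  # 0.0 to 1.0

def looks_like_bot(row: ProviderRow, min_visits: int = 100) -> bool:
    """Flag a row only when several signals agree, not on zero time alone."""
    return (
        row.visits >= min_visits          # assumed threshold, tune to your site
        and row.pages_per_visit <= 1.0
        and row.avg_seconds_on_page == 0
        and row.bounce_rate >= 1.0
    )

rows = [
    # Figures reported earlier in the thread.
    ProviderRow("rackspace cloud servers", 848, 1.0, 0.0, 1.0),
    # Hypothetical real visitors: zero recorded time, but very few visits.
    ProviderRow("example isp", 3, 1.0, 0.0, 1.0),
]
suspects = [r.service_provider for r in rows if looks_like_bot(r)]
print(suspects)
```

The second row has the same 00:00:00 time on page but is not flagged, which is the point about zero time being unreliable on its own.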
-
"...a combination of operating system, location, and some other factors can do the trick."
Yep, combined with those, look for "Avg. Time on Page = 00:00:00"
-
Ok, can you provide some information on the bots that are getting through this that you want to sort out? If they are able to be filtered through the ISP organization as the ones in your current RegEx, you can simply add them to the list: (microsoft corp| ... ... |stumbleupon inc.|ispnamefromyourbots|ispname2|etc.)$|gomez
Otherwise, you might need to get creative and find another way to isolate them (a combination of operating system, location, and some other factors can do the trick). When adding to the list, make sure to escape special characters like . or / by using a \ before them, or else your RegEx will fail.
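One way to avoid escaping mistakes when the list grows is to generate the expression from plain names. This is only a sketch: `escape_literal` and `build_isp_pattern` are hypothetical helpers, and it assumes the filter uses the usual regex metacharacters. A hand-written escape is used instead of Python's `re.escape` because `re.escape` also escapes spaces, which makes the result harder to read when pasted into a filter.

```python
import re

# Characters with special meaning in a regular expression.
SPECIAL_CHARS = ".^$*+?()[]{}|\\"

def escape_literal(name: str) -> str:
    """Put a backslash in front of each regex metacharacter in name."""
    return "".join("\\" + ch if ch in SPECIAL_CHARS else ch for ch in name)

def build_isp_pattern(exact_names, substrings=()):
    """Build ^(name1|name2|...)$ plus unanchored substring alternatives."""
    exact = "|".join(escape_literal(n) for n in exact_names)
    parts = [f"^({exact})$"]
    parts.extend(escape_literal(s) for s in substrings)
    return "|".join(parts)

names = [
    "microsoft corp",
    "inktomi corporation",
    "yahoo! inc.",
    "google inc.",
    "stumbleupon inc.",
    "rackspace cloud servers",
]
pattern = build_isp_pattern(names, substrings=["gomez"])
print(pattern)
```

To add another provider, append its name exactly as it appears in the Service Provider column and regenerate the pattern.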
-
Sure. Here's the post for filtering the bots.
Here's the RegEx posted: ^(microsoft corp|inktomi corporation|yahoo! inc\.|google inc\.|stumbleupon inc\.)$|gomez
-
If you give me an idea of how you are isolating the bots I might be able to help come up with a RegEx for you. What is the RegEx you have in place to sort out the other bots?
Regards,
Chris