Regular Expressions for Filtering BOT Traffic?
-
I've set up a filter to remove bot traffic from Analytics. I relied on regular expressions posted in an article, and they eliminate what appear to be most of them.
However, there are other bots I would like to filter but I'm having a hard time determining the regular expressions for them.
How do I determine what the regular expression is for additional bots so I can apply them to the filter?
I read an Analytics "how to" but it's over my head and I'm hoping for some "dumbed down" guidance.
-
No problem, feel free to reach out if you have any other RegEx related questions.
Regards,
Chris
-
I will definitely do that for Rackspace bots, Chris.
Thank you for taking the time to walk me through this and tweak my filter.
I'll give the site you posted a visit.
-
If you copy and paste my RegEx, it will filter out the rackspace bots. If you want to learn more about Regular Expressions, here is a site that explains them very well, though it may not be quite kindergarten speak.
-
Crap.
Well, I guess the vernacular is what I need to know.
Knowing what to put where is the trick, isn't it? Is there a dummies guide somewhere that spells this out in kindergarten speak?
I could really see myself botching this filtering business.
-
Not unless there's a . after the word servers in the name. The \ is escaping the . at the end of stumbleupon inc.
-
Does it need the . before the )?
-
Ok, try this:
^(microsoft corp|inktomi corporation|yahoo! inc.|google inc.|stumbleupon inc.|rackspace cloud servers)$|gomez
Just added rackspace as another match; it should work if the name is exactly right.
Hope this helps,
Chris
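In case it helps to sanity-check the pattern before pasting it into the Analytics filter, here's a quick sketch in Python (used purely for illustration; the dots are escaped here per the escaping advice elsewhere in the thread, and the provider names in the test list are hypothetical):

```python
import re

# Chris's filter pattern: an anchored alternation of ISP names, plus "gomez" anywhere
pattern = re.compile(
    r"^(microsoft corp|inktomi corporation|yahoo! inc\.|google inc\.|"
    r"stumbleupon inc\.|rackspace cloud servers)$|gomez"
)

providers = [
    "rackspace cloud servers",  # matches: the newly added alternative
    "gomez networks",           # matches: "gomez" is unanchored
    "comcast cable",            # no match: a normal consumer ISP
]

for name in providers:
    print(name, "->", bool(pattern.search(name)))
```

The `^...$` anchors mean each ISP name has to match the Service Provider value exactly, which is why getting the name "exactly right" matters.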
-
Agreed! That's why I suggest using it in combination with the variables you mentioned above.
-
rackspace cloud servers
Maybe my problem is I'm not looking in the right place.
I'm in audience>technology>network and the column shows "service provider."
-
How is it titled in the ISP report exactly?
-
For example,
Since I implemented the filter four days ago, rackspace cloud servers have visited my site 848 times, visited 1 page each time, spent 0 seconds on the page, and bounced 100% of the time.
What is the regular expression for rackspace?
-
Time on page can be a tricky one because sometimes actual visits can record 00:00:00 due to the way it is measured. I'd recommend using other factors like the ones I mentioned above.
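To make the "combination of factors" idea concrete, here's a hypothetical sketch (the row fields and threshold are invented for illustration, not actual Analytics export columns):

```python
# Hypothetical exported rows: (service provider, avg. time on page in seconds, bounce rate)
sessions = [
    ("rackspace cloud servers", 0, 1.00),
    ("comcast cable", 0, 0.40),   # a real visitor can also record 0s time on page
    ("verizon fios", 95, 0.20),
]

def looks_like_bot(provider, avg_time, bounce_rate):
    """Flag only when several weak signals agree, since 0s time alone is unreliable."""
    signals = [
        "cloud" in provider or "servers" in provider,  # hosting company, not a consumer ISP
        avg_time == 0,
        bounce_rate >= 0.99,
    ]
    return sum(signals) >= 2

for row in sessions:
    print(row[0], "->", looks_like_bot(*row))
```

The point is the same as Chris's: no single metric is decisive, so require agreement between two or more of them before filtering.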
-
"...a combination of operating system, location, and some other factors can do the trick."
Yep, combined with those, look for "Avg. Time on Page = 00:00:00"
-
Ok, can you provide some information on the bots that are getting through this that you want to sort out? If they are able to be filtered through the ISP organization as the ones in your current RegEx, you can simply add them to the list: (microsoft corp| ... ... |stumbleupon inc.|ispnamefromyourbots|ispname2|etc.)$|gomez
Otherwise, you might need to get creative and find another way to isolate them (a combination of operating system, location, and some other factors can do the trick). When adding to the list, make sure to escape special characters like . or / by using a \ before them, or else your RegEx will fail.
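As a rough illustration of why the escaping matters (Python's `re` module follows the same rules, and `re.escape` can build the list safely for you):

```python
import re

# Unescaped, "." matches ANY character, so the pattern also matches "yahoo! incx"
loose = re.compile(r"^yahoo! inc.$")
strict = re.compile(r"^yahoo! inc\.$")

print(bool(loose.search("yahoo! incx")))   # the dot acts as a wildcard here
print(bool(strict.search("yahoo! incx")))  # \. only matches a literal dot

# re.escape() escapes every special character in a name for you
isps = ["stumbleupon inc.", "rackspace cloud servers"]
pattern = re.compile("^(" + "|".join(re.escape(i) for i in isps) + ")$")
print(bool(pattern.search("stumbleupon inc.")))
```

An unescaped dot usually still matches the intended name, so the filter seems to work, but it can also match names you didn't intend, which is why escaping is the safer habit.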
-
Sure. Here's the post for filtering the bots.
Here's the RegEx posted: ^(microsoft corp|inktomi corporation|yahoo! inc.|google inc.|stumbleupon inc.)$|gomez
-
If you give me an idea of how you are isolating the bots I might be able to help come up with a RegEx for you. What is the RegEx you have in place to sort out the other bots?
Regards,
Chris