Regular Expressions for Filtering BOT Traffic?
-
I've set up a filter to remove bot traffic from Analytics. I relied on regular expressions posted in an article that eliminates what appears to be most of them.
However, there are other bots I would like to filter but I'm having a hard time determining the regular expressions for them.
How do I determine what the regular expression is for additional bots so I can apply them to the filter?
I read an Analytics "how to" but its over my head and I'm hoping for some "dumbed down" guidance.
-
No problem, feel free to reach out if you have any other RegEx related questions.
Regards,
Chris
-
I will definitely do that for Rackspace bots, Chris.
Thank you for taking the time to walk me through this and tweak my filter.
I'll give the site you posted a visit.
-
If you copy and paste my RegEx, it will filter out the rackspace bots. If you want to learn more about Regular Expressions, here is a site that explains them very well, though it may not be quite kindergarten speak.
-
Crap.
Well, I guess the vernacular is what I need to know.
Knowing what to put where is the trick isn't it? Is there a dummies guide somewhere that spells this out in kindergarten speak?
I could really see myself botching this filtering business.
-
Not unless there's a . after the word servers in the name. The . is escaping the . at the end of stumbleupon inc.
-
Does it need the . before the )
-
Ok, try this:
^(microsoft corp|inktomi corporation|yahoo! inc.|google inc.|stumbleupon inc.|rackspace cloud servers)$|gomez
Just added rackspace as another match, it should work if the name is exactly right.
Hope this helps,
Chris
-
Agreed! That's why I suggest using it in combination with the variables you mentioned above.
-
rackspace cloud servers
Maybe my problem is I'm not looking in the right place.
I'm in audience>technology>network and the column shows "service provider."
-
How is it titled in the ISP report exactly?
-
For example,
Since I implemented the filter four days ago, rackspace cloud servers have visited my site 848 times, , visited 1 page each time, spent 0 seconds on the page and bounced 100% of the time.
What is the reg expression for rackspace?
-
Time on page can be a tricky one because sometimes actual visits can record 00:00:00 due to the way it is measured. I'd recommend using other factors like the ones I mentioned above.
-
"...a combination of operating system, location, and some other factors can do the trick."
Yep, combined with those, look for "Avg. Time on Page = 00:00:00"
-
Ok, can you provide some information on the bots that are getting through this that you want to sort out? If they are able to be filtered through the ISP organization as the ones in your current RegEx, you can simply add them to the list: (microsoft corp| ... ... |stumbleupon inc.|ispnamefromyourbots|ispname2|etc.)$|gomez
Otherwise, you might need to get creative and find another way to isolate them (a combination of operating system, location, and some other factors can do the trick). When adding to the list, make sure to escape special characters like . or / by using a \ before them, or else your RegEx will fail.
-
Sure. Here's the post for filtering the bots.
Here's the reg x posted: ^(microsoft corp|inktomi corporation|yahoo! inc.|google inc.|stumbleupon inc.)$|gomez
-
If you give me an idea of how you are isolating the bots I might be able to help come up with a RegEx for you. What is the RegEx you have in place to sort out the other bots?
Regards,
Chris
Got a burning SEO question?
Subscribe to Moz Pro to gain full access to Q&A, answer questions, and ask your own.
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
Filtering Views in GA
Hi there, Does anyone here any experience in filtering views in Google Analytics by TLD? I thought the filter type of hostname would have done what I was looking for but it hasn't and I can only find information online about doing it for subdomains rather than top level ones. Many thanks in advance.
Intermediate & Advanced SEO | | BAO.Agency0 -
What to try when Google excludes your URL only from high-traffic search terms and results?
We have a high authority blog post (high PA) that used to rank for several high-traffic terms. Right now the post continues to rank high for variations of the high-traffic terms (e.g keyword + " free", keyword + " discussion") but the URL has been completed excluded from the money terms with alternative URLs of the domain ranking on positions 50+. There is no manual penalty in place or a DCMA exclusion. What are some of the things ppl would try here? Some of the things I can think of: - Remove keyword terms in article - Change the URL and do a 301 redirect - Duplicate the POST under new URL, 302 redirect from old blog post, and repoint links as much as you have control - Refresh content including timestamps - Remove potentially bad neighborhood links etc Has anyone seen the behavior above for their articles? Are there any recommendations? /PP
Intermediate & Advanced SEO | | ppseo800 -
The images on site are not found/indexed, it's been recommended we change their presentation to Google Bot - could this create a cloaking issue?
Hi We have an issue with images on our site not being found or indexed by Google. We have an image sitemap but the images are served on the Sitecore powered site within <divs>which Google can't read. The developers have suggested the below solution:</divs> Googlebot class="header-banner__image" _src="/~/media/images/accommodation/arctic-canada/arctic-safari-camp/arctic-cafari-camp-david-briggs.ashx"/>_Non Googlebot <noscript class="noscript-image"><br /></span></em><em><span><div role="img"<br /></span></em><em><span>aria-label="Arctic Safari Camp, Arctic Canada"<br /></span></em><em><span>title="Arctic Safari Camp, Arctic Canada"<br /></span></em><em><span>class="header-banner__image"<br /></span></em><em><span>style="background-image: url('/~/media/images/accommodation/arctic-canada/arctic-safari-camp/arctic-cafari-camp-david-briggs.ashx?mw=1024&hash=D65B0DE9B311166B0FB767201DAADA9A4ADA4AC4');"></div><br /></span></em><em><span></noscript> aria-label="Arctic Safari Camp, Arctic Canada" title="Arctic Safari Camp, Arctic Canada" class="header-banner__image image" data-src="/~/media/images/accommodation/arctic-canada/arctic-safari-camp/arctic-cafari-camp-david-briggs.ashx" data-max-width="1919" data-viewport="0.80" data-aspect="1.78" data-aspect-target="1.00" > Is this something that could be flagged as potential cloaking though, as we are effectively then showing code looking just for the user agent Googlebot?The devs have said that via their contacts Google has advised them that the original way we set up the site is the most efficient and considered way for the end user. However they have acknowledged the Googlebot software is not sophisticated enough to recognise this. Is the above solution the most suitable?Many thanksKate
Intermediate & Advanced SEO | | KateWaite0 -
Traffic drop after Facebook push
Hi all, We experienced a strange phenonema after a Facebook push, it appears the Google organic traffic was all but dead for five days after. Totally not sure why! It has since returned to about 80% of previous levels. http://postimg.org/image/3n1b7m7hf/
Intermediate & Advanced SEO | | ScottOlson0 -
Organic search traffic improved (besides Google) for last 6 months
Hi, to follow up on my previous post (http://moz.com/community/q/low-on-google-ranking-despite-error-free), I was wandering if someone can tell me whether we are penalised by Google or not? Since the last 6 months, we see a rise in organic visitors coming from Bing, yahoo but Google remains the same. Despite the advice given in previous post, I just feel that something else must be wrong. Perhaps more inbound links with high PR? Socially, we are pretty much engaging 50-60% of our audience, yet no link flow will count for our organic ranking sadly enough... Hopefully someone can have a look at our site www.mercadonline.es in more detail? Ask me in a PM for more info! Thank you Ivordg
Intermediate & Advanced SEO | | ivordg0 -
Lot of duplicate content and still traffic is increasing... how does it work?
Hello Mozzers, I've a dilemma with a client's site I am working on that is make me questioning my SEO knowledge, or the way Google treat duplicate content. I'll explain now. The situation is the following: organic traffic is constantly increasing since last September, in every section of the site (home page, categories and product pages) even though: they have tons of duplicate content from same content in old and new URLs (which are in two different languages, even if the actual content on the page is in the same language in both of the URL versions) indexation is completely left to Google decision (no robots file, no sitemap, no meta robots in code, no use of canonical, no redirect applied to any of the old URLs, etc) a lot (really, a lot) of URLs with query parameters (which brings to more duplicated content) linked from the inner page of the site (and indexed in some case) they have Analytics but don't use Webmaster Tools Now... they expect me to help them increase even more the traffic they're getting, and I'll go first on "regular" onpage optimization, as their title, meta description and headers are not optimized at all according to the page content, but after that I was thinking on fixing the issues with indexation and content duplication, but I am worried I can "break the toy", as things are going well for them. Should I be confident that fixing these issues will bring to even better results or do you think is better for me to focus on other kind of improvements? Thanks for your help!
Intermediate & Advanced SEO | | Guybrush_Threepw00d0 -
What's going on with my organic traffic from Google?
I am working on eCommerce website Vista Stores. My website's traffic is going down due to certain reason. I have done R & D and have assumption with auto generated content which I have added on few product pages. You can find out attachment to know more about current situation of traffic. 6789134845_d1a1578960_b.jpg
Intermediate & Advanced SEO | | CommercePundit0 -
I have a great idea for a contest that will generate links...but how do I promote it when I have little traffic?
I'm working on a new real estate site and I've got some great ideas to generate links. One of these ideas is a pretty neat contest. (I'd tell you what it is, but really...this is one of those ideas where people will go, 'Man! I wish I thought of that', so I want to do it first before anyone else steals my idea. 🙂 ) The contest involves readers submitting photos and other readers voting on them. I anticipate it will generate some good press both locally and internationally if I can get enough submissions. But, the question is, how do I get contest submissions when I don't have good traffic yet? Once I've got some good submissions, I'll contact some media, spread it amongst my personal fb friends, etc. Any ideas are welcomed!
Intermediate & Advanced SEO | | MarieHaynes1