Regular Expressions for Filtering BOT Traffic?
-
I've set up a filter to remove bot traffic from Analytics. I relied on regular expressions posted in an article that eliminates what appears to be most of them.
However, there are other bots I would like to filter but I'm having a hard time determining the regular expressions for them.
How do I determine what the regular expression is for additional bots so I can apply them to the filter?
I read an Analytics "how to" but its over my head and I'm hoping for some "dumbed down" guidance.
-
No problem, feel free to reach out if you have any other RegEx related questions.
Regards,
Chris
-
I will definitely do that for Rackspace bots, Chris.
Thank you for taking the time to walk me through this and tweak my filter.
I'll give the site you posted a visit.
-
If you copy and paste my RegEx, it will filter out the rackspace bots. If you want to learn more about Regular Expressions, here is a site that explains them very well, though it may not be quite kindergarten speak.
-
Crap.
Well, I guess the vernacular is what I need to know.
Knowing what to put where is the trick isn't it? Is there a dummies guide somewhere that spells this out in kindergarten speak?
I could really see myself botching this filtering business.
-
Not unless there's a . after the word servers in the name. The . is escaping the . at the end of stumbleupon inc.
-
Does it need the . before the )
-
Ok, try this:
^(microsoft corp|inktomi corporation|yahoo! inc.|google inc.|stumbleupon inc.|rackspace cloud servers)$|gomez
Just added rackspace as another match, it should work if the name is exactly right.
Hope this helps,
Chris
-
Agreed! That's why I suggest using it in combination with the variables you mentioned above.
-
rackspace cloud servers
Maybe my problem is I'm not looking in the right place.
I'm in audience>technology>network and the column shows "service provider."
-
How is it titled in the ISP report exactly?
-
For example,
Since I implemented the filter four days ago, rackspace cloud servers have visited my site 848 times, , visited 1 page each time, spent 0 seconds on the page and bounced 100% of the time.
What is the reg expression for rackspace?
-
Time on page can be a tricky one because sometimes actual visits can record 00:00:00 due to the way it is measured. I'd recommend using other factors like the ones I mentioned above.
-
"...a combination of operating system, location, and some other factors can do the trick."
Yep, combined with those, look for "Avg. Time on Page = 00:00:00"
-
Ok, can you provide some information on the bots that are getting through this that you want to sort out? If they are able to be filtered through the ISP organization as the ones in your current RegEx, you can simply add them to the list: (microsoft corp| ... ... |stumbleupon inc.|ispnamefromyourbots|ispname2|etc.)$|gomez
Otherwise, you might need to get creative and find another way to isolate them (a combination of operating system, location, and some other factors can do the trick). When adding to the list, make sure to escape special characters like . or / by using a \ before them, or else your RegEx will fail.
-
Sure. Here's the post for filtering the bots.
Here's the reg x posted: ^(microsoft corp|inktomi corporation|yahoo! inc.|google inc.|stumbleupon inc.)$|gomez
-
If you give me an idea of how you are isolating the bots I might be able to help come up with a RegEx for you. What is the RegEx you have in place to sort out the other bots?
Regards,
Chris
Got a burning SEO question?
Subscribe to Moz Pro to gain full access to Q&A, answer questions, and ask your own.
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
The images on site are not found/indexed, it's been recommended we change their presentation to Google Bot - could this create a cloaking issue?
Hi We have an issue with images on our site not being found or indexed by Google. We have an image sitemap but the images are served on the Sitecore powered site within <divs>which Google can't read. The developers have suggested the below solution:</divs> Googlebot class="header-banner__image" _src="/~/media/images/accommodation/arctic-canada/arctic-safari-camp/arctic-cafari-camp-david-briggs.ashx"/>_Non Googlebot <noscript class="noscript-image"><br /></span></em><em><span><div role="img"<br /></span></em><em><span>aria-label="Arctic Safari Camp, Arctic Canada"<br /></span></em><em><span>title="Arctic Safari Camp, Arctic Canada"<br /></span></em><em><span>class="header-banner__image"<br /></span></em><em><span>style="background-image: url('/~/media/images/accommodation/arctic-canada/arctic-safari-camp/arctic-cafari-camp-david-briggs.ashx?mw=1024&hash=D65B0DE9B311166B0FB767201DAADA9A4ADA4AC4');"></div><br /></span></em><em><span></noscript> aria-label="Arctic Safari Camp, Arctic Canada" title="Arctic Safari Camp, Arctic Canada" class="header-banner__image image" data-src="/~/media/images/accommodation/arctic-canada/arctic-safari-camp/arctic-cafari-camp-david-briggs.ashx" data-max-width="1919" data-viewport="0.80" data-aspect="1.78" data-aspect-target="1.00" > Is this something that could be flagged as potential cloaking though, as we are effectively then showing code looking just for the user agent Googlebot?The devs have said that via their contacts Google has advised them that the original way we set up the site is the most efficient and considered way for the end user. However they have acknowledged the Googlebot software is not sophisticated enough to recognise this. Is the above solution the most suitable?Many thanksKate
Intermediate & Advanced SEO | | KateWaite0 -
Why is my Bing traffic dropping?
In the middle of September we launched a redesigned version of our site. The urls all stayed the same. Since site launch traffic in Google has steadily increased but Bing traffic has dropped by about 50%. Any ideas on what I should look at?
Intermediate & Advanced SEO | | EcommerceSite0 -
Organic search traffic improved (besides Google) for last 6 months
Hi, to follow up on my previous post (http://moz.com/community/q/low-on-google-ranking-despite-error-free), I was wandering if someone can tell me whether we are penalised by Google or not? Since the last 6 months, we see a rise in organic visitors coming from Bing, yahoo but Google remains the same. Despite the advice given in previous post, I just feel that something else must be wrong. Perhaps more inbound links with high PR? Socially, we are pretty much engaging 50-60% of our audience, yet no link flow will count for our organic ranking sadly enough... Hopefully someone can have a look at our site www.mercadonline.es in more detail? Ask me in a PM for more info! Thank you Ivordg
Intermediate & Advanced SEO | | ivordg0 -
E-commerce category page optimization - filters vs. categories
Hi, We currently have a site where there are several subcategories for every main category. So this means that visitors will have to click through 3-4 subcategories before reaching products that they could have easily found if the site would be using filters on category pages. My question is - if a subcategory page with 4 products is currently a category page (optimized heading, description) and I'd want this category to be available through filters, how do I still keep it optimized for search engines? So under a category "Cleaners", we have all cleaning products. There are 8 "Cable cleaners" under this category. This is currently a subcategory, but I'd just solve this with a filter in the "Cleaners" screen. Not sure what's right from an SEO standpoint here.
Intermediate & Advanced SEO | | JaanMSonberg0 -
Search traffic decline after redesign and new URL
Howdy Mozzers I’ve been a Moz fan since 2005, and been doing SEO since. This is my first major question to the community! I just started working for a new company in-house, and we’ve uncovered a serious problem. This is a bit of a long one, so I’m hoping you’ll stick it out with me! ***Since the images aren't working, here's a link to the google doc with images. https://docs.google.com/document/d/1I-iLDjBXI4d59Kl3uRMwLvpihWWKF3bQFTTNRb1R3ZM/edit?usp=sharing Background The site has gone through a few changes in the past few years. Drupal 5 and 6 hosted at bcbusinessonline.ca and now on Drupal 7 hosted at bcbusiness.ca. The redesigned responsive design site launched on January 9th, 2013. This includes changing the structure of the URL’s, such as categories, tags, and articles. We submitted a change of address through GWT shortly after the change. Problem Organic site traffic is down 50% over the last three months. Below, Google analytics, and Google Webmaster Tools shows the decline. *They used the same UA number for Google analytics, so that’s why the data is continuous Organic traffic to the site. January 2011 - Dips in January are because of the business crowd on holidays. Google Webmaster Tools data exported for bcbusiness.ca starting as far back as I could get. Redirects During the switch, the site went from bcbusinessonline.ca to bcbusiness.ca. They were implemented as 302’s on January 9th, 2013 to test, then on January 15th, they were all made 301’s. Here is how they were set up: Original: http://www.bcbusinessonline.ca/bcb/bc-blogs/conference/2010/10/07/11-phrases-never-use-your-resume --301-- http://www.bcbusiness.ca/bcb/bc-blogs/conference/2010/10/07/11-phrases-never-use-your-resume --301-- http://www.bcbusiness.ca/careers/11-phrases-never-to-use-on-your-resume Canonical issue On bcbusiness.ca, there are article pages (example) that are paginated. All of the page 2 to page N were set to the first page of the article. We addressed this issue by removing the canonical tag completely from the site on April 16th, 2013. Then, by walking through the Ayima Pagination Guide we decided for immediate and least work choice was to noindex, follow all the pages that simply list articles (example). Google Algorithm Changes (Penguin or Panda) According to SEOmoz Google Algorithm Changes there is no releases that could have impacted our site at the February 20th ballpark. However - Sitemap We have a sitemap submitted to Google Webmaster Tools, and currently have 4,229 pages indexed of 4,312 submitted. But there are a few pages we looked at that there is an inconsistency between what GWT is reporting and what a “site:” search reports. Why would the submit to index button be showing, if it’s in the index? That page is in the sitemap. Updated: 2012-11-28T22:08Z Change Frequency: Yearly Priority: 0.5 *GWT Index Stats from bcbusiness.ca What we looked at so far The redirects are all currently 301’s GWT is reporting good DNS, Server Connectivity, and Robots.txt Fetch We don’t have noindex or nofollow on pages where we haven’t intended them to be. Robots.txt isn’t blocking GoogleBot, or any pages we want to rank. We have added nofollow to all ‘Promoted Content’ or paid advertising / advertorials We had TextLinkAds on our site at one point but I removed them once I satarted working here (April 1). Sitemaps were linking to the old URL, but now updated (April)
Intermediate & Advanced SEO | | Canada_wide_media1 -
Influence on CTR for high traffic keyword in url and redirect
I currently dominate on my site for a very high traffic keyword. My url contains this keyword in it along with the word "Free" in the beginning. Lets say my keyword is "This Keyword" then my url would be freethiskeyword.com. I rank 3rd for this keyword and generates me about 8k on a low month. I was just able to obtain my main keyword as my sole URL through an auction for a measly 2,000.00. (Very Excited about this). So now I have the URL thiskeyword.com What I want to know is what kind of influence can I expect with my new URL have in CTR. Since it is a high traffic keyword is there a automatic "Trust" factor that is involved and will users tend to click on thiskeyword.com as apposed to freethiskeyword.com? My Second Question I am torn as to what I should do with this new URL. Should I redirect my old URL to my new URL and keep both pointing to the same site? or should I try and dominate my niche and build a new site entirely. Since I currently make about 8k a month for third, if I were to build a separate site and be able to obtain 1st place for my new keyword that would generate me 2 amounts in income based on stats. CTR based on http://searchenginewatch.com/article/2049695/Top-Google-Result-Gets-36.4-of-Clicks-Study freethiskeyword.com = 8k/m for 3rd based on 10% of clicks (currently) thiskeyword.com = 24k/m for 1st based on 36% of clicks (in theory) If I keep each site separate and be able to have one site at 3rd and the other at 1st then I would be making about 32k a month. If I redirect my old url to my new url then I would only have 1st place (if I make it to first of course) and that would only make me 24k a month. It seems to me I should keep these sites separate to generate more income. I am torn what I should do. Also with the EMD penalty I am afraid to 301 my site to my new URL since it is my exact keyword as apposed to my current one. I am defiantly branded as "Free This Keyword" so moving it to thiskeyword.com could hurt me more than help (at least I think so) What you think?
Intermediate & Advanced SEO | | cbielich0 -
Google Filter? Drop from top first page to bottom second page?
My site has dropped from the first page top spots to the bottom second page, about 2 month ago. From time to time it reappears in the first page, is this some kind of google filter? How do I solve this issue?
Intermediate & Advanced SEO | | Ofer230