Moz Q&A is closed.
After more than 13 years, and tens of thousands of questions, Moz Q&A closed on 12th December 2024. Whilst we’re not completely removing the content - many posts will still be possible to view - we have locked both new posts and new replies. More details here.
Regular Expressions for Filtering BOT Traffic?
-
I've set up a filter to remove bot traffic from Analytics. I relied on regular expressions posted in an article that eliminates what appears to be most of them.
However, there are other bots I would like to filter but I'm having a hard time determining the regular expressions for them.
How do I determine what the regular expression is for additional bots so I can apply them to the filter?
I read an Analytics "how to" but its over my head and I'm hoping for some "dumbed down" guidance.
-
No problem, feel free to reach out if you have any other RegEx related questions.
Regards,
Chris
-
I will definitely do that for Rackspace bots, Chris.
Thank you for taking the time to walk me through this and tweak my filter.
I'll give the site you posted a visit.
-
If you copy and paste my RegEx, it will filter out the rackspace bots. If you want to learn more about Regular Expressions, here is a site that explains them very well, though it may not be quite kindergarten speak.
-
Crap.
Well, I guess the vernacular is what I need to know.
Knowing what to put where is the trick isn't it? Is there a dummies guide somewhere that spells this out in kindergarten speak?
I could really see myself botching this filtering business.
-
Not unless there's a . after the word servers in the name. The . is escaping the . at the end of stumbleupon inc.
-
Does it need the . before the )
-
Ok, try this:
^(microsoft corp|inktomi corporation|yahoo! inc.|google inc.|stumbleupon inc.|rackspace cloud servers)$|gomez
Just added rackspace as another match, it should work if the name is exactly right.
Hope this helps,
Chris
-
Agreed! That's why I suggest using it in combination with the variables you mentioned above.
-
rackspace cloud servers
Maybe my problem is I'm not looking in the right place.
I'm in audience>technology>network and the column shows "service provider."
-
How is it titled in the ISP report exactly?
-
For example,
Since I implemented the filter four days ago, rackspace cloud servers have visited my site 848 times, , visited 1 page each time, spent 0 seconds on the page and bounced 100% of the time.
What is the reg expression for rackspace?
-
Time on page can be a tricky one because sometimes actual visits can record 00:00:00 due to the way it is measured. I'd recommend using other factors like the ones I mentioned above.
-
"...a combination of operating system, location, and some other factors can do the trick."
Yep, combined with those, look for "Avg. Time on Page = 00:00:00"
-
Ok, can you provide some information on the bots that are getting through this that you want to sort out? If they are able to be filtered through the ISP organization as the ones in your current RegEx, you can simply add them to the list: (microsoft corp| ... ... |stumbleupon inc.|ispnamefromyourbots|ispname2|etc.)$|gomez
Otherwise, you might need to get creative and find another way to isolate them (a combination of operating system, location, and some other factors can do the trick). When adding to the list, make sure to escape special characters like . or / by using a \ before them, or else your RegEx will fail.
-
Sure. Here's the post for filtering the bots.
Here's the reg x posted: ^(microsoft corp|inktomi corporation|yahoo! inc.|google inc.|stumbleupon inc.)$|gomez
-
If you give me an idea of how you are isolating the bots I might be able to help come up with a RegEx for you. What is the RegEx you have in place to sort out the other bots?
Regards,
Chris
Got a burning SEO question?
Subscribe to Moz Pro to gain full access to Q&A, answer questions, and ask your own.
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
Organic Traffic Drop of 90% After Domain Migration
We moved our domain is http://www.nyc-officespace-leader.com on April 4th. It was migrated to https://www.metro-manhattan.com Google Search Console continues to show about 420of URLs indexed for the old "NYC" domain. This number has not dropped on Search Console. Don't understand why Google has not de-indexed the old site.
Intermediate & Advanced SEO | | Kingalan1
For the new "Metro" domain only 114 pages are being shown as valid. Our search volume has dropped from about 85 visits a day to 12 per day. 390 URLs appear as "crawled- currently not indexed". Please note that the migrated content is identical. Nothing at all changed. All re-directs were implemented properly. Also, at the time of the migration we filed a disavow for about 200 spammy links. This disavow file was entered for the old domain and the new one as well. Any ideas as to how to trouble shoot this would be much appreciated!!! This has not been very good for business.0 -
Landing pages for paid traffic and the use of noindex vs canonical
A client of mine has a lot of differentiated landing pages with only a few changes on each, but with the same intent and goal as the generic version. The generic version of the landing page is included in navigation, sitemap and is indexed on Google. The purpose of the differentiated landing pages is to include the city and some minor changes in the text/imagery to best fit the Adwords text. Other than that, the intent and purpose of the pages are the same as the main / generic page. They are not to be indexed, nor am I trying to have hidden pages linking to the generic and indexed one (I'm not going the blackhat way). So – I want to avoid that the duplicate landing pages are being indexed (obviously), but I'm not sure if I should use noindex (nofollow as well?) or rel=canonical, since these landing pages are localized campaign versions of the generic page with more or less only paid traffic to them. I don't want to be accidentally penalized, but I still need the generic / main page to rank as high as possible... What would be your recommendation on this issue?
Intermediate & Advanced SEO | | ostesmorbrod0 -
My site shows 503 error to Google bot, but can see the site fine. Not indexing in Google. Help
Hi, This site is not indexed on Google at all. http://www.thethreehorseshoespub.co.uk Looking into it, it seems to be giving a 503 error to the google bot. I can see the site I have checked source code Checked robots Did have a sitemap param. but removed it for testing GWMT is showing 'unreachable' if I submit a site map or fetch Any ideas on how to remove this error? Many thanks in advance
Intermediate & Advanced SEO | | SolveWebMedia0 -
Subdomains vs directories on existing website with good search traffic
Hello everyone, I operate a website called Icy Veins (www.icy-veins.com), which gives gaming advice for World of Warcraft and Hearthstone, two titles from Blizzard Entertainment. Up until recently, we had articles for both games on the main subdomain (www.icy-veins.com), without a directory structure. The articles for World of Warcraft ended in -wow and those for Hearthstone ended in -hearthstone and that was it. We are planning to cover more games from Blizzard entertainment soon, so we hired a SEO consultant to figure out whether we should use directories (www.icy-veins.com/wow/, www.icy-veins.com/hearthstone/, etc.) or subdomains (www.icy-veins.com, wow.icy-veins.com, hearthstone.icy-veins.com). For a number of reason, the consultant was adamant that subdomains was the way to go. So, I implemented subdomains and I have 301-redirects from all the old URLs to the new ones, and after 2 weeks, the amount of search traffic we get has been slowly decreasing, as the new URLs were getting index. Now, we are getting about 20%-25% less search traffic. For example, the week before the subdomains went live we received 900,000 visits from search engines (11-17 May). This week, we only received 700,000 visits. All our new URLs are indexed, but they rank slightly lower than the old URLs used to, so I was wondering if this was something that was to be expected and that will improve in time or if I should just go for subdomains. Thank you in advance.
Intermediate & Advanced SEO | | damienthivolle0 -
Organic search traffic dropped 40% - what am I missing?
Have a client (ecommerce site with 1,000+ pages) who recently switched to OpenCart from another cart. Their organic search traffic (from Google, Yahoo, and Bing) dropped roughly 40%. Unfortunately, we weren't involved with the site before, so we can only rely on the wayback machine to compare previous to present. I've checked all the common causes of traffic drops and so far I mostly know what's probably not causing the issue. Any suggestions? Some URLs are the same and the rest 301 redirect (note that many of the pages were 404 until a couple weeks after the switch when the client implemented more 301 redirects) They've got an XML sitemap and are well-indexed. The traffic drops hit pretty much across the site, they are not specific to a few pages. The traffic drops are not specific to any one country or language. Traffic drops hit mobile, tablet, and desktop I've done a full site crawl, only 1 404 page and no other significant issues. Site crawl didn't find any pages blocked by nofollow, no index, robots.txt Canonical URLs are good Site has about 20K pages indexed They have some bad backlinks, but I don't think it's backlink-related because Google, Yahoo, and Bing have all dropped. I'm comparing on-page optimization for select pages before and after, and not finding a lot of differences. It does appear that they implemented Schema.org when they launched the new site. Page load speed is good I feel there must be a pretty basic issue here for Google, Yahoo, and Bing to all drop off, but so far I haven't found it. What am I missing?
Intermediate & Advanced SEO | | AdamThompson0 -
Drop in traffic after redesign
Is it common for a site to see slight traffic drops after a site redesign (containing cleaner code, more usability and basically just being more helpful for the end user)? A new site of ours went live last Wednesday and has experienced a drop in traffic. If you have seen this in your own site, how did you recover? And how long did the recovery take?
Intermediate & Advanced SEO | | Gordian0 -
Best way to implement canonical tags on an ecommerce site with many filter options?
What would be the best way to add canonical tags to an ecommerce site with many filter options, for example, http://teacherexpress.scholastic.com? Should I include a canonical tag for all filter options under a category even though the pages don't have the same content? Thanks for reading!
Intermediate & Advanced SEO | | DA20130 -
Should I noindex the site search page? It is generating 4% of my organic traffic.
I read about some recommendations to noindex the URL of the site search.
Intermediate & Advanced SEO | | lcourse
Checked in analytics that site search URL generated about 4% of my total organic search traffic (<2% of sales). My reasoning is that site search may generate duplicated content issues and may prevent the more relevant product or category pages from showing up instead. Would you noindex this page or not? Any thoughts?0