Moz Q&A is closed.
After more than 13 years, and tens of thousands of questions, Moz Q&A closed on 12th December 2024. Whilst we’re not completely removing the content - many posts will still be possible to view - we have locked both new posts and new replies. More details here.
Regular Expressions for Filtering BOT Traffic?
-
I've set up a filter to remove bot traffic from Analytics. I relied on regular expressions posted in an article that eliminates what appears to be most of them.
However, there are other bots I would like to filter but I'm having a hard time determining the regular expressions for them.
How do I determine what the regular expression is for additional bots so I can apply them to the filter?
I read an Analytics "how to" but its over my head and I'm hoping for some "dumbed down" guidance.
-
No problem, feel free to reach out if you have any other RegEx related questions.
Regards,
Chris
-
I will definitely do that for Rackspace bots, Chris.
Thank you for taking the time to walk me through this and tweak my filter.
I'll give the site you posted a visit.
-
If you copy and paste my RegEx, it will filter out the rackspace bots. If you want to learn more about Regular Expressions, here is a site that explains them very well, though it may not be quite kindergarten speak.
-
Crap.
Well, I guess the vernacular is what I need to know.
Knowing what to put where is the trick isn't it? Is there a dummies guide somewhere that spells this out in kindergarten speak?
I could really see myself botching this filtering business.
-
Not unless there's a . after the word servers in the name. The . is escaping the . at the end of stumbleupon inc.
-
Does it need the . before the )
-
Ok, try this:
^(microsoft corp|inktomi corporation|yahoo! inc.|google inc.|stumbleupon inc.|rackspace cloud servers)$|gomez
Just added rackspace as another match, it should work if the name is exactly right.
Hope this helps,
Chris
-
Agreed! That's why I suggest using it in combination with the variables you mentioned above.
-
rackspace cloud servers
Maybe my problem is I'm not looking in the right place.
I'm in audience>technology>network and the column shows "service provider."
-
How is it titled in the ISP report exactly?
-
For example,
Since I implemented the filter four days ago, rackspace cloud servers have visited my site 848 times, , visited 1 page each time, spent 0 seconds on the page and bounced 100% of the time.
What is the reg expression for rackspace?
-
Time on page can be a tricky one because sometimes actual visits can record 00:00:00 due to the way it is measured. I'd recommend using other factors like the ones I mentioned above.
-
"...a combination of operating system, location, and some other factors can do the trick."
Yep, combined with those, look for "Avg. Time on Page = 00:00:00"
-
Ok, can you provide some information on the bots that are getting through this that you want to sort out? If they are able to be filtered through the ISP organization as the ones in your current RegEx, you can simply add them to the list: (microsoft corp| ... ... |stumbleupon inc.|ispnamefromyourbots|ispname2|etc.)$|gomez
Otherwise, you might need to get creative and find another way to isolate them (a combination of operating system, location, and some other factors can do the trick). When adding to the list, make sure to escape special characters like . or / by using a \ before them, or else your RegEx will fail.
-
Sure. Here's the post for filtering the bots.
Here's the reg x posted: ^(microsoft corp|inktomi corporation|yahoo! inc.|google inc.|stumbleupon inc.)$|gomez
-
If you give me an idea of how you are isolating the bots I might be able to help come up with a RegEx for you. What is the RegEx you have in place to sort out the other bots?
Regards,
Chris
Got a burning SEO question?
Subscribe to Moz Pro to gain full access to Q&A, answer questions, and ask your own.
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
How to handle sorting, filtering, and pagination in ecommerce? Canonical is enough?
Hello, after reading various articles and watching several videos I'm still not sure how to handle faceted navigation (sorting/filtering) and pagination on my ecommerce site. Current indexation status: The number of "real" pages (from my sitemap) - 2.000 pages Google Search Console (Valid) - 8.000 pages Google Search Console (Excluded) - 44.000 pages Additional info: Vast majority of those 50k additional pages (44 + 8 - 2) are pages created by sorting, filtering and pagination. Example of how the URL changes while applying filters/sorting: example.com/category --> example.com/category/1/default/1/pricefrom/100 Every additional page is canonicalized properly, yet as you can see 6k is still indexed. When I enter site:example.com/category in Google it returns at least several results (in most of the cases the main page is on the 1st position). In Google Analytics I can see than ~1.5% of Google traffic comes to the sorted/filtered pages. The number of pages indexed daily (from GSC stats) - 3.000 And so I have a few questions: Is it ok to have those additional pages indexed or will the "real" pages rank higher if those additional would not be indexed? If it's better not to have them indexed should I add "noindex" to sorting/filtering links or add eg. Disallow: /default/ in robots.txt? Or perhaps add "noindex, nofollow" to the links? Google would have then 50k pages less to crawl but perhaps it'd somehow impact my rankings in a negative way? As sorting/filtering is not based on URL parameters I can't add it in GSC. Is there another way of doing that for this filtering/sorting url structure? Thanks in advance, Andrew
Intermediate & Advanced SEO | | thpchlk0 -
Huge organic traffic drom after a perfect domain migration. What to do?
Hi, I already asked the question on different places. But so far nobody could help me.
Intermediate & Advanced SEO | | Dennis1992038
Hope someone can help me out. If possible.
I migrated my website https://vihara.nl to https://meditatieinstituut.nl and lost about 80% traffic (see printscreens). It's over more than a month ago now and there is no sign of getting it back up. Maybe there is nothing to do and
1. I have to be patient and traffic comes back in a few months.
or
2. There is nothing to do and I've lost everything I've build up in the last years. Start over again to get the rankings back.
or maybe, maybe
3. I just forgot something that I still need to do to get the rankings back up. Or there is something I did not think of... This is done: The website is migrated 1 on 1. No changes in content, url, code, etc. Everything is exactly the same as on the previous domain. 301 redirects whole domain (via htaccess a bulk redirect). All the old pages, without exceptions, lead to the exact new page. The new domain is running from CDN (Cloudflare) with the same settings as the previous domain. SSL is installed in the exact same way. Domain migration set up in Search console (working). Uploaded new sitemap (working). Updated internal links. Changed the most important external links (where I could get contact after reaching out) In meanwhile received some new external links and also posted new content Anybody knows what to do? Or do I just have to be more patient and will it come back in a few months by itself? Looking forward to suggetions. Thanks! Gerjan Migratie-Meditatie-Instituut-2048x786.jpg verloop-sinds-de-start-2048x355.jpg0 -
Subdomains vs directories on existing website with good search traffic
Hello everyone, I operate a website called Icy Veins (www.icy-veins.com), which gives gaming advice for World of Warcraft and Hearthstone, two titles from Blizzard Entertainment. Up until recently, we had articles for both games on the main subdomain (www.icy-veins.com), without a directory structure. The articles for World of Warcraft ended in -wow and those for Hearthstone ended in -hearthstone and that was it. We are planning to cover more games from Blizzard entertainment soon, so we hired a SEO consultant to figure out whether we should use directories (www.icy-veins.com/wow/, www.icy-veins.com/hearthstone/, etc.) or subdomains (www.icy-veins.com, wow.icy-veins.com, hearthstone.icy-veins.com). For a number of reason, the consultant was adamant that subdomains was the way to go. So, I implemented subdomains and I have 301-redirects from all the old URLs to the new ones, and after 2 weeks, the amount of search traffic we get has been slowly decreasing, as the new URLs were getting index. Now, we are getting about 20%-25% less search traffic. For example, the week before the subdomains went live we received 900,000 visits from search engines (11-17 May). This week, we only received 700,000 visits. All our new URLs are indexed, but they rank slightly lower than the old URLs used to, so I was wondering if this was something that was to be expected and that will improve in time or if I should just go for subdomains. Thank you in advance.
Intermediate & Advanced SEO | | damienthivolle0 -
Why do I get India, Pakistan, Turkey traffic mostly?
Hi there, I've been wondering. Why do I get most of the traffic from these countries? My sites are english, I host in USA. I don't target a thing for those countries traffic, yet I get huge amounts of traffic from these countries. Any ideas?
Intermediate & Advanced SEO | | melbog0 -
Should I noindex the site search page? It is generating 4% of my organic traffic.
I read about some recommendations to noindex the URL of the site search.
Intermediate & Advanced SEO | | lcourse
Checked in analytics that site search URL generated about 4% of my total organic search traffic (<2% of sales). My reasoning is that site search may generate duplicated content issues and may prevent the more relevant product or category pages from showing up instead. Would you noindex this page or not? Any thoughts?0 -
Does Google bot read embedded content?
Is embedded content "really" on my page? There are many addons nowadays that are used by embedded code and they bring the texts after the page is loaded. For example - embedded surveys. Are these read by the Google bot or do they in fact act like iframes and are not physically on my page? Thanks
Intermediate & Advanced SEO | | BeytzNet0 -
New Website Launch - Traffic Way Down
We launched a new website in June. Traffic plummeted after the launch, we crept back up for a couple of months, but now we are flat, nowhere near our pre-launch traffic or previous year's traffic. For the past 6 months our analytics have been worrying us - Overall traffic and new visitor traffic is down over 10%, bounce rate is up almost 35% since site launched, keywords aren't ranking where they used to, and of course, web sales are down. Is this supposed to happen when a new site is launched, and how long does a new this transition last? We have done all the technical audits, adding relevant content, we're at a loss. Any suggestions where to look next to improve traffic to pre-launch numbers?
Intermediate & Advanced SEO | | WaySEO0 -
Url structure for multiple search filters applied to products
We have a product catalog with several hundred similar products. Our list of products allows you apply filters to hone your search, so that in fact there are over 150,000 different individual searches you could come up with on this page. Some of these searches are relevant to our SEO strategy, but most are not. Right now (for the most part) we save the state of each search with the fragment of the URL, or in other words in a way that isn't indexed by the search engines. The URL (without hashes) ranks very well in Google for our one main keyword. At the moment, Google doesn't recognize the variety of content possible on this page. An example is: http://www.example.com/main-keyword.html#style=vintage&color=blue&season=spring We're moving towards a more indexable URL structure and one that could potentially save the state of all 150,000 searches in a way that Google could read. An example would be: http://www.example.com/main-keyword/vintage/blue/spring/ I worry, though, that giving so many options in our URL will confuse Google and make a lot of duplicate content. After all, we only have a few hundred products and inevitably many of the searches will look pretty similar. Also, I worry about losing ground on the main http://www.example.com/main-keyword.html page, when it's ranking so well at the moment. So I guess the questions are: Is there such a think as having URLs be too specific? Should we noindex or set rel=canonical on the pages whose keywords are nested too deep? Will our main keyword's page suffer when it has to share all the inbound links with these other, more specific searches?
Intermediate & Advanced SEO | | boxcarpress0