Can I, in Google's good graces, check for Googlebot to turn on/off tracking parameters in URLs?

KenShafer

Basically, we use a number of parameters in our URLs for event tracking. Google could be crawling an infinite number of these URLs. I'm already using the canonical tag to point at the non-tracking versions of those URLs, but that doesn't stop the crawling.

I want to know if I can do conditional 301s or just detect the user agent as a way to know when to NOT append those parameters.

Just trying to follow their guidelines about allowing bots to crawl w/out things like sessionID...but they don't tell you HOW to do this.

Thanks!

john4math

No problem Ashley!

It sounds like that would fall under cloaking, albeit pretty benign as far as cloaking goes. There's some more info here. The Matt Cutts video on that page has a lot of good information. Apparently any cloaking is against Google's guidelines. I suspect you could get away with it, but I'd be worried every day about a Google penalty getting handed down.

KenShafer

The syntax is correct. Assuming the site: and inurl: operators work in Bing, as they do in Google, then Bing is not indexing URLs with the parameters.

That article you've referred to only tells how to sniff out Google...one of a couple. What it doesn't tell me, unfortunately, is if there are any consequences of doing so and taking some kind of action...like shutting off the event tracking parameters in this case.

Just to be clear...thanks a bunch for helping out!

john4math

My sense from what you told me is that canonicals should be working in your case. What you're trying to use them for is what they're intended to do. You're sure the syntax is correct, and they're in the <head> of the page or being set in the HTTP header?
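For what it's worth, here's a rough sketch of what I mean by setting the canonical either in the page head or in the HTTP header. It's only an illustration, not your actual setup: the parameter names in `TRACKING_PARAMS` and the example.com URLs are made up, and the helper function names are hypothetical.

```python
import html
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hypothetical tracking parameter names; substitute whatever your site appends.
TRACKING_PARAMS = {"evt", "src", "sessionid"}


def canonical_url(url: str) -> str:
    """Return the URL with tracking parameters and any fragment removed."""
    parts = urlsplit(url)
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key.lower() not in TRACKING_PARAMS
    ]
    return urlunsplit((parts.scheme, parts.netloc, parts.path, urlencode(kept), ""))


def canonical_link_tag(url: str) -> str:
    """Build the tag that would go inside the page's <head>."""
    href = html.escape(canonical_url(url), quote=True)
    return f'<link rel="canonical" href="{href}">'


def canonical_link_header(url: str) -> tuple[str, str]:
    """Build the equivalent HTTP response header as a (name, value) pair."""
    return ("Link", f'<{canonical_url(url)}>; rel="canonical"')


if __name__ == "__main__":
    requested = "https://example.com/shoes?color=red&evt=click&src=nav"
    print(canonical_url(requested))
    print(canonical_link_tag(requested))
    print(canonical_link_header(requested))
```

Every parameterized variant of a page should produce the same canonical URL, and parameters that really do change the content (like `color` in the made-up example) should survive.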

Google does set it up so you can sniff out Googlebot and return different content (see here), but that would be unusual to do given the circumstances. I doubt you'd get penalized for cloaking for redirecting parameterized URLs to canonical ones for only Googlebot, but I'd still be nervous about doing it.

Just curious, is Bing respecting the canonicals?

KenShafer

Yeah, we can't noindex anything because there literally is NO way to crawl the site without picking up tracking parameters.

So we're saying that there is literally no good/approved way to say "oh look, it's google. let's make sure we don't put any of these params on the URL."? Is that the consensus?

john4math

If these duplicate pages have URLs that are appearing in search results, then the canonicals aren't working or Google just hasn't tried to reindex those pages yet. If the pages are duplicates, and you've set the canonical correctly, and entered them in Google Webmaster Tools, over time those pages should drop out of the index as Google reindexes them. You could try submitting a few of these URLs with parameters to Google to reindex manually in Google Webmaster Tools, and see if afterward they disappear from the results pages. If they do, then it's just a matter of waiting for Googlebot to find them all.

If that doesn't work, you could try something tricky like adding meta noindex tags to the pages with URL parameters, wait until they fall out of the index, and then add canonical tags back on, and see if those pages come back into the SERPs. If they do, then Google is ignoring your canonical tags. I hate to temporarily noindex any pages like this... but if they're all appearing separately in the SERPs anyhow, then they're not pooling their link juice properly anyway.

KenShafer

Thank you for your response. Even if I tell them that the parameters don't alter content, which I have, that doesn't reduce how many pages Google has to crawl. That's my main concern...that Googlebot is spending too much time on these alternate URLs.

Plus there are millions of these param-laden URLs in the index, regardless of the canonical tag. There is currently no way for google to crawl the site without parameters that change constantly throughout each visit. This can't be optimal.

john4math

You're doing the right thing by adding canonicals to those pages. You can also go into Google Webmaster Tools and let them know that those URL parameters don't change the content of the pages. This really is the bread and butter of canonical tags. This is the problem they're supposed to solve.

I wouldn't sniff out Googlebot just to 301 those URLs with parameters to the canonical versions. The canonicals should be sufficient. If you do want to sniff out Googlebot, Google's directions are here. You don't do it by user agent, since that can be spoofed; you do a reverse DNS lookup on the requesting IP and then confirm it with a forward lookup. Again, I would not do this in your case.
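If you're curious what the reverse DNS check looks like, here's a minimal sketch under a few assumptions. The function name `is_verified_googlebot` is mine, and the hostname suffixes are the ones I understand Google to list for Googlebot, so check them against Google's current directions before relying on them. The lookups are passed in as arguments only so the logic can be tried without touching the network.

```python
import socket

# Assumed from Google's verification directions; confirm against the current docs.
GOOGLE_CRAWLER_SUFFIXES = (".googlebot.com", ".google.com")


def is_verified_googlebot(
    ip: str,
    reverse_lookup=socket.gethostbyaddr,
    forward_lookup=socket.gethostbyname_ex,
) -> bool:
    """Reverse-resolve the IP, check the domain, then forward-confirm it."""
    try:
        hostname = reverse_lookup(ip)[0]
    except OSError:  # covers socket.herror and socket.gaierror
        return False

    # The leading dot in each suffix rejects look-alikes such as "notgoogle.com".
    if not hostname.lower().rstrip(".").endswith(GOOGLE_CRAWLER_SUFFIXES):
        return False

    try:
        # gethostbyname_ex returns (hostname, aliases, ip_addresses); IPv4 only.
        addresses = forward_lookup(hostname)[2]
    except OSError:
        return False

    # The hostname must resolve back to the IP that made the request.
    return ip in addresses
```

You'd call it with the remote address of the request, for example `is_verified_googlebot(request_ip)`, and cache the answer, since doing two DNS lookups on every hit would be slow.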
