Disallowing URL Parameters vs. Canonicalizing
-
Hi all,
I have a client with a unique search setup. They have Region pages (/state/city). We want these indexed and are using self-referential canonicals on them.
They also have a search function that emulates the look of the Region pages. When you search for, say, Los Angeles, the URL changes to /search/los+angeles and the page looks exactly like /ca/los-angeles.
These search URLs can also have parameters (/search/los+angeles?age=over-2&time[]=part-time), which we obviously don't want indexed.
Right now my concern is how best to ensure the /search pages don't get indexed and we don't get hit with duplicate content penalties. The options are:
-
Self-referential canonicals for the Region pages, and a robots.txt rule that disallows everything after the second slash in /search/ (so the main search page itself stays indexable); a rough sketch of that rule is below.
-
Self-referential canonicals for the Region pages, and a rule that automatically canonicalizes every other search page to /search.
Potential concern: /search/ URLs are generated even for misspellings.
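For reference, here is a rough robots.txt sketch of the first option. It assumes the main search page lives at /search with no trailing slash; adjust to the real paths:

```
# Block the generated search result URLs, but leave /search itself crawlable.
# "Disallow: /search/" only matches paths with something after the trailing slash,
# so /search (the main search page) is still allowed.
User-agent: *
Disallow: /search/
```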
Thanks!
-
Just so you know, a meta noindex can be applied through the HTML, but also through the HTTP header (X-Robots-Tag), which might make it easier to implement on such a heavily generated website.
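For example, if something like Nginx sits in front of the app, a rough sketch could look like this (the location path and the rest of the server config are assumptions about the stack, not a drop-in answer):

```nginx
# Illustrative only: send a noindex directive in the HTTP response header
# for every generated search URL, without touching the rendered HTML.
location /search/ {
    add_header X-Robots-Tag "noindex";
    # ...the existing proxy_pass / rendering directives for these pages stay here
}
```

A header-level directive also has the advantage that it doesn't depend on the crawler rendering any JavaScript-injected markup.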
-
Yeah, I know the difference between the two; I've just been in situations where canonicals were recommended as a means of controlling the preferred page within an indexation context, if that makes sense.
My biggest concern is the creation of URLs from misspellings, which still return search results if the query is close enough. The redirects could work; honestly, that wasn't something we'd considered.
I'm liking the noindex approach. They'd have to write a rule that applies it to every page created with a search parameter, which I think they should be able to do.
If it helps, almost the entire site is run by JavaScript. Like... everything.
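If it helps to picture it, and assuming those pages come out of a Node/Express layer (a guess on my part; the route pattern is hypothetical), the rule could be a small middleware along these lines:

```javascript
const express = require('express');
const app = express();

// Hypothetical sketch: send a noindex header on every generated search URL,
// i.e. anything under /search/ or /search called with query parameters.
app.use((req, res, next) => {
  const underSearch = req.path.startsWith('/search/');
  const searchWithParams = req.path === '/search' && Object.keys(req.query).length > 0;
  if (underSearch || searchWithParams) {
    res.set('X-Robots-Tag', 'noindex');
  }
  next();
});

// ...the site's actual routes and rendering would follow here
```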
Thanks for the advice. Much appreciated.
-Brad
-
Robots.txt controls crawling, not indexation. Google will still sometimes index pages it cannot crawl. Canonical tags are for duplicate content consolidation, but they are not a hard signal and Google frequently ignores them. A meta noindex tag (or an X-Robots-Tag noindex in the HTTP header, if you cannot apply the meta tag in the HTML) is a harder signal and is meant to help you control indexation.
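For reference, the in-page version of that noindex signal is just one tag in the head of each generated search page:

```html
<!-- In the <head> of every generated search page -->
<meta name="robots" content="noindex">
```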
To be honest, if the pages are identical, why not just 301 redirect the relevant searches (the top-line ones, which result in pages exactly the same as your regional ones) to the regional URLs? If the pages really are the same, it won't be any different for users except for a small delay during the redirect (which won't really be felt, especially if you are using Nginx redirects).
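A hedged sketch of what that could look like in Nginx (the URL pairs here are made up for illustration; in practice you'd need a real mapping from each search slug to its region page):

```nginx
# Illustrative only: 301 the handful of top-line search URLs that exactly
# duplicate a region page straight to that region page.
map $uri $region_page {
    default                 "";
    /search/los+angeles     /ca/los-angeles;
    /search/san+francisco   /ca/san-francisco;
}

server {
    # ...existing listen / server_name / etc.
    if ($region_page) {
        return 301 $region_page;
    }
}
```

Anything not listed in the map (including the misspelled searches) simply falls through to the normal handling.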
If you can't do that, you're really left with the meta noindex tag and the canonical tag. Canonical tags avoid content duplication penalties, but they are a softer signal and they don't consolidate link equity the way 301 redirects do (so in many ways there's not actually that much difference between meta noindex and canonicals, except that canonical tags are more complex to set up in the first place since they require a destination URL).
I'd probably just meta noindex all the search URLs. Once Google has swallowed that, I would then (after 2-3 weeks) apply the relevant robots.txt rules.
If you apply them both at the same time, Google won't be able to crawl the search URLs (since your robots.txt rule will block them) and will therefore be blind to your canonical / meta noindex directive(s). So you have to handle de-indexation first, and THEN block crawling to save your crawl allowance a bit.
But don't do it all at once or you'll get in an unholy mess!
-
Hi there
Canonical tags prevent problems caused by identical or "duplicate" content across multiple URLs. So in this instance, implement the disallow rule on all of the URLs containing /search/.