Disallowing URL Parameters vs. Canonicalizing
-
Hi all,
I have a client with an unusual search setup. They have Region pages (/state/city), which we want indexed and which use self-referential canonicals.
They also have a search function that mimics the look of the Region pages. When you search for, say, Los Angeles, the URL changes to /search/los+angeles, and the page looks exactly like /ca/los-angeles.
These search URLs can also have parameters (/search/los+angeles?age=over-2&time[]=part-time), which we obviously don't want indexed.
Right now my concern is how best to ensure the /search pages don't get indexed and that we don't get hit with duplicate content penalties. The options are:
-
Self-referential canonicals for the Region pages, and disallow everything after the second slash in /search/ (so the main search page is indexed)
-
Self-referential canonicals for the Region pages, and write a rule that automatically canonicalizes all other search pages to /search.
Potential concern: /search/ URLs are created even for misspelled queries.
Thanks!
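The robots.txt rule in the first option can be sanity-checked with Python's standard `urllib.robotparser`. This is a minimal sketch: the exact rule (`Disallow: /search/`) and the `example.com` domain are assumptions for illustration, and `urllib.robotparser` only does simple prefix matching, so it approximates rather than reproduces how Googlebot reads the file.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for option 1: block everything below /search/,
# but leave the bare /search page (no trailing slash) crawlable.
ROBOTS_TXT = """\
User-agent: *
Disallow: /search/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# example.com stands in for the client's domain.
urls = [
    "https://example.com/search",
    "https://example.com/search/los+angeles",
    "https://example.com/search/los+angeles?age=over-2&time[]=part-time",
    "https://example.com/search/los+angleles",  # misspelled query
    "https://example.com/ca/los-angeles",
]

for url in urls:
    allowed = parser.can_fetch("Googlebot", url)
    print(("crawlable" if allowed else "blocked  "), url)
```

Because the rule is a plain path prefix, misspelled and parameterized search URLs are blocked along with the clean ones. As the replies point out, though, blocking crawling does not by itself stop a URL from being indexed.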
-
-
Just so you know, a meta noindex can be applied in the HTML but also through the HTTP header (X-Robots-Tag), which might make it easier to implement on such a heavily generated website.
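Since the header route doesn't depend on what the page's HTML or JavaScript renders, it can be added in one place on the server. Here is a minimal sketch as Python WSGI middleware. The framework, the `with_search_noindex` name and the "anything under /search/" rule are assumptions for illustration, not the client's actual stack.

```python
def with_search_noindex(app):
    """Wrap a WSGI app so every /search/... response (with or without
    parameters) carries an X-Robots-Tag: noindex header."""
    def wrapped(environ, start_response):
        # PATH_INFO excludes the query string, so /search/los+angeles?age=...
        # matches on its path alone; the bare /search page is left alone.
        is_search_result = environ.get("PATH_INFO", "").startswith("/search/")

        def start_with_header(status, headers, exc_info=None):
            if is_search_result:
                headers = list(headers) + [("X-Robots-Tag", "noindex")]
            return start_response(status, headers, exc_info)

        return app(environ, start_with_header)
    return wrapped


# A stand-in for the real application, for demonstration only.
def demo_app(environ, start_response):
    start_response("200 OK", [("Content-Type", "text/html; charset=utf-8")])
    return [b"<html><body>results</body></html>"]


app = with_search_noindex(demo_app)


def headers_for(path):
    """Call the wrapped app directly and return its response headers."""
    captured = {}

    def fake_start_response(status, headers, exc_info=None):
        captured.update(headers)

    app({"PATH_INFO": path}, fake_start_response)
    return captured


print(headers_for("/search/los+angeles"))  # includes X-Robots-Tag
print(headers_for("/ca/los-angeles"))       # no X-Robots-Tag
```

To try it locally, `wsgiref.simple_server.make_server("", 8000, app).serve_forever()` from the standard library will serve the wrapped app.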
-
Yeah, I know the difference between the two; I've just been in a situation where canonicals were recommended as a way of controlling the preferred page _within an indexation context_, if that makes sense.
My biggest concern is the creation of URLs from misspellings, which still return search results if they're close enough. The redirects could work. Honestly, that wasn't something we'd considered.
I'm liking the noindex approach. They'd have to write a rule that applies it to every page created with a search parameter, which I think they should be able to do.
If it helps, almost the entire site is run by JavaScript. Like... everything.
Thanks for the advice. Much appreciated.
-Brad
-
Robots.txt controls crawling, not indexation. Google will still sometimes index pages it cannot crawl. Canonical tags are for duplicate content consolidation, but they are not a hard signal and Google frequently ignores them. Meta noindex tags (or an X-Robots-Tag noindex in the HTTP header, if you cannot add the meta tag to the HTML) are a harder signal and are meant to help you control indexation.
To be honest, if the pages are identical, why not just 301 redirect the relevant searches (the top-line ones, which produce pages exactly the same as your regional ones) to the regional URLs? If the pages really are the same, users won't notice any difference except a small delay during the redirect (which won't really be felt, especially if you are using Nginx redirects).
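A minimal sketch of the lookup such a redirect rule needs, assuming a hypothetical table of Region pages. Only top-line searches (no parameters) are redirected, as suggested above; the `difflib` fuzzy match for misspellings and its 0.85 cutoff are an added assumption, not something from the thread, and would need tuning against real queries.

```python
import difflib
from typing import Optional
from urllib.parse import unquote_plus

# Hypothetical table of normalized search queries -> Region pages.
REGION_PAGES = {
    "los angeles": "/ca/los-angeles",
    "san francisco": "/ca/san-francisco",
    "austin": "/tx/austin",
}


def redirect_target(path: str, query_string: str = "") -> Optional[str]:
    """Return the Region URL a top-line /search/<query> page should 301 to,
    or None if it should be left alone (filtered search or no clear match)."""
    prefix = "/search/"
    if not path.startswith(prefix) or query_string:
        # Parameterized searches are not redirected; handle them separately
        # (e.g. with noindex).
        return None
    # "/search/los+angeles" -> "los angeles"
    query = unquote_plus(path[len(prefix):]).strip().lower()
    if query in REGION_PAGES:
        return REGION_PAGES[query]
    # Assumption: treat close misspellings as the same region.
    close = difflib.get_close_matches(query, list(REGION_PAGES), n=1, cutoff=0.85)
    return REGION_PAGES[close[0]] if close else None


for path, qs in [("/search/los+angeles", ""),
                 ("/search/los+angleles", ""),
                 ("/search/los+angeles", "age=over-2")]:
    print(path, qs, "->", redirect_target(path, qs))
```

When a target is returned, the server would answer with `301 Moved Permanently` and a `Location` header pointing at that Region URL; when it returns `None`, the search page is served as normal.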
If you can't do that, you're really left with the meta noindex tag and the canonical tag. Canonical tags avoid content duplication penalties, but they are a softer signal and they don't consolidate link equity the way 301 redirects do. So in many ways there's not actually much difference between meta noindex and canonicals, except that canonical tags are more complex to set up in the first place, since each one needs a destination URL.
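That destination is what the "rule" in the question's second option has to compute. A minimal sketch, assuming a hypothetical canonical host and assuming Region pages should drop any parameters from their self-referential canonical:

```python
from urllib.parse import urlsplit

SITE = "https://www.example.com"  # hypothetical canonical host


def canonical_url(url: str) -> str:
    """Pick the canonical destination, following option 2 from the question:
    every /search/... page points at /search, and other pages (e.g. Region
    pages) point at themselves with any query string dropped."""
    path = urlsplit(url).path
    if path == "/search" or path.startswith("/search/"):
        return f"{SITE}/search"
    return f"{SITE}{path}"


def canonical_tag(url: str) -> str:
    """Render the <link rel="canonical"> element for a page."""
    return f'<link rel="canonical" href="{canonical_url(url)}">'


print(canonical_tag("/search/los+angeles?age=over-2&time[]=part-time"))
print(canonical_tag("/ca/los-angeles"))
```

Misspelled searches fall into the `/search/` branch automatically, which is the main appeal of this option; the catch, as noted above, is that Google may still ignore the hint.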
I'd probably just meta noindex all the search URLs. Once Google has processed that, I would then (after 2-3 weeks) apply the relevant robots.txt rules.
If you apply both at the same time, Google won't be able to crawl the search URLs (since your robots.txt rule will block them) and will therefore be blind to your canonical or meta noindex directives. So you have to handle de-indexation first, and only THEN block crawling, to save a bit of your crawl allowance.
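The ordering argument can be made concrete with Python's standard `urllib.robotparser`: a noindex (meta tag or header) only takes effect if the crawler is allowed to fetch the page carrying it. The two robots.txt phases and the `noindex_visible` helper below are hypothetical, and the parser only approximates Googlebot's.

```python
from urllib.robotparser import RobotFileParser


def noindex_visible(robots_txt: str, url: str, agent: str = "Googlebot") -> bool:
    """A crawler can only see a noindex on a page it may fetch, so the
    directive is 'visible' exactly when robots.txt allows crawling the URL."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


URL = "https://example.com/search/los+angeles?age=over-2"

# Phase 1: search pages stay crawlable while they carry noindex.
PHASE_1 = "User-agent: *\nDisallow:\n"
# Phase 2, a few weeks later: block crawling of /search/ subpaths.
PHASE_2 = "User-agent: *\nDisallow: /search/\n"

print("phase 1, noindex seen:", noindex_visible(PHASE_1, URL))
print("phase 2, noindex seen:", noindex_visible(PHASE_2, URL))
```

Applying phase 2 straight away skips the window in which Google can read the noindex, which is exactly the "unholy mess" warned about.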
But don't do it all at once or you'll get in an unholy mess!
-
Hi there
Canonical tags prevent problems caused by identical or "duplicate" content across multiple URLs. So in this instance, implement the disallow rule on all of the URLs containing /search/.
Related Questions
-
Technical URL SEO question
Hi All, We sell a product on our site which is displayed in cubic metres, from an SEO perspective is it ok to have /3m³ in the URL or should I use 3m3. Thanks All
Technical SEO | Redooo
-
Http:// vs Https:// in Og:URL
Hi, we recently migrated our website from http:// to https://. Every URL is now https://, and we have used 301 permanent redirects to send the old URLs to the new ones. We plan to use the http:// link in og:url instead of https:// because of some social sharing issues we are facing. My concern is: if Google finds the page's own http:// URL on every page of my blog, will Google get confused between http:// and https://, since we are giving Google the old URL to crawl? Please advise. Thanks
Technical SEO | SameerBhatia
-
URL Parameters as pagination
Hi guys, due to some changes to our category pages, our paginated URLs will change to look like this: ...category/bagger/2?q=Bagger&startDate=26.06.2017&endDate=27.06.2017 As you can see, they include a query parameter as well as a start and end date, which will change daily. All paginated URLs are set to noindex/follow. I am worried that the products linked from the category pages will not get crawled well when the URLs they are linked from change daily. Do you have any experience with this? Are there other things we need to worry about with these pagination URLs? Cheers
Technical SEO | JKMarketing
-
Using # in parameters?
I am trying to understand why a website would use # instead of ? for its parameters. I have put an example URL below: http://www.warehousestationery.co.nz/office-supplies/adhesives-tapes-and-fastenings#prefn1=brand&prefn2=colour&prefv1=Command&prefv2=Clear Any help would be much appreciated.
Technical SEO | CaitlinDW
-
Question about creating friendly URLs
I am working on creating new SEO-friendly URLs for my company website. The products are the items with the highest search volume, and each is very geo-specific. There is not a high search volume for the geo-location associated with the product, but the searches we do get convert well. Do you think it is preferable to leave the location out of the URL or include it?
Technical SEO | theLotter
-
No Keyword in URL
SEOMoz (and other platforms) advise that I need to add my keyword to the page URL; however, as far as I'm concerned it has been added, so why don't these platforms see it? My home page URL is www.salesandinternetmarketing.com, but apparently I haven't added the keyword "internet marketing" to the URL. What advice can you give me, please? Lindsay
Technical SEO | lindsayjhopkins
-
How do I use only one URL
My site can be reached at both www.site.com and site.com. How do I make it use only www?
Technical SEO | Weblion
-
301 Redirecting weird URLs with % in them
I've been working on redirecting links reported as 404 in Google Webmaster Tools. I've stumbled upon 41 URLs that Google reports as 404 that include a '%' in the URL, and I don't know how to redirect them. Here is an example: URL: bond_information.htm%20Surety%20Bond%20Information,%20with%20FAQ Attempted redirect: redirect 301 /bond_information.htm%20Surety%20Bond%20Information,%20with%20FAQ http://www.mysite.com/ Unfortunately, after implementing the redirect, http://www.mysite.com/bond_information.htm%20Surety%20Bond%20Information,%20with%20FAQ still returns a 404 error. Has anyone successfully fixed these errors using Apache .htaccess?
Technical SEO | TheDude
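For the `#` versus `?` question, splitting the example URL with Python's standard `urllib.parse` shows the difference: with `#`, the filter values end up in the URL fragment, not the query string. Browsers don't send the fragment to the server in the HTTP request, so filters written this way are typically read by JavaScript on the page. That is one plausible reason for the choice, though the site's actual motivation isn't stated in the question.

```python
from urllib.parse import parse_qs, urlsplit

url = ("http://www.warehousestationery.co.nz/office-supplies/"
       "adhesives-tapes-and-fastenings"
       "#prefn1=brand&prefn2=colour&prefv1=Command&prefv2=Clear")

parts = urlsplit(url)
print(repr(parts.query))     # '' because there is no '?' part
print(repr(parts.fragment))  # everything after '#'

# The fragment can still be parsed like a query string on the client side.
print(parse_qs(parts.fragment))
```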