Disallowing URL Parameters vs. Canonicalizing
-
Hi all,
I have a client that has a unique search setup. So they have Region pages (/state/city). We want these indexed and are using self-referential canonicals.
They also have a search function that emulates the look of the Region pages. When you search for, say, Los Angeles, the URL changes to _/search/los+angeles _and looks exactly like /ca/los-angeles.
These search URLs can also have parameters (/search/los+angeles?age=over-2&time[]=part-time), which we obviously don't want indexed.
Right now my concern is how best to ensure the /search pages don't get indexed and we don't get hit with duplicate content penalties. The options are this:
-
Self-referential canonicals for the Region pages, and disallow everything after the second slash in /search/ (so the main search page is indexed)
-
Self-referential canonicals for the Region pages, and write a rule that automatically canonicalizes all other search pages to /search.
Potential Concern: /search/ URLs are created even with misspellings.
Thanks!
-
-
Just so you know Meta no-index can be applied through the HTML but also through the HTTP header which might make it easier to implement on such a highly generated website
-
Yeah, I know the difference between the two, I've just been in a situation where canonicals were recommended as a means of controlling the preferred page _within an indexation context. _If that makes sense.
My biggest concern is with the creation of URLs from misspellings, which still return search results if it's close enough. The redirects could work. Honestly that wasn't something we considered.
I'm liking the noindex approach. They'd have to write a rule that applies it to every page created with a search parameter, which I think they should be able to do.
If it helps, almost the entire site is run by Javascript. Like...everything.
Thanks for the advice. Much appreciated.
-Brad
-
Robots.txt controls crawling, not indexation. Google will still sometimes index pages they cannot crawl. Canonical tags are for duplicate content consolidation, but are not a hard signal and Google frequently ignores them. Meta no-index tags (or X-robots no-index through the HTTP header, if you cannot apply Meta no-index in the HTML) is a harder signal and is meant to help you control indexation
To be honest if the pages are identical why not just 301 redirect the relevant searches (the top-line ones, which result in pages exactly the same as your regional ones) to the regional URLs? If the pages really are the same, it won't be any different for users except for a small delay during the redirect (which won't really be felt, especially if you are using Nginx redirects)
If you can't do that, you're really left with the Meta no-index tag and the canonical tag. Canonical tags avoid content duplication penalties but are a softer signal and they don't consolidate link equity like 301 redirects do (so in many way, there's not actually that much different between Meta no-index and canonicals, except canonical tags are more complex to set up in the first place as they require a destination field)
I'd probably just Meta no-index all the search URLs. Once Google had swallowed that, I would then (after 2-3 weeks) apply the relevant robots.txt rules
If you apply them both at the same time, Google won't be able to crawl the search URLs (since your robots.txt rule will block them) and therefore they will be blind to your canonical / Meta no index directive(s). So you have to handle de-indexation first, and THEN after that block the crawling to save your crawl allowance a bit
But don't do it all at once or you'll get in an unholy mess!
-
Hi there
Canonical tags prevent problems caused by identical or "duplicate" content across multiple URLs. So in this instance implement the disallow rule on al of the URLs containing /search/
Got a burning SEO question?
Subscribe to Moz Pro to gain full access to Q&A, answer questions, and ask your own.
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
Http:// vs Https:// in Og:URL
Hi, Recently, we have migrated our website from http:// to https://. Now, every URL is in https:// and we have used 301 permanent redirection for redirecting OLD URL's to New Ones. We have planned to include http:// link in og:url instead of https:// due to some social share issues we are facing. My concern is, if Google finds the self http:// URL on every page of my blog, will Google gets confused with http and https:// as we are providing the old URL to Google for crawling. Please advice. Thanks
Technical SEO | | SameerBhatia0 -
Implications of Disallowing A LOT of Pages
Hey everyone, I just started working on a website and there are A LOT of pages that should not be crawled - probably in the thousands. Are there any SEO risks of disallowing them all at once, or should I go through systematically and take a few dozen down at a time?
Technical SEO | | rachelmeyer1 -
Which is better Title length vs. keywords?
We run a jobboard. The title tag on a page for a job is often over 70 characters. An example of one would be: " Supplier Quality Inspector (Electrical Manufacturing) Job in Orlando, FL 32809 at Pro Image Solutions | Orlando Jobs!" The company name 'Orlando Jobs!" comes at the end but is also a really good keyword e.g. 'Orlando' and 'Jobs' I am interested in suggestions as to how to make these titles better. For example take off the company name when we go over 70 characters? Move the company name to the front of the title because the company name is also good keywords? I am looking for the best way to handle the issue is all. Thanks.
Technical SEO | | JobBiz0 -
Updating content on URL or new URL
High Mozzers, We are an event organisation. Every year we produce like 350 events. All the events are on our website. A lot of these events are held every year. So i have an URL like www.domainname.nl/eventname So what would you do. This URL has some inbound links, some social mentions and so on. SO if the event will be held again in 2013. Would it be better to update the content on this URL or create a new one. I would keep this URL and update it because of the linkvalue and it is allready indexed and ranking for the desired keyword for that event. Cheers, Ruud
Technical SEO | | RuudHeijnen0 -
Approved Word Separators in URLs
Hi There, We are in the process of revamping our URL structure and my devs tell me they have a technical problem using a hyphen as a word separator. There's a whole lot of competing recommendations out there and at this point I'm just confused. Does anyone have any idea what character would be next-best to the hyphen for separating words in a URL? Any reason to prefer one over another? Some links I've found discussing the topic: This page says that "__Google has confirmed that the point (.), the comma (,) and the hyphen (-) are valid word separators in URL’s.": http://www.internetofficer.com/seo/google-word-separator/ This page suggests the plus (+) symbol would be best: http://labs.phurix.net/posts/word-separators-in-urls This guy says he's tested and there's a whole bunch of symbols that will work as word separators: http://www.webproguide.com/articles/Symbols-as-word-separators-a-look-inside-the-search-engine-logic/ I'm leaning towards the tilde (~) or the plus (+) sign. Usage would be like so: http://www.domain.com/shop/sterling~silver OR /shop/sterling+silver etc... Thanks in advance for your help!
Technical SEO | | Richline_Digital1 -
Disallowing https URLs
It there a problem disallowing all https URLs to be indexed in order to avoid duplication? This is the article recommending this practice - http://blog.leonardchallis.com/seo/serve-a-different-robots-txt-for-https/ Thanks!
Technical SEO | | theLotter0 -
URL rewriting from subcategory to category
Hello everybody! I have quite simple question about URL rewriting from subcategory to category, yet I can't find any solution to this problem (due to lack of my deeper apache programming knowledge). Here is my problem/question: we have two website url structures that causes dublicate problems: www.website.lt/language/category/ www.website.lt/language/category/1/ 1 and 2 pages are absolutely same (both also returns 200 OK). What we need is 301 redirect from 2 to 1 without any other deeper categories redirects (like www.website.com/language/category/1/169/ redirecting to .../category/1/ or .../category/). Here goes .htaccess URL rewrite rules: RewriteRule ^([^/]{1,3})/([^/]+)/([^/]+)/([^/]+)/([^/]+)/([^/]+)/$ /index.php?lang=$1&idr=$2&par1=$3&par2=$4&par3=$5&par4=$6&%{QUERY_STRING} [L] RewriteRule ^([^/]{1,3})/([^/]+)/([^/]+)/([^/]+)/([^/]+)/$ /index.php?lang=$1&idr=$2&par1=$3&par2=$4&par3=$5&%{QUERY_STRING} [L] RewriteRule ^([^/]{1,3})/([^/]+)/([^/]+)/([^/]+)/$ /index.php?lang=$1&idr=$2&par1=$3&par2=$4&%{QUERY_STRING} [L] RewriteRule ^([^/]{1,3})/([^/]+)/([^/]+)/$ /index.php?lang=$1&idr=$2&par1=$3&%{QUERY_STRING} [L] RewriteRule ^([^/]{1,3})/([^/]+)/$ /index.php?lang=$1&idr=$2&%{QUERY_STRING} [L] RewriteRule ^([^/]{1,3})/$ /index.php?lang=$1&%{QUERY_STRING} [L] There are other redirects that handles non-www to www and related issues: RedirectMatch 301 ^/lt/$ http://www.domain.lt/ RewriteCond %{HTTP_HOST} ^domain.lt RewriteRule (.*) http://www.domain.lt/$1 [R=301,L] RewriteCond %{REQUEST_FILENAME} !-f RewriteCond %{REQUEST_URI} !(.)/$RewriteRule ^(.)$ http://www.domain.lt/$1/ [R=301,L] At this moment we cannot solve this problem with rel canonical (due to our CMS limits). Thanks for your help guys! If You need any other details on our coding, just let me know.
Technical SEO | | jkundrotas0 -
Blocking Google from Crawling Parameters
Hi guys: What is the best way to keep Google from crawling certain urls with parameters? I used the setting in Webmaster Tools, but that doesn't seem to be helping at all. Can I use robots.txt or some other method? Thanks! Some examples are: <colgroup><col width="797"></colgroup> www.mayer-johnson.com/category/assistive-technology?manufacturer=179 www.mayer-johnson.com/category/assistive-technology?manufacturer=226 www.mayer-johnson.com/category/assistive-technology?manufacturer=227 <colgroup><col width="797"></colgroup> www.mayer-johnson.com/category/english-language-learners?condition=212 www.mayer-johnson.com/category/english-language-learners?condition=213 www.mayer-johnson.com/category/english-language-learners?condition=214 <colgroup><col width="797"></colgroup>
Technical SEO | | DanaDV
| www.mayer-johnson.com/category/english-language-learners?roles=164 |
| www.mayer-johnson.com/category/english-language-learners?roles=165 |
| www.mayer-johnson.com/category/english-language-learners?roles=197 | | |0