Similar pages: noindex, rel=canonical, or disregard parameters?!
-
Hey all!
We have a hotel booking website with a search results page per destination (e.g. hotels in NYC is dayguest.com/nyc). Pages are also generated for each destination depending on various parameters, such as star rating, amenities, style of the properties, etc. (e.g. dayguest.com/nyc/4stars, dayguest.com/nyc/luggagestorage, dayguest.com/nyc/luxury, etc.).
In general, all of these pages are very similar. For example, there might be 10 hotels in NYC and all of them offer luggage storage, so the pages can be nearly identical. Hence the problems of duplicate content and diluted link juice.
I was wondering what the best practice is in such a situation: should I just set all pages except the most important ones (e.g. dayguest.com/nyc) to noindex? Or set the main page as the canonical for all variations? Or ask Google in Google Webmaster Tools to disregard the URLs for various parameters? Or do something else altogether?!
Thanks for the help!
-
Sorry, I don't think I explained (1) very well. What I mean is that you may want to gradually change the site architecture so that not all of the search options are crawlable pages. This could mean putting some filters in form variables, for example (instead of links). It could also mean making sure that certain paths always converge. There's no easy solution. This is a problem all big sites face, and it's very dependent on the platform/CMS.
With (2), a "level" could be anything. Maybe there are major cities you need to cover but everything else could stay out of the index. This really depends on your information architecture, but there's always something that's high priority and something that's low priority. If you can focus Google on the high-priority pages, it can definitely work in your favor. The trick is figuring out how to build the logic such that you can code that dynamically. I've found there's almost always an answer, but it can take some creative thinking. I definitely don't encourage doing it manually.
If the results are easy to group by city and you can code that logic, the canonical may be fine. Since the search results could be different in some cases, canonical isn't technically the best choice, but it does often work. It really depends on how different they can be, so it's a bit tricky.
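To illustrate the "group by city" logic, here is a minimal sketch. It assumes the city is always the first path segment, as in the dayguest.com URLs quoted in this thread; `city_canonical` and `canonical_tag` are hypothetical helper names, not anything from the site's actual code.

```python
from urllib.parse import urlsplit


def city_canonical(url: str) -> str:
    """Return the city-level URL for a filter page.

    Assumption: the first path segment identifies the city and everything
    after it (extra segments, query string) is a filter.
    """
    parts = urlsplit(url)
    segments = [s for s in parts.path.split("/") if s]
    city_path = "/" + segments[0] if segments else "/"
    return f"{parts.scheme}://{parts.netloc}{city_path}"


def canonical_tag(url: str) -> str:
    """Build the <link rel="canonical"> tag pointing at the city page."""
    return f'<link rel="canonical" href="{city_canonical(url)}">'


print(canonical_tag("http://www.dayguest.com/rome-dayuse/concierge"))
```

Whether every filter page should really point at its city page is the judgment call discussed above; the code only shows that the mapping itself is easy to generate dynamically.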
-
Honestly, option 1 would be a nightmare. Imagine that we add one property in a city not yet covered. There are about 50 amenities, and most hotels offer most of them, so that is nearly as many new pages generated. That would quickly become unmanageable to handle manually.
Not sure I understand your second option. There are not several "levels", only one under the "city" in which the property is located. But multiplied across several cities, these pages quickly number in the hundreds, if not thousands.
Why would it not be possible/desirable to set the canonical of all such pages to their city page?
-
Ugh - that's what I was afraid you'd say. Unfortunately, the coincidental problem can't really be easily solved with code, which makes it hard to use canonical tags. There's no good way to tell the site when to use them.
So, a couple of options:
(1) Try to gradually rework the structure so that there are fewer of these paths.
(2) Consider using META NOINDEX on some lower-value paths. Internal search results don't have great value for Google, so you could let the major categories/options be indexed, but then cut off at a certain level (index nothing "below" it). That may be more feasible from a code standpoint.
(3) Use rel=prev/next, use unique TITLEs if possible (based on the query) and just clean things up the best you can, but leave everything indexed.
It depends a lot on your scope, structure, and your future plans. I'm not sure there's one "right" answer.
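For option (2), the cut-off can be a very small rule. The sketch below assumes the "level" is simply URL path depth and that any query-string filter also counts as "below" the cut-off; `MAX_INDEXED_DEPTH` and `robots_meta` are made-up names for illustration only.

```python
from urllib.parse import urlsplit

# Assumption: only city pages (one path segment, no filters) stay indexed.
MAX_INDEXED_DEPTH = 1


def robots_meta(url: str, max_depth: int = MAX_INDEXED_DEPTH) -> str:
    """Return a robots meta tag based on how deep the URL sits in the site."""
    parts = urlsplit(url)
    depth = len([s for s in parts.path.split("/") if s])
    has_filters = bool(parts.query)
    if depth > max_depth or has_filters:
        # Keep "follow" so links on the page can still be crawled.
        return '<meta name="robots" content="noindex, follow">'
    return '<meta name="robots" content="index, follow">'


print(robots_meta("http://www.dayguest.com/nyc"))
print(robots_meta("http://www.dayguest.com/nyc/luggagestorage"))
```

A real rule would probably whitelist a few high-value filters rather than rely on depth alone, but the idea is the same: the decision is made in code, not by hand.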
-
These pages return the same results coincidentally, that's the issue... The more properties we get on board, the less likely it is that these pages will be similar. But it might take a long time to build that up, and we may never achieve it.
-
Ah, got it - yeah, I think rel=canonical would be fine there, but I'd want to understand your architecture better. Are these pages returning the same results coincidentally, or are these two URLs that basically land on the same combination of search options/filters? If it's the former, it's a lot tougher, because that's just a coincidence happening at large scale. If it's the latter, a solid canonical scheme could help a lot, but I'd also explore whether these paths are useful (or should be indexed at all). In other words, in the long term, it might be better to use one URL consistently, even if people navigate by different paths to reach it.
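For the second case (different URLs, same combination of filters), "one URL consistently" usually comes down to normalizing the filter order. A rough sketch, assuming filters live in the query string; the `stars` parameter and the `normalize_filter_url` name are hypothetical:

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def normalize_filter_url(url: str) -> str:
    """Map every ordering of the same filters onto a single URL."""
    parts = urlsplit(url)
    # Sort and de-duplicate parameters so the order of clicks doesn't matter.
    params = sorted(set(parse_qsl(parts.query)))
    path = parts.path.rstrip("/") or "/"
    return urlunsplit(
        (parts.scheme, parts.netloc.lower(), path, urlencode(params), "")
    )


a = normalize_filter_url("http://www.dayguest.com/nyc?stars=4&amenities=10")
b = normalize_filter_url("http://www.dayguest.com/nyc/?amenities=10&stars=4")
print(a == b)
```

The normalized URL can then be used both for internal links and as the rel=canonical target. This does nothing for the coincidental case, where the URLs really are different searches.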
-
That's odd, they were supposed to be the same. And yeah, results come and go as properties are added/removed from our inventory.
The following is what I wanted to highlight:
http://www.dayguest.com/rome-dayuse/concierge
http://www.dayguest.com/rome-dayuse/air-conditioning
As you can see, the pages are identical, except that one has 5 properties and the other has 6. Most overlap. There are so many property "features" or "categories" that some pages end up with exactly the same list. In fact, SEOmoz finds that I have over 1700 pages with duplicate content, most of them search results pages with closely similar content such as these.
Hence my issue...
-
Are they duplicates in the sense that there are currently no results? I wouldn't generally use rel=canonical on these, because the search results should (theoretically) be different. These are distinct regions and, I assume, have unique properties.
If they're just returning no results, I'd actually consider a META NOINDEX until there are results available. Otherwise, this is likely to be treated as a soft 404 by Google (not a disaster, honestly). It depends on whether results come and go or if you're just building out the site and there will be data later. If the data isn't ready, I think META NOINDEX is a good way to go. Until results are available, these pages have no search value.
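As a rough sketch of that rule (the function name and the idea of passing in a result count are assumptions, not anything specific to your platform):

```python
def robots_meta_for_results(result_count: int) -> str:
    """Noindex a search page while it has no results; index it otherwise."""
    if result_count <= 0:
        # Empty result pages have no search value yet.
        return '<meta name="robots" content="noindex">'
    return '<meta name="robots" content="index, follow">'


print(robots_meta_for_results(0))
print(robots_meta_for_results(6))
```

Because the tag is computed on each request, a page would drop the noindex on its own once properties are added for that search.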
-
Well, let me give you an example, look at this page: http://www.dayguest.com/milan-city-centre-dayuse?amenities=10
And this page: http://www.dayguest.com/milan-central-station-dayuse?amenities=10
Do you see what I'm talking about? The pages are identical but for the page title/description & a few words on the page.
So, you'd go for canonical?
-
The relation is more hierarchical than next/previous. Judging from the post you mentioned, canonical would be more appropriate...
-
Sorry, I'm not clear on whether these are paginated search results or actual property pages that vary only by a small amount. As @SEO5 said, if these are paginated search results, you could use rel=prev/next. It's a bit tricky to set up with search filters (you need rel=prev/next + rel=canonical).
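Here is one way the rel=prev/next + rel=canonical combination could be generated for filtered, paginated results. It's only a sketch: the `page` query parameter, the self-referencing canonical, and the `pagination_links` helper are all assumptions for illustration.

```python
from html import escape
from urllib.parse import urlencode


def pagination_links(base_url: str, page: int, last_page: int, filters: dict) -> list:
    """Build canonical/prev/next link tags for one page of filtered results."""

    def page_url(n: int) -> str:
        # Sort filters so the same combination always yields the same URL.
        params = dict(sorted(filters.items()))
        if n > 1:
            params["page"] = str(n)  # page 1 is the bare URL
        query = urlencode(params)
        return f"{base_url}?{query}" if query else base_url

    def tag(rel: str, n: int) -> str:
        # Escape "&" so the href is valid inside an HTML attribute.
        return f'<link rel="{rel}" href="{escape(page_url(n), quote=True)}">'

    tags = [tag("canonical", page)]
    if page > 1:
        tags.append(tag("prev", page - 1))
    if page < last_page:
        tags.append(tag("next", page + 1))
    return tags


for line in pagination_links("http://www.dayguest.com/nyc", 2, 3, {"amenities": "10"}):
    print(line)
```

Each page canonicalizes to its own clean URL, while prev/next tie the series together.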
If these are nearly identical property pages, then it depends on how they differ. If they only differ by one attribute, I'd probably lean toward the canonical tag.