Similar pages: noindex or rel:canonical or disregard parameters?!
-
Hey all!
We have a hotel booking website that has search results pages per destinations (e.g. hotels in NYC is dayguest.com/nyc). Pages are also generated for destinations depending on various parameters, that can be star rating, amenities, style of the properties, etc. (e.g. dayguest.com/nyc/4stars, dayguest.com/nyc/luggagestorage, dayguest.com/nyc/luxury, etc.).
In general, all of these pages are very similar, as for example, there might be 10 hotels in NYC and all of them will offer luggage storage. Pages can be nearly identical. Come the problems of duplicate content and loss of juice by dilution.
I was wondering what was the best practice in such a situation: should I just put all pages except the most important ones (e.g. dayguest.com/nyc) as noindex? Or set it as canonical page for all variations? Or in google webmaster tool ask google to disregard the URLs for various parameters? Or do something else altogether?!
Thanks for the help!
-
Sorry, I don't think I explained (1) very well. What I mean is that you may want to gradually change the site architecture so that not all of the search options are crawlable pages. This could mean putting some filters in form variables, for example (instead of links). It could also mean making sure that certain paths always converge. There's no easy solution. This is a problem all big sites face, and it's very dependent on the platform/CMS.
With (2), a "level" could be anything. Maybe there are major cities you need to cover but everything else could stay out of the index. This really depends on your information architecture, but there's always something that's high priority and something that's low priority. If you can focus Google on the high-priority pages, it can definitely work in your favor. The trick is figuring out how to build the logic such that you can code that dynamically. I've found there's almost always an answer, but it can take some creative thinking. I definitely don't encourage doing it manually.
If the results are easy to group by city and you can code that logic, the canonical may be fine. Since the search results could be different in some cases, canonical isn't technically the best choice, but it does often work. It really depends on how different they can be, so it's a bit tricky.
-
Honestly, option 1 would be a nightmare. Imagine that we add one property in a city not covered. There are about 50 amenities, and most hotels feature most, so as much new pages generated. That would become quickly unmanageable, to handle manually.
Not sure I understand your second option. There are not several "level", only one under the "city" in which the property is. But mutliplied by several cities, they quickly become hundreds, if not thousands.
Why would it not be possible/desirable to code all such pages as canonical pages of each city?
-
Ugh - that's what I was afraid you'd say. Unfortunately, the coincidental problem can't really be easily solved with code, which makes it hard to use canonical tags. There's no good way to tell the site when to use them.
So, a couple of options:
(1) Try to gradually rework the structure so that there are less of these paths.
(2) Consider using META NOINDEX on some lower-value paths. Internal search results don't have great value for Google, so you could let the major categories/options be indexed, but the cut off a certain level (index nothing "below" it). That may be more feasible from a code standpoint.
(3) Use rel=prev/next, use unique TITLEs if possible (based on the query) and just clean things up the best you can, but leave everything indexed.
It depends a lot on your scope, structure, and your future plans. I'm not sure there's one "right" answer.
-
Ugh - that's what I was afraid you'd say. Unfortunately, the coincidental problem can't really be easily solved with code, which makes it hard to use canonical tags. There's no good way to tell the site when to use them.
So, a couple of options:
(1) Try to gradually rework the structure so that there are less of these paths.
(2) Consider using META NOINDEX on some lower-value paths. Internal search results don't have great value for Google, so you could let the major categories/options be indexed, but the cut off a certain level (index nothing "below" it). That may be more feasible from a code standpoint.
(3) Use rel=prev/next, use unique TITLEs if possible (based on the query) and just clean things up the best you can, but leave everything indexed.
It depends a lot on your scope, structure, and your future plans. I'm not sure there's one "right" answer.
-
These pages return the same results coincidentally, that's the issue... The more properties we get on board, the less likely it is that these pages will be similar. But it might take a long time to build that up, and we may never achieve it.
-
Ah, got it - yeah, I think rel=canonical would be fine there, but I'd want to understand your architecture better. Are these pages returning the same results coincidentally, or are these two URLs that basically land on the same combination of search options/filters. If it's the former, it's a lot tougher, because that's just a coincidence happening at large scale. If it's the latter, a solid canonical scheme could help a lot, but I'd also explore whether these paths are useful (or should be indexed at all). In other words, in the long term, it might be better to use one URL consistently, even if people navigate by different paths to reach it.
-
That's odd, they were supposed to be the same. And yeah, results come and go as properties are added/removed from our inventory.
The following is what I wanted to highlight:
http://www.dayguest.com/rome-dayuse/concierge
http://www.dayguest.com/rome-dayuse/air-conditioning
As you can see, the pages are identical, except that one has 5 properties and the other one has 6. Most overlap. There are so manies property "features" or "category", that some list have exactly the same list. Actually, SEOMOZ find that I have over 1700 pages with duplicate content, most being search results page with closely similar contents such as these.
Hence my issue...
-
Are they duplicates in the sense that there are currently no results? I wouldn't generally use rel=canonical on these, because the search results should (theoretically) be different. These are distinct regions and, I assume, have unique properties.
If they're just returning no results, I'd actually consider a META NOINDEX until there are results available. Otherwise, this is likely to be treated as a soft 404 by Google (not a disaster, honestly). It depends on whether results come and go or if you're just building out the site and there will be data later. If the data isn't ready, I think META NOINDEX is a good way to go. Until results are available, these pages have no search value.
-
Well, let me give you an example, look at this page: http://www.dayguest.com/milan-city-centre-dayuse?amenities=10
And this page: http://www.dayguest.com/milan-central-station-dayuse?amenities=10
Do you see what I'm talking about? The pages are identical but for the page title/description & a few words on the page.
So, you'd go for canonical?
-
The relation is more hierarchal then next/previous. Judging from the post you mentioned, canonical would be more appropriate...
-
Sorry, I'm not clear on whether these are paginated search results or actual property pages that vary only by a small amount. As @SEO5 said, if these are paginated search results, you could use rel=prev/next. It's a bit tricky to set up with search filters (you need rel=prev/next + rel=canonical).
If these are nearly identical property pages, then it depends on how they differ. If they only differ by one attribute, I'd probably lean toward the canonical tag.
-
Got a burning SEO question?
Subscribe to Moz Pro to gain full access to Q&A, answer questions, and ask your own.
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
Customer Reviews on Product Page / Pagination / Crawl 3 review pages only
Hi experts, I present customer feedback, reviews basically, on my website for the products that are sold. And with this comes the ability to read reviews and obviously with pagination to display the available reviews. Now I want users to be able to flick through and read the reviews to help them satisfy whatever curiosity they have. My only thinking is that the page that contains the reviews, with each click of the pagination will present roughly the same content. The only thing that changes is the title tags which will contain the number in the H1 to display the page number. I'm thinking this could be duplication but i have yet to be notified by Google in my Search console... Should i block crawlers from crawling beyond page 3 of reviews? Thanks
Technical SEO | | Train4Academy.co.uk0 -
Very wierd pages. 2900 403 errors in page crawl for a site that only has 140 pages.
Hi there, I just made a crawl of the website of one of my clients with the crawl tool from moz. I have 2900 403 errors and there is only 140 pages on the website. I will give an exemple of what the crawl error gives me. | http://www.mysite.com/en/www.mysite.com/en/en/index.html#?lang=en | http://www.mysite.com/en/www.mysite.com/en/en/en/index.html#?lang=en | http://www.mysite.com/en/www.mysite.com/en/en/en/en/index.html#?lang=en | http://www.mysite.com/en/www.mysite.com/en/en/en/en/en/index.html#?lang=en | http://www.mysite.com/en/www.mysite.com/en/en/en/en/en/en/index.html#?lang=en | http://www.mysite.com/en/www.mysite.com/en/en/en/en/en/en/index.html#?lang=en | http://www.mysite.com/en/www.mysite.com/en/en/en/en/en/en/en/en/en/en/en/en/index.html#?lang=en | http://www.mysite.com/en/www.mysite.com/en/en/en/en/en/en/en/en/en/en/en/en/en/index.html#?lang=en | http://www.mysite.com/en/www.mysite.com/en/en/en/en/en/en/en/en/en/en/en/en/en/en/en/en/index.html#?lang=en | http://www.mysite.com/en/www.mysite.com/en/en/en/en/en/en/en/en/en/en/en/en/en/en/en/en/en/en/en/en/en/en/en/en/en/en/en/en/en/en/en/en/en/en/en/en/en/en/en/en/en/en/en/en/en/en/en/en/en/en/en/en/index.html#?lang=en | | | | | | | | | | There are 2900 pages like this. I have tried visiting the pages and they work, but they are only html pages without CSS. Can you guys help me to see what the problems is. We have experienced huge drops in traffic since Septembre.
Technical SEO | | H.M.N.0 -
Removing a canonical tag from Pagination pages
Hello, Currently on our site we have the rel=prev/next markup for pagination along with a self pointing canonical via the Yoast Plugin. However, on page 2 of our paginated series, (there's only 2 pages currently), the canonical points to page one, rather than page 2. My understanding is that if you use a canonical on paginated pages it should point to a viewall page as opposed to page one. I also believe that you don't need to use both a canonical and the rel=prev/next markup, one or the other will do. As we use the markup I wanted to get rid of the canonical, would this be correct? For those who use the Yoast Plugin have you managed to get that to work? Thanks!
Technical SEO | | jessicarcf0 -
Does Google add parameters to the URL parameters in webmaster tools/
I am seeing new parameters added (and sometimes removed) from the URL Parameter tool. Is there anything that would add parameters to the tool? Or does it have to be someone internally? FYI - They always have no date in the configured column, no effect set, and crawl is set to Let Google decide.
Technical SEO | | merch_zzounds0 -
If I want clean up my URLs and take the "www.site.com/page.html" and make it "www.site.com/page" do I need a redirect?
If I want clean up my URLs and take the "www.site.com/page.html" and make it "www.site.com/page" do I need a redirect? If this scenario requires a 301 redirect no matter what, I might as well update the URL to be a little more keyword rich for the page while I'm at it. However, since these pages are ranking well I'd rather not lose any authority in the process and keep the URL just stripped of the ".html" (if that's possible). Thanks for you help! [edited for formatting]
Technical SEO | | Booj0 -
Home page canonical issues
Hi, I've noticed I can access/view a client's site's home page using the following URL variations - http://example.com/
Technical SEO | | simon-145328
http://example/index.html
http://www.example.com/
http://www.example.com/index.html There's been no preference set in Google WMT but Google has indexed and features this URL - http://example.com/ However, just to complicate matters, the vast majority of external links point to the 'www' version. Obviously i would like to tidy this up and have asked the client's web development company if they can place 301 redirects on the domains we no longer want to work - I received this reply but I'm not sure whether this does take care of the duplicate issue - Understand what you're saying, but this shouldn't be an issue regarding SEO. Essentially all the domains listed are linking to the same index.html page hosted at 1 location My question is, do i need to place 301 redirects on the domains we don't want to work and do i stick with the 'non www' version Google has indexed and try to change the external links so they point to the 'non www' version or go with the 'www' version and set this as the preferred domain in Google WMT? My technical knowledge in this area is limited so any help would be most appreciated. Regards,
Simon.0 -
NOINDEX,NOFOLLOW - Any SEO benefit to these pages?
Hi I could use some advice on a site architecture decision. I am developing something akin to an affiliate scheme for my business. However it is not quite as simple as an affliate setup because the products sold through "affiliates" will be slightly different, as a result I intend to run the site from a subdomain of my main domain. I am intending to NOINDEX,NOFOLLOW the subdomained site because it will contain huge amounts of duplication from my main site (it is really a subset of the main site with some slightly different functionality in places). I don't really want or need this subdomain site indexed, hence my decision to NOINDEX,NOFOLLOW it. However given I will, hopefully, be having lots of people link into the subdomain I am hoping to come up with some sort of arrangement that will mean that my main domain derives some sort of benefit from the linking. They are, after all, votes for my business so they feel like "good links". I am assuming here that a direct link into my NOFOLLOW,NOINDEX subdomain is going to provide ZERO benefit to my main domain. Happy to be corrected! The best I can come up with is to have a "landing page" on my main domain which links into parts of my main domain and then provides a link through to the subdomain site. However this feels like a bad experience from the user's point of view (i.e. land on a page and then have to click to get to the real action) and feels a bit spammy, i.e. I don't really have a good reason for this page other than linking! Equally I could NOINDEX,FOLLOW the homepage of the affiliate site and link back to the main domain from there. However this also feels a bit spammy and would be far less beneficial, I guess, because the subdomain homepage would have many more outgoing links than I envisaged for my "landing page" idea above. Also, it also looks a bit spammy (i.e. why follow the homepage and nofollow everything else?)! The trouble, I guess, is that whatever I do feels a bit spammy. I suppose this is because IT IS spammy! 🙂 Has anyone got any good ideas how I could setup an arrangement like I described above and derive benefit to my main domain without it looking (or being) spammy? I just hate to think of all of those links being wasted (in an SEO sense). Thanks Gary
Technical SEO | | gtrotter6660 -
Is having "rel=canonical" on the same page it is pointing to going to hurt search?
i like the rel=canonical tag and i've seen matt cutts posts on google about this tag. for the site i'm working on, it's a great workaround because we often have two identical or nearly identical versions of pages: 1 for patients, 1 for doctors. the problem is this: the way our content management system is set up, certain pages are linked up in a number of places and when we publish, two different versions of the page are created, but same content. because they are both being made from the same content templates, if i put in the rel=canonical tag, both pages get it. so, if i have: http://www.myhospital.com/patient-condition.asp and http://www.myhospital.com/professional-condition.asp and they are both produced from the same template, and have the same content, and i'm trying to point search at http://www.myhospital.com/patient-condition.asp, but that tag appears on both pages similarly, we have various forms and we like to know where people are coming from on the site to use those forms. to the bots, it looks like there's 600 versions of particular pages, so again, rel=canonical is great. however, because it's actually all the same page, just a link with a variable tacked on (http://www.myhospital.com/makeanappointment.asp?id=211) the rel=canonical tag will appear on "all" of them. any insight is most appreciated! thanks! brett
Technical SEO | | brett_hss0