How to prevent duplicate content within this complex website?
-
I have a complex SEO issue I've been wrestling with and I'd very much appreciate your views on it. I have a sports website and most visitors are looking for the games that are played in the current week (I've studied this - it's true). We're creating a new website from scratch and I want to do this as well as possible. We want to use the most elegant solution. We do not want to use work-arounds such as iframes, hiding text using AJAX etc. We need a solid solution for both users and search engines.
Therefore I have written down three options:
- Using a canonical URL;
- Using 301-redirects;
- Using 302-redirects.
Introduction
The page 'website.com/competition/season/week-8' shows the soccer games that are played in game week 8 of the season. The next week users are interested in the games that are played in that week (game week 9). So the content a visitor is interested in, is constantly shifting because of the way competitions and tournaments are organized. After a season the same goes for the season of course.
The website we're building has the following structure:
- Competition (e.g. 'premier league')
  - Season (e.g. '2011-2012')
    - Playweek (e.g. 'week 8')
      - Game (e.g. 'Manchester United - Arsenal')
    - Playweek (e.g. 'week 8')
  - Season (e.g. '2011-2012')
This is the most logical structure one can think of. This is what users expect.
Now we're facing the following challenge: when a user goes to http://website.com/premier-league he expects to see a) the games that are played in the current week and b) the current standings. When someone goes to http://website.com/premier-league/2011-2012/ he expects to see the same: the games that are played in the current week and the current standings. When someone goes to http://website.com/premier-league/2011-2012/week-8/ he expects to see the same: the games that are played in the current week and the current standings.
So essentially there are three places within the website, for every active season of a competition, where logically the same information has to be shown.
To deal with this from a UX and SEO perspective, we have the following options:
Option A - Use a canonical URL
Using a canonical URL could solve this problem. You could use a canonical URL from the current week page and the Season page to the competition page:
So:
- the page on 'website.com/$competition/$season/playweek-8' would have a canonical tag that points to 'website.com/$competition/'
- the page on 'website.com/$competition/$season/' would have a canonical tag that points to 'website.com/$competition/'
The next week however, you want to have the canonical tag on 'website.com/$competition/$season/playweek-9' and the canonical tag from 'website.com/$competition/$season/playweek-8' should be removed.
So then you have:
- the page on 'website.com/$competition/$season/playweek-9' would have a canonical tag that points to 'website.com/$competition/'
- the page on 'website.com/$competition/$season/' would still have a canonical tag that points to 'website.com/$competition/'
In essence the canonical tag is constantly traveling through the pages.
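To make the "traveling" canonical tag concrete, here is a minimal Python sketch of the rule described above. It is only an illustration under assumptions: the function name `canonical_tag`, the `BASE_URL` constant and the way the current season and week are passed in are all made up for this example and are not part of any real CMS or framework.

```python
from html import escape
from typing import Optional

BASE_URL = "http://website.com"  # placeholder domain taken from the question


def canonical_tag(competition: str, season: str, week: Optional[int],
                  current_season: str, current_week: int) -> Optional[str]:
    """Return the canonical <link> tag for a season or playweek page, or None.

    Hypothetical rule from Option A: only the current season page
    (week is None) and the current playweek page point at the
    competition page. Every other page gets no canonical tag here.
    """
    is_season_page = week is None
    if season == current_season and (is_season_page or week == current_week):
        href = f"{BASE_URL}/{competition}/"
        return f'<link rel="canonical" href="{escape(href, quote=True)}">'
    return None
```

With `current_week=8` the week 8 page gets the tag; once `current_week` becomes 9 the same call for week 8 returns `None`, which is exactly the weekly change the disadvantages below are about.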
Advantages:
- UX: for a user this is a very neat solution. Wherever a user goes, he sees the information he expects. So that's all good.
- SEO: the search engines get very clear guidelines as to how the website functions and we prevent duplicate content.
Disadvantages:
- I have some concerns regarding the weekly changing canonical tag from an SEO perspective. Every week, within every competition, the canonical tags are updated. How often do search engines update their index for canonical tags? Say it takes a search engine a week to visit a page, crawl it and process a canonical tag correctly; then the search engines will be a week behind on figuring out the actual structure of the hierarchy. On top of that: what do the changing canonical URLs do to the 'quality' of the website? In theory this should all work, but I have some reservations.
- If there is a canonical tag on 'website.com/$competition/$season/week-8', what does this do to the indexation and ranking of its subpages (the actual match pages)?
Option B - Using 301-redirects
With 301-redirects the user and the search engine are essentially treated the same. When the season page or the competition page is requested, both are redirected to the game week page.
The same applies here as applies for the canonical URL: every week there are changes in the redirects.
So in game week 8:
- the page on 'website.com/$competition/' would have a 301-redirect that points to 'website.com/$competition/$season/week-8'
- the page on 'website.com/$competition/$season' would have a 301-redirect that points to 'website.com/$competition/$season/week-8'
A week goes by, so then you have:
- the page on 'website.com/$competition/' would have a 301-redirect that points to 'website.com/$competition/$season/week-9'
- the page on 'website.com/$competition/$season' would have a 301-redirect that points to 'website.com/$competition/$season/week-9'
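The weekly change in the redirects can be sketched as a small table that is rebuilt whenever the game week changes. This is a hedged Python illustration only: `weekly_redirects` is a hypothetical helper, and how the table would be loaded into a web server or framework is left open.

```python
from typing import Dict, Tuple


def weekly_redirects(competition: str, season: str,
                     current_week: int) -> Dict[str, Tuple[int, str]]:
    """Build the redirect table for one competition in a given game week.

    Maps each source path to (HTTP status code, target path). Both the
    competition page and the season page point at the current week page,
    as in the week 8 and week 9 examples above.
    """
    target = f"/{competition}/{season}/week-{current_week}"
    return {
        f"/{competition}/": (301, target),
        f"/{competition}/{season}": (301, target),
    }
```

Calling it with `current_week=8` and then `current_week=9` gives the two sets of rules listed above. Keep in mind that browsers and crawlers may cache a 301, so last week's target can stick around for clients that already followed it.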
Advantages
- There is no loss of link authority.
Disadvantages
- Before a playweek starts the playweek in question can be indexed. However, in the current playweek the playweek page 301-redirects to the competition page. After that week the page's 301-redirect is removed again and it's indexable.
- What do all the (changing) 301-redirects do to the overall quality of the website for Search Engines (and users)?
Option C - Using 302-redirects
Most SEOs will refrain from using 302-redirects. However, a 302-redirect can be put to good use: for serving a temporary redirect.
Within my website, the content that's most important to the users (and therefore search engines) is constantly moving. In most cases, after a week a different piece of the website is most interesting for a user. So let's take our example above. We're in playweek 8.
If you want 'website.com/$competition/' to redirect to 'website.com/$competition/$season/week-8/', you can use a 302-redirect, because the redirect is temporary.
The next week the 302-redirect on 'website.com/$competition/' will be adjusted. It'll be pointing to 'website.com/$competition/$season/week-9'.
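As a rough sketch of what serving that temporary redirect could look like, here is a minimal WSGI app in Python using only the standard library. The `CURRENT` lookup table and the `app` function are assumptions made up for this example; a real site would read the current season and week from its own data.

```python
from http import HTTPStatus

# Hypothetical lookup: competition slug -> (current season, current playweek).
CURRENT = {"premier-league": ("2011-2012", 8)}


def app(environ, start_response):
    """Minimal WSGI app: 302 from the competition or current season page
    to the current playweek page; everything else is a 404 in this sketch."""
    parts = [p for p in environ.get("PATH_INFO", "").split("/") if p]
    if parts and parts[0] in CURRENT:
        season, week = CURRENT[parts[0]]
        if len(parts) == 1 or (len(parts) == 2 and parts[1] == season):
            location = f"/{parts[0]}/{season}/week-{week}/"
            found = HTTPStatus.FOUND  # 302, i.e. a temporary redirect
            start_response(f"{found.value} {found.phrase}",
                           [("Location", location)])
            return [b""]
    start_response("404 Not Found", [("Content-Type", "text/plain")])
    return [b"not found"]
```

Moving to week 9 only means changing the entry in `CURRENT`; the redirecting URLs themselves stay the same.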
Advantages
- We're putting the 302-redirect to its actual use.
- The pages that 302-redirect (for instance 'website.com/$competition' and 'website.com/$competition/$season') will remain indexed.
Disadvantages
- I'm not quite sure how Google will handle this. They're not very clear on how exactly they handle a 302-redirect and in which cases a 302-redirect might be useful. In most cases they advise webmasters not to use it.
I'd very much like your opinion on this. Thanks in advance guys and gals!
-
Hi Andy and Peter, thanks for your response.
@Andy: the rel=next and rel=prev markup won't really help in solving the problem we had. We will use it though because it's very helpful.
@Peter: yeah it's been something we've been struggling with for a while but we've finally made a decision on it.
The /current solution wasn't really a good solution because at the start of a season all the gameweeks are planned and created, so it would become quite complex. We've done some calculations on how much duplicate content we would have if we did not use any of the redirects or canonical tags, and the percentage of DC is very small (below 1%), so we're going to put our faith in Google's hands and let them figure it out. It's a good quality website with loads of links we're talking about, so I don't expect too many issues. We'll monitor it closely though and stand by to intervene when needed.
Anyways, thanks for your suggestions. Although it didn't solve my problem 1:1 it did make me think and make a decision.
Bye, Steven
-
Yeah, time-sensitive information is always tough. I think you're dead on about the disadvantages - the timing of Google's application of these rotating tags would always be off, and you could end up with some really weird search results that are not only bad for SEO but could create bad UX (people landing on old pages thinking they're new).
What about another option - could you take more of a news/blog approach and have a "/current" page that is always the current week? As the current week changes, roll that content into an archive page ("/week8", etc.). That way, the content lives on, but the current URL never changes.
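To illustrate the "/current" idea, here is a small Python sketch of how the last URL segment might be mapped to a playweek. It's only a sketch under assumptions: `resolve_week` is a hypothetical helper, and the rule that a "/weekN" archive URL only exists once that week is over is one possible reading of the suggestion.

```python
from typing import Optional


def resolve_week(segment: str, current_week: int) -> Optional[int]:
    """Map the last URL segment to a playweek number, or None if unknown.

    'current' always resolves to the current week, so that URL never
    changes. 'week8'-style segments resolve only for weeks that are
    already over (assumed archive rule), so the current week's content
    lives at exactly one URL at a time.
    """
    if segment == "current":
        return current_week
    number = segment[len("week"):]
    if segment.startswith("week") and number.isdecimal():
        week = int(number)
        if 1 <= week < current_week:
            return week
    return None
```

In week 8, "/current" shows week 8 and "/week7" shows the archive; when week 9 starts, "/week8" becomes available and "/current" moves on.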
In terms of duplication, is this really full duplication? It sounds like some pages (like the season) just have snippets of the current week. That's not necessarily a problem. If they are very similar, could you "widgetize" it somehow? Could be straight HTML, but use a condensed format for the season page that links to the full version on the current week page. This would be much like a snippet of a blog post - instead of repeating everything on all 3 pages, have one main chunk of content and two summaries.
-
Hi,
Does the rel=next, rel=prev markup help you out with this problem? See http://googlewebmastercentral.blogspot.co.uk/2011/09/pagination-with-relnext-and-relprev.html
I've used it a couple of times to help stop pages being seen as dupe content where those pages are duplicates (meta, main content, images etc.) except for, for example, reviews or comments, e.g. /product_x /product_x_review_page1 /product_x_review_page2 /product_x_review_page_3
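For reference, the markup is just a pair of link tags in the head of each page in the series. A minimal Python sketch of generating them, assuming a made-up URL scheme where the page number is appended to a base URL (`pagination_links` is a hypothetical helper, not part of any library):

```python
from typing import List


def pagination_links(base_url: str, page: int, last_page: int) -> List[str]:
    """Return the rel=prev / rel=next link tags for one page in a series.

    The first page gets no rel=prev and the last page gets no rel=next.
    """
    links = []
    if page > 1:
        links.append(f'<link rel="prev" href="{base_url}{page - 1}">')
    if page < last_page:
        links.append(f'<link rel="next" href="{base_url}{page + 1}">')
    return links
```

For page 2 of 3 this gives one prev tag pointing at page 1 and one next tag pointing at page 3.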
Andy