Moz Q&A is closed.
After more than 13 years, and tens of thousands of questions, Moz Q&A closed on 12th December 2024. Whilst we’re not completely removing the content - many posts will still be possible to view - we have locked both new posts and new replies. More details here.
Removing duplicate content
-
Due to URL changes and parameters on our ecommerce sites, we have a massive amount of duplicate pages indexed by google, sometimes up to 5 duplicate pages with different URLs.
1. We've instituted canonical tags site wide.
2. We are using the parameters function in Webmaster Tools.
3. We are using 301 redirects on all of the obsolete URLs
4. I have had many of the pages fetched so that Google can see and index the 301s and canonicals.
5. I created HTML sitemaps with the duplicate URLs, and had Google fetch and index the sitemap so that the dupes would get crawled and deindexed.
None of these seems to be terribly effective. Google is indexing pages with parameters in spite of the parameter (clicksource) being called out in GWT. Pages with obsolete URLs are indexed in spite of them having 301 redirects. Google also appears to be ignoring many of our canonical tags as well, despite the pages being identical.
Any ideas on how to clean up the mess?
-
Where this is appearing the most is on cross domain canonicals. We have duplicate content across 2 websites, and we've canonicaled some pages from Site A to Site B, and some from Site B to Site A. In theory, pages that were canonicaled to the other domain should be deindexed. When I run a rankings report, I see pages for the wrong domain ranking, a month later. They are pages with parameters, or old URLs that we've changed. It's like a game of whack a mole. Every time we get a page deindexed, a duplicate with a different parameter takes its place. And this is in spite of calling out these parameters in GWT.
What I imagine is happening is that we have several URLs for the same page indexed. When Google crawls our site, it is correctly canonicaling the page it crawls. In the rankings, however, Google is probably pulling a duplicate page out of its index, and ranking it without crawling it. If it was crawling it, Google would see the canonical tag, and not rank it. So we have an ongoing battle to get Google to crawl the page it just pulled out of its index to see the the canonical tag.
The reason for all this is that when a page cross domain canonicals correctly, the rankings for the duplicate page on the other site goes up dramatically. As long as Google keeps ranking the wrong pages, we don't get the rankings bump on the other site.
-
Are you basing this on a site: search? It's fairly common for URLs to appear in a site: search that otherwise will not appear for any actual searches. Are the undesirable versions of the URLs getting any search traffic?
-
Yes, as Patrick said, surprisingly often something like this is a result of a simple oversight because we have been looking at the same code over and over...
Do you have access to Screaming Frog? You could crawl your site and see whether redirects/canonicals are behaving as you expected.
Have you taken a look at the html of one of the incorrectly indexed pages when it is loaded in your browser? Can you see the canonical? If you try going to a redirected page, does it redirect? [I know--way to obvious, but sometimes it is good to start at the beginning again when we can't root out an issue.]
Another culprit in these cases can be internal links. Do you link internally using any of the undesirable URLs? That can send a message to Google that those URLs are still in play. Again, you can use Screaming Frog to find those strings.
-
It sounds like part of the problem may be the sitemaps you're sending. By including duplicates in a sitemap, you're basically telling Google that each version of the page is valid. I would remove them and resubmit a sitemap with only the canonical versions you want indexed and see if that helps.
-
Hi there
Are you sure you are using all of the tools above properly? Not saying you're not but people make mistakes and it's just something to look into.
When did you implement all of the changes? Was it recently or was it a long time ago?
How is your organic traffic and rankings? Did you check if you have a manual action at all?
Let me know - thanks!
Got a burning SEO question?
Subscribe to Moz Pro to gain full access to Q&A, answer questions, and ask your own.
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
How can I avoid duplicate content for a new landing page which is the same as an old one?
Hello mozers! I have a question about duplicate content for you... One on my clients pages have been dropping in search volume for a while now, and I've discovered it's because the search term isn't as popular as it used to be. So... we need to create a new landing page using a more popular search term. The page which is losing traffic is based on the search query "Can I put a solid roof on my conservatory" this only gets 0-10 searches per month according to the keyword explorer tool. However, if we changed this to "replacing conservatory roof with solid roof" this gets up to 500 searches per month. Muuuuch better! The issue is, I don't want to close down and re-direct the old page because it's got a featured snippet and sits in position 1. So I'd like to create another page instead... however, as the two are effectively the same content, I would then land myself in a duplicate content issue. If I were to put a rel="canonical" tag in the original "can I put a solid roof...." page but say the master page is now the new one, would that get around the issue?
Intermediate & Advanced SEO | | Virginia-Girtz0 -
Same product in different categories and duplicate content issues
Hi,I have some questions related to duplicate content on e-commerce websites. 1)If a single product goes to multiple categories (eg. A black elegant dress could be listed in two categories like "black dresses" and "elegant dresses") is it considered duplicate content even if the product url is unique? e.g www.website.com/black-dresses/black-elegant-dress duplicated> same content from two different paths www.website.com/elegant-dresses/black-elegant-dress duplicated> same content from two different paths www.website.com/black-elegant-dress unique url > this is the way my products urls look like Does google perceive this as duplicated content? The path to the content is only one, so it shouldn't be seen as duplicated content, though the product is repeated in different categories.This is the most important concern I actually have. It is a small thing but if I set this wrong all website would be affected and thus penalised, so I need to know how I can handle it. 2- I am using wordpress + woocommerce. The website is built with categories and subcategories. When I create a product in the product page backend is it advisable to select just the lowest subcategory or is it better to select both main category and subcategory in which the product belongs? I usually select the subcategory alone. Looking forward to your reply and suggestions. thanks
Intermediate & Advanced SEO | | cinzia091 -
Directory with Duplicate content? what to do?
Moz keeps finding loads of pages with duplicate content on my website. The problem is its a directory page to different locations. E.g if we were a clothes shop we would be listing our locations: www.sitename.com/locations/london www.sitename.com/locations/rome www.sitename.com/locations/germany The content on these pages is all the same, except for an embedded google map that shows the location of the place. The problem is that google thinks all these pages are duplicated content. Should i set a canonical link on every single page saying that www.sitename.com/locations/london is the main page? I don't know if i can use canonical links because the page content isn't identical because of the embedded map. Help would be appreciated. Thanks.
Intermediate & Advanced SEO | | nchlondon0 -
Duplicate content on recruitment website
Hi everyone, It seems that Panda 4.2 has hit some industries more than others. I just started working on a website, that has no manual action, but the organic traffic has dropped massively in the last few months. Their external linking profile seems to be fine, but I suspect usability issues, especially the duplication may be the reason. The website is a recruitment website in a specific industry only. However, they posts jobs for their clients, that can be very similar, and in the same time they can have 20 jobs with the same title and very similar job descriptions. The website currently have over 200 pages with potential duplicate content. Additionally, these jobs get posted on job portals, with the same content (Happens automatically through a feed). The questions here are: How bad would this be for the website usability, and would it be the reason the traffic went down? Is this the affect of Panda 4.2 that is still rolling What can be done to resolve these issues? Thank you in advance.
Intermediate & Advanced SEO | | iQi0 -
Case Sensitive URLs, Duplicate Content & Link Rel Canonical
I have a site where URLs are case sensitive. In some cases the lowercase URL is being indexed and in others the mixed case URL is being indexed. This is leading to duplicate content issues on the site. The site is using link rel canonical to specify a preferred URL in some cases however there is no consistency whether the URLs are lowercase or mixed case. On some pages the link rel canonical tag points to the lowercase URL, on others it points to the mixed case URL. Ideally I'd like to update all link rel canonical tags and internal links throughout the site to use the lowercase URL however I'm apprehensive! My question is as follows: If I where to specify the lowercase URL across the site in addition to updating internal links to use lowercase URLs, could this have a negative impact where the mixed case URL is the one currently indexed? Hope this makes sense! Dave
Intermediate & Advanced SEO | | allianzireland0 -
Partial duplicate content and canonical tags
Hi - I am rebuilding a consumer website, and each product page will contain a unique product image, and a sentence or two about the product (and we tend to use a lot of the same words in different ways across products). I'd like to have a tabbed area below the product info that talks about the overall product line, and this content would be duplicate across all the product pages (a "Why use our products" type of thing). I'd have this duplicate content also living on its own URL's so they can be found alone in the SERP's. Question is, do I need to add the canonical tag to this page, since there's partial duplicate content on the product pages? And if I did that, would my product pages go un-indexed?? I understand how to handle completely duplicated content, it's the partial duplicate that I'm having difficulty figuring out.
Intermediate & Advanced SEO | | Jenny10 -
PDF for link building - avoiding duplicate content
Hello, We've got an article that we're turning into a PDF. Both the article and the PDF will be on our site. This PDF is a good, thorough piece of content on how to choose a product. We're going to strip out all of the links to our in the article and create this PDF so that it will be good for people to reference and even print. Then we're going to do link building through outreach since people will find the article and PDF useful. My question is, how do I use rel="canonical" to make sure that the article and PDF aren't duplicate content? Thanks.
Intermediate & Advanced SEO | | BobGW0 -
Duplicate content on ecommerce sites
I just want to confirm something about duplicate content. On an eCommerce site, if the meta-titles, meta-descriptions and product descriptions are all unique, yet a big chunk at the bottom (featuring "why buy with us" etc) is copied across all product pages, would each page be penalised, or not indexed, for duplicate content? Does the whole page need to be a duplicate to be worried about this, or would this large chunk of text, bigger than the product description, have an effect on the page. If this would be a problem, what are some ways around it? Because the content is quite powerful, and is relavent to all products... Cheers,
Intermediate & Advanced SEO | | Creode0