Do you bother cleaning duplicate content from Googles Index?
-
Hi,
I'm in the process of instructing developers to stop producing duplicate content, however a lot of duplicate content is already in Google's Index and I'm wondering if I should bother getting it removed... I'd appreciate it if you could let me know what you'd do...
For example one 'type' of page is being crawled thousands of times, but it only has 7 instances in the index which don't rank for anything. For this example I'm thinking of just stopping Google from accessing that page 'type'.
Do you think this is right?
Do you normally meta NoIndex,follow the page, wait for the pages to be removed from Google's Index, and then stop the duplicate content from being crawled?
Or do you just stop the pages from being crawled and let Google sort out its own Index in its own time?
Thanks
FashionLux
-
One tricky point - you don't necessarily want to fix the duplicate URLs before you 301-redirect and clear out the index. This is counter-intuitive and throws many people off. If you cut the crawl paths to the bad URLs, then Google will never crawl them and process the 301-redirects (since those exist on the page level). Same is try for canonical tags. Clear out the duplicates first, THEN clean up the paths. I know it sounds weird, but it's important.
For malformed URLs and usability, you could still dynamically 301-redirect. In most cases, those bad URLs shouldn't get indexed, because they have no crawl path in your site. Someone would have to link to them. Google will never mis-type, in other words.
-
Hi Highland/Dr Pete,
My apologies I wasn't very clear - fixing the duplicate problem... or rather stopping our site from generating further duplicate content isn't an issue at all, I'm going to instruct our developers to stop generating dupe content by doing things like no longer passing variables in the URL's (mysite.com/page2?previouspage=page1).
However the problem is that for a lot of instances duplicate URL's work and they need to work - for example if a user types in the URL but gets one character wrong ('1q' rather than '1') then from a usability perspective its the correct thing to serve the content they wanted. You don't want to make the user have to stop, figure out what they did wrong and redo it - not when you can make it work seamlessly.
My question relates to 'once my site is no longer generating unnecessary duplicate content, what should I to do about the duplicate pages that have already made their way into the Index?' and you have both answered the question very well, thank you.
I can manually set-up 301 redirects for all of the duplicate pages that I find in the index, once they disappear from the index I can probably remove those 301's. I was thinking of going down the noindex meta tag route which is harder to develop.
Thanks guys
FashionLux
-
I DO NOT believe in letting Google sort it out - they don't do it well, and, since Panda (and really even before), they basically penalize sites for their inability to sort out duplicates. I think it's very important to manage your index.
Unfortunately, how to do that can be very complex and depends a lot on the situation. Highland's covered the big ones, but the details can get messy. I wrote a mega-post about it:
http://www.seomoz.org/blog/duplicate-content-in-a-post-panda-world
Without giving URLs, can you give us a sense of what kind of duplicates they are (or maybe some generic URL examples)?
-
Your options are
- De-index the duplicate pages yourself and save yourself the crawl budget
- 301 the duplicates to the pages you want to keep (preferred)
- Canonical the duplicate pages, which lets you pick which page remains in the index. The duplicate pages will still be crawled, however.
Got a burning SEO question?
Subscribe to Moz Pro to gain full access to Q&A, answer questions, and ask your own.
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
Website Redesign - Duplicate Content?
I hired a company to redesign our website.there are many pages like the example below that we are downsizing content by 80%.(believe me, not my decision)Current page: https://servicechampions.com/air-conditioning/New page (on test server):https://servicechampions.mymwpdesign.com/air-conditioning/My question to you is, that 80% of content that i am losing in the redesign, can i republish it as a blog?I know that google has it indexed. The old page has been live for 5 years, but now 80% of it will no longer be live. so can it be a blog and gain new (keep) seo value?What should i do with the 80% of content i am losing?
Intermediate & Advanced SEO | | CamiloSC0 -
Google does not want to index my page
I have a site that is hundreds of page indexed on Google. But there is a page that I put in the footer section that Google seems does not like and are not indexing that page. I've tried submitting it to their index through google webmaster and it will appear on Google index but then after a few days it's gone again. Before that page had canonical meta to another page, but it is removed now.
Intermediate & Advanced SEO | | odihost0 -
Noindexing Duplicate (non-unique) Content
When "noindex" is added to a page, does this ensure Google does not count page as part of their analysis of unique vs duplicate content ratio on a website? Example: I have a real estate business and I have noindex on MLS pages. However, is there a chance that even though Google does not index these pages, Google will still see those pages and think "ah, these are duplicate MLS pages, we are going to let those pages drag down value of entire site and lower ranking of even the unique pages". I like to just use "noindex, follow" on those MLS pages, but would it be safer to add pages to robots.txt as well and that should - in theory - increase likelihood Google will not see such MLS pages as duplicate content on my website? On another note: I had these MLS pages indexed and 3-4 weeks ago added "noindex, follow". However, still all indexed and no signs Google is noindexing yet.....
Intermediate & Advanced SEO | | khi50 -
Partial duplicate content and canonical tags
Hi - I am rebuilding a consumer website, and each product page will contain a unique product image, and a sentence or two about the product (and we tend to use a lot of the same words in different ways across products). I'd like to have a tabbed area below the product info that talks about the overall product line, and this content would be duplicate across all the product pages (a "Why use our products" type of thing). I'd have this duplicate content also living on its own URL's so they can be found alone in the SERP's. Question is, do I need to add the canonical tag to this page, since there's partial duplicate content on the product pages? And if I did that, would my product pages go un-indexed?? I understand how to handle completely duplicated content, it's the partial duplicate that I'm having difficulty figuring out.
Intermediate & Advanced SEO | | Jenny10 -
Best practice with duplicate content. Cd
Our website has recently been updated, now it seems that all of our products pages look like this cdnorigin.companyname.com/catagory/product Google is showing these pages within the search. rather then companyname.com/catagory/product Each product page does have a canaonacal tag on that points to the cdnorigin page. Is this best practice? i dont think that cdnorigin.companyname etc looks very goon in the search. is there any reason why my designer would set the canonical tags up this way?
Intermediate & Advanced SEO | | Alexogilvie0 -
K3 duplicate page content and title tags
I'm running a Joomla site, have just installed k2 as our blogging platform. Our Crawl Report with SEOMOZ shows a good bit of duplicate content and duplicate title tags with our K2 blog. We've installed sh404SEF. Will I need to go into sh404SEF each time we generate a blog entry to point the titles to one URL? If there is something simpler please advise. Thank you, Don
Intermediate & Advanced SEO | | donaldmoore0 -
Google indexing issue?
Hey Guys, After a lot of hard work, we finally fixed the problem on our site that didn't seem to show Meta Descriptions in Google, as well as "noindex, follow" on tags. Here's my question: In our source code, I am seeing both Meta descriptions on pages, and posts, as well as noindex, follow on tag pages, however, they are still showing the old results and tags are also still showing in Google search after about 36 hours. Is it just a matter of time now or is something else wrong?
Intermediate & Advanced SEO | | ttb0 -
Duplicate content that looks unique
OK, bit of an odd one. The SEOmoz crawler has flagged the following pages up as duplicate content. Does anyone have any idea what's going on? http://www.gear-zone.co.uk/blog/november-2011/gear$9zone-guide-to-winter-insulation http://www.gear-zone.co.uk/blog/september-2011/win-a-the-north-face-nuptse-2-jacket-with-gear-zone http://www.gear-zone.co.uk/blog/july-2011/telephone-issues-$9-2nd-july-2011 http://www.gear-zone.co.uk/blog/september-2011/gear$9zone-guide-to-nordic-walking-poles http://www.gear-zone.co.uk/blog/september-2011/win-a-the-north-face-nuptse-2-jacket-with-gear-zone https://www.google.com/webmasters/tools/googlebot-fetch?hl=en&siteUrl=http://www.gear-zone.co.uk/
Intermediate & Advanced SEO | | neooptic0