Do you bother cleaning duplicate content from Googles Index?
-
Hi,
I'm in the process of instructing developers to stop producing duplicate content, however a lot of duplicate content is already in Google's Index and I'm wondering if I should bother getting it removed... I'd appreciate it if you could let me know what you'd do...
For example one 'type' of page is being crawled thousands of times, but it only has 7 instances in the index which don't rank for anything. For this example I'm thinking of just stopping Google from accessing that page 'type'.
Do you think this is right?
Do you normally meta NoIndex,follow the page, wait for the pages to be removed from Google's Index, and then stop the duplicate content from being crawled?
Or do you just stop the pages from being crawled and let Google sort out its own Index in its own time?
Thanks
FashionLux
-
One tricky point - you don't necessarily want to fix the duplicate URLs before you 301-redirect and clear out the index. This is counter-intuitive and throws many people off. If you cut the crawl paths to the bad URLs, then Google will never crawl them and process the 301-redirects (since those exist on the page level). Same is try for canonical tags. Clear out the duplicates first, THEN clean up the paths. I know it sounds weird, but it's important.
For malformed URLs and usability, you could still dynamically 301-redirect. In most cases, those bad URLs shouldn't get indexed, because they have no crawl path in your site. Someone would have to link to them. Google will never mis-type, in other words.
-
Hi Highland/Dr Pete,
My apologies I wasn't very clear - fixing the duplicate problem... or rather stopping our site from generating further duplicate content isn't an issue at all, I'm going to instruct our developers to stop generating dupe content by doing things like no longer passing variables in the URL's (mysite.com/page2?previouspage=page1).
However the problem is that for a lot of instances duplicate URL's work and they need to work - for example if a user types in the URL but gets one character wrong ('1q' rather than '1') then from a usability perspective its the correct thing to serve the content they wanted. You don't want to make the user have to stop, figure out what they did wrong and redo it - not when you can make it work seamlessly.
My question relates to 'once my site is no longer generating unnecessary duplicate content, what should I to do about the duplicate pages that have already made their way into the Index?' and you have both answered the question very well, thank you.
I can manually set-up 301 redirects for all of the duplicate pages that I find in the index, once they disappear from the index I can probably remove those 301's. I was thinking of going down the noindex meta tag route which is harder to develop.
Thanks guys
FashionLux
-
I DO NOT believe in letting Google sort it out - they don't do it well, and, since Panda (and really even before), they basically penalize sites for their inability to sort out duplicates. I think it's very important to manage your index.
Unfortunately, how to do that can be very complex and depends a lot on the situation. Highland's covered the big ones, but the details can get messy. I wrote a mega-post about it:
http://www.seomoz.org/blog/duplicate-content-in-a-post-panda-world
Without giving URLs, can you give us a sense of what kind of duplicates they are (or maybe some generic URL examples)?
-
Your options are
- De-index the duplicate pages yourself and save yourself the crawl budget
- 301 the duplicates to the pages you want to keep (preferred)
- Canonical the duplicate pages, which lets you pick which page remains in the index. The duplicate pages will still be crawled, however.
Got a burning SEO question?
Subscribe to Moz Pro to gain full access to Q&A, answer questions, and ask your own.
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
Getting Google to index our sitemap
Hi, We have a sitemap on AWS that is retrievable via a url that looks like ours http://sitemap.shipindex.org/sitemap.xml. We have notified Google it exists and it found our 700k urls (we are a database of ship citations with unique urls). However, it will not index them. It has been weeks and nothing. The weird part is that it did do some of them before, it said so, about 26k. Then it said 0. Now that I have redone the sitemap, I can't get google to look at it and I have no idea why. This is really important to us, as we want not just general keywords to find our front page, but we also want specific ship names to show links to us in results. Does anyone have any clues as to how to get Google's attention and index our sitemap? Or even just crawl more of our site? It has done 35k pages crawling, but stopped.
Intermediate & Advanced SEO | | shipindex0 -
Same content, different languages. Duplicate content issue? | international SEO
Hi, If the "content" is the same, but is written in different languages, will Google see the articles as duplicate content?
Intermediate & Advanced SEO | | chalet
If google won't see it as duplicate content. What is the profit of implementing the alternate lang tag?Kind regards,Jeroen0 -
Duplicate Content... Really?
Hi all, My site is www.actronics.eu Moz reports virtually every product page as duplicate content, flagged as HIGH PRIORITY!. I know why. Moz classes a page as duplicate if >95% content/code similar. There's very little I can do about this as although our products are different, the content is very similar, albeit a few part numbers and vehicle make/model. Here's an example:
Intermediate & Advanced SEO | | seowoody
http://www.actronics.eu/en/shop/audi-a4-8d-b5-1994-2000-abs-ecu-en/bosch-5-3
http://www.actronics.eu/en/shop/bmw-3-series-e36-1990-1998-abs-ecu-en/ate-34-51 Now, multiply this by ~2,000 products X 7 different languages and you'll see we have a big dupe content issue (according to Moz's Crawl Diagnostics report). I say "according to Moz..." as I do not know if this is actually an issue for Google? 90% of our products pages rank, albeit some much better than others? So what is the solution? We're not trying to deceive Google in any way so it would seem unfair to be hit with a dupe content penalty, this is a legit dilemma where our product differ by as little as a part number. One ugly solution would be to remove header / sidebar / footer on our product pages as I've demonstrated here - http://woodberry.me.uk/test-page2-minimal-v2.html since this removes A LOT of page bloat (code) and would bring the page difference down to 80% duplicate.
(This is the tool I'm using for checking http://www.webconfs.com/similar-page-checker.php) Other "prettier" solutions would greatly appreciated. I look forward to hearing your thoughts. Thanks,
Woody 🙂1 -
Indexed Pages in Google, How do I find Out?
Is there a way to get a list of pages that google has indexed? Is there some software that can do this? I do not have access to webmaster tools, so hoping there is another way to do this. Would be great if I could also see if the indexed page is a 404 or other Thanks for your help, sorry if its basic question 😞
Intermediate & Advanced SEO | | JohnPeters0 -
Wordpress Duplicate Content
We have recently moved our company's blog to Wordpress on a subdomain (we utilize the Yoast SEO plugin). We are now experiencing an ever-growing volume of crawl errors (nearly 300 4xx now) for pages that do not exist to begin with. I believe it may have something to do with having the blog on a subdomain and/or our yoast seo plugin's indexation archives (author, category, etc) --- we currently have Subpages of archives and taxonomies, and category archives in use. I'm not as familiar with Wordpress and the Yoast SEO plugin as I am with other CMS' so any help in this matter would be greatly appreciated. I can PM further info if necessary. Thank you for the help in advance.
Intermediate & Advanced SEO | | BethA0 -
Duplicate content question? thanks
Hi, Im my time as an SEO I have never come across the following two scenarios, I am an advocate of using unique content, therefore always suggest and in cases demand that all content is written or re-written. This is the scenarios I am facing right now. For Example we have www.abc.com (has over 200 original recipes) and then we have www.xyz.com with the recipes but they are translated into another language as they are targeting different audiences, will Google penalize for duplicate content? The other issue is that the client got the recipes from www.abc.com (that have been translated) and use them in www.xyz.com aswell, both sites owned by the same company so its not pleagurism they have legal rights but I am not sure how Google will see it and if it will penalize the sites. Thanks!
Intermediate & Advanced SEO | | M_81 -
The system shows duplicate content for the same page (main domain and index.html). Is this an error of SEOMOZ?
Should I be worried that this will affect SEO? Most sites redirect to the index.html page, right? [edited by staff to remove toolbar data]
Intermediate & Advanced SEO | | moskowman0 -
Duplicate Content Through Sorting
I have a website that sells images. When you search you're given a page like this: http://www.andertoons.com/search-cartoons/santa/ I also give users the option to resort results by date, views and rating like this: http://www.andertoons.com/search-cartoons/santa/byrating/ I've seen in SEOmoz that Google might see these as duplicate content, but it's a feature I think is useful. How should I address this?
Intermediate & Advanced SEO | | andertoons0