Do you bother cleaning duplicate content from Googles Index?
-
Hi,
I'm in the process of instructing developers to stop producing duplicate content, however a lot of duplicate content is already in Google's Index and I'm wondering if I should bother getting it removed... I'd appreciate it if you could let me know what you'd do...
For example one 'type' of page is being crawled thousands of times, but it only has 7 instances in the index which don't rank for anything. For this example I'm thinking of just stopping Google from accessing that page 'type'.
Do you think this is right?
Do you normally meta NoIndex,follow the page, wait for the pages to be removed from Google's Index, and then stop the duplicate content from being crawled?
Or do you just stop the pages from being crawled and let Google sort out its own Index in its own time?
Thanks
FashionLux
-
One tricky point - you don't necessarily want to fix the duplicate URLs before you 301-redirect and clear out the index. This is counter-intuitive and throws many people off. If you cut the crawl paths to the bad URLs, then Google will never crawl them and process the 301-redirects (since those exist on the page level). Same is try for canonical tags. Clear out the duplicates first, THEN clean up the paths. I know it sounds weird, but it's important.
For malformed URLs and usability, you could still dynamically 301-redirect. In most cases, those bad URLs shouldn't get indexed, because they have no crawl path in your site. Someone would have to link to them. Google will never mis-type, in other words.
-
Hi Highland/Dr Pete,
My apologies I wasn't very clear - fixing the duplicate problem... or rather stopping our site from generating further duplicate content isn't an issue at all, I'm going to instruct our developers to stop generating dupe content by doing things like no longer passing variables in the URL's (mysite.com/page2?previouspage=page1).
However the problem is that for a lot of instances duplicate URL's work and they need to work - for example if a user types in the URL but gets one character wrong ('1q' rather than '1') then from a usability perspective its the correct thing to serve the content they wanted. You don't want to make the user have to stop, figure out what they did wrong and redo it - not when you can make it work seamlessly.
My question relates to 'once my site is no longer generating unnecessary duplicate content, what should I to do about the duplicate pages that have already made their way into the Index?' and you have both answered the question very well, thank you.
I can manually set-up 301 redirects for all of the duplicate pages that I find in the index, once they disappear from the index I can probably remove those 301's. I was thinking of going down the noindex meta tag route which is harder to develop.
Thanks guys
FashionLux
-
I DO NOT believe in letting Google sort it out - they don't do it well, and, since Panda (and really even before), they basically penalize sites for their inability to sort out duplicates. I think it's very important to manage your index.
Unfortunately, how to do that can be very complex and depends a lot on the situation. Highland's covered the big ones, but the details can get messy. I wrote a mega-post about it:
http://www.seomoz.org/blog/duplicate-content-in-a-post-panda-world
Without giving URLs, can you give us a sense of what kind of duplicates they are (or maybe some generic URL examples)?
-
Your options are
- De-index the duplicate pages yourself and save yourself the crawl budget
- 301 the duplicates to the pages you want to keep (preferred)
- Canonical the duplicate pages, which lets you pick which page remains in the index. The duplicate pages will still be crawled, however.
Got a burning SEO question?
Subscribe to Moz Pro to gain full access to Q&A, answer questions, and ask your own.
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
Duplicated Content with Index.php
Good Afternoon, My website uses Joomla CMS and has the htaccess rewrite code enabled to ensure the use of search engine friendly URLs (SEF's). While browsing the crawl diagnostics I have found that Moz considers the /index.php URL a duplicate to our root. I will always under the impression that the htaccess rewrite took care of that issue and obviously I would like to address it. I attempted to create a 301 redirect from the index.php URL to the root but ran into an issue when attempting to login to the admin portion of the website as the redirect sent me back to the homepage. I was curious if anyone had advice for handling the index.php duplication issue, specifically with Joomla. Additionally, I have confirmed that in Google Webmasters, under URL parameters, the index.php parameter is set as 'Representative URL'.
Intermediate & Advanced SEO | | BrandonEML0 -
Could this be seen as duplicate content in Google's eyes?
Hi I'm an in-house SEO and we've recently seen Panda related traffic loss along with some of our main keywords slipping down the SERPs. Looking for possible Panda related issues I was wondering if the following could be seen as duplicate content. We've got some very similar holidays (travel company) on our website. While they are different I'm concerned it may be seen as creating content that is too similar: http://www.naturalworldsafaris.com/destinations/africa-and-the-indian-ocean/kenya/suggested-holidays/the-wildlife-and-beaches-of-kenya.aspx http://www.naturalworldsafaris.com/destinations/africa-and-the-indian-ocean/kenya/suggested-holidays/ultimate-kenya-wildlife-and-beaches.aspx http://www.naturalworldsafaris.com/destinations/africa-and-the-indian-ocean/kenya/suggested-holidays/wildlife-and-beach-family-safari.aspx They do all have unique text but as you can see from the titles, they are very similar (note from an SEO point of view the tabbed content is all within the same page at source level). At the top level of the holiday pages we have a filtered search:
Intermediate & Advanced SEO | | KateWaite
http://www.naturalworldsafaris.com/destinations/africa-and-the-indian-ocean/kenya/suggested-holidays.aspx These pages have a unique introduction but the content snippets being pulled into the boxes is drawn from each of the individual holiday pages. I'm just concerned that these could be introducing some duplicating issues. Any thoughts?0 -
Magento Duplicate Content Recovery
Hi, we switched platforms to Magento last year. Since then our SERPS rankings have declined considerably (no sudden drop on any Panda/Penguin date lines). After investigating, it appeared we neglected to No index, follow all our filter pages and our total indexed pages rose sevenfold in a matter of weeks. We have since fixed the no index issue and the pages indexed are now below what we had pre switch to Magento. We've seen some positive results in the last week. Any ideas when/if our rankings will return? Thanks!
Intermediate & Advanced SEO | | Jonnygeeuk0 -
Duplicate Content From Indexing of non- File Extension Page
Google somehow has indexed a page of mine without the .html extension. so they indexed www.samplepage.com/page, so I am showing duplicate content because Google also see's www.samplepage.com/page.html How can I force google or bing or whoever to only index and see the page including the .html extension? I know people are saying not to use the file extension on pages, but I want to, so please anybody...HELP!!!
Intermediate & Advanced SEO | | WebbyNabler0 -
Frequent FAQs vs duplicate content
It would be helpful for our visitors if we were to include an expandable list of FAQs on most pages. Each section would have its own list of FAQs specific to that section, but all the pages in that section would have the same text. It occurred to me that Google might view this as a duplicate content issue. Each page _does _have a lot of unique text, but underneath we would have lots of of text repeated throughout the site. Should I be concerned? I guess I could always load these by AJAX after page load if might penalize us.
Intermediate & Advanced SEO | | boxcarpress0 -
Duplicate Content Question
My understanding of duplicate content is that if two pages are identical, Google selects one for it's results... I have a client that is literally sharing content real-time with a partner...the page content is identical for both sites, and if you update one page, teh otehr is updated automatically. Obviously this is a clear cut case for canonical link tags, but I'm cuious about something: Both sites seem to show up in search results but for different keywords...I would think one domain would simply win out over the other, but Google seems to show both sites in results. Any idea why? Also, could this duplicate content issue be hurting visibility for both sites? In other words, can I expect a boost in rankings with the canonical tags in place? Or will rankings remain the same?
Intermediate & Advanced SEO | | AmyLB0 -
Duplicate Content
Hi everyone, I have a TLD in the UK with a .co.uk and also the same site in Ireland (.ie). The only differences are the prices and different banners maybe. The .ie site pulls all of the content from the .co.uk domain. Is this classed as content duplication? I've had problems in the past in which Google struggles to index the website. At the moment the site appears completely fine in the UK SERPs but for Ireland I just have the Title and domain appearing in the SERPs, with no extended title or description because of the confusion I caused Google last time. Does anybody know a fix for this? Thanks
Intermediate & Advanced SEO | | royb0 -
Cross-Domain Canonical and duplicate content
Hi Mozfans! I'm working on seo for one of my new clients and it's a job site (i call the site: Site A).
Intermediate & Advanced SEO | | MaartenvandenBos
The thing is that the client has about 3 sites with the same Jobs on it. I'm pointing a duplicate content problem, only the thing is the jobs on the other sites must stay there. So the client doesn't want to remove them. There is a other (non ranking) reason why. Can i solve the duplicate content problem with a cross-domain canonical?
The client wants to rank well with the site i'm working on (Site A). Thanks! Rand did a whiteboard friday about Cross-Domain Canonical
http://www.seomoz.org/blog/cross-domain-canonical-the-new-301-whiteboard-friday0