Do you bother cleaning duplicate content from Googles Index?
-
Hi,
I'm in the process of instructing developers to stop producing duplicate content, however a lot of duplicate content is already in Google's Index and I'm wondering if I should bother getting it removed... I'd appreciate it if you could let me know what you'd do...
For example one 'type' of page is being crawled thousands of times, but it only has 7 instances in the index which don't rank for anything. For this example I'm thinking of just stopping Google from accessing that page 'type'.
Do you think this is right?
Do you normally meta NoIndex,follow the page, wait for the pages to be removed from Google's Index, and then stop the duplicate content from being crawled?
Or do you just stop the pages from being crawled and let Google sort out its own Index in its own time?
Thanks
FashionLux
-
One tricky point - you don't necessarily want to fix the duplicate URLs before you 301-redirect and clear out the index. This is counter-intuitive and throws many people off. If you cut the crawl paths to the bad URLs, then Google will never crawl them and process the 301-redirects (since those exist on the page level). Same is try for canonical tags. Clear out the duplicates first, THEN clean up the paths. I know it sounds weird, but it's important.
For malformed URLs and usability, you could still dynamically 301-redirect. In most cases, those bad URLs shouldn't get indexed, because they have no crawl path in your site. Someone would have to link to them. Google will never mis-type, in other words.
-
Hi Highland/Dr Pete,
My apologies I wasn't very clear - fixing the duplicate problem... or rather stopping our site from generating further duplicate content isn't an issue at all, I'm going to instruct our developers to stop generating dupe content by doing things like no longer passing variables in the URL's (mysite.com/page2?previouspage=page1).
However the problem is that for a lot of instances duplicate URL's work and they need to work - for example if a user types in the URL but gets one character wrong ('1q' rather than '1') then from a usability perspective its the correct thing to serve the content they wanted. You don't want to make the user have to stop, figure out what they did wrong and redo it - not when you can make it work seamlessly.
My question relates to 'once my site is no longer generating unnecessary duplicate content, what should I to do about the duplicate pages that have already made their way into the Index?' and you have both answered the question very well, thank you.
I can manually set-up 301 redirects for all of the duplicate pages that I find in the index, once they disappear from the index I can probably remove those 301's. I was thinking of going down the noindex meta tag route which is harder to develop.
Thanks guys
FashionLux
-
I DO NOT believe in letting Google sort it out - they don't do it well, and, since Panda (and really even before), they basically penalize sites for their inability to sort out duplicates. I think it's very important to manage your index.
Unfortunately, how to do that can be very complex and depends a lot on the situation. Highland's covered the big ones, but the details can get messy. I wrote a mega-post about it:
http://www.seomoz.org/blog/duplicate-content-in-a-post-panda-world
Without giving URLs, can you give us a sense of what kind of duplicates they are (or maybe some generic URL examples)?
-
Your options are
- De-index the duplicate pages yourself and save yourself the crawl budget
- 301 the duplicates to the pages you want to keep (preferred)
- Canonical the duplicate pages, which lets you pick which page remains in the index. The duplicate pages will still be crawled, however.
Got a burning SEO question?
Subscribe to Moz Pro to gain full access to Q&A, answer questions, and ask your own.
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
Google Indexing of Images
Our site is experiencing an issue with indexation of images. The site is real estate oriented. It has 238 listings with about 1190 images. The site submits two version (different sizes) of each image to Google, so there are about 2,400 images. Only several hundred are indexed. Can adding Microdata improve the indexation of the images? Our site map is submitting images that are on no-index listing pages to Google. As a result more than 2000 images have been submitted but only a few hundred have been indexed. How should the site map deal with images that reside on no-index pages? Do images that are part of pages that are set up as "no-index" need a special "no-index" label or special treatment? My concern is that so many images that not indexed could be a red flag showing poor quality content to Google. Is it worth investing in correcting this issue, or will correcting it result in little to no improvement in SEO? Thanks, Alan
Intermediate & Advanced SEO | | Kingalan10 -
Duplicate content based on filters
Hi Community, There have probably been a few answers to this and I have more or less made up my mind about it but would like to pose the question or as that you post a link to the correct article for this please. I have a travel site with multiple accommodations (for example), obviously there are many filter to try find exactly what you want, youcan sort by region, city, rating, price, type of accommodation (hotel, guest house, etc.). This all leads to one invevitable conclusion, many of the results would be the same. My question is how would you handle this? Via a rel canonical to the main categories (such as region or town) thus making it the successor, or no follow all the sub-category pages, thereby not allowing any search to reach deeper in. Thanks for the time and effort.
Intermediate & Advanced SEO | | ProsperoDigital0 -
Content not indexed
How come i google content that resides on my website and on my homepage and my site doesn't come up? I know the content is unique i wrote that. I have a feeling i have some kind of a crawling issue but cannot determine what it is. I ran the crawling test and other tools and didn't find anything. Google shows me that pages are indexed but yet its weird try googling snippets of content and you'll see my site isnt anywhere. Have you experienced that before? First i thought it was penalized but i submitted the reconsideration request and it came back clear, No manual spam action found. And i did not get any message in my GWMT either. Any thoughts?
Intermediate & Advanced SEO | | CMTM0 -
Google Sitemap only indexing 50% Is that a problem?
We have about 18,000 pages submitted on our Google Sitemap and only about 9000 of them are indexed. Is this a problem? We have a script that creates a sitemap on a daily basis and it is submitted on a daily basis. Am I better off only doing it once a week? Is this why I never get to the full 18,000 indexed?
Intermediate & Advanced SEO | | EcommerceSite0 -
How much (%) of the content of a page is considered too much duplication?
Google is not fond of duplication, I have been very kindly told. So how much would you suggest is too much?
Intermediate & Advanced SEO | | simonberenyi0 -
Duplicate content that looks unique
OK, bit of an odd one. The SEOmoz crawler has flagged the following pages up as duplicate content. Does anyone have any idea what's going on? http://www.gear-zone.co.uk/blog/november-2011/gear$9zone-guide-to-winter-insulation http://www.gear-zone.co.uk/blog/september-2011/win-a-the-north-face-nuptse-2-jacket-with-gear-zone http://www.gear-zone.co.uk/blog/july-2011/telephone-issues-$9-2nd-july-2011 http://www.gear-zone.co.uk/blog/september-2011/gear$9zone-guide-to-nordic-walking-poles http://www.gear-zone.co.uk/blog/september-2011/win-a-the-north-face-nuptse-2-jacket-with-gear-zone https://www.google.com/webmasters/tools/googlebot-fetch?hl=en&siteUrl=http://www.gear-zone.co.uk/
Intermediate & Advanced SEO | | neooptic0 -
Guest blogging and duplicate content
I have a guest blog prepared and several sites I can submit it to, would it be considered duplicate content if I submitted one guest blog post to multipul blogs? and if so this content is not on my site but is linking to it. What will google do? Lets say 5 blogs except the same content and post it up, I understand that the first blog to have it up will not be punished, what about the rest of the blogs? can they get punished for this duplicate content? can I get punished for having duplicate content linking to me?
Intermediate & Advanced SEO | | SEODinosaur0 -
Google replacing subpages in index with home page?
Hi! I run a backlink building company. Recently, we had a customer who had us build targeted backlinks to certain subpages on his site. Then something really bizarre happened...all of a sudden, their subpages that were indexed in Google (the ones we were building links to) disappeared from the index, to be replaced with their home page. They haven't lost their rank, per se--it's just now their home page instead of their subpages. At this point, we are tracking literally thousands of keywords for our link building customers, and we've never run into this issue before. Have you ever run into it? If so, what's the best way to handle it from an SEO company perspective? They have a sitemap.xml and their GWT account reports no crawl errors, so it doesn't seem to be a site issue.
Intermediate & Advanced SEO | | ownlocal0