Duplicate content across multiple domains

jmsobe

I have come across a situation where we have discovered duplicate content between multiple domains. We have access to each domain and, within the past two weeks, have added 301 redirects that dynamically send each page to the proper page on the desired domain.
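For readers following along, the per-page redirect described above might look roughly like the sketch below. The hostnames are hypothetical placeholders, and it assumes each duplicate page lives at the same path and query string on the main domain, which the thread does not confirm.

```python
from urllib.parse import urlsplit, urlunsplit

# Hypothetical hostnames; replace with the real duplicate and main domains.
MAIN_HOST = "www.main-example.com"
DUPLICATE_HOSTS = {"shop-one.example", "www.shop-one.example", "shop-two.example"}


def redirect_target(url):
    """Return the main-domain URL to 301 to, or None if no redirect is needed."""
    parts = urlsplit(url)
    host = (parts.hostname or "").lower()
    if host not in DUPLICATE_HOSTS:
        return None
    # Keep path and query so each duplicate page maps to its exact counterpart
    # (assumption: both domains use the same URL structure).
    return urlunsplit(("https", MAIN_HOST, parts.path or "/", parts.query, ""))
```

The web server or application would then answer any request where `redirect_target` returns a URL with status 301 and that URL in the `Location` header.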

My question relates to the removal of these pages. There are thousands of these duplicate pages.

I have gone back and looked at a number of these cached pages in Google and found that the cached copies are roughly 30 days old or older. Will these pages ever get removed from Google's index? Will Google even read the 301 redirect and follow it to the proper domain and page? If so, when will that happen?

Are we better off submitting a full site removal request for the sites that carry the duplicate content at this point? These smaller sites do bring traffic on their own, but I'd rather not wait 3 months for the content to be removed, since my assumption is that this content is competing with the main site.

I suppose another option would be to include a no-cache meta tag on these pages.

Any thoughts or comments would be appreciated.

jmsobe

I went ahead and added the links to the sitemap; however, when Google crawled the links I received this message:

When we tested a sample of URLs from your Sitemap, we found that some URLs redirect to other locations. We recommend that your Sitemap contain URLs that point to the final destination (the redirect target) instead of redirecting to another URL.

However, I do not understand how adding the redirected links to the sitemap will remove the old links.

dunklea

Worth a shot. Crawl bots usually work by following links from one page to the next. If links no longer exist to those pages, then Google will have a tough time finding those pages and de-indexing them in favor of the correct pages.

Good luck!

jmsobe

One of the previous developers left a hole that caused this issue. The system shares code between sites.

jmsobe

Andrew,

The links were removed from the offending sites, but if I understand the gist of your suggestion, Google won't remove them as quickly if they are no longer linked. And yes, I am using canonical tags. So I should create a sitemap with the previous links and, once Google has followed these links to the main site, remove the sitemap. Is that your recommendation?

I suppose I can try this first before filing a request to remove the entire site.
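A minimal sketch of building such a temporary sitemap from a list of the old URLs is below. The function name and example URLs are made up for illustration; it assumes the file would be served from the old domain, since a sitemap normally lists URLs on its own host. Google will likely still warn that the listed URLs redirect, which is expected for this short-lived use.

```python
from xml.sax.saxutils import escape


def build_sitemap(urls):
    """Build a minimal sitemaps.org-style XML document listing the given URLs."""
    lines = [
        '<?xml version="1.0" encoding="UTF-8"?>',
        '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">',
    ]
    for url in urls:
        # escape() handles &, < and > so query strings stay valid XML.
        lines.append("  <url><loc>" + escape(url) + "</loc></url>")
    lines.append("</urlset>")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical old URLs that now 301 to the main site.
    old_urls = [
        "http://shop-one.example/widgets/blue",
        "http://shop-one.example/widgets?id=7&color=blue",
    ]
    print(build_sitemap(old_urls))
```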

SteveOllington

Ah, I thought he was saying the dupe content still exists but no more duplication is taking place after the fix. That's where I was going wrong then lol.

dunklea

As long as the duplicate content pages no longer exist and you've set up the 301 redirects properly, this shouldn't be a long-term problem. It can sometimes take Google a while to crawl through thousands of pages and index the correct ones. You might want to include these pages in a Sitemap to speed up the process, particularly if there are no longer any links to these pages from anywhere else. Are you using canonical tags? They might also help point Google in the right direction.

I don't think a no-cache meta tag would help. That assumes the page will be crawled, and by that point Google should follow the 301 and cache the destination page instead.
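One way to spot-check that the redirects are "set up properly" is to request an old URL without following redirects and confirm it answers with a single 301 pointing at the intended page. This is only a rough sketch using Python's standard library; the function names are made up for illustration, and it assumes the server answers HEAD requests the same way it answers GET.

```python
import http.client
from urllib.parse import urlsplit


def fetch_status_and_location(url, timeout=10):
    """Send one HEAD request without following redirects."""
    parts = urlsplit(url)
    conn_cls = (http.client.HTTPSConnection if parts.scheme == "https"
                else http.client.HTTPConnection)
    conn = conn_cls(parts.netloc, timeout=timeout)
    try:
        path = parts.path or "/"
        if parts.query:
            path += "?" + parts.query
        conn.request("HEAD", path)
        response = conn.getresponse()
        return response.status, response.getheader("Location")
    finally:
        conn.close()


def is_permanent_redirect_to(status, location, expected_url):
    """True only for a 301 whose Location header is exactly expected_url."""
    return status == 301 and location == expected_url
```

Running `fetch_status_and_location` over a sample of the old URLs and passing each result to `is_permanent_redirect_to` would flag 302s, redirect chains, or pages that land on the wrong target.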

Hope this helps! Let me know how the situation progresses.

Andrew

SteveOllington

Do you want the smaller sites to still exist? If they don't matter at all, then you could always take them offline, though that's not recommended for obvious reasons (but it would get them out of the index fairly quickly).

If they still need to exist, then we're just back to the same thing: changing the content on them. If the problem has been fixed to stop further duplication, then that's fine... you could limit the damage by having all of those smaller sites be dupes of each other but not of the main site, by rewriting either the smaller ones (with one shared lot of content) or the main one. At least that way they will only be competing with each other and not with the main site any more.

Or have I still got the wrong end of the stick?

jmsobe

I am referring to an e-commerce site, so yes, it's dynamic. The hole has been plugged (so to speak), but the content still exists in the Google cache.

SteveOllington

Ah I see, so it's a CMS which pumps out content then?

But it pumps it to other sites?

jmsobe

Steve, maybe I haven't explained the issue in enough detail. The duplicate content comes from a technical issue with the site that caused the content to be duplicated when it should not have been. It's not a matter of rewriting content. My issue is purging this content from these other domains so that the main domain can be indexed with it.

SteveOllington

You could always just rewrite the content so it's not duplicate; that way you get to keep them cached and maybe focus on some different but still targeted long-tail traffic... turn a negative into a positive. I accept thousands of pages is a lot of work, but there's a million and one online copywriters who are pretty good (and cheap) that you could assign projects to for it. Google "copywriters for hire" or "freelance copywriters"... you could have it done in no time and not spend that much.
