Duplicate site (disaster recovery) being crawled and creating two indexed search results

OpenTable

I have a primary domain, toptable.co.uk, and a disaster recovery site for this primary domain named uk-www.gtm.opentable.com. In the event of a disaster, toptable.co.uk would get CNAMEd (DNS alias) to the .gtm site. Naturally, the .gtm disaster recovery domain is an exact copy of the toptable.co.uk domain.

Unfortunately, Google has crawled the uk-www.gtm.opentable.com site, and it's showing up in search results. In most cases the gtm URLs don't get redirected to toptable; they actually appear as an entirely separate domain to the user. The strong feeling is that this duplicate content is hurting toptable.co.uk, especially as the .gtm host is part of the .opentable.com domain, which has significant authority. So we need a way of stopping Google from crawling gtm.

There seem to be two potential fixes. Which is best for this case?

1) Use robots.txt to block Google from crawling the .gtm site

2) Canonicalize the gtm URLs to toptable.co.uk

In general, Google seems to recommend a canonical change, but in this special case it seems a robots.txt change could be best.

Thanks in advance to the SEOmoz community!

Dr-Pete

It's a little tricky. Andrea is right that robots.txt is not great for removal once pages/domains are indexed. However, you can block the sub-domain with robots.txt and then request removal in Google Webmaster Tools (you need to create a separate account for the sub-domain itself). That's often the fastest way to remove something from the index, and if it has no search value, I might go that route. Just proceed with caution - it's a delicate procedure.
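
As a rough illustration of the blocking step, here is a minimal Python sketch that checks what a block-everything robots.txt would mean for Googlebot, using the standard library's `urllib.robotparser`. The robots.txt contents and the `/some-page` path are assumptions for the example, not the actual OpenTable configuration.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt, served only on the disaster-recovery host.
DR_ROBOTS_TXT = """\
User-agent: *
Disallow: /
"""


def is_crawl_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt text lets user_agent fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    # "/some-page" is a made-up path used only for illustration.
    url = "http://uk-www.gtm.opentable.com/some-page"
    print(is_crawl_allowed(DR_ROBOTS_TXT, "Googlebot", url))
```

Note that this only models crawling. As discussed in this thread, blocking crawling does not by itself remove URLs that are already indexed, which is why the removal request is the second half of this route.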

Doing 1-to-1 canonicalization or adding 301 redirects may be the next strongest signal (NOINDEX is a bit weaker, IMO). However, Google will have to re-crawl the sub-domain to see those signals, so you'll need to keep the crawl paths open (i.e., don't block the sub-domain in robots.txt if you go this route).
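
To make "1-to-1" concrete, here is a small Python sketch that maps any URL requested on the backup host to the same path and query string on the primary host, and builds the matching canonical tag. The function names are hypothetical, and it assumes the primary site lives at exactly `toptable.co.uk` with the same scheme and paths as the backup.

```python
from html import escape
from urllib.parse import urlsplit, urlunsplit

PRIMARY_HOST = "toptable.co.uk"  # primary domain as named in the question


def canonical_url(request_url: str) -> str:
    """Map a URL on any host to the same path and query on the primary host."""
    parts = urlsplit(request_url)
    # Keep scheme, path and query; swap the host; drop any fragment.
    return urlunsplit((parts.scheme, PRIMARY_HOST, parts.path, parts.query, ""))


def canonical_link_tag(request_url: str) -> str:
    """Build the <link rel="canonical"> tag for the page at request_url."""
    href = escape(canonical_url(request_url), quote=True)
    return f'<link rel="canonical" href="{href}">'
```

The same `canonical_url` mapping could supply the `Location` target if you choose 301 redirects instead of canonical tags.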

josh-riley

First, if the pages are already indexed, then a robots.txt file won't make them go away. A noindex meta tag on the pages is the better solution. This allows search engines to "read" your page, see the noindex tag, and then work to remove the pages from the index. A robots.txt file doesn't necessarily accomplish the same result.
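
One way this could look in practice, sketched in Python: emit the noindex meta tag only when the request's Host header is the backup hostname. The helper name `robots_meta_for` is hypothetical, and keying on the Host header is an assumption about how the site's templates could be wired, not something stated in the thread.

```python
DR_HOST = "uk-www.gtm.opentable.com"  # backup hostname from the question
NOINDEX_META = '<meta name="robots" content="noindex">'


def robots_meta_for(host_header: str) -> str:
    """Return the noindex meta tag for the backup host, else an empty string."""
    # Strip an optional ":port" suffix and normalise case before comparing.
    hostname = host_header.split(":")[0].lower()
    return NOINDEX_META if hostname == DR_HOST else ""
```

A possible benefit of this design: during a real failover, when toptable.co.uk is CNAMEd to the backup, requests should still arrive with `Host: toptable.co.uk`, so the tag would not be emitted and the primary domain's pages would stay indexable.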

OlegKorneitchouk

If you can do a 1-to-1 page canonicalization (each page on the .gtm backup site is canonicalized to the equivalent page on toptable.co.uk), then I would do that.

Otherwise, I would noindex the backup site.
