Duplicate site (disaster recovery) being crawled and creating two indexed search results
-
I have a primary domain, toptable.co.uk, and a disaster recovery site for this primary domain named uk-www.gtm.opentable.com. In the event of a disaster, toptable.co.uk would get CNAMEd (DNS alias) to the .gtm site. Naturally the .gtm disaster recover domian is an exact match to the toptable.co.uk domain.
Unfortunately, Google has crawled the uk-www.gtm.opentable site, and it's showing up in search results. In most cases the gtm urls don't get redirected to toptable they actually appear as an entirely separate domain to the user. The strong feeling is that this duplicate content is hurting toptable.co.uk, especially as .gtm.ot is part of the .opentable.com domain which has significant authority. So we need a way of stopping Google from crawling gtm.
There seem to be two potential fixes. Which is best for this case?
- use the robots.txt to block Google from crawling the .gtm site
2) canonicalize the the gtm urls to toptable.co.uk
In general Google seems to recommend a canonical change but in this special case it seems robot.txt change could be best.
Thanks in advance to the SEOmoz community!
-
It's a little tricky. While Andrea is right about Robots.txt - it's not great for removal once pages/domains are indexed, you can block the sub-domain with robots.txt and then request removal in Google Webmaster Tools (you need to create a separate account for the sub-domain itself). That's often the fastest way to remove something from the index, and if it has no search value, I might go that route. Just proceed with caution - it's a delicate procedure.
Doing 1-to-1 canonicalization or adding 301 redirects may be the next strongest signal (NOINDEX is a bit weaker, IMO). However, Google will have to re-crawl the sub-domain to do that, so you'll need to keep the paths open.
-
First, if the pages are already indexed then a robots.txt won't make them go away. A meta tag no index on the pages is the better solution. This allows search engines to "read" you page, see the no index tag and then work to remove the pages from index. A robots.txt doesn't necessarily accomplish the same result.
-
If you can do a 1-to-1 page canonicalization (each page on .co.uk is canonicaled to the equivalent page on the .com) then I would do that.
Otherwise, I would noindex the backup site.
Got a burning SEO question?
Subscribe to Moz Pro to gain full access to Q&A, answer questions, and ask your own.
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
Unknown index.html links coming to my site.
I'm getting a lot of domain/index.html urls on my site which I didn't create initially. We recently transfered to a new site so those links could come from the old site. Does any know how to get a comprehensive list of all the urls that lead to 404?
Intermediate & Advanced SEO | | greenshinenewenergy0 -
Search queries results Wrong
Hey there, My website shows Wrong Search Queries in Google Search Console, Also Shows URLS which are not there in my website, Which Shows in crawl Errors. here i have attached Screenshot . http://prntscr.com/egzl88 Please Help me out how i can Deindex This type of URLs From Google index, & make My main pages crawl First in Google Search. because of this my website Ranking Also lost, Please any Expert can help out.. Thanx in advance.
Intermediate & Advanced SEO | | pooja.verify030 -
Site: search showing funny results
Hi When i do a site: search on my domain the very last result it returns is a URL which is listed as my domain but does not exist on my website. When clicked it redirects to a really spammy page. If im not being clear just let me know, quite hard to explain the situation! Any thoughts to get rid of this?
Intermediate & Advanced SEO | | TheZenAgency0 -
Will a GEO Localization site create thousands of duplicates?
Hi mozzers, We are about to launch a new site and right now I am worried that this new site may create thousands of duplicate content which will harm all the SEO that has been done in the last few years. Here is a situation: You land on the example.com/Los-angeles page (geo located) but if you modify URI to example.com/chico then a pop up appears and ask you for the location you want to be in (pop up attached). When choosing chico the URI switches to example.com/chico?franchise=chico instead of /chico only. This site has over 40 different microsites so my question are all these arguments ?franchise=city going to be indexed and create thousands of dups? or are we safe because this geo localization happens thanks to javascript? Thanks! GopRinh.png
Intermediate & Advanced SEO | | Ideas-Money-Art0 -
Page indexed but not showing up at all in search results
I am currently working on the SEO for a roofing company. I have developed GEO targeted pages for both commercial and residential roofing (as well as attic insulation and gutters) and have hundreds of 1st page placements for the GEO targeted keywords. What is baffling me is that they are performing EXTREMELY poorly on the bigger cities, to the point of not evening showing up in the first 5 pages. I also target a page specifically for roof repair in Phoenix and it is not coming up AT ALL. This is not typically the results I get when directly targeting keywords. I'm working on implementing keyword variations as well as adding about 10 or so information pages (@ 700 words) regarding different roofing systems which I plan to cross link on the site, etc. I'm just wondering if there is a simple answer as to why the pages I want to be showing up the most are performing so poorly and what I would need to do to improve their rankings.
Intermediate & Advanced SEO | | dogstarweb0 -
How does the crawl find duplicate pages that don't exist on the site?
It looks like I have a lot of duplicate pages which are essentially the same url with some extra ? parameters added eg: http://www.merlin.org.uk/10-facts-about-malnutrition http://www.merlin.org.uk/10-facts-about-malnutrition?page=1 http://www.merlin.org.uk/10-facts-about-malnutrition?page=2 These extra 2 pages (and there's loads of pages this happens to) are a mystery to me. Not sure why they exist as there's only 1 page. Is this a massive issue? It's built on Drupal so I wonder if it auto generates these pages for some reason? Any help MUCH appreciated. Thanks
Intermediate & Advanced SEO | | Deniz0 -
SIte Redesign - Disaster for Organic Traffic
A client just redesigned their site and launched it around May 30. The organic traffic has had a MAJOR drop and has not returned yet. All of the old pages have been 301 redirected to the new pages. Any thoughts on what could be causing this to www.brickhousesecurity.com? In Google Webmaster Tools, before the redesign we were receiving about 300,000 impressions and 10-12,000 clicks. Now the impressions are only 100,000 with half as many clicks. Thanks!
Intermediate & Advanced SEO | | AlightAnalytics0 -
Is linking to search results bad for SEO?
If we have pages on our site that link to search results is that a bad thing? Should we set the links to "nofollow"?
Intermediate & Advanced SEO | | nicole.healthline0