Duplicate site (disaster recovery) being crawled and creating two indexed search results
-
I have a primary domain, toptable.co.uk, and a disaster recovery site for this primary domain named uk-www.gtm.opentable.com. In the event of a disaster, toptable.co.uk would get CNAMEd (DNS alias) to the .gtm site. Naturally, the .gtm disaster recovery domain is an exact copy of the toptable.co.uk domain.
Unfortunately, Google has crawled the uk-www.gtm.opentable.com site, and it's showing up in search results. In most cases the gtm URLs don't get redirected to toptable; they actually appear as an entirely separate domain to the user. The strong feeling is that this duplicate content is hurting toptable.co.uk, especially as .gtm.ot is part of the .opentable.com domain, which has significant authority. So we need a way of stopping Google from crawling gtm.
There seem to be two potential fixes. Which is best for this case?
1) use robots.txt to block Google from crawling the .gtm site
2) canonicalize the gtm URLs to toptable.co.uk
In general, Google seems to recommend a canonical change, but in this special case it seems a robots.txt change could be best.
Thanks in advance to the SEOmoz community!
-
It's a little tricky. While Andrea is right that robots.txt isn't great for removal once pages/domains are indexed, you can block the sub-domain with robots.txt and then request removal in Google Webmaster Tools (you need to create a separate account for the sub-domain itself). That's often the fastest way to remove something from the index, and if it has no search value, I might go that route. Just proceed with caution - it's a delicate procedure.
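Before requesting removal, it may help to sanity-check that the robots.txt served on the sub-domain really blocks everything. Here's a minimal sketch using Python's standard `urllib.robotparser`. The robots.txt body and the example paths are assumptions for illustration, not the actual files on either site, and the parser only approximates how Googlebot reads the rules.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt body, meant to be served ONLY on the backup hostname.
BACKUP_ROBOTS_TXT = """\
User-agent: *
Disallow: /
"""


def is_blocked(robots_txt: str, url: str, user_agent: str = "Googlebot") -> bool:
    """Return True if the given robots.txt text disallows crawling of url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return not parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    # Example path is made up; any path on the backup host should be blocked.
    print(is_blocked(BACKUP_ROBOTS_TXT, "http://uk-www.gtm.opentable.com/some/page"))
```

Since the two sites are exact copies, the blocking file has to be tied to the backup hostname; if the same file were ever served on toptable.co.uk, it would block the primary site too.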
Doing 1-to-1 canonicalization or adding 301 redirects may be the next strongest signal (NOINDEX is a bit weaker, IMO). However, Google will have to re-crawl the sub-domain to do that, so you'll need to keep the paths open.
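To make "1-to-1" concrete, here's a rough sketch of the URL mapping that either a canonical tag or a 301 would rely on: same path and query string, only the hostname swapped. The function names are hypothetical, and using the bare `toptable.co.uk` host with the request's original scheme is an assumption; the real canonical host may differ (e.g. a www variant).

```python
from html import escape
from urllib.parse import urlsplit, urlunsplit

BACKUP_HOST = "uk-www.gtm.opentable.com"
PRIMARY_HOST = "toptable.co.uk"  # assumption: taken as written in the question


def primary_url(backup_url: str) -> str:
    """Map a backup-site URL to the same path and query on the primary host."""
    parts = urlsplit(backup_url)
    if parts.hostname != BACKUP_HOST:
        return backup_url  # not a backup URL, leave it unchanged
    return urlunsplit((parts.scheme, PRIMARY_HOST, parts.path, parts.query, ""))


def canonical_link_tag(backup_url: str) -> str:
    """Build the <link rel="canonical"> tag pointing at the primary URL."""
    href = escape(primary_url(backup_url), quote=True)
    return f'<link rel="canonical" href="{href}">'


if __name__ == "__main__":
    print(canonical_link_tag("http://uk-www.gtm.opentable.com/london?page=2"))
```

The same `primary_url` result could serve as the `Location` target of a 301 instead of a canonical tag.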
-
First, if the pages are already indexed, then robots.txt won't make them go away. A noindex meta tag on the pages is the better solution. This allows search engines to "read" your page, see the noindex tag, and then work to remove the pages from the index. A robots.txt block doesn't necessarily accomplish the same result.
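Because the backup site is an exact copy, the noindex tag would presumably have to depend on which hostname the request came in under. A minimal sketch of that idea follows; the helper name is hypothetical. It assumes the app can see the request's Host header, and that during a failover visitors still send `Host: toptable.co.uk` (a CNAME only changes DNS resolution, not the hostname the browser asks for).

```python
BACKUP_HOST = "uk-www.gtm.opentable.com"


def robots_meta_tag(request_host: str) -> str:
    """Return a noindex meta tag for the backup hostname, else an empty string."""
    # Drop any ":port" suffix and normalise case before comparing.
    host = request_host.split(":")[0].strip().lower()
    if host == BACKUP_HOST:
        return '<meta name="robots" content="noindex">'
    return ""


if __name__ == "__main__":
    print(repr(robots_meta_tag("uk-www.gtm.opentable.com")))
    print(repr(robots_meta_tag("toptable.co.uk")))
```

The returned string would go into the page `<head>`. The pages must stay crawlable for the tag to be seen, so this approach shouldn't be combined with a robots.txt block.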
-
If you can do a 1-to-1 page canonicalization (each page on the .com backup site is canonicalized to the equivalent page on .co.uk), then I would do that.
Otherwise, I would noindex the backup site.