Duplicate site (disaster recovery) being crawled and creating two indexed search results
-
I have a primary domain, toptable.co.uk, and a disaster recovery site for this primary domain named uk-www.gtm.opentable.com. In the event of a disaster, toptable.co.uk would get CNAMEd (DNS alias) to the .gtm site. Naturally, the .gtm disaster recovery domain is an exact duplicate of the toptable.co.uk domain.
Unfortunately, Google has crawled the uk-www.gtm.opentable.com site, and it's showing up in search results. In most cases the gtm URLs don't get redirected to toptable; they appear as an entirely separate domain to the user. The strong feeling is that this duplicate content is hurting toptable.co.uk, especially as the .gtm site is part of the opentable.com domain, which has significant authority. So we need a way of stopping Google from crawling gtm.
There seem to be two potential fixes. Which is best for this case?
1) use robots.txt to block Google from crawling the .gtm site
2) canonicalize the gtm URLs to toptable.co.uk
In general Google seems to recommend a canonical change, but in this special case it seems a robots.txt change could be best.
Thanks in advance to the SEOmoz community!
-
It's a little tricky. Andrea is right that robots.txt isn't great for removal once pages/domains are indexed. However, you can block the sub-domain with robots.txt and then request removal in Google Webmaster Tools (you need to create a separate account for the sub-domain itself). That's often the fastest way to remove something from the index, and if it has no search value, I might go that route. Just proceed with caution - it's a delicate procedure.
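Before requesting removal, it may be worth checking that the robots.txt you plan to serve on the backup host really blocks Googlebot. Here is a minimal sketch using Python's standard urllib.robotparser. The robots.txt body, the helper name is_blocked, and the sample path /any/page are assumptions for illustration, not anything taken from the actual site.

```python
from urllib.robotparser import RobotFileParser

# Assumed robots.txt body for the backup host only (uk-www.gtm.opentable.com).
BACKUP_ROBOTS_TXT = """\
User-agent: *
Disallow: /
"""


def is_blocked(robots_txt: str, url: str, user_agent: str = "Googlebot") -> bool:
    """Return True if robots_txt forbids user_agent from fetching url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return not parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    # Hypothetical URL on the backup host.
    url = "http://uk-www.gtm.opentable.com/any/page"
    print(is_blocked(BACKUP_ROBOTS_TXT, url))
```

Keep in mind this only tells you crawling is blocked. As noted above, it does not by itself remove URLs that are already indexed.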
Doing 1-to-1 canonicalization or adding 301 redirects may be the next strongest signal (NOINDEX is a bit weaker, IMO). However, Google will have to re-crawl the sub-domain to see those signals, so you'll need to keep the crawl paths open (i.e., not blocked by robots.txt).
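To make "1-to-1" concrete, here is a rough Python sketch of the mapping: each backup URL points at the same path and query on the primary domain. It assumes the primary site answers on the bare host toptable.co.uk and keeps the original scheme. The function names and sample paths are made up for illustration.

```python
from html import escape
from urllib.parse import urlsplit, urlunsplit

BACKUP_HOST = "uk-www.gtm.opentable.com"
PRIMARY_HOST = "toptable.co.uk"  # assumption: the primary site uses the bare domain


def primary_url(backup_url: str) -> str:
    """Map a backup-site URL to the same path and query on the primary domain."""
    parts = urlsplit(backup_url)
    if parts.hostname != BACKUP_HOST:
        raise ValueError(f"not a backup-site URL: {backup_url}")
    # Keep scheme, path and query; drop any fragment.
    return urlunsplit((parts.scheme, PRIMARY_HOST, parts.path or "/", parts.query, ""))


def canonical_tag(backup_url: str) -> str:
    """Build the <link rel="canonical"> element for a backup-site page."""
    href = escape(primary_url(backup_url), quote=True)
    return f'<link rel="canonical" href="{href}">'


if __name__ == "__main__":
    # Hypothetical page on the backup host.
    print(canonical_tag("http://uk-www.gtm.opentable.com/restaurants?city=london"))
```

The same primary_url mapping could feed a 301 redirect's Location header instead of a canonical tag, if you decide redirects are acceptable on the backup host.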
-
First, if the pages are already indexed then a robots.txt won't make them go away. A noindex meta tag on the pages is the better solution. This allows search engines to "read" your page, see the noindex tag and then work to remove the pages from the index. A robots.txt doesn't necessarily accomplish the same result.
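One wrinkle for a disaster recovery setup: when toptable.co.uk is CNAMEd to the backup, requests still arrive with a Host header of toptable.co.uk, so the noindex should probably be keyed on the hostname rather than hard-coded into the backup's pages. Below is a small Python sketch of that idea using the X-Robots-Tag response header, which Google treats like a robots meta tag. The function name and the way it would be wired into a server are assumptions.

```python
BACKUP_HOST = "uk-www.gtm.opentable.com"


def robots_headers(host_header: str) -> dict[str, str]:
    """Return extra response headers for a request with the given Host header."""
    # Strip an optional ":port" suffix and normalise case before comparing.
    host = host_header.split(":", 1)[0].strip().lower()
    if host == BACKUP_HOST:
        # Served directly under the backup hostname: ask engines not to index.
        return {"X-Robots-Tag": "noindex"}
    # Served as toptable.co.uk (normal operation or failover): index as usual.
    return {}


if __name__ == "__main__":
    print(robots_headers("uk-www.gtm.opentable.com"))
    print(robots_headers("toptable.co.uk"))
```

The equivalent on-page form is a meta robots tag with content "noindex", emitted under the same hostname condition. Either way, the backup host must stay crawlable for the tag to be seen.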
-
If you can do a 1-to-1 page canonicalization (each page on the .gtm backup site is canonicalized to the equivalent page on toptable.co.uk) then I would do that.
Otherwise, I would noindex the backup site.