Moz Q&A is closed.
After more than 13 years, and tens of thousands of questions, Moz Q&A closed on 12th December 2024. Whilst we’re not completely removing the content - many posts will still be possible to view - we have locked both new posts and new replies.
Duplicate content on subdomains.
-
Hi Mozers,
I have a site, www.xyz.com, and also geo-targeted sub-domains: www.uk.xyz.com, www.india.xyz.com and so on. All the sub-domains have the same content as the main domain, www.xyz.com.
So, I want to know how I can avoid content duplication.
Many Thanks!
-
It would probably be better (and more likely to get you responses) if you started a new question - this one is three years old. Generally, I think it depends on your scope. If you need some kind of separation (corporate, legal, technical), then separate domains or sub-domains may make sense. They're also easier to target, in some ways. However, you're right that authority may be diluted and you'll need more marketing effort against each one.
If resources are limited and you don't need each country to be a fully separate entity, then you'll probably have fewer headaches with sub-folders. I'm speaking in broad generalities, though - this is a big decision that depends a lot on the details.
-
Dear all,
I have bought 30 geo top-level domains. This is for an ecommerce project that has not launched yet (and isn't indexed by Google).
I am now at a point where I can change/consolidate all domains as sub domains or sub folders or keep things as they are.
I just worry that link building would be scattered and not focused and that it might be better to concentrate the efforts on one domain.
What are your views on this?
Many thanks.
-
Yeah - I'm really afraid that stacking all those sub-domains is going to cause you long-term issues with your link-building, and that some of those sub-domains could fragment. If the country needs to be in a sub-domain, then I think the hybrid approach (with "/shop" as a sub-folder) may cause you less trouble.
I will warn, though, that any change like this carries some risk. You'll have to put proper 301-redirects in place.
I might try the hreflang tags first, though, and see if they help the current problem (it may take a few weeks). Changing too many aspects of the on-page SEO at once could cause you a lot of grief.
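For what it's worth, here is a rough sketch of the URL mapping those 301-redirects would need, assuming the move is from shop.uk.xyz.com/<path> to uk.xyz.com/shop/<path>. The function name and the exact target layout are my own assumptions for illustration, and the redirects themselves would still have to be configured on the web server.

```python
from urllib.parse import urlsplit, urlunsplit

def redirect_target(old_url):
    """Map shop.<country>.xyz.com/<path> to <country>.xyz.com/shop/<path>.

    Returns None when the URL is not on a "shop." sub-domain.
    The host layout is an assumption made for illustration.
    """
    parts = urlsplit(old_url)
    host = parts.netloc.lower()
    if not host.startswith("shop."):
        return None
    new_host = host[len("shop."):]          # drop the stacked "shop." label
    new_path = "/shop" + (parts.path or "/")  # move it into a sub-folder
    return urlunsplit((parts.scheme, new_host, new_path, parts.query, ""))

print(redirect_target("http://shop.uk.xyz.com/product1"))
```

Checking a handful of old URLs this way before the switch is a cheap way to make sure every product page has exactly one new home.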
-
The "shop." pages are simply new pages added so that products can be sold easily. I think I might move the shop.uk.xyz.com pages to uk.xyz.com/shop/product, i.e. into a sub-folder. Do you think this will help pass link juice to those pages after the change, and make it easier for me to include them in the sitemap as well?
-
If you have separate GWT profiles, then I think the XML sitemap may have to be under the sub-domain - Google has to be able to access it from a sub-domain URL. It doesn't have to be in the root of the sub-domain.
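As a minimal sketch of a per-sub-domain sitemap: the snippet below builds one for the UK sub-domain with Python's standard library. The URLs listed are hypothetical placeholders, and the file name and location you serve it from are up to you, as long as it is reachable on that sub-domain.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"

def build_sitemap(urls):
    """Return sitemap XML (as a string) listing the given URLs."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for url in urls:
        loc = ET.SubElement(ET.SubElement(urlset, "url"), "loc")
        loc.text = url
    return ET.tostring(urlset, encoding="unicode")

# Hypothetical pages on the UK sub-domain only.
uk_sitemap = build_sitemap([
    "http://uk.xyz.com/",
    "http://uk.xyz.com/shop/product1",
])
print(uk_sitemap)
```

Each sub-domain would get its own file built the same way, listing only URLs on that host.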
I'm not clear on what the "shop." pages are, but stacking sub-domains like that sounds like it's getting pretty messy. Why the separation?
-
I have already created separate profiles for the sub-domains, but my only worry is where to place the sitemap on the server, e.g. in the root directory of the root domain or in the root directory of the sub-domain.
Coming to (2): the pages which I want to include in the sitemap are my product pages. So I want to know if shop.uk.xyz.com pages can be included in the sitemap for uk.xyz.com, and also whether they count as internal pages of uk.xyz.com.
-
It is probably best to create separate profiles in Google Webmaster Tools, because then you can target the sub-domains to the countries in question. At that point, you could also set up separate sitemaps. It'll give you a cleaner view of how each sub-domain is indexed and ranking.
I'm not sure I understand (2) - why wouldn't you include those pages in the sitemap?
-
Thank you for your input. It has really helped me understand the situation.
I will try to implement this and let you know how I get on. I also have a few more questions on this:
1. Do I require a separate sitemap and robots file for each of the sub-domains, and where shall I place them on the server?
2. In the sub-domain there are pages like shop.uk.xyz.com/product1. Can I include those in the sitemaps, as those are the pages which I really want to rank for?
-
There's no perfect answer. Canonical tags would keep the sub-domains from ranking, in many cases. The cross-TLD stuff is weird, though - Google can, in some cases, ignore the canonical if they think that one sub-domain is more appropriate for the country/ccTLD the searcher is using.
Sub-domains can be tricky in and of themselves, unfortunately, because they sometimes fragment and don't pass link "juice" fully to the root domain. I generally still think sub-folders are better for cases like this, but obviously that would be a big change (and potentially risky).
You could try the rel="alternate" hreflang tags. They're similar to canonical (a bit weaker), but basically are designed to handle the same content in different languages and regions:
http://support.google.com/webmasters/bin/answer.py?hl=en&answer=189077
They're basically designed for exactly this problem. You can set the root domain to "en-US", the UK sub-domain to "en-GB" (the region code for the United Kingdom is GB, not UK), etc. I've heard generally good things, and they're low-risk, but you have to try it and see. They can be a little tricky to implement properly.
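To give a feel for what the tags look like, here is a small sketch that prints the set of rel="alternate" hreflang elements for one product page. The URL mapping is hypothetical (I'm assuming the same product lives at the same path on each sub-domain), and note that the region code for the United Kingdom is GB. As I understand it, every version of the page should carry the full set, including a tag pointing at itself.

```python
from html import escape

# Hypothetical mapping of hreflang codes to the equivalent page on each site.
ALTERNATES = {
    "en-US": "http://www.xyz.com/product",
    "en-GB": "http://uk.xyz.com/product",
    "en-IN": "http://india.xyz.com/product",
}

def hreflang_links(alternates):
    """Build one <link rel="alternate"> element per language/region version."""
    return [
        f'<link rel="alternate" hreflang="{escape(code)}" href="{escape(url)}" />'
        for code, url in alternates.items()
    ]

print("\n".join(hreflang_links(ALTERNATES)))
```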
-
No, 301s and canonicals are completely different.
A 301 will redirect a page and a canonical is setting the preferred version of the page. For example:
301 - you have an old version of the page that looks like this www.example.com/p?=153 and you want it to look like www.example.com/red-apples. You would use a 301 from the old page (www.example.com/p?=153) to the new page (www.example.com/red-apples)
Canonical - Let's go back to the red apples example. Let's say you have an ecommerce site and you have different ways to search for products. One way is to search by fruit and the other by color. So what you'll have is two versions of the end result. For example, you'll have www.example.com/fruit/red-apples and you might have www.example.com/red/red-apples. Since both of those pages show the same information, you don't want the engines to think it's duplicate content, so you can add a rel=canonical link element to both pages pointing to the preferred version (i.e. you might want the canonical to be www.example.com/red-apples). That's all it does. It tells the engines your preferred version of pages that may be the same.
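To make the canonical case concrete, here is a tiny sketch of the link element both red-apples pages would carry in their head section. The helper name is made up for illustration; the point is only that both variants declare the same preferred URL, and the visitor is never redirected anywhere.

```python
from html import escape

def canonical_link(preferred_url):
    """Return the <link rel="canonical"> element to place in the page <head>."""
    return f'<link rel="canonical" href="{escape(preferred_url)}" />'

# Both variants from the example point at the same preferred URL.
PREFERRED = "http://www.example.com/red-apples"
pages = {
    "http://www.example.com/fruit/red-apples": canonical_link(PREFERRED),
    "http://www.example.com/red/red-apples": canonical_link(PREFERRED),
}

for page, tag in pages.items():
    print(page, "->", tag)
```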
Back to your original post: you really don't need to "noindex", but I thought you were having a duplicate content issue and that would solve it. (Generally, Google won't penalize you for this sort of duplicate content.)
Here is what I would do.
If you don't have Google Webmaster Tools already set up, then do so. Verify each version of your sub-domain (i.e. india.xyz.com, uk.xyz.com, etc.; let me know if you need help) and then set the Geo Target for each of them manually. (You'll have to set this up manually because you have a gTLD and not a ccTLD.)
How to set your Geo Target manually:
Go to a particular version of your site in WMT (i.e. india.xyz.com) and click on "Configuration", then "Settings". Under "Settings", the first section says "Geographical Target". Check the box and then use the drop-down to select "India".
Repeat this for all of your subdomains for each specific country.
This will let Google know that you are trying to target users in a specific country.
If you have the money to invest in it, I would also try to have those subdomains hosted by a server in each particular country. (strong signal for Google)
Hope it helps.
-
Thanx Darin!
I have a few doubts on this:
1. Is rel canonical like a 301 redirect? My concern is whether a user who goes to www.uk.xyz.com/productx will be redirected to www.xyz.com/product.
2. My sub-domain pages are ranking in the country-specific search engines. For example, www.uk.xyz.com is ranking for keywords in google.co.uk. So if I noindex, I will lose my presence in the country-specific search engine.
PS the content on the pages is all same apart from the product currency.
-
I disagree. I said "noindex", not "nofollow". Link juice will be passed, but the pages won't show up in the SERPs. I do agree with you, though, that the strategy as a whole, if there is in fact exact duplicate content, seems to be a waste. Unless these pages are in another language, I don't see the point of this sub-domain strategy.
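Just to spell out the distinction, here is a small sketch that reads a robots meta value and reports the two directives separately. The function name is my own, and it only shows that "noindex" and "nofollow" are independent switches; how much link value actually flows through a noindexed page is a separate question this code doesn't settle.

```python
def parse_robots_meta(content):
    """Split a robots meta value such as "noindex, follow" into two flags."""
    directives = {d.strip().lower() for d in content.split(",") if d.strip()}
    return {
        # "none" is shorthand for "noindex, nofollow".
        "index": "noindex" not in directives and "none" not in directives,
        "follow": "nofollow" not in directives and "none" not in directives,
    }

print(parse_robots_meta("noindex"))
print(parse_robots_meta("noindex, nofollow"))
```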
-
Canonical will help to remove duplicate issues and also to consolidate your link values. I didn't see any issue with cross domain implementation.
If you add "noindex" to any of these pages, you won't get any link credit.
-
Short Answer: Set a canonical url on the pages to the root domain version and noindex the subdomain pages.
What this does is avoid the duplicate content problem. Generally, those subdomain pages won't rank anyway because the same information is on the "main" site. You can still build links to those subdomain pages and do a strong internal link structure to help the "main" site rankings.
The only negative to this is that the pages in your subdomain won't rank. That's not necessarily a bad thing but just know they won't. But, if the pages are truly duplicate content, they won't rank anyway.
Related Questions
-
How can I avoid duplicate content for a new landing page which is the same as an old one?
Hello mozers! I have a question about duplicate content for you... One of my client's pages has been dropping in search volume for a while now, and I've discovered it's because the search term isn't as popular as it used to be. So... we need to create a new landing page using a more popular search term. The page which is losing traffic is based on the search query "Can I put a solid roof on my conservatory", which only gets 0-10 searches per month according to the keyword explorer tool. However, if we changed this to "replacing conservatory roof with solid roof", this gets up to 500 searches per month. Muuuuch better! The issue is, I don't want to close down and redirect the old page, because it's got a featured snippet and sits in position 1. So I'd like to create another page instead... however, as the two are effectively the same content, I would then land myself in a duplicate content issue. If I were to put a rel="canonical" tag in the original "can I put a solid roof...." page, saying the master page is now the new one, would that get around the issue?
Intermediate & Advanced SEO | | Virginia-Girtz0 -
Same site serving multiple countries and duplicated content
Hello! Though I browse Moz resources every day, I've decided to ask you a question directly, despite the numerous questions (and answers!) about this topic, as there are a few specific variants each time. I have a site serving content (and products) to different countries, built using subfolders (1 subfolder per country). Basically, it looks like this:
site.com/us/
site.com/gb/
site.com/fr/
site.com/it/
etc.
The first problem was fairly easy to solve: avoiding duplicated content issues across the board, considering that both the ecommerce part of the site and the blog are replicated for each subfolder in its own language. Correct me if I'm wrong, but using our copywriters to translate the content and adding the right hreflang tags should do.
But then comes the second problem: how to deal with duplicated content when it's written in the same language? E.g. /us/, /gb/, /au/ and so on.
Given the following requirements/constraints, I can't see any positive resolution to this issue:
1. This structure needs to be maintained (it's not possible to consolidate the same language within one single subfolder, for example),
2. Articles can't be canonicalized from one subfolder to another, as it would mess up our internal tracking tools,
3. The amount of content being published prevents us from getting bespoke content for each region of the world that shares a language.
Given those constraints, I can't see a way to sort this out, and it seems that I'm cursed to live with those duplicated content red flags right under my nose.
Am I right, or can you think of anything to sort that out?
Many thanks,
Ghill
Intermediate & Advanced SEO | | GhillC0 -
Duplicate content due to parked domains
I have a main ecommerce website with unique content and decent backlinks. I had a few domains parked on the main website, as well as on specific product pages. These domains had some type-in traffic. Some were exact product names. So the main website www.maindomain.com had domain1.com and domain2.com parked on it, and domain3.com was parked on www.maindomain.com/product1. This caused a lot of duplicate content issues. 12 months back, all the parked domains were changed to 301 redirects. I also added all the domains to Google Webmaster Tools, then removed the main directory from the Google index. I now realize that a few of the additional domains are indexed and causing duplicate content. My question is: what other steps can I take to avoid duplicate content for my website?
1. Provide a change of address in Google Search Console. Is there any downside to providing a change of address pointing to a website? Also, domains pointing to a specific URL cannot provide a change of address.
2. Submit a request in Google Search Console to remove the pages from the Google index. It is temporary and lasts 6 months. Even if the pages are removed from the Google index, would Google still see them as duplicates?
3. Ask Google to fetch each URL under the other domains and submit it to the Google index. This would hopefully remove the URLs under domain1.com and domain2.com eventually, due to the 301 redirects.
4. Add canonical URLs for all pages on the main site, so Google will eventually remove content from domain1.com and domain2.com due to the canonical links. It will take time for Google to update its index.
5. Point these domains elsewhere to remove the duplicate content eventually. But it will take time for Google to update its index with new, non-duplicate content.
Which of these options are best suited to my issue, and which ones are potentially dangerous? I would rather not point these domains elsewhere. Any feedback would be greatly appreciated.
Intermediate & Advanced SEO | | ajiabs0 -
Duplicate Content through 'Gclid'
Hello, We've had the known problem of duplicate content through the gclid parameter caused by Google AdWords. As per Google's recommendation, we added the canonical tag to every page on our site so that when the bot came to each page it would go 'Ah-ha, this is the original page'. We also added the parameter to the URL parameters in Google Webmaster Tools. However, it now seems as though a canonical is automatically being given to these newly created gclid pages; see below: https://www.google.com.au/search?espv=2&q=site%3Awww.mypetwarehouse.com.au+inurl%3Agclid&oq=site%3A&gs_l=serp.3.0.35i39l2j0i67l4j0i10j0i67j0j0i131.58677.61871.0.63823.11.8.3.0.0.0.208.930.0j3j2.5.0....0...1c.1.64.serp..8.3.419.nUJod6dYZmI Therefore these new pages are now being indexed, causing duplicate content. Does anyone have any idea about what to do in this situation? Thanks, Stephen.
Intermediate & Advanced SEO | | MyPetWarehouse0 -
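One thing worth checking in a case like this is whether the canonical tag is generated from the requested URL, gclid included. A rough sketch of the fix, assuming the canonical should always be the URL with the gclid parameter removed (the function name and the example.com URLs are placeholders, not the poster's actual setup):

```python
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode

def strip_gclid(url):
    """Return the URL without its gclid query parameter (other parameters kept)."""
    parts = urlsplit(url)
    kept = [(k, v) for k, v in parse_qsl(parts.query, keep_blank_values=True)
            if k.lower() != "gclid"]
    return urlunsplit(parts._replace(query=urlencode(kept)))

print(strip_gclid("https://www.example.com/dog-food?size=large&gclid=abc123"))
```

Using the stripped URL as the canonical target means every gclid variant points back at the one clean page.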
No-index pages with duplicate content?
Hello, I have an e-commerce website selling about 20,000 different products. For the most used of those products, I created unique, high-quality content. The content has been written by a professional player who describes how and why those products are useful, which is of huge interest to buyers. It would cost too much to write that high-quality content for 20,000 different products, but we still have to sell them. Therefore, our idea was to noindex the products that only have the same copy-paste descriptions all other websites have. Do you think it's better to do that, or to just leave everything indexed normally, since we might get search traffic from those pages? Thanks a lot for your help!
Intermediate & Advanced SEO | | EndeR-0 -
Avoiding Duplicate Content with Used Car Listings Database: Robots.txt vs Noindex vs Hash URLs (Help!)
Hi Guys,
We have developed a plugin that allows us to display used vehicle listings from a centralized, third-party database. The functionality works similar to autotrader.com or cargurus.com, and there are two primary components:
1. Vehicle Listings Pages: this is the page where the user can use various filters to narrow the vehicle listings to find the vehicle they want.
2. Vehicle Details Pages: this is the page where the user actually views the details about said vehicle. It is served up via Ajax, in a dialog box on the Vehicle Listings Pages. Example functionality: http://screencast.com/t/kArKm4tBo
The Vehicle Listings pages (#1), we do want indexed and to rank. These pages have additional content besides the vehicle listings themselves, and those results are randomized or sliced/diced in different and unique ways. They're also updated twice per day.
We do not want to index #2, the Vehicle Details pages, as these pages appear and disappear all of the time, based on dealer inventory, and don't have much value in the SERPs. Additionally, other sites such as autotrader.com, Yahoo Autos, and others draw from this same database, so we're worried about duplicate content. For instance, entering a snippet of dealer-provided content for one specific listing that Google indexed yielded 8,200+ results: Example Google query.
We did not originally think that Google would even be able to index these pages, as they are served up via Ajax. However, it seems we were wrong, as Google has already begun indexing them. Not only is duplicate content an issue, but these pages are not meant for visitors to navigate to directly! If a user were to navigate to the URL directly, from the SERPs, they would see a page that isn't styled right.
Now we have to determine the right solution to keep these pages out of the index: robots.txt, noindex meta tags, or hash (#) internal links.
Robots.txt Advantages:
* Super easy to implement
* Conserves crawl budget for large sites
* Ensures crawler doesn't get stuck. After all, if our website only has 500 pages that we really want indexed and ranked, and vehicle details pages constitute another 1,000,000,000 pages, it doesn't seem to make sense to make Googlebot crawl all of those pages.
Robots.txt Disadvantages:
* Doesn't prevent pages from being indexed, as we've seen, probably because there are internal links to these pages. We could nofollow these internal links, thereby minimizing indexation, but this would lead to 10-25 nofollowed internal links on each Vehicle Listings page (will Google think we're pagerank sculpting?)
Noindex Advantages:
* Does prevent vehicle details pages from being indexed
* Allows ALL pages to be crawled (advantage?)
Noindex Disadvantages:
* Difficult to implement: vehicle details pages are served using Ajax, so they have no tag. The solution would have to involve the X-Robots-Tag HTTP header and Apache, sending a noindex tag based on querystring variables, similar to this stackoverflow solution. This means the plugin functionality is no longer self-contained, and some hosts may not allow these types of Apache rewrites (as I understand it).
* Forces (or rather allows) Googlebot to crawl hundreds of thousands of noindex pages. I say "force" because of the crawl budget required. Crawler could get stuck/lost in so many pages, and may not like crawling a site with 1,000,000,000 pages, 99.9% of which are noindexed.
* Cannot be used in conjunction with robots.txt. After all, crawler never reads noindex meta tag if blocked by robots.txt
Hash (#) URL Advantages:
* By using hash (#) URLs for links on Vehicle Listing pages to Vehicle Details pages (such as "Contact Seller" buttons), coupled with Javascript, crawler won't be able to follow/crawl these links.
* Best of both worlds: crawl budget isn't overtaxed by thousands of noindex pages, and internal links used to index robots.txt-disallowed pages are gone.
* Accomplishes same thing as "nofollowing" these links, but without looking like pagerank sculpting (?)
* Does not require complex Apache stuff
Hash (#) URL Disadvantages:
* Is Google suspicious of sites with (some) internal links structured like this, since they can't crawl/follow them?
Initially, we implemented robots.txt--the "sledgehammer solution." We figured that we'd have a happier crawler this way, as it wouldn't have to crawl zillions of partially duplicate vehicle details pages, and we wanted it to be like these pages didn't even exist. However, Google seems to be indexing many of these pages anyway, probably based on internal links pointing to them. We could nofollow the links pointing to these pages, but we don't want it to look like we're pagerank sculpting or something like that.
If we implement noindex on these pages (and doing so is a difficult task itself), then we will be certain these pages aren't indexed. However, to do so we will have to remove the robots.txt disallowal, in order to let the crawler read the noindex tag on these pages. Intuitively, it doesn't make sense to me to make Googlebot crawl zillions of vehicle details pages, all of which are noindexed, and it could easily get stuck/lost/etc. It seems like a waste of resources, and in some shadowy way bad for SEO.
My developers are pushing for the third solution: using the hash URLs. This works on all hosts and keeps all functionality in the plugin self-contained (unlike noindex), and conserves crawl budget while keeping vehicle details pages out of the index (unlike robots.txt). But I don't want Google to slap us 6-12 months from now because it doesn't like links like these.
Any thoughts or advice you guys have would be hugely appreciated, as I've been going in circles, circles, circles on this for a couple of days now. Also, I can provide a test site URL if you'd like to see the functionality in action.
Intermediate & Advanced SEO | | browndoginteractive0 -
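To illustrate the robots.txt point in the question above (a disallow stops crawling, so a blocked crawler never gets far enough to read a noindex), here is a minimal sketch using Python's standard robots.txt parser. The /vehicle-details/ path and the example.com URLs are hypothetical; the real plugin's URLs aren't given in the post.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rule: keep crawlers out of the Ajax vehicle-details endpoint.
ROBOTS_TXT = """\
User-agent: *
Disallow: /vehicle-details/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# Listings pages stay crawlable; details pages are never fetched at all,
# so any noindex header or tag on them would go unseen.
listing_ok = parser.can_fetch("Googlebot", "http://www.example.com/listings/")
details_ok = parser.can_fetch("Googlebot", "http://www.example.com/vehicle-details/123")
print(listing_ok, details_ok)
```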
Tabs and duplicate content?
We own this site http://www.discountstickerprinting.co.uk/ and I'm just a little concerned, as I right-clicked and chose "open in new tab" on the tab content section and it went to a new page. For example, if you right-click on the price tab and click "open in new tab", you will end up with the URL
http://www.discountstickerprinting.co.uk/#tabThree
Does this mean that our content is being duplicated onto another page? If so, what should I do?
Intermediate & Advanced SEO | | BobAnderson0 -
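A quick way to see what is going on with a URL like that: the part after the # is a fragment, which the browser keeps to itself and does not send to the server, so the same document is requested either way. A small sketch with Python's standard library (how a given search engine treats fragments is a separate matter, so take this only as an illustration of the URL structure):

```python
from urllib.parse import urldefrag

# Split the tab URL into the document URL and the in-page fragment.
url, fragment = urldefrag("http://www.discountstickerprinting.co.uk/#tabThree")
print(url)
print(fragment)
```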
International SEO - cannibalisation and duplicate content
Hello all, I look after (in house) 3 domains for one niche travel business across three TLDs: .com, .com.au and .co.uk, and a fourth domain on a .co.nz TLD which was recently removed from Google's index.
Symptoms: For the past 12 months we have been experiencing cannibalisation in the SERPs (namely .com.au being rendered in .com) and Panda-related ranking devaluations between our .com site and .com.au site. Around 12 months ago the .com TLD was hit hard (80% drop in target KWs) by Panda (probably) and we began to action the changes below. Around 6 weeks ago our .com TLD saw big overnight increases in rankings (to date a 70% averaged increase). However, we suffered significant drops in our .com.au rankings, by almost the same percentage as we saw on the .com TLD. Basically, Google seemed to switch its attention from the .com TLD to the .com.au TLD.
Note: Each TLD is over 6 years old, we've never proactively gone after links (Penguin) and have always aimed for quality in an often spammy industry.
Have done:
* Adding hreflang markup to all pages on all domains
* Each TLD uses local vernacular, e.g. the .com site is American
* Each TLD has pricing in the regional currency
* Each TLD has details of the respective local offices, the copy references the location, and we have significant press coverage in each country, like The Guardian for our .co.uk site and Sydney Morning Herald for our Australia site
* Targeting each site to its respective market in WMT
* Each TLD's core pages (within 3 clicks of the primary nav) are 100% unique
* We're continuing to re-write and publish unique content to each TLD on a weekly basis
* As the .co.nz site drove too little traffic to justify re-writing, we added noindex, and the TLD has almost completely disappeared (16% of pages remain) from the SERPs
* XML sitemaps
* Google+ profile for each TLD
Have not done:
* Hosted each TLD on a local server
* Around 600 pages per TLD are duplicated across all TLDs (roughly 50% of all content). These are way down the IA but still duplicated.
* Images/video sourced from local servers
* Added address and contact details using Schema markup
Any help, advice or just validation on this subject would be appreciated!
Kian
Intermediate & Advanced SEO | | team_tic1