Subdomains - duplicate content - robots.txt
-
Our corporate site provides MLS data to users, with the end goal of generating leads. Each registered lead is assigned to an agent, essentially in a round-robin fashion. However, we also give each agent a domain of their choosing that points to our corporate website. The domain can be whatever they want, but as soon as it loads it redirects to a subdomain. For example, www.agentsmith.com would be redirected to agentsmith.corporatedomain.com. Finally, any leads generated from agentsmith.easystreetrealty-indy.com are always assigned to Agent Smith instead of the agent pool (we parse the current host name to decide this). To avoid being penalized for duplicate content, every page viewed on one of the agent subdomains has a canonical link pointing to the corporate host name (www.corporatedomain.com). The only content difference between our corporate site and an agent subdomain is the phone number and contact email address, where applicable.
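The host-name routing described above could be sketched roughly as follows. This is only an illustration: the `agent_for_host` helper, the `CORPORATE_DOMAIN` constant, and the rule that `www` means "use the shared agent pool" are assumptions, not the site's actual code.

```python
from typing import Optional

# Assumed placeholder for the corporate domain used in the post.
CORPORATE_DOMAIN = "corporatedomain.com"


def agent_for_host(host: str) -> Optional[str]:
    """Return the agent subdomain label for a request host.

    Hypothetical helper: None means the lead goes to the round-robin pool.
    """
    # Normalize case and drop an optional port such as ":443".
    hostname = host.lower().split(":")[0]
    suffix = "." + CORPORATE_DOMAIN
    if not hostname.endswith(suffix):
        return None
    label = hostname[: -len(suffix)]
    # "www" is the corporate site itself; nested labels are not agent sites.
    if label in ("", "www") or "." in label:
        return None
    return label
```

With this sketch, a request to `agentsmith.corporatedomain.com` maps to the label `agentsmith`, while `www.corporatedomain.com` falls through to the pool.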
Two questions:
1. Can/should we use robots.txt or robots meta tags to tell crawlers to ignore these subdomains, but obviously not the corporate domain?
2. If the answer to question 1 is yes, would it be better for SEO to do that, or to leave it how it is?
-
-
Sorry, god only knows how I missed that.
Well, in that case I think you are doing what is recommended. I generally think of the canonical tag as similar to a 301 redirect: you are telling the search engines that the two pages should be treated as one, and then specifying which page is to be the front-man of the two.
I think the normal procedure is to use robots.txt for private/personal information, and nofollow and noindex for duplicate content. However, the canonical tag is an easy solution to duplicate content, as it is simply one line in the header.
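For comparison, here is a sketch of what the robots.txt alternative would involve. Because robots.txt is fetched per host name, the agent subdomains would have to serve a different file from the corporate host. The `robots_txt_for_host` and `can_crawl` helpers and the host names are hypothetical; the standard-library `urllib.robotparser` is used only to check how a compliant crawler would read each file.

```python
from urllib.robotparser import RobotFileParser

CORPORATE_HOST = "www.corporatedomain.com"  # assumed placeholder host


def robots_txt_for_host(host: str) -> str:
    """Return robots.txt content for the given host (hypothetical helper)."""
    if host.lower() == CORPORATE_HOST:
        # Empty Disallow: nothing is blocked on the corporate site.
        return "User-agent: *\nDisallow:\n"
    # Any other host (an agent subdomain) blocks all compliant crawlers.
    return "User-agent: *\nDisallow: /\n"


def can_crawl(host: str, path: str) -> bool:
    """Check whether a generic crawler may fetch path on host."""
    parser = RobotFileParser()
    parser.parse(robots_txt_for_host(host).splitlines())
    return parser.can_fetch("*", f"http://{host}{path}")
```

One trade-off to keep in mind: a crawler that is blocked this way never fetches the page, so it also never sees the canonical tag on it.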
-
Thanks SeoStallion.
That is how we are handling it currently.
-
I would personally suggest using the canonical tag to identify the original content. For example, place a canonical link element in the head of each page with duplicate content.
This tells the search engines that the page is not the original content and that the page in the link is where the original content is found.
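A canonical link element is a single `<link rel="canonical" href="...">` line in the page's `<head>`. Below is a minimal sketch of generating it for a page served on an agent subdomain. The corporate host comes from the question; the function names and the choice to keep the path and query string unchanged are assumptions.

```python
from html import escape
from urllib.parse import urlsplit, urlunsplit

CANONICAL_HOST = "www.corporatedomain.com"  # corporate host named in the question


def canonical_url(page_url: str) -> str:
    """Swap the host of page_url for the corporate host, keeping path and query."""
    parts = urlsplit(page_url)
    # Drop the fragment: it is never sent to the server.
    return urlunsplit((parts.scheme, CANONICAL_HOST, parts.path, parts.query, ""))


def canonical_link_tag(page_url: str) -> str:
    """Build the <link rel="canonical"> element for the page's <head>."""
    href = escape(canonical_url(page_url), quote=True)
    return f'<link rel="canonical" href="{href}" />'


print(canonical_link_tag("http://agentsmith.corporatedomain.com/listings/123?beds=3"))
```

Every agent subdomain then emits the same canonical URL for a given page, so the corporate host is the one the search engines are asked to index.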