Using robots.txt to deal with duplicate content
-
I have 2 sites with duplicate content issues.
One is a WordPress blog.
The other is a store (Pinnacle Cart).
I cannot edit the canonical tag on either site. In this case, should I use robots.txt to eliminate the duplicate content?
-
It will be any part of the URL that doesn't handle navigation, so look at what you can delete off the URL without breaking the link to the product page.
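One way to work through "what you can delete off the URL" is to generate a test URL for each query parameter with just that parameter removed, then open each one and see whether the page still loads with the same products. This is only a rough sketch: the helper name `variants_without_each_param` is made up for illustration, and the sample URL is the one quoted elsewhere in this thread.

```python
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode


def variants_without_each_param(url):
    """Yield (param_name, url_without_that_param) for every query parameter."""
    parts = urlsplit(url)
    pairs = parse_qsl(parts.query, keep_blank_values=True)
    for i, (name, _value) in enumerate(pairs):
        # Keep every parameter except the one at position i.
        remaining = pairs[:i] + pairs[i + 1:]
        yield name, urlunsplit(parts._replace(query=urlencode(remaining)))


url = ("http://www.domain.com/accessories/"
       "?p=catalog&mode=catalog&parent=17&pg=1&CatalogSetSortBy=date")

for name, candidate in variants_without_each_param(url):
    print(f"without {name}: {candidate}")
```

Any parameter whose removal leaves the page content unchanged is a candidate for the ignore list; a parameter whose removal breaks or changes the page should stay.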
Take a look at this: http://googlewebmastercentral.blogspot.com/2009/10/new-parameter-handling-tool-helps-with.html
Remember, this will only work with Google!
This is another interesting video from Matt Cutts about removing content from Google: http://googlewebmastercentral.blogspot.com/2008/01/remove-your-content-from-google.html
-
If the urls look like this...
Would I tell Google to ignore p, mode, parent, or CatalogSetSortBy? Just one of those or all of those?
Thanks!!!
-
For WordPress, try: http://wordpress.org/extend/plugins/canonical/
Also look at Yoast's WordPress SEO plugin referenced on that page - I love it!
For the duplicate content caused by the dynamic content on Pinnacle Cart, you can use Google Webmaster Tools to tell Google to ignore certain parameters: go to Site configuration - Settings - Parameter handling and add the variables you wish to ignore to this list.
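To get a feel for what ignoring a parameter does, here is a small sketch that strips a chosen set of parameters from the sorted catalog URLs posted in this thread and checks whether they collapse to a single URL. It assumes that `CatalogSetSortBy` only changes the sort order and not which products are shown; that is a guess from the URLs, not something confirmed for Pinnacle Cart. The helper `strip_params` is hypothetical, and this only illustrates the idea, not how Google processes the setting internally.

```python
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode

# Assumption: this parameter only affects sort order, not page content.
IGNORED = {"CatalogSetSortBy"}


def strip_params(url, ignored):
    """Return url with every query parameter named in `ignored` removed."""
    parts = urlsplit(url)
    kept = [(name, value)
            for name, value in parse_qsl(parts.query, keep_blank_values=True)
            if name not in ignored]
    return urlunsplit(parts._replace(query=urlencode(kept)))


urls = [
    "http://www.domain.com/accessories/?p=catalog&mode=catalog&parent=17&pg=1&CatalogSetSortBy=date",
    "http://www.domain.com/accessories/?p=catalog&mode=catalog&parent=17&pg=1&CatalogSetSortBy=name",
    "http://www.domain.com/accessories/?p=catalog&mode=catalog&parent=17&pg=1&CatalogSetSortBy=price",
]

# A set keeps only distinct URLs, so duplicates collapse automatically.
canonical = {strip_params(u, IGNORED) for u in urls}
print(canonical)
```

If the set ends up with one entry, the ignored parameters account for all the variation between those URLs.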
-
Hi,
The two sites are unrelated to each other, so my concern is not duplicate content between the two; there is none.
However, each site has its own duplicate content issues. I do have admin privileges on both sites.
If there is a WordPress plugin, that would be great. Do you have one that you would recommend?
For my ecommerce site using Pinnacle Cart, I have duplicates because of the way people can search on the site. For example:
http://www.domain.com/accessories/
http://www.domain.com/accessories/?p=catalog&mode=catalog&parent=17&pg=1&CatalogSetSortBy=date
http://www.domain.com/accessories/?p=catalog&mode=catalog&parent=17&pg=1&CatalogSetSortBy=name
http://www.domain.com/accessories/?p=catalog&mode=catalog&parent=17&pg=1&CatalogSetSortBy=price
These all show as duplicate content in my Webmaster Tools reports. On this site I can't edit the head tag of each page, so I can't add a canonical link.
-
What are your intentions here? Do you intend to leave both sites running? Can you give us more information on the sites? Are they aged domains, is one/any/both of them currently attracting any inbound links, are they ranking? What is the purpose of the duplicate content?
Are you looking to redirect traffic from one of the sites to the other using 301 redirect?
Or do you want both sites visible - using the Canonical link tag?
(I am concerned that you say you 'cannot edit the tag'. Do you not have full admin access to either site?)
There are dedicated canonical management plugins for WordPress (if you have access to the wp-admin area).
You are going to need some admin privileges to make any alterations to the site so that you can correct this.
Let us know a bit more please!
These articles may be useful as they provide detailed best practice info on redirects:
http://www.google.com/support/webmasters/bin/answer.py?answer=66359
http://www.seomoz.org/blog/duplicate-content-block-redirect-or-canonical
Check out this article on redirects