How to get rid of duplicate content

Mishelm

I have duplicate content that looks like http://deceptionbytes.com/component/mailto/?tmpl=component&link=932fea0640143bf08fe157d3570792a56dcc1284 - however I have 50 of these all with different numbers on the end. Does this affect the search engine optimization and how can I disallow this in my robots.txt file?

Cyrus-Shepard

Hi Michelle,

In addition to what Alan said, I might take a couple more actions on this page. Since it sounds like you're a beginner, don't worry if you don't understand all of this, but I wanted to include it for anyone else reading this question.

I've also tried to include links to relevant sources where you can learn about each topic addressed.

1. Yes, add the canonical tag. This tells search engines that even though these pages all have different URLs, they are meant to be the same page.

http://www.seomoz.org/learn-seo/canonicalization

2. The "numbers at the end" are called URL parameters, and there is a setting in Google Webmaster Tools that you can use to tell Google to ignore specific parameters. This is advanced stuff, and Google does a pretty good job these days of figuring it out on its own, so it's best not to adjust these settings unless you're comfortable doing so.

http://support.google.com/webmasters/bin/answer.py?hl=en&answer=1235687
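To make "URL parameters" a bit more concrete, here is a minimal Python sketch (standard library only) that splits the URL from the question into its address and its parameters. The second URL in the list is made up for illustration; only the first one appears in the question.

```python
from urllib.parse import urlsplit, urlunsplit, parse_qs

def split_parameters(url):
    """Return (URL without its query string, dict of query parameters)."""
    parts = urlsplit(url)
    base = urlunsplit((parts.scheme, parts.netloc, parts.path, "", ""))
    return base, parse_qs(parts.query)

urls = [
    # From the question above.
    "http://deceptionbytes.com/component/mailto/?tmpl=component&link=932fea0640143bf08fe157d3570792a56dcc1284",
    # Hypothetical second variant: same page, different "link" value.
    "http://deceptionbytes.com/component/mailto/?tmpl=component&link=0000000000000000000000000000000000000000",
]

base, params = split_parameters(urls[0])
print(base)            # the address without its parameters
print(sorted(params))  # the parameter names: ['link', 'tmpl']

# Ignoring parameters, both addresses collapse to a single page.
unique_pages = {split_parameters(u)[0] for u in urls}
print(len(unique_pages))  # 1
```

In other words, the 50 "different" pages are one address plus a changing `link` value, which is why search engines can usually treat them as a single page.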

3. Honestly, there's no reason for this page to appear in search results, or to waste search engine resources crawling it. So, if possible, I'd add a meta robots "noindex, follow" tag to the head element of the HTML.

http://www.robotstxt.org/meta.html
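If you want to check whether a page actually carries that tag, something like the following rough Python sketch could work. The sample page is invented for illustration (it is not the real mailto page), and `RobotsMetaFinder` and `robots_directives` are just names I picked.

```python
from html.parser import HTMLParser

class RobotsMetaFinder(HTMLParser):
    """Collects the directives of any <meta name="robots"> tag."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = {name: (value or "") for name, value in attrs}
        if attr.get("name", "").lower() == "robots":
            for part in attr.get("content", "").split(","):
                if part.strip():
                    self.directives.add(part.strip().lower())

def robots_directives(html_text):
    """Return the set of meta robots directives found in html_text."""
    finder = RobotsMetaFinder()
    finder.feed(html_text)
    return finder.directives

# Hypothetical page, only here to show where the tag goes.
SAMPLE_PAGE = """<html><head>
<title>Email this link</title>
<meta name="robots" content="noindex, follow">
</head><body></body></html>"""

print(sorted(robots_directives(SAMPLE_PAGE)))  # ['follow', 'noindex']
```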

4. Additionally, I'd slap a nofollow on any links pointing to these pages, and/or block crawling of this page via robots.txt, because there is no reason to waste your search engine crawl allowance on these pages.

http://support.google.com/webmasters/bin/answer.py?hl=en&answer=96569

5. And finally, I think it's perfectly legitimate to block these throwaway pages using robots.txt. Alan has a good point about link juice. It's usually best not to block pages using robots.txt, but in this particular case I think it would be fine.

http://www.seomoz.org/learn-seo/robotstxt

Honestly, addressing all of these issues in this particular case probably won't make a huge impact on your SEO. But as you can see, there are multiple ways of dealing with the problem that touch on many of the fundamental techniques of Search Engine Optimization.

Finally, to answer your question directly: to disallow this directory in robots.txt, your file would look something like this.

User-agent: *
Disallow: /component/mailto/

This will block crawling of anything under the /component/mailto/ path.
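One way to sanity-check a rule like this before uploading it is Python's built-in robots.txt parser. This is only a sketch: the rule is written here as the plain path prefix `/component/mailto/`, because as far as I know Python's parser does simple prefix matching and does not expand `*` wildcards inside a path. Real crawlers may interpret rules somewhat differently, and `is_crawlable` is just a helper name I made up.

```python
from urllib.robotparser import RobotFileParser

# The robots.txt content being tested (rule written as a path prefix).
ROBOTS_TXT = """\
User-agent: *
Disallow: /component/mailto/
"""

def is_crawlable(url, robots_txt=ROBOTS_TXT, agent="*"):
    """Return True if `agent` may fetch `url` under the given rules."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)

mailto_url = (
    "http://deceptionbytes.com/component/mailto/"
    "?tmpl=component&link=932fea0640143bf08fe157d3570792a56dcc1284"
)
home_url = "http://deceptionbytes.com/"

print(is_crawlable(mailto_url))  # False: the mailto page is blocked
print(is_crawlable(home_url))    # True: the rest of the site is untouched
```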

Hope this helps. Best of luck with your SEO!

ZacharyRussell

Michelle,

I agree with Alan. If you're confused by the rel=canonical tag, I recommend you read the SEOmoz Beginner's Guide to SEO, specifically this page: http://www.seomoz.org/beginners-guide-to-seo/search-engine-tools-and-services. The whole guide goes through a lot of best practices, and even advanced SEOs can use it as a "bible".

Hope this helps

Chenzo

100% best move forward

AlanMosley

Link juice flows through links only if the linked page is in the index. If it is not, the link juice just goes up in smoke and is wasted, so you don't want to link to a page that is not indexed.

A canonical tag tells the search engine to give the credit to the page named in the canonical tag.

So a canonical tag on page.html?id5 pointing to page.html will tell the search engine that they are the same page, and to give credit to the canonical.

This is how to create a canonical tag:
<link rel="canonical" href="http://mycanonialpage.com/page.html/" />
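If the pages come out of a template, the tag can be generated rather than typed by hand. Here is a minimal Python sketch, assuming the canonical version of a page is simply its URL with the parameters removed (that is an assumption for illustration, and it will not be true for every site). `canonical_link_tag` is a made-up helper name.

```python
from html import escape
from urllib.parse import urlsplit, urlunsplit

def canonical_link_tag(url):
    """Build a canonical <link> tag pointing at `url` without its query string."""
    parts = urlsplit(url)
    canonical = urlunsplit((parts.scheme, parts.netloc, parts.path, "", ""))
    # Escape the URL so it is safe inside the href attribute.
    return '<link rel="canonical" href="{}" />'.format(escape(canonical, quote=True))

tag = canonical_link_tag("http://mycanonialpage.com/page.html?id5")
print(tag)
# <link rel="canonical" href="http://mycanonialpage.com/page.html" />
```

The tag goes in the head element of every variant of the page, including the canonical page itself.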

Mishelm

Link juice leaks?? Canonical tag? Ummmmm, I thought I was fairly smart until just this minute. I have NO idea what you are talking about.

AlanMosley

Don't use robots.txt.

You will cause link juice leaks for each link that points to a page behind a robots.txt exclusion.

The best thing to do is use a canonical tag pointing to http://deceptionbytes.com/component/mailto
