How to get rid of duplicate content
-
I have duplicate content that looks like http://deceptionbytes.com/component/mailto/?tmpl=component&link=932fea0640143bf08fe157d3570792a56dcc1284 - however I have 50 of these all with different numbers on the end. Does this affect the search engine optimization and how can I disallow this in my robots.txt file?
-
Hi Michelle,
In addition to what Alan said, I might take a couple more actions on this page. Since it sounds like you're a beginner, don't worry if you don't understand all this stuff, but I wanted to include it for anyone else reading this question.
I've also tried to include links to relevant sources where you can learn about each topic addressed.
1. Yes, add the canonical. This basically tells search engines that even though these pages all have different URL addresses, they are meant to be the same page.
http://www.seomoz.org/learn-seo/canonicalization
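For example, the tag Alan suggests below (assuming http://deceptionbytes.com/component/mailto is the version you want credited) would go in the <head> of each duplicate page, something like this:

```html
<!-- Example only: the href should be the one URL you want search engines to credit -->
<link rel="canonical" href="http://deceptionbytes.com/component/mailto" />
```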
2. The "numbers at the end" are called URL parameters, and there is a setting in Google Webmaster Tools that you can use to tell them to ignore parameter settings. This is advanced stuff, and Google does a pretty good job these days of figuring this stuff out on their own, so it's best not to adjust these settings unless you're comfortable doing so.
http://support.google.com/webmasters/bin/answer.py?hl=en&answer=1235687
3. Honestly, there's no reason for this page to appear in search results, or for search engines to waste resources crawling it. So, if possible, I'd add a meta robots "noindex, follow" tag to the head element of the HTML.
http://www.robotstxt.org/meta.html
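For reference, the tag from point 3 would look something like this in the page's <head>:

```html
<!-- noindex keeps the page out of search results; follow still lets link equity pass through -->
<meta name="robots" content="noindex, follow" />
```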
4. Additionally, I'd slap a nofollow on any links pointing to these pages, and/or block crawling of these pages via robots.txt, because there is no reason to waste your search engine crawl allowance on them.
http://support.google.com/webmasters/bin/answer.py?hl=en&answer=96569
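A nofollowed link just adds rel="nofollow" to the anchor. Something like this (the href and anchor text here are only illustrations):

```html
<a href="http://deceptionbytes.com/component/mailto/" rel="nofollow">Email this article</a>
```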
5. And finally, I think it's perfectly legitimate to block these throwaway pages using robots.txt. Alan has a good point about link juice - it's usually best not to block pages using robots.txt, but in this particular case I think it would be fine.
http://www.seomoz.org/learn-seo/robotstxt
Honestly, addressing all of these issues in this particular case probably won't make a huge impact on your SEO. But as you can see, there are multiple ways of dealing with the problem that touch on many of the fundamental techniques of Search Engine Optimization.
Finally, to answer your question directly: to disallow this directory in robots.txt, your file would look something like this.
User-agent: *
Disallow: /component/mailto/

This will block anything in the /component/mailto/ directory.
Hope this helps. Best of luck with your SEO!
-
Michelle,
I agree with Alan. If you're confused by the rel=canonical tag, I recommend you read the SEOmoz Beginner's Guide to SEO, more specifically this page: http://www.seomoz.org/beginners-guide-to-seo/search-engine-tools-and-services. The whole book/guide goes through a lot of best practices, and even advanced SEOs can use it as a "bible" of sorts.
Hope this helps
-
100% best move forward
-
Link juice flows through links only if the linked page is in the index. If not, the link juice just goes up in smoke and is wasted, so you don't want to link to a page that is not indexed.
A canonical tag tells the search engine to give the credit to the page named in the canonical tag.
So a canonical tag pointing to page.html from page.html?id=5 will tell the search engine they are the same page, and to give credit to the canonical.
This is how to create a canonical tag:
<link rel="canonical" href="http://mycanonialpage.com/page.html" />
-
link juice leaks?? canonical tag? ummmmm I thought I was fairly smart until just this minute - I have NO idea what you are talking about
-
Don't use robots.txt.
You will cause link juice leaks for each link that points to a page behind a robots.txt exclude.
The best thing to do is use a canonical tag pointing to http://deceptionbytes.com/component/mailto