How to get rid of duplicate content
-
I have duplicate content that looks like http://deceptionbytes.com/component/mailto/?tmpl=component&link=932fea0640143bf08fe157d3570792a56dcc1284 - however I have 50 of these all with different numbers on the end. Does this affect the search engine optimization and how can I disallow this in my robots.txt file?
-
Hi Michelle,
In addition to what Alan said, I might take a couple of more actions on this page. Since it sounds like you're a beginner, don't worry if you don't understand all this stuff, but I wanted to include it for anyone else reading this question.
I've also tried to include links to relevant sources where you can learn about each topic addressed.
1. Yes, add the canonical. This basically tells search engines that even those these pages all have different URL addresses, they are meant to be the same page.
http://www.seomoz.org/learn-seo/canonicalization
2. The "numbers at the end" are called URL parameters, and there is a setting in Google Webmaster Tools that you can use to tell them to ignore parameter settings. This is advanced stuff, and Google does a pretty good job these days of figuring this stuff out on their own, so it's best not to adjust these settings unless you're comfortable doing so.
http://support.google.com/webmasters/bin/answer.py?hl=en&answer=1235687
3. Honestly, there's no reason for this page to appear in search results, or waste search engine resources crawling the page. So, if possible, I'd add a meta robots "NO INDEX, FOLLOW" tag to the head element of the HTML.
http://www.robotstxt.org/meta.html
4. Additionally, I'd slap a nofollow on any links pointing these pages, and/or block crawling of this page via robots.txt, because there is no reason to waste your search engine crawl allowance on these pages.
http://support.google.com/webmasters/bin/answer.py?hl=en&answer=96569
5. And finally, I think it's perfectly legitimate to block these thowaway pages using robots.txt. Alan has good point about link juice - it's usually best not to block pages using robots.txt, but in this particular case I think it would be fine.
http://www.seomoz.org/learn-seo/robotstxt
Honestly, addressing all of these issues in this particular case probably won't make a huge impact on your SEO. But as you can see, there are multiple ways of dealing with the problem that touch on many of the fundamental techniques of Search Engine Optimization.
Finally, to answer your question in a straitforward answer, to dissallow this directory in robots.txt, your file would look something like this.
User-agent: *
Disallow: *mailto/Which will block anything in the /mailto/ directory.
Hope this helps. Best of luck with your SEO!
-
Michelle,
I agree with Alan, if your confused with the Rel=cannonical tag, I recommend your read the SEOmoz beginners guide to seo. More specifically this page: http://www.seomoz.org/beginners-guide-to-seo/search-engine-tools-and-services, the whole book/guide goes through a lot of best practices, and even advanced SEOs can kind of use this guide as a "bible"
Hope this helps
-
100% best move forward
-
Link juice flows though links only if the linked page is in the index, if not then the link juice just goines up in smoke, it is wasted, so you dont want to link to a page that is not indexed.
A canonical tag tells the search engine to give the credit to teh page in the canonical tag.
so with a canonical tag pointing to page.html from page.html?id5 with tell the search engine they are the same page, and to give credit to teh canonical.
this is how to createa canonical tag
http://mycanonialpage.com/page.html/" /> -
link juice leaks?? canonical tag? ummmmm I thought I was farily smart until just this minute- I have NO idea what you are talking about
-
dont use robots.txt
You will cause link juice leaks for each link that points to a page behind a rebots.txt exclude
The best thing to do is use a canonical tag pointing to http://deceptionbytes.com/component/mailto
Got a burning SEO question?
Subscribe to Moz Pro to gain full access to Q&A, answer questions, and ask your own.
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
Do mobile and desktop sites that pull content from the same source count as duplicate content?
We are about to launch a mobile site that pulls content from the same CMS, including metadata. They both have different top-level domains, however (www.abcd.com and www.m.abcd.com). How will this affect us in terms of search engine ranking?
Technical SEO | | ovenbird0 -
URL Mixed Cases and Duplicate Content
Hi There, I have a question for you. I am working on a website where by typing any letter of the URL in lower or upper case, it will give a 200 code. Examples www.examples.com/page1/product www.examples.com/paGe1/Product www.examples.com/PagE1/prOdUcT www.examples.com/pAge1/proODUCt and so on… Although I cannot find evidence of backlinks pointing to my page with mixed cases, shall I redirect or rel=canonical all the possible combination of the cases to a lower version of them in order to prevent duplicate content? And if so, do you have any advice on how to complete such a massive job? Thanks a lot
Technical SEO | | Midleton0 -
Duplicate page/Title content - Where?
Hi, I have just run a crawl on a new clients site, and there is several 'duplicate page content' and 'Duplicate Page Title'' issues. But I cannot find any duplicate content. And to make matters worse. The actual report has confused me. Just for example the about us page is showing in both reports and for both under 'Other URLs' it is showing 1? Why? Does this mean there is 1 other page with duplicate page title? or duplicate page content? Where are the pages that have the duplicate page titles, or duplicate page content? I have run scans using other software and a copyscape scan. And apart from missing page titles, I cannot find any page that has duplicate titles or content. I can find % percentages of pages with similar/same page titles/content. But this is only partial and contextually correct. So I understand that SEO Moz may pick percentage of content, which is fine, and therefore note that there is duplicate content/page titles. But I cannot seem to figure out where I would the source of the duplicate content/page titles. As there is only 1 listed in both reports for 'Other URLs' Hopefully my long question, has not confused. many thanks in advance for any help
Technical SEO | | wood1e20 -
Duplicate content by php id,page=... problem
Hi dear friends! How can i resolve this duplicate problem with edit the php code file? My trouble is google find that : http://vietnamfoodtour.com/?mod=booking&act=send_booking&ID=38 and http://vietnamfoodtour.com/.....booking.html are different page, but they are one but google indexed both of them. And the Duplcate content is raised 😞 how can i notice to google that they are one?
Technical SEO | | magician0 -
Over 700+ duplicate content pages -- help!
I just signed up for SEO Moz pro for my site. The initial report came back with over 700+ duplicate content pages. My problem is that while I can see why some of the content is duplicated on some of the pages I have no idea why it's coming back as duplicated. Is there a tutorial for a novie on how to read the duplicate content report and what steps to take? It's an e-commerce website and there is some repetitive content on all the product pages like our "satisfaction guaranteed" text and the fabric material... and not much other text. There's not a unique product description because an image speaks for itself. Could this be causing the problem? I have lots of URLs with over 50+ duplicates. Thx for any help.
Technical SEO | | Santaur0 -
Duplicate content problem from an index.php file
Hi One of my sites is flagging a duplicate content problem which is affecting the search rankings. The duplicate problem is caused by http://www.mydomain.com/index.php which has a page rank of 26 How can I sort the duplicate content problem, as the main page should just be http://www.mydomain.com which has a page rank of 42 and is the stronger page with stronger links etc Many Thanks
Technical SEO | | ocelot0 -
Squarespace Duplicate Content Issues
My site is built through squarespace and when I ran the campaign in SEOmoz...its come up with all these errors saying duplicate content and duplicate page title for my blog portion. I've heard that canonical tags help with this but with squarespace its hard to add code to page level...only site wide is possible. Was curious if there's someone experienced in squarespace and SEO out there that can give some suggestions on how to resolve this problem? thanks
Technical SEO | | cmjolley0 -
Duplicate Content Issues - Should I build a new site?
I'm currently working on a site which is built using Zen Cart. The client also has another version which has the same products on it. The product descriptions and the vast majority of the text has been re-written. I've used the duplicate content tool and these are the results: HTML fingerprint: 0000a7ee1f07a131 0000a7ec1f07a931 92.31% Total HTML similarity: 76.33% Standard text similarity: 66.72% Smart text similarity: 45.81% Total text similarity 56.27% I considered using a different eCommerce system like Magento or Volusion. So I had a look at a few templates, chose one and then used the tool again and got the following: HTML fingerprint: 0000a7e41b012111 0000a7ec1f07a931 72.00% Total HTML similarity: 64.65% Standard text similarity: 11.69% Smart text similarity: 17.90% Total text similarity 14.80% Do you think its worth doing this? thanks Dan
Technical SEO | | TheYeti0