How to get rid of duplicate content
-
I have duplicate content that looks like http://deceptionbytes.com/component/mailto/?tmpl=component&link=932fea0640143bf08fe157d3570792a56dcc1284 - however I have 50 of these all with different numbers on the end. Does this affect the search engine optimization and how can I disallow this in my robots.txt file?
-
Hi Michelle,
In addition to what Alan said, I might take a couple of more actions on this page. Since it sounds like you're a beginner, don't worry if you don't understand all this stuff, but I wanted to include it for anyone else reading this question.
I've also tried to include links to relevant sources where you can learn about each topic addressed.
1. Yes, add the canonical. This basically tells search engines that even those these pages all have different URL addresses, they are meant to be the same page.
http://www.seomoz.org/learn-seo/canonicalization
2. The "numbers at the end" are called URL parameters, and there is a setting in Google Webmaster Tools that you can use to tell them to ignore parameter settings. This is advanced stuff, and Google does a pretty good job these days of figuring this stuff out on their own, so it's best not to adjust these settings unless you're comfortable doing so.
http://support.google.com/webmasters/bin/answer.py?hl=en&answer=1235687
3. Honestly, there's no reason for this page to appear in search results, or waste search engine resources crawling the page. So, if possible, I'd add a meta robots "NO INDEX, FOLLOW" tag to the head element of the HTML.
http://www.robotstxt.org/meta.html
4. Additionally, I'd slap a nofollow on any links pointing these pages, and/or block crawling of this page via robots.txt, because there is no reason to waste your search engine crawl allowance on these pages.
http://support.google.com/webmasters/bin/answer.py?hl=en&answer=96569
5. And finally, I think it's perfectly legitimate to block these thowaway pages using robots.txt. Alan has good point about link juice - it's usually best not to block pages using robots.txt, but in this particular case I think it would be fine.
http://www.seomoz.org/learn-seo/robotstxt
Honestly, addressing all of these issues in this particular case probably won't make a huge impact on your SEO. But as you can see, there are multiple ways of dealing with the problem that touch on many of the fundamental techniques of Search Engine Optimization.
Finally, to answer your question in a straitforward answer, to dissallow this directory in robots.txt, your file would look something like this.
User-agent: *
Disallow: *mailto/Which will block anything in the /mailto/ directory.
Hope this helps. Best of luck with your SEO!
-
Michelle,
I agree with Alan, if your confused with the Rel=cannonical tag, I recommend your read the SEOmoz beginners guide to seo. More specifically this page: http://www.seomoz.org/beginners-guide-to-seo/search-engine-tools-and-services, the whole book/guide goes through a lot of best practices, and even advanced SEOs can kind of use this guide as a "bible"
Hope this helps
-
100% best move forward
-
Link juice flows though links only if the linked page is in the index, if not then the link juice just goines up in smoke, it is wasted, so you dont want to link to a page that is not indexed.
A canonical tag tells the search engine to give the credit to teh page in the canonical tag.
so with a canonical tag pointing to page.html from page.html?id5 with tell the search engine they are the same page, and to give credit to teh canonical.
this is how to createa canonical tag
http://mycanonialpage.com/page.html/" /> -
link juice leaks?? canonical tag? ummmmm I thought I was farily smart until just this minute- I have NO idea what you are talking about
-
dont use robots.txt
You will cause link juice leaks for each link that points to a page behind a rebots.txt exclude
The best thing to do is use a canonical tag pointing to http://deceptionbytes.com/component/mailto
Got a burning SEO question?
Subscribe to Moz Pro to gain full access to Q&A, answer questions, and ask your own.
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
Duplicate content issue
Hi, A client of ours has one URL for the moment (https://aalst.mobilepoint.be/) and wants to create a second one with exactly the same content (https://deinze.mobilepoint.be/). Will that mean Google punishes the second one because of duplicate content? What are the recommendations?
Technical SEO | | conversal0 -
Does duplicate content not concern Rand?
Hello all, I'm a new SEOer and I'm currently trying to navigate the layman's minefield that is trying to understand duplicate content issues in as best I can. I'm working on a website at the moment where there's a duplicate content issue with blog archives/categories/tags etc. I was planning to beat this by implementing a noindex meta tag on those pages where there are duplicate content issues. Before I go ahead with this I thought: "Hey, these Moz guys seem to know what they're doing! What would Rand do?" Blogs on the website in question appear in full and in date order relating to the tag/category/what-have-you creating the duplicate content problem. Much like Rand's blog here at Moz - I thought I'd have a look at the source code to see how it was dealt with. My amateur eyes could find nothing to help answer this question: E.g. Both the following URLs appear in SERPs (using site:moz,com and very targeted keywords, but they're there): https://moz.com/rand/does-making-a-website-mobile-friendly-have-a-universally-positive-impact-on-mobile-traffic/ https://moz.com/rand/category/moz/ Both pages have a rel="canonical" pointing to themselves. I can understand why he wouldn't be fussed about the category not ranking, but the blog? Is this not having a negative effect? I'm just a little confused as there are so many conflicting "best practice" tips out there - and now after digging around in the source code on Rand's blog I'm more confused than ever! Any help much appreciated, Thanks
Technical SEO | | sbridle1 -
Duplicate Content in Dot Net Nuke
Our site is built on Dot Net Nuke. SEOmoz shows a very large amount of duplicate content because at the beginning each page got an extension in the following format: www.domain.com/tabid/110/Default.aspx The site additionally exists without the tabid... part. Our web developer says an easy fix with a canonical tag or 301 redirect is not possible. Does anyone have DNN experience and can point us in the right direction? Thanks, Ricarda
Technical SEO | | jsillay0 -
Duplicate content due to credit card testing
I recently launched a site - http://www.footballtriviaquestions.co.uk and the site uses Paypal. In order to test the PayPal functionality I set up a zapto.org domain via a permanent IP service that points directly to the computer I've written the website on. It appears that Google has now indexed the zapto.org website. Will this cause problems to my main website, as the zapto.org website will pretty much contain content that is an exact duplicate of what is held on the main website. I've looked in Google webmaster tools for the main website and it doesn't mention any duplicate content, but I'm currently not in the top 50 ranking for "football trivia questions' on Google despite SEOMoz ranking my home page with an A rating. The page does rank at position 16 in Yahoo and Bing. This seems odd to me, although I do have very few back links pointing to my site. If the duplicate content is likely to be causing me problems what would be the best way to knock the zapto.org results out of Google
Technical SEO | | ipr1010 -
Avoiding Cannibalism and Duplication with content
Hi, For the example I will use a computers e-commerce store... I'm working on creating guides for the store -
Technical SEO | | BeytzNet
How to choose a laptop
How to choose a desktop I believe that each guide will be great on its own and that it answers a specific question (meaning that someone looking for a laptop will search specifically laptop info and the same goes for desktop). This is why I didn't creating a "How to choose a computer" guide. I also want each guide to have all information and not to start sending the user to secondary pages in order to fill in missing info. However, even though there are several details that are different between the laptops and desktops, like importance of weight, screen size etc., a lot of things the checklist (like deciding on how much memory is needed, graphic card, core etc.) are the same. Please advise on how to pursue it. Should I just write two guides and make sure that the same duplicated content ideas are simply written in a different way?0 -
Category URL Duplicate Content
I've recently been hired as the web developer for a company with an existing web site. Their web architecture includes category names in product urls, and of course we have many products in multiple categories thus generating duplicate content. According to the SEOMoz Site Crawl, we have roughly 1600 pages of duplicate content, I expect primarily from this issue. This is out of roughly 3600 pages crawled. My questions are: 1. Fixing this for the long term will obviously mean restructuring the URLs for the site. Is this worthwhile and what will the ramifications be of performing such a move? 2. How can I determine the level and extent of the effects of this duplicated content? 3. Is it possible the best course of action is to do nothing? The site has many, many other issues, and I'm not sure how highly to prioritize this problem. In addition, the IT man is highly doubtful this is causing an SEO issue, and I'm going to need to be able to back up any action I request. I do feel I will need to strongly justify any possible risks this level of site change could cause. Thanks in advance, and please let me know if any more information is needed.
Technical SEO | | MagnetsUSA0 -
API for testing duplicate content
Does anyone know a service or API or php lib to compare two (or more) pages and to return their similiarity (Level-3-Shingles). API would be greatly prefered.
Technical SEO | | Sebes0 -
Duplicate Page Content and Titles
A few weeks ago my error count went up for Duplicate Page Content and Titles. 4 errors in all. A week later the errors were gone... But now they are back. I made changes to the Webconfig over a month ago but nothing since. SEOmoz is telling me the duplicate content is this http://www.antiquebanknotes.com/ and http://www.antiquebanknotes.com Thanks for any advise! This is the relevant web.config. <rewrite><rules><rule name="CanonicalHostNameRule1"><match url="(.*)"><conditions><add input="{HTTP_HOST}" pattern="^www.antiquebanknotes.com$" negate="true"></add></conditions>
Technical SEO | | Banknotes
<action type="Redirect" url="<a href=" http:="" www.antiquebanknotes.com="" {r:1"="">http://www.antiquebanknotes.com/{R:1}" />
</action></match></rule>
<rule name="Default Page" enabled="true" stopprocessing="true"><match url="^default.aspx$"><conditions logicalgrouping="MatchAll"><add input="{REQUEST_METHOD}" pattern="GET"></add></conditions>
<action type="Redirect" url="/"></action></match></rule></rules></rewrite>0