How to get rid of duplicate content
-
I have duplicate content that looks like http://deceptionbytes.com/component/mailto/?tmpl=component&link=932fea0640143bf08fe157d3570792a56dcc1284 - however, I have 50 of these, all with different numbers on the end. Does this affect my search engine optimization, and how can I disallow this in my robots.txt file?
-
Hi Michelle,
In addition to what Alan said, I might take a couple more actions on this page. Since it sounds like you're a beginner, don't worry if you don't understand all this stuff, but I wanted to include it for anyone else reading this question.
I've also tried to include links to relevant sources where you can learn about each topic addressed.
1. Yes, add the canonical. This basically tells search engines that even though these pages all have different URL addresses, they are meant to be the same page.
http://www.seomoz.org/learn-seo/canonicalization
2. The "numbers at the end" are part of the URL parameters (the query string after the "?"), and there is a setting in Google Webmaster Tools that you can use to tell Google to ignore specific parameters. This is advanced stuff, and Google does a pretty good job these days of figuring this out on its own, so it's best not to adjust these settings unless you're comfortable doing so.
http://support.google.com/webmasters/bin/answer.py?hl=en&answer=1235687
3. Honestly, there's no reason for this page to appear in search results, or to waste search engine resources crawling it. So, if possible, I'd add a meta robots "noindex, follow" tag to the page's <head> element.
http://www.robotstxt.org/meta.html
4. Additionally, I'd slap a nofollow on any links pointing to these pages, and/or block crawling of these pages via robots.txt, because there is no reason to waste your search engine crawl allowance on them.
http://support.google.com/webmasters/bin/answer.py?hl=en&answer=96569
5. And finally, I think it's perfectly legitimate to block these throwaway pages using robots.txt. Alan has a good point about link juice: it's usually best not to block pages using robots.txt, but in this particular case I think it would be fine.
http://www.seomoz.org/learn-seo/robotstxt
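To make the "URL parameters" from point 2 concrete, here is a minimal Python sketch using only the standard library's urllib.parse. It splits the URL from the question into the address itself and its query-string parameters (tmpl and link), and shows that once the query string is dropped, another variant collapses to the same address, which is why search engines see these pages as duplicates. The split_parameters helper is a hypothetical name, and the second URL (link=abc123) is a made-up example, not one of the site's real pages.

```python
from urllib.parse import urlsplit, urlunsplit, parse_qs

def split_parameters(url: str):
    """Hypothetical helper: return the URL without its query string, plus the parsed parameters."""
    parts = urlsplit(url)
    # Rebuild scheme://host/path with an empty query and fragment.
    base = urlunsplit((parts.scheme, parts.netloc, parts.path, "", ""))
    return base, parse_qs(parts.query)

url = ("http://deceptionbytes.com/component/mailto/"
       "?tmpl=component&link=932fea0640143bf08fe157d3570792a56dcc1284")
base, params = split_parameters(url)
print(base)    # http://deceptionbytes.com/component/mailto/
print(params)  # {'tmpl': ['component'], 'link': ['932fea0640143bf08fe157d3570792a56dcc1284']}

# Made-up second duplicate: same page, different "link" value.
other = "http://deceptionbytes.com/component/mailto/?tmpl=component&link=abc123"
print(split_parameters(other)[0] == base)  # True
```

Only the link value differs between the 50 pages, so every one of them reduces to the same base address.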
Honestly, addressing all of these issues in this particular case probably won't make a huge impact on your SEO. But as you can see, there are multiple ways of dealing with the problem that touch on many of the fundamental techniques of Search Engine Optimization.
Finally, to answer your question directly: to disallow this directory in robots.txt, your file would look something like this:
User-agent: *
Disallow: /component/mailto/
This will block anything in the /component/mailto/ directory.
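Before uploading a rule like this, it is worth testing it against one of the real URLs. A minimal sketch with Python's built-in urllib.robotparser follows. It assumes the rule is written as the plain path prefix /component/mailto/ (taken from the URL in the question), because this parser only does prefix matching and does not understand * wildcards inside paths the way Googlebot does. The is_allowed helper and the "Googlebot" agent name are illustrative choices.

```python
from urllib.robotparser import RobotFileParser

# robots.txt contents, assuming the plain prefix rule /component/mailto/.
ROBOTS_TXT = """\
User-agent: *
Disallow: /component/mailto/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())  # parse() also marks the rules as freshly loaded

def is_allowed(url: str, agent: str = "Googlebot") -> bool:
    """Illustrative helper: may this crawler fetch url under the rules above?"""
    return parser.can_fetch(agent, url)

blocked = ("http://deceptionbytes.com/component/mailto/"
           "?tmpl=component&link=932fea0640143bf08fe157d3570792a56dcc1284")
print(is_allowed(blocked))                       # False
print(is_allowed("http://deceptionbytes.com/"))  # True
```

Because the rule is under "User-agent: *", any crawler name gives the same answer here.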
Hope this helps. Best of luck with your SEO!
-
Michelle,
I agree with Alan. If you're confused by the rel=canonical tag, I recommend you read the SEOmoz Beginner's Guide to SEO, more specifically this page: http://www.seomoz.org/beginners-guide-to-seo/search-engine-tools-and-services. The whole guide goes through a lot of best practices, and even advanced SEOs can use it as a kind of "bible".
Hope this helps
-
100%, the best move forward.
-
Link juice flows through links only if the linked page is in the index. If it isn't, the link juice just goes up in smoke and is wasted, so you don't want to link to a page that is not indexed.
A canonical tag tells the search engine to give the credit to the page named in the canonical tag.
So a canonical tag on page.html?id=5 pointing to page.html will tell the search engine they are the same page, and to give the credit to the canonical.
This is how to create a canonical tag:
<link rel="canonical" href="http://mycanonialpage.com/page.html/" />
-
Link juice leaks?? Canonical tag? Ummmmm, I thought I was fairly smart until just this minute. I have NO idea what you are talking about.
-
Don't use robots.txt.
You will cause link juice leaks for each link that points to a page behind a robots.txt exclusion.
The best thing to do is use a canonical tag pointing to http://deceptionbytes.com/component/mailto
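Once the canonical tag is in place, you can confirm that a duplicate page actually serves it. Below is a minimal Python sketch using the standard library's html.parser. The sample markup is a hypothetical <head> for one of the mailto pages using the canonical target suggested above, not the site's real HTML, and CanonicalFinder / find_canonical are illustrative names.

```python
from html.parser import HTMLParser

class CanonicalFinder(HTMLParser):
    """Records the href of the first <link rel="canonical"> tag it sees."""

    def __init__(self):
        super().__init__()
        self.canonical = None

    def handle_starttag(self, tag, attrs):
        # Tag and attribute names arrive lowercased; attribute values can be None.
        # Self-closing <link ... /> tags also land here, because the default
        # handle_startendtag() calls handle_starttag().
        attrs = dict(attrs)
        rel_values = (attrs.get("rel") or "").lower().split()
        if tag == "link" and "canonical" in rel_values and self.canonical is None:
            self.canonical = attrs.get("href")

def find_canonical(html: str):
    """Return the canonical URL declared in html, or None if there isn't one."""
    finder = CanonicalFinder()
    finder.feed(html)
    finder.close()
    return finder.canonical

# Hypothetical markup for one of the duplicate mailto pages.
duplicate_page = """<html><head>
<title>Email this link</title>
<link rel="canonical" href="http://deceptionbytes.com/component/mailto" />
</head><body></body></html>"""

print(find_canonical(duplicate_page))  # http://deceptionbytes.com/component/mailto
```

Running the same check over the HTML of all 50 variants should return the same canonical URL each time.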
Related Questions
-
Content Issues: Duplicate Content
Hi there
Technical SEO | | Kingagogomarketing
Moz flagged the following content issues: the page has duplicate content and missing canonical tags.
What is the best solution?
Industrial Flooring » IRL Group Ltd: https://irlgroup.co.uk/industrial-flooring/
Industrial Flooring » IRL Group Ltd: https://irlgroup.co.uk/index.php/industrial-flooring
Industrial Flooring » IRL Group Ltd: https://irlgroup.co.uk/index.php/industrial-flooring/
-
Duplicate content issue
Hi, A client of ours has one URL for the moment (https://aalst.mobilepoint.be/) and wants to create a second one with exactly the same content (https://deinze.mobilepoint.be/). Will that mean Google punishes the second one because of duplicate content? What are the recommendations?
Technical SEO | | conversal
-
WooCommerce Duplicate Page Content Issue
Hi, I'm receiving a duplicate content error. It says that this URL:
https://kidsinministry.org/childrens-ministry-curriculum/?option=com_content&task=view&id=20&Itemid=41
is a duplicate of this one:
http://kidsinministry.org/childrens-ministry-curriculum
I'm using WordPress and WooCommerce, and I'm not really sure how to even address this. I tried adding this to .htaccess, but it didn't redirect the URL:
# 301 Redirects
Redirect 301 https://kidsinministry.org/childrens-ministry-curriculum/?option=com_content&task=view&id=20&Itemid=41 http://kidsinministry.org/childrens-ministry-curriculum/
Anyone have any ideas? Thanks!
Technical SEO | | a_toohill
-
How to deal with duplicated content on product pages?
Hi, I have a webshop with products in different sizes and colours. Each item has its own URL with almost the same content (title tag, product description, etc.). To prevent duplicate content, I am wondering what the best solution is, keeping in mind that:
- it is impossible to create one page/URL per product with filters for colour and size;
- it is impossible to rewrite the product descriptions to make them unique.
I'm considering canonicalizing the rest of the colour/size variations, but the disadvantage is that if the product is out of stock, it disappears from the website. Looking forward to your opinions and solutions.
Jeroen
Technical SEO | | Digital-DMG
-
Duplicate Content within Site
I'm very new here... been reading a lot about Panda and duplicate content. I have a main website and a mobile site (same domain - m.domain.com). I've copied the same text over to those other web pages. Is that okay? Or is that considered duplicate content?
Technical SEO | | CalicoKitty2000
-
Does turning website content into PDFs for document sharing sites cause duplicate content?
Website content is 9 tutorials published to unique URLs, with a contents page linking to each lesson. If I make a PDF version for distribution on document-sharing websites, will it create a duplicate content issue? The objective is to get a half-decent link and traffic to supplementary opt-in downloads.
Technical SEO | | designquotes
-
301ed Pages Still Showing as Duplicate Content in GWMT
I thank anyone reading this for their consideration and time. We are a large site with millions of URLs for our product pages. We are also a textbook company, so by nature, our products have two separate ISBNs: a 10 digit and a 13 digit form. Thus, every one of our books has at least two pages (10 digit and 13 digit ISBN page). My issue is that we have established a 301 for all the 10 digit URLs so they automatically redirect to the 13 digit page. This fix has been in place for months. However, Google still reports that they are detecting thousands of pages with duplicate title and meta tags. Google is referring to these page URLs that I already have 301ed to the canonical version many months ago! Is there anything that I can do to fix this issue? I don't understand what I am doing wrong. Example:
Technical SEO | | dfinn
http://www.bookbyte.com/product.aspx?isbn=9780321676672
http://www.bookbyte.com/product.aspx?isbn=032167667X
As you can see, the 10-digit ISBN page 301s to the 13-digit canonical version. Google reports that it has detected duplicate title and meta tags between the two pages, and there are thousands of these duplicate pages listed. To add some further context: the ISBN is just a parameter that allows us to provide content when someone searches for a product with the 10- or 13-digit ISBN. The 13-digit version of the page is the only physical page that exists; the 10-digit version is only part of the virtual URL structure of the website. This is why I cannot simply change the title and meta tags of the 10-digit pages: they only exist in the sense that the URL redirects to the 13-digit version. Also, we submit a sitemap of all the 13-digit pages every day, so Google knows exactly what our physical URL structure is. I have submitted this question to the GWMT forums and received no replies.
-
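For the ISBN question above, the 10-digit to 13-digit mapping that those 301s rely on is pure arithmetic: prefix 978, keep the first nine characters, and recompute the check digit with alternating weights 1 and 3. A minimal Python sketch, assuming the product.aspx?isbn= URL pattern from that example; redirect_target is a hypothetical helper, not the site's actual code, and the sketch does not validate the old ISBN-10 check character.

```python
def isbn10_to_isbn13(isbn10: str) -> str:
    """Convert a 10-character ISBN (last character may be 'X') to its 978-prefixed ISBN-13."""
    isbn10 = isbn10.replace("-", "").upper()
    if len(isbn10) != 10 or not isbn10[:9].isdigit():
        raise ValueError(f"not an ISBN-10: {isbn10!r}")
    core = "978" + isbn10[:9]  # drop the old ISBN-10 check character
    # ISBN-13 check digit: weights alternate 1, 3, 1, 3, ... across the 12 digits.
    total = sum(int(d) * (1 if i % 2 == 0 else 3) for i, d in enumerate(core))
    return core + str((10 - total % 10) % 10)

def redirect_target(isbn: str) -> str:
    """Hypothetical helper: the 13-digit product URL a request should end up on."""
    isbn13 = isbn10_to_isbn13(isbn) if len(isbn) == 10 else isbn
    return f"http://www.bookbyte.com/product.aspx?isbn={isbn13}"

print(redirect_target("032167667X"))
# http://www.bookbyte.com/product.aspx?isbn=9780321676672
```

This reproduces the pairing in the example URLs: 032167667X maps to 9780321676672.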
Duplicate Content Home Page
Hello, I am getting a Duplicate Content warning from SEOmoz for my home page:
http://www.teacherprose.com
http://www.teacherprose.com/index.html
I tried the code below in .htaccess:
redirect 301 /index.html http://www.teacherprose.com
This caused a "too many redirects" error in the browser. Any thoughts?
Thank you,
Eric
Technical SEO | | monthelie1