Masses (5,168 issues found) of Duplicate content.
-
Hi Mozzers,
I have a site that has returned 5,168 issues with duplicate content.
Where would you start?
I started by sorting on Page Authority, highest first, from 28 all the way down to 1. I did want to use the rel=canonical tag, as the site has many redirects already.
The duplicates are caused by various category and cross-category pages and by search results such as ....page/1?show=2&sort=rand.
I was thinking of going down the line of a URL rewrite and changing the search anyway. Is it worth redirecting everything, in terms of results versus the effort of fixing all 5,168 issues?
Thanks
sm
-
Hi Guys,
Thanks for the responses. I'm going to have a look at the issue again with your suggestions in mind, and I'll keep you posted. Thanks again.
-
Don't look at individual URLs - at the scale of 5K plus, look at your site architecture and what kind of variants you're creating. For example, if you know that the show= and sort= parameters are a possible issue, you could go to Google and enter something like:
site:example.com inurl:show=
(warning: it will return pages with the word "show" in the URL, like "example.com/show-times" - not usually an issue, but it can be on rare occasion).
That'll give you a sense of how many cases that one parameter is creating. Odds are, you'll find a couple that are causing 500+ of the 5K duplicates, so start with those.
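If you have the flagged URLs in an export from your crawl, you can get a similar picture without going through Google. This is only a rough sketch: the sample URLs are made up, and the function name `count_query_params` is my own, not anything from a tool.

```python
from collections import Counter
from urllib.parse import urlsplit, parse_qsl


def count_query_params(urls):
    """Count how many of the given URLs carry each query parameter name."""
    counts = Counter()
    for url in urls:
        query = urlsplit(url).query
        # keep_blank_values=True so that "?show=" still counts as using "show".
        # A set is used so a parameter repeated in one URL is counted once.
        names = {name for name, _ in parse_qsl(query, keep_blank_values=True)}
        counts.update(names)
    return counts


# Hypothetical sample of URLs flagged as duplicates
flagged = [
    "http://example.com/widgets/page/1?show=2&sort=rand",
    "http://example.com/widgets/page/2?show=2&sort=price",
    "http://example.com/widgets/page/1?sort=rand",
    "http://example.com/widgets/",
]

# Parameters listed from most to least common
for name, n in count_query_params(flagged).most_common():
    print(name, n)
```

The parameters at the top of that list are the ones to deal with first.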
Search pagination is very tricky - you could canonicalize to "View All" as Chris Hill said, you could NOINDEX pages 2+, or you could try Google's new (but very complicated) way:
http://googlewebmastercentral.blogspot.com/2011/09/pagination-with-relnext-and-relprev.html
Problem is, that doesn't work on Bing and it's pretty easy to mess up.
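To make two of those options concrete, here is a minimal sketch of the head tags each one would put on a paginated page. The function name, the `strategy` values, and the `/widgets/page/N` URL pattern are all assumptions for illustration, and using "noindex, follow" rather than a bare "noindex" is my own choice, not something your platform requires.

```python
def pagination_head_tags(page, last_page, url_for_page, strategy="noindex"):
    """Return the <head> tags for one page of a paginated listing.

    strategy="noindex"   -> NOINDEX pages 2+ and leave page 1 alone
    strategy="prev_next" -> rel=prev / rel=next links between pages
    """
    tags = []
    if strategy == "noindex":
        if page > 1:
            tags.append('<meta name="robots" content="noindex, follow">')
    elif strategy == "prev_next":
        if page > 1:
            tags.append('<link rel="prev" href="%s">' % url_for_page(page - 1))
        if page < last_page:
            tags.append('<link rel="next" href="%s">' % url_for_page(page + 1))
    else:
        raise ValueError("unknown strategy: %r" % strategy)
    return tags


def widget_page_url(n):
    # Hypothetical URL pattern, assumed for this example only
    return "http://example.com/widgets/page/%d" % n


for tag in pagination_head_tags(2, 5, widget_page_url, strategy="prev_next"):
    print(tag)
```

The first and last pages are where the prev/next version is easiest to get wrong, since each of them should carry only one of the two links.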
The rel-canonical tag can scoop up sorts pretty well. You can also tell Google in Google Webmaster Tools what those parameters do, and whether to index them, but I've had mixed luck with that. If you're not having any serious problems, GWT is easy and worth a shot.
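As a rough sketch of the rel-canonical approach for sorts: strip the parameters that only change presentation and point the canonical at what is left. I'm assuming here that show and sort never change the actual content of the page; the names `PRESENTATION_PARAMS`, `canonical_url` and `canonical_tag` are made up for this example.

```python
from html import escape
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Assumption: these parameters only change how results are displayed
PRESENTATION_PARAMS = {"show", "sort"}


def canonical_url(url, drop=PRESENTATION_PARAMS):
    """Return the URL with presentation-only parameters and any fragment removed."""
    parts = urlsplit(url)
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key not in drop
    ]
    return urlunsplit((parts.scheme, parts.netloc, parts.path, urlencode(kept), ""))


def canonical_tag(url):
    """Build the <link rel="canonical"> element for the page at this URL."""
    return '<link rel="canonical" href="%s">' % escape(canonical_url(url), quote=True)


print(canonical_tag("http://example.com/widgets/page/1?show=2&sort=rand"))
```

Any parameter that does change the content (a search term, for example) stays in the canonical URL, so it should not go in the drop list.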
-
Have a look at your pagination too. If you've not got a 'show all' link it might be worth putting one in and making that the canonical. Should eliminate some of your duplicate content issues.
-
The last time I came across such an issue, I mostly started by making the 'easy' changes that reduced the number the most.
In that case, it was implementing a 301 to the www version of the site (cutting the errors in half) and putting a canonical on one search page.
This got the number down to the point where it was easier to make decisions on 'Is it worth making friendlier URLs?' and to discover more interesting places where duplicate content was being generated.
It's one of those things where I would always aim for 0 where I can. It usually means that the URL or site structure can be improved significantly, or it's such an easy fix that it's hard to justify not doing it.
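For the www redirect, the logic is roughly the following. This is only a sketch of the rule, assuming example.com as the bare domain; in practice you would normally set it up in the web server or CMS configuration rather than in application code, and the function name `www_redirect` is made up.

```python
from urllib.parse import urlsplit, urlunsplit


def www_redirect(url, bare_host="example.com"):
    """Return (301, target) if the URL is on the bare domain, otherwise None."""
    parts = urlsplit(url)
    # hostname is lower-cased and excludes any port
    if parts.hostname == bare_host:
        # Prefix the original netloc so an explicit port is preserved
        netloc = "www." + parts.netloc
        target = urlunsplit(
            (parts.scheme, netloc, parts.path, parts.query, parts.fragment)
        )
        return 301, target
    return None


print(www_redirect("http://example.com/widgets/page/1?sort=rand"))
print(www_redirect("http://www.example.com/widgets/"))
```

The path and query string are carried over unchanged, so every non-www URL maps to exactly one www URL.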
-
If it really is a URL issue, then you should be able to easily canonicalize the root pages and the rest should sort itself out. Start there and let the next spidering tell you where you stand.