Duplicate without user-selected canonical excluded
-
We have PDF files uploaded to the WordPress media library and used on our website. As these PDFs duplicate the original publishers' content, we have marked the links to these PDF URLs as nofollow. These pages are also disallowed in robots.txt.
Now Google Search Console shows these pages as Excluded with the reason "Duplicate without user-selected canonical".
As it turns out, we cannot add a canonical tag to PDF pages to point to the original PDF source.
If we embed a PDF viewer in our website and fetch the PDFs by passing it the original publisher's URLs, would Google still read the PDFs as text and again create a duplicate content issue? Also, when a PDF expires and is removed by the publisher, this would lead to a 404 error.
If we send our users to the third-party website instead, it would add to our bounce rate.
What is the appropriate way to handle duplicate PDFs?
Thanks
-
From what I have read, so much of the web is duplicate content that it really doesn't matter if the PDF is on other sites; let Google figure it out. (For example, every car dealer has a PDF of the same model brochure on its dealer site.) No big deal. Visitors will land on your site through other relevant searches, so the duplicate PDF doesn't matter. Just my take. Adrian
-
Sorry, I mean PDF files only.
-
As the PDF pages are marked as duplicates and not the PDF files, you should check which page each one duplicates and take the needed measures (canonical tags or a 301 redirect) from the lower-ranking page to the higher-ranking page. Alternatively, you can edit the content so that it is no longer duplicate.
If I had a link to the site and the duplicate pages, I could give you a more detailed response.
Daniel Rika - Dalerio Consulting
https://dalerioconsulting.com/
info@dalerioconsulting.com
-
Hello Daniel,
The PDFs are duplicates from another site.
The thing is that we have already disallowed the PDFs in the robots.txt file.
Here is what happened: we have a set of pages (let's call them content pages) which we had disallowed in the robots.txt file because they had thin content. Those pages link to their respective third-party PDFs, and the links are marked as nofollow. The PDFs are also disallowed in the robots.txt file.
A few days ago, we improved our content pages and removed them from the robots.txt file so that they can be indexed. The PDFs are still disallowed. Despite that, the PDF pages are now reported as "Duplicate without user-selected canonical".
I hope that is clear. Any insights, please?
-
If the PDFs are duplicated within your own site, then the best solution is to link to a single copy of the document from every page that uses it. You can then delete the duplicate copies and 301 redirect their URLs to the original.
If the PDFs are duplicates from another site, then disallowing them in robots.txt will stop them from being marked as duplicates, as the crawler will not be able to access them at all. It will just take some time for the change to show up in Google Search Console.
If, however, you want to add canonical tags to the PDF documents (or other non-HTML documents), you can send them as an HTTP header set through the .htaccess file. You can find a tutorial on how to do that in this article.
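To confirm that the disallow rule actually covers the PDF URLs while leaving the content pages crawlable, the rules can be tested locally. This is only a minimal sketch using Python's standard `urllib.robotparser`; the robots.txt contents, the `/wp-content/uploads/pdfs/` folder and the `example.com` URLs are assumptions for illustration, not the poster's real setup.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt rules; the real file and paths will differ.
ROBOTS_TXT = """\
User-agent: *
Disallow: /wp-content/uploads/pdfs/
"""


def is_crawlable(robots_txt: str, url: str, user_agent: str = "Googlebot") -> bool:
    """Return True if the given robots.txt text lets user_agent fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# Hypothetical URLs: one PDF in the blocked folder, one content page.
pdf_url = "https://www.example.com/wp-content/uploads/pdfs/brochure.pdf"
page_url = "https://www.example.com/brochures/model-x/"

print(is_crawlable(ROBOTS_TXT, pdf_url))   # False: matches the Disallow rule
print(is_crawlable(ROBOTS_TXT, page_url))  # True: no rule applies
```

The standard-library parser does simple prefix matching and may not interpret wildcard rules the way Google does, so the robots.txt report in Search Console remains the authoritative check.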
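As a rough illustration of the header approach, the sketch below generates .htaccess rules that attach a `Link: <...>; rel="canonical"` header to individual PDF files. It assumes an Apache server with mod_headers enabled; the function name, file names and publisher URLs are hypothetical. Note also that Google can only see this header if it is allowed to crawl the PDF, so it would not take effect while the PDFs remain disallowed in robots.txt.

```python
def canonical_htaccess_rules(canonicals: dict[str, str]) -> str:
    """Build .htaccess <Files> blocks that send a canonical Link header.

    canonicals maps a local PDF file name to the URL of the original
    document on the publisher's site (both hypothetical here).
    """
    blocks = []
    for filename, canonical_url in canonicals.items():
        blocks.append(
            f'<Files "{filename}">\n'
            # Inner quotes are backslash-escaped for Apache's config syntax.
            f'  Header add Link "<{canonical_url}>; rel=\\"canonical\\""\n'
            f'</Files>'
        )
    return "\n".join(blocks)


# Hypothetical mapping of local copies to the publishers' originals.
rules = canonical_htaccess_rules({
    "brochure.pdf": "https://publisher.example.com/docs/brochure.pdf",
    "datasheet.pdf": "https://publisher.example.com/docs/datasheet.pdf",
})
print(rules)
```

The printed blocks can be pasted into the site's .htaccess file. A cross-domain canonical is a hint rather than a directive, so Google may still choose a different canonical URL.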
Daniel Rika - Dalerio Consulting
https://dalerioconsulting.com/
info@dalerioconsulting.com