Duplicate without user-selected canonical excluded
-
We have pdf files uploaded in the media of wordpress and used in our website. As these pdfs are duplicate content of the original publishers, we have marked links to these pdf urls as nofollow. These pages are also disallowed in robots.txt
Now, Google Search Console has shown these pages Excluded as "Duplicate without user-selected canonical"
As it comes out we cannot use canonical tag with pdf pages so as to point to the original pdf source
If we embed a pdf viewer in our website and fetch the pdfs by passing the urls of the original publisher, would the pdfs be still read as text by google and again create duplicate content issue? Another thing, when the pdf expires and is removed, it would lead to 404 error.
If we direct our users to the third party website, then it would add up to our bounce rate.
What should be the appropriate way to handle duplicate pdfs?
Thanks
-
From what I have read, so much of the web is duplicate content so it really doesn't matter if the pdf is on other sites; let google figure it out. (example, every car brand dealer has a pdf of the same car model brochure on their dealer site) No big deal. Visitors will be landing on your site from other search relevance - the duplicate pdf doesn't matter. Just my take. Adrian
-
Sorry, I mean pdf files only
-
As the pdf pages are marked as a duplicate and not the pdf files, then you should check which page has duplicate content compared to it, and take the needed measures (canonical tags or 301 redirect) form the page with less rank to the page with more rank. Alternatively, you can edit the content so that it isn't anymore duplicate.
If I had a link to the site and duplicate pages, I would be able to give you a more detailed response.
Daniel Rika - Dalerio Consulting
https://dalerioconsulting.com/
info@dalerioconsulting.com -
Hello Daniel
The pdfs are duplicates from another site.
The thing is that we have already disallowed the pdfs in the robots.txt file.
Now, what happened is this - We have a set of pages (let's call them content pages) which we had disallowed in the robots file as they had thin content. Those pages have links to their respective third party pdfs, which have been marked as nofollow. The pdfs are also disallowed in the robots file.
Few days back, we improved our content pages and removed them from robots file so that they can be indexed. Pdfs are still disallowed. Despite being disallowed, we have come across this issue with the pdf pages as "Duplicate without user-selected canonical."
I hope I make myself clear. Any insights now please.
-
If the pdfs are duplicate within your own site, then the best solution would be for you to link to the same document from different sources. Then you can delete the duplicated documents and 301 redirect them to the original.
If the pdfs are duplicate from another site, then disallowing them on robots.txt will stop them from being marked as a duplicate, as the crawler will not be able to access them at all. It will just take some time for them to be updated on google search console.
If however, you want to add canonical tags to the pdf documents (or other non-HTML documents), you can add it to the HTTP header through the .htaccess file. You can find a tutorial on how to do that in this article.
Daniel Rika - Dalerio Consulting
https://dalerioconsulting.com/
info@dalerioconsulting.com
Got a burning SEO question?
Subscribe to Moz Pro to gain full access to Q&A, answer questions, and ask your own.
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
Geo-Targeted Sub-Domains & Duplicate Content/Canonical
For background the sub domain structure here is inherited and commited to due to tech restrictions with some of our platforms. The brand I work with is splitting out their global site into regional sub sites (not too relevant but this is in order to display seasonal product in different hemispheres and to link to stores specific to the region). All sub-domains except EU will be geo-targeted to their relevant country. Regions and sub domains for reference: AU - Australia CA - Canada CH - Switzeraland EU - All Euro zone countries NZ - New Zealand US - United States This will be done with Wordpress multisite. The set up allows to publish content on one 'master' sub site and then decide which other sub sites to 'broadcast' to. Some content is specific to a sub-domain/region so no issue with duplicate and can set the sub-site version as canonical. However some content will appear on all sub-domains. au.example.com/awesome-content/ nz.example.com/awesome-content/ Now first question is since these domains are geo-targeted should I just have them all canonical to the version on that sub-domain? eg Or should I still signal the duplicate content with one canonical version? Essentially the top level example.com exists as a site only for publishing purposes - if a user lands on the top level example.com/awesome-content/ they are given a pop up to select region and redirected to the relevant sub-domain version. So I'm also unsure whether I want that content indexed at all?? I could make the top level example.com versions of all content be the canonical that all others point to eg. and rely on geo-targeting to have the right links show in the right search locations. I hope that's kind of clear?? Obviously I find it confusing and therefore hard to relay! Any feedback at all gratefully received. Cheers, Steve
Intermediate & Advanced SEO | | SteveHoney0 -
About duplicate content
We have to products: - loan for a new car
Intermediate & Advanced SEO | | KBC
- load for a second hand car Except for title tag, meta desc and H1, the content is of course very similmar. Are these pages considered as duplicate content? https://new.kbc.be/product/lenen/voertuig/autolening-tweedehands-auto.html
https://new.kbc.be/product/lenen/voertuig/autolening-nieuwe-auto.html thanks for the advice,0 -
Appropriate use of rel canonical
Hey Guys,I'm a bit stuck. My on-page grade indicated the following two issues and I need to find how how to fix both issues.If you have a solution, could you please let me know how to address these issues? It's all a bit intimidating at the moment!!Thank you so much..****************************************************************************************************************************************Appropriate Use of Rel Canonical If the canonical tag is pointing to a different URL, engines will not count this page as the reference resource and thus, it won't have an opportunity to rank. Make sure you're targeting the right page (if this isn't it, you can reset the target above) and then change the canonical tag to reference that URL. Recommendation: We check to make sure that IF you use canonical URL tags, it points to the right page. If the canonical tag points to a different URL, engines will not count this page as the reference resource and thus, it won't have an opportunity to rank. If you've not made this page the rel=canonical target, change the reference to this URL. NOTE: For pages not employing canonical URL tags, this factor does not apply. No More Than One Canonical URL Tag The canonical URL tag is meant to be employed only a single time on an individual URL (much like the title element or meta description). To ensure the search engines properly parse the canonical source, employ only a single version of this tag. Recommendation: Remove all but a single canonical URL tag
Intermediate & Advanced SEO | | StoryScout1 -
Duplicate Content For E-commerce
On our E-commerce site, we have multiple stores. Products are shown on our multiple stores which has created a duplicate content problem. Basically if we list a product say a shoe,that listing will show up on our multiple stores I assumed the solution would be to redirect the pages, use non follow tags or to use the rel=canonical tag. Are there any other options for me to use. I think my best bet is to use a mixture of 301 redirects and canonical tags. What do you recommend. I have 5000+ pages of duplicate content so the problem is big. Thanks in advance for your help!
Intermediate & Advanced SEO | | pinksgreens0 -
HTTP Header Canonical Tags
I want to be able to add canonical tags to http headers of individual URL's using .htacess, but I can't find any examples for how to do this. The only example I found was when specifying a file: http://www.seomoz.org/blog/how-to-advanced-relcanonical-http-headers N.B. It's not possible to add regular canonical tags to the of my pages as they're dynamically generated. I was trying to add the following to the .htaccess in order to add a canonical tag in the header of the page http://frugal-father.com/is-finance-in-the-uk-too-london-centric/, but I've checked with Live HTTP headers and the canonical line isn't showing : <files "is-finance-in-the-uk-too-london-centric="" "="">Header add Link "<http: frugal-father.com="">; rel="canonical"'</http:></files> Any ideas?
Intermediate & Advanced SEO | | AndrewAkesson0 -
Canonical URLs and Sitemaps
We are using canonical link tags for product pages in a scenario where the URLs on the site contain category names, and the canonical URL points to a URL which does not contain the category names. So, the product page on the site is like www.example.com/clothes/skirts/skater-skirt-12345, and also like www.example.com/sale/clearance/skater-skirt-12345 in another category. And on both of these pages, the canonical link tag references a 3rd URL like www.example.com/skater-skirt-12345. This 3rd URL, used in the canonical link tag is a valid page, and displays the same content as the other two versions, but there are no actual links to this generic version anywhere on the site (nor external). Questions: 1. Does the generic URL referenced in the canonical link also need to be included as on-page links somewhere in the crawled navigation of the site, or is it okay to be just a valid URL not linked anywhere except for the canonical tags? 2. In our sitemap, is it okay to reference the non-canonical URLs, or does the sitemap have to reference only the canonical URL? In our case, the sitemap points to yet a 3rd variation of the URL, like www.example.com/product.jsp?productID=12345. This page retrieves the same content as the others, and includes a canonical link tag back to www.example.com/skater-skirt-12345. Is this a valid approach, or should we revise the sitemap to point to either the category-specific links or the canonical links?
Intermediate & Advanced SEO | | 379seo0 -
Canonical tag vs 301
What is the reason that 301 is preferred and not rel canonical tag when it comes to implementing redirect. Page rank will be lost in both cases. So, why prefer one over the other ?
Intermediate & Advanced SEO | | seoug_20050 -
ECommerce products duplicate content issues - is rel="canonical" the answer?
Howdy, I work on a fairly large eCommerce site, shop.confetti.co.uk. Our CMS doesn't allow us to have 1 product with multiple colour and size options so we created individual product pages for each product variation. This of course means that we have duplicate content issues. The layout of the shop works like this; there is a product group page (here is our disposable camera group) and individual product pages are below. We also use a Google shopping feed. I'm sure we're being penalised as so many of the products on our site are duplicated so, my question is this - is rel="canonical" the best way to stop being penalised and how can I implement it? If not, are there any better suggestions? Also, we have targeted some long-tail keywords in some of the product descriptions so will using rel-canonical effect this or the Google shopping feed? I'd love to hear experiences from people who have been through similar things and what the outcome was in terms of ranking/ROI. Thanks in advance.
Intermediate & Advanced SEO | | Confetti_Wedding0