Moz Q&A is closed.
After more than 13 years, and tens of thousands of questions, Moz Q&A closed on 12th December 2024. Whilst we’re not completely removing the content - many posts will still be possible to view - we have locked both new posts and new replies. More details here.
Duplicate without user-selected canonical excluded
-
We have pdf files uploaded to the WordPress media library and used on our website. As these pdfs duplicate content from the original publishers, we have marked links to these pdf urls as nofollow. These pages are also disallowed in robots.txt.
Now, Google Search Console shows these pages as Excluded with the status "Duplicate without user-selected canonical".
As it turns out, we cannot use a canonical tag with pdf pages to point to the original pdf source.
If we embed a pdf viewer in our website and fetch the pdfs by passing the urls of the original publisher, would Google still read the pdfs as text and again create a duplicate content issue? Another thing: when a pdf expires and is removed, it would lead to a 404 error.
If we direct our users to the third-party website, it would add to our bounce rate.
What would be the appropriate way to handle duplicate pdfs?
Thanks
-
From what I have read, so much of the web is duplicate content that it really doesn't matter if the pdf is on other sites; let Google figure it out. (For example, every car brand dealer has a pdf of the same car model brochure on their dealer site.) No big deal. Visitors will be landing on your site because of its relevance to other searches; the duplicate pdf doesn't matter. Just my take. Adrian
-
Sorry, I mean pdf files only
-
As it is the pdf pages that are marked as duplicate and not the pdf files, you should check which page each one duplicates and take the needed measures (canonical tags or a 301 redirect) from the lower-ranking page to the higher-ranking page. Alternatively, you can edit the content so that it is no longer duplicate.
If I had a link to the site and duplicate pages, I would be able to give you a more detailed response.
Daniel Rika - Dalerio Consulting
https://dalerioconsulting.com/
info@dalerioconsulting.com
-
Hello Daniel
The pdfs are duplicates from another site.
The thing is that we have already disallowed the pdfs in the robots.txt file.
Now, what happened is this - We have a set of pages (let's call them content pages) which we had disallowed in the robots file as they had thin content. Those pages have links to their respective third party pdfs, which have been marked as nofollow. The pdfs are also disallowed in the robots file.
A few days back, we improved our content pages and removed them from the robots file so that they can be indexed. The pdfs are still disallowed. Despite being disallowed, the pdf pages are now reported as "Duplicate without user-selected canonical."
I hope I make myself clear. Any insights now please.
-
If the pdfs are duplicate within your own site, then the best solution would be for you to link to the same document from different sources. Then you can delete the duplicated documents and 301 redirect them to the original.
If the pdfs are duplicates from another site, then disallowing them in robots.txt will stop them from being marked as duplicate, as the crawler will not be able to access them at all. It will just take some time for the status to be updated in Google Search Console.
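To double-check that the disallow rule really covers the pdf urls, a quick local test can help. This is only a rough sketch using Python's standard `urllib.robotparser`; the robots.txt rules, the `/wp-content/uploads/` path, and the example.com urls are assumptions for illustration, not taken from the actual site. Python's parser is also not Google's, so treat the result as a sanity check of the rules rather than a prediction of what Search Console will report.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt rules; the real file is not shown in this thread.
ROBOTS_TXT = """\
User-agent: *
Disallow: /wp-content/uploads/
"""


def is_crawlable(robots_txt: str, url: str, user_agent: str = "Googlebot") -> bool:
    """Return True if the given user agent may fetch the URL under these rules."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# Hypothetical urls: one pdf in the media folder, one regular content page.
pdf_url = "https://www.example.com/wp-content/uploads/2020/01/brochure.pdf"
page_url = "https://www.example.com/brochures/model-x/"

print("pdf crawlable:", is_crawlable(ROBOTS_TXT, pdf_url))
print("page crawlable:", is_crawlable(ROBOTS_TXT, page_url))
```

If the pdf url comes back as crawlable, the disallow pattern does not match where the files actually live and would need adjusting.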
If, however, you want to add canonical tags to the pdf documents (or other non-HTML documents), you can add them in the HTTP header through the .htaccess file. You can find a tutorial on how to do that in this article.
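For reference, the canonical for a non-HTML file is sent as an HTTP `Link` header with `rel="canonical"`. The sketch below, in Python, only shows what the header value looks like and how to read it back from a response; the publisher url is a made-up placeholder, and both helper functions are hypothetical names for illustration. On an Apache server the same value would typically be sent from .htaccess with the mod_headers `Header` directive.

```python
import re
from typing import Optional


def canonical_link_header(canonical_url: str) -> str:
    """Build the value of an HTTP Link header that declares a canonical URL."""
    return f'<{canonical_url}>; rel="canonical"'


def extract_canonical(link_header: str) -> Optional[str]:
    """Return the rel="canonical" target from a Link header value, if present."""
    # Each link looks like: <url>; param=value; ... and links are comma-separated.
    for match in re.finditer(r'<([^>]+)>\s*;\s*([^,]*)', link_header):
        url, params = match.groups()
        if re.search(r'rel\s*=\s*"?canonical"?', params, re.IGNORECASE):
            return url
    return None


# Hypothetical url of the original publisher's pdf.
original_pdf = "https://publisher.example.com/docs/brochure.pdf"

header_value = canonical_link_header(original_pdf)
print("Link:", header_value)
print("canonical target:", extract_canonical(header_value))
```

After the server is configured, fetching the pdf's response headers and passing the `Link` value to `extract_canonical` is one way to confirm the header is actually being sent.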
Daniel Rika - Dalerio Consulting
https://dalerioconsulting.com/
info@dalerioconsulting.com
Related Questions
-
Duplicate Content
Let's say a blog is publishing original content. Now let's say a second blog steals that original content via bot and publishes it as its own. Now further assume the original blog doesn't notice this for several years. How much damage could this do to blog A for Google results? Any opinions?
Intermediate & Advanced SEO | | CYNOT0 -
Is Google ignoring my canonicals?
Hi, We have rel=canonical set up on our ecommerce site but Google is still indexing pages that have rel=canonical. For example, http://www.britishbraces.co.uk/braces/novelty.html?colour=7883&p=3&size=599 http://www.britishbraces.co.uk/braces/novelty.html?p=4&size=599 http://www.britishbraces.co.uk/braces/children.html?colour=7886&mode=list These are all indexed but all have rel=canonical implemented. Can anyone explain why this has happened?
Intermediate & Advanced SEO | | HappyJackJr0 -
Duplicated Content with Index.php
Good Afternoon, My website uses Joomla CMS and has the htaccess rewrite code enabled to ensure the use of search engine friendly URLs (SEF's). While browsing the crawl diagnostics I have found that Moz considers the /index.php URL a duplicate of our root. I was always under the impression that the htaccess rewrite took care of that issue, and obviously I would like to address it. I attempted to create a 301 redirect from the index.php URL to the root but ran into an issue when attempting to log in to the admin portion of the website, as the redirect sent me back to the homepage. I was curious if anyone had advice for handling the index.php duplication issue, specifically with Joomla. Additionally, I have confirmed that in Google Webmasters, under URL parameters, the index.php parameter is set as 'Representative URL'.
Intermediate & Advanced SEO | | BrandonEML0 -
Wildcard Redirects & Canonical Tags
I have an interesting situation.
Current URLs Example 1:
www.domain.com/red-widgets-cid-1234.html
www.domain.com/red-widgets-cid-1234-1.html
www.domain.com/red-widgets-cid-1234-1-1.html
Canonical on All Above URLs:
www.domain.com/red-widgets-cid-1234.html
New URL:
www.domain.com/red-widgets-cid-4567.html
Current URLs Example 2:
www.domain.com/red-widgets-cid-1234+10.html
www.domain.com/red-widgets-cid-1234+10-1.html
www.domain.com/red-widgets-cid-1234+10-1-1.html
Canonical on All Above URLs:
www.domain.com/red-widgets-cid-1234+10.html
New URL:
www.domain.com/red-widgets-cid-6789.html
Current URLs Example 3:
www.domain.com/red-widgets-cid-1234+10+5.html
www.domain.com/red-widgets-cid-1234+10+5-1.html
www.domain.com/red-widgets-cid-1234+10+5-1-1.html
Canonical on All Above URLs:
www.domain.com/red-widgets-cid-1234+10+5.html
New URL:
www.domain.com/american-red-widgets-cid-6789+5.html
I want to make sure all variations of the above URLs redirect to the new URLs. However, as you see in Example 3, we are dealing with variables that are passed on (+5 in this case).
Question 1: What wildcard 301 redirect / regular expression can I use to tackle these?
Question 2: If we redirect www.domain.com/red-widgets-cid-1234+10+5.html to www.domain.com/red-widgets-cid-6789+5.html and www.domain.com/red-widgets-cid-6789+5.html contains the canonical tag www.domain.com/american-red-widgets-cid-6789+5.html, any concerns or red flags here?
Intermediate & Advanced SEO | | NakulGoyal0 -
Redirect 301 or Canonical.
Hello all, I have a page with a long post title and url path name (more than 70 characters and 115 characters respectively). This page has many visits, but I am changing the SEO website structure according to SEOMoz and forum guidelines, so: I WILL CREATE A DUPLICATE PAGE WITH THE SAME INFO. The original has been flagged in the SEO tools for long names (>70) and long url path names (>115). My question is which option should I use, and which would you recommend? 1. OPTION 1: Ideally I would like to keep the old post, so I should use the canonical tag, but my main concern is whether the search engines, even with the canonical in place, will penalise my SEO because there is still a post with bad SEO optimisation, or whether this is not the case because I already used the canonical. 2. OPTION 2: Eliminate the post and 301 redirect it to the new page to keep the juice. I would prefer option 1, as I keep both post and page, but only if search engines do not penalise my SEO when they detect a long post name and url path name. Thank you very much, Antonio
Intermediate & Advanced SEO | | aalcocer20030 -
Rel canonical and duplicate subdomains
Hi, I'm working with a site that has multiple sub domains of entirely duplicate content. So, the production level site that visitors see is (for made-up illustrative example): 123abc456.edu Then, there are sub domains which are used by different developers to work on their own changes to the production site, before those changes are pushed to production: Larry.123abc456.edu Moe.123abc456.edu Curly.123abc456.edu Google ends up indexing these duplicate sub domains, which is of course not good. If we add a canonical tag to the head section of the production page (and therefore all of the duplicate sub domains) will that cause some kind of problem... having a canonical tag on a page pointing to itself? Is it okay to have a canonical tag on a page pointing to that same page? To complete the example... In this example, where our production page is 123abc456.edu, our canonical tag on all pages (this page and therefore the duplicate subdomains) would be: Is that going to be okay and fix this without causing some new problem of a canonical tag pointing to the page it's on? Thanks!
Intermediate & Advanced SEO | | 945010 -
Duplicate content
I have just read http://www.seomoz.org/blog/duplicate-content-in-a-post-panda-world and I would like to know which option is the best fit for my case. I have the website http://www.hotelelgreco.gr and every image in the image library http://www.hotelelgreco.gr/image-library.aspx has a different url but is considered a duplicate of others in the library. Please suggest what I should do.
Intermediate & Advanced SEO | | socrateskirtsios0 -
Duplicate Content on Blog
I have a blog I'm setting up. I would like to have a mini-about block set up on every page that gives very brief information about me and my blog, as well as a few links to the rest of the site and some social sharing options. I worry that this will get flagged as duplicate content because a significant amount of my pages will contain the same information at the top of the page, front and center. Is there anything I can do to address this? Is it as much of a concern as I am making it? Should I work on finding some javascript/ajax method for loading that content into the page dynamically only for normal browser pageviews? Any thoughts or help would be great.
Intermediate & Advanced SEO | | grayloon0