PDF's - Dupe Content
-
Hi
I have some pdfs linked to from a page with little content. Hence thinking best to extract the copy from the pdf and have on-page as body text, and the pdf will still be linked too. Will this count as dupe content ?
Or is it best to use a pdf plugin so page opens pdf automatically and hence gives page content that way ?
Cheers
Dan
-
Should be different, but you would have to look at them to make sure.
-
ps - is a pdf to html coverter different from a plugin that loads the pdf as an open page when you click it ? or same thing ?
-
That is what I was going to suggest - setting up a canonical in the http header of the PDF back to the article
https://support.google.com/webmasters/answer/139394?hl=en
As another option, you can just block access to the PDFs to keep them out of the index as well.
-
thanks Chris
yes you can canonicalise the pdf to the html (according to the comments of that article i just linked to anyway)
-
Hi Dan,
Yes PDFs are crawlable (sorry for confusion!) if you were to put it into say a .zip or .rar (or similar) it wouldn't be crawled or you could no index the link i guess. You would need to stick the PDF (download) behind some thing that couldn't be crawled. You could try rel= canonical but I've never tried it with a PDF so i'm not sure how that would go.
Hope that enlightens you a bit.
-
Thanks Chris although i thought PDFS were crawlable??: http://www.lunametrics.com/blog/2013/01/10/seo-pdfs/
Hence why im worried about dupe content if use content of pdf as body text too OR are you saying should no-follow the link to the pdf if use its content as body text because it is considered dupe content in that scenario ?
Ideally i want both - the copy on it used as body text copy on page and the pdf a linkable download, or page as embed of open pdf via a plugin.
-
What would give the user the best experience is the really question,I would;d say put it on page then if the user is lacking a plugin they can still read it, if you have it as a downloadable PDF is shouldn't be able to get crawled and thus avoiding the problem.
Hope that helps.
Got a burning SEO question?
Subscribe to Moz Pro to gain full access to Q&A, answer questions, and ask your own.
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
How to deal with this duplicate content
Hello our websites offers prayer times in the US and UK. The problem is that we have nearby towns where the prayer times are the same and the pages (exp : https://prayer-times.us/prayer-times-lake-michigan-12258-en and https://prayer-times.us/prayer-times-lake-12147-en) are in duplicate . Same issue for this page https://prayer-time.uk/prayer-times-wallsend-411-en How can we solve this problem
On-Page Optimization | | Zakirou0 -
How to fix thin content issue?
Hello! I've checked my website via Moz and received "thin content" issue: "Your page is considered to have "thin content" if it has less than 50 words" But I definitely know that we have 5 text blocks with unique content, each block consist of more than 50 words. Do you have any ideas what may cause this issue? Thanks in advance, Yana
On-Page Optimization | | yanamazault0 -
Duplicate Content with ?Page ID's in WordPress
Hi there, I'm trying to figure out the best way to solve a duplicate content problem that I have due to Page ID's that WordPress automatically assigns to pages. I know that in order for me to resolve this I have to use canonical urls but the problem for me is I can't figure out the URL structure. Moz is showing me thousands of duplicate content errors that are mostly related to Page IDs For example, this is how a page's url should look like on my site Moz is telling me there are 50 duplicate content errors for this page. The page ID for this page is 82 so the duplicate content errors appear as follows and so on. For 47 more pages. The problem repeats itself with other pages as well. My permalinks are set to "Post Name" so I know that's not an issue. What can I do to resolve this? How can I use canonical URLs to solve this problem. Any help will be greatly appreciated.
On-Page Optimization | | SpaMedica0 -
Regarding Google Title 'Width' and changing Meta Titles w/o Penalty?
A vast majority of pages on my site are now too wide (the character count was fine prior to the March update). I want to go through and update them so they display properly and are not too wide.However, I am concerned, as my understanding was that changing Meta Titles is dangerous and can have a negative effect on your rankings and can cause real issues. Is this an opportunity to change my Titles all-together without any kind of penalty? Or can I only trim the end part? In summary: 1. Can I edit all of my Meta Titles without affecting my rankings? 2. If no, how do I edit them properly to fit within the proper width and not cause any issues? 3. If yes, I can go through and change all my Meta Titles to whatever extent and optimize them to reflect latest best practices? There are changes I wanted to make to all my meta titles but I've been afraid to... due to fear of rankings drops etc Any help with this would be greatly appreciated
On-Page Optimization | | lawfirm0 -
Is This Duplicate Content Hurting Our SERPs?
We sell 1000s of audio book title, many of which are published in more than one format (abridged, unabridged CD, and/or unabridged MP3) by the same publisher. Currently each title has its own page but the basic description of the title (story) is the same. Here is an example title that is offered in three formats. 44 Charles Street - Danielle Steel - abridged CD audiobook 44 Charles Street - Danielle Steel - MP3 CD audiobook 44 Charles Street - Danielle Steel - CD audiobook Each of the above pages has a different page title, a different URL, a different meta description however much of the body (from [Listen to a FREE Audio Clip] down is the same. Is this duplicate content hurting our SERPs?
On-Page Optimization | | lbohen1 -
Tags creating duplicated content issue?
Hello i believe a lot of us use tags in our blogs as a way to categorize content and make it easy searchable but this usually (at lease in my case) cause duplicate content creation. For example, if one article has 2 tags like "SEO" & "Marketing", then this article will be visible and listed in 2 urls inside the blog like this domain.com/blog/seo and domain.com/blog/marketing In case of a blog with 300+ posts and dozens of different tags this is creating a huge issue. My question is 1. Is this really bad? 2. If yes how to fix it without removing tags?
On-Page Optimization | | Lakiscy0 -
Archive of content
Hi there, I have recently joined a company to look after the e-marketing side of things, anyway the company I work for have been writing articles for a website that they own for over 2 years, probably about 200 or so unique articles on that website, however over the past year or so there has been no contribution to this site and was wondering if it would be worthwhile transferring these article over to our blog?, as this is where all the attention is in terms of marketing etc Kind Regards,
On-Page Optimization | | Paul780 -
Google's Page Layout Algorithm Change
Hello Everyone, Google says they've implemented this change because they are answering the complaints of users who have to search for actual content after they've clicked on a result. They go on to say users want to see content right away. Now while most of this talk is about ads, I wonder if this will also apply to websites that are image and flash heavy above the fold with very little content. I am working on a few auto dealer sites where 99% of the content above the fold are flash banners and images. Below all of this noise you can find about 200 words of text talking about their dealerships. I'd love to know everyone's thoughts on this...Does the new page layout algorithm change apply to only ads or to images and flash as well? Thanks
On-Page Optimization | | wparlaman0