PDFs - Dupe Content
-
Hi
I have some PDFs linked from a page with little content, so I'm thinking it's best to extract the copy from the PDF and put it on the page as body text, while still linking to the PDF. Will this count as dupe content?
Or is it best to use a PDF plugin so the page opens the PDF automatically and gets its content that way?
Cheers
Dan
-
Should be different, but you would have to look at them to make sure.
-
PS - is a PDF-to-HTML converter different from a plugin that loads the PDF as an open page when you click it, or is it the same thing?
-
That is what I was going to suggest: setting up a canonical in the HTTP header of the PDF pointing back to the article.
https://support.google.com/webmasters/answer/139394?hl=en
As another option, you can just block access to the PDFs to keep them out of the index as well.
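To illustrate the two options above, here is a minimal Python sketch of the response headers a server might send with the PDF. The `Link` header with `rel="canonical"` and the `X-Robots-Tag: noindex` header are standard mechanisms, but the function names and the example.com URL are hypothetical, and treating "block access" as a noindex header (rather than, say, a robots.txt rule) is an assumption about what was meant.

```python
from typing import List, Optional, Tuple

Header = Tuple[str, str]


def canonical_link_header(article_url: str) -> Header:
    """Build the Link header that points a PDF back to its HTML article."""
    return ("Link", f'<{article_url}>; rel="canonical"')


def noindex_header() -> Header:
    """Build the header that asks search engines not to index the response."""
    return ("X-Robots-Tag", "noindex")


def pdf_response_headers(article_url: Optional[str] = None) -> List[Header]:
    """Return headers for a PDF response (hypothetical helper).

    With an article URL, the PDF is canonicalised to that page.
    Without one, the PDF is kept out of the index instead.
    """
    headers: List[Header] = [("Content-Type", "application/pdf")]
    if article_url:
        headers.append(canonical_link_header(article_url))
    else:
        headers.append(noindex_header())
    return headers


if __name__ == "__main__":
    # Hypothetical article URL, used only for illustration.
    for name, value in pdf_response_headers("https://www.example.com/articles/guide.html"):
        print(f"{name}: {value}")
```

How these headers actually get attached depends on the web server or framework in use, so this only shows what the header values would look like.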
-
Thanks Chris
Yes, you can canonicalise the PDF to the HTML (according to the comments on that article I just linked to, anyway).
-
Hi Dan,
Yes, PDFs are crawlable (sorry for the confusion!). If you were to put the PDF into, say, a .zip or .rar (or similar), it wouldn't be crawled, or you could noindex the link, I guess. You would need to put the PDF download behind something that can't be crawled. You could try rel=canonical, but I've never tried it with a PDF, so I'm not sure how that would go.
Hope that enlightens you a bit.
-
Thanks Chris, although I thought PDFs were crawlable? http://www.lunametrics.com/blog/2013/01/10/seo-pdfs/
That's why I'm worried about dupe content if I use the content of the PDF as body text too. Or are you saying I should nofollow the link to the PDF if I use its content as body text, because it is considered dupe content in that scenario?
Ideally I want both: the copy used as body text on the page and the PDF as a linkable download, or the page as an embed of the open PDF via a plugin.
-
What would give the user the best experience is really the question. I would say put it on the page; then if the user is lacking a plugin they can still read it. If you have it as a downloadable PDF, it shouldn't be able to get crawled, thus avoiding the problem.
Hope that helps.