PDFs - Dupe Content
-
Hi
I have some PDFs linked to from a page with little content, so I'm thinking it's best to extract the copy from the PDF and put it on-page as body text, with the PDF still linked to. Will this count as dupe content?
Or is it best to use a PDF plugin so the page opens the PDF automatically and gets its content that way?
Cheers
Dan
-
Should be different, but you would have to look at them to make sure.
-
PS - is a PDF-to-HTML converter different from a plugin that loads the PDF as an open page when you click it? Or the same thing?
-
That is what I was going to suggest: setting up a canonical in the HTTP header of the PDF, pointing back to the article.
https://support.google.com/webmasters/answer/139394?hl=en
As another option, you can just block access to the PDFs to keep them out of the index as well.
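To make the two options concrete, here is a rough sketch of the response headers involved. It assumes you can control the headers your server sends with the PDF; the function name `pdf_response_headers` and the example.com URL are made up for illustration, and the `X-Robots-Tag: noindex` header is just one way of keeping a PDF out of the index, not necessarily the right one for your setup.

```python
from typing import Optional


def pdf_response_headers(canonical_url: Optional[str] = None,
                         noindex: bool = False) -> dict:
    """Build HTTP response headers for a PDF download (hypothetical helper)."""
    headers = {"Content-Type": "application/pdf"}
    if canonical_url:
        # rel="canonical" sent as an HTTP Link header, pointing at the HTML page
        headers["Link"] = f'<{canonical_url}>; rel="canonical"'
    if noindex:
        # Alternative option: ask search engines not to index the PDF at all
        headers["X-Robots-Tag"] = "noindex"
    return headers


if __name__ == "__main__":
    # Placeholder URL: swap in the page that carries the PDF's copy as body text
    article = "https://www.example.com/article/"
    for name, value in pdf_response_headers(canonical_url=article).items():
        print(f"{name}: {value}")
```

You would normally pick one or the other: the canonical if you want the HTML page to get the credit, or noindex if you just want the PDF left out.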
-
Thanks Chris
Yes, you can canonicalise the PDF to the HTML (according to the comments on that article I just linked to, anyway).
-
Hi Dan,
Yes, PDFs are crawlable (sorry for the confusion!). If you were to put it into, say, a .zip or .rar (or similar), it wouldn't be crawled, or you could noindex the PDF, I guess. You would need to stick the PDF (download) behind something that couldn't be crawled. You could try rel=canonical, but I've never tried it with a PDF so I'm not sure how that would go.
Hope that enlightens you a bit.
-
Thanks Chris, although I thought PDFs were crawlable? http://www.lunametrics.com/blog/2013/01/10/seo-pdfs/
That's why I'm worried about dupe content if I use the content of the PDF as body text too. Or are you saying I should nofollow the link to the PDF if I use its content as body text, because it is considered dupe content in that scenario?
Ideally I want both: the PDF's copy used as body text on the page and the PDF as a linkable download, or the page as an embed of the open PDF via a plugin.
-
What would give the user the best experience is the real question. I would say put it on the page; then if the user is lacking a plugin they can still read it. If you have it as a downloadable PDF, it shouldn't be able to get crawled, thus avoiding the problem.
Hope that helps.