PDFs - Dupe Content
-
Hi
I have some PDFs linked to from a page with little content, so I'm thinking it's best to extract the copy from the PDF and have it on-page as body text, with the PDF still linked to as well. Will this count as duplicate content?
Or is it best to use a PDF plugin so the page opens the PDF automatically and hence gives the page content that way?
Cheers
Dan
-
Should be different, but you would have to look at them to make sure.
-
ps - is a PDF-to-HTML converter different from a plugin that loads the PDF as an open page when you click it? Or the same thing?
-
That is what I was going to suggest - setting up a canonical in the HTTP header of the PDF back to the article:
https://support.google.com/webmasters/answer/139394?hl=en
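Since a PDF has no `<head>` to put a canonical tag in, the header version is set at the server. A minimal sketch in Apache (requires mod_headers); the filename and article URL here are hypothetical placeholders:

```apache
# Send a rel=canonical HTTP header with the PDF response,
# pointing search engines back to the HTML article page.
<Files "whitepaper.pdf">
  Header add Link '<https://www.example.com/article-page/>; rel="canonical"'
</Files>
```

The same `Link: <...>; rel="canonical"` response header can be set in nginx or at the application level; the key point is it travels with the PDF response rather than inside the file.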
As another option, you can just block access to the PDFs to keep them out of the index as well.
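If you go the blocking route, a robots.txt rule is the simplest way to stop the PDFs being crawled. A sketch, assuming (hypothetically) the PDFs all live in one folder:

```txt
# Stop all crawlers fetching anything in the PDF folder
User-agent: *
Disallow: /pdfs/
```

One caveat: robots.txt stops crawling rather than guaranteeing removal from the index, so a PDF that is already indexed (or heavily linked to) may still show up as a bare URL.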
-
Thanks Chris
Yes, you can canonicalise the PDF to the HTML (according to the comments on that article I just linked to, anyway).
-
Hi Dan,
Yes, PDFs are crawlable (sorry for the confusion!). If you were to put one into, say, a .zip or .rar (or similar) it wouldn't be crawled, or you could noindex the link, I guess. You would need to stick the PDF (download) behind something that couldn't be crawled. You could try rel=canonical, but I've never tried it with a PDF so I'm not sure how that would go.
Hope that enlightens you a bit.
-
Thanks Chris, although I thought PDFs were crawlable?? http://www.lunametrics.com/blog/2013/01/10/seo-pdfs/
Hence why I'm worried about duplicate content if I use the content of the PDF as body text too. Or are you saying I should nofollow the link to the PDF if I use its content as body text, because it is considered duplicate content in that scenario?
Ideally I want both - the copy from it used as body text on the page and the PDF as a linkable download, or the page as an embed of the open PDF via a plugin.
-
What would give the user the best experience is the real question. I would say put it on the page; then, if the user is lacking a plugin, they can still read it. If you have it as a downloadable PDF it shouldn't get crawled, thus avoiding the problem.
Hope that helps.