PDFs - Dupe Content
-
Hi,
I have some PDFs linked to from a page with little content, so I'm thinking it's best to extract the copy from the PDFs and put it on the page as body text, with the PDFs still linked to. Will this count as dupe content?
Or is it better to use a PDF plugin so the page opens the PDF automatically, and hence gets its page content that way?
Cheers,
Dan
-
They should be treated as different, but you would have to look at them to make sure.
-
PS - is a PDF-to-HTML converter different from a plugin that loads the PDF as an open page when you click it, or are they the same thing?
-
That is what I was going to suggest - setting up a rel=canonical in the HTTP header of the PDF, pointing back to the article:
https://support.google.com/webmasters/answer/139394?hl=en
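In case it helps, that rel=canonical can be sent as an HTTP Link header, as described on the Google help page above. A minimal sketch for Apache (assuming mod_headers is enabled; the filename and article URL below are placeholders):

```apache
# .htaccess (or vhost config) - requires mod_headers
# Point the PDF's canonical at the HTML article page (URL is a placeholder)
<Files "whitepaper.pdf">
  Header add Link '<https://www.example.com/whitepaper-article/>; rel="canonical"'
</Files>
```

With this in place, the on-page copy and the PDF should consolidate to the HTML page rather than competing as duplicates.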
As another option, you can just block access to the PDFs to keep them out of the index as well.
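For the blocking route, one sketch (again assuming Apache and placeholder paths): robots.txt only stops crawling, so to reliably keep PDFs out of the index an X-Robots-Tag noindex header is the safer option:

```apache
# .htaccess - ask search engines not to index any PDF (requires mod_headers)
<FilesMatch "\.pdf$">
  Header set X-Robots-Tag "noindex, nofollow"
</FilesMatch>
```

A `Disallow` rule in robots.txt would also stop the PDFs being crawled, but an already-indexed URL can linger in the index that way; the header approach avoids that.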
-
Thanks Chris.
Yes, you can canonicalise the PDF to the HTML page (according to the comments on that article I just linked to, anyway).
-
Hi Dan,
Yes, PDFs are crawlable (sorry for the confusion!). If you were to put the PDF into, say, a .zip or .rar (or similar), it wouldn't be crawled, or you could noindex the link, I guess. You would need to put the PDF (download) behind something that can't be crawled. You could try rel=canonical, but I've never tried it with a PDF, so I'm not sure how that would go.
Hope that enlightens you a bit.
-
Thanks Chris, although I thought PDFs were crawlable?? http://www.lunametrics.com/blog/2013/01/10/seo-pdfs/
Hence why I'm worried about dupe content if I use the PDF's content as body text too. Or are you saying I should nofollow the link to the PDF if I use its content as body text, because it's considered dupe content in that scenario?
Ideally I want both: the copy used as body text on the page, and the PDF as a linkable download (or the page as an embed of the open PDF via a plugin).
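For that "both" setup, a minimal HTML sketch (filenames and markup are placeholders, not a specific plugin): the extracted copy as ordinary body text, plus the same PDF embedded inline with a plain download link as a fallback:

```html
<!-- Extracted copy served as normal, crawlable body text -->
<article>
  <h1>Whitepaper Title</h1>
  <p>...the copy extracted from the PDF goes here...</p>
</article>

<!-- The same PDF embedded inline; the link inside acts as a fallback
     for browsers that can't display PDFs -->
<object data="/downloads/whitepaper.pdf" type="application/pdf"
        width="100%" height="600">
  <p>Your browser can't display PDFs inline.
     <a href="/downloads/whitepaper.pdf" download>Download the PDF</a> instead.</p>
</object>
```

Pairing this with either a canonical header or a noindex header on the PDF itself keeps the HTML page as the version that ranks.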
-
What would give the user the best experience is the real question. I would say put it on the page; then, if the user is lacking a plugin, they can still read it. If you have it as a downloadable PDF, it shouldn't be able to get crawled, thus avoiding the problem.
Hope that helps.