PDFs - Dupe Content
-
Hi
I have some PDFs linked from a page that has little content, so I'm thinking it's best to extract the copy from the PDFs and put it on the page as body text, while still linking to the PDFs. Will this count as duplicate content?
Or is it better to use a PDF plugin so the page opens the PDF automatically and gets its content that way?
Cheers
Dan
-
Should be different, but you would have to look at them to make sure.
-
PS: is a PDF-to-HTML converter different from a plugin that loads the PDF as an open page when you click it, or are they the same thing?
-
That is what I was going to suggest: setting up a canonical in the HTTP header of the PDF that points back to the article.
https://support.google.com/webmasters/answer/139394?hl=en
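A minimal sketch of what that header looks like, assuming a Python WSGI setup. The helper names `pdf_headers` and `make_pdf_app` and the example.com URL are hypothetical; the `Link: <url>; rel="canonical"` header is the form Google describes for non-HTML files such as PDFs. On most sites you would set the same header in the web server configuration rather than in application code.

```python
from typing import Callable, Iterable, List, Tuple

Headers = List[Tuple[str, str]]


def pdf_headers(canonical_url: str, length: int) -> Headers:
    """Response headers for a PDF whose preferred (canonical) version is an HTML page."""
    return [
        ("Content-Type", "application/pdf"),
        ("Content-Length", str(length)),
        # Tells search engines which URL to treat as the canonical copy.
        ("Link", f'<{canonical_url}>; rel="canonical"'),
    ]


def make_pdf_app(pdf_bytes: bytes, canonical_url: str) -> Callable:
    """Minimal WSGI app that serves one PDF with a canonical Link header."""
    def app(environ: dict, start_response: Callable) -> Iterable[bytes]:
        start_response("200 OK", pdf_headers(canonical_url, len(pdf_bytes)))
        return [pdf_bytes]
    return app
```

To try it locally you could pass the app returned by `make_pdf_app(...)` to `wsgiref.simple_server.make_server` and inspect the response headers.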
As another option, you can just block access to the PDFs to keep them out of the index as well.
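A sketch of the blocking option, assuming the same kind of Python WSGI app (the middleware name `noindex_pdfs` is hypothetical). It adds an `X-Robots-Tag: noindex` response header to every `.pdf` URL, which is the usual way to keep non-HTML files out of the index while still letting users download them.

```python
from typing import Callable, Iterable


def noindex_pdfs(app: Callable) -> Callable:
    """WSGI middleware (hypothetical name): add 'X-Robots-Tag: noindex' to .pdf responses."""
    def wrapped(environ: dict, start_response: Callable) -> Iterable[bytes]:
        # Decide once per request whether the requested path is a PDF.
        is_pdf = environ.get("PATH_INFO", "").lower().endswith(".pdf")

        def patched_start_response(status, headers, exc_info=None):
            if is_pdf:
                headers = list(headers) + [("X-Robots-Tag", "noindex")]
            if exc_info is not None:
                return start_response(status, headers, exc_info)
            return start_response(status, headers)

        return app(environ, patched_start_response)
    return wrapped
```

If you go this way, don't also disallow the PDFs in robots.txt: a crawler that is blocked from fetching the file never sees the noindex header.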
-
Thanks Chris.
Yes, you can canonicalise the PDF to the HTML (according to the comments on that article I just linked to, anyway).
-
Hi Dan,
Yes, PDFs are crawlable (sorry for the confusion!). If you were to put it into, say, a .zip or .rar (or similar), it wouldn't be crawled, or you could noindex the link, I guess. You would need to put the PDF download behind something that couldn't be crawled. You could try rel=canonical, but I've never tried it with a PDF, so I'm not sure how that would go.
Hope that enlightens you a bit.
-
Thanks Chris, although I thought PDFs were crawlable? http://www.lunametrics.com/blog/2013/01/10/seo-pdfs/
Hence why I'm worried about duplicate content if I use the content of the PDF as body text too. Or are you saying I should nofollow the link to the PDF if I use its content as body text, because it would be considered duplicate content in that scenario?
Ideally I want both: the copy from the PDF used as body text on the page and the PDF as a linkable download, or the page as an embed of the open PDF via a plugin.
-
What would give the user the best experience is the real question. I'd say put it on the page; then, if the user lacks a plugin, they can still read it. If you have it as a downloadable PDF, it shouldn't be able to get crawled, thus avoiding the problem.
Hope that helps.
Related Questions
-
Consolidating a Large Site with Duplicate Content
I will be restructuring a large website for an OEM. They provide products and services for multiple industries, and the product/service offering is identical across all industries. I looked at the site structure, ran a crawl test, and learned they have a LOT of duplicate content because of the way they set up their website. They have a page in the navigation for “solution”, aka what industry you are in. Once that is selected, you are taken to a landing page, and from there given many options to explore products, read blogs, learn about the business, and contact them. The main navigation is removed. The URL structure is set up with folders, so no matter what you select after you go to your industry, the URL will be “domain.com/industry/next-page”. The product offerings, blogs, and contact us pages do not vary by industry, so the content found on “domain.com/industry-1/product-1” is identical to the content found on “domain.com/industry-2/product-1”, and so on. This is a large site with a fair amount of traffic because it’s a pretty substantial OEM. Most of their content, however, is competing with itself, because most of the pages on their website have duplicate content. I won’t begin my work until I can dive into their GA and have more in-depth conversations with them about what kind of activity they’re tracking and why they set up the website this way. However, I don’t know how strategic they were in this setup, and I don’t think they were aware that they had duplicate content. My first thought would be to consolidate the way their site is set up, so we don’t spread the link equity of the “product-1” content, direct all industries to one page, and track conversion paths a different way.
However, I’ve never dealt with a site structure of this magnitude, and I don’t want to risk damaging their domain authority, missing redirect or URL mapping opportunities, or undermining a site that is still performing well even though multiple pages have the same content (most of which have high page authority and search visibility). Has anyone dealt with this before, and do you have any recommendations for tackling something like this?
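One way to start the URL mapping is to generate the redirect list from the crawl export. A minimal sketch, assuming every duplicate lives at `/<industry>/<page>` and the consolidated page will be `/<page>`; the function name `consolidation_map` and the industry slugs are hypothetical.

```python
from typing import Dict, Iterable, Set


def consolidation_map(paths: Iterable[str], industries: Set[str]) -> Dict[str, str]:
    """Map each /<industry>/<page> path to a single industry-neutral /<page> path.

    Paths whose first folder is not a known industry slug are skipped, and so is
    the bare industry landing page (/<industry>/), since neither needs a redirect.
    """
    redirects: Dict[str, str] = {}
    for path in paths:
        parts = path.strip("/").split("/", 1)
        if len(parts) == 2 and parts[0] in industries and parts[1]:
            redirects[path] = "/" + parts[1]
    return redirects


crawl = ["/industry-1/product-1", "/industry-2/product-1", "/industry-1/", "/about"]
for source, target in consolidation_map(crawl, {"industry-1", "industry-2"}).items():
    print(f"301 {source} -> {target}")
```

Each pair would become a 301 redirect; any page that genuinely differs by industry should be taken out of the map before anything goes live.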
On-Page Optimization | | cassy_rich0 -
Changes picked up in the SERPs: how long do I have to wait until I can rely on the (new) position?
I changed several things on a particular page (mainly reducing the exaggerated, spammy keyword density) and had Google recrawl it via Search Console. The new version has already been integrated into the SERPs. Question: are my latest changes (the crawled page in the SERPs is now 2 days old) already reflected in the current position, or should I wait some time (how long?) to evaluate the effect of my changes? Can I rely on the current position or not?
On-Page Optimization | | Cesare.Marchetti0 -
Duplicate content query
I'm currently reauthoring all of the product pages on our site. Within the redesign of all the pages is a set of "why choose us?" bullet points and "what our customers say" bullet points. On every page these bullet points are the same. We currently have 18% duplicate content sitewide and I'm reluctant to push this. The products are similar but targeted at different professions, so I'm not sure whether to alter the text slightly for the bullet points on each page, remove the bullet points entirely or implement some form of canonicalisation that won't impact the profession-specific pages' ability to rank well.
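To see how much a shared block of bullets adds to page-to-page overlap, you could compare two pages with and without it. This sketch uses Jaccard similarity over three-word shingles, a common near-duplicate measure; it is an illustration only and not necessarily how Moz's crawler or Google calculates duplicate content. The page text below is made up.

```python
from typing import Set, Tuple


def shingles(text: str, k: int = 3) -> Set[Tuple[str, ...]]:
    """Return the set of k-word sequences ("shingles") in the text, lower-cased."""
    words = text.lower().split()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard(a: str, b: str, k: int = 3) -> float:
    """Shared shingles divided by all distinct shingles across both texts (0.0 to 1.0)."""
    sa, sb = shingles(a, k), shingles(b, k)
    if not sa and not sb:
        return 0.0
    return len(sa & sb) / len(sa | sb)


bullets = "fast free delivery expert advice trusted by thousands of customers"
dentist = "ergonomic chairs for dentists with adjustable height"
vet = "ergonomic chairs for vets with washable covers"
print(round(jaccard(dentist, vet), 2))
print(round(jaccard(dentist + " " + bullets, vet + " " + bullets), 2))
```

The longer the unique, profession-specific copy is relative to the shared bullets, the lower the overlap score will be.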
On-Page Optimization | | EdLongley0 -
What to do about resellers duplicating content?
I just went through a big redevelopment for a client, and now that they have fresh images and updated content, all the resellers have grabbed the new images/content and pasted them on their own sites. My client is a manufacturer that sells directly online and over the phone for large orders. I'm just not sure how to handle the resellers' duplicate content. Any thoughts on this? Am I being silly for worrying about this?
On-Page Optimization | | ericnkatz0 -
Dates in URLs
I have duplicate content errors and duplicate page titles that are penalising my site. This has arisen because a number of URLs are suffixed by date(s) and have been spidered. In principle, I do not want any URL with a suffixed date to be spidered, e.g.:
www.carbisbayholidays.co.uk/carbis-bay/houses-in-carbis-bay/seaspray.htm/06_07_13/13_07_13
http://www.carbisbayholidays.co.uk/carbis-bay/houses-in-carbis-bay/seaspray.htm/20_07_13/27_07_13
Only this URL should be spidered:
http://www.carbisbayholidays.co.uk/carbis-bay/houses-in-carbis-bay/seaspray.htm
I have over 10,000 of these duplicates. Firstly, I wish to remove them from Google in bulk (not one by one), and secondly, I wish to amend my robots.txt file so these URLs are not spidered. I do not know the format for either. Can anyone help, please?
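Google's robots.txt matching supports `*` (any run of characters) and a trailing `$` (end of URL), so a single rule such as `Disallow: /*.htm/` should cover every dated variant while leaving `seaspray.htm` itself crawlable. Python's standard `urllib.robotparser` does not handle those wildcards inside paths, so this is a small hypothetical helper (`is_disallowed`) for testing a rule against sample paths before publishing it.

```python
import re


def robots_pattern_to_regex(pattern: str) -> "re.Pattern[str]":
    """Translate a Google-style robots.txt path pattern ('*' wildcard, optional trailing '$')."""
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # Escape everything literally, then let each '*' match any run of characters.
    regex = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.compile(regex + ("$" if anchored else ""))


def is_disallowed(path: str, disallow_pattern: str) -> bool:
    # Rules match from the start of the URL path, so use re.match, not re.search.
    return robots_pattern_to_regex(disallow_pattern).match(path) is not None


rule = "/*.htm/"
base = "/carbis-bay/houses-in-carbis-bay/seaspray.htm"
print(is_disallowed(base + "/06_07_13/13_07_13", rule))
print(is_disallowed(base, rule))
```

The robots.txt entry itself would be `User-agent: *` followed by `Disallow: /*.htm/`. Note that robots.txt only stops crawling; it does not by itself remove URLs that are already in the index.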
On-Page Optimization | | carbisbayhols0 -
Product category content!? What should it include?
Hello everyone! I consider myself a rookie, so please excuse me if this is super basic or dumb. I'm working on an ecommerce website (family business!) and I've got this doubt. Say you've architected your site this way:
site.com/category
site.com/category/model_1
site.com/category/model_2
I'm mainly interested in getting the category pages to rank high. The problem I've got is what to put on the CATEGORY page. Suppose you sell office furniture and the category is 'chairs': if you add content there, it won't be useful. What do you suggest I add there?
NOTE: My 'categories' page is split vertically, so you've got an image gallery on the left and the product description on the right. So all of my product pages look a bit alike, and the 'category' itself has a placeholder on the right. I suspect that's why I'm not getting good rankings! THANKS in advance.
On-Page Optimization | | jleandroperez0 -
Duplicate page content
What is duplicate page content? I have a dating site with a groups area where members can base their discussions in a category, for example nightlife, health and beauty, and so on. Why would this cause a duplicate page content problem, and how would I fix it? Please explain it in terms a dummy would understand.
On-Page Optimization | | clickit2getwithit0 -
Multiple H1s
Hi, my SEOmoz report states that I'm using two H1s on most of my pages, for example on this page: http://www.absolutepower.nl/eiwitshakes/proteine-shakes/ I only see one, though. Could anyone clarify this? Thanks! Jasper
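One possible cause is an H1 in the site template that isn't obvious on screen, for example one wrapped around a logo image. This sketch (hypothetical helper `find_h1s`, made-up sample HTML) lists every `<h1>` in a page's source using Python's standard `html.parser`, so you can compare what a crawler sees with what the browser shows.

```python
from html.parser import HTMLParser
from typing import List


class H1Collector(HTMLParser):
    """Collect the text of every <h1> element in the source, visible or not."""

    def __init__(self) -> None:
        super().__init__()
        self.headings: List[str] = []
        self._depth = 0  # greater than 0 while inside an <h1>

    def handle_starttag(self, tag, attrs):
        if tag == "h1":  # HTMLParser lower-cases tag names
            self._depth += 1
            self.headings.append("")

    def handle_endtag(self, tag):
        if tag == "h1" and self._depth:
            self._depth -= 1

    def handle_data(self, data):
        if self._depth:
            self.headings[-1] += data


def find_h1s(html: str) -> List[str]:
    parser = H1Collector()
    parser.feed(html)
    parser.close()
    return [h.strip() for h in parser.headings]


sample = ('<header><h1 class="logo"><img alt="Absolute Power"></h1></header>'
          '<main><h1>Proteine shakes</h1></main>')
print(find_h1s(sample))
```

An H1 that contains only an image shows up here as an empty string, which is easy to miss when looking at the rendered page.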
On-Page Optimization | | Japking0