PDF on financial site that duplicates ~50% of site content
-
I have a financial advisor client who has a downloadable PDF on his site that contains about 9 pages of good info. Problem is much of the content can also be found on individual pages of his site.
Is it best to noindex/follow the pdf? It would be great to let the few pages of original content be crawlable, but I'm concerned about the duplicate content aspect.
Thanks --
-
This is what we have done with pdfs. Assign rel="canonical" in .htaccess.
We did this with a few hundred files and it took google a LONG time to find and credit them.
-
You could set the header to noindex rather than rel=canonical
-
Personally I think it would be better not to index, it but if necessary, the index folder root seems like a good option
-
Thanks. Anybody want to weigh in on where to rel=canonical to? Home page?
-
If you are using apache, you should put it on your .htaccess with this form
<filesmatch “my-file.pdf”="">Header set Link ‘<http: misite="" my-file.html="">; rel=”canonical”‘</http:></filesmatch>
-
I think the right way here is to put the rel canonical in PDF header http://googlewebmastercentral.blogspot.com/2011/06/supporting-relcanonical-http-headers.html
-
I thought the idea was to put rel=canonical on the duplicated page, to signal that "hey, this page may look like duplicate content, but please refer to this canonical URL"?
Looks like there is a pdf option for rel=canonical, I guess the question is, what page on the site to make canonical?
http://support.google.com/webmasters/bin/answer.py?hl=en&answer=139394
Indicate the canonical version of a URL by responding with the
Link rel="canonical"
HTTP header. Addingrel="canonical"
to thehead
section of a page is useful for HTML content, but it can't be used for PDFs and other file types indexed by Google Web Search. In these cases you can indicate a canonical URL by responding with theLink rel="canonical"
HTTP header, like this (note that to use this option, you'll need to be able to configure your server):Link: <http: www.example.com="" downloads="" white-paper.pdf="">; rel="canonical"</http:>
-
Hi Keith,
I'm sorry, I should have clarified. The rel=canonical tags would be on your Web pages, not the PDF (they are irrelevant in a PDF document). Then Google will attribute your Web page as the original source of the content and will understand that the PDF just contains bits of content from those pages. In this instance I would include a rel=canonical tag on every page of your site, just to cover your bases. Hope that helps!
Dana
-
Not sure which page I would mark as being canonical, since the pdf contains content from several different pages on the site. I don't think it's possible to assign different rel=canonical tags to separate portions of a pdf, is it?
-
As long as you have rel=canonical tags properly in place, you don't need to worry about the PDF causing duplicate content problems. That way, any original content should be picked up and any duplicate can be attributed to your existing Web pages. Hope that's helpful!
Dana
Got a burning SEO question?
Subscribe to Moz Pro to gain full access to Q&A, answer questions, and ask your own.
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
Please help - Duplicate Content
Hi, I am really struggling to understand why my site has a lot of duplicate content issues. It's flagging up as ridiculously high and I have no idea how to fix this, can anyone help me, please? Website is www.firstcapitol.co.uk
Intermediate & Advanced SEO | | Alix_SEO1 -
About duplicate content
We have to products: - loan for a new car
Intermediate & Advanced SEO | | KBC
- load for a second hand car Except for title tag, meta desc and H1, the content is of course very similmar. Are these pages considered as duplicate content? https://new.kbc.be/product/lenen/voertuig/autolening-tweedehands-auto.html
https://new.kbc.be/product/lenen/voertuig/autolening-nieuwe-auto.html thanks for the advice,0 -
Duplicate Page Content
We have different plans that you can signup for - how can we rectify the duplicate page content and title issue here? Thanks. | http://signup.directiq.com/?plan=100 | 0 | 1 | 32 | 1 | 200 |
Intermediate & Advanced SEO | | directiq
| http://signup.directiq.com/?plan=104 | 0 | 1 | 32 | 1 | 200 |
| http://signup.directiq.com/?plan=116 | 0 | 1 | 32 | 1 | 200 |
| http://signup.directiq.com/?plan=117 | 0 | 1 | 32 | 1 | 200 |
| http://signup.directiq.com/?plan=102 | 0 | 1 | 32 | 1 | 200 |
| http://signup.directiq.com/?plan=119 | 0 | 1 | 32 | 1 | 200 |
| http://signup.directiq.com/?plan=101 | 0 | 1 | 32 | 1 | 200 |
| http://signup.directiq.com/?plan=103 | 0 | 1 | 32 | 1 | 200 |
| http://signup.directiq.com/?plan=5 |0 -
Penalized for Similar, But Not Duplicate, Content?
I have multiple product landing pages that feature very similar, but not duplicate, content and am wondering if this would affect my rankings in a negative way. The main reason for the similar content is three-fold: Continuity of site structure across different products Similar, or the same, product add-ons or support options (resulting in exactly the same additional tabs of content) The product itself is very similar with 3-4 key differences. Three examples of these similar pages are here - although I do have different meta-data and keyword optimization through the pages. http://www.1099pro.com/prod1099pro.asp http://www.1099pro.com/prod1099proEnt.asp http://www.1099pro.com/prodW2pro.asp
Intermediate & Advanced SEO | | Stew2220 -
WMT Index Status - Possible Duplicate Content
Hi everyone. A little background: I have a website that is 3 years old. For a period of 8 months I was in the top 5 for my main targeted keyword. I seemed to have survived the man eating panda but not so sure about the blood thirsty penguin. Anyway; my homepage, along with other important pages, have been wiped of the face of Google's planet. First I got rid of some links that may not have been helping and disavowed them. When this didn't work I decided to do a complete redesign of my site with better content, cleaner design, removed ads (only had 1) and incorporated social integration. This has had no effect at all. I filed a reconsideration request and was told that I have NOT had any manual spam penalties made against me, by the way I never received any warning messages in WMT. SO, what could be the problem? Maybe it's duplicate content? In WMT the Index Status indicates that there are 260 pages indexed. However; I have only 47 pages in my sitemap and when I do a site: search on Google it only retrieves 44 pages. So what are all these other pages? Before I uploaded the redesign I removed all the current pages from the index and cache using the remove URL tool in WMT. I should mention that I have a blog on Blogger that is linked to a subdomain on my hosting account i.e. http://blog.mydomain.co.uk. Are the blog posts counted as pages on my site or on Blogger's servers? Ahhhh this is too complicated lol Any help will be much appreciated! Many thanks, Mark.
Intermediate & Advanced SEO | | Nortski0 -
Issue with duplicate content in blog
I have blog where all the pages r get indexed, with rich content in it. But In blogs tag and category url are also get indexed. i have just added my blog in seomoz pro, and i have checked my Crawl Diagnostics Summary in that its showing me that some of your blog content are same. For Example: www.abcdef.com/watches/cool-watches-of-2012/ these url is already get indexed, but i have asigned some tag and catgeory fo these url also which have also get indexed with the same content. so how shall i stop search engines to do not crawl these tag and categories pages. if i have more no - follow tags in my blog does it gives negative impact to search engines, any alternate way to tell search engines to stop crawling these category and tag pages.
Intermediate & Advanced SEO | | sumit600 -
Duplicate content clarity required
Hi, I have access to a masive resource of journals that we have been given the all clear to use the abstract on our site and link back to the journal. These will be really useful links for our visitors. E.g. http://www.springerlink.com/content/59210832213382K2 Simply, if we copy the abstract and then link back to the journal source will this be treated as duplicate content and damage the site or is the link to the source enough for search engines to realise that we aren't trying anything untoward. Would it help if we added an introduction so in effect we are sort of following the curating content model? We are thinking of linking back internally to a relevant page using a keyword too. Will this approach give any benefit to our site at all or will the content be ignored due to it being duplicate and thus render the internal links useless? Thanks Jason
Intermediate & Advanced SEO | | jayderby0 -
Duplicate content: is it possible to write a page, delete it and use it for a different site?
Hi, I've a simple question. Some time ago I built a site and added pages to it. I have found out that the site was penalized by Google and I have neglected it. The problem is that I had written well-optimized pages on that site, which I would like to use on another website. Thus, my question is: if I delete a page I had written on site 1, can use it on page 2 without being penalized by Google due to duplicate content? Please note: site one would still be online. I will simply delete some pages and use them on site 2. Thank you.
Intermediate & Advanced SEO | | salvyy0