PDF on financial site that duplicates ~50% of site content
-
I have a financial advisor client who has a downloadable PDF on his site that contains about 9 pages of good info. Problem is much of the content can also be found on individual pages of his site.
Is it best to noindex/follow the pdf? It would be great to let the few pages of original content be crawlable, but I'm concerned about the duplicate content aspect.
Thanks --
-
This is what we have done with pdfs. Assign rel="canonical" in .htaccess.
We did this with a few hundred files and it took google a LONG time to find and credit them.
-
You could set the header to noindex rather than rel=canonical
-
Personally I think it would be better not to index, it but if necessary, the index folder root seems like a good option
-
Thanks. Anybody want to weigh in on where to rel=canonical to? Home page?
-
If you are using apache, you should put it on your .htaccess with this form
<filesmatch “my-file.pdf”="">Header set Link ‘<http: misite="" my-file.html="">; rel=”canonical”‘</http:></filesmatch>
-
I think the right way here is to put the rel canonical in PDF header http://googlewebmastercentral.blogspot.com/2011/06/supporting-relcanonical-http-headers.html
-
I thought the idea was to put rel=canonical on the duplicated page, to signal that "hey, this page may look like duplicate content, but please refer to this canonical URL"?
Looks like there is a pdf option for rel=canonical, I guess the question is, what page on the site to make canonical?
http://support.google.com/webmasters/bin/answer.py?hl=en&answer=139394
Indicate the canonical version of a URL by responding with the
Link rel="canonical"
HTTP header. Addingrel="canonical"
to thehead
section of a page is useful for HTML content, but it can't be used for PDFs and other file types indexed by Google Web Search. In these cases you can indicate a canonical URL by responding with theLink rel="canonical"
HTTP header, like this (note that to use this option, you'll need to be able to configure your server):Link: <http: www.example.com="" downloads="" white-paper.pdf="">; rel="canonical"</http:>
-
Hi Keith,
I'm sorry, I should have clarified. The rel=canonical tags would be on your Web pages, not the PDF (they are irrelevant in a PDF document). Then Google will attribute your Web page as the original source of the content and will understand that the PDF just contains bits of content from those pages. In this instance I would include a rel=canonical tag on every page of your site, just to cover your bases. Hope that helps!
Dana
-
Not sure which page I would mark as being canonical, since the pdf contains content from several different pages on the site. I don't think it's possible to assign different rel=canonical tags to separate portions of a pdf, is it?
-
As long as you have rel=canonical tags properly in place, you don't need to worry about the PDF causing duplicate content problems. That way, any original content should be picked up and any duplicate can be attributed to your existing Web pages. Hope that's helpful!
Dana
Got a burning SEO question?
Subscribe to Moz Pro to gain full access to Q&A, answer questions, and ask your own.
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
How to solve this issue and avoid duplicated content?
My marketing team would like to serve up 3 pages of similar content; www.example.com/one, www.example.com/two and www.example.com/three; however the challenge here is, they'd like to have only one page whith three different titles and images based on the user's entry point (one, two, or three). To avoid duplicated pages, how would suggest this best be handled?
Intermediate & Advanced SEO | | JoelHer0 -
Site architecture, inner link strategy and duplicate or thin content HELP :)
Ok, can I just say I love that Moz exists! I am still very new to this whole website stuff. I've had a site for about 2 years that I have re-designed several times. It has been published this entire time as I made changes but I am now ready to create amazing content for my niche. Trouble is my target audience is in a very focused niche and my site is really only about 1 topic - life insurance for military families. I'm a military spouse who happens to be an experience life insurance agent offering plans to active duty service members, their spouses as well as veterans and retirees. So really I have 3 niches within a niche. I'm REALLY struggling on how to set up my site architecture. My site is basically fresh so it's a good time to get it hammered down as best as possible with my limited knowledge. Might I also add this is a very competitive space. My competitors are big, established brands who offer life insurance along with unaffiliated, informational sites like military.com or the va benefits site. The people in my niche rarely actually search for life insurance because they think they are all set by the military. When they do search it's very short which is common as this niche lives in a world of acronyms. I'm going to have to get real creative to see if there are any long tail keywords I can use as supporting posts but I think my best route is to attempt to rank for the short one to three keyword phrases this niche looks for while searching. Given my expertise on the subject I am able to write long 1000-5000 content on the matter that will also point out some considerations my competitors dont really cover. My challenge is I cant see how this can be broken into sub topics without having thin supporting content. It's my understanding that I should create these in order to inner link and have a shot at ranking. In thinking about my topic I feel like the supporting posts can only be so long. Furthermore, my three niches within my small overall niche search for short but different keywords. Seems I am struggling to put it all into words. Let me stop here with a question - is it bad to have one category in a website? If not I feel like this would solve my dilemma in making a good site map and content plan. it is possible to split my main topic into 3 categories. I heard somewhere you shouldn't inner link posts from different categories. Problem is if I dont it's not ideal for the user experience as the topics really arent that different. Example a military member might be researching his/her own life insurance and be curious about his spouses coverage. In order to satisfy this user's experience and increase the time on my site I should link to where they can find more dept on their spouses coverage which would be in a different category. Is this still acceptable since it's really not a different subject?
Intermediate & Advanced SEO | | insuretheheroes.com0 -
If I put a piece of content on an external site can I syndicate to my site later using a rel=canonical link?
Could someone help me with a 'what if ' scenario please? What happens if I publish a piece of content on an external website, but then later decide to also put this content on my website. I want my website to rank first for this content, even though the original location for the content was the external website. Would it be okay for me to put a rel=canonical tag on the external website's content pointing to the copy on my website? Or would this be seen as manipulative?
Intermediate & Advanced SEO | | RG_SEO1 -
Do search engine consider this duplicate or thin content?
I operate an eCommerce site selling various equipment. We get product descriptions and various info from the manufacturer's websites offered to the dealers. Part of that info is in the form of User Guides and Operational Manuals downloaded in pdf format written by the manufacturer, then uploaded to our site. Also we embed and link to videos that are hosted on the manufacturer's respective YouTube or Vimeo channels. This is useful content for our customers.
Intermediate & Advanced SEO | | MichaelFactor
My questions are: Does this type of content help our site by offering useful info, or does it hurt our SEO due to it being thin and or duplicate content? Or does the original content publishers get all the benefit? Is there any benefit to us publishing this stuff? What exactly is considered "thin content"?0 -
Best way to remove duplicate content with categories?
I have duplicate content for all of the products I sell on my website due to categories and subcategories. Ex: http://www.shopgearinc.com/products/product/stockfeeder-af38.php http://www.shopgearinc.com/products/co-matic-power-feeders/stockfeeder-af38.php http://www.shopgearinc.com/products/co-matic-power-feeders/heavy-duty-feeders/stockfeeder-af38.php Above are 3 urls to the same title and content. I use a third party developer backend system so doing canonicalization seems difficult as I don't have full access. What is the best to get rid of this duplicate content. Can I do it through webmaster tools or should I pay the developer to do the canonicalization or a 301 redirect? Any suggestions? Thanks
Intermediate & Advanced SEO | | kysizzle60 -
Duplicate content resulting from js redirect?
I recently created a cname (e.g. m.client-site .com) and added some js (supplied by mobile site vendor to the head which is designed to detect if the user agent is a mobi device or not. This is part of the js: var CurrentUrl = location.href var noredirect = document.location.search; if (noredirect.indexOf("no_redirect=true") < 0){ if ((navigator.userAgent.match(/(iPhone|iPod|BlackBerry|Android.*Mobile|webOS|Window Now... Webmaster Tools is indicating 2 url versions for each page on the site - for example: 1.) /content-page.html 2.) /content-page.html?no_redirect=true and resulting in duplicate page titles and meta descriptions. I am not quite adept enough at either js or htaccess to really grasp what's going on here... so an explanation of why this is occurring and how to deal with it would be appreciated!
Intermediate & Advanced SEO | | SCW0 -
Can you be penalized by a development server with duplicate content?
I developed a site for another company late last year and after a few months of seo done by them they were getting good rankings for hundreds of keywords. When penguin hit they seemed to benefit and had many top 3 rankings. Then their rankings dropped one day early May. Site is still indexed and they still rank for their domain. After some digging they found the development server had a copy of the site (not 100% duplicate). We neglected to hide the site from the crawlers, although there were no links built and we hadn't done any optimization like meta descriptions etc. The company was justifiably upset. We contacted Google and let them know the site should not have been indexed, and asked they reconsider any penalties that may have been placed on the original site. We have not heard back from them as yet. I am wondering if this really was the cause of the penalty though. Here are a few more facts: Rankings built during late March / April on an aged domain with a site that went live in December. Between April 14-16 they lost about 250 links, mostly from one domain. They acquired those links about a month before. They went from 0 to 1130 links between Dec and April, then back to around 870 currently According to ahrefs.com they went from 5 ranked keywords in March to 200 in April to 800 in May, now down to 500 and dropping (I believe their data lags by at least a couple of weeks). So the bottom line is this site appeared to have suddenly ranked well for about a month then got hit with a penalty and are not in top 10 pages for most keywords anymore. I would love to hear any opinions on whether a duplicate site that had no links could be the cause of this penalty? I have read there is no such thing as a duplicate content penalty per se. I am of the (amateur) opinion that it may have had more to do with the quick sudden rise in the rankings triggering something. Thanks in advance.
Intermediate & Advanced SEO | | rmsmall0 -
HTTPS Duplicate Content?
I just recieved a error notification because our website is both http and https. http://www.quicklearn.com & https://www.quicklearn.com. My tech tells me that this isn't actually a problem? Is that true? If not, how can I address the duplicate content issue?
Intermediate & Advanced SEO | | QuickLearnTraining0