PDF for link building - avoiding duplicate content
-
Hello,
We've got an article that we're turning into a PDF. Both the article and the PDF will be on our site. This PDF is a good, thorough piece of content on how to choose a product.
We're going to strip out all of the links to our in the article and create this PDF so that it will be good for people to reference and even print. Then we're going to do link building through outreach since people will find the article and PDF useful.
My question is, how do I use rel="canonical" to make sure that the article and PDF aren't duplicate content?
Thanks.
-
Hey Bob
I think you should forget about any kind of perceived conventions and have whatever you think works best for your users and goals.
Again, look at unbounce, that is a custom landing page with a homepage link (to share the love) but not the general site navigation.
They also have a footer to do a bit more link love but really, do what works for you.
Forget conventions - do what works!
Hope that helps
Marcus -
I see, thanks! I think it's important not to have the ecommerce navigation on the page promoting the PDF. What would you say is ideal for the graphical and navigation components of the page with the PDF on it: what kind of navigation and graphical header should it have?
-
Yep, check the HTTP headers with webbug, or use one of the many browser plugins that let you see the headers for the document.
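For a scripted version of that header check, here is a minimal Python sketch using only the standard library. It assumes the canonical was added as a `Link` response header in the `<URL>; rel="canonical"` format; the function names are made up for illustration, and the URL in the note below is just the placeholder used elsewhere in this thread.

```python
import re
import urllib.request


def canonical_from_link_header(link_header):
    """Return the URL marked rel="canonical" in a Link header value, or None."""
    if not link_header:
        return None
    # Each entry looks like: <URL>; rel="canonical"  (entries are comma-separated)
    for target, params in re.findall(r'<([^>]*)>([^,]*)', link_header):
        if re.search(r'rel\s*=\s*"?canonical"?', params, re.IGNORECASE):
            return target
    return None


def fetch_link_header(url):
    """Send a HEAD request and return all Link header values joined by commas."""
    request = urllib.request.Request(url, method="HEAD")
    with urllib.request.urlopen(request, timeout=10) as response:
        values = response.headers.get_all("Link") or []
    return ", ".join(values)
```

Calling `canonical_from_link_header(fetch_link_header("http://www.domain.com/pdfs/white-papers.pdf"))` should give back the article URL if the .htaccess rule is being applied, and `None` if the header is missing.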
That said, I would push to drive the links to a page rather than to the document itself: create a nice page that houses the document and make that the link target.
You could even make the PDF link available only by email once they have signed up, or some such. Canonical is only a hint, not a binding directive, so you would still be better off getting those links flooding into a real page on the site.
You could even offer up some HTML that links to your main page, to make linking easier for folks. If you look at any savvy infographic, you'll see its owners try to draw links to a page rather than to the image itself, for the very same reasons.
If you look at something like the Noobs Guide to Online Marketing from Unbounce then you will see something like this as the suggested linking code:
http://unbounce.com/noob-guide-to-online-marketing-infographic/
Unbounce – The DIY Landing Page Platform
So, the image is there but the link they are pimping is a standard page:
http://unbounce.com/noob-guide-to-online-marketing-infographic/
They also cheekily add an extra homepage link in as well with some keywords and the brand so if folks don't remove that they still get that benefit.
Ultimately, it means that when links flood into the site they benefit the whole site rather than just promote one PDF.
Just my tuppence!
Marcus -
Thanks for the code Marcus.
Actually, the PDF is what people will be linking to. It's a guide for websites. I think the PDF will be much easier to promote than the article. I assume so, anyway.
Is there a way to make sure my canonical code in htaccess is working after I insert the code?
Thanks again,
Bob
-
Hey Bob
There is a much easier way to do this: simply keep the PDFs that you don't want indexed in a folder that you block in robots.txt. This way you can just drop PDFs into articles and link to them, knowing full well that the PDFs will not be indexed.
Assuming you had a PDF called article.pdf in a folder called pdfs/, the following would prevent indexation:
User-agent: *
Disallow: /pdfs/
Or to just block the file itself:
User-agent: *
Disallow: /pdfs/yourfile.pdf
Additionally, there is no reason not to add the canonical link as well. If you find people are linking directly to the PDF, then having this would ensure that the equity associated with those links is correctly attributed to the parent page (always a good thing):
Header add Link '<http://www.url.co.uk/pdfs/article.html>; rel="canonical"'
Generally, there are better ways to block indexation than robots.txt, but in the case of PDFs we really don't want these files indexed, as they make for such poor landing pages (no navigation), and we certainly want to remove any competition or duplication between the page and the PDF. So in this case it makes for a quick, painless and suitable solution.
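If you want to sanity-check the two robots.txt rules above before uploading them, a small Python sketch with the standard library's `urllib.robotparser` can do it offline. The helper name `is_blocked` and the two rule constants are made up for illustration; the URLs reuse the placeholders from this reply.

```python
from urllib.robotparser import RobotFileParser

# The two rule sets from this reply, as they would appear in robots.txt.
FOLDER_RULE = """\
User-agent: *
Disallow: /pdfs/
"""

FILE_RULE = """\
User-agent: *
Disallow: /pdfs/yourfile.pdf
"""


def is_blocked(robots_txt, url, user_agent="*"):
    """Return True if robots_txt disallows user_agent from fetching url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return not parser.can_fetch(user_agent, url)


# The folder rule covers everything under /pdfs/, PDFs and HTML alike.
print(is_blocked(FOLDER_RULE, "http://www.url.co.uk/pdfs/article.pdf"))
print(is_blocked(FOLDER_RULE, "http://www.url.co.uk/pdfs/article.html"))
# The file rule covers only the one named file.
print(is_blocked(FILE_RULE, "http://www.url.co.uk/pdfs/yourfile.pdf"))
print(is_blocked(FILE_RULE, "http://www.url.co.uk/pdfs/article.html"))
```

Two things worth keeping in mind: the folder rule also covers any HTML page that lives under /pdfs/, and a crawler that obeys a Disallow rule never requests the PDF, so it would not see a Link header sent with that file.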
Hope that helps!
Marcus -
Thanks ThompsonPaul,
Say the pdf is located at
domain.com/pdfs/white-papers.pdf
and the article that I want to rank is at
domain.com/articles/article.html
do I simply add this to my htaccess file?:
Header add Link "<http://www.domain.com/articles/article.html>; rel="canonical""
-
You can insert the canonical header link using your site's .htaccess file, Bob. I'm sure Hostgator provides access to the .htaccess file through FTP (sometimes you have to turn on "show hidden files") or through the file manager built into your cPanel.
Check tip #2 in this recent SEOMoz blog article for specifics:
seomoz.org/blog/htaccess-file-snippets-for-seos
Just remember too - you will want to do the same kind of on-page optimization for the PDF as you do for regular pages.
- Give it a good, descriptive, keyword-appropriate, dash-separated file name. (essential for usability as well, since it will become the title of the icon when saved to someone's desktop)
- Fill out the metadata for the PDF, especially the Title and Description. In Acrobat it's under File -> Properties -> Description tab (to get the meta-description itself, you'll need to click on the Additional Metadata button)
I'd be tempted to build the links to the HTML page as much as possible, as those will directly help ranking, unlike the PDF's inbound links, which will have to pass their link juice through the canonical, assuming you're using it. Plus, the visitor will get a preview of the PDF's content and context from the rest of your site, which may increase trust and engender further engagement.
Your comment about links in the PDF got kind of muddled, but you'll definitely want to make certain there are good links and calls to action back to your website within the PDF - preferably on each page. Otherwise there's no clear "next step" for users reading the PDF back to a purchase on your site. Make sure to put Analytics tracking tags on these links so you can assess the value of traffic generated back from the PDF - otherwise the traffic will just appear as Direct in your Analytics.
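As a rough illustration of tagging those links, the Python sketch below appends Google Analytics campaign parameters (`utm_source`, `utm_medium`, `utm_campaign`) to a URL before you place it in the PDF. The function name and the example values are hypothetical; pick names that match your own Analytics setup.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def add_campaign_tags(url, source, medium, campaign):
    """Return url with utm_* campaign parameters appended to its query string."""
    parts = urlsplit(url)
    # Keep any query parameters the URL already has.
    query = parse_qsl(parts.query, keep_blank_values=True)
    query += [
        ("utm_source", source),
        ("utm_medium", medium),
        ("utm_campaign", campaign),
    ]
    return urlunsplit(parts._replace(query=urlencode(query)))


# Hypothetical values for a link placed inside the PDF.
tagged = add_campaign_tags(
    "http://www.domain.com/articles/article.html",
    source="white-paper",
    medium="pdf",
    campaign="buying-guide",
)
print(tagged)
```

Clicks on the tagged link would then be reported under that campaign rather than lumped in with Direct traffic.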
Hope that all helps;
Paul
-
Can I just use htaccess?
See here: http://www.seomoz.org/blog/how-to-advanced-relcanonical-http-headers
We only have one pdf like this right now and we plan to have no more than five.
Say the pdf is located at
domain.com/pdfs/white-papers.pdf
and the article that I want to rank is at
domain.com/articles/article.pdf
do I simply add this to my htaccess file?:
Header add Link "<http://www.domain.com/articles/article.pdf>; rel="canonical""
-
How do I know if I can do an HTTP header request? I'm using shared hosting through hostgator.
-
PDFs don't seem to rank as well as normal web pages. They do still rank, don't get me wrong: we have over 100 PDF pages that get traffic for us. Which one is the main version is really up to you; it depends on what you want to show in the search results. I think it would be easier to rank a normal web page, though. If you use rel="canonical", it will pass most of the link juice; not all, but most.
-
Thank you DoRM,
I assume that the PDF is what I want to be the main version since that is what I'll be marketing, but I could be wrong? What if I get backlinks to both pages, will both sets of backlinks count?
-
Indicate the canonical version of a URL by responding with the Link rel="canonical" HTTP header. Adding rel="canonical" to the head section of a page is useful for HTML content, but it can't be used for PDFs and other file types indexed by Google Web Search. In these cases you can indicate a canonical URL by responding with the Link rel="canonical" HTTP header, like this (note that to use this option, you'll need to be able to configure your server):
Link: <http://www.example.com/downloads/white-paper.pdf>; rel="canonical"
Google currently supports these link header elements for Web Search only.
You can read more here: http://support.google.com/webmasters/bin/answer.py?hl=en&answer=139394