PDF for link building - avoiding duplicate content
-
Hello,
We've got an article that we're turning into a PDF. Both the article and the PDF will be on our site. This PDF is a good, thorough piece of content on how to choose a product.
We're going to strip out all of the links to our site from the article to create this PDF, so that it will be good for people to reference and even print. Then we're going to do link building through outreach, since people will find the article and PDF useful.
My question is, how do I use rel="canonical" to make sure that the article and PDF aren't duplicate content?
Thanks.
-
Hey Bob
I think you should forget about any kind of perceived conventions and have whatever you think works best for your users and goals.
Again, look at Unbounce: that is a custom landing page with a homepage link (to share the love) but without the general site navigation.
They also have a footer to do a bit more link love but really, do what works for you.
Forget conventions - do what works!
Hope that helps
Marcus -
I see, thanks! I think it's important not to have the ecommerce navigation on the page promoting the pdf. What would you say is ideal as far as the graphical and navigation components of the page with the PDF on it - what kind of navigation and graphical header should I have on it?
-
Yep, check the HTTP headers with webbug or there are a bunch of browser plugins that will let you see the headers for the document.
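If you'd rather script the check than use webbug or a browser plugin, here is a minimal Python sketch (standard library only) that sends a HEAD request and pulls any rel="canonical" URL out of the Link header. The function names are made up for illustration, and the parser is deliberately simple: it assumes the URLs in the header contain no commas.

```python
import urllib.request
from typing import Optional


def canonical_from_link_header(link_header: Optional[str]) -> Optional[str]:
    """Return the first URL marked rel="canonical" in a Link header value, if any."""
    if not link_header:
        return None
    # Simplification: entries are split on commas, so URLs containing commas are not handled.
    for entry in link_header.split(","):
        parts = [part.strip() for part in entry.split(";")]
        target = parts[0]
        if not (target.startswith("<") and target.endswith(">")):
            continue
        for param in parts[1:]:
            name, _, value = param.partition("=")
            # rel may hold several space-separated values, e.g. rel="canonical alternate".
            rels = value.strip().strip('"').lower().split()
            if name.strip().lower() == "rel" and "canonical" in rels:
                return target[1:-1]
    return None


def fetch_canonical(url: str, timeout: float = 10.0) -> Optional[str]:
    """Send a HEAD request and return the canonical URL advertised in the Link header."""
    request = urllib.request.Request(url, method="HEAD")
    with urllib.request.urlopen(request, timeout=timeout) as response:
        # A server may send several Link headers; combine them before parsing.
        link_values = response.headers.get_all("Link") or []
        return canonical_from_link_header(", ".join(link_values))
```

For example, fetch_canonical("http://www.domain.com/pdfs/white-papers.pdf") (a hypothetical URL) should return the article URL once the header is in place, and None if it isn't.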
That said, I would push to drive the links to the page rather than to the document itself: create a nice page that houses the document and make that the link target.
You could even make the PDF link available only by email once people have signed up, or some such, as canonical is only a hint, and you would still be better off getting those links flooding into a real page on the site.
You could even offer up some HTML that links to your main page, to make it easier for folks to link to you. If you take a look at any savvy infographics etc., folks will try to draw the link into a page rather than to the image itself, for the very same reasons.
If you look at something like the Noobs Guide to Online Marketing from Unbounce then you will see something like this as the suggested linking code:
(infographic image, linked to http://unbounce.com/noob-guide-to-online-marketing-infographic/)
Unbounce – The DIY Landing Page Platform
So, the image is there but the link they are pimping is a standard page:
http://unbounce.com/noob-guide-to-online-marketing-infographic/
They also cheekily add an extra homepage link in as well with some keywords and the brand so if folks don't remove that they still get that benefit.
Ultimately, it means that when links flood into the site they benefit the whole site rather than just promote one PDF.
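Generating that kind of copy-and-paste snippet for your own guide is straightforward. Here is a minimal Python sketch in which every URL and the anchor text are hypothetical placeholders: the image link points at the landing page that houses the content, and an optional second link goes to the homepage, as in the Unbounce example.

```python
from html import escape


def embed_snippet(image_url: str, landing_page_url: str, home_url: str,
                  home_anchor: str, alt_text: str = "") -> str:
    """Build embed HTML whose image link targets a landing page rather than the file."""
    # escape() also escapes quotes by default, so values are safe inside attributes.
    image_link = (
        f'<a href="{escape(landing_page_url)}">'
        f'<img src="{escape(image_url)}" alt="{escape(alt_text)}" /></a>'
    )
    # Optional second link back to the homepage, like the branded link in the Unbounce code.
    home_link = f'<a href="{escape(home_url)}">{escape(home_anchor)}</a>'
    return f"{image_link}<br />\n{home_link}"
```

The design choice is simply that the page, not the image or PDF file, is what sits inside the href people paste, so the links land on a real page of the site.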
Just my tuppence!
Marcus -
Thanks for the code Marcus.
Actually, the PDF is what people will be linking to. It's a guide for websites. I think the PDF will be much easier to promote than the article. I assume so, anyway.
Is there a way to make sure my canonical code in htaccess is working after I insert the code?
Thanks again,
Bob
-
Hey Bob
There is a much easier way to do this: simply put the PDFs that you don't want indexed in a folder that you block access to in robots.txt. This way you can just drop PDFs into articles and link to them, knowing full well these files will not be indexed.
Assuming you had a PDF called article.pdf in a folder called pdfs/ then the following would prevent indexation.
User-agent: *
Disallow: /pdfs/
Or to just block the file itself:
User-agent: *
Disallow: /pdfs/yourfile.pdf
Additionally, there is no reason not to add the canonical link as well: if you find people are linking directly to the PDF, having it ensures that the equity associated with those links is correctly attributed to the parent page (always a good thing).
<Files "article.pdf">
Header add Link "<http://www.url.co.uk/pdfs/article.html>; rel=\"canonical\""
</Files>
Generally, there are better ways to block indexation than robots.txt, but in the case of PDFs we really don't want these files indexed, as they make such poor landing pages (no navigation), and we certainly want to remove any competition or duplication between the page and the PDF. So in this case, it makes for a quick, painless and suitable solution.
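You can sanity-check rules like these with Python's standard-library robots.txt parser before uploading the file. A minimal sketch, assuming Googlebot as the user agent and the /pdfs/ folder from the example; the domain is hypothetical. Keep in mind that a crawler obeying the rule never requests the PDF, so it also never sees a Link header sent with that file.

```python
from urllib.robotparser import RobotFileParser

# The folder-level rule from the example above.
ROBOTS_TXT = """\
User-agent: *
Disallow: /pdfs/
"""


def is_blocked(robots_txt: str, url: str, user_agent: str = "Googlebot") -> bool:
    """Return True if the given robots.txt text disallows crawling the URL."""
    parser = RobotFileParser()
    # parse() accepts the file's lines directly, so no network access is needed.
    parser.parse(robots_txt.splitlines())
    return not parser.can_fetch(user_agent, url)
```

Running is_blocked(ROBOTS_TXT, "http://www.domain.com/pdfs/white-papers.pdf") against the rules you actually plan to upload is a quick way to catch a typo in the path.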
Hope that helps!
Marcus -
Thanks ThompsonPaul,
Say the pdf is located at
domain.com/pdfs/white-papers.pdf
and the article that I want to rank is at
domain.com/articles/article.html
do I simply add this to my htaccess file?:
Header add Link "<http://www.domain.com/articles/article.html>; rel=\"canonical\""
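One way to avoid quoting mistakes is to generate the snippet. A minimal Python sketch, assuming Apache with mod_headers enabled (the Header directive comes from mod_headers): it scopes the header to the PDF with a <Files> section, so the canonical link is not sent with every response the .htaccess file covers. The function name and URLs are hypothetical.

```python
import posixpath
from urllib.parse import urlparse


def canonical_htaccess_block(pdf_path: str, canonical_url: str) -> str:
    """Return an .htaccess snippet that sends rel="canonical" for one PDF only."""
    parsed = urlparse(canonical_url)
    if not (parsed.scheme and parsed.netloc):
        # Require an absolute URL for the canonical target.
        raise ValueError(f"canonical URL must be absolute: {canonical_url!r}")
    pdf_name = posixpath.basename(pdf_path)
    return (
        f'<Files "{pdf_name}">\n'
        # The inner quotes around canonical are escaped for Apache's config parser.
        f'    Header add Link "<{canonical_url}>; rel=\\"canonical\\""\n'
        "</Files>\n"
    )
```

For Bob's example, canonical_htaccess_block("domain.com/pdfs/white-papers.pdf", "http://www.domain.com/articles/article.html") produces a block keyed on white-papers.pdf that points at the article.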
-
You can insert the canonical header link using your site's .htaccess file, Bob. I'm sure Hostgator provides access to the htaccess file through FTP (sometimes you have to turn on "show hidden files") or through the file manager built into your cPanel.
Check tip #2 in this recent SEOMoz blog article for specifics:
seomoz.org/blog/htaccess-file-snippets-for-seos
Just remember, too: you will want to do the same kind of on-page optimization for the PDF as you do for regular pages.
- Give it a good, descriptive, keyword-appropriate, dash-separated file name. (essential for usability as well, since it will become the title of the icon when saved to someone's desktop)
- Fill out the metadata for the PDF, especially the Title and Description. In Acrobat it's under File -> Properties -> Description tab (to get the meta-description itself, you'll need to click on the Additional Metadata button)
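The file-naming advice can be automated. A minimal Python sketch (the function name is hypothetical) that turns a title into a lowercase, dash-separated file name; it drops accents and any other non-ASCII characters, which is a simplifying assumption.

```python
import re
import unicodedata


def pdf_filename(title: str) -> str:
    """Turn a document title into a descriptive, dash-separated .pdf file name."""
    # Decompose accented characters, then drop anything that is not plain ASCII.
    ascii_title = unicodedata.normalize("NFKD", title).encode("ascii", "ignore").decode("ascii")
    # Collapse every run of non-alphanumeric characters into a single dash.
    slug = re.sub(r"[^a-z0-9]+", "-", ascii_title.lower()).strip("-")
    if not slug:
        raise ValueError("title has no usable characters for a file name")
    return f"{slug}.pdf"
```

For example, "How to Choose a Product: A Buyer's Guide" becomes how-to-choose-a-product-a-buyer-s-guide.pdf.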
I'd be tempted to build links to the HTML page as much as possible, as those will directly help rankings, unlike the PDF's inbound links, which will have to pass their link juice through the canonical (assuming you're using it). Plus, visitors will get a preview of the PDF's content, and context from the rest of your site, which may increase trust and engender further engagement.
Your comment about links in the PDF got kind of muddled, but you'll definitely want to make certain there are good links and calls to action back to your website within the PDF, preferably on each page. Otherwise there's no clear "next step" leading readers of the PDF back to a purchase on your site. Make sure to put Analytics tracking tags on these links so you can assess the value of the traffic the PDF sends back; otherwise that traffic will just appear as Direct in your Analytics.
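For the tracking tags, Google Analytics reads the utm_source, utm_medium and utm_campaign query parameters. A minimal Python sketch that adds them to a link while keeping any existing query string and fragment; the default source and medium values are hypothetical, so use whatever naming scheme your reports already follow.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def tag_pdf_link(url: str, campaign: str,
                 source: str = "pdf-guide", medium: str = "pdf") -> str:
    """Append Google Analytics UTM parameters to a link placed inside the PDF."""
    parts = urlsplit(url)
    utm = {"utm_source": source, "utm_medium": medium, "utm_campaign": campaign}
    # Keep existing parameters (including repeated keys) except old UTM values.
    kept = [(k, v) for k, v in parse_qsl(parts.query, keep_blank_values=True) if k not in utm]
    return urlunsplit(parts._replace(query=urlencode(kept + list(utm.items()))))
```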
Hope that all helps,
Paul
-
Can I just use htaccess?
See here: http://www.seomoz.org/blog/how-to-advanced-relcanonical-http-headers
We only have one pdf like this right now and we plan to have no more than five.
Say the pdf is located at
domain.com/pdfs/white-papers.pdf
and the article that I want to rank is at
domain.com/articles/article.pdf
do I simply add this to my htaccess file?:
Header add Link "<http://www.domain.com/articles/article.pdf>; rel=\"canonical\""
-
How do I know if I can do an HTTP header request? I'm using shared hosting through hostgator.
-
PDFs don't seem to rank as well as normal web pages. They still rank, don't get me wrong: we have over 100 PDF pages that get traffic for us. Which version is the main one is really up to you: what do you want to show in the search results? I think it would be easier to rank a normal web page, though. If you use rel="canonical", it will pass most of the link juice; not all, but most.
-
Thank you DoRM,
I assume that the PDF is what I want to be the main version since that is what I'll be marketing, but I could be wrong? What if I get backlinks to both pages, will both sets of backlinks count?
-
Indicate the canonical version of a URL by responding with the Link rel="canonical" HTTP header. Adding rel="canonical" to the head section of a page is useful for HTML content, but it can't be used for PDFs and other file types indexed by Google Web Search. In these cases you can indicate a canonical URL by responding with the Link rel="canonical" HTTP header, like this (note that to use this option, you'll need to be able to configure your server):
Link: <http://www.example.com/downloads/white-paper.pdf>; rel="canonical"
Google currently supports these link header elements for Web Search only.
You can read more here: http://support.google.com/webmasters/bin/answer.py?hl=en&answer=139394
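To make the header example concrete, here is a minimal sketch of a Python WSGI app that attaches a Link rel="canonical" header to a PDF response. It is only an illustration: the path-to-canonical mapping and URLs are hypothetical, and the body is a placeholder rather than a real PDF. On shared hosting, the .htaccess route discussed above is the practical one.

```python
from typing import Callable, Dict, Iterable, List, Tuple

# Hypothetical mapping from PDF paths to the HTML page each one should point to.
CANONICALS: Dict[str, str] = {
    "/pdfs/white-papers.pdf": "http://www.domain.com/articles/article.html",
}


def link_header_value(canonical_url: str) -> str:
    """Format a Link header value in the shape shown in Google's example."""
    return f'<{canonical_url}>; rel="canonical"'


def app(environ: dict, start_response: Callable) -> Iterable[bytes]:
    """WSGI app that serves a placeholder PDF body with a Link rel="canonical" header."""
    path = environ.get("PATH_INFO", "")
    canonical = CANONICALS.get(path)
    if canonical is None:
        start_response("404 Not Found", [("Content-Type", "text/plain")])
        return [b"Not found"]
    headers: List[Tuple[str, str]] = [
        ("Content-Type", "application/pdf"),
        ("Link", link_header_value(canonical)),
    ]
    start_response("200 OK", headers)
    # Placeholder bytes; a real handler would stream the PDF file from disk.
    return [b"%PDF-1.4 placeholder"]
```

To try it locally, wsgiref.simple_server.make_server("", 8000, app).serve_forever() from the standard library will serve it, and a request for /pdfs/white-papers.pdf should show the Link header.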