    The Moz Q&A Forum

    Moz Q&A is closed.

    After more than 13 years, and tens of thousands of questions, Moz Q&A closed on 12th December 2024. Whilst we’re not completely removing the content - many posts will still be possible to view - we have locked both new posts and new replies. More details here.

    Can PDF be seen as duplicate content? If so, how to prevent it?

    Intermediate & Advanced SEO
    • Gestisoft-Qc
      Gestisoft-Qc Subscriber last edited by

      I see no reason why PDFs couldn't be considered duplicate content, but I haven't seen any threads about it.

      We publish loads of product documentation provided by manufacturers, as well as white papers and case studies. These give our customers and prospects a better idea of our solutions and help them along their buying process.

      However, I'm not sure whether it would be better to make them non-indexable to prevent duplicate content issues. Clearly we would prefer a solution where we benefit from the keywords in the documents.

      Does anyone have insight on how to deal with PDFs provided by third parties?

      Thanks in advance.

      • ilonka65
        ilonka65 last edited by

        It looks like Google is not crawling tabs anymore, so if your PDFs are tabbed within pages, it might not be an issue: https://www.seroundtable.com/google-hidden-tab-content-seo-19489.html

        • ASriv
          ASriv Subscriber last edited by

          Sure, I understand - thanks EGOL

          • EGOL
            EGOL @ASriv last edited by

            I would like to give that to you but it is on a site that I don't share in forums.  Sorry.

            • ASriv
              ASriv Subscriber last edited by

              Thanks EGOL

              That would be ideal.

              For a site that has multiple authors, where it is impractical to get a developer involved every time a web page or blog post and its PDF are created, is there a single line of code that could be used to accomplish this in .htaccess?

              If so, would you be able to show me an example please?

              • EGOL
                EGOL last edited by

                I assigned rel=canonical to my PDFs using .htaccess.

                Then, if anyone links to the PDFs, the link value gets passed to the web page.
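For readers wondering what a header-level canonical looks like: it is a Link response header sent along with the PDF. The Python sketch below only builds that header. It is not EGOL's actual setup (which was not shared in this thread); the function name canonical_link_header and the convention that /guide.pdf mirrors /guide.html are assumptions made for illustration.

```python
from typing import Optional, Tuple
from urllib.parse import urlsplit, urlunsplit


def canonical_link_header(pdf_url: str,
                          canonical_url: Optional[str] = None) -> Tuple[str, str]:
    """Return the (name, value) pair for a rel="canonical" HTTP header.

    If canonical_url is omitted, fall back to a hypothetical site
    convention: the HTML page lives at the same path with .html
    instead of .pdf.
    """
    if canonical_url is None:
        parts = urlsplit(pdf_url)
        if not parts.path.lower().endswith(".pdf"):
            raise ValueError("expected a URL whose path ends in .pdf")
        html_path = parts.path[:-4] + ".html"  # strip ".pdf", add ".html"
        canonical_url = urlunsplit((parts.scheme, parts.netloc, html_path, "", ""))
    # Header format: Link: <URL>; rel="canonical"
    return ("Link", f'<{canonical_url}>; rel="canonical"')


if __name__ == "__main__":
    name, value = canonical_link_header("https://www.example.com/docs/guide.pdf")
    print(f"{name}: {value}")
```

Whatever serves the PDF (an .htaccess rule, a CDN, or application code) would need to attach this header to the PDF response; the sketch shows only the value to send.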

                • ASriv
                  ASriv Subscriber last edited by

                  Hi all

                  Today I've been discussing the topic of making content available as both blog posts and PDF downloads.

                  Given that there is a lot of uncertainty and complexity around this issue of potential duplication, my plan is to house all the PDFs in a folder that we block with robots.txt.

                  Anyone agree / disagree with this approach?
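Before relying on a robots.txt block like this, it can help to confirm that the rule really covers the PDF folder and nothing else. The sketch below uses Python's standard urllib.robotparser to test a rule set offline. The /pdfs/ folder name, the example.com URLs, and the helper name is_crawlable are hypothetical.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt that blocks a single PDF folder for all crawlers.
ROBOTS_TXT = """\
User-agent: *
Disallow: /pdfs/
"""


def is_crawlable(robots_txt: str, url: str, user_agent: str = "Googlebot") -> bool:
    """Return True if the given robots.txt text lets user_agent fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for url in ("https://www.example.com/pdfs/guide.pdf",
                "https://www.example.com/blog/guide.html"):
        print(url, "->", "crawlable" if is_crawlable(ROBOTS_TXT, url) else "blocked")
```

Note that a robots.txt block stops crawling, so a crawler that obeys it will also never see any canonical header attached to those PDFs.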

                  • Dr-Pete
                    Dr-Pete Staff @ATMOSMarketing56 last edited by

                    Unfortunately, there's no great way to have it both ways. If you want these pages to get indexed for the links, then they're potential duplicates. If Google filters them out, the links probably won't count. Worst case, it could cause Panda-scale problems. Honestly, I suspect the link value is minimal and outweighed by the risk, but it depends quite a bit on the scope of what you're doing and the general link profile of the site.

                    • ATMOSMarketing56
                      ATMOSMarketing56 Subscriber last edited by

                      I think you can set it to public or private (logged-in only) and even put a price-tag on it if you want. So yes setting it to private would help to eliminate the dup content issue, but it would also hide the links that I'm using to link-build.

                      I would imagine that, since this guide would link back to our original site, it would be no different than if someone were to copy the content from our site and link back to us with it, thus crediting us as the original source. Especially if we make sure it is indexed through GWMT (Google Webmaster Tools) before submitting it to other platforms. Are there any good resources that delve into that?

                      • Dr-Pete
                        Dr-Pete Staff last edited by

                        Potentially, but I'm honestly not sure how Scribd's pages are indexed. Don't you need to log in or something to actually see the content on Scribd?

                        • ATMOSMarketing56
                          ATMOSMarketing56 Subscriber last edited by

                          What about this instance:

                          (A) I made an "ultimate guide to X" and posted it on my site as individual HTML pages for each chapter

                          (B) I made a PDF version with the exact same content that people can download directly from the site

                          (C) I uploaded the PDF to sites like Scribd.com to help distribute it further, and build links with the links that are embedded in the PDF.

                          Would those all be dup content? Is (C) recommended or not?

                          • EGOL
                            EGOL @Gestisoft-Qc last edited by

                            Thanks! I am going to look into this. I'll let you know if I learn anything.

                            • Dr-Pete
                              Dr-Pete Staff @Gestisoft-Qc last edited by

                              If they duplicate your main content, I think the header-level canonical may be a good way to go. For the syndication scenario, it's tough, because then you're knocking those PDFs out of the rankings, potentially, in favor of someone else's content.

                              Honestly, I've seen very few people deal with canonicalization for PDFs, and even those cases were small or obvious (like a page with the exact same content being outranked by the duplicate PDF). It's kind of uncharted territory.

                              • EGOL
                                EGOL @Gestisoft-Qc last edited by

                                Thanks for all of your input, Dr. Pete. The example that you use is almost exactly what I have: hundreds of PDFs on a fifty-page site. These PDFs rank well in the SERPs, accumulate PageRank, and pass traffic and link value back to the main site through links embedded within the PDFs. They also have natural links from other domains. I don't want to block them or nofollow them, but your suggestion of using the header directive sounds pretty good.

                                • Dr-Pete
                                  Dr-Pete Staff @Gestisoft-Qc last edited by

                                  Oh, sorry - so these PDFs aren't duplicates with your own web/HTML content so much as duplicates with the same PDFs on other websites?

                                  That's more like a syndication situation. It is possible that, if enough people post these PDFs, you could run into trouble, but I've never seen that. More likely, your versions just wouldn't rank. Theoretically, you could use the header-level canonical tag cross-domain, but I've honestly never seen that tested.

                                  If you're talking about a handful of PDFs, they're a small percentage of your overall indexed content, and that content is unique, I wouldn't worry too much. If you're talking about 100s of PDFs on a 50-page website, then I'd control it. Unfortunately, at that point, you'd probably have to put the PDFs in a folder and outright block it. You'd remove the risk, but you'd stop ranking on those PDFs as well.

                                  • EGOL
                                    EGOL @Gestisoft-Qc last edited by

                                    "@EGOL: Can you expand a bit on your Author suggestion?"

                                    I was wondering if there is a way to do rel=author for a PDF document. I don't know how to do it and don't know if it is possible.

                                    • Gestisoft-Qc
                                      Gestisoft-Qc Subscriber @Dr-Pete last edited by

                                      To make sure I understand what I'm reading:

                                      • PDFs don't usually rank as well as regular pages (although it is possible)
                                      • It is possible to configure a canonical tag on a PDF

                                      My concern isn't that our PDFs may outrank the original content but rather that we could get slammed by Google for publishing them.

                                      Am I right in thinking a canonical tag prevents a page from accumulating link juice? If so, I would prefer not to use it, unless skipping it gets us slammed by Google.

                                      Has anyone experienced Google retribution for publishing PDFs that come from a third party?

                                      @EGOL: Can you expand a bit on your Author suggestion?

                                      Thanks all!

                                      • Dr-Pete
                                        Dr-Pete Staff last edited by

                                        I think it's possible, but I've only seen it in cases that are a bit hard to disentangle. For example, I've seen a PDF outrank a duplicate piece of regular content when the regular content had other issues (including massive duplication with other, regular content). My gut feeling is that it's unusual.

                                        If you're concerned about it, you can canonicalize PDFs with the header-level canonical directive. It's a bit more technically complex than the standard HTML canonical tag:

                                        http://googlewebmastercentral.blogspot.com/2011/06/supporting-relcanonical-http-headers.html

                                        I'm going to mark this as "Discussion", just in case anyone else has seen real-world examples.

                                        • EGOL
                                          EGOL last edited by

                                          I am really interested in hearing what others have to say about this.

                                          I know that PDFs can be very valuable content. They can be optimized, they rank in the SERPs, they accumulate PageRank and they can pass link value. So, to me it would be a mistake to block them from the index...

                                          However, I see your point about dupe content... they could also be thin content. Will Panda whack you for thin and dupe content in your PDFs?

                                          How can canonical be used... what about author?

                                          Anybody know anything about this?

                                          • MargaritaS
                                            MargaritaS last edited by

                                            Just like any other piece of duplicate content, you can use canonical link elements to specify the original piece of content (if there's indeed more than one identical piece). You could also block these types of files in the robots.txt, or use noindex-follow meta tags.

                                            Regards,

                                            Margarita
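One practical wrinkle with the noindex option: a PDF has no HTML head to hold a meta tag, so the closest equivalent is the X-Robots-Tag response header. The Python sketch below picks the extra header by file extension. The helper name extra_headers_for is hypothetical, and the value is plain noindex on the assumption that following links is already the crawler's default behavior.

```python
from typing import Dict


def extra_headers_for(path: str) -> Dict[str, str]:
    """Return extra response headers for a requested path.

    PDFs get an X-Robots-Tag asking search engines not to index them;
    every other path is left untouched.
    """
    if path.lower().endswith(".pdf"):
        return {"X-Robots-Tag": "noindex"}
    return {}


if __name__ == "__main__":
    for path in ("/docs/manual.pdf", "/docs/manual.html"):
        print(path, extra_headers_for(path))
```

Unlike a robots.txt block, this approach needs the PDF to stay crawlable, because the crawler has to fetch the file to see the header.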
