Can Google index the text content in a PDF?
-
I really, really thought the answer was always no. There are plenty of other things you can do to improve search visibility for a PDF, but I thought the nature of the file type made the content itself unparsable by search engine crawlers...
But now, my client's competitor is ranking for my client's brand name with a PDF that contains comparison content.
Thing is, my client's brand isn't in the title, the alt-text, the url... it's only in the actual text of the PDF.
Did I miss a major update? Did I always have this wrong?
-
Yes, they can crawl and index the contents of PDFs as well, and they do so extensively. It's nothing new, actually. As long as the PDF contains real text and not only images, they will be able to read that text.
Interesting article with tips to make your PDFs SEO-friendly: https://www.searchenginejournal.com/pdf-seo-best-practices/59975/
Cheers,
Cesare
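To check Cesare's point on a specific file (does it carry a text layer, or is it only scanned images?), here is a rough heuristic, assuming a fairly simple PDF. It looks for a text object (`BT`) followed by a text-showing operator (`Tj` or `TJ`) in each content stream, inflating Flate-compressed streams with `zlib`. It is a sketch, not a PDF parser: it ignores `/Length`, object streams and font encodings, so it can only suggest whether text exists, not what the text says. The function names are hypothetical.

```python
import re
import zlib

# Rough pattern for "stream ... endstream" blocks; ignores the /Length key.
STREAM_RE = re.compile(rb"stream\r?\n(.*?)\nendstream", re.DOTALL)
# A text object (BT) followed later by a text-showing operator (Tj or TJ).
TEXT_OP_RE = re.compile(rb"\bBT\b.*?\bT[jJ]\b", re.DOTALL)


def content_streams(pdf_bytes: bytes):
    """Yield each stream body, inflated if it looks Flate-compressed."""
    for match in STREAM_RE.finditer(pdf_bytes):
        raw = match.group(1)
        try:
            # decompressobj tolerates trailing bytes such as a stray '\r'.
            yield zlib.decompressobj().decompress(raw)
        except zlib.error:
            yield raw  # not zlib data; assume the stream is uncompressed


def has_text_layer(pdf_bytes: bytes) -> bool:
    """Heuristic: True if any stream contains a BT ... Tj/TJ sequence."""
    return any(TEXT_OP_RE.search(body) for body in content_streams(pdf_bytes))
```

Usage would be something like `has_text_layer(open("comparison.pdf", "rb").read())`. A `False` result on a PDF you can read on screen usually points to an image-only (scanned) document, though encrypted or unusually structured files can also fool this check.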
Related Questions
-
Does Google index internal anchors as separate pages?
Hi, back in September I added a function that sets an anchor on each subheading (h[2-6]) and creates a table of contents that links to each of those anchors. These anchors did show up in the SERPs as jump-to links. Fine. Back then I also changed the canonicals to a slightly different structure, and meanwhile there was a massive increase in the number of indexed pages (WAY over the top), which has since been fixed by removing a complete section of the site (410). However, there are still ~34,000 pages indexed, versus what are really more like 4,000-plus (all properly canonicalised). Naturally I am wondering what Google thinks it is indexing. The number is just way off and quite inexplicable. So I was wondering: does Google save jump-to links as unique pages? Also, does anybody know any method of actually getting all the pages in the Google index? (Not the pages that actually exist on the site via Screaming Frog etc., but the actual pages in the index; all the methods I found sadly do not work.) Finally: does somebody have any other explanation for the discrepancy between indexed and actual pages? Thanks for your replies! Nico
Technical SEO | | netzkern_AG0 -
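On the jump-link question: URLs that differ only in their `#fragment` point at the same document, and Google is generally understood to ignore fragments when indexing (treat that as an assumption to confirm in Search Console for your own site). A quick way to see how much of a crawled URL list collapses once fragments are dropped is the standard library's `urllib.parse.urldefrag`; the function name and URLs below are hypothetical.

```python
from urllib.parse import urldefrag


def unique_documents(urls):
    """Group URLs that differ only by their #fragment (jump links)."""
    groups = {}
    for url in urls:
        base, _fragment = urldefrag(url)  # drop everything after '#'
        groups.setdefault(base, []).append(url)
    return groups


crawl = [
    "https://example.com/guide",
    "https://example.com/guide#step-1",
    "https://example.com/guide#step-2",
    "https://example.com/faq",
]
docs = unique_documents(crawl)
print(len(crawl), "URLs ->", len(docs), "documents")  # 4 URLs -> 2 documents
```

If the jump links were the cause, a gap like 34,000 versus 4,000 would be unlikely to come from fragments alone; parameter variants or non-canonical duplicates are worth grouping the same way.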
Content relaunch without content duplication
We write great content for blogs and websites (or at least we try), especially blogs. Sometimes a few posts do NOT get a good response or reach. It could be that the content is not interesting, or the title, bad timing, or even the language used. My question for the discussion is: what would you do if you found that content worthy of the audience's attention missed it during its original launch? Is it fine to improve the text and context and relaunch it? For example:
1. Rechristen the post: change the title to make it more attractive
2. Add images
3. Check spelling
4. Do any necessary rewriting and spell-check again
5. Update the timeline by adding more recent statistics and references to recent write-ups (external and internal blogs, for example), and change anything that seems outdated
Also, change the title and set rel=canonical / 301 permanent redirects to the new URLs. Will the above make the post new? Any ideas and tips? Basically we'd like to refurbish (:-)) content that didn't succeed in the past and relaunch it to try again. If we do so, will there be any issues with Googlebot? (I hope redirection would solve this, but I still want to make sure.) Thanks,
Technical SEO | | macronimous
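If a refreshed post moves to a new URL, the usual pattern is a permanent (301) redirect from the old URL, with the new page's `rel=canonical` pointing at itself. Below is a minimal WSGI sketch of the redirect side only; the `REDIRECTS` map and slugs are made up for illustration, and most blogs would configure this in the CMS or web server rather than in code.

```python
# Hypothetical map of old post paths to their relaunched URLs.
REDIRECTS = {
    "/blog/old-title": "/blog/new-attractive-title",
}


def app(environ, start_response):
    """WSGI app: 301 old slugs to their new URLs, 404 everything else."""
    path = environ.get("PATH_INFO", "/")
    target = REDIRECTS.get(path)
    if target is not None:
        # 301 signals a permanent move, so the new URL should replace the old one.
        start_response("301 Moved Permanently", [("Location", target)])
        return [b""]
    start_response("404 Not Found", [("Content-Type", "text/plain")])
    return [b"not found"]
```

To try it locally, `wsgiref.simple_server.make_server("127.0.0.1", 8000, app).serve_forever()` from the standard library will serve it. Keeping the old URL and simply updating the post in place avoids the redirect question entirely.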
Does Google distinguish between core content and accessory, 3rd party widgets when considering how slow or fast a site is?
Our site's Facebook Plugin is really slowing page speed down. As far as users are concerned, the page loads fast enough and they can already start interacting with the page before the last sidebar widget has loaded. But the FB widget is really slow to load and is dragging the performance down in Google Analytics Page Speed for example. Any thoughts on whether this should be an SEO concern, and whether Google differentiates between different elements of the page when deciding whether a page is a bad user experience? Thanks!
Technical SEO | | etruvian0 -
Can a 307 Redirect Pass on a Manual Google Link Penalty?
Hi, I am using a 307 redirect to send traffic from an old site, which has a Google manual link penalty against it, to a brand-new site. My understanding is that a 307 will not pass on link juice, which is okay as I'm starting fresh with the new site, but I would hate to risk having the penalty from the old site passed on to the new site. I am using a 307 in lieu of having a "Click here to be directed to the new site" page. Thanks in advance.
Technical SEO | | Robdob20130 -
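Whether a redirect is permanent or temporary is signalled by its status code: per the HTTP specification, 301 and 308 are permanent, while 302, 303 and 307 are temporary. The status code alone does not settle how Google handles a manual penalty, so the sketch below only confirms what a server actually sends. It makes one `HEAD` request with the standard library's `http.client` and does not follow the redirect; the function name is hypothetical, and some servers answer `HEAD` differently from `GET`.

```python
import http.client
from urllib.parse import urlsplit

PERMANENT = {301, 308}
TEMPORARY = {302, 303, 307}


def inspect_redirect(url: str, timeout: float = 10.0):
    """Return (status, Location header, kind) for one request, unfollowed."""
    parts = urlsplit(url)
    conn_cls = (http.client.HTTPSConnection if parts.scheme == "https"
                else http.client.HTTPConnection)
    conn = conn_cls(parts.netloc, timeout=timeout)  # netloc may include :port
    try:
        path = parts.path or "/"
        if parts.query:
            path += "?" + parts.query
        conn.request("HEAD", path)
        resp = conn.getresponse()
        status, location = resp.status, resp.getheader("Location")
    finally:
        conn.close()
    if status in PERMANENT:
        kind = "permanent"
    elif status in TEMPORARY:
        kind = "temporary"
    else:
        kind = "not a redirect"
    return status, location, kind
```

For example, `inspect_redirect("http://old-site.example/")` (a placeholder host) should report `(307, ..., "temporary")` for the setup described above.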
How do I get Google to index the right pages with the right keywords?
Hello, I notice that even though I have a sitemap, Google is indexing the wrong pages under the wrong keywords. As a result it's not as relevant and is not ranking properly.
Technical SEO | | ursalesguru0 -
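A sitemap tells Google which URLs exist; it does not decide which page ranks for which keyword, which still depends on each page's content, titles and internal linking. For reference, a minimal sitemap in the sitemaps.org 0.9 format can be generated with the standard library; the function name and URLs are placeholders.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(entries):
    """entries: iterable of (loc, lastmod) pairs; lastmod may be None."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for loc, lastmod in entries:
        url = ET.SubElement(urlset, "url")
        ET.SubElement(url, "loc").text = loc
        if lastmod:
            # W3C datetime format, e.g. 2024-01-31
            ET.SubElement(url, "lastmod").text = lastmod
    return ET.tostring(urlset, encoding="unicode")


print(build_sitemap([
    ("https://example.com/", "2024-01-31"),
    ("https://example.com/services/", None),
]))
```

When writing the file to disk, `ET.ElementTree(root).write(path, encoding="utf-8", xml_declaration=True)` adds the XML declaration that `tostring` omits here.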
HTTP vs. HTTPS and Google crawling and indexing?
Is it true that HTTPS pages are not crawled and indexed by Google and other search engines as well as HTTP pages are?
Technical SEO | | sherohass0 -
Syndication partner ranking in Google News for our content
Our blog is part of Google News and is syndicated for use by several of our partners, such as the Chicago Tribune. Lately, we see the syndicator's version of the post appearing in Google News instead of our original version. Ours generally ranks in the regular index. ChiTrib does have canonical URL tags and syndication-source tags pointing to our original. They are meta tags, not link tags. We do have a News-specific sitemap that is being reported in WMT as error-free. However, it shows no URLs indexed in the News module, even when I can find those specific URLs (our version) in News. For example, here is a ChiTrib post currently ranking in Google News:
http://www.chicagotribune.com/classified/automotive/sns-school-carpool-lanes-are-a-danger-zone-20120301,0,3514283.story
The original version is here:
http://blogs.cars.com/kickingtires/2012/03/school-carpool-lanes-are-a-danger-zone.html
The News sitemap URL is:
http://blogs.cars.com/kickingtires/kickingtires_newsmap.xml
One of our front-end producers speculates that the Facebook sharing code on ChiTrib is having an effect. Given that FB is FB and Google is Google, that sounds wrong to me when we're talking specifically about Google News. Any suggestions? Thanks.
Technical SEO | | CarsProduction
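Since the partner's canonical is described as a meta tag, one thing worth checking is whether their pages also declare it in the documented form, a `<link rel="canonical" href="...">` element. The sketch below uses the standard library's `html.parser` to collect canonical hints of both kinds from a page's HTML; the class name and sample markup are hypothetical, and fetching the partner's page is left out.

```python
from html.parser import HTMLParser


class CanonicalFinder(HTMLParser):
    """Collect canonical hints from <link> and <meta> tags."""

    META_NAMES = {"canonical", "syndication-source"}

    def __init__(self):
        super().__init__()
        self.link_canonicals = []  # <link rel="canonical" href="...">
        self.meta_canonicals = []  # <meta name="..." content="...">

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag/attribute names; values may be None.
        a = {k: (v or "") for k, v in attrs}
        if tag == "link" and "canonical" in a.get("rel", "").lower().split():
            self.link_canonicals.append(a.get("href", ""))
        elif tag == "meta" and a.get("name", "").lower() in self.META_NAMES:
            self.meta_canonicals.append(a.get("content", ""))


page = """<head>
<meta name="syndication-source" content="https://example.com/original-post">
<meta name="canonical" content="https://example.com/original-post">
</head>"""
finder = CanonicalFinder()
finder.feed(page)
print(finder.link_canonicals, finder.meta_canonicals)
```

An empty `link_canonicals` list alongside populated meta entries, as in this sample, would match the situation described in the question.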
Forget Duplicate Content, What to do With Very Similar Content?
All, I operate a WordPress blog that focuses on one specific area of the law. Our contributors are attorneys from across the country who write about our niche topic. I've done away with syndicated posts, but we still have numerous articles addressing many of the same issues and topics. In some cases 15 posts might address the same issue. The content isn't duplicate, but it is very similar, outlining the same rules of law etc. An SEO I trust has told me I should 301 some of the similar posts to one authoritative post on the subject. Is this a good idea? Would I be better served implementing canonical tags pointing to the "best of breed" post on each subject? Or would I be better off being grateful that I receive original content on my niche topic and not doing anything? I would really appreciate some feedback. John
Technical SEO | | JSOC0
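Before choosing between 301s and canonicals, it can help to measure how similar the posts really are. One simple, common measure (swapped in here; nothing in the thread prescribes it) is Jaccard similarity over k-word shingles, and pairs above a chosen threshold become candidates for consolidation. The function names, `k=3` and the 0.5 threshold are arbitrary assumptions.

```python
import re
from itertools import combinations


def shingles(text: str, k: int = 3) -> set:
    """Set of k-word shingles from lowercased text."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard(a: set, b: set) -> float:
    """|A & B| / |A | B|, or 0.0 when both sets are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def similar_pairs(posts: dict, threshold: float = 0.5):
    """Yield (slug_a, slug_b, score) for post pairs at or above threshold."""
    sets = {slug: shingles(body) for slug, body in posts.items()}
    for a, b in combinations(sets, 2):
        score = jaccard(sets[a], sets[b])
        if score >= threshold:
            yield a, b, round(score, 2)


posts = {
    "a": "the statute of limitations for filing a claim is two years",
    "b": "the statute of limitations for filing a claim is two years in most states",
    "c": "how to choose a good lawyer for your case",
}
print(list(similar_pairs(posts)))  # [('a', 'b', 0.75)]
```

Posts that share rules of law but differ in analysis will often score lower than near-copies, which may help separate "consolidate" candidates from posts worth keeping as they are.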