Can Google index the text content in a PDF?
-
I really really thought the answer was always no. There's plenty of other things you can do to improve search visibility for a PDF, but I thought the nature of the file type made the content itself not-parsable by search engine crawlers...
But now, my client's competitor is ranking for my client's brand name with a PDF that contains comparison content.
Thing is, my client's brand isn't in the title, the alt-text, the url... it's only in the actual text of the PDF.
Did I miss a major update? Did I always have this wrong?
-
Yes they can crawl and index also the contents of PDF's and they are doing that extensively. Its nothing new actually. As long as the contents of the PDF is not only images but also text they will be able to scan the actual text.
Interesting article with tips to make your PDF's SEO-friendly: https://www.searchenginejournal.com/pdf-seo-best-practices/59975/
Cheers,
Cesare
Got a burning SEO question?
Subscribe to Moz Pro to gain full access to Q&A, answer questions, and ask your own.
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
How does Google treat Content hidden in click-to-expand tabs?
Hi Peeps I'm working a web build project and having some debates going on with our UX and SEO department regards hidden content in click-to-expand tabs. The UX team is suggesting using these tabs is a legitimate method of making large amounts of copy more easily digestible to readers. The tabs are for FAQs ( hopefully, you can view the wireframe URL ) and the SEO team are concerned that the content in these tabs contains some core keyword phrases which may not be indexed. I am the project lead on this and honestly can't claim to be an expert on either discipline so any advice would be very welcome. Can search engines index content hidden in these tabs? Thank you in advance for any advice shared. Nicky 213985904
Technical SEO | | nickspiteri0 -
Google Indexing Pages with Made Up URL
Hi all, Google is indexing a URL on my site that doesn't exist, and never existed in the past. The URL is completely made up. Anyone know why this is happening and more importantly how to get rid of it. Thanks 🙂
Technical SEO | | brian-madden0 -
Will Google still ignore the second instance of anchor text on a page if it has an H2 tag on it?
We have a page set up that has anchor text with header tags. There is an instance where the same anchor text is on the page twice linking to the same page, and I know that Google will ignore the second instance. But in the second instance it also had an H2 tag (which I removed and put it on the first instance of anchor text even though it's smaller). Is this good practice?
Technical SEO | | AliMac260 -
Google is not indexing my new URL structure. Why not?
Hi all, We launched a new website for a customer on April 29th. That same day we resubmitted the new sitemap & asked Google to fetch the new website. Screenshot is attached of this (GWT Indexed). However, when I look at Google Index (see attachment - Google Index), Automated Production's old website URL's still appear. It's been two weeks. Is it normal for Google's index to take this long to update? Thanks for your help. Cole VoLPjhy vfxVUsO
Technical SEO | | ColeLusby0 -
How GOOGLE can re-index my site as possible as?
I have facing the question about re-indexing in the google search engine, the case is: i have changed my site meta description but google indexed display part description why?? my site is http://www.green-lotus-trekking.com/everest-base-camp-trek/ whats the problem in meta tag description? Please let me know about this?
Technical SEO | | agsln0 -
Can Page Content & Description Have Same Content?
I'm studying my crawl report and there are several warnings regarding missing meta descriptions. My website is built in WordPress and part of the site is a blog. Several of these missing description warnings are regarding blog posts and I was wondering if I am able to copy the first few lines of content of each of the posts to put in the meta description, or would that be considered duplicate content? Also, there are a few warnings that relate to blog index pages, e.g. http://www.iainmoran.com/2013/02/ - I don't know if I can even add a description of these as I think they are dynamically created? While on the subject of duplicate content, if I had a sidebar with information on several of the pages (same info) while the content would be coming from a WP Widget, would this still be considered duplicate content and would Google penalise me for it? Would really appreciate some thoughts on this,please. Thanks, Iain.
Technical SEO | | iainmoran0 -
I am trying to figure out why a website is not getting fully indexed by google. Any ideas?
I am trying to figure out why a website is not getting fully indexed by google. The website was built with Godaddy's website designer so maybe this is the problem. Originally, the internal links throughout the navigation were linked to “pages” within the site. I went in and changed all of these navigation links to point to the actual url links throughout the site instead of relative links pointing to pages on the server. I thought this would have solved the problem because I thought that perhaps google was not able to follow the original relative links. When I check to see how many pages are in the google index I still see the same #. What is going on? Should this website be rebuilt using more search engine friendly code like wordpress? Is there a simple fix that will enable google to find all of this content created by Godaddy design software? I appreciate any help offered. Here is the site- http://www.securehomeusa.com/
Technical SEO | | ULTRASEM0 -
Google indexing page with description
Hello, We rank fairly high for a lot of terms but Google is not indexing our descriptions properly. An example is with "arnold schwarzenegger net worth". http://www.google.ca/search?q=arnold+schwarzenegger+net+worth&ie=utf-8&oe=utf-8&aq=t&rls=org.mozilla:en-US:official&client=firefox-a When we add content, we throw up a placeholder page first. The content gets added with no body content and the page only contains the net worth amount of the celebrity. We then go back through and re-add the descriptions and profile bio shortly after. Will that affect how the pages are getting indexed and is there a way we can get Google to go back to the page and try to index the description so it doesn't just appear as a straight link? Thanks, Alex
Technical SEO | | Anti-Alex0