How to tell if PDF content is being indexed?
-
I've searched extensively for this, but could not find a definitive answer.
We recently updated our website and it contains links to about 30 PDF data sheets. I want to determine if the text from these PDFs is being archived by search engines.
When I do this search http://bit.ly/rRYJPe (google - site:www.gamma-sci.com and filetype:pdf) I can see that the PDF urls are getting indexed, but does that mean that their content is getting indexed?
I have read in other posts/places that if you can copy text from a PDF and paste it that means Google can index the content. When I try this with PDFs from our site I cannot copy text, but I was told that these PDFs were all created from Word docs, so they should be indexable, correct?
Since WordPress has you upload PDFs like they are an image could this be causing the problem?
Would it make sense to take the time and extract all of the PDF content to html?
Thanks for any assistance, this has been driving me crazy.
-
Kyle,
Thanks for the quick response. The data is being displayed in the title and meta description field. I also did some searches for specific terms with my parameter search from our site and filetype:pdf, which shows that the content is being indexed. It also shows that the PDF titles and meta descriptions are not optimized, so I have some work there.
Thanks,
Anthony
-
Is the data being displayed in the title and meta description in the SERP content from the PDF?
If so, then yes, they are being indexed/crawled.
Regards,
Kyle
Got a burning SEO question?
Subscribe to Moz Pro to gain full access to Q&A, answer questions, and ask your own.
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
Google is indexing our old domain
We changed our primary domain from vivitecsolutions.com to vivitec.net. Google is indexing our new domain, but still has our old domain indexed too. The problem is that the old site is timing out because of the https: Thought on how to make the old indexing go away or properly forward the https?
Technical SEO | | AdsposureDev0 -
Content in Accordion doesn't rank as well as Content in Text box?
Does content rank better in a full view text layout, rather than in a clickable accordion? I read somewhere because users need to click into an accordion it may not rank as well, as it may be considered hidden on the page - is this true? accordion example: see features: https://www.workday.com/en-us/applications/student.html
Technical SEO | | DigitalCRO1 -
Why are only a few of our pages being indexed
Recently rebuilt a site for an auctioneers, however it has a problem in that none of the lots and auctions are being indexed by Google on the new site, only the pages like About, FAQ, home, contact. Checking WMT shows that Google has crawled all the pages, and I've done a "Fetch as Google" on them and it loads up fine, so there's no crawling issues that is standing out. I've set the "URL Parameters" to no effect too. Also built a sitemap with all the lots in, pushed to Google which then crawled them all (massive spike in Crawl rate for a couple days), and still just indexing a handful of pages. Any clues to look into would be greatly appreciated. https://www.wilkinsons-auctioneers.co.uk/auctions/
Technical SEO | | Blue-shark0 -
Search results indexed
Hi there, is is bad practice in seo to have search results for products indexed? For example a search result of holidays to Ibiza, with lots of deals coming up? its a search query url that would be indexed, with just an image and price per product on the page, with about 10 per page? Any advice appreciated.
Technical SEO | | pauledwards0 -
A problem with duplicate content
I'm kind of new at this. My crawl anaylsis says that I have a problem with duplicate content. I set the site up so that web sections appear in a folder with an index page as a landing page for that section. The URL would look like: www.myweb.com/section/index.php The crawl analysis says that both that URL and its root: www.myweb.com/section/ have been indexed. So I appear to have a situation where the page has been indexed twice and is a duplicate of itself. What can I do to remedy this? And, what steps should i take to get the pages re-indexed so that this type of duplication is avoided? I hope this makes sense! Any help gratefully received. Iain
Technical SEO | | iain0 -
Problem with indexing
Hello, we've changed our CMS recently, everything seems to work well, but for some reason google, and other crawlers can't see or index other pages than main. There is no restriction in robots, nor any other visible issue. Please help if you can. Website: http://www.design-glassware.com/
Technical SEO | | divan0 -
How do you measure content on a website?
I never thought of this question before. Maybe because i didn't focus myself on content but only on optimizing existing content from clients. So how do you measure the content on a specific page?
Technical SEO | | mosaicpro0 -
Is this 404 page indexed?
I have a URL that when searched for shows up in the Google index as the first result but does not have any title or description attached to it. When you click on the link it goes to a 404 page. Is it simply that Google is removing it from the index and is in some sort of transitional phase or could there be another reason.
Technical SEO | | bfinternet0