How to tell if PDF content is being indexed?
-
I've searched extensively for this, but could not find a definitive answer.
We recently updated our website and it contains links to about 30 PDF data sheets. I want to determine if the text from these PDFs is being archived by search engines.
When I do this search http://bit.ly/rRYJPe (google - site:www.gamma-sci.com and filetype:pdf) I can see that the PDF urls are getting indexed, but does that mean that their content is getting indexed?
I have read in other posts/places that if you can copy text from a PDF and paste it that means Google can index the content. When I try this with PDFs from our site I cannot copy text, but I was told that these PDFs were all created from Word docs, so they should be indexable, correct?
Since WordPress has you upload PDFs like they are an image could this be causing the problem?
Would it make sense to take the time and extract all of the PDF content to html?
Thanks for any assistance, this has been driving me crazy.
-
Kyle,
Thanks for the quick response. The data is being displayed in the title and meta description field. I also did some searches for specific terms with my parameter search from our site and filetype:pdf, which shows that the content is being indexed. It also shows that the PDF titles and meta descriptions are not optimized, so I have some work there.
Thanks,
Anthony
-
Is the data being displayed in the title and meta description in the SERP content from the PDF?
If so, then yes, they are being indexed/crawled.
Regards,
Kyle
Got a burning SEO question?
Subscribe to Moz Pro to gain full access to Q&A, answer questions, and ask your own.
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
Duplicate Content Brainstorming
Hi, New here in the SEO world. Excellent resources here. We have an ecommerce website that sells presentation templates. Today our templates come in 3 flavours - for PowerPoint, for Keynote and both - called Presentation Templates. So we've ended up with 3 URLS with similar content. Same screenshots, similar description.. Example: https://www.improvepresentation.com/keynote-templates/social-media-keynote-template https://www.improvepresentation.com/powerpoint-templates/social-media-powerpoint-template https://www.improvepresentation.com/presentation-templates/social-media-presentation-template I know what you're thinking. Why not make a website with a template and give 3 download options right? But what about https://www.improvepresentation.com/powerpoint-templates/ https://www.improvepresentation.com/keynote-templates/ These are powerfull URL's in my opinion taking into account that the strongest keyword in our field is "powerpoint templates" How would you solve this "problem" or maybe there is no problem at all.
Technical SEO | | slidescamp0 -
Mobile website content optimisation
Hi there, someone I know is going to put their site to a mobile version with a mobile sub domain (m.). I have recommended responsive but for now this is their only way forward to cope with the 21st April update by Google. My question is what is the best practice for content, as its a different url will there need to be a canonical tag in to stop duplication and thus being penalised from the Google panda update? Any advice much appreciated.
Technical SEO | | tdigital0 -
Duplicate content and rel canonicals?
Hi. I have a question relating to 2 sites that I manage with regards to duplicate content. These are 2 separate companies but the content is off a data base from the one(in other words the same). In terms of the rel canonical, how would we do this so that google does not penalise either site but can also have the content to crawl for both or is this just a dream?
Technical SEO | | ProsperoDigital0 -
Bing is not Indexing my site.
Hi, My website is four months old and has more than 8000 pages. Bing has indexed only 8 pages till date and Google also keeps playing hide and seek with it. There was a time when google indexed almost all the pages of my site but now there are only 5000 pages indexed. Moreover when I check my site on google (by typing site:socktail.com), it shows only 26 pages. Please let me know what should I do. If somebody wants to take a look, my website is http://socktail.com Thanks
Technical SEO | | saurabh19050 -
Avoiding Cannibalism and Duplication with content
Hi, For the example I will use a computers e-commerce store... I'm working on creating guides for the store -
Technical SEO | | BeytzNet
How to choose a laptop
How to choose a desktop I believe that each guide will be great on its own and that it answers a specific question (meaning that someone looking for a laptop will search specifically laptop info and the same goes for desktop). This is why I didn't creating a "How to choose a computer" guide. I also want each guide to have all information and not to start sending the user to secondary pages in order to fill in missing info. However, even though there are several details that are different between the laptops and desktops, like importance of weight, screen size etc., a lot of things the checklist (like deciding on how much memory is needed, graphic card, core etc.) are the same. Please advise on how to pursue it. Should I just write two guides and make sure that the same duplicated content ideas are simply written in a different way?0 -
How to know which pages are indexed by Google?
So apparently we have some sites that are just duplicates of our original main site but aiming at different markets/cities. They have completely different urls but are the same content as our main site with different market/city changed. How do I know for sure which ones are indexed. I enter the url into Google and its not there. Even if I put in " around " it. Is there another way to query google for my site? Is there a website that will tell you which ones are indexed? This is probably a dumb question.
Technical SEO | | greenhornet770 -
How to change noindex to index?
Hey, I've recently upgraded to a pro SEOmoz account and have realised i have 14574 issues to do with 'blocked by meta-robot' and that 'This page is being kept out of the search engine indexes by the meta tag , which may have a value of "noindex", keeping this page out of the index.' How can i change this so my pages get indexed? I read somewhere that i need to change my privacy settings but that thread was 3 years old and now the WP Dashboard has updated.. Please let me know Many thanks, Jamie P.s Im using WordPress 3.5 And i have the XML sitemap plugin And i have no idea where to look for this robots.txt file..
Technical SEO | | markgreggs0 -
Similar Content vs Duplicate Content
We have articles written for how to setup pop3 and imap. The topics are technically different but the settings within those are very similar and thus the inital content was similar. SEOMoz reports these pages as duplicate content. It's not optimal for our users to have them merged into one page. What is the best way to handle similar content, while not getting tagged for duplicate content?
Technical SEO | | Izoox0