How to tell if PDF content is being indexed?
-
I've searched extensively for this, but could not find a definitive answer.
We recently updated our website, and it contains links to about 30 PDF data sheets. I want to determine whether the text in these PDFs is being indexed by search engines.
When I run this Google search, http://bit.ly/rRYJPe (site:www.gamma-sci.com and filetype:pdf), I can see that the PDF URLs are being indexed, but does that mean their content is being indexed too?
I have read in other posts and elsewhere that if you can copy and paste text from a PDF, Google can index its content. When I try this with PDFs from our site I cannot copy any text, but I was told these PDFs were all created from Word docs, so they should be indexable, correct?
WordPress has you upload PDFs the same way as images. Could this be causing the problem?
Would it make sense to take the time to extract all of the PDF content into HTML?
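Being unable to copy text does not necessarily mean a PDF has no text layer: a PDF can carry permission settings that block copying, and those live in an encryption dictionary referenced by `/Encrypt`. The rough Python sketch below (the helper name `pdf_text_hints` is hypothetical) scans a downloaded file for that entry and for signs of real text: font resources and `BT`/`ET` text objects, including inside Flate-compressed streams. It is a byte-level heuristic under those assumptions, not a PDF parser; encrypted or unusually structured files can give misleading results, so treat the output as a hint and confirm with a proper PDF tool.

```python
import re
import zlib


def pdf_text_hints(data: bytes) -> dict:
    """Rough byte-level hints about a PDF's text layer (hypothetical helper, not a full parser)."""
    # Fonts and text operators usually sit inside FlateDecode-compressed streams,
    # so try to inflate each stream body and scan the results as well.
    chunks = [data]
    for match in re.finditer(rb"stream\r?\n(.*?)(?:\r?\n)?endstream", data, re.S):
        try:
            chunks.append(zlib.decompressobj().decompress(match.group(1)))
        except zlib.error:
            pass  # not Flate-compressed, encrypted, or damaged: skip this stream
    # Join with newlines so tokens from adjacent chunks cannot run together.
    blob = b"\n".join(chunks)
    return {
        # Copy/print restrictions are stored in an encryption dictionary referenced by /Encrypt.
        "encrypted": b"/Encrypt" in data,
        # Font resources suggest the pages draw real text rather than only images.
        "has_fonts": b"/Font" in blob,
        # BT ... ET brackets a text object in a page content stream.
        "has_text_ops": re.search(rb"\bBT\b", blob) is not None,
    }
```

Usage, assuming a locally saved copy named `datasheet.pdf` (a placeholder): `pdf_text_hints(open("datasheet.pdf", "rb").read())`. A result with `encrypted` true and fonts present would point to a copy restriction rather than a missing text layer; no fonts and no text operators would suggest a scanned or image-only PDF.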
Thanks for any assistance, this has been driving me crazy.
-
Is the text displayed as the title and meta description in the SERP taken from the PDF's content?
If so, then yes, they are being crawled and indexed.
Regards,
Kyle
-
Kyle,
Thanks for the quick response. Yes, text from the PDFs is being displayed in the title and meta description. I also searched for specific terms using the site: parameter for our domain plus filetype:pdf, and those results show that the content is being indexed. They also show that the PDF titles and meta descriptions are not optimized, so I have some work to do there.
Thanks,
Anthony
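The phrase-search check described in this thread can be scripted. Below is a minimal Python sketch, assuming you pick a sentence that appears only in the body of a data sheet (not in its title or URL); the helper name `pdf_phrase_query` and the sample phrase are hypothetical placeholders. It combines the `site:` and `filetype:pdf` operators with an exact-match quoted phrase and builds the corresponding Google search URL.

```python
from urllib.parse import urlencode


def pdf_phrase_query(site: str, phrase: str) -> str:
    """Build a Google search URL restricted to one site's PDFs (hypothetical helper)."""
    # site: limits results to the domain, filetype:pdf to PDF documents,
    # and the quotes ask for the phrase to match exactly.
    query = f'site:{site} filetype:pdf "{phrase}"'
    # urlencode percent-encodes ':' and '"' and turns spaces into '+'.
    return "https://www.google.com/search?" + urlencode({"q": query})


print(pdf_phrase_query("www.gamma-sci.com", "sample phrase"))
```

Opening the printed URL runs the search. If a sentence from deep inside a data sheet returns that PDF, the body text has very likely been indexed, not just the URL; an empty result is less conclusive, since search engines do not promise to index every word of a document.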
Related Questions
-
Indexing Issue
Hi, we have moved one of our domains, https://www.mycity4kids.com/, to AngularJS, and since then I have observed a major drop in the number of indexed pages. I cross-checked the code and other important parameters but didn't find any major issue. What could be the reason for the drop?
Technical SEO | | ResultFirst0 -
Duplicate content being reported
Hi, I have a new client whose first MA crawl report is showing lots of duplicate content. Most of these are the homepage (HP) URL with an 'attachment' parameter at the end, such as www.domain.com/?attachment_id=4176. As far as I can tell it's some sort of slideshow, showing a different image in the main frame of each page with no other content. Each one does have a unique meta title and H1, though. What's the best thing to do here? Leave it as is because it's not a problem, use the parameter handling tool in GWT, canonicalise to the HP, or some other solution? Many thanks, Dan
Technical SEO | | Dan-Lawrence0 -
Duplicate content issue
The Moz crawl diagnostic tool is reporting a heap of duplicate content for each event on my website, for example: http://www.ticketarena.co.uk/events/Mint-Festival-7/ and http://www.ticketarena.co.uk/events/Mint-Festival-7/index.html Should I use a 301 redirect on the second link? I was unaware that this was classed as duplicate content; I thought it was just the way the CMS was set up. Can anyone shed any light on this, please? Thanks
Technical SEO | | Alexogilvie0 -
Content too buried in source code?
Our team is working on a refresh/redesign, and I am wondering if there's a quantifiable way of determining how high our meta data, H1 and first paragraph should be in the source code, or even whether I should be concerned about that. Our navigation will likely have dozens of links (we're going to keep it under 100), and that doesn't even factor in the design elements. I am concerned about the content being buried. Are these the kind of concerns I should be having? Is there a measurable way to avoid it?
Technical SEO | | SSFCU0 -
Container Page/Content Page Duplicate Content
My client has a container page on their website. They are using SiteFinity, so it is called a "group page", in which individual pages appear and can be scrolled through. When links are followed, they first lead to the group page URL, in which the first content page is shown. However, when navigating through the content pages, the URL changes. When navigating BACK to the first content page, the URL is that of the content page, but it appears to indexers as a duplicate of the group page, that is, the URL that appeared when first linking to the group page. The client updates this regularly, so I need a solution that will allow them to add more pages, with the newest one always becoming the top page, without requiring extra coding. For instance, I had considered implementing rel="next" and rel="prev", but they aren't going to keep that up to date.
Technical SEO | | SpokeHQ1 -
Duplicate Content
SEOmoz is reporting duplicate content for 2000 of my pages. For example, these are reported as duplicate content: http://curatorseye.com/Name=“Holster-Atlas”---Used-by-British-Officers-in-the-Revolution&Item=4158 and http://curatorseye.com/Name=âHolster-Atlasâ---Used-by-British-Officers-in-the-Revolution&Item=4158 The actual link on the site is http://www.curatorseye.com/Name=“Holster-Atlas”---Used-by-British-Officers-in-the-Revolution&Item=4158 Any insight on how to fix this? I'm not sure where the second version of the URL is coming from. Thanks, Janet
Technical SEO | | jplill0 -
Getting Posts Indexed
On a WordPress site I'm working on, you can get to any product from the homepage in 2 clicks, but I'm a little concerned about the URL structure, which looks like this: domain/categoryname/subcategoryname/productpage Will I have trouble getting my products indexed?
Technical SEO | | waynekolenchuk0 -
I have 2 websites with the same content
Hello everyone, this is my first post here on SEOmoz, and I have a question that I cannot seem to figure out. Here is my scenario: I have 2 websites that are identical; the only difference between them is the domain name. This was done a while back for marketing purposes, but I no longer need the 2nd website. What is the best way to get rid of it? About 1 paying customer a day still converts on the 2nd website and I do not want to lose them, but I know I am getting penalized by the search engines because of this duplicate content. Please let me know the best way to go about this. PS: I have read about 301 redirects, canonicalizing URLs, and other methods, but do not know which one to choose. Any help is greatly appreciated!
Technical SEO | | threebiz0