Thinking about not indexing PDFs on a product page

Bio-RadAbs

Our product pages generate a PDF version of the page in a different layout. This is done for two reasons: it's been the standard across similar industries, and it helps customers print the page when working with the product.

So there is a use for the customer, but what about search? I've thought about this a lot, and my thinking is: why index the PDF at all? Only allow the HTML page to be indexed. The PDF files are on a subdomain, so I can easily noindex them. The way I see it, I'm reducing duplicate content.
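Since a PDF has no HTML `<head>` to hold a meta robots tag, one common way to noindex it is the `X-Robots-Tag: noindex` HTTP response header, which Google documents for non-HTML files. The sketch below assumes the PDFs on the subdomain are served by a Python WSGI app you control; `noindex_pdfs`, `static_app` and the `/datasheets/12345.pdf` path are hypothetical names for illustration, not anything from this thread. On Apache, the same header is usually set in server config or `.htaccess` instead.

```python
def noindex_pdfs(app):
    """WSGI middleware: add 'X-Robots-Tag: noindex' to responses for paths ending in .pdf."""
    def wrapped(environ, start_response):
        is_pdf = environ.get("PATH_INFO", "").lower().endswith(".pdf")

        def start_response_with_robots(status, headers, exc_info=None):
            if is_pdf:
                # Ask search engines not to index this file; non-PDF responses are untouched.
                headers = list(headers) + [("X-Robots-Tag", "noindex")]
            return start_response(status, headers, exc_info)

        return app(environ, start_response_with_robots)
    return wrapped


# Hypothetical stand-in for whatever currently serves files on the PDF subdomain.
def static_app(environ, start_response):
    start_response("200 OK", [("Content-Type", "application/pdf")])
    return [b"%PDF-1.4"]


def headers_for(path):
    """Run the wrapped app for one request path and return the headers it sent."""
    sent = {}

    def capture(status, headers, exc_info=None):
        sent["headers"] = headers

    noindex_pdfs(static_app)({"PATH_INFO": path}, capture)
    return sent["headers"]


print(headers_for("/datasheets/12345.pdf"))
# [('Content-Type', 'application/pdf'), ('X-Robots-Tag', 'noindex')]
```

Doing this at the HTTP layer means the PDF files themselves never need to be edited or re-uploaded.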

On the flip side, because it is hosted on a subdomain, the PDF appearing when the HTML page doesn't is another way of gaining search real estate. If it appears alongside the HTML page, that's even more coverage.

Has anyone else done this? My instinct tells me this could be a good thing; it might even stop backlinks from going to the PDF and lead to more backlinks to the HTML page.

Can PDFs exist solely as a resource accessible from the page, irrelevant to search engines? I find them a bane when they are on a subdomain.

Bio-RadAbs

Thanks EGOL, I didn't think about using rel=canonical via .htaccess. Great idea.

EGOL

If you link to a pdf, some of your power flows into it. If someone else links to a pdf, some of his power flows into it.

PDFs accumulate backlinks and accumulate PageRank. You should assign these valuable assets to real pages.

So, if you have PDFs that are duplicates of webpages, you should use rel=canonical via .htaccess to attribute them to their matching webpage. If you don't, your assets are being squandered.
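The "rel=canonical via .htaccess" idea works because Google accepts a canonical sent as an HTTP `Link` header, so a PDF with no `<head>` can still point at its HTML twin (on Apache this is typically done with mod_headers' `Header set Link ...`). The sketch below only builds the header value, assuming a hypothetical URL scheme where `https://static.example.com/pdf/12345.pdf` corresponds to `https://www.example.com/product/12345`; `HTML_BASE` and `canonical_link_header` are illustrative names, and a real site would need its own PDF-to-page mapping.

```python
from pathlib import PurePosixPath
from urllib.parse import urlsplit

# Hypothetical: each product's HTML page lives on the main host under /product/<id>.
HTML_BASE = "https://www.example.com/product/"


def canonical_link_header(pdf_url: str) -> tuple[str, str]:
    """Return a (name, value) Link header pointing a PDF at its matching HTML page."""
    path = urlsplit(pdf_url).path                # "/pdf/12345.pdf"
    product_id = PurePosixPath(path).stem        # "12345" (file name minus ".pdf")
    html_url = HTML_BASE + product_id
    return ("Link", f'<{html_url}>; rel="canonical"')


print(canonical_link_header("https://static.example.com/pdf/12345.pdf"))
# ('Link', '<https://www.example.com/product/12345>; rel="canonical"')
```

The header would be attached to the PDF's response, so link equity pointing at the PDF is consolidated toward the HTML page rather than staying on the subdomain.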

Bio-RadAbs

I don't think I see my PDFs show up for a search term when my HTML pages are being displayed.

However, there was a situation where a PDF was displayed, so I created an HTML page from it and set up a redirect from the PDF to the HTML page. I followed that up by re-uploading the PDF under a new URL and offering it as a download. That way I transferred the ranking signals to the HTML page.

In a nutshell: no, I don't see my PDFs outranking my HTML pages. I do know my PDFs are indexed, but I don't know whether they show up for different search terms.

I guess my main question is: would not indexing them open up the chance for more backlinks to my HTML page rather than the PDF? And in Google's eyes, there would be no deciding which to display, the HTML page or the PDF, since both have the same content.

Maybe I'm overthinking it, and the straight answer is: if an HTML page exists, Google won't give preference to the PDF, but if there is no HTML page, the PDF is shown.

Bio-RadAbs

Yeah, we offer the same. The user is able to download the PDF or have it open in a new window. I haven't seen Google automatically present my PDF, and so far my searches have shown my HTML page, but my question to Cole remains: could Google be comparing the PDF and the HTML page with each other? What if, for some search, it preferred to rank the PDF above the HTML page?

On your next question: I don't get duplicate content warnings for the PDFs. I believe the PDFs are being indexed, as the text is readable. How well are they indexed? A search of my subdomain returns close to 22,000 results, so yes, they are indexed.

I do have rel=canonical tags on the HTML pages, but I can't add one to the PDF, as it's a file and not a page.

Bio-RadAbs

Thanks for the replies

Cole, Google does index our PDFs, though. I tested this by searching "site:domain.com search term" and then "site:static.domain.com search term".

Result:

site:static.domain.com search term

Google showed me the PDF that is available for download from the HTML page that ranks highly for that search term.

So Google is indexing both the PDF and the HTML. To answer your question as to why I don't want them indexed: my thinking was that if the PDF appears and someone backlinks to it, I'd rather that backlink went to the HTML page. The PDFs are hosted on my subdomain, and I don't want the subdomain to get the ranking. In the back of my mind, I'm also wondering whether my PDF and HTML pages are competing with each other.

Andy.Drinkwater

The way I see it, I'm reducing duplicate content.

Anything you can do that helps with this, is a good move - nothing wrong with a little tidying up.

the PDF appearing when a HTML page doesn't, is another way of gaining real estate

Does this currently happen for you? PDFs can actually outrank HTML pages on occasion; they aren't Google's preferred media type, but as with any page, it's all about the content.

-Andy

HashtagHustler

Morning,

To my knowledge, Google isn't able to open a PDF. You could always give users the option of downloading a PDF. Any tech website I have been to generally offers it as a download or opens it in another window.

I don't know why it would automatically present a PDF, although, I probably don't work in the same industry! Ha!

The other question I have is: are you getting duplicate content warnings? Are the PDFs currently being indexed? If so, how well are they being indexed? Google can read an open PDF, or a PDF that automatically displays, but some are easier to read than others, depending on the settings of the PDF.

http://www.searchenginejournal.com/8-tips-to-make-your-pdf-page-seo-friendly-by/59975/

Another option is rel=canonical tags.

Hope this helps!

ColeLusby

"I'm reducing duplicate content": Google cannot crawl PDFs, but they do index them and show them in search results.

So let me ask you - why do you not want them indexed?

I say let them be indexed.

Cole

