Thinking about not indexing PDFs on a product page
-
Our product pages generate a PDF version of the page in a different layout. This is done for 2 reasons, it's been the standard across similar industries and to help customers print them when working with the product.
So there is a use when it comes to the customer but search? I've thought about this a lot and my thinking is why index the PDF at all? Only allow the HTML page to be indexed. The PDF files are in a subdomain, so I can easily no index them. The way I see it, I'm reducing duplicate content
On the flip side, it is hosted in a subdomain, so the PDF appearing when a HTML page doesn't, is another way of gaining real estate. If it appears with the HTML page, more estate coverage.
Anyone else done this? My knowledge tells me this could be a good thing, might even iron out any backlinks from being generated to the PDF and lead to more HTML backlinks
Can PDFs solely exist as a form of data accessible once on the page and not relevant to search engines. I find them a bane when they are on a subdomain.
-
Thanks EGOL, I didn't think about using rel=canonical on htaccess. Great idea
-
If you link to a pdf, some of your power flows into it. If someone else links to a pdf, some of his power flows into it.
PDFs accumulate backlinks, accumulate pagerank. You should assign these valuable assets to real pages.
So, if you have pdfs that are duplicates of webpages then you should use rel=canonical using htaccess to attribute them to their matched webpage. If you don't do that then you assets are being squandered.
-
I don't think see my PDFs show up for a search term when my HTML pages are being displayed.
However, there was a situation when a PDF was displayed and I created a HTML page of it and set up redirects from the PDF to the HTML page. I followed that up by reuploading the PDF as a new URL and offering to download. That way I transfered the rank juice to the HTML page.
In a nutshell, no I don't see my PDFs outranking my HTML pages, but I do know my PDFs are indexed and I don't know if they show up for a different search term.
I guess my main question is, would not indexing them open up the chance for more backlinks to your HTML page and not the PDF? And in Google's eyes, it won't debate over which to display, the HTML page or PDF as both have the same content.
Maybe I'm over thinking and the straight answer is, if a HTML page exists, Google won't give preference to the PDF but in the event there is no HTML, the PDF is shown
-
Yeah, we offer the same. The user is able to download the PDF or have it open in a new window. I haven't seen Google automatically present my PDF and so far my searches have shown my HTML page, but my question to Cole remains, could Google be comparing the PDF and HTML page with each other? What if in a search situation it would prefer showing the PDF higher than the HTML page?
On your next question, I don't get duplicate warning for PDF. I believe the PDFs are indeed being indexed as the text is readable. How well are they being indexed? I've got close to 22,000 search results for my subdomain so yeah, they are indexed.
I do have rel-canonical tags on the HTML page, but can't appear it on the PDF as it's a file and not a page.
-
Thanks for the replies
Cole - Google indexed our PDFs though. I tested this by doing a site:domain.com search term, and then a site:static.domain.com search term search.
Result:
site:static.domain.com search term
Google showed me the PDF document that is available for download from the HTML page that ranks high for that search term search.
So Google is indexing both the PDF and HTML. To answer your question as to why I don't want them indexed.. Well, my thinking was. If the PDF appears and if someone backlinks to it, I rather get that backlink to the HTML page. PDFs are hosted on my subdomain and I don't want the subdomain to get the rank. Back of my head, I'm also debating, whether my PDF and HTML are competing with each other?
-
The way I see it, I'm reducing duplicate content.
Anything you can do that helps with this, is a good move - nothing wrong with a little tidying up.
the PDF appearing when a HTML page doesn't, is another way of gaining real estate
Do you currently have this happen? PDF's can actually out-rank HTM pages on occasion - they aren't the preferred media type of Google, but like any page, it's all about the content.
-Andy
-
Morning,
To my knowledge Google isn't able to open a PDF. You could always present the users with the option of downloading a PDF. Any tech website I have been to generally offers it in a download, or opens it in another window.
I don't know why it would automatically present a PDF, although, I probably don't work in the same industry! Ha!
The other question I have is, are you getting Duplicate content warnings? Are the PDF's currently being indexed? If so, how well are they being Indexed? Google can read an open PDF, or a PDF that automatically displays, but some are easier to read that others depending on the settings of the PDF.
http://www.searchenginejournal.com/8-tips-to-make-your-pdf-page-seo-friendly-by/59975/
Another option is the rel canonical tags?
Hope this helps!
-
"I'm reducing duplicate content " - Google cannot crawl PDFs, but they do index them and show them in search results.
So let me ask you - why do you not want them indexed?
I say let them be indexed.
Cole
Got a burning SEO question?
Subscribe to Moz Pro to gain full access to Q&A, answer questions, and ask your own.
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
Page with metatag noindex is STILL being indexed?!
Hi Mozers, There are over 200 pages from our site that have a meta tag "noindex" but are STILL being indexed. What else can I do to remove them from the Index?
Intermediate & Advanced SEO | | yaelslater0 -
301 Redirect to Home Page or Sub-Page?
What do you think about 301 redirect of good expired domain to a sub-page instead of the home page? I'm doing this so I don't hurt my brand name. Let me know your thoughts please. Thank you
Intermediate & Advanced SEO | | JuanWork0 -
Product Page rankings - How to boost?
Hi folks I am responsible for an e-commerce website. Our website is doing very well but I believe that our product pages should be ranking more highly than they currently are. When taking over my current role, it became clear that a number of changes would need to be made to try and boost the under performing product pages. Amongst other things I therefore implemented the following: New Product content - we have placed a massive focus on reworking all product content so that it is unique and offers value to the reader. The new content includes videos, images and text that is all keyword rich but (I hope) not seen as overly spammy. Duplicate content - the CMS was creating multiple versions of the same page - I addressed this by implementing 301 redirects and adding canonical links. This ensures there is now only 1 version of the page Parameters - I instructed Google to not index certain URLs containing specific parameters Internal links - I have tried to improve the number of links to the products from relevant key category pages My question is, although some of the changes have only been in place for a month, what else can I do to ensure that the product pages rank as highly as possible. As an e-commerce website with so many products it is very difficult to link to these product pages directly, so any tips or suggestions would be welcome! Here's an example of a product page link : http://www.directheatingsupplies.co.uk/pid_37440/100180/Worcester-Greenstar-29CDi-Classic-Gas-Combi-Boiler-7738100216-29-Cdi.aspx
Intermediate & Advanced SEO | | DHS_SH0 -
Incorrect cached page indexing in Google while correct page indexes intermittently
Hi, we are a South African insurance company. We have a page http://www.miway.co.za/midrivestyle which has a 301 redirect to http://www.miway.co.za/car-insurance. Problem is that the former page is ranking in the index rather than the latter. The latter page does index occasionally in the same position, but rarely. This is primarily for search phrases like "car insurance" and "car insurance quotes". The ranking was knocked down the index with Penquin 2.0. It was not ranking at all but we have managed to recover to 12/13. This abnormally has only been occurring since the recovery. The correct page does index for other search terms like "insurance for car". Your help would be appreciated, thanks!
Intermediate & Advanced SEO | | miway0 -
Ecommerce SEO - Indexed product pages are returning 404's due to product database removal. HELP!
Hi all, I recently took over an e-commerce start-up project from one of my co-workers (who left the job last week). This previous project manager had uploaded ~2000 products without setting up a robot.txt file, and as a result, all of the product pages were indexed by Google (verified via Google Webmaster Tool). The problem came about when he deleted the entire product database from our hosting service, godaddy and performed a fresh install of Prestashop on our hosting plan. All of the created product pages are now gone, and I'm left with ~2000 broken URL's returning 404's. Currently, the site does not have any products uploaded. From my knowledge, I have to either: canonicalize the broken URL's to the new corresponding product pages, or request Google to remove the broken URL's (I believe this is only a temporary solution, for Google honors URL removal request for 90 days) What is the best way to approach this situation? If I setup a canonicalization, would I have to recreate the deleted pages (to match the URL address) and have those pages redirect to the new product pages (canonicalization)? Alex
Intermediate & Advanced SEO | | byoung860 -
Why the archive sub pages are still indexed by Google?
Why the archive sub pages are still indexed by Google? I am using the WordPress SEO by Yoast, and selected the needed option to get these pages no-index in order to avoid the duplicate content.
Intermediate & Advanced SEO | | MichaelNewman1 -
Indexing specified entry pages
Hi,We are currently working on location based info.Basically, when someone searches from Florida they will get specific Florida results and when they search from California they will specific California results.How does this location based info affect crawling and indexing?Lets say we have location info for googlebot, sometimes they crawl from a New York ip address, sometimes they do it from Texas and sometimes from California. In this case google will index 3 different pages with 3 different prices and a bit different text, and I'm afraid they might see these as some kind of cloaking or suspicious movement because we serve different versions of the page. What's the best way to handle this?
Intermediate & Advanced SEO | | SEODinosaur0 -
Scrolling Text Old School SEO and hidden index page
We have taken over a site and now find our self looking at the homepage of the site which has hidden scrolling text. A old school way of adding text without leaving loads of paragraphs. I have also removed all links to the index.htm page but somewhere visitors are still coming to this page in there droves. I am considering using a canonical url code but I would rather nip it in the bud. Would love some feedback from some other experts here is the site - http://www.radiatorcentre.com You never stop learning in seo and maybe we can all learn from this example. Thanks
Intermediate & Advanced SEO | | onlinemediadirect0