Thinking about not indexing PDFs on a product page
-
Our product pages generate a PDF version of the page in a different layout. This is done for 2 reasons, it's been the standard across similar industries and to help customers print them when working with the product.
So there is a use when it comes to the customer but search? I've thought about this a lot and my thinking is why index the PDF at all? Only allow the HTML page to be indexed. The PDF files are in a subdomain, so I can easily no index them. The way I see it, I'm reducing duplicate content
On the flip side, it is hosted in a subdomain, so the PDF appearing when a HTML page doesn't, is another way of gaining real estate. If it appears with the HTML page, more estate coverage.
Anyone else done this? My knowledge tells me this could be a good thing, might even iron out any backlinks from being generated to the PDF and lead to more HTML backlinks
Can PDFs solely exist as a form of data accessible once on the page and not relevant to search engines. I find them a bane when they are on a subdomain.
-
Thanks EGOL, I didn't think about using rel=canonical on htaccess. Great idea
-
If you link to a pdf, some of your power flows into it. If someone else links to a pdf, some of his power flows into it.
PDFs accumulate backlinks, accumulate pagerank. You should assign these valuable assets to real pages.
So, if you have pdfs that are duplicates of webpages then you should use rel=canonical using htaccess to attribute them to their matched webpage. If you don't do that then you assets are being squandered.
-
I don't think see my PDFs show up for a search term when my HTML pages are being displayed.
However, there was a situation when a PDF was displayed and I created a HTML page of it and set up redirects from the PDF to the HTML page. I followed that up by reuploading the PDF as a new URL and offering to download. That way I transfered the rank juice to the HTML page.
In a nutshell, no I don't see my PDFs outranking my HTML pages, but I do know my PDFs are indexed and I don't know if they show up for a different search term.
I guess my main question is, would not indexing them open up the chance for more backlinks to your HTML page and not the PDF? And in Google's eyes, it won't debate over which to display, the HTML page or PDF as both have the same content.
Maybe I'm over thinking and the straight answer is, if a HTML page exists, Google won't give preference to the PDF but in the event there is no HTML, the PDF is shown
-
Yeah, we offer the same. The user is able to download the PDF or have it open in a new window. I haven't seen Google automatically present my PDF and so far my searches have shown my HTML page, but my question to Cole remains, could Google be comparing the PDF and HTML page with each other? What if in a search situation it would prefer showing the PDF higher than the HTML page?
On your next question, I don't get duplicate warning for PDF. I believe the PDFs are indeed being indexed as the text is readable. How well are they being indexed? I've got close to 22,000 search results for my subdomain so yeah, they are indexed.
I do have rel-canonical tags on the HTML page, but can't appear it on the PDF as it's a file and not a page.
-
Thanks for the replies
Cole - Google indexed our PDFs though. I tested this by doing a site:domain.com search term, and then a site:static.domain.com search term search.
Result:
site:static.domain.com search term
Google showed me the PDF document that is available for download from the HTML page that ranks high for that search term search.
So Google is indexing both the PDF and HTML. To answer your question as to why I don't want them indexed.. Well, my thinking was. If the PDF appears and if someone backlinks to it, I rather get that backlink to the HTML page. PDFs are hosted on my subdomain and I don't want the subdomain to get the rank. Back of my head, I'm also debating, whether my PDF and HTML are competing with each other?
-
The way I see it, I'm reducing duplicate content.
Anything you can do that helps with this, is a good move - nothing wrong with a little tidying up.
the PDF appearing when a HTML page doesn't, is another way of gaining real estate
Do you currently have this happen? PDF's can actually out-rank HTM pages on occasion - they aren't the preferred media type of Google, but like any page, it's all about the content.
-Andy
-
Morning,
To my knowledge Google isn't able to open a PDF. You could always present the users with the option of downloading a PDF. Any tech website I have been to generally offers it in a download, or opens it in another window.
I don't know why it would automatically present a PDF, although, I probably don't work in the same industry! Ha!
The other question I have is, are you getting Duplicate content warnings? Are the PDF's currently being indexed? If so, how well are they being Indexed? Google can read an open PDF, or a PDF that automatically displays, but some are easier to read that others depending on the settings of the PDF.
http://www.searchenginejournal.com/8-tips-to-make-your-pdf-page-seo-friendly-by/59975/
Another option is the rel canonical tags?
Hope this helps!
-
"I'm reducing duplicate content " - Google cannot crawl PDFs, but they do index them and show them in search results.
So let me ask you - why do you not want them indexed?
I say let them be indexed.
Cole
Got a burning SEO question?
Subscribe to Moz Pro to gain full access to Q&A, answer questions, and ask your own.
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
Webmaster Tools Not Indexing New Pages
Hi there Mozzers, Running into a small issue. After a homepage redesign (from a list of blog posts to a product page), it seems that blog posts are buried on the http://OrangeOctop.us/ site. The latest write-up on "how to beat real madrid in FIFA 15", http://orangeoctop.us/against-real-madrid-fifa-15/ , has yet to be indexed. It would normally take about a day naturally for pages to be indexed or instantly with a manual submission. I have gone into webmaster tools and manually submitted the page for crawls multiple times on multiple devices. Still not showing up in the search results. Can anybody advise?
Intermediate & Advanced SEO | | orangeoctop.us0 -
Why does our business directions page rank above business profile page
Hi All, We are having an issue at the moment where our business direction page is ranking above the main business profile page. Our website is zodio.com, similar to Yelp but for South East Asia. An example of each page is below: Business Profile Page - http://www.zodio.com/business/detail/126037914/chowking Business Directions - http://www.zodio.com/business/direction/126037914 On many of our long tail searches for particular businesses, the business directions rank above the business details. Does anyone have any idea of why this would happen? I have researched Yelp and they do not have this issue. A few search examples in Google are as follows (one is in Thai): agonos dental clinic เวิลด์ชาร์มมิ่ง kawanku elektrik I have been rattling my brain and search for answers but cannot find anything. The communities help would be much appreciated. Many Thanks, Neil W
Intermediate & Advanced SEO | | zodiothailand0 -
2 pages lost page rank and not showing any backlinks in google
Hi we have a business/service related website, 2 of our main pages lost their page rank from 3 to 0 and are not showing any backlinks in google. What could be the possible reason. Please guide me.
Intermediate & Advanced SEO | | Tech_Ahead0 -
Page indexed but not showing up at all in search results
I am currently working on the SEO for a roofing company. I have developed GEO targeted pages for both commercial and residential roofing (as well as attic insulation and gutters) and have hundreds of 1st page placements for the GEO targeted keywords. What is baffling me is that they are performing EXTREMELY poorly on the bigger cities, to the point of not evening showing up in the first 5 pages. I also target a page specifically for roof repair in Phoenix and it is not coming up AT ALL. This is not typically the results I get when directly targeting keywords. I'm working on implementing keyword variations as well as adding about 10 or so information pages (@ 700 words) regarding different roofing systems which I plan to cross link on the site, etc. I'm just wondering if there is a simple answer as to why the pages I want to be showing up the most are performing so poorly and what I would need to do to improve their rankings.
Intermediate & Advanced SEO | | dogstarweb0 -
Best Strategy to display 8mg Images on Product Pages for Ecommerce
I have an ecommerce store that has a variety of images including some super high quality images that are 8 mg. This style of image could be completed for hundreds of products in the store. Does anyone have any tips on what I should be watching out for here? Is 8 mg too unusable?
Intermediate & Advanced SEO | | LukeyJamo0 -
Why Does Ebay Allow Internal Search Result Pages to be Indexed?
Click this Google query: https://www.google.com/search?q=les+paul+studio Notice how Google has a rich snippet for Ebay saying that it has 229 results for Ebay's internal search result page: http://screencast.com/t/SLpopIvhl69z Notice how Sam Ash's internal search result page also ranks on page 1 of Google. I've always followed the best practice of setting internal search result pages to "noindex." Previously, our company's many Magento eCommerce stores had the internal search result pages set to be "index," and Google indexed over 20,000 internal search result URLs for every single site. I advised that we change these to "noindex," and impressions from Search Queries (reported in Google Webmaster Tools) shot up on 7/24 with the Panda update on that date. Traffic didn't necessarily shoot up...but it appeared that Google liked that we got rid of all this thin/duplicate content and ranked us more (deeper than page 1, however). Even Dr. Pete advises no-indexing internal search results here: http://www.seomoz.org/blog/duplicate-content-in-a-post-panda-world So, why is Google rewarding Ebay and Sam Ash with page 1 rankings for their internal search result pages? Is it their domain authority that lets them get away with it? Could it be that noindexing internal search result pages is NOT best practice? Is the game different for eCommerce sites? Very curious what my fellow professionals think. Thanks,
Intermediate & Advanced SEO | | M_D_Golden_Peak
Dan0 -
To index or not to index search pages - (Panda related)
Hi Mozzers I have a WordPress site with Relevanssi the search engine plugin, free version. Questions: Should I let Google index my site's SERPS? I am scared the page quality is to thin, and then Panda bear will get angry. This plugin (or my previous search engine plugin) created many of these "no-results" uris: /?s=no-results%3Ano-results%3Ano-results%3Ano-results%3Ano-results%3Ano-results%3Ano-results%3Akids+wall&cat=no-results&pg=6 I have added a robots.txt rule to disallow these pages and did a GWT URL removal request. But links to these pages are still being displayed in Google's SERPS under "repeat the search with the omitted results included" results. So will this affect me negatively or are these results harmless? What exactly is an omitted result? As I understand it is that Google found a link to a page they but can't display it because I block GoogleBot. Thanx in advance guys.
Intermediate & Advanced SEO | | ClassifiedsKing0 -
Wrong Page Indexing in SERPS - Suggestions?
Hey Moz'ers! I have a quick question. Our company (Savvy Panda) is working on ranking for the keyword: "Milwaukee SEO". On our website, we have a page for "Milwaukee SEO" in our services section that's optimized for the keyword and we've been doing link building to this. However, when you search for "Milwaukee SEO" a different page is being displayed in the SERP's. The page that's showing up in the SERP's is a category view of our blog of articles with the tag "Milwaukee SEO". **Is there a way to alert google that the page showing up in the SERP's is not the most relevant and request a new URL to be indexed for that spot? ** I saw a webinar awhile back that showed something like that using google webmaster sitelinks denote tool. I would hate to denote that URL and then loose any kind of indexing for the keyword.
Intermediate & Advanced SEO | | SavvyPanda
Ideas, suggestions?0