Thinking about not indexing PDFs on a product page
-
Our product pages generate a PDF version of the page in a different layout. This is done for two reasons: it's been the standard across similar industries, and it helps customers print the page when working with the product.
So there is a use when it comes to the customer, but what about search? I've thought about this a lot, and my thinking is: why index the PDF at all? Only allow the HTML page to be indexed. The PDF files are on a subdomain, so I can easily noindex them. The way I see it, I'm reducing duplicate content.
On the flip side, it is hosted on a subdomain, so the PDF appearing when a HTML page doesn't, is another way of gaining real estate. If it appears alongside the HTML page, that is even more real estate.
Has anyone else done this? My knowledge tells me this could be a good thing. It might even stop backlinks from being generated to the PDF and lead to more backlinks to the HTML page.
Can PDFs exist solely as data that is accessible from the page, without being relevant to search engines? I find them a bane when they are on a subdomain.
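For anyone weighing the noindex route: a PDF has no HTML head to carry a meta robots tag, so the usual way to noindex one is an X-Robots-Tag response header. Below is a minimal Python sketch of checking whether a response's headers would block indexing. It assumes the headers are already in a plain dict, the function name is made up for illustration, and bot-specific values such as "googlebot: noindex" are deliberately ignored.

```python
def is_noindex(headers: dict) -> bool:
    """Return True if an X-Robots-Tag header asks engines not to index.

    Simplification: only plain directive lists such as "noindex, nofollow"
    are handled; values prefixed with a bot name are not parsed.
    """
    for name, value in headers.items():
        # Header names are case-insensitive.
        if name.lower() != "x-robots-tag":
            continue
        directives = {d.strip().lower() for d in value.split(",")}
        # "none" is shorthand for "noindex, nofollow".
        if "noindex" in directives or "none" in directives:
            return True
    return False


if __name__ == "__main__":
    pdf_headers = {
        "Content-Type": "application/pdf",
        "X-Robots-Tag": "noindex, nofollow",
    }
    print(is_noindex(pdf_headers))
```

Running a check like this over the PDF URLs on the subdomain is one way to confirm the header is really being served before expecting the files to drop out of the index.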
-
Thanks EGOL, I hadn't thought about setting rel=canonical through .htaccess. Great idea.
-
If you link to a pdf, some of your power flows into it. If someone else links to a pdf, some of his power flows into it.
PDFs accumulate backlinks, accumulate pagerank. You should assign these valuable assets to real pages.
So, if you have PDFs that are duplicates of webpages, then you should use rel=canonical via .htaccess to attribute them to their matching webpages. If you don't do that, then your assets are being squandered.
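A rough sketch of what that could look like, assuming Apache with mod_headers enabled. The Python helper below only builds the .htaccess text. The function name, file name and URL are hypothetical placeholders, not anyone's actual setup.

```python
def canonical_rule(pdf_filename: str, html_url: str) -> str:
    """Build an .htaccess block that sends a rel=canonical Link header
    for one PDF, pointing at its matching HTML page.

    Assumes Apache with mod_headers; names and URLs are placeholders.
    """
    return (
        f'<Files "{pdf_filename}">\n'
        f'  Header set Link "<{html_url}>; rel=\\"canonical\\""\n'
        f"</Files>"
    )


if __name__ == "__main__":
    # Hypothetical PDF on the static subdomain and its HTML counterpart.
    print(canonical_rule("widget.pdf", "https://www.example.com/widget"))
```

The printed block would go in the .htaccess file of the directory that serves the PDF, with one block per PDF and HTML pair.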
-
I don't think I see my PDFs show up for a search term when my HTML pages are being displayed.
However, there was one situation where a PDF was displayed, so I created an HTML page from it and set up redirects from the PDF to the HTML page. I followed that up by re-uploading the PDF at a new URL and offering it as a download. That way I transferred the rank juice to the HTML page.
In a nutshell: no, I don't see my PDFs outranking my HTML pages, but I do know my PDFs are indexed, and I don't know whether they show up for a different search term.
I guess my main question is: would not indexing them open up the chance for more backlinks to the HTML page instead of the PDF? It would also mean Google doesn't have to decide which to display, the HTML page or the PDF, since both have the same content.
Maybe I'm overthinking it and the straight answer is that if an HTML page exists, Google won't give preference to the PDF, but if there is no HTML page, the PDF is shown.
-
Yeah, we offer the same. The user is able to download the PDF or have it open in a new window. I haven't seen Google automatically present my PDF and so far my searches have shown my HTML page, but my question to Cole remains, could Google be comparing the PDF and HTML page with each other? What if in a search situation it would prefer showing the PDF higher than the HTML page?
On your next question, I don't get duplicate content warnings for the PDFs. I believe the PDFs are indeed being indexed, as the text is readable. How well are they being indexed? I've got close to 22,000 search results for my subdomain, so yeah, they are indexed.
I do have rel=canonical tags on the HTML page, but I can't add one to the PDF as it's a file and not a page.
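Since a PDF has no head section, a canonical for it has to travel in the HTTP Link response header instead of a tag. Here is a hedged Python sketch for checking which URL such a header points at, assuming you've already fetched the header value. The function name is illustrative, and splitting on commas is a simplification that would break on URLs containing commas.

```python
import re
from typing import Optional


def canonical_from_link_header(link_header: str) -> Optional[str]:
    """Return the URL marked rel="canonical" in a Link header, if any.

    Simplified parsing: entries are split on commas, so URLs that
    themselves contain commas are not handled.
    """
    for part in link_header.split(","):
        # Each entry looks like: <url>; rel="canonical"
        match = re.match(r"\s*<([^>]+)>\s*;(.*)", part)
        if match and re.search(r'rel\s*=\s*"?canonical"?', match.group(2), re.I):
            return match.group(1)
    return None


if __name__ == "__main__":
    header = '<https://www.example.com/widget>; rel="canonical"'
    print(canonical_from_link_header(header))
```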
-
Thanks for the replies
Cole - Google indexed our PDFs though. I tested this by doing a site:domain.com search term, and then a site:static.domain.com search term search.
Result:
site:static.domain.com search term
Google showed me the PDF document that is available for download from the HTML page that ranks high for that search term.
So Google is indexing both the PDF and the HTML. To answer your question as to why I don't want them indexed: my thinking was that if the PDF appears and someone links to it, I'd rather get that backlink to the HTML page. The PDFs are hosted on my subdomain and I don't want the subdomain to get the rank. In the back of my head, I'm also wondering whether my PDF and HTML page are competing with each other.
-
The way I see it, I'm reducing duplicate content.
Anything you can do that helps with this is a good move. There's nothing wrong with a little tidying up.
the PDF appearing when a HTML page doesn't, is another way of gaining real estate
Do you currently have this happen? PDFs can actually outrank HTML pages on occasion. They aren't Google's preferred media type, but as with any page, it's all about the content.
-Andy
-
Morning,
To my knowledge Google isn't able to open a PDF. You could always present the users with the option of downloading a PDF. Any tech website I have been to generally offers it in a download, or opens it in another window.
I don't know why it would automatically present a PDF, although, I probably don't work in the same industry! Ha!
The other question I have is: are you getting duplicate content warnings? Are the PDFs currently being indexed? If so, how well are they being indexed? Google can read an open PDF, or a PDF that automatically displays, but some are easier to read than others depending on the settings of the PDF.
http://www.searchenginejournal.com/8-tips-to-make-your-pdf-page-seo-friendly-by/59975/
Another option is rel=canonical tags.
Hope this helps!
-
"I'm reducing duplicate content " - Google cannot crawl PDFs, but they do index them and show them in search results.
So let me ask you - why do you not want them indexed?
I say let them be indexed.
Cole