Thinking about not indexing PDFs on a product page
-
Our product pages generate a PDF version of the page in a different layout. We do this for two reasons: it has become the standard across similar industries, and it helps customers print the page when working with the product.
So the PDF is useful to the customer, but what about search? I've thought about this a lot, and my thinking is: why index the PDF at all? Only allow the HTML page to be indexed. The PDF files are on a subdomain, so I can easily noindex them. The way I see it, I'm reducing duplicate content.
On the flip side, because it is hosted on a subdomain, a PDF appearing when an HTML page doesn't is another way of gaining real estate. If it appears alongside the HTML page, that's even more coverage.
Has anyone else done this? My instinct tells me this could be a good thing. It might even stop backlinks from pointing to the PDF and lead to more backlinks to the HTML page instead.
Can PDFs exist solely as a form of data accessible from the page, without being relevant to search engines? I find them a bane when they are on a subdomain.
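A PDF cannot carry a robots meta tag, so noindexing it is usually done with the `X-Robots-Tag: noindex` HTTP response header. The sketch below assumes, hypothetically, that the PDF subdomain is served by a Python WSGI application; on Apache the same header would normally be set in the server configuration instead. The middleware name `noindex_pdfs` is illustrative, not part of any library.

```python
from typing import Callable


def noindex_pdfs(app: Callable) -> Callable:
    """Wrap a WSGI app so every PDF response carries X-Robots-Tag: noindex.

    Hypothetical sketch: a response counts as a PDF if the request path ends
    in .pdf or the app sets Content-Type: application/pdf.
    """
    def wrapped(environ, start_response):
        path = environ.get("PATH_INFO", "")

        def patched_start_response(status, headers, exc_info=None):
            is_pdf = path.lower().endswith(".pdf") or any(
                name.lower() == "content-type"
                and value.lower().startswith("application/pdf")
                for name, value in headers
            )
            if is_pdf:
                # Ask search engines not to index this file.
                headers = list(headers) + [("X-Robots-Tag", "noindex")]
            return start_response(status, headers, exc_info)

        return app(environ, patched_start_response)

    return wrapped
```

HTML responses pass through untouched, so only the print versions drop out of the index while the product pages stay indexable.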
-
Thanks EGOL, I hadn't thought about setting rel=canonical through .htaccess. Great idea!
-
If you link to a PDF, some of your power flows into it. If someone else links to a PDF, some of their power flows into it.
PDFs accumulate backlinks and PageRank. You should assign these valuable assets to real pages.
So, if you have PDFs that duplicate webpages, you should use rel=canonical, set as an HTTP header via .htaccess, to attribute each one to its matching webpage. If you don't, your assets are being squandered.
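For a file like a PDF, rel=canonical goes in a `Link` response header of the form `Link: <URL>; rel="canonical"`. The sketch below only builds that header value; it assumes a hypothetical URL scheme in which `https://static.example.com/pdf/<slug>.pdf` is the print version of `https://www.example.com/product/<slug>/`. Your real mapping from PDF to product page will differ.

```python
import posixpath
from typing import Tuple
from urllib.parse import urlsplit

# Hypothetical base URL for the HTML product pages.
HTML_BASE = "https://www.example.com/product/"


def canonical_link_header(pdf_url: str) -> Tuple[str, str]:
    """Return a (name, value) HTTP header pointing a PDF at its HTML page."""
    path = urlsplit(pdf_url).path           # e.g. "/pdf/blue-widget.pdf"
    filename = posixpath.basename(path)     # "blue-widget.pdf"
    slug, ext = posixpath.splitext(filename)
    if ext.lower() != ".pdf":
        raise ValueError(f"not a PDF URL: {pdf_url}")
    html_url = f"{HTML_BASE}{slug}/"
    return ("Link", f'<{html_url}>; rel="canonical"')
```

On Apache, the equivalent is typically a mod_headers `Header add Link` directive scoped to the PDF files; the point is the same either way: the header, not the PDF body, carries the canonical.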
-
I don't think I see my PDFs show up for a search term when my HTML pages are being displayed.
However, there was one situation where a PDF was being displayed, so I created an HTML page from it and set up a redirect from the PDF to the HTML page. I followed that up by re-uploading the PDF at a new URL and offering it as a download. That way I transferred the rank juice to the HTML page.
In a nutshell, no, I don't see my PDFs outranking my HTML pages, but I do know my PDFs are indexed, and I don't know whether they show up for other search terms.
I guess my main question is: would not indexing them open up the chance for more backlinks to the HTML page rather than the PDF? And in Google's eyes, it wouldn't have to decide which to display, the HTML page or the PDF, since both have the same content.
Maybe I'm overthinking it, and the straight answer is that if an HTML page exists, Google won't give preference to the PDF, but if there is no HTML page, the PDF is shown.
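The redirect-and-reupload approach described above can be sketched as a small WSGI middleware, again assuming a hypothetical Python-served site. The old PDF URL, which had already earned rankings and links, answers with a 301 to the new HTML page, while the re-uploaded PDF at its new URL is served normally. All paths and URLs here are illustrative placeholders.

```python
from typing import Callable, Dict

# Hypothetical mapping: old PDF URL -> the HTML page that replaced it.
# The re-uploaded PDF (e.g. /files/installation-guide-v2.pdf) is NOT listed,
# so it is still served as a normal download.
REDIRECTS: Dict[str, str] = {
    "/files/installation-guide.pdf": "https://www.example.com/guides/installation/",
}


def redirect_moved_pdfs(app: Callable) -> Callable:
    """Wrap a WSGI app so listed old PDF paths 301-redirect to their HTML pages."""
    def wrapped(environ, start_response):
        target = REDIRECTS.get(environ.get("PATH_INFO", ""))
        if target is not None:
            # A permanent redirect signals that the old URL's value should
            # be consolidated on the new HTML page.
            start_response("301 Moved Permanently", [("Location", target)])
            return [b""]
        return app(environ, start_response)

    return wrapped
```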
-
Yeah, we offer the same. The user can download the PDF or open it in a new window. I haven't seen Google automatically present my PDF, and so far my searches have shown my HTML page, but my question to Cole remains: could Google be comparing the PDF and HTML page with each other? What if, for some search, it preferred to rank the PDF higher than the HTML page?
On your next question, I don't get duplicate content warnings for the PDFs. I believe the PDFs are indeed being indexed, as the text is readable. How well are they being indexed? I get close to 22,000 search results for my subdomain, so yes, they are indexed.
I do have rel=canonical tags on the HTML pages, but I can't add one to the PDF, as it's a file and not a page.
-
Thanks for the replies.
Cole, Google does index our PDFs, though. I tested this by searching "site:domain.com search term" and then "site:static.domain.com search term".
Result:
site:static.domain.com search term
Google showed me the PDF document that is available for download from the HTML page that ranks highly for that search term.
So Google is indexing both the PDF and the HTML. To answer your question about why I don't want them indexed: my thinking was that if the PDF appears and someone backlinks to it, I would rather that backlink go to the HTML page. The PDFs are hosted on my subdomain, and I don't want the subdomain to get the rank. In the back of my head, I'm also wondering whether my PDF and HTML pages are competing with each other.
-
The way I see it, I'm reducing duplicate content.
Anything you can do that helps with this is a good move; there's nothing wrong with a little tidying up.
the PDF appearing when a HTML page doesn't, is another way of gaining real estate
Does this currently happen for you? PDFs can actually outrank HTML pages on occasion. They aren't Google's preferred media type, but as with any page, it's all about the content.
-Andy
-
Morning,
To my knowledge, Google isn't able to open a PDF. You could always present users with the option of downloading the PDF. Every tech website I have been to generally offers it as a download or opens it in another window.
I don't know why it would automatically present a PDF, although I probably don't work in the same industry! Ha!
The other question I have: are you getting duplicate content warnings? Are the PDFs currently being indexed? If so, how well are they being indexed? Google can read an open PDF, or a PDF that automatically displays, but some are easier to read than others, depending on the PDF's settings.
http://www.searchenginejournal.com/8-tips-to-make-your-pdf-page-seo-friendly-by/59975/
Another option is rel=canonical tags.
Hope this helps!
-
"I'm reducing duplicate content": Google cannot crawl PDFs, but they do index them and show them in search results.
So let me ask you: why don't you want them indexed?
I say let them be indexed.
Cole