Thinking about not indexing PDFs on a product page
-
Our product pages generate a PDF version of the page in a different layout. This is done for two reasons: it has been the standard across similar industries, and it helps customers print the pages when working with the product.
So there is a use for the customer, but what about search? I've thought about this a lot, and my thinking is: why index the PDF at all? Only allow the HTML page to be indexed. The PDF files are on a subdomain, so I can easily noindex them. The way I see it, I'm reducing duplicate content.
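A PDF has no HTML `<head>`, so there is nowhere to put a meta robots tag. The usual way to noindex a file like this is the `X-Robots-Tag` HTTP response header, which Google supports. As a minimal sketch, assuming (hypothetically) that the static subdomain is served through a Python WSGI app, a small middleware could add that header to every `.pdf` response. On Apache the same header is normally set in the server config or `.htaccess` instead; the function name and paths below are illustrative only.

```python
def noindex_pdfs(app):
    """Wrap a WSGI app so every *.pdf response carries X-Robots-Tag: noindex.

    Hypothetical helper: assumes PDFs are served by a Python WSGI app.
    """
    def wrapped(environ, start_response):
        path = environ.get("PATH_INFO", "")

        def start(status, headers, exc_info=None):
            if path.lower().endswith(".pdf"):
                # Tell crawlers not to index this file; HTML pages are untouched
                headers = list(headers) + [("X-Robots-Tag", "noindex")]
            return start_response(status, headers, exc_info)

        return app(environ, start)

    return wrapped
```

Wrapping the PDF-serving app once (`app = noindex_pdfs(app)`) leaves HTML responses as they are, which matches the idea of letting only the HTML page be indexed.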
On the flip side, because the PDFs are hosted on a subdomain, a PDF appearing when the HTML page doesn't is another way of gaining real estate. If it appears alongside the HTML page, that's even more coverage.
Has anyone else done this? My understanding is that this could be a good thing; it might even prevent backlinks from being generated to the PDF and lead to more backlinks to the HTML page.
Can PDFs exist solely as data that's accessible once you're on the page, and not be relevant to search engines? I find them a bane when they are on a subdomain.
-
Thanks EGOL, I hadn't thought about using rel=canonical via .htaccess. Great idea.
-
If you link to a pdf, some of your power flows into it. If someone else links to a pdf, some of his power flows into it.
PDFs accumulate backlinks, accumulate pagerank. You should assign these valuable assets to real pages.
So, if you have PDFs that are duplicates of webpages, you should use rel=canonical via .htaccess to attribute them to their matching webpages. If you don't, your assets are being squandered.
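The "rel=canonical using htaccess" idea works because Google also accepts a canonical declared in an HTTP `Link` response header, which is how non-HTML files such as PDFs can point at a preferred URL (in `.htaccess` this is typically set with mod_headers). A minimal Python sketch of building that header value, assuming a hypothetical naming convention where `static.example.com/pdf/<slug>.pdf` is the print version of `www.example.com/products/<slug>`:

```python
import posixpath
from urllib.parse import urlsplit

# Hypothetical convention (not from the thread): the PDF
# https://static.example.com/pdf/<slug>.pdf matches the HTML page
# https://www.example.com/products/<slug>
HTML_BASE = "https://www.example.com/products/"


def canonical_link_header(pdf_url):
    """Return an HTTP (name, value) pair pointing a PDF at its HTML page."""
    path = urlsplit(pdf_url).path             # "/pdf/widget.pdf"
    filename = posixpath.basename(path)       # "widget.pdf"
    slug, ext = posixpath.splitext(filename)  # ("widget", ".pdf")
    if ext.lower() != ".pdf":
        raise ValueError(f"not a PDF URL: {pdf_url}")
    # rel="canonical" in a Link header is the header equivalent of the HTML tag
    return ("Link", f'<{HTML_BASE}{slug}>; rel="canonical"')


print(canonical_link_header("https://static.example.com/pdf/widget.pdf"))
# -> ('Link', '<https://www.example.com/products/widget>; rel="canonical"')
```

Whatever serves the PDFs would then send this header with each file, so links pointing at the PDF are attributed to the matching HTML page.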
-
I don't see my PDFs show up for a search term when my HTML pages are being displayed.
However, there was one situation where a PDF was being displayed, so I created an HTML page from it and set up a redirect from the PDF to the HTML page. I followed that up by re-uploading the PDF at a new URL and offering it as a download. That way I transferred the rank juice to the HTML page.
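The redirect step described here amounts to answering requests for the old PDF URL with a permanent (301) redirect to the new HTML page, while the re-uploaded PDF lives at a fresh URL. A minimal WSGI sketch, with hypothetical paths and placeholder bytes standing in for the real files:

```python
# Hypothetical paths: the old, already-linked PDF URL now points to the
# new HTML page; the re-uploaded PDF at its new URL is served normally.
REDIRECTS = {
    "/docs/widget-guide.pdf": "https://www.example.com/widget-guide",
}
NEW_PDFS = {
    "/docs/widget-guide-download.pdf": b"%PDF-1.4 ...",  # placeholder bytes
}


def app(environ, start_response):
    path = environ.get("PATH_INFO", "")
    if path in REDIRECTS:
        # 301 marks the move as permanent, so the old URL's signals
        # are meant to be consolidated on the HTML page
        start_response("301 Moved Permanently", [("Location", REDIRECTS[path])])
        return [b""]
    if path in NEW_PDFS:
        start_response("200 OK", [("Content-Type", "application/pdf")])
        return [NEW_PDFS[path]]
    start_response("404 Not Found", [("Content-Type", "text/plain")])
    return [b"not found"]
```

For a local try-out, `wsgiref.simple_server.make_server("", 8000, app).serve_forever()` from the standard library can serve it.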
In a nutshell, no, I don't see my PDFs outranking my HTML pages, but I do know my PDFs are indexed, and I don't know whether they show up for other search terms.
I guess my main question is: would not indexing them open up the chance for more backlinks to the HTML page rather than the PDF? And in Google's eyes, there would be no need to decide which to display, the HTML page or the PDF, since both have the same content.
Maybe I'm overthinking this, and the straight answer is: if an HTML page exists, Google won't give preference to the PDF, but if there is no HTML page, the PDF is shown.
-
Yeah, we offer the same. The user can download the PDF or open it in a new window. I haven't seen Google automatically present my PDF, and so far my searches have shown my HTML page, but my question to Cole remains: could Google be comparing the PDF and the HTML page with each other? What if, for some searches, it preferred to rank the PDF above the HTML page?
On your next question, I don't get duplicate content warnings for the PDFs. I believe the PDFs are indeed being indexed, as the text is readable. How well are they being indexed? I've got close to 22,000 search results for my subdomain, so yes, they are indexed.
I do have rel=canonical tags on the HTML pages, but I can't add one to the PDF, as it's a file and not a page.
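A file can't carry a `<link rel="canonical">` tag, but it can be served with a canonical in its HTTP `Link` header. To check what a PDF URL actually sends, one option is a HEAD request plus a small parser. The sketch below is a minimal parser under simplifying assumptions (it ignores edge cases such as commas inside quoted parameters); `fetch_link_header` needs network access and is not exercised by the example.

```python
import re
import urllib.request
from typing import Optional

# One link-value: <URI> followed by ;-separated parameters such as rel="canonical"
_LINK_VALUE = re.compile(r"<([^>]*)>\s*((?:;\s*[^;,]+)*)")


def canonical_from_link_header(value: str) -> Optional[str]:
    """Return the URI tagged rel="canonical" in a Link header, or None."""
    for match in _LINK_VALUE.finditer(value):
        uri, params = match.group(1), match.group(2)
        for param in params.split(";"):
            name, _, raw = param.strip().partition("=")
            rels = raw.strip().strip('"').lower().split()
            if name.strip().lower() == "rel" and "canonical" in rels:
                return uri
    return None


def fetch_link_header(url: str) -> str:
    """Send a HEAD request and return the Link header ("" if absent)."""
    request = urllib.request.Request(url, method="HEAD")
    with urllib.request.urlopen(request, timeout=10) as response:
        return response.headers.get("Link", "")


header = '<https://www.example.com/products/widget>; rel="canonical"'
print(canonical_from_link_header(header))  # https://www.example.com/products/widget
```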
-
Thanks for the replies
Cole - Google does index our PDFs, though. I tested this by searching site:domain.com search term, and then site:static.domain.com search term.
Result:
site:static.domain.com search term
Google showed me the PDF document that is available for download from the HTML page that ranks highly for that search term.
So Google is indexing both the PDF and the HTML page. To answer your question as to why I don't want them indexed: my thinking was that if the PDF appears and someone backlinks to it, I'd rather that backlink went to the HTML page. The PDFs are hosted on my subdomain, and I don't want the subdomain to get the rank. At the back of my head, I'm also wondering whether my PDF and HTML page are competing with each other.
-
The way I see it, I'm reducing duplicate content.
Anything you can do that helps with this is a good move - nothing wrong with a little tidying up.
the PDF appearing when a HTML page doesn't, is another way of gaining real estate
Does this currently happen for you? PDFs can actually outrank HTML pages on occasion - they aren't Google's preferred media type, but like any page, it's all about the content.
-Andy
-
Morning,
To my knowledge, Google isn't able to open a PDF. You could always give users the option of downloading the PDF. Any tech website I have been to generally offers it as a download, or opens it in another window.
I don't know why it would automatically present a PDF, although I probably don't work in the same industry! Ha!
The other question I have is: are you getting duplicate content warnings? Are the PDFs currently being indexed? If so, how well are they being indexed? Google can read an open PDF, or a PDF that automatically displays, but some are easier to read than others, depending on the PDF's settings.
http://www.searchenginejournal.com/8-tips-to-make-your-pdf-page-seo-friendly-by/59975/
Another option is the rel canonical tags?
Hope this helps!
-
"I'm reducing duplicate content " - Google cannot crawl PDFs, but they do index them and show them in search results.
So let me ask you - why do you not want them indexed?
I say let them be indexed.
Cole
Related Questions
-
Difficulty with Indexing Pages - Desperate for Help!
I have a website with product pages that use the same URL but load different data based on what's passed to them via GET. I am using a WordPress website, but all of the page information is retrieved from a database and displayed using PHP. Somehow these pages are not being indexed by Google. I have done the following: 1. Created a sitemap pointing to each page. 2. Defined URL parameters in Search Console for this type of page. 3. Created a product schema using schema.org, and tested it without errors. I have requested re-indexing repeatedly, and these pages and the images on them are still not being indexed! Does anybody have any suggestions?
Intermediate & Advanced SEO | | jacleaves0 -
Fresh page versus old page climbing up the rankings.
Hello, I have noticed that if I publish a webpage that Google has never seen, it ranks right away, usually in a decent position to start with (not great, but decent), typically in the top 30 to 50, and then over the months it slowly climbs the rankings. However, if my page has existed for, let's say, 3 years and I make changes to it, it takes much longer to climb the rankings. Has anyone else noticed this? And why is that?
Intermediate & Advanced SEO | | seoanalytics0 -
Noindex, nofollow instead of rel=canonical on product pages
Hi all, we currently handle our product pages with rel=canonical. We have 1 URL that is indexed, http://www.prams.net/cam-combi-family, and the other colours have different URLs, like http://www.prams.net/cam-combi-family-3-in-1-pram-reversible-seat-car-seat-grey-d, which canonicalize to the indexed page. Google still crawls all those pages. For crawl budget reasons, we want to use "noindex, nofollow" on these pages (the pages for the other colours) instead. Would Google then crawl fewer pages, more often? Does this make sense? Are there any downsides to doing it? Thanks in advance, Dieter
Intermediate & Advanced SEO | | Storesco1 -
How can I improve my website on-page and off-page?
My website is guitarcontrol.com, and I have very strong competition in the market. Please advise me on a list of improvements for my website regarding on-page SEO, link building and social media. What can I do to improve my website's ranking?
Intermediate & Advanced SEO | | zoe.wilson170 -
Page 1 Reached, Further Page Improvements and What Next?
Moz, I have a particularly tricky, competitive keyword for which I have finally climbed our website to 10th position on page 1. I am particularly pleased about this, as all of the website and content is in German, which I have little understanding of, and I have little support on this. I am pleased with the content and layout of the page, and I am monitoring all Google Analytics values very closely, as well as the SERP positions. So, as far as further progress with this page and hopefully climbing further up page 1, where do you think I should focus my efforts? Page speed optimization? Building links to this page? Blogging on this topic (with links)? Mobile responsive design (more difficult)? Further improvements to pages and content linked from this page? Further improvements to the website in general? Further effort on tracking visitors and user experience monitoring (like setting up Crazyegg or something)? Any other ideas would be greatly appreciated. Thanks all, James
Intermediate & Advanced SEO | | Antony_Towle0 -
Town and County pages taking months to index.
Hi, at http://www.general-hypnotherapy-register.com/regional-hypnotherapy-directory/ we have a load of town and county pages for all of the hypnotherapists on the site. a) I have checked all of these links and they are spiderable. b) About a month back I noticed that after the site changes (not entirely sure why), the site was generating rogue pages, e.g. http://www.general-hypnotherapy-register.com/hypnotherapists/page/5/?town=barnsley instead of http://www.general-hypnotherapy-register.com/hypnotherapists/?town=barnsley, and we added meta noindex, nofollow to these rogue pages around 4 weeks ago. However, these pages still have a Google cache date of Oct 4th, predating these meta changes. c) There are examples of the pages we do want indexed, and ranking too on page 1 (site:www.general-hypnotherapy-register.com/hypnotherapists), e.g. http://www.general-hypnotherapy-register.com/hypnotherapists/?town=ockham, but these pages are few and far between; they have a recent Google cache date of Nov 1. d) The XML sitemap has all of the correct URLs, but in Webmaster Tools the number of pages indexed has been stubbornly flat at 2800 out of 4400 for 4 weeks now. e) Query parameters: ?town and ?county are set to Yes/Specifies in Webmaster Tools. Would love any suggestions. Thanks, Mark.
Intermediate & Advanced SEO | | Advantec0 -
Product Descriptions for a product with many designs
I'm a newbie with SEO and I have a question regarding product descriptions. Let's say I am selling 100 dog ID tags. The tags are all made of the same materials and are the same size, just with different designs. For the product description, do I need to write a different description for each of the 100 tags? This is an example of a short product description (there's more) for all the pet tags: Personalized with 4 lines of information and 20 characters in each line. Lifetime guarantee - If your pet ID tag ever becomes illegible, we will replace it free of charge. Solid one-piece construction - No glued or "sandwiched" materials to wear out or fall apart. Split ring for collar attachment included with EVERY tag. Countless uses - School backpacks, luggage, fashion accessories, and many more! All of the above information pertains to all the pet tags. Can all my product descriptions contain that information, or will I need to modify it 100 times, once for each individual pet tag? I have read a lot about duplicate content, so I am slightly confused. Will this hurt my SEO? Thanks, Keith
Intermediate & Advanced SEO | | ktw0160 -
Most Painless Way of Getting Duff Pages out of SEs' Indexes
Hi, I've had a few issues on our website caused by our developers. Basically, we have a pretty complex method of automatically generating URLs and web pages on our website, and at some point they stuffed up the URLs and managed to get tens of thousands of duff URLs and pages indexed by the search engines. I now have to get these pages out of the SEs' indexes as painlessly as possible, as I think they are causing a Panda penalty. All these URLs have an additional directory level in them called "home" which should not be there, so I have www.mysite.com/home/page123 instead of the correct URL www.mysite.com/page123. All these are totally duff URLs with no links going to them, so I gain nothing from 301 redirects, and I was wondering whether there is a more painless, less risky way of getting them all out of the indexes (i.e. after the stuff-up by our developers in the first place, I'm wary of letting them loose on 301 redirects in case they cause another issue!). Thanks
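Because every duff URL in this question is the correct URL plus an extra `home` directory, the mapping for any redirect (or for checking which indexed URLs are affected) is mechanical. A hypothetical helper, assuming full absolute URLs with a scheme such as `http://`; it only computes the target URL and does not decide between redirecting and letting the pages drop out of the index:

```python
from typing import Optional
from urllib.parse import urlsplit, urlunsplit


def corrected_url(url: str) -> Optional[str]:
    """Map a duff /home/... URL to its correct form; None if there is no stray 'home'."""
    parts = urlsplit(url)
    segments = parts.path.split("/")  # "/home/page123" -> ["", "home", "page123"]
    if len(segments) > 2 and segments[1] == "home":
        fixed_path = "/" + "/".join(segments[2:])
        return urlunsplit((parts.scheme, parts.netloc, fixed_path, parts.query, parts.fragment))
    return None


for url in ["http://www.mysite.com/home/page123", "http://www.mysite.com/page123"]:
    print(url, "->", corrected_url(url))
```

Keeping the rule this narrow (only a leading `home` segment is stripped) is one way to limit the risk of a redirect rule catching pages it shouldn't.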
Intermediate & Advanced SEO | | James770