Embedding PDF previews and maintaining crawlability/link-equity.
-
One site that I'm working on has previously had a great deal of success from the PDF preview content on the site. The PDF previews are quite substantial and rank for many long-tail terms, which drives a reasonable amount of traffic back to the site to purchase the full version of the product.
As part of a site redesign, the way the PDF previews are embedded/presented on the page is changing slightly.
In the proposed modal pop-up on the new site, the code looks like this:
<object data="my-pdf-preview.pdf" type="application/pdf" style="width:100%; min-height:600px; max-height:100%;"><embed src="my-pdf-preview.pdf" type="application/pdf"></object>
Whereas the old code looked like this:
<object data="mt-pdf-previewpreview.pdf#view=FitH,50&scrollbar=1&toolbar=0&statusbar=0&messages=0&navpanes=0"
type='application/pdf'
width='100%'
height='600'>It appears your Web browser is not configured to display PDF files.
No worries, you can download the PDF file here.</object>
Note how the old code contained a plain, standard link to the PDF document (the "download the PDF file here" fallback text).
My worry is that without this link, search engines won't a) be able to discover/crawl the pdf content or b) pass any link-equity to these pdfs.
Does anyone have any experience/recommendations about this? I'd like to have some information before I request that they add a plain link to the pdf previews back onto the on-page content.
-
That's the route I'd push for as well I think.
Agreed on experimentation. Please report back if you get a chance to test this. Perhaps choose a small number of PDFs on this site redesign and leave the link off of them?
-
Thanks Kane - I've managed to make the case for a real-simple "download preview pdf" link so at least I feel comfortable that they won't lose too much of this "hidden" traffic.
It would still be nice to understand how <embed> is handled and whether any link-equity passes through the embed. Tight project deadlines mean there isn't time to experiment.
-
I haven't seen any studies with <embed> the way I have with <iframe>. <embed> is also used for video and Flash, but neither would be indexed the same way as PDF so hard to compare. The embed tag is pretty standardized, so I really doubt they wouldn't crawl this similarly.
IIRC in the ugly era of Flash, it was proper to have a <noscript> {crawlable content here} </noscript> section after the <embed>, so that's one comparable situation, but that's due to the Flash itself not being crawled well.
If it's not a hassle, I would add the text link to the PDF that says "download full PDF" or similar. If it is a hassle and takes longer than a couple hours, then it's a harder call.
Similar thread that could be helpful:
http://stackoverflow.com/questions/3686331/does-google-index-html-content-supplied-by-the-object-tag
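For illustration, here is a minimal sketch of what adding that text link could look like. It assumes the file name from the question. The link wording, the fallback paragraph, and the choice to repeat the link outside the <object> are all hypothetical, and none of this has been tested for crawl behaviour in this thread.

```html
<!-- Sketch only: file name taken from the question; link text and layout are assumptions. -->
<object data="my-pdf-preview.pdf" type="application/pdf"
        style="width:100%; min-height:600px; max-height:100%;">
  <!-- Fallback content: rendered only when the browser cannot display the PDF inline -->
  <p>It appears your browser is not configured to display PDF files.
     <a href="my-pdf-preview.pdf">Download the preview PDF</a>.</p>
</object>

<!-- Plain link in the regular page content, present whether or not the embed renders -->
<p><a href="my-pdf-preview.pdf">Download preview PDF</a></p>
```

The second link is the one that matters for the concern raised in the question: it is an ordinary anchor in the page markup, so it does not depend on how <object> or <embed> is handled.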
-
Search engines will still be able to crawl the PDF. They crawl images, don't they?