PDF best practices: to get them indexed or not? Do they pass SEO value to the site?
-
All PDFs have landing pages, and the pages are already indexed. If we allow the PDFs to get indexed, then they'd be downloadable directly from Google's results page and we would not get GA events.
The PDFs' info would somewhat overlap with the landing pages' info. Also, if we ever need to move content, we'd now have to redirect the links to the PDFs.
What are best practices in this area? To index or not?
What do you / your clients do and why?
Would a PDF indexed by Google and downloaded directly via a link in the SERP pass SEO juice to the domain? What if it's on a subdomain, like when hosted by Pardot (www1.example.com)?
-
I have repeatedly noticed that Google indexes PDF files, but only their headers, without the contents of the file itself.
So it is worth formatting the file description correctly. You can do that with the PDF Architect program (http://pdf-architect.ideaprog.download/), or any other tool that is convenient for you.
-
PDFs can be canonicalized using .htaccess (by sending a rel="canonical" HTTP header with the file). Google is usually very slow to discover and obey this, but it can be done. However, if your PDF is not close to being an exact copy of the target page, Google will probably not honor the canonicalization and will index the PDF and the HTML page separately.
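To make the .htaccess approach a bit more concrete: what the server ends up sending with the PDF is an HTTP `Link` header with `rel="canonical"`, which Google documents for non-HTML files. The Python sketch below only builds that header line. The function name and the example.com URL are made up for illustration, and how you actually attach the header depends on your server setup.

```python
from urllib.parse import urlsplit


def canonical_link_header(landing_page_url: str) -> tuple[str, str]:
    """Build the HTTP header that points a PDF at its canonical HTML page.

    Hypothetical helper: the name and signature are assumptions for this
    sketch, not part of any library.
    """
    parts = urlsplit(landing_page_url)
    # The canonical target should be an absolute http(s) URL.
    if parts.scheme not in ("http", "https") or not parts.netloc:
        raise ValueError(f"not an absolute URL: {landing_page_url!r}")
    return ("Link", f'<{landing_page_url}>; rel="canonical"')


if __name__ == "__main__":
    # Placeholder URL; substitute the landing page for the PDF being served.
    name, value = canonical_link_header("https://www.example.com/whitepaper/")
    print(f"{name}: {value}")
```

The header has to be sent on the response for the PDF itself, with the landing page as the target, not the other way round.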
PDFs can be optimized (given a title tag) by editing the properties of the document. Most PDF-making software has the ability to do this.
You can insert "buy buttons" and advertising in PDFs. Just make an image, paste it into the document and link it to your shopping cart or to your target document.
PDFs accumulate link juice and pass it to other documents.
Use the same strategies with PDFs as you would with an HTML page for directing visitors where you want them to go and getting them to do what you want them to do.
Some people will link to your PDF; others will grab your PDF and place it on their website. In that situation you lose the canonical, but you still get juice from any embedded links and benefit from any ads and buttons that are included. Lock the PDF with your PDF-creating software to prevent people from editing it (though they can always copy and paste to get around that).
Other types of documents, such as Excel spreadsheets, PowerPoint documents, Google images, etc., can have embedded text, embedded links and other features that are close to equivalent to an HTML document.
-
PDF documents aren't written in HTML, so you can't put canonical tags into PDFs. So that won't help or work. In fact, if you are considering tags of any kind for your PDFs, stop, because PDF files cannot have HTML tags embedded within them.
If your PDF files have landing pages, just let those rank and let people download the actual PDF files from there if they choose to do so. In reality, it's best to convert all your PDFs to HTML and then give a download link to the PDF file in case people need it (in this day and age, though, PDF is a backwards format. It's not even responsive for people's phones. It sucks!)
The only canonical tags you could apply would be on the landing pages (which do support HTML), pointing to the PDF files. Don't do that though, it's silly. Just convert the PDFs to HTML, then leave a download button for the old PDFs in case anyone absolutely needs them. If the PDF and the HTML page contain similar info, it won't affect you very much.
What will affect you is putting canonical tags on the landing pages, thus making them non-canonical (and stopping the landing pages from ranking properly). You're in a situation where a perfect outcome isn't possible, but that's no reason to pick the worst outcome by 'over-adhering' to Google's guidelines. Sometimes people use Google's guidelines in ways Google didn't anticipate.
PDF documents don't usually pass PageRank at all, as far as I know.
If you want to optimise the PDF documents themselves, the document title you save them with is used in place of a <title> tag (since PDFs aren't HTML, they can't use <title>). You can kind of optimise PDF documents by editing their document titles, but it's not super effective, and in the end HTML conversions usually perform much better. As stated, for the old fossils who still like / need PDF, you can give them a download link.
In the case of downloadable PDF files with similar content to their connected landing pages, Google honestly doesn't care too much at all. Don't go nutty with canonical tags, and don't stop your landing pages from ranking by making them non-canonical.
-
Yes, the PDFs would help increase your domain rank as they are practically considered as pages by Google, as explained in their QnA here.
Regarding hosting the PDFs on a subdomain, Google has stated that it's almost the same as having them on a subfolder, but that is highly contested by everyone since it's much harder to rank a subdomain than a subfolder.
Regarding the canonical tags, they are created for "Similar or Duplicate Pages", so the content doesn't have to be identical, and you'll be good so long as most of the content is the same. Otherwise, you can safely have them both indexed and add links from the PDF to the main content to transfer "link juice", as those are considered valid links.
I hope my response was beneficial to you and that the included proof was substantial.
Daniel Rika
-
Thank you.
Could you address my question about what's best practice? What do most companies do?
I am not sure what the best choice would be for us -- to expose PDFs which compete with their own landing pages or not.
Also, do you know if PDFs pass SEO "juice" to the main domain? Even if they are hosted at www2.maindomain.com?
Where can I see some proof that this is the case?
If the PDFs have a canonical tag pointing to the parent page, wouldn't this be confusing for the search engines as these are two separate files with differing content? Canonical tags are usually used to eliminate duplicates for differing URLs with identical content.
-
Whether you want to index the PDF directly or not will mostly depend on the content of the PDF:
- If you are using the PDF as a way to gather e-mails for your newsletter, or if you are offering the PDF as a way to get users to your site, then it would be best not to have it indexed directly, but instead have the users go to your site first.
- If the PDF in itself is a way for you to promote your website or content, then you can have it indexed so that it can be accessed directly and may help you get a bit more rank or clicks.
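For the first case, one common way to keep the PDF itself out of the index while the landing page stays indexed is the `X-Robots-Tag: noindex` HTTP response header, which Google documents for non-HTML files. Here is a minimal sketch in Python, assuming a hypothetical helper that decides which extra headers to send for each file. The function name, the `gated` flag and the paths are illustrative only.

```python
def extra_headers_for(path: str, gated: bool) -> dict[str, str]:
    """Return extra response headers for a file being served.

    Hypothetical helper for illustration only.
    gated=True means the PDF should only be reached via its landing page.
    """
    headers: dict[str, str] = {}
    if path.lower().endswith(".pdf") and gated:
        # Ask search engines not to index the PDF itself.
        headers["X-Robots-Tag"] = "noindex"
    return headers


if __name__ == "__main__":
    # Placeholder paths: one gated PDF, one promotional PDF left indexable.
    print(extra_headers_for("/downloads/guide.pdf", gated=True))
    print(extra_headers_for("/downloads/brochure.pdf", gated=False))
```

Note that the PDF must stay crawlable for this to work: if it is blocked in robots.txt, the crawler never sees the header.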
If you are looking to track PDF views, there are options to connect GA and track them, such as this plugin.
If the content is similar to the web page, then you can add a canonical to transfer the ranking. You can add it to the HTTP header using the .htaccess file, as explained here.