Dealing with PDFs?
-
Hello fellow mozzers!
One of our clients does an excellent job of providing great content, and we don't even have to nag them about it (imagine that!). This content usually centers around industry reports, financial analyses, and economic forecasts; however, they always post it in the form of PDFs.
How does Google view PDFs, and is there a way to optimize them? Ideally, I'm going to try to get this client set up with a blog-like platform that will use HTML text rather than PDFs, but I wanted to see what info was out there for PDFs.
Thanks!
-
Thank you, Keri, for the helpful resource. I actually ended up doing all of those things for our client. I also found out that the default Drupal 6 robots.txt file does not allow the search engines to index PDFs, images, and Flash files, because it disallows the /sites/ directory where uploaded files are stored. Therefore, one must remove the "Disallow: /sites/" line from the robots.txt file.
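For anyone who wants to verify this, here is a minimal sketch using Python's standard-library robots.txt parser to show the effect of Drupal 6's default "Disallow: /sites/" rule on a PDF URL (the domain and file path are made up for the example):

```python
from urllib.robotparser import RobotFileParser

PDF_URL = "https://example.com/sites/default/files/annual-report.pdf"

# Drupal 6's default robots.txt disallows /sites/, which is where
# uploaded files (PDFs, images, Flash) actually live.
default_rules = RobotFileParser()
default_rules.parse(["User-agent: *", "Disallow: /sites/"])
print(default_rules.can_fetch("Googlebot", PDF_URL))  # False - the PDF is blocked

# With the Disallow: /sites/ line removed, crawlers may fetch the PDF.
fixed_rules = RobotFileParser()
fixed_rules.parse(["User-agent: *"])
print(fixed_rules.can_fetch("Googlebot", PDF_URL))  # True
```

Blocking the PDF in robots.txt prevents crawling entirely, so none of the on-page optimization tips in this thread can take effect until that line is removed.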
-
This doesn't address ranking, but this YOUmoz post does talk about best practices for optimizing PDF content and may help you: http://www.seomoz.org/ugc/how-to-optimize-pdf-documents-for-search
-
To be honest, Dana, outside of the basics mentioned, I tended not to go overboard, and many of them started to rank naturally as Google spidered the site. Just remember to give the link to the PDF strong anchor text and, if possible, add a little content around it to explain what visitors can expect in the document. Also remember to add a link to Adobe so that visitors can download the free reader if they don't have it already.
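To illustrate, a hypothetical link to a PDF with descriptive anchor text and a sentence of surrounding context might look like this (the filename and wording are invented for the example):

```html
<p>
  Our latest industry research is available as a downloadable report:
  <a href="/reports/2012-economic-forecast.pdf">
    2012 Economic Forecast for the Manufacturing Sector (PDF)
  </a>.
  The report covers pricing trends, hiring projections, and regional outlooks.
</p>
<!-- Optional: a link to the free reader for visitors who need it -->
<p><a href="https://get.adobe.com/reader/">Download the free Adobe Reader</a></p>
```

The descriptive anchor text and surrounding sentence give search engines context about the document, in contrast to a bare "click here" or "download" link.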
Hope this helps,
Regards,
Andy
-
Thank you, iNet SEO. Excellent resource!
I was also wondering if anyone had any posts or experience with how PDF content gets indexed and ranked?
-
Yes, you can optimise PDFs - have a read of this, as it seems to cover most points:
http://www.seoconsultants.com/pdf/seo
Sorry, I forgot to add that PDFs are useful for those who wish to download something to read at a later stage or whilst offline. Don't rush to advise them that HTML is the way to go unless it actually is. I have printed off many a PDF and taken it into meetings with me.
Regards,
Andy