Convert keyword-rich PDFs to web pages (text & images)
-
SteriPEN is a portable water purifier that kills viruses, protozoa, E. coli, etc.
Because of the technical and safety requirements of the product, our website has a lot of documentation on testing, organisms affected, and more. These documents are in PDF form and can often be found through Google search (and through links on specific pages).
Since these documents are rich in keywords about the microbes SteriPEN kills, does it make sense to convert these PDFs into HTML text and images?
I was then thinking of writing a blog post AND adding key links on important landing pages to these documents (as HTML).
Could removing the PDFs be harmful? I have no idea what the cost/benefit is.
-
Google can read PDFs, and returns them in search results, but some users might prefer to view an HTML version. Also, it looks like images in PDFs are not indexed, according to the 2nd post below.
Regarding duplicate content, Google says (2nd post below):
Q: Is it considered duplicate content if I have a copy of my pages in both HTML and PDF?
A: Whenever possible, we recommend serving a single copy of your content. If this isn’t possible, make sure you indicate your preferred version by, for example, including the preferred URL in your Sitemap or by specifying the canonical version in the HTML or in the HTTP headers of the PDF resource. For more tips, read our Help Center article about canonicalization.
These will be of interest to you:
http://www.google.com/support/forum/p/Webmasters/thread?tid=4472512a5515686b&hl=en&fid=4472512a5515686b00047d6de91c24fa&hltp=2
http://googlewebmastercentral.blogspot.com/2011/09/pdfs-in-google-search-results.html
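If you keep both the PDF and an HTML version, the "HTTP headers of the PDF resource" option in Google's answer means the server sends a `Link` header with `rel="canonical"` along with the PDF. Here is a minimal Python sketch that only builds that header. The function name and the example.com URL are made up for illustration, and how you attach the header to the PDF response depends on your web server, which isn't covered here.

```python
from typing import Tuple


def canonical_link_header(preferred_url: str) -> Tuple[str, str]:
    """Build the HTTP Link header that points a PDF at its preferred URL.

    preferred_url is the version you want search engines to treat as
    canonical, for example the HTML page that mirrors the PDF.
    """
    # A canonical target should be an absolute URL.
    if not preferred_url.startswith(("http://", "https://")):
        raise ValueError("preferred_url must be an absolute http(s) URL")
    return ("Link", f'<{preferred_url}>; rel="canonical"')


# Hypothetical URL, for illustration only.
name, value = canonical_link_header("https://www.example.com/docs/testing-results.html")
print(f"{name}: {value}")
```

The server would need to send this header with every response for the matching PDF. If you decide to keep only one version, none of this is needed.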