Does Google scrape links from PDF files? Do these links pass link juice?
-
Title is pretty much the whole question.
-
I ran a test, and it seems that yes, links in PDFs do count for ranking.
The test is on my Romanian blog http://seogan.ro/link-building-pdf-urile-o-sursa-de-linkuri-test
You can find an English translation here: http://www.seogan.com/pdf-link-building
Hope it helps.
-
Yes, it does, according to Google's tech spec http://code.google.com/apis/searchappliance/documentation/50/admin_crawl/Introduction.html
which specifically states that it follows HTML links in PDFs: 'It follows HTML links in PDF files, Word documents, and Shockwave documents'. Google's own API docs carry more weight than a comment in a forum. If they are licensing this out as an application, it would suggest that the same technology is available in the main engine, as does Dunamis's comment about a listing in a PDF document being found in search results.
You can test this for yourself by publishing a PDF with a link to an info page that is not linked from anywhere else. Include the PDF in your sitemap but not the test page, and check whether the test page shows up in Google's index (site:yoursite.com) the next time it crawls.
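The sitemap part of that test can be sketched in a few lines of Python. This is only a minimal sketch, not anything Google prescribes: the URLs and the `build_sitemap` helper are made up for illustration, and it assumes a plain sitemap in the standard sitemaps.org format. It lists the PDF but deliberately leaves out the test page, so the only path to that page is the link inside the PDF.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(urls):
    """Return a sitemap XML string listing exactly the given URLs."""
    ET.register_namespace("", SITEMAP_NS)
    urlset = ET.Element(f"{{{SITEMAP_NS}}}urlset")
    for url in urls:
        entry = ET.SubElement(urlset, f"{{{SITEMAP_NS}}}url")
        ET.SubElement(entry, f"{{{SITEMAP_NS}}}loc").text = url
    return ET.tostring(urlset, encoding="unicode")


# Hypothetical URLs, replace with your own.
PDF_URL = "https://yoursite.com/files/link-test.pdf"
TEST_PAGE_URL = "https://yoursite.com/orphan-test-page.html"

site_urls = ["https://yoursite.com/", PDF_URL, TEST_PAGE_URL]

# Keep the PDF in the sitemap, but exclude the test page on purpose.
sitemap_xml = build_sitemap(u for u in site_urls if u != TEST_PAGE_URL)
print(sitemap_xml)
```

If the test page later appears for a site:yoursite.com query, the link in the PDF is the most likely way it was discovered, provided nothing else links to it.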
An interview with Matt Cutts also gives some insight: http://www.stonetemple.com/articles/interview-matt-cutts-012510.shtml
Eric Enge: What about PDF files?
Matt Cutts: We absolutely do process PDF files. I am not going to talk about whether links in PDF files pass PageRank. But, a good way to think about PDFs is that they are kind of like Flash in that they aren't a file format that's inherent and native to the web, but they can be very useful. In the same way that we try to find useful content within a Flash file, we try to find the useful content within a PDF file. At the same time, users don't always like being sent to a PDF. If you can make your content in a Web-Native format, such as pure HTML, that's often a little more useful to users than just a pure PDF file.
-
This person seems to think not: http://www.google.fr/support/forum/p/Webmasters/thread?tid=14c5fe970fe84361&hl=en
but I'm not sure how much I can trust a random comment from a random source. Is there any evidence for either argument?
EDIT: And this person seems to think they do pass link juice: http://www.whydowork.com/blog/link-building/274/
Could a mod remove the "marked as answered" status? I don't think I am able to remove it, and the question isn't really answered.
-
Yes, but do they crawl the links they find in these documents, or do they just index their contents?
-
Hmm, although I thought you had answered my question, I actually feel that you have not. Yes, the links you provided state that Google scrapes PDFs and even OCRs them to get a better idea of what is in them, but I don't see anywhere that they mention crawling the URLs they find in these PDF documents.
-
Google definitely does index the contents of PDF files. I found this out the hard way as I had a real estate PDF on my site that I wanted to have listed in the index, but I didn't know that the contents would be crawled. The PDF contained some listings that I was not legally allowed to advertise on my site. (It was legal for me to give someone a report with the listings in it though.)
When another realtor was searching for their own listing, my PDF came up. I got in trouble. I'm OK now though.
-
Have a look at this article, which explains some of the doc library search for PDF files: http://searchenginewatch.com/article/2067225/Google-Does-PDF-Other-Changes
See also Google's statement here: http://googleblog.blogspot.com/2008/10/picture-of-thousand-words.html