Getting Google to index a large PDF file
-
Hello! We have a 100+ MB PDF with multiple pages that we want Google to fully index on our server/website. First of all, is it even possible for Google to index a PDF file of this size?
It's been up on our server for a few days, and my colleague did a Googlebot fetch via Webmaster Tools, but it still hasn't happened yet.
My theories as to why this may not work:
A) We have no actual link(s) to the pdf anywhere on our website.
B) This PDF is approx 130 MB and very slow to load. I added some compression to it, but that only got it down to 105 MB.
Any tips or suggestions on getting this thing indexed in Google would be appreciated. Thanks!
-
- Link to it
- Add it to sitemap
- Compress it and make it load faster
My guess is that the file size is too large. Why not turn it into an HTML page? Or several pages?
Got a burning SEO question?
Subscribe to Moz Pro to gain full access to Q&A, answer questions, and ask your own.
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
Google is indexing our old domain
We changed our primary domain from vivitecsolutions.com to vivitec.net. Google is indexing our new domain, but still has our old domain indexed too. The problem is that the old site is timing out because of the https: Thought on how to make the old indexing go away or properly forward the https?
Technical SEO | | AdsposureDev0 -
Help with Getting Googlebot to See Google Charts
We received a message from Google saying we have an extremely high number of URLs that are linking to pages with similar or duplicate content. The main difference between these pages are the Google charts we use. It looks like Google isn't able to see these charts (most of the text are very similar) and the charts (lots of it) are the main differences between these pages. So my question is what is the best approach to allowing Google to see the data that exists in these charts? I read from here http://webmasters.stackexchange.com/questions/69818/how-can-i-get-google-to-index-content-that-is-written-into-the-page-with-javascr that a solution would be to have the text that is displayed on the charts coded into the html and hidden by CSS. I'm not sure but it seems like a bad idea to have it seen by Google but hidden to the user by CSS. It just sounds like a cloaking hack. Can someone clarify if this is even a solution or is there a better solution?
Technical SEO | | ERICompensationAnalytics1 -
Why my website does not index?
I made some changes in my website after that I try webmaster tool FETCH AS GOOGLE but this is 2nd day and my new pages does not index www. astrologersktantrik .com
Technical SEO | | ramansaab0 -
Getting mixed signals regarding how Google treats subdomains
All the posts I've read here and elsewhere regarding subdomains come to a similar conclusion, avoid using them because they are treated as a separate site -- and everything that goes along with that. But on my site we have a subdomain on a separate server and it's treated as internal. Also this from Hubspot - "**Use a subdomain of your website like Blog.HubSpot.com. **This is a great idea and this is what we do currently at HubSpot. Many companies have their blog on a subdomain, and it seems to be starting to be somewhat of a standard. The search engines are treating subdomains more and more as just portions of the main website, so the SEO value for your blog is going to add to your main website domain." Any help clarifying this would be greatly appreciated!
Technical SEO | | titleist1 -
Modx revolution- getting around index.php vs. root duplicate content issue?
Basically, SEOMoz bots are flagging our index.php and root files as duplicate content of one another, therefore cutting the page authority of each. What we want to do is make the root the canonical preference over index.php. Ordinarily, we should be able to do this in the htaccess file. For some reason, as the site has been built into a cms using ModX Revolution, this does not seem to work. We've tried A TON of htaccess rewrite mods to resolve this issue to no avail. We have also tried revising our sitemap to include only the root address. Any ideas? We'll try most anything at this point. Thanks in advance.
Technical SEO | | G2W0 -
I was googling the word "best web hosting" and i notice the 1st and 3rd result were results with google plus. Does Google plus now play a role in improving ranking for the website?
I was googling the word "best web hosting" and i notice the 1st and 3rd result were results with google plus. Does Google plus now play a role in improving ranking for the website?I see a person's name next to the website too
Technical SEO | | mainguy0 -
Pages not indexed by Google
We recently deleted all the nofollow values on our website. (2 weeks ago) The number of pages indexed by google is the same as before? Do you have explanations for this? website : www.probikeshop.fr
Technical SEO | | Probikeshop0 -
Page not being indexed
Hi all, On our site we have a lot of bookmaker reviews, and we are ranking pretty good for most bookmaker names as keywords, however a single bookmaker seems to have been shunned by Google. For a search "betsafe" in Denmark, this page does not appear among the top 50: http://www.betxpert.com/bookmakere/betsafe All of our other review pages rank in top 10-20 for the bookmaker name as keyword. What to do if Google has "banned" a page? Best regards, Rasmus
Technical SEO | | rasmusbang0