Getting Google to index a large PDF file
-
Hello! We have a 100+ MB PDF with multiple pages that we want Google to fully index on our server/website. First of all, is it even possible for Google to index a PDF file of this size?
It's been up on our server for a few days, and my colleague did a Googlebot fetch via Webmaster Tools, but it still hasn't happened yet.
My theories as to why this may not work:
A) We have no actual link(s) to the pdf anywhere on our website.
B) This PDF is approx 130 MB and very slow to load. I added some compression to it, but that only got it down to 105 MB.
Any tips or suggestions on getting this thing indexed in Google would be appreciated. Thanks!
-
- Link to it
- Add it to sitemap
- Compress it and make it load faster
My guess is that the file size is too large. Why not turn it into an HTML page? Or several pages?
Got a burning SEO question?
Subscribe to Moz Pro to gain full access to Q&A, answer questions, and ask your own.
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
How to get into Google's Tops Stories?
Hi All, I have been doing research for a few weeks and I cannot for the life of me figure out why I cannot get my website (Racenet) into the top stories in Google. We are in Google News, have "news article" schema, have AMP pages. Our news articles also perform quite well organically and we typically dominate the Google News section. We have two main competitors (Punters and Just Horse Racing) who are both in top stories and I cannot find anything that we are doing that they aren't. Apparently the AMP "news article" schema is incorrect and that could be the reason why we aren't showing up in Google Top Stories, but I can't find anything wrong with the schema and it looks the same as our competitors. For example: https://search.google.com/structured-data/testing-tool/u/0/#url=https%3A%2F%2Fwww.racenet.com.au%2Fnews%2Fblake-shinn-booked-to-ride-doncaster-handicap-favourite-alizee-20190331%3FisAmp%3D1 Does anyone have any ideas of why I cannot get my site into Google Top Stories? Any and all help would be greatly appreciated. Thanks! 🙂
Technical SEO | | Saba.Elahi.M.0 -
Issues with getting a web page indexed
Hello friends, I am finding it difficult to get the following page indexed on search: http://www.niyati.sg/mobile-app-cost.htm It was uploaded over two weeks back. For indexing and trouble shooting, we have already done the following activities: The page is hyperlinked from the site's inner pages and few external websites and Google+ Submitted to Google (through the Submit URL option) Used the 'Fetch and Render' and 'Submit to index' options on Search Console (WMT) Added the URL on both HTML and XML Sitemaps Checked for any crawl errors or Google penalty (page and site level) on Search Console Checked Meta tags, Robots.txt and .htaccess files for any blocking Any idea what may have gone wrong? Thanks in advance!
Technical SEO | | RameshNair
Ramesh Nair0 -
Google Index Speed Opinions
Hello Everyone, Under normal circumstances, new posts to my site are indexed almost instantly by Google. I know this because an occasional search with quotation marks surrounding the 1st paragraph of text displays my newly published page. I use this tactic from time to time to ensure contributors aren't syndicating content. My question is this: I've noticed over the last day or so that my newly published articles are not yet indexed. For example, an article that was published over 24 hours ago does not appear to be indexed yet. Is this cause for concern? Is there an average wait time for indexation? XML issue? Thanks in advance for the help/insight.
Technical SEO | | JSOC0 -
Getting images indexed in the SERPS
Good Afternoon form 13 degrees C totally Sunny Wetherby UK 🙂 Am i right in thinking that the only way to get images appearing like this in your serps: http://i216.photobucket.com/albums/cc53/zymurgy_bucket/innovia-merchant-immages-serpscopy.jpg is to be hooked up to Google Merchant? Which kind of means if the sight your working on has no images then this type of enhancement is out of bounds? Thanks in advance, David
Technical SEO | | Nightwing0 -
Google Indexing
Hi Everybody, I am having kind of an issue when it comes to the results Google is showing on my site. I have a multilingual site, which is main language is Catalan. But of course if I am looking results in Spanish (google.es) or in English (google.com) I want Google to show the results with the proper URL, title and descriptions. My brand is "Vallnord" so if you type this in Google you will be displayed the result in Catalan (Which is not optimized at all yet) but if you search "vallnord.com/es" only then you will be displayed the result in Spanish What do I have to do in order for Google to read this the way I want? Regards, Guido.
Technical SEO | | SilbertAd0 -
Google Sandboxing
I have a new site with a new domain that ranked well the 1st week or so after it was indexed then it totally dropped off the SERP. My question is, does Google Sandboxing affect new sites on new domains that don't have any incoming links? The site dropped off before I began link building - from what I've read unnatural link build is often the cause. Can you still be sandboxed without any link building? If this is the case, are there things I can do to get out of the sandbox? Thanks folks, Jason
Technical SEO | | OptioPublishing0 -
Https indexed - though a no index no follow tag has been added
Hi, The https-pages of our booking section are being indexed by Google. We added But the pages are still being indexed. What can I do to exclude these URL's from the Google index? Thank you very much in advance! Kind regards, Dennis Overbeek ACSI Publishing | dennis@acsi.eu
Technical SEO | | SEO_ACSI0 -
Why Google did not index our domain?
Hi, We launched tmart 60 days ago and submitted to google, bing, yahoo 20 days later. But google had never indexed our website still when yahoo indexed it in one week. What we have checked or tried: 1. We got 20~50 inlinks in one month and now 81 inlinks via yahoo site explorer. 2. This domain has registered for 13 years and we purchased it from sedo last year. We
Technical SEO | | zt673
did not find any problems from domain archive pages. 3. Page similar: the homepage is 50% similar to one of our competitors when we just launched.
So we adjusted the page structure and modified the content one month later and decreased the similarity to 30% (by tools from webconfs.com) 4. Google Robots: googlebot crawled our website every day after we submitted for indexing.
We opened GWT account for it and added the xml sitemap last week. GWT said nothing
was wrong except the time of page loading. Our questions: Why google did not indexed our website? What should we do? Thanks, wu0