Google Rewriting PDF Titles
-
Has anyone else noticed Google rewriting the title of PDF documents?
-
Sure, Wayne.
While a web page and a PDF differ in format, there is little difference in how Google handles the data. A crawler reads the text and processes it, and the result is then ranked and shown in search results. The same basic rules apply.
Here is an example:
-
Go to the following URL: http://centerforhealthysex.com/wp-content/uploads/. You can see this site allows the contents of this folder to be displayed (not a recommended practice).
-
Notice the first PDF file in the list: "alexandra-katehakis-biography.pdf"
-
Go to Google.com and search for the following without quotes: ".pdf site:centerforhealthysex.com". Notice the title shows as "download bio pdf - Center for Healthy Sex".
-
Return to Google.com and search for "alexandra katehakis biography". You will see the same file now has a title of "Alexandra Katehakis is a licensed Marriage, Family Therapist ..." In this case, Google grabbed the first line of text and used it as the title.
You can repeat this type of testing with almost any PDF or web page.
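If you want to repeat the test above for other files, here is a rough Python sketch that builds the two search URLs to compare. It only constructs the URLs; it does not fetch or scrape results, so you still open each one in a browser and read the titles yourself. The function names `search_url` and `title_test_urls` are made up for illustration, and the `https://www.google.com/search?q=` address format is my assumption, not something stated earlier in this thread.

```python
from urllib.parse import urlencode

# Assumed base address for a Google search; not taken from the thread.
GOOGLE_SEARCH = "https://www.google.com/search"


def search_url(query: str) -> str:
    """Return a search URL with the query safely URL-encoded."""
    return f"{GOOGLE_SEARCH}?{urlencode({'q': query})}"


def title_test_urls(domain: str, topic: str) -> dict:
    """Build the two searches used in the title comparison.

    site_query  : lists PDFs on the domain via the site: operator
    topic_query : searches for the document's subject by keyword
    """
    return {
        "site_query": search_url(f".pdf site:{domain}"),
        "topic_query": search_url(topic),
    }


if __name__ == "__main__":
    urls = title_test_urls(
        "centerforhealthysex.com", "alexandra katehakis biography"
    )
    for label, url in urls.items():
        print(f"{label}: {url}")
```

Open both URLs and compare the title shown for the same PDF in each result set. If the titles differ, Google is choosing the title per query.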
-
-
Yes, I've seen it with web pages, but this is my first experience with PDFs. Anyone else seeing this?
-
Google reserves the right to change titles to whatever it feels is most appropriate for the user. A PDF document online is similar to a web page in that regard.