How long does it take for a customized Google Site Search to show results from PDF files?
-
The site in question is http://www.ejmh.eu
I am pretty dissatisfied with the results I am getting from the Site Search provided by Google.
We have over 160 PDF files in this subfolder: http://www.ejmh.eu/mellekletek
The files are the digital versions of articles. When I search for content in those PDF files, Google does not show results. It does show results from older pages, dating back 1-2 years, but it is not showing anything from the PDF files that I put up 3 weeks ago.
My questions:
If I place a Google Search on a site, does it not automatically display results from ALL the content in the root domain?
Is there any correlation between how the Site Search is indexing the files and how Google is indexing the urls in general?
Should I just wait and see whether Site Search performance improves, or should I switch to another search tool such as Zoom Search?
It is vital to have a proper, high-quality search functioning on that site in the very near future.
What are your experiences? Any tips are greatly appreciated.
-
Hi, everyone: problem solved.
Here is what I did: I created a separate sitemap XML file and linked to all the new PDFs in it.
I updated the general sitemap.xml and linked to the new sitemap as well.
I (re)submitted both sitemaps via Webmaster Tools.
Within a few hours, most of the PDFs got indexed and the overall quality of search improved dramatically. Thanks for all your help.
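For anyone wanting to script the first step, here is a minimal sketch of generating such a PDF sitemap in Python. It is not the script I used; the function name and the file names in the usage note are made up for illustration, and it assumes all PDFs sit directly under one base URL.

```python
import xml.etree.ElementTree as ET
from urllib.parse import quote

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_pdf_sitemap(base_url, pdf_names):
    """Return sitemap XML (as a string) with one <url> entry per PDF.

    base_url and pdf_names are illustrative inputs: the folder URL and
    the bare file names of the PDFs stored in it.
    """
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for name in sorted(pdf_names):
        url = ET.SubElement(urlset, "url")
        # Percent-encode the file name so spaces etc. give a valid URL.
        ET.SubElement(url, "loc").text = base_url.rstrip("/") + "/" + quote(name)
    body = ET.tostring(urlset, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body
```

Calling `build_pdf_sitemap("http://www.ejmh.eu/mellekletek", ["a.pdf", "b.pdf"])` (hypothetical file names) returns a string you can save as a separate sitemap file and submit alongside the main one.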
-
It may be a good idea to include all the PDF files in the sitemap, even if it is a troublesome process.
Otherwise it just takes too long for Google to index them.
What still surprises me is that even for a site search, you need to win the 'indexing battle'. I thought that Google indexed everything in the sitemap for the sake of the site search and displayed the results when a visitor searched within the site. Less fancy software actually does that job. I thought Google Site Search would provide something even better.
-
Last crawl: thanks, great info.
Yes, all the new PDFs are linked from the HTML files.
This is the summary page of one article: http://www.ejmh.eu/5archives_ppr_jaggle_061.html
In the middle of the page, you see 'download full text'; this is where the individual papers (PDFs) are linked from.
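If it helps to verify this across many summary pages, below is a rough sketch that lists the PDF links found in a page's HTML, using only the Python standard library. The class and function names are made up for illustration, and it assumes the PDFs are linked through ordinary `<a href="...">` tags whose path ends in `.pdf`.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class PdfLinkCollector(HTMLParser):
    """Collects absolute URLs of <a href> links that point to .pdf files."""

    def __init__(self, page_url):
        super().__init__()
        self.page_url = page_url
        self.pdf_links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        # Ignore any query string when checking the file extension.
        if href and href.split("?")[0].lower().endswith(".pdf"):
            # Resolve relative links against the page they appear on.
            self.pdf_links.append(urljoin(self.page_url, href))


def find_pdf_links(page_url, html_text):
    """Return the PDF URLs linked from html_text, in document order."""
    collector = PdfLinkCollector(page_url)
    collector.feed(html_text)
    return collector.pdf_links
```

Feeding it the HTML of each summary page gives a list of PDF URLs that can then be compared against what is in the sitemap or what shows up as indexed.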
-
Do you have the new PDFs linked from pages in the same way as the old ones?
Try creating a page that lists all the new PDFs. Google may simply need time to recrawl your site and add these new PDFs (by the way, the last copy saved in Google Cache is from Feb 11).
-
You are great, thanks for your time. Yes, I did check things out with this Google command: there are PDFs listed, but these are all old PDFs that I put up a long time ago. None of the PDFs I have put up recently are among those indexed.
Do you think that only those URLs that are indexed by Google come up through a customized site search? Does Google not crawl the site and make a list of URLs purely for the sake of the site search? (Zoom Search does this, for example.) In theory, there could be two different types of 'crawl': one for the site search and one for the general web search.
As for the settings, can you please help me further: what exactly would you change?
-
If you check here, all the PDFs are indexed in Google,
so I would check the settings on the CSE.
Reference: http://www.google.com/cse/docs/resultsxml.html#wsQueryTerms
-
Thanks for the tip, it's a good one. But they are all 100% text.
-
If a search engine cannot read the text because it is a graphic rather than actual text, it won't be able to fully index the words in the document.
So make sure all your PDFs are 100% text that was converted to PDF, and not a scan (image) of the original document that was saved as a PDF.