How long does it take for customized Google Site Search to show results from pdf files?
-
The site in question is http://www.ejmh.eu
I am pretty unsatisfied with the results I am getting from the Site Search provided by Google.
We have over 160 pdf files in this subfolder: http://www.ejmh.eu/mellekletek
The files are the digital versions of articles. When I search for content in those PDF files, Google does not show results. It does show results from older pages, dating back 1-2 years, but it shows nothing from the PDF files that I put up 3 weeks ago.
My questions:
If I place a Google Search on a site, does it not automatically display results from ALL the content under the root domain?
Is there any correlation between how the Site Search indexes the files and how Google indexes the URLs in general?
Should I just wait and see whether Site Search performance improves, or should I switch to other search software such as Zoom Search?
It is vital to have a proper, high-quality search functioning on that site in the very near future.
What are your experiences? Any tips are greatly appreciated.
-
Hi, everyone: problem solved.
Here is what I did: I created a separate sitemap XML file and linked to all the new PDFs.
I updated the general sitemap.xml and linked to the new sitemap as well.
I (re)submitted both sitemaps via Webmaster Tools.
Within a few hours, most of the PDFs got indexed and the overall quality of search improved dramatically. Thanks for all your help.
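For anyone who wants to script the first step, here is a minimal sketch of generating a PDF-only sitemap in Python. It is not the exact file used on the site: the function name `build_pdf_sitemap` and the PDF file names are hypothetical, and only the folder URL comes from this thread. The XML namespace is the standard one from the sitemaps.org protocol.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_pdf_sitemap(base_url, pdf_names):
    """Return sitemap XML (as a string) with one <url> entry per PDF."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for name in pdf_names:
        url = ET.SubElement(urlset, "url")
        # Join the folder URL and the file name with exactly one slash.
        ET.SubElement(url, "loc").text = base_url.rstrip("/") + "/" + name
    body = ET.tostring(urlset, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body


# Hypothetical file names, for illustration only.
example_xml = build_pdf_sitemap(
    "http://www.ejmh.eu/mellekletek",
    ["example-article-1.pdf", "example-article-2.pdf"],
)
print(example_xml)
```

The resulting string can be saved as its own file (for example a separate PDF sitemap) and submitted alongside the main sitemap.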
-
It may be a good idea to include all the pdf files on the sitemap, even if it is a troublesome process.
Otherwise it just takes too long for Google to index them.
What still surprises me is that even for a site search, you need to win the 'indexing battle'. I thought that Google indexes everything within the map for the 'sake of the site search' and displays the results when a visitor is searching within the site. Less fancy software actually does the job. I thought Google Site Search would provide something even better.
-
Last crawl - thanks, great info.
Yes, all new PDFs are linked from the HTML files.
This is the summary page of one article: http://www.ejmh.eu/5archives_ppr_jaggle_061.html
In the middle of the page, you see 'download full text' - this is where the individual papers (PDF) are linked from.
-
Do you have the new PDFs linked from pages like the old ones?
Try creating a page that lists all the new PDFs. Google may simply need time to recrawl your site and add these new PDFs (by the way, the last copy saved in Google Cache is from Feb 11).
-
You are great, thanks for your time. Yeah, I did check things out with this Google command: there are PDFs listed, but these are all old PDFs I put up a long time ago. None of the PDFs I have put up recently are among those indexed.
Do you think that a customized site search only returns URLs that are already indexed by Google? Does Google not crawl the site and make a list of URLs purely for the sake of the site search? (Zoom Search does it, for example.) In theory, there could be two different types of 'crawl': one for the site search and one for the larger world, searching in the browser.
As for the settings... can you please help me further: what exactly would you change?
-
If you check here, all the PDFs are indexed in Google,
so I would check the settings on the CSE.
Reference: http://www.google.com/cse/docs/resultsxml.html#wsQueryTerms
-
Thanks for the tip, it's a good one. But they are all 100% text.
-
If a search engine cannot read the text because it is a graphic rather than actual text, it won't be able to fully index the words in the document.
So make sure all your PDFs are 100% text that was converted to PDF, and not a "scan" (image) of the original document that was saved as a PDF.
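A rough way to check this in bulk is sketched below in Python. It is only a heuristic, resting on the assumption that a text-based PDF references at least one font (`/Font`) while an image-only scan usually does not; the function name `looks_text_based` is made up for this example. Scans with an OCR text layer will also pass, and encrypted files may be misjudged, so treat the result as a hint rather than proof.

```python
import re
import zlib

# Matches the raw bytes between "stream" and "endstream" in a PDF file.
STREAM_RE = re.compile(rb"stream\r?\n(.*?)endstream", re.DOTALL)


def looks_text_based(pdf_bytes):
    """Heuristic: True if a font reference is found anywhere in the file."""
    # Uncompressed objects can be searched directly.
    if b"/Font" in pdf_bytes:
        return True
    # Newer PDFs may keep objects inside Flate-compressed streams.
    for match in STREAM_RE.finditer(pdf_bytes):
        try:
            inflated = zlib.decompressobj().decompress(match.group(1))
        except zlib.error:
            continue  # not a Flate stream (for example, JPEG image data)
        if b"/Font" in inflated:
            return True
    return False
```

To use it on a real file, read the file in binary mode and pass the bytes, e.g. `looks_text_based(open(path, "rb").read())`, where `path` is whichever PDF you want to test.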