
Version 12 (modified by heikki, 16 years ago)


Hibernate Search

author: Heikki Doeleman

This page describes GeoNetwork's usage of the Hibernate Search library.


Introduction

Hibernate Search is a library that combines full-text search using Lucene with Hibernate's O/R mapping capabilities. Queries in Hibernate Search are expressed as wrappers around Lucene queries. Hibernate Search appears to offer two principal advantages over using Lucene plus a database directly: (1) Lucene information (about the index, the analyzers to be used, etc.) is expressed using annotations on the domain objects involved; and (2) synchronization (re-indexing) is triggered automatically when Hibernate makes a change to the database.


Directory

A Lucene index is represented by a Directory. We will use a file system directory provider to store the Lucene index persistently, and an in-memory directory provider for unit tests.
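
The choice between the two providers is a configuration matter. A minimal sketch of what that could look like, assuming the property names and provider classes documented for Hibernate Search 3.x (they should be checked against the version we end up using). The class name, helper methods and index path are made up for illustration.

```java
import java.util.Properties;

// Sketch only: builds the Hibernate Search settings for the two directory
// providers. Property names are assumed from the Hibernate Search 3.x docs.
class DirectoryProviderConfig {

    // Persistent index on the file system (normal operation).
    static Properties fileSystemIndex(String indexBase) {
        Properties p = new Properties();
        p.setProperty("hibernate.search.default.directory_provider",
                "org.hibernate.search.store.FSDirectoryProvider");
        p.setProperty("hibernate.search.default.indexBase", indexBase);
        return p;
    }

    // Throw-away in-memory index (unit tests).
    static Properties inMemoryIndex() {
        Properties p = new Properties();
        p.setProperty("hibernate.search.default.directory_provider",
                "org.hibernate.search.store.RAMDirectoryProvider");
        return p;
    }

    public static void main(String[] args) {
        // The path below is a hypothetical placeholder.
        System.out.println(fileSystemIndex("/var/geonetwork/lucene"));
        System.out.println(inMemoryIndex());
    }
}
```

These properties would be merged into the Hibernate configuration, so a test setup only needs to swap one set for the other.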


Analyzer

Which filters should we use in our Analyzer? At a minimum we need: StandardTokenizer, StandardFilter, and LowerCaseFilter.

Will we use a StopFilter and, if so, how do we decide which (language-dependent) stopword list to use?

Do we use an NGramTokenFilter to help fuzzy searches? How is this better than using FuzzyQuery at query time?
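
To make the trade-off concrete: an n-gram filter expands every token into its character n-grams at index time, so the cost is paid in index size, whereas FuzzyQuery does its matching at query time. The sketch below is plain Java, not Lucene code; the class and method names are hypothetical, and it only illustrates the kind of terms such a filter would add to the index.

```java
import java.util.ArrayList;
import java.util.List;

// Illustration only: not the Lucene NGramTokenFilter, just the same idea.
class NGramSketch {

    // Returns all character n-grams of the token with minGram <= n <= maxGram,
    // shortest grams first.
    static List<String> ngrams(String token, int minGram, int maxGram) {
        List<String> grams = new ArrayList<>();
        for (int n = minGram; n <= maxGram; n++) {
            for (int i = 0; i + n <= token.length(); i++) {
                grams.add(token.substring(i, i + n));
            }
        }
        return grams;
    }

    public static void main(String[] args) {
        // prints [la, ak, ke, lak, ake]
        System.out.println(ngrams("lake", 2, 3));
    }
}
```

A four-letter word already yields five extra terms for gram sizes 2 to 3, which gives a feel for how quickly the index grows.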

Do we use an ISOLatin1AccentFilter to abstract over accented characters? (heikki: +1)
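
As an illustration of what abstracting over accents means in practice, here is a minimal sketch using only the JDK (java.text.Normalizer). It is not the Lucene filter itself, and the class and method names are hypothetical. It decomposes each character and drops the combining marks, so that accented and unaccented spellings end up as the same term.

```java
import java.text.Normalizer;

// Illustration only: JDK-based accent folding, not ISOLatin1AccentFilter.
class AccentFoldingSketch {

    // Decomposes characters (NFD) and strips the combining marks.
    static String foldAccents(String s) {
        String decomposed = Normalizer.normalize(s, Normalizer.Form.NFD);
        return decomposed.replaceAll("\\p{M}", "");
    }

    public static void main(String[] args) {
        // "\u00e7" is c with cedilla, "\u00e8" is e with grave accent.
        System.out.println(foldAccents("Fran\u00e7ois")); // prints Francois
        System.out.println(foldAccents("Gen\u00e8ve"));   // prints Geneve
    }
}
```

Characters without a canonical decomposition (for example ß or æ) pass through this sketch unchanged, so it is narrower than a dedicated filter with explicit mappings.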

Do we use a PhoneticFilter? If so how does this work, with different languages and all?

Do we use a SynonymFilter? The language-dependence issue applies here as well.

Do we use a SnowballFilter (stemming)? Again, how will we deal with the different languages?

We should look at the multilingual Lucene use in SwissTopo. François informs me that by next week the code should be in GN's SVN. They also do very interesting work using GeoTools for more complex spatial queries, involving a SpatialFilter in Lucene. We should look carefully at how this work can be useful to our project.


Indexing

It seems straightforward to use an asynchronous thread in Hibernate Search to do the indexing. Will we use that approach?
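
For reference, asynchronous indexing is switched on through configuration rather than code. A minimal sketch, assuming the worker property names documented for Hibernate Search 3.x (to be verified against the version we use). The class name, helper method and the numbers passed in are hypothetical.

```java
import java.util.Properties;

// Sketch only: settings that ask Hibernate Search to index in background threads.
// Property names are assumed from the Hibernate Search 3.x docs.
class AsyncIndexingConfig {

    static Properties asyncWorker(int threads, int maxQueue) {
        Properties p = new Properties();
        // Index updates are handed to a worker instead of running in the caller's thread.
        p.setProperty("hibernate.search.worker.execution", "async");
        // Number of threads in the worker pool.
        p.setProperty("hibernate.search.worker.thread_pool.size", String.valueOf(threads));
        // Upper bound on the queue of pending index work.
        p.setProperty("hibernate.search.worker.buffer_queue.max", String.valueOf(maxQueue));
        return p;
    }

    public static void main(String[] args) {
        // Example values only, not a recommendation.
        System.out.println(asyncWorker(2, 100));
    }
}
```

With asynchronous execution a freshly saved record may not be searchable immediately, which is something to keep in mind when deciding.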

We should use transparent indexing except at application start-up, where we must define some strategy. GeoNetwork's current approach is to check whether a Lucene index is present and, if not, build it.
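
The start-up check described above could be sketched as follows. This is plain JDK code with hypothetical class and method names, not the existing GeoNetwork implementation. It treats any non-empty directory as an existing index, which is a simplifying assumption; a real check would probably ask Lucene whether the directory holds a valid index.

```java
import java.io.File;

// Sketch only: "build the index at start-up if none is present".
class StartupIndexCheck {

    // Assumption: a directory with at least one entry counts as an existing index.
    static boolean indexPresent(File indexDir) {
        String[] entries = indexDir.list(); // null if the path is not a directory
        return entries != null && entries.length > 0;
    }

    // Runs the rebuild only when no index is found; returns true if it ran.
    static boolean ensureIndex(File indexDir, Runnable rebuild) {
        if (indexPresent(indexDir)) {
            return false;
        }
        rebuild.run();
        return true;
    }

    public static void main(String[] args) {
        // The path below is a hypothetical placeholder.
        File dir = new File("/var/geonetwork/lucene");
        boolean rebuilt = ensureIndex(dir, () -> System.out.println("rebuilding index..."));
        System.out.println("rebuilt: " + rebuilt);
    }
}
```

The rebuild is passed in as a Runnable so that the check stays independent of how the full re-index is actually performed.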
