Changes between Version 15 and Version 16 of HibernateSearch
- Timestamp: May 2, 2009, 8:15:17 AM (15 years ago)
HibernateSearch
=== Analyzer ===

Removed (v15, lines 24-38):

What filters should we use in our Analyzer? At least the following are necessary: !StandardTokenizer, !StandardFilter, and !LowerCaseFilter.

Will we use a !StopFilter and if so, how do we decide what (language-dependent) stopwords list to use?

Do we use an NGramTokenFilter to help fuzzy searches? How is this better than using !FuzzyQuery at query time?

Do we use an ISOLatin1AccentFilter to abstract over accented characters? (heikki: +1)

Do we use a !PhoneticFilter? If so, how does this work, with different languages and all?

Do we use a !SynonymFilter? The language-dependent issue is relevant here, again.

Do we use a !SnowballFilter (stemming)? Again, how will we deal with the different languages?

We should look at the [http://trac.osgeo.org/geonetwork/wiki/MultilingualIndexMechanism multi-lingual Lucene use in SwissTopo]. François informs me that by next week the code should be in GN's SVN. They also do very interesting stuff using !GeoTools for more complex spatial queries, involving a !SpatialFilter in Lucene. We should carefully look at how this work is useful to our project.

Added (v16, lines 24-39):

Lucene offers many functions for making searches more precise. This is done by defining tokenizers and filters. The tokenizers and filters that make the most sense have been chosen. Using all filters makes no sense, because there are too many and doing so would hurt performance. Every installation of the Ebrim application can easily be configured with its own set of tokenizers and filters.

These are the tokenizers and filters used:

StandardTokenizer
The StandardTokenizer should support most needs for English (and most European-language) texts. It splits words at punctuation characters and removes punctuation signs, with a couple of exception rules.

StandardFilter
The StandardFilter removes apostrophes and removes the dots in acronyms.

LowerCaseFilter
The LowerCaseFilter changes all characters to lower case.

ISOLatin1AccentFilterFactory
Abstracts over accented characters by replacing them with their unaccented equivalents.

----
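To make the effect of the chain in v16 concrete, here is a rough plain-Java imitation of the four steps applied in order. It does not use Lucene or Hibernate Search. The class and method names (AnalyzerChainSketch, tokenize, standardFilter, lowerCase, foldAccents) are hypothetical, and the rules are simplified assumptions: the tokenizer only knows words and dotted acronyms, the StandardFilter step is assumed to strip a trailing 's, and accent folding is approximated with Unicode decomposition rather than the ISO Latin 1 mapping table.

```java
import java.text.Normalizer;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Simplified stand-in for the analyzer chain; NOT the Lucene implementation.
final class AnalyzerChainSketch {

    // An acronym here means two or more "letter + dot" pairs, e.g. U.S.A.
    private static final Pattern ACRONYM = Pattern.compile("(?:\\p{L}\\.){2,}");

    // A token is an acronym, or a run of letters/digits with inner apostrophes.
    private static final Pattern TOKEN = Pattern.compile(
            "(?:\\p{L}\\.){2,}|[\\p{L}\\p{N}]+(?:'[\\p{L}\\p{N}]+)*");

    // Step 1 (StandardTokenizer-like): split at punctuation and drop it.
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        Matcher m = TOKEN.matcher(text);
        while (m.find()) {
            tokens.add(m.group());
        }
        return tokens;
    }

    // Step 2 (StandardFilter-like): strip a trailing 's and acronym dots.
    static String standardFilter(String token) {
        if (token.endsWith("'s") || token.endsWith("'S")) {
            return token.substring(0, token.length() - 2);
        }
        if (ACRONYM.matcher(token).matches()) {
            return token.replace(".", "");
        }
        return token;
    }

    // Step 3 (LowerCaseFilter-like): locale-independent lower casing.
    static String lowerCase(String token) {
        return token.toLowerCase(Locale.ROOT);
    }

    // Step 4 (accent folding): decompose, then drop combining marks.
    static String foldAccents(String token) {
        String decomposed = Normalizer.normalize(token, Normalizer.Form.NFD);
        return decomposed.replaceAll("\\p{M}+", "");
    }

    // Runs the four steps in the order listed in the wiki text.
    static List<String> analyze(String text) {
        List<String> result = new ArrayList<>();
        for (String token : tokenize(text)) {
            result.add(foldAccents(lowerCase(standardFilter(token))));
        }
        return result;
    }

    public static void main(String[] args) {
        // Unicode escapes keep the source independent of file encoding.
        String text = "The U.S.A. caf\u00e9's menu: Cr\u00e8me Br\u00fbl\u00e9e!";
        System.out.println(analyze(text));
        // prints [the, usa, cafe, menu, creme, brulee]
    }
}
```

The same chain has to be applied to query text as well as to indexed text, otherwise a query for "Café" would not match the indexed term "cafe". In the real application the chain would be declared through the analyzer configuration rather than hand-written like this.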