Changes between Version 15 and Version 16 of HibernateSearch


Timestamp:
May 2, 2009, 8:15:17 AM (15 years ago)
Author:
erikvaningen
Comment:

--

  • HibernateSearch

    v15 v16  
    2222=== Analyzer ===
    2323
    24 What filters should we use in our Analyzer? At a minimum we need !StandardTokenizer, !StandardFilter, and !LowerCaseFilter.
     24Lucene offers many tokenizers and filters for making searches more precise. We have chosen the ones that make the most sense; using every available filter would be pointless, because there are too many and they would hurt performance. Every installation of the Ebrim application can easily be configured with its own set of tokenizers and filters. 
    2525
    26 Will we use a !StopFilter and if so, how do we decide what (language-dependent) stopwords list to use?
     26These are the tokenizers and filters used:
    2727
    28 Do we use an NGramTokenFilter to help fuzzy searches? How is this better than using !FuzzyQuery at query time?
     28StandardTokenizer
     29The StandardTokenizer should cover most needs for texts in English (and most other European languages). It splits
     30words at punctuation characters and removes the punctuation signs, with a couple of exception rules.
    2931
    30 Do we use an ISOLatin1AccentFilter to abstract over accented characters? (heikki: +1)
     32StandardFilter
     33The StandardFilter removes the trailing 's from words and removes dots from acronyms.
    3134
    32 Do we use a !PhoneticFilter? If so how does this work, with different languages and all?
     35LowerCaseFilter
     36The LowerCaseFilter changes all characters to lower case.
    3337
    34 Do we use a !SynonymFilter? The language dependent issue is relevant here, again.
    35 
    36 Do we use a !SnowballFilter (stemming) ? Again, how will we deal with the different languages?
    37 
    38 We should look at the [http://trac.osgeo.org/geonetwork/wiki/MultilingualIndexMechanism multi-lingual Lucene use in SwissTopo]. François informs me that by next week the code should be in GN's SVN. They also do very interesting stuff using !GeoTools for more complex spatial queries, involving a !SpatialFilter in Lucene. We should carefully look at how this work is useful to our project.
     38ISOLatin1AccentFilterFactory
     39Replaces accented characters with their unaccented equivalents, so that accented and unaccented spellings match.
    3940
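To make the combined effect of the chain listed above more concrete, here is a minimal plain-Java sketch of what the four steps do to a piece of text. It does not use Lucene, Solr or Hibernate Search: the class name `AnalyzerChainSketch`, the method `analyze` and both regular expressions are hypothetical and only approximate the real components. In particular, the accent step uses Unicode normalization rather than the ISO Latin 1 mapping table of the real filter, so results can differ for some characters.

```java
import java.text.Normalizer;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical stand-in for the analyzer chain; this is not Lucene code.
class AnalyzerChainSketch {

    // Tokenizer approximation: a token is a run of letters/digits, optionally
    // joined by single apostrophes or dots ("I.B.M" and "cafe's" stay whole).
    private static final Pattern TOKEN =
            Pattern.compile("[\\p{L}\\p{N}]+(?:['.][\\p{L}\\p{N}]+)*");

    // Acronym approximation: single letters separated by dots, e.g. "I.B.M".
    private static final Pattern ACRONYM =
            Pattern.compile("(?:\\p{L}\\.)+\\p{L}");

    static List<String> analyze(String text) {
        List<String> tokens = new ArrayList<>();
        Matcher m = TOKEN.matcher(text);
        // Step 1 (tokenizer): split at punctuation and whitespace.
        while (m.find()) {
            String token = m.group();

            // Step 2 (standard filter): drop a trailing 's and acronym dots.
            if (token.endsWith("'s") || token.endsWith("'S")) {
                token = token.substring(0, token.length() - 2);
            }
            if (ACRONYM.matcher(token).matches()) {
                token = token.replace(".", "");
            }

            // Step 3 (lower-case filter).
            token = token.toLowerCase(Locale.ROOT);

            // Step 4 (accent filter approximation): decompose the characters,
            // then strip the combining marks.
            token = Normalizer.normalize(token, Normalizer.Form.NFD)
                    .replaceAll("\\p{M}", "");

            tokens.add(token);
        }
        return tokens;
    }

    public static void main(String[] args) {
        // Prints [the, ibm, cafe, menu]
        System.out.println(analyze("The I.B.M. caf\u00e9's Menu"));
    }
}
```

Because the same chain runs at index time and at query time, a search for "cafe" or "IBM" would match a document containing "café's" or "I.B.M." under these assumptions.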
    4041----