#476 closed enhancement (fixed)
Mini-codesprint: improvements to GeoNetworkAnalyzer
Reported by: | heikki | Owned by: | heikki |
---|---|---|---|
Priority: | trivial | Milestone: | v2.6.4 |
Component: | General | Version: | v2.6.3 |
Keywords: | lucene, stopwords, analyzer | Cc: |
Description
In 2.6.1 GeoNetworkAnalyzer was introduced instead of StandardAnalyzer because StandardAnalyzer breaks wildcard search.
However, StandardAnalyzer does many things well, such as tokenizing when a word is followed by a bracket or a comma (with no space), recognizing acronyms like U.S.A., and more.
GeoNetworkAnalyzer can benefit from the same features if it uses StandardTokenizer and StandardFilter instead of WhitespaceTokenizer.
Wildcard preservation is now handled in LuceneQueryBuilder: the wildcards themselves are not analyzed (analysis would make them disappear); only the text between wildcards is analyzed.
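A minimal sketch of the wildcard-preservation idea, for illustration only. It is not the LuceneQueryBuilder code: the class and method names are hypothetical, `*` and `?` are assumed to be the wildcard characters, and `analyzeFragment` is a plain-Java stand-in (lowercasing plus diacritic stripping) for running a fragment through the real analyzer.

```java
import java.text.Normalizer;
import java.util.Locale;

public class WildcardPreservingSketch {

    // Hypothetical stand-in for the analyzer: strips diacritics and lowercases.
    static String analyzeFragment(String text) {
        String decomposed = Normalizer.normalize(text, Normalizer.Form.NFD);
        return decomposed.replaceAll("\\p{M}", "").toLowerCase(Locale.ROOT);
    }

    // Copies wildcard characters through untouched and analyzes only
    // the text fragments that sit between them.
    static String analyzePreservingWildcards(String term) {
        StringBuilder out = new StringBuilder();
        StringBuilder fragment = new StringBuilder();
        for (char c : term.toCharArray()) {
            if (c == '*' || c == '?') {
                if (fragment.length() > 0) {
                    out.append(analyzeFragment(fragment.toString()));
                    fragment.setLength(0);
                }
                out.append(c); // the wildcard itself is never analyzed
            } else {
                fragment.append(c);
            }
        }
        if (fragment.length() > 0) {
            out.append(analyzeFragment(fragment.toString()));
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(analyzePreservingWildcards("Fran\u00e7*s"));
    }
}
```

The point of the split is that the search term and the indexed tokens go through the same normalization, while the wildcard characters survive to reach the query parser.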
So GNA now behaves like StandardAnalyzer, with two differences: it adds ASCIIFoldingFilter (which folds accented characters to their ASCII equivalents), and if no stopwords are specified, none are used, whereas StandardAnalyzer falls back to its default English stopwords.
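The behaviour described above can be approximated in plain Java. This is a rough sketch, not the GeoNetworkAnalyzer implementation and not the Lucene API: the class and method names are hypothetical, tokenization is simplified to splitting on anything that is not a letter or digit (so acronym handling such as U.S.A. is not reproduced), and folding is done with `java.text.Normalizer` rather than ASCIIFoldingFilter.

```java
import java.text.Normalizer;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Set;

public class AnalyzerChainSketch {

    // Folds accents, lowercases, splits on punctuation and whitespace,
    // then drops only the stopwords the caller supplied.
    // An empty set means no stopword removal at all.
    static List<String> analyze(String text, Set<String> stopwords) {
        String folded = Normalizer.normalize(text, Normalizer.Form.NFD)
                .replaceAll("\\p{M}", "")
                .toLowerCase(Locale.ROOT);
        List<String> tokens = new ArrayList<>();
        for (String token : folded.split("[^\\p{L}\\p{N}]+")) {
            if (token.isEmpty()) {
                continue; // split() yields a leading empty string if the text starts with a delimiter
            }
            if (!stopwords.contains(token)) {
                tokens.add(token);
            }
        }
        return tokens;
    }
}
```

With an empty stopword set, `analyze("Rivi\u00e8res,(lacs) and the sea", Collections.emptySet())` keeps "and" and "the"; passing a set containing those two words removes them.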
Change History (3)
comment:1 by , 14 years ago
Resolution: | → fixed |
---|---|
Status: | new → closed |
comment:3 by , 14 years ago
Milestone: | v2.7.0 → v2.6.4 |
---|
Committed to trunk, rev. 7489.