Opened 13 years ago

Closed 13 years ago

Last modified 13 years ago

#476 closed enhancement (fixed)

Mini-codesprint : improvements to GeoNetworkAnalyzer

Reported by: heikki Owned by: heikki
Priority: trivial Milestone: v2.6.4
Component: General Version: v2.6.3
Keywords: lucene, stopwords, analyzer Cc:

Description

In 2.6.1, GeoNetworkAnalyzer was introduced as a replacement for StandardAnalyzer because StandardAnalyzer breaks wildcard search.

However, StandardAnalyzer does many things quite nicely, such as splitting tokens when a word is followed by a bracket or a comma with no space, recognizing acronyms like U.S.A., and more.

GeoNetworkAnalyzer can benefit from the same features if it uses StandardTokenizer and StandardFilter instead of WhitespaceTokenizer.

Wildcard preservation is now handled in LuceneQueryBuilder: the wildcards themselves are not analyzed (which would make them disappear); only the text between wildcards is analyzed.
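
The idea of analyzing only the text between wildcards can be sketched as follows. This is a minimal illustration, not the actual LuceneQueryBuilder code: the class name WildcardPreservingSketch, the method analyzePreservingWildcards, and the use of a plain function as a stand-in for the analyzer are all assumptions made for the example, and it uses no Lucene classes.

```java
import java.util.Locale;
import java.util.function.UnaryOperator;

// Hypothetical sketch; names are illustrative and not taken from GeoNetwork.
final class WildcardPreservingSketch {

    /**
     * Applies analyzeText to each run of non-wildcard characters and
     * copies the wildcards '*' and '?' through unchanged.
     */
    static String analyzePreservingWildcards(String term, UnaryOperator<String> analyzeText) {
        StringBuilder result = new StringBuilder();
        StringBuilder chunk = new StringBuilder();
        for (char c : term.toCharArray()) {
            if (c == '*' || c == '?') {
                // Flush the text collected so far through the analyzer.
                if (chunk.length() > 0) {
                    result.append(analyzeText.apply(chunk.toString()));
                    chunk.setLength(0);
                }
                result.append(c); // the wildcard itself is never analyzed
            } else {
                chunk.append(c);
            }
        }
        if (chunk.length() > 0) {
            result.append(analyzeText.apply(chunk.toString()));
        }
        return result.toString();
    }

    public static void main(String[] args) {
        // Stand-in "analyzer" (assumption): lower-casing only.
        UnaryOperator<String> lower = s -> s.toLowerCase(Locale.ROOT);
        System.out.println(analyzePreservingWildcards("Hydro*Graph?", lower)); // hydro*graph?
    }
}
```

In a real setup the stand-in function would be replaced by a call that runs the text through the configured analyzer; how that is wired up in LuceneQueryBuilder is not shown in this ticket.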

So basically GeoNetworkAnalyzer (GNA) is now exactly like StandardAnalyzer, with two differences: it adds ASCIIFoldingFilter, which folds accented characters to their unaccented ASCII equivalents, and if no stopwords are specified, none are used, unlike StandardAnalyzer, which uses the default English stopwords.
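
The two differences can be illustrated with a small stand-alone sketch. It is not the GeoNetworkAnalyzer source and uses no Lucene classes: the class FoldingAndStopwordsSketch and its methods are hypothetical, and the folding shown here (Unicode decomposition followed by removal of combining marks) only approximates ASCIIFoldingFilter, which also handles characters that have no decomposition.

```java
import java.text.Normalizer;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Hypothetical sketch; names are illustrative and not taken from GeoNetwork.
final class FoldingAndStopwordsSketch {

    /** Approximates accent folding: decompose, then drop combining marks. */
    static String foldAccents(String token) {
        String decomposed = Normalizer.normalize(token, Normalizer.Form.NFD);
        return decomposed.replaceAll("\\p{M}+", "");
    }

    /**
     * Lower-cases and folds each token, then removes stopwords.
     * A null stopword set means "no stopwords": nothing is removed,
     * and no default English list is substituted.
     * Stopwords are expected in lower-case, unaccented form.
     */
    static List<String> filter(List<String> tokens, Set<String> stopwords) {
        Set<String> effective = (stopwords == null) ? Collections.<String>emptySet() : stopwords;
        List<String> out = new ArrayList<>();
        for (String token : tokens) {
            String normalized = foldAccents(token.toLowerCase(Locale.ROOT));
            if (!effective.contains(normalized)) {
                out.add(normalized);
            }
        }
        return out;
    }
}
```

With this sketch, calling filter with a null stopword set on the tokens "The" and "Forêt" keeps both tokens, as "the" and "foret"; passing a set containing "the" removes the first one.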

Change History (3)

comment:1 by heikki, 13 years ago

Resolution: fixed
Status: new → closed

Committed to trunk, rev. 7489.

comment:2 by heikki, 13 years ago

Committed to 2.6.x, rev. 7490.

comment:3 by heikki, 13 years ago

Milestone: v2.7.0 → v2.6.4