#476 closed enhancement (fixed)

Mini-codesprint : improvements to GeoNetworkAnalyzer

Reported by:	heikki	Owned by:	heikki
Priority:	trivial	Milestone:	v2.6.4
Component:	General	Version:	v2.6.3
Keywords:	lucene, stopwords, analyzer	Cc:

Description

In 2.6.1 GeoNetworkAnalyzer was introduced instead of StandardAnalyzer because StandardAnalyzer breaks wildcard search.

However, StandardAnalyzer does many things quite nicely, among them tokenizing when a word is followed by a bracket or a comma (with no space) and recognizing acronyms like U.S.A.

GeoNetworkAnalyzer can get the same benefits if it uses StandardTokenizer and StandardFilter instead of WhitespaceTokenizer.

Wildcard preservation is now handled in LuceneQueryBuilder: the wildcards themselves are not analyzed (which would make them disappear); only the text between wildcards is analyzed.
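
To illustrate the idea only, here is a minimal sketch in plain Java that does not use Lucene and is not the code from LuceneQueryBuilder. The class name, both method names, and the stand-in analysis step (lowercasing plus stripping diacritics) are assumptions made for this example. It also assumes each fragment between wildcards yields a single token, which a real analyzer does not guarantee.

```java
import java.text.Normalizer;
import java.util.Locale;

public class WildcardPreservingSketch {

    // Hypothetical stand-in for running a text fragment through the analyzer:
    // decomposes accented characters, drops the combining marks, lowercases.
    static String analyzeFragment(String text) {
        String decomposed = Normalizer.normalize(text, Normalizer.Form.NFD);
        return decomposed.replaceAll("\\p{M}+", "").toLowerCase(Locale.ROOT);
    }

    // Analyzes only the text between wildcards; '*' and '?' are copied
    // through untouched so the analysis step cannot remove them.
    static String analyzePreservingWildcards(String queryText) {
        StringBuilder result = new StringBuilder();
        StringBuilder fragment = new StringBuilder();
        for (char c : queryText.toCharArray()) {
            if (c == '*' || c == '?') {
                result.append(analyzeFragment(fragment.toString()));
                fragment.setLength(0);
                result.append(c);
            } else {
                fragment.append(c);
            }
        }
        // Flush whatever follows the last wildcard (or the whole input
        // when it contains no wildcard at all).
        result.append(analyzeFragment(fragment.toString()));
        return result.toString();
    }

    public static void main(String[] args) {
        // \u00c8 is the letter E with a grave accent.
        System.out.println(analyzePreservingWildcards("Rivi*\u00c8re?"));
    }
}
```

Splitting before analysis means the analysis step never sees a wildcard character, so it does not matter how the tokenizer would have treated it.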

So GNA is now essentially StandardAnalyzer with two differences: it adds ASCIIFoldingFilter (which folds accented characters to their ASCII equivalents), and if no stopwords are specified, none are used, unlike StandardAnalyzer, which falls back to default English stopwords.
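
The two differences can be sketched in plain Java without Lucene. This is a rough approximation for illustration, not the GeoNetworkAnalyzer source: the class and method names are hypothetical, the tokenizer is a simple split on non-alphanumeric characters (so it does not reproduce StandardTokenizer's acronym handling), and accent folding is approximated with java.text.Normalizer rather than ASCIIFoldingFilter.

```java
import java.text.Normalizer;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Set;

public class AnalyzerChainSketch {

    // Tokenizes, folds accents, lowercases, and removes stopwords.
    // A null or empty stopword set means no stopword filtering at all;
    // there is deliberately no fallback to a default English list.
    static List<String> analyze(String text, Set<String> stopwords) {
        List<String> tokens = new ArrayList<>();
        // Approximate accent folding: decompose, then drop combining marks.
        String folded = Normalizer.normalize(text, Normalizer.Form.NFD)
                                  .replaceAll("\\p{M}+", "");
        // Split on any run of characters that are neither letters nor digits,
        // so "(Geneve),lake" separates even without spaces.
        for (String raw : folded.split("[^\\p{L}\\p{N}]+")) {
            if (raw.isEmpty()) {
                continue; // leading delimiter produces an empty first piece
            }
            String token = raw.toLowerCase(Locale.ROOT);
            if (stopwords != null && stopwords.contains(token)) {
                continue;
            }
            tokens.add(token);
        }
        return tokens;
    }
}
```

Calling `analyze` with a null stopword set keeps words such as "the" in the output, while passing a set that contains "the" drops them.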

Change History (3)

comment:1 by heikki, 13 years ago

Resolution:	→ fixed
Status:	new → closed

Committed to trunk, rev. 7489.

comment:2 by heikki, 13 years ago

Committed to 2.6.x, rev. 7490.

comment:3 by heikki, 13 years ago

Milestone:	v2.7.0 → v2.6.4
