Opened 12 years ago

#1185 new enhancement

Lucene / Add Synonym support

Reported by: fxp
Owned by: geonetwork-devel@…
Priority: major
Milestone: Future release
Component: General
Version: v2.8.0RC2
Keywords:
Cc:

Description

To support synonyms in search, the following is proposed:

A new analyzer is introduced, built on the following analyzer chain:

  • KeywordTokenizer
  • SynonymFilter
  • TypeTokenFilter (optional)

Analyzer parameters:

  • file: the synonym rule file (in the SolrSynonymParser format)
  • ignoreCase: whether rules are matched case-insensitively
  • keepOnlyMatchingTerms: keep the field value only if at least one match is found

Synonym rule example:

Lamprea marina, Lamproie marine, Sea lamprey, Petromyzon marinus => Petromyzon marinus
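
To make the rule format concrete, here is a minimal Python sketch of how one such line could be read into a lookup table. It is not the SolrSynonymParser implementation; the function name parse_rule and its ignore_case argument are hypothetical, and it only handles the two forms used in this ticket (an explicit "=>" mapping, and a plain comma-separated list of equivalent phrases).

```python
def parse_rule(line, ignore_case=False):
    """Parse one synonym rule line into {input phrase: [output phrases]}.

    Hypothetical helper for illustration only; it does not handle
    comments or escaping.
    """
    def split(part):
        # Split on commas, trim whitespace, optionally lowercase.
        phrases = [p.strip() for p in part.split(",") if p.strip()]
        return [p.lower() for p in phrases] if ignore_case else phrases

    if "=>" in line:
        # Explicit mapping: every phrase on the left maps to the right side.
        left, right = line.split("=>", 1)
        outputs = split(right)
        return {src: list(outputs) for src in split(left)}

    # No arrow: all phrases are treated as equivalent to each other.
    phrases = split(line)
    return {src: list(phrases) for src in phrases}


rule = ("Lamprea marina, Lamproie marine, Sea lamprey, Petromyzon marinus"
        " => Petromyzon marinus")
mapping = parse_rule(rule)
print(mapping["Lamproie marine"])
```

Because the chain starts with KeywordTokenizer, the whole field value is one token, so a lookup keyed on the full phrase is enough for this sketch.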

Field indexing is done as usual, in index-fields.xsl:

<Field name="faoFish" string="{string(.)}" store="false" index="true"/>

The field analyzer is configured in lucene-config.xml:

<Field name="faoFish" analyzer="org.fao.geonet.kernel.search.analysis.SynonymAnalyzer">
      <Param name="file" type="java.io.File" value="/tmp/msfd/faoFish.txt"/>
      <Param name="ignoreCase" type="boolean" value="false"/>
      <Param name="keepOnlyMatchingTerms" type="boolean" value="true"/>
</Field>

By default, matching synonyms are added to the index terms for the field. They are not stored as the field value. For example, a metadata record with “Lamproie marine” will have:

  • field value: Lamproie marine
  • field terms: Lamproie marine, Petromyzon marinus (with the rule above)
  • field terms: Lamprea marina, Lamproie marine, Sea lamprey, Petromyzon marinus, Petromyzon marinus (with the rule “Lamprea marina, Lamproie marine, Sea lamprey, Petromyzon marinus, Petromyzon marinus” instead)
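
The difference between the field value and the field terms can be sketched as follows. This is a rough Python model of the proposed behaviour, not the SynonymAnalyzer code: index_terms and keep_only_matching_terms are hypothetical names, the synonym table is written out by hand, and dropping duplicate terms is an assumption of the sketch.

```python
# Hand-written table equivalent to the "=>" rule above (illustration only).
SYNONYMS = {
    "Lamprea marina": ["Petromyzon marinus"],
    "Lamproie marine": ["Petromyzon marinus"],
    "Sea lamprey": ["Petromyzon marinus"],
    "Petromyzon marinus": ["Petromyzon marinus"],
}


def index_terms(value, synonyms, keep_only_matching_terms=True):
    """Return the terms that would be indexed for one field value."""
    # KeywordTokenizer step: the whole value is treated as a single token.
    matches = synonyms.get(value)
    if matches is None:
        # No rule matched: drop the value or keep it as the only term.
        return [] if keep_only_matching_terms else [value]
    # Original term first, then each synonym not already present.
    terms = [value]
    for synonym in matches:
        if synonym not in terms:
            terms.append(synonym)
    return terms


print(index_terms("Lamproie marine", SYNONYMS))
print(index_terms("Salmo salar", SYNONYMS))
```

Under these assumptions the stored value stays “Lamproie marine” while the searchable terms also include “Petromyzon marinus”, and a value with no matching rule yields no terms when keepOnlyMatchingTerms is true.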

For faceted search, it may be useful to create a dedicated field for the synonym. In that case, a new field is created containing only the matching synonym(s). E.g., for faoFish, an extra field faoFishSyn is added to the index to store only the matching synonym.
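
As a rough illustration of the dedicated facet field, the Python sketch below builds the fields for one record. The function build_fields is hypothetical, and the "Syn" suffix simply follows the faoFishSyn example; how the real indexing code would wire this in is not specified here.

```python
def build_fields(value, synonyms, field="faoFish"):
    """Return {field name: [values]} for one record (illustration only)."""
    # The original field always carries the value found in the metadata.
    fields = {field: [value]}
    matches = synonyms.get(value, [])
    if matches:
        # Extra field holding only the matching synonym(s), for faceting.
        fields[field + "Syn"] = list(matches)
    return fields


synonyms = {"Lamproie marine": ["Petromyzon marinus"]}
print(build_fields("Lamproie marine", synonyms))
print(build_fields("Salmo salar", synonyms))
```

With this layout, a facet on faoFishSyn groups records under the preferred name regardless of which variant the metadata used.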

It may be relevant to fix:

  • #900, to be able to define synonyms per language
  • #1184
  • #1183, because a large SynonymMap takes time to initialize, and this is done for 30 PFA.
