Opened 12 years ago
#1185 new enhancement
Lucene / Add Synonym support
Reported by: | fxp | Owned by: | |
---|---|---|---|
Priority: | major | Milestone: | Future release |
Component: | General | Version: | v2.8.0RC2 |
Keywords: | | Cc: | |
Description
To support synonyms in search, the following is proposed:
A new analyzer is introduced based on the following analyzer chain:
- KeywordTokenizer
- SynonymFilter
- TypeTokenFilter (optional)
Analyzer parameters:
- file: the synonym rule file (in the format read by SolrSynonymParser)
- ignoreCase: whether synonym matching ignores case
- keepOnlyMatchingTerms: keep the field value only if at least one synonym matches
Synonym rule example:
Lamprea marina, Lamproie marine, Sea lamprey, Petromyzon marinus => Petromyzon marinus
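To make the rule format concrete, here is a minimal Python sketch of how such a line could be read. It is not the SolrSynonymParser implementation: the function name `parse_synonym_rule` is hypothetical, and it only handles the two basic forms of the Solr synonym format (an explicit mapping with `=>`, and a comma-separated equivalence list), ignoring comments and escaped commas.

```python
def parse_synonym_rule(line, ignore_case=False):
    """Parse one rule line into a {input term: [output terms]} mapping.

    Hypothetical helper for illustration only; simplified compared to
    the real Solr synonym format (no comments, no escaped commas).
    """
    def split_terms(part):
        terms = [t.strip() for t in part.split(",") if t.strip()]
        # Simplification: with ignore_case, both sides are lowercased.
        return [t.lower() for t in terms] if ignore_case else terms

    if "=>" in line:
        # Explicit mapping: every term on the left maps to the right side.
        left, right = line.split("=>", 1)
        inputs, outputs = split_terms(left), split_terms(right)
    else:
        # Equivalence list: every term maps to all terms in the list.
        inputs = outputs = split_terms(line)
    return {term: list(outputs) for term in inputs}


rule = ("Lamprea marina, Lamproie marine, Sea lamprey, "
        "Petromyzon marinus => Petromyzon marinus")
mapping = parse_synonym_rule(rule)
print(mapping["Lamproie marine"])
```

With the rule above, each of the four names on the left maps to the single term on the right.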
Fields are indexed as usual, in index-fields.xsl:
<Field name="faoFish" string="{string(.)}" store="false" index="true"/>
Field analyzer configuration in lucene-config.xml:
<Field name="faoFish" analyzer="org.fao.geonet.kernel.search.analysis.SynonymAnalyzer">
  <Param name="file" type="java.io.File" value="/tmp/msfd/faoFish.txt"/>
  <Param name="ignoreCase" type="boolean" value="false"/>
  <Param name="keepOnlyMatchingTerms" type="boolean" value="true"/>
</Field>
By default, matching synonyms are added to the index terms for the field. They are not stored as the field value. For example, a metadata record containing “Lamproie marine” will have:
- field value: Lamproie marine
- field terms: Lamproie marine, Petromyzon marinus (with the rule above)
- field terms: Lamprea marina, Lamproie marine, Sea lamprey, Petromyzon marinus, Petromyzon marinus (with “Lamprea marina, Lamproie marine, Sea lamprey, Petromyzon marinus, Petromyzon marinus” rule)
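The first case above can be sketched in Python as follows. This only models the behaviour described in this ticket; it is not the Lucene code. The function `analyze` and its parameters are hypothetical names. It assumes the whole field value is one token (as KeywordTokenizer produces), that the original term is kept next to its synonyms, and that keepOnlyMatchingTerms drops values with no match.

```python
def analyze(value, synonyms, ignore_case=False, keep_only_matching_terms=False):
    """Return the index terms for one field value (illustrative model).

    `synonyms` maps an input term to its list of output terms; when
    ignore_case is set, its keys are assumed to be lowercased already.
    """
    # KeywordTokenizer-like step: the whole value is a single token.
    key = value.lower() if ignore_case else value
    matches = synonyms.get(key, [])
    if not matches:
        # Assumed reading of keepOnlyMatchingTerms: drop unmatched values.
        return [] if keep_only_matching_terms else [value]
    # SynonymFilter-like step: keep the original term, add its synonyms.
    terms = [value]
    for term in matches:
        if term not in terms:
            terms.append(term)
    return terms


names = ("Lamprea marina", "Lamproie marine", "Sea lamprey", "Petromyzon marinus")
synonyms = {name: ["Petromyzon marinus"] for name in names}

print(analyze("Lamproie marine", synonyms))
print(analyze("Salmo salar", synonyms, keep_only_matching_terms=True))
```

The first call yields the two terms listed for the rule above; the second yields no terms, because “Salmo salar” matches no rule. Note that this sketch removes duplicate terms, which the real filter chain may not do.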
For faceted search, it may be useful to create a dedicated field for the synonym. In that case, a new field is created containing the matching synonym(s). E.g. for faoFish, an extra field faoFishSyn is added to the index in order to store only the matching synonym.
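As a rough illustration of the dedicated facet field, the Python sketch below builds the entries for one value. The function `index_fields` is hypothetical, and generalising the `Syn` suffix from the faoFishSyn example to other fields is an assumption; how the extra field would actually be produced in the indexing chain is not specified here.

```python
def index_fields(field, value, synonyms):
    """Build index entries for `field` plus a `<field>Syn` companion.

    Illustrative only: the companion field holds just the matching
    synonym(s) and is omitted when nothing matches.
    """
    fields = {field: [value]}
    matches = synonyms.get(value, [])
    if matches:
        # Assumed naming convention, taken from the faoFishSyn example.
        fields[field + "Syn"] = list(matches)
    return fields


synonyms = {"Lamproie marine": ["Petromyzon marinus"]}
print(index_fields("faoFish", "Lamproie marine", synonyms))
```

A facet built on faoFishSyn would then group records under the normalised name, regardless of which variant each record used.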
It may be relevant to fix:
- #900 to be able to define synonym per language
- #1184
- #1183, because large SynonymMaps will take time to initialize for 30 PFA.