
Version 7 (modified by fxp, 16 years ago)


Multilingual editing

Date 2008/08/13
Contact(s) fxprunayre
Last edited Timestamp
Status draft
Assigned to release to be determined
Resources Resource available

Overview

Add support for multilingual metadata. For ISO-based standards, every gco:CharacterString element and its translations can be stored in GeoNetwork. Multilingual metadata is supported both in view mode and in the editor.

Proposal Type

  • Type: GUI Change, Module Change
  • App: GeoNetwork
  • Module: SchemaLoader, Data Manager, Metadata Import, Lucene Index, Search Interface
  • Documents:
  • Email discussions:
  • Other wiki discussions:

Voting History

  • Vote proposed by X on Y, result was +/-n (m non-voting members).

Motivations

Proposal

Allow multilingual editing in GeoNetwork. Each metadata record defines:

  • one main language (using gmd:language element)
  • n other languages (using gmd:locale element)

In view mode, the language displayed depends on the GUI language:

  • if the GUI language is available in the metadata, the element is displayed in that language
  • otherwise, the element is displayed in the metadata's default language.

Multilingual content then needs to be indexed in Lucene.

Backwards Compatibility Issues

  • Only ISO-compliant metadata is supported

Implementation

A metadata record is defined by one main language (gmd:MD_Metadata/gmd:language/gco:CharacterString) and any number of other locales (gmd:MD_Metadata/gmd:locale/*).

If the user adds a new translation, a new gmd:locale element has to be added:

<gmd:locale>
  <gmd:PT_Locale id="FR">
    <gmd:languageCode><gmd:LanguageCode codeList="#LanguageCode" codeListValue="fra">French</gmd:LanguageCode></gmd:languageCode>			
    <gmd:characterEncoding><gmd:MD_CharacterSetCode codeList="#MD_CharacterSetCode" codeListValue="utf8">UTF 8</gmd:MD_CharacterSetCode></gmd:characterEncoding>
  </gmd:PT_Locale>
</gmd:locale>

After editing, the new translated element is stored with:

  • an xsi:type="gmd:PT_FreeText_PropertyType" attribute
  • a first gco:CharacterString in the metadata's main language (English in the example)
  • n gmd:PT_FreeText elements, one per translation
    <scope xsi:type="gmd:PT_FreeText_PropertyType">
      <gco:CharacterString>Codelists for description of metadata datasets compliant with ISO/TC 211 19115:2003 and 19139</gco:CharacterString>
      <gmd:PT_FreeText>
        <gmd:textGroup>
          <gmd:LocalisedCharacterString locale="#FR">Listes de codes pour la description de lots de métadonnées conforme ISO TC/211 19115:2003 et 19139</gmd:LocalisedCharacterString>
        </gmd:textGroup>
      </gmd:PT_FreeText>
    </scope>
    

In the editor, any gco:CharacterString can be multilingual. The gco:CharacterString holds the text in the default language; each translation stored in a PT_FreeText element has to use a locale declared in the metadata document.
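The view-mode rule (show the GUI language if present, otherwise the default language) can be sketched against this storage format. The following Python example is only an illustration, not GeoNetwork code: the function name display_text, the shortened sample texts, and the use of gmd:title as the container element are assumptions, and the mapping from a GUI language to a locale id such as "FR" is assumed to happen elsewhere.

```python
import xml.etree.ElementTree as ET

NS = {
    "gmd": "http://www.isotc211.org/2005/gmd",
    "gco": "http://www.isotc211.org/2005/gco",
}

# Shortened, hypothetical element following the storage format described above.
SAMPLE = """
<gmd:title xmlns:gmd="http://www.isotc211.org/2005/gmd"
           xmlns:gco="http://www.isotc211.org/2005/gco"
           xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
           xsi:type="gmd:PT_FreeText_PropertyType">
  <gco:CharacterString>Codelists</gco:CharacterString>
  <gmd:PT_FreeText>
    <gmd:textGroup>
      <gmd:LocalisedCharacterString locale="#FR">Listes de codes</gmd:LocalisedCharacterString>
    </gmd:textGroup>
  </gmd:PT_FreeText>
</gmd:title>
"""


def display_text(element, gui_locale_id):
    """Return the translation for gui_locale_id, else the default-language text."""
    wanted = "#" + gui_locale_id
    for node in element.iterfind(".//gmd:LocalisedCharacterString", NS):
        # strip() tolerates stray whitespace in the locale reference
        if node.get("locale", "").strip() == wanted:
            return node.text
    # Fallback: the gco:CharacterString carries the default language
    default = element.find("gco:CharacterString", NS)
    return default.text if default is not None else None


element = ET.fromstring(SAMPLE)
print(display_text(element, "FR"))  # translation exists for this locale
print(display_text(element, "DE"))  # no German translation: default text is used
```

The fallback relies only on the gco:CharacterString always being present, so records without any translation need no special handling.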

Index

One index is created per language, and a language-specific analyzer can be defined for each (e.g. FrenchAnalyzer or GermanAnalyzer, both provided by Lucene). The Lucene indexes are stored in the WEB-INF/lucene directory:

lucene
 +-- nonspatial_eng
 +-- nonspatial_fra
 +-- nonspatial_deu

Metadata is indexed in its default language using index-fields.xsl, and multilingual content is indexed using language-index-fields.xsl, which extracts all the fragments to be stored in each language's index.

More details on the indexing mechanism: MultilingualIndexMechanism.
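To make the routing of text to per-language indexes concrete, here is a rough Python sketch. In the proposal this extraction is done by the XSL stylesheets, so the code below only mimics the idea: the function name fragments_by_index is hypothetical, the sample record is deliberately simplified (gmd:title sits directly under gmd:MD_Metadata, which a real ISO record would not allow), and the "nonspatial_" prefix is taken from the directory listing above.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

GMD = "http://www.isotc211.org/2005/gmd"
GCO = "http://www.isotc211.org/2005/gco"
NS = {"gmd": GMD, "gco": GCO}

# Simplified, hypothetical record: one main language, one extra locale.
RECORD = """
<gmd:MD_Metadata xmlns:gmd="http://www.isotc211.org/2005/gmd"
                 xmlns:gco="http://www.isotc211.org/2005/gco">
  <gmd:language><gco:CharacterString>eng</gco:CharacterString></gmd:language>
  <gmd:locale>
    <gmd:PT_Locale id="FR">
      <gmd:languageCode>
        <gmd:LanguageCode codeList="#LanguageCode" codeListValue="fra">French</gmd:LanguageCode>
      </gmd:languageCode>
    </gmd:PT_Locale>
  </gmd:locale>
  <gmd:title>
    <gco:CharacterString>Codelists</gco:CharacterString>
    <gmd:PT_FreeText>
      <gmd:textGroup>
        <gmd:LocalisedCharacterString locale="#FR">Listes de codes</gmd:LocalisedCharacterString>
      </gmd:textGroup>
    </gmd:PT_FreeText>
  </gmd:title>
</gmd:MD_Metadata>
"""


def fragments_by_index(record):
    """Group text fragments by the name of the language index they belong to."""
    # Map each declared locale id (e.g. "FR") to its language code (e.g. "fra").
    locales = {}
    for loc in record.iterfind("gmd:locale/gmd:PT_Locale", NS):
        code = loc.find("gmd:languageCode/gmd:LanguageCode", NS)
        if code is not None:
            locales[loc.get("id")] = code.get("codeListValue")

    main = record.findtext("gmd:language/gco:CharacterString", namespaces=NS)
    skipped = {f"{{{GMD}}}language", f"{{{GMD}}}locale"}  # declarations, not content
    out = defaultdict(list)
    for child in record:
        if child.tag in skipped:
            continue
        for node in child.iter():
            if node.tag == f"{{{GCO}}}CharacterString":
                out["nonspatial_" + main].append(node.text)
            elif node.tag == f"{{{GMD}}}LocalisedCharacterString":
                lang = locales.get(node.get("locale", "").strip().lstrip("#"))
                if lang:  # translations with an undeclared locale are ignored
                    out["nonspatial_" + lang].append(node.text)
    return dict(out)


print(fragments_by_index(ET.fromstring(RECORD)))
```

Resolving the locale id through the gmd:locale declarations is what ties a "#FR" fragment to the nonspatial_fra index; a translation whose locale is not declared has no index to go to, which is why the editor must only use declared locales.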

Search is done using a MultiSearcher (i.e. across all indexes), and the index corresponding to the GUI language is boosted so that its hits rank first. A duplicate filter is applied to the search results, because one record can appear in more than one index.
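The combination of boosting and duplicate filtering can be illustrated independently of Lucene. The Python sketch below is not the MultiSearcher API: the function merge_results, the (record id, score) tuples, and the boost factor of 2.0 are all assumptions chosen for illustration.

```python
def merge_results(hits_by_index, gui_lang, boost=2.0):
    """Merge hits from all language indexes into one ranked, duplicate-free list.

    hits_by_index maps an index name such as "nonspatial_fra" to a list of
    (record_id, score) tuples. The boost value is an arbitrary assumption.
    """
    preferred = "nonspatial_" + gui_lang
    best = {}
    for index_name, hits in hits_by_index.items():
        factor = boost if index_name == preferred else 1.0
        for record_id, score in hits:
            boosted = score * factor
            # Duplicate filter: keep only the best-scoring hit per record
            if record_id not in best or boosted > best[record_id][0]:
                best[record_id] = (boosted, index_name)
    ranked = [(rid, score, idx) for rid, (score, idx) in best.items()]
    ranked.sort(key=lambda item: item[1], reverse=True)
    return ranked


hits = {
    "nonspatial_eng": [("a", 1.0), ("b", 0.9)],
    "nonspatial_fra": [("a", 0.8), ("c", 0.5)],
}
for record_id, score, index_name in merge_results(hits, "fra"):
    print(record_id, score, index_name)
```

With a French GUI, record "a" is found in both indexes but is returned once, from nonspatial_fra, because its boosted score there is the higher of the two.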

Risks

Participants

  • List of participants and role (if necessary) in current GIP

Attachments (4)
