Version 47 (modified by simonp, 15 years ago)

Composed Metadata Records

Date: 2009/09/01
Contact(s): Simon Pigot
Last edited: Timestamp
Status: draft, being discussed, in progress, early stage complete
Assigned to release: 2.5
Resources: Available for first stage

Overview

For GeoNetwork to become part of an institution's metadata and data management fabric, it must be able to compose metadata from content held in databases external to GeoNetwork. This proposal would add two new components to the kernel and harvester modules of GeoNetwork:

  • WFS fragment harvester - a harvester that can import metadata fragments (also known as subtemplates) from an external database with a WFS interface and (optionally) construct composed metadata records (one per feature) with fragments linked in
  • XLink resolver and cache - a facility that will resolve links to metadata fragments (both local and remote if necessary) with the ability to cache fragments for efficient retrieval
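
The harvester flow described above (one fragment and one composed record per feature, with the fragment linked in rather than copied) could be sketched roughly as follows. This is only an illustration and not the GeoNetwork implementation: the features are plain dictionaries standing in for a WFS GetFeature response, the function `feature_to_fragment` stands in for the XSLT step, and the `FRAGMENT_URL` pattern and all function names are hypothetical.

```python
import xml.etree.ElementTree as ET

GMD = "http://www.isotc211.org/2005/gmd"
GCO = "http://www.isotc211.org/2005/gco"
XLINK = "http://www.w3.org/1999/xlink"
ET.register_namespace("gmd", GMD)
ET.register_namespace("gco", GCO)
ET.register_namespace("xlink", XLINK)

# Hypothetical URL pattern for stored fragments (an assumption for this sketch).
FRAGMENT_URL = "http://localhost:8080/geonetwork/fragments/{id}"


def feature_to_fragment(feature):
    """Turn one harvested feature (a plain dict here) into a contact fragment."""
    party = ET.Element(f"{{{GMD}}}CI_ResponsibleParty")
    name = ET.SubElement(party, f"{{{GMD}}}individualName")
    ET.SubElement(name, f"{{{GCO}}}CharacterString").text = feature["name"]
    return party


def compose_record(feature_id):
    """Build a record whose contact element only links to the fragment."""
    record = ET.Element(f"{{{GMD}}}MD_Metadata")
    contact = ET.SubElement(record, f"{{{GMD}}}contact")
    contact.set(f"{{{XLINK}}}href", FRAGMENT_URL.format(id=feature_id))
    return record


def harvest(features):
    """Return (fragments keyed by URL, composed records), one record per feature."""
    fragments, records = {}, []
    for feature in features:
        url = FRAGMENT_URL.format(id=feature["id"])
        fragments[url] = feature_to_fragment(feature)
        records.append(compose_record(feature["id"]))
    return fragments, records


features = [{"id": "f1", "name": "A. Example"}, {"id": "f2", "name": "B. Example"}]
fragments, records = harvest(features)
print(ET.tostring(records[0], encoding="unicode"))
```

Because the record holds only the link, a later harvest that replaces a fragment changes what every linking record resolves to, without touching the records themselves.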

Proposal Type

  • Type: Core Change
  • App: GeoNetwork
  • Module: Kernel: Data Manager, XmlSerializer, Jeeves, Admin->System configuration options
  • Subtemplates: Metadata fragments are equivalent to subtemplates, which were only partially implemented in GeoNetwork. Subtemplates appear to have been an extension of the template concept. Templates are complete metadata records with some elements filled in. A user can clone a template to create a new record in the editor. Once cloned, all connection between the template and the new record is broken, i.e. changes to the template are not visible in the cloned record and vice versa. Subtemplates (as partially implemented in GeoNetwork) took the template concept down to the level of individual elements in the metadata record. For example, contact information could be saved as a subtemplate and then reused when editing different elements. The implementation did not make clear whether the link between a subtemplate and the record it had been added to was maintained. This proposal intends to implement subtemplates as fragments of metadata harvested from an external database. The link between a metadata record and a fragment will be maintained, i.e. changes in the fragment will be visible in the record.
  • Composed, Componentized and Relational Metadata: Neither the idea of composed or componentized metadata nor the term "composed metadata" is new. Both appear in many discussions on the net (see for example Ted Habermann's proposal http://trac.osgeo.org/geonetwork/wiki/ComponentsAndComposites) and in the literature, and implementing the idea is probably an aim of many metadata tools. A related term, "relational" metadata (e.g. the LISASoft metadata report), borrows the concepts of reuse, normalization and removal of redundancy from relational database terminology. Although these topics have been discussed, and fragments have even been partly implemented in GeoNetwork as subtemplates, this proposal suggests a mechanism for implementing these concepts in GeoNetwork based on the harvester concept. A harvester brings in metadata fragments (initially from a WFS, but conceivably from any harvest source) and can build composed metadata records from a template and fragments.

Voting History

  • Not voted on as yet.

Motivations

The motivation for this proposal comes from the need to fit GeoNetwork into organisations that already manage metadata in a number of different databases external to GeoNetwork.

Proposal

The two components to be added by this proposal (in more detail):

  • WFS fragment harvester - this is a harvester that accepts (along with the usual harvester parameters):
    • a WFS GetFeature query
    • an XSLT to convert WFS GML output to metadata fragments (two example XSLTs are supplied: one for the default deegree Philosopher WFS and one for the GeoServer default countries WFS)
    • a template metadata document that describes how to link metadata fragments (more than one fragment can be returned for each feature) - this is only used for the first run of the harvester
    • permissions and categories for fragments and records created.
  • XLink resolver and cache - the geocat.ch sandbox has developed an XLink resolver and cache mechanism. The implementation in the BlueNetMEST sandbox uses the same cache mechanism (based on Apache JCS) but parts of the resolver have been rewritten to handle (amongst other things) relative links to fragments in the same record and to avoid caching of such fragments. (See http://geonetwork.svn.sourceforge.net/viewvc/geonetwork/sandbox/BlueNetMEST/jeeves/src/jeeves/xlink). More details:
    • Metadata records in GeoNetwork would have their XLinks resolved when read from the database (this takes place in XmlSerializer) - only resolved records are seen by users and harvesters
    • Reindexing of metadata records that used to be done before servlet startup has been delayed until the servlet is up - this is to allow links between records in the local GeoNetwork node to be resolved.
    • The cache and the Lucene index are kept in sync. XLink'd fragments held in the cache are eternal (they never time out). They are replaced only by a fragment harvester or by a new admin operation that clears the cache and then rebuilds the Lucene index for all records with XLinks (this operation can be scheduled or run immediately).
    • XLinks and metadata records composed from xlink'd fragments can be made optional through the use of a system configuration option that can turn this feature on or off.
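
To make the resolver and cache behaviour above more concrete, here is a minimal sketch. It is not the jeeves/JCS code: the class `XLinkResolver`, its `fetch` callable and the choice to append the fragment as a child of the linking element are all assumptions made for illustration. A plain dictionary stands in for the cache, and rebuilding the Lucene index is not modelled.

```python
import copy
import xml.etree.ElementTree as ET

XLINK_HREF = "{http://www.w3.org/1999/xlink}href"


class XLinkResolver:
    """Resolves xlink:href attributes via a fetch function and an eternal cache."""

    def __init__(self, fetch, enabled=True):
        self.fetch = fetch      # hypothetical callable: URL -> ET.Element
        self.enabled = enabled  # mirrors the system configuration switch
        self.cache = {}         # entries never expire on their own

    def get_fragment(self, href):
        """Fetch a fragment once, then serve it from the cache."""
        if href not in self.cache:
            self.cache[href] = self.fetch(href)
        return self.cache[href]

    def clear_cache(self):
        """Stand-in for the admin operation (reindexing is not modelled)."""
        self.cache.clear()

    def resolve(self, record):
        """Return a copy of the record with linked fragments filled in."""
        resolved = copy.deepcopy(record)
        if not self.enabled:
            return resolved
        # Snapshot the elements first so appended fragments are not revisited.
        for element in list(resolved.iter()):
            href = element.get(XLINK_HREF)
            # Same-document links ("#id") are skipped here and never cached.
            if href is not None and not href.startswith("#"):
                element.append(copy.deepcopy(self.get_fragment(href)))
        return resolved


calls = []


def fake_fetch(href):
    """Pretend remote source that records every request it receives."""
    calls.append(href)
    return ET.fromstring("<fragment>contact details</fragment>")


record = ET.fromstring(
    '<record xmlns:xlink="http://www.w3.org/1999/xlink">'
    '<contact xlink:href="http://example.org/fragments/1"/></record>'
)
resolver = XLinkResolver(fake_fetch)
first = resolver.resolve(record)
second = resolver.resolve(record)
print(len(calls))  # 1: the second resolve was served from the cache
```

The stored record is never modified, which matches the idea that resolution happens when the record is read, and turning `enabled` off returns the record with its links left unresolved.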

Minor changes were required to the metadata editor so that metadata fragments are recognizable and cannot be edited.

Suggestions for Future/Other proposals

This proposal is only the beginning of addressing full support for XLinks. Future work could address some of the following issues/opportunities:

  • URN resolver to provide a level of indirection that can cope with changing URLs and ensure referential integrity. Metadata fragments are linked into records using XLinks, which can use URLs or URNs in the link attribute (xlink:href). URNs are intended to provide a permanent identifier. A service that can register a URN with an associated URL, and look up the URL for a given URN, would allow the implementation to use URNs in place of URLs, giving a measure of control over the broken links and missing content that can occur when URLs are used directly.
  • Support for updating a fragment in GeoNetwork - sometimes it makes sense for a fragment to be edited and saved back into the external database from which it was harvested. WFS-T support would be used to provide this facility.
  • Change other suitable GeoNetwork harvesters (e.g. the OGC WxS capabilities harvester) to harvest fragments rather than complete metadata records, using the same approach as the WFS fragment harvester.
  • Support in the editor for fragments: the original intention of subtemplates was that they be accessible from the editor, i.e. a user could select a fragment (e.g. contact info) to use when editing that portion of a metadata record. Some work appears to have already been done in the geocat.ch sandbox on this function.
  • Access to fragments by other editor tools: Other editing tools (e.g. the wizard-based ANZMETLite tool) can use fragments in their interface to ease the metadata entry and editing process. Fragments harvested into GeoNetwork should be accessible to these tools.
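
The URN resolver idea in the first item above amounts to a register and lookup service. A minimal in-memory sketch might look like the following. The class `UrnRegistry` and the example URNs and URLs are hypothetical, and a real service would need persistent storage and access control.

```python
class UrnRegistry:
    """Maps permanent URNs to the URL where a fragment currently lives."""

    def __init__(self):
        self._urls = {}

    def register(self, urn, url):
        """Register a URN, or update the URL of an already registered URN."""
        if not urn.lower().startswith("urn:"):
            raise ValueError(f"not a URN: {urn!r}")
        self._urls[urn] = url

    def lookup(self, urn):
        """Return the current URL for a URN; raises KeyError if unknown."""
        return self._urls[urn]


registry = UrnRegistry()
urn = "urn:example:fragment:contact-1"
registry.register(urn, "http://old.example.org/fragments/1")

# The fragment moves: only the registry entry changes, while records that
# link to the URN in xlink:href stay exactly as they are.
registry.register(urn, "http://new.example.org/fragments/1")
print(registry.lookup(urn))
```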

Backwards Compatibility Issues

Metadata records as traditionally handled by GeoNetwork will not be affected by the addition of this feature.

No issues for harvesters that harvest from GeoNetwork: Composed metadata records will have their XLinks resolved before they are harvested. As a harvester usually updates the set of records harvested from a remote site on a regular basis, it would also pick up later changes to fragments.

No issues for export (eg. MEF): Composed metadata records exported as MEF files would have their XLinks resolved before export.

Risks

  • Some XLink concepts are open to a number of interpretations, e.g. the notion of a relative URL with a fragment identifier such as:
    <gmd:temporalExtent xlink:href="#temporalExtent">
    In this proposal, such links are interpreted as a link to a metadata fragment within the same document. However, from discussions with the deegree developers (who have an advanced XLink implementation in their WFS), it appears that some organisations interpret such a link as pointing to a fragment in any document within the local database.
  • Since this proposal and the included fragment-based harvester allow XLink attributes on any element in the metadata record, composed metadata records with unresolved XLinks would fail validation. This issue is not confined to composed metadata records, though, as GeoNetwork does not provide any config options or other controls for the administrator to handle records that fail validation.
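
The same-document interpretation of a link such as `#temporalExtent` can be illustrated with a short sketch. It assumes the target is marked with a plain `id` attribute and uses simplified, non-namespaced element names. The function `resolve_local_link` is hypothetical and not part of GeoNetwork.

```python
import xml.etree.ElementTree as ET

XLINK_HREF = "{http://www.w3.org/1999/xlink}href"


def resolve_local_link(document, href):
    """Find the element in the same document that a '#id' link points to."""
    if not href.startswith("#"):
        raise ValueError("not a same-document link")
    target_id = href[1:]
    for element in document.iter():
        # Assumption: targets carry a plain 'id' attribute.
        if element.get("id") == target_id:
            return element
    return None  # unresolved link


doc = ET.fromstring(
    '<record xmlns:xlink="http://www.w3.org/1999/xlink">'
    '<extent id="temporalExtent"><begin>2009-01-01</begin></extent>'
    '<temporalExtent xlink:href="#temporalExtent"/>'
    "</record>"
)
link = doc.find("temporalExtent").get(XLINK_HREF)
target = resolve_local_link(doc, link)
print(target.tag)
```

Under the alternative interpretation mentioned above, the search would run over every document in the local database rather than over `document` alone, so the two readings can return different targets for the same link.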

Participants

  • Simon Pigot, CSIRO Marine and Atmospheric Research
  • Craig Jones, IMOS/eMii
  • geocat.ch developers - have similar requirements including support for updating metadata fragments and using fragments in the editor and have implemented XLinks and caching
  • URN resolver and GeoServer community schema support: AuScope/Spatial Information Services Stack (SISS) team
  • "Relational" metadata - LISASoft developers?
  • Ted Habermann and team at NOAA National Geophysical Data Center
  • Others?