Batch Operation to Extract Subtemplates

Date: 2012/05/03
Contact(s): Simon Pigot
Last edited: 2012/05/03
Status: Committed svn rev 9121
Assigned to release: 2.7.x
Resources: Available
Ticket #: #878

Overview

With the addition of XLink processing, fragment harvesting, subtemplate support (a subtemplate is a fragment with an id in the Metadata table of the GeoNetwork database) and tools for managing directories of subtemplates, GeoNetwork can now begin to support reusable fragments of metadata linked into records. However, many sites already have metadata records containing common fragments that they would like to extract into directories of subtemplates. This proposal adds a batch operation for admin users that extracts subtemplates from a selected set of records. Each extracted subtemplate is identified as follows: if its root element has a uuid attribute, that value becomes the uuid of the subtemplate; otherwise the uuid is obtained by calculating a checksum of the subtemplate's text content.

Proposal Type

  • Type: New batch function for admin users
  • App: GeoNetwork
  • Module: Batch Operations

Voting History

  • Proposed for voting on May 3, 2012, Francois +1, Jesse +1, Jeroen +1

Motivations

Many sites have existing metadata records that share common information, e.g. contact information in an ISO CI_ResponsibleParty element. With subtemplate support and maintenance functions now in GeoNetwork, it should be possible to extract these fragments of metadata, remove duplicates and store them as subtemplates. This proposal describes a function that does this.

Proposal

This function works as follows:

  • Identify the fragments of metadata that you would like to manage as reusable subtemplates. This can be done using an XPath, e.g. the XPath /grg:RE_Register/grg:containedItem/gnreg:RE_RegisterItem identifies register items in an ISO19135 register record, such as the record describing the ANZLIC Geographic Extent Names vocabulary shown in the following example:
<grg:containedItem>
      <gnreg:RE_RegisterItem gco:isoType="grg:RE_RegisterItem" uuid="da078149-ba39-4cb9-817d-7229e479243b">
         <grg:itemIdentifier>
            <gco:Integer>59</gco:Integer>
         </grg:itemIdentifier>
         <grg:name>
            <gco:CharacterString>AUSTRALIA EXCLUDING EXTERNAL TERRITORIES</gco:CharacterString>
         </grg:name>
         <grg:status>
            <grg:RE_ItemStatus>valid</grg:RE_ItemStatus>
         </grg:status>
         <grg:dateAccepted>
            <gco:Date>2006-10-10</gco:Date>
         </grg:dateAccepted>
         <grg:definition>
            <gco:CharacterString>AUSTRALIA EXCLUDING EXTERNAL TERRITORIES|-9|-44|154|112|Australia</gco:CharacterString>
         </grg:definition>
         .......
         <gnreg:itemExtent>
            <gmd:EX_Extent>
               <gmd:geographicElement>
                  <gmd:EX_GeographicBoundingBox>
                     <gmd:westBoundLongitude>
                        <gco:Decimal>112</gco:Decimal>
                     </gmd:westBoundLongitude>
                     <gmd:eastBoundLongitude>
                        <gco:Decimal>154</gco:Decimal>
                     </gmd:eastBoundLongitude>
                     <gmd:southBoundLatitude>
                        <gco:Decimal>-44</gco:Decimal>
                     </gmd:southBoundLatitude>
                     <gmd:northBoundLatitude>
                        <gco:Decimal>-9</gco:Decimal>
                     </gmd:northBoundLatitude>
                  </gmd:EX_GeographicBoundingBox>
               </gmd:geographicElement>
            </gmd:EX_Extent>
         </gnreg:itemExtent>
         <gnreg:itemIdentifier>
            <gco:CharacterString>http://www.ga.gov.au/anzmeta/gen/AUS</gco:CharacterString>
         </gnreg:itemIdentifier>
      </gnreg:RE_RegisterItem>
   </grg:containedItem>
  • Identify and record the XPath of a field or fields within the fragment whose text content will be used as the title of the subtemplate. It is important to choose a set of fields that will allow a human to identify the subtemplate when they choose either to reuse it in a new record or to edit it in the subtemplate directories interface (see below).
  • In GeoNetwork main page, search for and then select the records from which the subtemplates will be extracted. See the following example:

  • Choose 'Extract subtemplates' from the 'Actions on selected set' drop down menu at the top right of the search interface.
  • Enter the XPath of the fragment, the XPath of the element(s) within the fragment from which the subtemplate title will be extracted, and a category to which the new subtemplates will be assigned. See the following example:

  • Run the command and look at the test results that are displayed (see the following example) to check that the fragment XPath and the title extraction XPath are doing what you expect. If you scroll down, the test results show: the XPath of the fragment, the number of subtemplates that will be created, the first subtemplate extracted, the title extracted from that subtemplate and the xlink element that will replace the fragment in the metadata record. No changes are made to the metadata records at this stage.

  • Check the checkbox alongside 'I really want to do this!' when you're sure that everything is ok. As a result, the fragments of metadata specified by the XPath will be removed from the records in the selected set, saved as subtemplates and then linked back into the records that use them. Here is what part of the register record example used here looks like with an XLink replacing the original gnreg:RE_RegisterItem:
   <grg:containedItem>
      <gnreg:RE_RegisterItem gco:isoType="grg:RE_RegisterItem" xlink:href="http://localhost:8080/geonetwork/src/eng/?da078149-ba39-4cb9-817d-7229e479243b"/>
   </grg:containedItem>

  • Check the subtemplates created in the Administration->Manage Directories function. Here is an example of how this looks after we have extracted subtemplates from the ANZLIC Geographic Extent Names register record:

Removing duplicates in the extraction process

As mentioned above, a subtemplate created by an extraction takes its uuid from the uuid attribute on the root element of the subtemplate or, if that attribute doesn't exist, from a sha1 checksum calculated on the text content of the subtemplate. Fragments that resolve to the same uuid can therefore be treated as duplicates of one subtemplate.
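
The identification rule and the replacement of fragments by XLinks can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not GeoNetwork's implementation: the function names, the link_base URL and the choice to hash the raw concatenated text (with no whitespace normalisation) are all hypothetical.

```python
import hashlib
import xml.etree.ElementTree as ET

XLINK_NS = "http://www.w3.org/1999/xlink"
GMD_NS = "http://www.isotc211.org/2005/gmd"


def subtemplate_uuid(fragment):
    """Use the root uuid attribute if present, else a SHA-1 of the text content."""
    uuid = fragment.get("uuid")
    if uuid:
        return uuid
    # Assumption: the raw text content is hashed exactly as it appears.
    text = "".join(fragment.itertext())
    return hashlib.sha1(text.encode("utf-8")).hexdigest()


def extract_subtemplates(record, fragment_path, namespaces,
                         link_base="http://localhost:8080/example?uuid="):
    """Replace each matching fragment with an XLink stub and return the unique
    fragments keyed by uuid. link_base is a hypothetical placeholder URL."""
    subtemplates = {}
    # ElementTree has no parent pointers, so build a child -> parent map first.
    parents = {child: parent for parent in record.iter() for child in parent}
    for fragment in record.findall(fragment_path, namespaces):
        uuid = subtemplate_uuid(fragment)
        subtemplates.setdefault(uuid, fragment)  # duplicates collapse to one entry
        stub = ET.Element(fragment.tag, {f"{{{XLINK_NS}}}href": link_base + uuid})
        stub.tail = fragment.tail
        parent = parents[fragment]
        index = list(parent).index(fragment)
        parent.remove(fragment)
        parent.insert(index, stub)
    return subtemplates


# Usage: two identical fragments without a uuid and one fragment with a uuid.
record = ET.fromstring(
    f'<record xmlns:gmd="{GMD_NS}">'
    '<gmd:contact><gmd:name>Alice</gmd:name></gmd:contact>'
    '<gmd:contact><gmd:name>Alice</gmd:name></gmd:contact>'
    '<gmd:contact uuid="abc"><gmd:name>Bob</gmd:name></gmd:contact>'
    '</record>'
)
found = extract_subtemplates(record, "gmd:contact", {"gmd": GMD_NS})
```

In this sketch the two identical fragments hash to the same value and so end up as a single entry in found, while every fragment in the record is replaced by an empty element carrying an xlink:href.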

The advantage of this procedure is that the metadata records can be preprocessed with a batch XSLT operation in GeoNetwork that calculates a uuid and stores it as an attribute, using rules appropriate to the site. For example, a site extracting contact information as subtemplates may decide that all fragments of contact information with the same organisation name should be linked to one subtemplate. To achieve this, a batch XSLT operation can be run before the subtemplate extraction to assign the same uuid to all CI_ResponsibleParty fragments with a common organisation name (e.g. by calculating a checksum or by using a lookup table).
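
The preprocessing idea can be illustrated as follows. The proposal describes a batch XSLT operation; the sketch below uses Python purely to show the logic and is not that operation. The function name assign_contact_uuids and the decision to ignore case and extra whitespace when comparing organisation names are assumptions made for the example.

```python
import hashlib
import xml.etree.ElementTree as ET

NS = {
    "gmd": "http://www.isotc211.org/2005/gmd",
    "gco": "http://www.isotc211.org/2005/gco",
}


def assign_contact_uuids(record):
    """Set a uuid attribute on every CI_ResponsibleParty, derived from its
    organisation name, so that equal names yield equal uuids."""
    for party in record.iter(f"{{{NS['gmd']}}}CI_ResponsibleParty"):
        name = party.findtext(
            "gmd:organisationName/gco:CharacterString", default="", namespaces=NS
        )
        # Assumption: names are compared ignoring case and extra whitespace.
        key = " ".join(name.split()).lower()
        if key:
            party.set("uuid", hashlib.sha1(key.encode("utf-8")).hexdigest())
    return record


def party_xml(person, organisation):
    """Build a small CI_ResponsibleParty fragment for the usage example."""
    return (
        "<gmd:CI_ResponsibleParty>"
        f"<gmd:individualName><gco:CharacterString>{person}</gco:CharacterString></gmd:individualName>"
        f"<gmd:organisationName><gco:CharacterString>{organisation}</gco:CharacterString></gmd:organisationName>"
        "</gmd:CI_ResponsibleParty>"
    )


# Usage: two contacts from the same organisation and one from another.
record = ET.fromstring(
    f'<record xmlns:gmd="{NS["gmd"]}" xmlns:gco="{NS["gco"]}">'
    + party_xml("A. Person", "Geoscience Australia")
    + party_xml("B. Person", "geoscience  australia")
    + party_xml("C. Person", "Another Agency")
    + "</record>"
)
assign_contact_uuids(record)
uuids = [party.get("uuid") for party in record]
```

Here the first two contacts receive the same uuid attribute, so a later extraction would link both to one subtemplate, at the cost of discarding any differences (such as the individual name) between them. A lookup table, as the proposal suggests, would replace the checksum with a mapping from organisation name to a fixed uuid.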

Backwards Compatibility Issues

None?

Risks

  • Need more functions to manage the links between records and subtemplates.

Participants

  • Simon Pigot
Last modified on 05/15/12 09:49:01
