Version 23 (modified by simonp, 12 years ago)

Metadata change logging/revision log

Date 2012-01-08
Contact(s) Simon Pigot
Last edited 2012-01-17T00:30:00
Status Basic Implementation Complete
Assigned to release 2.7.x
Resources Available
Ticket # #726

Overview

There are many use cases where the complete history of changes to a metadata record and its properties (e.g. privileges, categories and status) needs to be captured. This proposal adds a subversion repository on the local filesystem to GeoNetwork, together with code that captures these changes in it.

Proposal Type

  • Type: Core Change
  • App: GeoNetwork
  • Module: Kernel, Data Manager
  • Documents:
  • Email discussions:
  • Other wiki discussions:

Voting History

  • Proposed on 2012-01-13, Francois +1

Motivations

In its current form, GeoNetwork does not capture any details about changes to metadata records or to properties of metadata records (e.g. privileges, categories, status). Only the latest version of the metadata record and its current properties are available. However, there are many use cases where it is important to be able to track (over time):

  • changes to the metadata record ie. changes to individual elements
  • changes to properties of the metadata record eg. privileges, categories, status

As we use a database to hold both the metadata records and their properties, we could implement history tables to capture these changes and provide a user interface that allows the user to query the information in these tables. Alternatively, we could use a subversion repository to capture these changes and let the user examine them through one of the existing visual interfaces to subversion repositories, e.g. viewvc. Apart from the advantage of ready-to-use tools for examining the changes, the subversion approach is efficient for XML files and simple to maintain.

Proposal

Using an open source Java API to subversion from tmatesoft (svnkit), we will implement change tracking for metadata records and their properties in a subversion repository created and maintained by the GeoNetwork code.

Not all records will be tracked, as the compute and systems administration cost of tracking every record is too high, particularly in larger catalogs. Instead, only those records that are edited and updated within the local GeoNetwork instance will be tracked in the subversion repository.

The database will remain the point of truth for GeoNetwork. That is, changes will be tracked in subversion, but the database will continue to be the facility used by all services. For example, although it is possible to extract the latest version of a metadata record from the subversion repository, all services will continue to extract the latest version of the metadata record from the database table.

Using a subversion repository in place of database history tables forces us to think about keeping the subversion repository and the database consistent, i.e. committing or aborting both together. In developing this proposal we've examined three approaches:

  • auto commit the subversion repository after every change to the metadata record or its properties
  • apply the changes to the subversion repository as they are made, then commit/abort the subversion repository and database at the same time
  • set a flag saying that changes have been made and then at database commit, query the database and commit changes to the subversion repository

The first approach is the easiest to code, particularly as regards maintaining consistency between the subversion repository and the database: if a database operation fails, we don't make any changes to the subversion repository, and if any of the subversion repository commits fail, we could abort the current database commit. However, except for the simplest operations on a single record, the changes recorded in the subversion repository will bear little or no resemblance to the changes that are made by GeoNetwork services. For example, if the user decides to change the privileges on a metadata record, this would result in more than one commit to the subversion repository (in fact, the number of commits would equal the number of group permissions selected in the privilege interface, as they are set one by one in the DataManager).

The second approach is slightly more difficult to code: subversion changes need to be bundled by keeping the subversion commit editor open and using a listener to commit or discard the changes to the subversion repository when the database is committed or aborted. This scenario is complicated by the design of the tmatesoft api, which does not allow reentrant calls on a subversion repository object and does not allow files and directories in the repository to be opened more than once in a transaction, as described at http://osdir.com/ml/version-control.subversion.javasvn.user/2007-10/msg00053.html.

The third approach is the one that has been implemented. The coding is more straightforward than the second approach, only slightly more complex than the first and it captures the state of both the metadata record and its properties at database commit time.

To illustrate the third approach, let's examine a typical scenario where we wish to capture changes to the privileges of a metadata record made by a user in the 'Set Privileges' function:

  • This function ultimately calls the setOperation method in the DataManager to change the privileges for the metadata in the database.
  • In setOperation we add a call to setHistory in the SvnManager which records the id of the metadata record against the database channel.
  • Just prior to the database channel being committed at the end of the 'Set Privileges' function, the listener on the database channel reads the privileges for the metadata record and commits any changes to the subversion repository.
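The three steps above can be sketched roughly as follows. This is only an illustration of the "flag now, snapshot at commit" idea, not the code in the patch: the class name DeferredHistorySketch, the SnapshotWriter callback and the beforeCommit/afterAbort hooks are hypothetical names chosen for the example, and the database channel is represented by a plain Object.

```java
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical stand-in for the versioning manager; not the actual SvnManager code. */
class DeferredHistorySketch {

    /** Hypothetical callback that would read one record and its properties and commit them to the repository. */
    interface SnapshotWriter {
        void write(String metadataId);
    }

    // Metadata ids flagged as changed, keyed by the database channel that changed them.
    // Each channel is assumed to be used by one thread at a time.
    private final Map<Object, Set<String>> pending = new ConcurrentHashMap<>();
    private final SnapshotWriter writer;

    DeferredHistorySketch(SnapshotWriter writer) {
        this.writer = writer;
    }

    /** Called from data manager methods such as setOperation: only flags the record, writes nothing. */
    void setHistory(Object dbChannel, String metadataId) {
        pending.computeIfAbsent(dbChannel, k -> new LinkedHashSet<>()).add(metadataId);
    }

    /** Called by the channel listener just before the database commit; returns the number of records written. */
    int beforeCommit(Object dbChannel) {
        Set<String> ids = pending.remove(dbChannel);
        if (ids == null) {
            return 0;
        }
        for (String id : ids) {
            writer.write(id);
        }
        return ids.size();
    }

    /** Called when the database transaction is aborted: the flags are dropped and nothing is written. */
    void afterAbort(Object dbChannel) {
        pending.remove(dbChannel);
    }
}
```

Because the ids are held in a set, several setOperation calls on the same record within one transaction (one per group permission) collapse into a single snapshot at commit time, which is the behaviour the first approach could not provide.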

Note that the current transaction isolation setting for database connections from the Apache database connection pool used in GeoNetwork is read-committed. If metadata versioning is enabled, the isolation level for these connections will need to be set to the stricter serializable level, so that changes made by one transaction to the record and its properties do not overlap with changes committed by another transaction. Note: the transaction isolation level in GeoNetwork was serializable before version 2.6. The documentation will need to warn GeoNetwork admins who configure the database pool through JNDI that they must set the transaction isolation level to serializable if they want to use metadata versioning.
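As a minimal sketch of the isolation requirement, the standard JDBC constants can be selected from a configuration flag. The class IsolationSketch and the versioningEnabled flag are hypothetical names for this example; how GeoNetwork actually applies the setting to its connection pool is not shown here.

```java
import java.sql.Connection;
import java.sql.SQLException;

/** Hypothetical helper; versioningEnabled stands for whatever configuration switch turns versioning on. */
class IsolationSketch {

    /** Chooses the JDBC isolation level: strictest level when versioning is on, read-committed otherwise. */
    static int isolationFor(boolean versioningEnabled) {
        return versioningEnabled
                ? Connection.TRANSACTION_SERIALIZABLE
                : Connection.TRANSACTION_READ_COMMITTED;
    }

    /** Applies the chosen level to a connection obtained from the pool. */
    static void apply(Connection conn, boolean versioningEnabled) throws SQLException {
        conn.setTransactionIsolation(isolationFor(versioningEnabled));
    }
}
```

For pools configured through JNDI the equivalent setting has to be made in the container configuration, since the application does not create those connections itself.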

The metadata properties are stored in the subversion repository as XML files. The typical structure of a directory for a metadata record in the repository consists of a directory (named after the id of the metadata record) which contains:

  • metadata.xml - a record of changes to the content of the metadata record itself
  • owner.xml - an XML file describing the owner of the metadata record
  • privileges.xml - an XML file describing the privileges of the metadata record
  • categories.xml - an XML file describing the categories to which the metadata record has been assigned
  • status.xml - an XML file describing the status of the metadata (eg. Approved, Rejected, etc)
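The layout above amounts to a simple mapping from a metadata id to repository paths. A minimal sketch, assuming the file names listed above and using a hypothetical class name RepositoryLayoutSketch:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical helper that builds repository-relative paths for one metadata record. */
class RepositoryLayoutSketch {

    // File names taken from the directory structure described in the text.
    static final List<String> FILES = Arrays.asList(
            "metadata.xml", "owner.xml", "privileges.xml", "categories.xml", "status.xml");

    /** Returns the path of one file inside the directory named after the metadata id. */
    static String pathFor(String metadataId, String fileName) {
        if (!FILES.contains(fileName)) {
            throw new IllegalArgumentException("Unknown versioned file: " + fileName);
        }
        return metadataId + "/" + fileName;
    }

    /** Returns the paths of all versioned files for one metadata record. */
    static List<String> allPaths(String metadataId) {
        List<String> paths = new ArrayList<>();
        for (String file : FILES) {
            paths.add(pathFor(metadataId, file));
        }
        return paths;
    }
}
```

For metadata record 10, pathFor("10", "privileges.xml") gives 10/privileges.xml, the form of path that svn diff reports for a record's files.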

A typical example of a privileges.xml file stored in the repository:

<response>
  <record>
    <group_name>intranet</group_name>
    <operation_id>0</operation_id>
    <operation_name>view</operation_name>
  </record>
  <record>
    <group_name>sample</group_name>
    <operation_id>0</operation_id>
    <operation_name>view</operation_name>
  </record>
  <record>
    <group_name>all</group_name>
    <operation_id>0</operation_id>
    <operation_name>view</operation_name>
  </record>
  <record>
    <group_name>intranet</group_name>
    <operation_id>1</operation_id>
    <operation_name>download</operation_name>
  </record>
  <record>
    <group_name>all</group_name>
    <operation_id>1</operation_id>
    <operation_name>download</operation_name>
  </record>
  <record>
    <group_name>sample</group_name>
    <operation_id>3</operation_id>
    <operation_name>notify</operation_name>
  </record>
  <record>
    <group_name>intranet</group_name>
    <operation_id>5</operation_id>
    <operation_name>dynamic</operation_name>
  </record>
  <record>
    <group_name>all</group_name>
    <operation_id>5</operation_id>
    <operation_name>dynamic</operation_name>
  </record>
  <record>
    <group_name>intranet</group_name>
    <operation_id>6</operation_id>
    <operation_name>featured</operation_name>
  </record>
  <record>
    <group_name>all</group_name>
    <operation_id>6</operation_id>
    <operation_name>featured</operation_name>
  </record>
</response>

Difference between revisions 3 and 4 for the privileges.xml file for metadata record 10:

svn diff -r 3:4
Index: 10/privileges.xml
===================================================================
--- 10/privileges.xml   (revision 3)
+++ 10/privileges.xml   (revision 4)
@@ -1,12 +1,52 @@
 <response>
   <record>
+    <group_name>intranet</group_name>
+    <operation_id>0</operation_id>
+    <operation_name>view</operation_name>
+  </record>
+  <record>
     <group_name>sample</group_name>
     <operation_id>0</operation_id>
     <operation_name>view</operation_name>
   </record>
   <record>
+    <group_name>all</group_name>
+    <operation_id>0</operation_id>
+    <operation_name>view</operation_name>
+  </record>
+  <record>
+    <group_name>intranet</group_name>
+    <operation_id>1</operation_id>
+    <operation_name>download</operation_name>
+  </record>
+  <record>
+    <group_name>all</group_name>
+    <operation_id>1</operation_id>
+    <operation_name>download</operation_name>
+  </record>
+  <record>
     <group_name>sample</group_name>
     <operation_id>3</operation_id>
     <operation_name>notify</operation_name>
   </record>
+  <record>
+    <group_name>intranet</group_name>
+    <operation_id>5</operation_id>
+    <operation_name>dynamic</operation_name>
+  </record>
+  <record>
+    <group_name>all</group_name>
+    <operation_id>5</operation_id>
+    <operation_name>dynamic</operation_name>
+  </record>
+  <record>
+    <group_name>intranet</group_name>
+    <operation_id>6</operation_id>
+    <operation_name>featured</operation_name>
+  </record>
+  <record>
+    <group_name>all</group_name>
+    <operation_id>6</operation_id>
+    <operation_name>featured</operation_name>
+  </record>
 </response>

Examination of this diff file shows that privileges for the 'All' and 'Intranet' groups have been added between revision 3 and 4 - in short, the record has been published.

Here is an example of a change that has been made to a metadata record:

svn diff -r 2:3
Index: 10/metadata.xml
===================================================================
--- 10/metadata.xml     (revision 2)
+++ 10/metadata.xml     (revision 3)
@@ -61,7 +61,7 @@
     </gmd:CI_ResponsibleParty>
   </gmd:contact>
   <gmd:dateStamp>
-    <gco:DateTime>2012-01-10T01:47:51</gco:DateTime>
+    <gco:DateTime>2012-01-10T01:48:06</gco:DateTime>
   </gmd:dateStamp>
   <gmd:metadataStandardName>
     <gco:CharacterString>ISO 19115:2003/19139</gco:CharacterString>
@@ -85,7 +85,7 @@
       <gmd:citation>
         <gmd:CI_Citation>
           <gmd:title>
-            <gco:CharacterString>Template for Vector data in ISO19139 (preferred!)</gco:CharacterString>
+            <gco:CharacterString>fobblers foibblers</gco:CharacterString>
           </gmd:title>
           <gmd:date>
             <gmd:CI_Date>

This example shows that the editor has made a change to the title and the dateStamp.

Note that these examples were created using the command line subversion tools. The viewvc subversion repository tool has a graphical interface that allows side-by-side comparison of the differences between file revisions.

All XML files describing the properties of the metadata record are generated by SELECT statements on the relevant tables in the database.
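To illustrate how rows returned by such a SELECT could be turned into the response/record XML shown earlier, here is a minimal sketch. The class PropertyXmlSketch is hypothetical, rows are represented as ordered maps of column name to value rather than a real JDBC result set, and the actual patch may build the XML differently.

```java
import java.util.List;
import java.util.Map;

/** Hypothetical serializer: one <record> per row, one child element per column. */
class PropertyXmlSketch {

    /** Builds the XML text for a list of rows; each map must iterate its columns in a stable order. */
    static String toResponseXml(List<Map<String, String>> rows) {
        StringBuilder sb = new StringBuilder("<response>\n");
        for (Map<String, String> row : rows) {
            sb.append("  <record>\n");
            for (Map.Entry<String, String> col : row.entrySet()) {
                sb.append("    <").append(col.getKey()).append('>')
                  .append(escape(col.getValue()))
                  .append("</").append(col.getKey()).append(">\n");
            }
            sb.append("  </record>\n");
        }
        return sb.append("</response>\n").toString();
    }

    /** Escapes the characters that are not allowed in XML element text. */
    static String escape(String value) {
        return value.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }
}
```

One design point worth noting under these assumptions: if the rows always arrive in the same order (for example through an ORDER BY clause), unchanged privileges produce identical text, so the subversion diff contains only the records that really changed.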

Finally, as mentioned above, metadata records are not automatically versioned, as this would impose too much overhead and is not necessary in every case (e.g. for harvested record sets). So this proposal also adds the capability for the user to select a single record or a set of records to be versioned. These interfaces are available through the usual methods.

Metadata fragments (from directories local to GeoNetwork or from external URLs on the internet) can be linked into metadata records to support reuse.

The current patch will support versioning resolved records only. This means that all metadata fragments (both local and external) will be resolved before the record is versioned.

The user will be able to switch on versioning for fragments held in directories in the local GeoNetwork catalog in much the same way as they can for metadata records (see user interface examples above). At the moment, a change made to a local fragment will not force a new version of any record that links to this fragment. Instead, these changes will be picked up the next time the record or its properties are changed.

Patch

Patch attached to ticket #726

Backwards Compatibility Issues

None - this function can be configured off if not required.

New libraries added

tmatesoft subversion api in Java

<!-- svnkit stuff -->
<dependency>
  <groupId>org.tmatesoft.svnkit</groupId>
  <artifactId>svnkit</artifactId>
  <version>1.3.6-v1</version>
</dependency>

This jar is available from the maven repository at http://maven.tmatesoft.com/content/repositories/releases.

Risks

Risks?

Participants

  • Simon Pigot
