Metadata change logging/revision log

Date: 2012-01-08
Contact(s): Simon Pigot
Last edited: 2012-01-08T23:00:00
Status: being discussed, in progress with implementation
Assigned to release: Release number to be determined
Resources: Available
Ticket #: #726

Overview

There are many use cases where the complete history of changes to a metadata record and its properties (eg. privileges, categories and status) needs to be captured. This proposal adds a local filesystem subversion repository to GeoNetwork to do this.

Proposal Type

  • Type: Core Change
  • App: GeoNetwork
  • Module: Kernel, Data Manager
  • Documents:
  • Email discussions:
  • Other wiki discussions:

Voting History

  • Not yet proposed.

Motivations

In its current form GeoNetwork does not capture any details about changes to metadata records or properties of metadata records (eg. privileges, categories, status). Instead, only the latest version of the metadata record and its current properties are available. However there are many use cases where it is important to be able to track (over time):

  • changes to the metadata record ie. changes to individual elements
  • changes to properties of the metadata record eg. privileges, categories, status

As we use a database to hold both the metadata records and their properties, we could implement history tables to capture these changes and provide a user interface that allows the user to query the information in these tables. Alternatively, we could use a subversion repository to capture these changes and let the user examine them through one of the existing visual interfaces to subversion repositories, e.g. viewvc. Apart from the advantage of ready-to-use tools for examining the changes, the subversion approach is efficient for XML files and simple to maintain.

Proposal

Using the open source Java API to subversion from tmatesoft (svnkit), we will implement change tracking for metadata records and their properties in a subversion repository created and maintained by the GeoNetwork code.

Not all records will be tracked, as the compute and systems administration cost of tracking every record is too high, particularly in larger catalogs. Instead, only those records that are edited and updated within the local GeoNetwork instance will be tracked in the subversion repository.

The database will remain the point of truth for GeoNetwork. That is, changes will be tracked in subversion, but the database will continue to be the facility used by all services. For example, although it is possible to extract the latest version of a metadata record from the subversion repository, all services will continue to extract the latest version of the metadata record from the database table.

Using a subversion repository in place of database history tables forces us to think about keeping the subversion repository and the database consistent, i.e. committing or aborting both together. In developing this proposal we've examined three approaches:

  • auto commit the subversion repository after every change to the metadata record or its properties
  • apply the changes to the subversion repository and commit/abort the subversion repository and database at the same time
  • set a flag saying that changes have been made and then at commit, query the database and apply changes to the subversion repository

The first approach is the easiest to code, particularly as regards maintaining consistency between the subversion repository and the database: if the database operation fails, we can simply avoid committing any changes to the subversion repository, and if any of the subversion repository commits fail, we can abort the database commit as well (although in practice it is not quite this simple). However, for all but the simplest operations on a single record, the changes recorded in the subversion repository will bear little or no resemblance to the changes made by GeoNetwork services. For example, if the user changes the privileges on a metadata record, this results in more than one commit to the subversion repository: the number of commits equals the number of group permissions selected in the privilege interface, because they are set one by one in the DataManager.

The second approach is more difficult to code: subversion changes need to be bundled by keeping the subversion commit editor open and using a listener to commit/abort the changes to the subversion repository when the database is committed/aborted. This scenario is complicated by the design of the tmatesoft API, which does not allow reentrant calls on a subversion repository object and does not allow files and directories in the repository to be opened more than once in a transaction, as described at http://osdir.com/ml/version-control.subversion.javasvn.user/2007-10/msg00053.html.

The third approach is the one that has been implemented. The coding is much more straightforward than the second approach and only slightly more complex than the first.

To illustrate the third approach, let's examine a typical scenario where we wish to capture changes to the privileges of a metadata record made by a user in the 'Set Privileges' function:

  • This function ultimately calls the setOperation method in the DataManager to change the privileges for the metadata in the database.
  • In setOperation we add a call to setHistory in the SvnManager which records the id of the metadata record against the database channel.
  • When the database channel is committed at the end of the 'Set Privileges' function, the listener on the database channel reads the privileges for the metadata record and commits any changes to the subversion repository.
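
The steps above can be sketched in a few lines of Java. This is only an illustration of the "flag now, write to subversion at commit" idea and not the code in the patch: DbChannelSketch, SvnManagerSketch and the commits list are hypothetical stand-ins for the database channel, the SvnManager and the subversion repository, and the real listener would query the database and write XML files where this sketch just appends a message.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical stand-in for a database channel with commit/abort hooks. */
class DbChannelSketch {
    interface Listener {
        void committed(DbChannelSketch channel);
        void aborted(DbChannelSketch channel);
    }

    // A set, so registering the same listener twice has no effect.
    private final Set<Listener> listeners = new LinkedHashSet<>();

    void addListener(Listener l) { listeners.add(l); }

    void commit() { for (Listener l : new ArrayList<>(listeners)) l.committed(this); }

    void abort() { for (Listener l : new ArrayList<>(listeners)) l.aborted(this); }
}

/** Hypothetical stand-in for the SvnManager: flags changes, writes them at commit. */
class SvnManagerSketch implements DbChannelSketch.Listener {
    // Metadata ids flagged as changed, keyed by the channel that changed them.
    private final Map<DbChannelSketch, Set<String>> pending = new HashMap<>();
    // Stand-in for the subversion repository: one entry per commit.
    final List<String> commits = new ArrayList<>();

    /** Record that a metadata record changed on this channel; no subversion work yet. */
    void setHistory(DbChannelSketch channel, String metadataId) {
        Set<String> ids = pending.get(channel);
        if (ids == null) {
            ids = new LinkedHashSet<>();
            pending.put(channel, ids);
            channel.addListener(this);
        }
        ids.add(metadataId); // repeated flags for the same record collapse into one
    }

    @Override
    public void committed(DbChannelSketch channel) {
        Set<String> ids = pending.remove(channel);
        if (ids == null) return; // nothing was flagged on this channel
        for (String id : ids) {
            // Real code would read the record's properties from the database here.
            commits.add("update properties of metadata " + id);
        }
    }

    @Override
    public void aborted(DbChannelSketch channel) {
        pending.remove(channel); // database rolled back, so discard the flags
    }
}

class DeferredCommitDemo {
    public static void main(String[] args) {
        DbChannelSketch channel = new DbChannelSketch();
        SvnManagerSketch svn = new SvnManagerSketch();
        // setOperation runs once per group permission, always for record 10.
        for (int i = 0; i < 4; i++) svn.setHistory(channel, "10");
        channel.commit();
        System.out.println(svn.commits.size() + " commit(s) recorded");
    }
}
```

In this sketch, four calls to setHistory for the same record lead to a single entry in commits, which is the behaviour that distinguishes the third approach from the first.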

Lastly, we should mention that the metadata properties are stored in the subversion repository as XML files. Each metadata record typically has its own directory in the repository (named after the id of the metadata record), which contains:

  • metadata.xml - a record of changes to the content of the metadata record itself
  • owner.xml - an XML file describing the owner of the metadata record
  • privileges.xml - an XML file describing the privileges of the metadata record
  • categories.xml - an XML file describing the categories to which the metadata record has been assigned
  • status.xml - an XML file describing the status of the metadata (eg. Approved, Rejected, etc)
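
As a small illustration of this layout, the repository paths for one record could be derived as follows. RepositoryLayoutSketch and pathsFor are hypothetical names used only for this sketch; the path form (record id, slash, file name) follows the Index lines shown by svn diff, e.g. 10/privileges.xml.

```java
import java.util.ArrayList;
import java.util.List;

class RepositoryLayoutSketch {
    // The five files kept in each metadata record's directory.
    static final String[] FILES = {
        "metadata.xml", "owner.xml", "privileges.xml", "categories.xml", "status.xml"
    };

    /** Repository-relative paths for one metadata record, e.g. 10/privileges.xml. */
    static List<String> pathsFor(String metadataId) {
        List<String> paths = new ArrayList<>();
        for (String file : FILES) {
            paths.add(metadataId + "/" + file);
        }
        return paths;
    }

    public static void main(String[] args) {
        for (String path : pathsFor("10")) {
            System.out.println(path);
        }
    }
}
```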

A typical example of a privileges.xml file stored in the repository:

<response>
  <record>
    <group_name>intranet</group_name>
    <operation_id>0</operation_id>
    <operation_name>view</operation_name>
  </record>
  <record>
    <group_name>sample</group_name>
    <operation_id>0</operation_id>
    <operation_name>view</operation_name>
  </record>
  <record>
    <group_name>all</group_name>
    <operation_id>0</operation_id>
    <operation_name>view</operation_name>
  </record>
  <record>
    <group_name>intranet</group_name>
    <operation_id>1</operation_id>
    <operation_name>download</operation_name>
  </record>
  <record>
    <group_name>all</group_name>
    <operation_id>1</operation_id>
    <operation_name>download</operation_name>
  </record>
  <record>
    <group_name>sample</group_name>
    <operation_id>3</operation_id>
    <operation_name>notify</operation_name>
  </record>
  <record>
    <group_name>intranet</group_name>
    <operation_id>5</operation_id>
    <operation_name>dynamic</operation_name>
  </record>
  <record>
    <group_name>all</group_name>
    <operation_id>5</operation_id>
    <operation_name>dynamic</operation_name>
  </record>
  <record>
    <group_name>intranet</group_name>
    <operation_id>6</operation_id>
    <operation_name>featured</operation_name>
  </record>
  <record>
    <group_name>all</group_name>
    <operation_id>6</operation_id>
    <operation_name>featured</operation_name>
  </record>
</response>

Difference between revisions 3 and 4 for the privileges.xml file for metadata record 10:

svn diff -r 3:4
Index: 10/privileges.xml
===================================================================
--- 10/privileges.xml   (revision 3)
+++ 10/privileges.xml   (revision 4)
@@ -1,12 +1,52 @@
 <response>
   <record>
+    <group_name>intranet</group_name>
+    <operation_id>0</operation_id>
+    <operation_name>view</operation_name>
+  </record>
+  <record>
     <group_name>sample</group_name>
     <operation_id>0</operation_id>
     <operation_name>view</operation_name>
   </record>
   <record>
+    <group_name>all</group_name>
+    <operation_id>0</operation_id>
+    <operation_name>view</operation_name>
+  </record>
+  <record>
+    <group_name>intranet</group_name>
+    <operation_id>1</operation_id>
+    <operation_name>download</operation_name>
+  </record>
+  <record>
+    <group_name>all</group_name>
+    <operation_id>1</operation_id>
+    <operation_name>download</operation_name>
+  </record>
+  <record>
     <group_name>sample</group_name>
     <operation_id>3</operation_id>
     <operation_name>notify</operation_name>
   </record>
+  <record>
+    <group_name>intranet</group_name>
+    <operation_id>5</operation_id>
+    <operation_name>dynamic</operation_name>
+  </record>
+  <record>
+    <group_name>all</group_name>
+    <operation_id>5</operation_id>
+    <operation_name>dynamic</operation_name>
+  </record>
+  <record>
+    <group_name>intranet</group_name>
+    <operation_id>6</operation_id>
+    <operation_name>featured</operation_name>
+  </record>
+  <record>
+    <group_name>all</group_name>
+    <operation_id>6</operation_id>
+    <operation_name>featured</operation_name>
+  </record>
 </response>

Examination of this diff shows that privileges for the 'All' and 'Intranet' groups have been added between revisions 3 and 4. In short, the record has been published.
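
The same conclusion could be reached programmatically by reading two revisions of privileges.xml and checking for a particular grant. The sketch below uses only the standard Java XML parser; PrivilegesReaderSketch and grants are hypothetical names, and treating "view for the all group" as the sign of publication is an assumption made for this illustration.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

class PrivilegesReaderSketch {
    /** True if some record element grants operationName to groupName. */
    static boolean grants(String xml, String groupName, String operationName) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            NodeList records = doc.getElementsByTagName("record");
            for (int i = 0; i < records.getLength(); i++) {
                Element record = (Element) records.item(i);
                String group = text(record, "group_name");
                String operation = text(record, "operation_name");
                if (group.equals(groupName) && operation.equals(operationName)) {
                    return true;
                }
            }
            return false;
        } catch (Exception e) {
            throw new IllegalArgumentException("not a readable privileges.xml", e);
        }
    }

    // Text of the first child element with the given name.
    private static String text(Element record, String name) {
        return record.getElementsByTagName(name).item(0).getTextContent().trim();
    }

    /** Assumption for this sketch: published means 'all' gained the view operation. */
    static boolean wasPublished(String olderXml, String newerXml) {
        return !grants(olderXml, "all", "view") && grants(newerXml, "all", "view");
    }
}
```

Calling wasPublished with the contents of revision 3 and revision 4 of 10/privileges.xml would express the observation made from the diff above.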

Here is an example of a change that has been made to a metadata record:

svn diff -r 2:3
Index: 10/metadata.xml
===================================================================
--- 10/metadata.xml     (revision 2)
+++ 10/metadata.xml     (revision 3)
@@ -61,7 +61,7 @@
     </gmd:CI_ResponsibleParty>
   </gmd:contact>
   <gmd:dateStamp>
-    <gco:DateTime>2012-01-10T01:47:51</gco:DateTime>
+    <gco:DateTime>2012-01-10T01:48:06</gco:DateTime>
   </gmd:dateStamp>
   <gmd:metadataStandardName>
     <gco:CharacterString>ISO 19115:2003/19139</gco:CharacterString>
@@ -85,7 +85,7 @@
       <gmd:citation>
         <gmd:CI_Citation>
           <gmd:title>
-            <gco:CharacterString>Template for Vector data in ISO19139 (preferred!)</gco:CharacterString>
+            <gco:CharacterString>fobblers foibblers</gco:CharacterString>
           </gmd:title>
           <gmd:date>
             <gmd:CI_Date>

This example shows that the editor has made a change to the title and the dateStamp.

Note that these examples were created using command line subversion tools. The viewvc subversion repository tool has a graphical interface that allows side-by-side comparison of changes/differences between files.

All XML files describing the properties of the metadata record are generated by SELECT statements on the relevant tables in the database.
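
A minimal sketch of that generation step for privileges.xml follows. The rows are supplied in memory here rather than read from a database, and PrivilegesXmlSketch, Row and toXml are hypothetical names; the element names and indentation match the privileges.xml example shown earlier.

```java
import java.util.Arrays;
import java.util.List;

class PrivilegesXmlSketch {
    /** One result row; the fields mirror the elements of privileges.xml. */
    static class Row {
        final String groupName;
        final int operationId;
        final String operationName;

        Row(String groupName, int operationId, String operationName) {
            this.groupName = groupName;
            this.operationId = operationId;
            this.operationName = operationName;
        }
    }

    /** Render the rows, in the order given, as a privileges.xml document. */
    static String toXml(List<Row> rows) {
        StringBuilder sb = new StringBuilder("<response>\n");
        for (Row row : rows) {
            sb.append("  <record>\n");
            sb.append("    <group_name>").append(escape(row.groupName)).append("</group_name>\n");
            sb.append("    <operation_id>").append(row.operationId).append("</operation_id>\n");
            sb.append("    <operation_name>").append(escape(row.operationName)).append("</operation_name>\n");
            sb.append("  </record>\n");
        }
        sb.append("</response>\n");
        return sb.toString();
    }

    // Escape the characters that are not allowed in XML text content.
    private static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }

    public static void main(String[] args) {
        List<Row> rows = Arrays.asList(
                new Row("sample", 0, "view"),
                new Row("sample", 3, "notify"));
        System.out.print(toXml(rows));
    }
}
```

Because subversion diffs are line based, a query that returns rows in a stable order keeps the recorded differences limited to real changes; this sketch assumes the rows already arrive in such an order.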

Finally, as mentioned above, metadata records are not automatically versioned as this would impose too many overheads and is not necessary (eg. for harvested record sets). So this proposal also adds the capability for the user to select a set of records or a single record to be versioned. These interfaces are available using the usual methods below:

Patch attached to ticket #726

Backwards Compatibility Issues

None - this function can be configured off if not required.

New libraries added

tmatesoft subversion API in Java (svnkit):

<!-- svnkit stuff -->
<dependency>
  <groupId>org.tmatesoft.svnkit</groupId>
  <artifactId>svnkit</artifactId>
  <version>1.3.6-v1</version>
</dependency>

This jar is available from the maven repository at http://maven.tmatesoft.com/content/repositories/releases.

Risks

Participants

  • Simon Pigot
