
Version 5 (modified by simonp, 12 years ago)


Metadata change logging/revision log

Date: 2012-01-08
Contact(s): Simon Pigot
Last edited: 2012-01-08T23:00:00
Status: being discussed, in progress with implementation
Assigned to release: to be determined
Resources: Available
Ticket #: No ticket yet

Overview

There are many use cases in which the complete history of changes to a metadata record and its properties (e.g. privileges, categories and status) needs to be captured. This proposal adds a local filesystem subversion repository to GeoNetwork to record that history.

Proposal Type

  • Type: Core Change
  • App: GeoNetwork
  • Module: Kernel, Data Manager
  • Documents:
  • Email discussions:
  • Other wiki discussions:

Voting History

  • Not yet proposed.

Motivations

In its current form GeoNetwork does not capture any details about changes to metadata records or to their properties (e.g. privileges, categories, status). Only the latest version of each metadata record and its current properties are available. However, there are many use cases where it is important to be able to track, over time:

  • changes to the metadata record, i.e. changes to individual elements
  • changes to properties of the metadata record, e.g. privileges, categories, status

As we use a database to hold both the metadata records and their properties, we could implement history tables to capture these changes and provide a user interface for querying them. Alternatively, we could use a subversion repository to capture these changes and let users examine them through the existing visual interfaces to subversion repositories, e.g. ViewVC.

Proposal

Using SVNKit, the open source Java API to subversion from tmatesoft, we will implement change tracking for metadata records and their properties in a subversion repository that is created and maintained by the GeoNetwork code.

Not all records will be tracked, because the compute and system administration cost of tracking every record, particularly in larger catalogs, is too high. Instead, only records that are edited and updated within the local GeoNetwork instance will be tracked in the subversion repository.

The database will remain the source of truth for GeoNetwork: changes will be tracked in subversion, but the database will continue to be the store the code reads from. For example, although the latest version of a metadata record could be extracted from the subversion repository, the code will continue to read it from the database table.
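The two rules above, tracking only locally edited records and keeping the database authoritative, can be illustrated with a minimal Java sketch. Everything in it is hypothetical: `MetadataHistoryService`, the in-memory collections standing in for the metadata table and the subversion repository, the use of harvesting as the example of an untracked path, and the `<id>/metadata.xml` repository layout. In a real implementation the "add" and "modify" entries would presumably become `addFile` and `openFile` calls on SVNKit's commit editor (`ISVNEditor`).

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical service: the database map is authoritative, the history log is write-only. */
class MetadataHistoryService {
    // Stand-in for the metadata table, which remains the source of truth.
    private final Map<String, String> database = new HashMap<>();
    // Stand-in for the subversion repository: one entry per recorded change.
    private final List<String> history = new ArrayList<>();
    // Ids of records that have already been added to the history store.
    private final Set<String> tracked = new HashSet<>();

    /** Harvested records are written to the database only and are never tracked. */
    void storeHarvested(String id, String xml) {
        database.put(id, xml);
    }

    /** Local edits update the database first, then record the change in the history store. */
    void updateLocally(String id, String xml) {
        database.put(id, xml);
        String path = id + "/metadata.xml"; // hypothetical repository layout
        // The first local edit adds the record to the history; later edits modify it.
        String action = tracked.add(id) ? "add " : "modify ";
        history.add(action + path);
    }

    /** Reads always come from the database, never from the history store. */
    String get(String id) {
        return database.get(id);
    }

    List<String> history() {
        return history;
    }
}
```

Because a record only enters the history on its first local edit, records that are harvested and never edited cost nothing in the repository.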

Using a subversion repository in place of database history tables means that the subversion repository and the database must be kept consistent, i.e. both must be committed or both aborted. In developing this proposal we've examined two approaches:

  • auto commit the subversion repository after every change to the metadata record or its properties
  • commit/abort the subversion repository and database at the same time

The first approach is the easiest to code, particularly with regard to keeping the subversion repository and the database consistent: if the database commit fails, we simply avoid committing any changes to the subversion repository, and if any of the subversion repository commits fail, we abort the database commit as well. However, except for the simplest operations on a single record, and without possibly substantial changes to the existing code, the changes recorded in the subversion repository will bear little or no resemblance to the changes made by GeoNetwork services. For example, if the user changes the privileges on a metadata record, this would result in more than one commit to the subversion repository (in fact, one commit for each group permission selected in the privileges interface).
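A minimal sketch of why the first approach yields one commit per group follows. It assumes, hypothetically, that the privileges service saves group permissions one at a time and that each save triggers an auto-commit. `AutoCommitHistory` is an in-memory stand-in for the subversion repository and the `<id>/privileges.xml` path is an invented layout; neither is part of GeoNetwork or SVNKit.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical stand-in for the subversion repository: every commit creates a new revision. */
class AutoCommitHistory {
    private final Map<String, String> files = new HashMap<>();
    private long revision = 0;

    /** Stores one change and immediately creates a new revision. */
    long commit(String path, String content) {
        files.put(path, content);
        return ++revision;
    }

    long latestRevision() {
        return revision;
    }

    String read(String path) {
        return files.get(path);
    }
}

/** Hypothetical privileges update written in the auto-commit style. */
class AutoCommitPrivileges {
    /** Mirrors a privileges form that saves one group permission at a time. */
    static void setPrivileges(AutoCommitHistory history, String id, List<String> groups) {
        String path = id + "/privileges.xml"; // hypothetical repository layout
        StringBuilder privileges = new StringBuilder();
        for (String group : groups) {
            privileges.append("<group>").append(group).append("</group>");
            // Auto-commit: every saved group permission becomes its own revision.
            history.commit(path, privileges.toString());
        }
    }
}
```

With three groups selected, the history ends at revision 3, even though the user performed a single privileges update; the two intermediate revisions describe states that no GeoNetwork service call ever produced.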

The second approach is more difficult to code: subversion changes need to be bundled by keeping the subversion commit editor open and using a listener to commit or abort the changes to the subversion repository when the database is committed or aborted. This is further complicated by two aspects of the tmatesoft API: it does not allow reentrant calls on a subversion repository object, and the commit editor cannot open the same file or directory in the repository more than once, as described at http://osdir.com/ml/version-control.subversion.javasvn.user/2007-10/msg00053.html.
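The bundling in the second approach can be sketched as a listener that buffers changes, keyed by repository path, until the database transaction ends. This is a hypothetical Java illustration: the class names, the `<id>/...xml` paths and the in-memory `BundledHistory` store are assumptions. It also swaps in a variation: rather than keeping the commit editor open for the whole transaction, it buffers changes and replays them once on commit, which is one possible way to respect the rule that the editor may open each file only once. In a real implementation `onDatabaseCommit` would drive SVNKit's commit editor; the sketch does not address the reentrancy restriction or grouping of files by directory.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical stand-in for the subversion repository: one change set per revision. */
class BundledHistory {
    private final Map<String, String> files = new LinkedHashMap<>();
    private long revision = 0;

    /** Applies every change in the change set as a single new revision. */
    long commit(Map<String, String> changeSet) {
        files.putAll(changeSet);
        return ++revision;
    }

    long latestRevision() {
        return revision;
    }

    String read(String path) {
        return files.get(path);
    }
}

/** Hypothetical listener that buffers changes until the database transaction ends. */
class HistoryTransactionListener {
    private final BundledHistory history;
    // Keyed by path: a later change to the same path replaces the earlier one, so the
    // change set handed to the repository contains each path exactly once.
    private final Map<String, String> pending = new LinkedHashMap<>();

    HistoryTransactionListener(BundledHistory history) {
        this.history = history;
    }

    void recordChange(String path, String content) {
        pending.put(path, content);
    }

    /** Called after the database commit succeeds: one repository commit per transaction. */
    void onDatabaseCommit() {
        if (!pending.isEmpty()) {
            history.commit(new LinkedHashMap<>(pending));
        }
        pending.clear();
    }

    /** Called when the database transaction is rolled back: nothing reaches the repository. */
    void onDatabaseAbort() {
        pending.clear();
    }
}
```

Coalescing by path means a privileges update touching several groups, together with any other property changes made in the same service call, ends up as a single revision. The trade-off is that intermediate states within one transaction are not kept, which is the intended behaviour if commits are to mirror GeoNetwork service calls.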

Backwards Compatibility Issues

None - this function can be configured off if not required.

New libraries added

SVNKit, the tmatesoft subversion API in Java:

      <!-- svnkit stuff -->
      <dependency>
        <groupId>org.tmatesoft.svnkit</groupId>
        <artifactId>svnkit</artifactId>
        <version>1.3.6-v1</version>
      </dependency>

This jar is available from the maven repository at http://maven.tmatesoft.com/content/repositories/releases.

Risks

Participants

  • Simon Pigot

Attachments (4)
