Version 8 (modified by simonp, 12 years ago)

Using RDF

Date: 2012-10-28
Contact(s): Simon Pigot
Last edited:
Status: draft, being discussed, in progress
Assigned to release: Not yet assigned to a release
Resources: Not allocated yet
Ticket #: #XYZ

Overview

GeoNetwork stores metadata records from different schemas as rows in a database table. To provide search, each metadata record is processed as follows:

  • the record is transformed into a common XML index document via XSLT;
  • the common XML document is ingested by Lucene, which creates an index of the fields within the document;
  • the Lucene index and query format are used for searching.

The essence of this proposal is to change this process as follows:

  • transform the metadata record into an RDF (Resource Description Framework) document when it is ingested by GeoNetwork
  • store the RDF document in an RDF triple store
  • use the RDF triple store and the SPARQL query language for searching
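
The three steps above can be sketched in a few lines of Python. This is only an illustration of the idea, not the proposed implementation: the in-memory TripleStore class stands in for a real RDF triple store, the match() method stands in for a SPARQL query, and the record URIs, field values and the choice of Dublin Core terms as predicates are all assumptions made for the example.

```python
from typing import Iterable, Iterator, Optional

Triple = tuple[str, str, str]

# Predicates borrowed from Dublin Core terms; the proposal does not say
# which vocabulary would be used, so this choice is an assumption.
DCT_TITLE = "http://purl.org/dc/terms/title"
DCT_SUBJECT = "http://purl.org/dc/terms/subject"


def record_to_triples(record_uri: str, fields: dict[str, list[str]]) -> list[Triple]:
    """Flatten already-extracted metadata fields into (subject, predicate, object) triples."""
    return [
        (record_uri, predicate, value)
        for predicate, values in fields.items()
        for value in values
    ]


class TripleStore:
    """Toy in-memory stand-in for an RDF triple store (hypothetical helper)."""

    def __init__(self) -> None:
        self._triples: set[Triple] = set()

    def add_all(self, triples: Iterable[Triple]) -> None:
        self._triples.update(triples)

    def match(
        self,
        s: Optional[str] = None,
        p: Optional[str] = None,
        o: Optional[str] = None,
    ) -> Iterator[Triple]:
        """Yield triples matching the pattern; None behaves like a query variable."""
        for t in self._triples:
            if (s is None or t[0] == s) and (p is None or t[1] == p) and (o is None or t[2] == o):
                yield t


# Ingest two hypothetical records, then search by subject keyword.
store = TripleStore()
store.add_all(record_to_triples(
    "urn:uuid:record-1", {DCT_TITLE: ["Rainfall grid"], DCT_SUBJECT: ["climate"]}))
store.add_all(record_to_triples(
    "urn:uuid:record-2", {DCT_TITLE: ["Soil map"], DCT_SUBJECT: ["climate", "soil"]}))

climate_records = sorted(s for s, _, _ in store.match(p=DCT_SUBJECT, o="climate"))
print(climate_records)  # ['urn:uuid:record-1', 'urn:uuid:record-2']
```

With a real triple store, the match() call would be replaced by a SPARQL SELECT query with the same triple pattern, and the storage and the search index would be the same thing, which is the architectural simplification the proposal is after.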

Why would we do this?

  • Simplify the architecture of GeoNetwork (Lucene would no longer be needed and metadata would be stored and searched in the same persistence solution)
  • RDF is purpose-designed for representing facts and relationships between facts
  • RDF triple stores and the SPARQL query language are designed to query facts and relationships between facts

Proposal Type

  • Type: Core Change
  • App: GeoNetwork
  • Module: Data Manager, Search
  • Documents:
  • Email discussions:
  • Other wiki discussions:

Voting History

  • Not proposed for voting yet.

Motivations

The current configuration is .... A change to ....

Proposal

An in-depth proposal can be found here: link ...

Unanswered Questions

Object Identifiers: One of the stated key advantages of RDF is that objects are identified once and then reused. In the work done to date, I don't see how converting a record to RDF will identify the individual objects for reuse. For example, if the same piece of contact info is present in two different metadata records, how is that object uniquely identified? Perhaps the object identifier could be derived from an MD5 hash (md5sum) of the content of the object?
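
As a starting point for that discussion, the hash-based idea could look something like the sketch below. Everything in it is an assumption made for illustration: the contact_uri function, the base URI http://example.org/contact/, the field names, and the normalisation rules (collapsing whitespace and lowercasing values before hashing) are not taken from any existing GeoNetwork code.

```python
import hashlib
import json


def contact_uri(contact: dict[str, str], base: str = "http://example.org/contact/") -> str:
    """Derive an identifier from the content of a contact object (hypothetical helper).

    Values are whitespace-collapsed and lowercased, and keys are sorted, so
    that trivially different copies of the same contact hash to the same URI.
    """
    normalised = {key: " ".join(value.split()).lower() for key, value in contact.items()}
    canonical = json.dumps(normalised, sort_keys=True, separators=(",", ":"), ensure_ascii=False)
    digest = hashlib.md5(canonical.encode("utf-8")).hexdigest()
    return base + digest


# The same contact as it might appear in two different metadata records.
in_record_1 = {"name": "Jane Example", "email": "jane@example.org"}
in_record_2 = {"email": " JANE@example.org ", "name": "Jane  Example"}

print(contact_uri(in_record_1) == contact_uri(in_record_2))  # True
```

The obvious limitation is that two copies of a contact that differ in any real way (an updated phone number, a missing field) would get different identifiers, so a content hash only detects exact duplicates after normalisation.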

Backwards Compatibility Issues

Heaps: we have begun to use Lucene as a very fast persistence layer in place of the database (cf., for example, the search service q).

New libraries added

Explain which and why new libraries are required for that proposal ...

Risks

Participants

  • Simon Cox, CESRE (CSIRO Australia)
  • Wahhaj Ali, Tianyi Chen, Cameron Fitzgerald, Joshua Hollick, Saxon Jensen, Rebecca Papadopoulos - University of Western Australia
  • Simon Pigot
