= Using RDF as metadata storage =

|| '''Date''' || 2012-10-28 ||
|| '''Contact(s)''' || Simon Pigot ||
|| '''Last edited''' || ||
|| '''Status''' || draft, being discussed, in progress ||
|| '''Assigned to release''' || Not yet assigned to a release ||
|| '''Resources''' || Not allocated yet ||
|| '''Ticket #''' || #XYZ ||

== Overview ==

!GeoNetwork stores metadata records from different schemas as rows in a database table. To provide search, a metadata record is processed as follows:

 * the record is transformed into a common XML index document via XSLT;
 * the common XML document is ingested by Lucene, which creates an index of the fields within the document;
 * the Lucene index and query format are used for searching.

The essence of this proposal is to change this process as follows:

 * transform the metadata record into an RDF (Resource Description Framework) document when it is ingested by !GeoNetwork
 * store the RDF document in an RDF triple store
 * use the RDF triple store and the SPARQL query language for searching

Why would we do this?
 * Simplify the architecture of !GeoNetwork: metadata would be stored and searched in the same persistence solution, the RDF triple store
 * RDF is purpose-designed for representing facts and relationships between facts
 * RDF triple stores and the SPARQL/GeoSPARQL query languages are designed to query facts and relationships between facts

=== Proposal Type ===

 * '''Type''': Core Change
 * '''App''': !GeoNetwork
 * '''Module''': Data Manager, Search

=== Links ===

 * '''Introduction to RDF''': http://www.rdfabout.com/quickintro.xpd
 * '''Mappings from ISO metadata standards to RDF''': http://def.seegrid.csiro.au/isotc211/iso19115/2003 (mapping from ISO19115 to RDF), http://def.seegrid.csiro.au/isotc211/iso19119/2005/ (mapping from ISO19119 to RDF)
 * '''GIT Repository''': https://github.com/cipherj/core-geonetwork.git (rdf-store branch)
 * '''Apache JENA''': http://jena.apache.org/ (RDF triple store used in UWA patch)
 * '''Geospatial reasoning for Apache JENA''': http://code.google.com/p/geospatialweb/
 * '''Geospatial reasoning for OpenRDF-sesame''': https://dev.opensahara.com/projects/useekm

=== Voting History ===

 * Not proposed for voting yet.

----

== Motivations ==

TBA

== Proposal ==

TBA

=== Issues ===

Object Identifiers: One of the stated key advantages of RDF is that objects are identified once and then reused. In the work done to date, I don't see how converting a record to RDF will identify the individual objects for reuse. For example, if a piece of contact info is present in two different metadata records, how is that object uniquely identified? Perhaps the object identifier could be derived from an md5sum of the content of the object?

Profile support in ISO19115 mapping: introduce additional RDF namespaces/concepts?

Relationship to DCAT proposal?

Speed of RDF triple stores versus Lucene?
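
The content-derived identifier suggested under Object Identifiers can be sketched in a few lines. This is only an illustration of the idea, not code from the UWA patch: the function name `contact_uri`, the base URI `http://example.org/contact/` and the normalisation rules (sorted keys, collapsed whitespace, lowercasing) are all assumptions made for the example. MD5 is used because the text suggests md5sum; any stable hash would serve the same purpose.

```python
import hashlib


def contact_uri(contact: dict, base: str = "http://example.org/contact/") -> str:
    """Derive a stable identifier from the content of a contact object.

    Hypothetical helper: the field names, the base URI and the
    normalisation rules are assumptions, not part of the proposal.
    """
    parts = []
    # Sort the keys so that field order does not change the hash.
    for key in sorted(contact):
        # Collapse runs of whitespace and lowercase the value so that
        # trivially different spellings map to the same identifier.
        value = " ".join(str(contact[key]).split()).lower()
        parts.append(f"{key}={value}")
    canonical = "\n".join(parts)
    digest = hashlib.md5(canonical.encode("utf-8")).hexdigest()
    return base + digest


# The same contact info found in two different metadata records.
record_a_contact = {"name": "Jane Doe", "email": "Jane.Doe@example.org"}
record_b_contact = {"email": "  jane.doe@example.org ", "name": "Jane   Doe"}

# Both records resolve to one identifier, so the object can be reused.
same_object = contact_uri(record_a_contact) == contact_uri(record_b_contact)
```

With this approach the identifier depends entirely on the normalisation step: two contacts that differ in any field that is not normalised away (for example a changed phone number) would become two distinct objects, which may or may not be the desired behaviour.
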
Free text search in Apache JENA RDF triple store/SPARQL queries is supported by using Lucene to help - see the LARQ sub-project: http://jena.apache.org/documentation/larq/index.html

Spatial searching: At present we can do mixed spatial and textual searches for OGC CSW support by filtering Lucene searches with query results from the spatial database. How would this work in SPARQL? OGC GeoSPARQL would be the approach here I suppose: http://code.google.com/p/geospatialweb/ How mature is the GeoSPARQL implementation for Apache JENA?

Two RDF triple stores now used in !GeoNetwork: OpenRDF-sesame and now Apache JENA?

=== Backwards Compatibility Issues ===

We have begun to use Lucene as a very fast persistence layer in place of the database (cf. for example, search service q). We need to determine whether these queries can also be run quickly against the RDF triple store.

RDF mappings for other standards? Most popular standards will probably have projects in place or ongoing to do mappings to RDF (eg. Dublin Core). However, some of the more substantial ISO efforts (ISO19110, ISO19135 and others) are less likely to have these, so producing them could be a fair body of work. That said, some of the metadata standards have some concepts mapped to ISO19115, so it could be reasonably straightforward to use that mapping to reach the RDF for ISO19115.

=== New libraries added ===

Apache JENA - used as RDF triple store in UWA patch (but OpenRDF-sesame is already being used in !GeoNetwork).

== Risks ==

RDF and the semantic web have been the 'coming' technology for some time now. Cynically speaking, could this be another ebRIM? This risk is somewhat mitigated by the maturity of OpenRDF-sesame and Apache JENA and by the fact that !GeoNetwork already relies upon OpenRDF-sesame for vocabulary support.
== Participants ==

 * Simon Cox, CESRE (CSIRO Australia)
 * Wahhaj Ali, Tianyi Chen, Cameron Fitzgerald, Joshua Hollick, Saxon Jensen, Rebecca Papadopoulos - University of Western Australia
 * Simon Pigot, CSIRO Australia and !GeoNetwork PSC member