
GeoNetwork Data Catalog Vocabulary services

Date 2012/05/10
Contact(s) François Prunayre, Paul Hasenohr
Last edited
Status Motion passed - Done
Assigned to release 2.9
Resources Available (funding EEA)
Ticket # #912
Github source https://github.com/fxprunayre/core-geonetwork/tree/feature/dcat-rdf

  1. Overview
    1. Proposal Type
    2. Links
    3. Voting history
  2. Proposal
    1. RDF Model
    2. Formats
    3. Services
    4. Site map
  3. Using RDF outputs
    1. Save metadata as xml
    2. Visualization tools
    3. SPARQL queries
  4. Risks
  5. Future improvement
  6. Participants

Overview

Data Catalog Vocabulary services in GeoNetwork opensource increase discoverability and make it easy for applications to consume metadata. These services provide information about three types of objects:

  • the catalogue,
  • the datasets and services in the catalogue,
  • and the links to distributed resources.

The description includes relations to thesauri (e.g. GEMET), keywords and organizations, so the output can be used in a Linked Data context.

The output produced by the services is based on DCAT, an RDF vocabulary. Two types of services are provided:

  • a metadata service, to access a single metadata record;
  • a search service, to search the catalogue and retrieve a set of metadata records.

The Data Catalog Vocabulary services can be used by Semantic Web tools to harvest, search (e.g. using SPARQL) and link catalogue content with other interlinked resources.

A semantic sitemap is also published so that crawlers can harvest the catalogue.

Proposal Type

  • Type: Discoverability
  • App: GeoNetwork
  • Module: Metadata and search services

Voting history

Vote proposed by Francois on 2012/07/04; the result was:

  • +1 from Jeroen, Simon, Francois

Proposal

RDF Model

An RDF model is defined for the ISO19139, ISO19110 and Dublin Core standards in order to cover most of the metadata in the catalogue. The model is based on DCAT, which is "an RDF vocabulary designed to facilitate interoperability between data catalogs published on the Web".

Vocabularies:

Prefix   Specification                                        Namespace
dcat     http://www.w3.org/TR/vocab-dcat/#class--catalog      http://www.w3.org/ns/dcat#
void     http://www.w3.org/TR/void/                           http://rdfs.org/ns/void#
dc       http://dublincore.org/                               http://purl.org/dc/elements/1.1/
dcterms  -                                                    http://purl.org/dc/terms/
dctype   -                                                    http://purl.org/dc/dcmitype/
foaf     http://xmlns.com/foaf/spec/                          http://xmlns.com/foaf/0.1/
skos     http://www.w3.org/2009/08/skos-reference/skos.html#  http://www.w3.org/2004/02/skos/core#
rdf      -                                                    http://www.w3.org/1999/02/22-rdf-syntax-ns#
rdfs     -                                                    http://www.w3.org/2000/01/rdf-schema#
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
    xmlns:foaf="http://xmlns.com/foaf/0.1/"
    xmlns:void="http://rdfs.org/ns/void#"
    xmlns:dcat="http://www.w3.org/ns/dcat#"
    xmlns:dc="http://purl.org/dc/elements/1.1/"
    xmlns:dct="http://purl.org/dc/terms/"
    xmlns:dctype="http://purl.org/dc/dcmitype/"
    xmlns:skos="http://www.w3.org/2004/02/skos/core#"
    >
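When post-processing the RDF output, tools often need the full URIs behind these prefixes. A minimal Python sketch, assuming only the namespaces listed in the vocabulary table, plus dct, the alias the RDF/XML examples use for dcterms. expand and clark are hypothetical helper names, not GeoNetwork APIs:

```python
# Prefix -> namespace map taken from the vocabulary table (illustrative helper).
NAMESPACES = {
    "dcat": "http://www.w3.org/ns/dcat#",
    "void": "http://rdfs.org/ns/void#",
    "dc": "http://purl.org/dc/elements/1.1/",
    "dcterms": "http://purl.org/dc/terms/",
    "dct": "http://purl.org/dc/terms/",  # alias used in the RDF/XML examples
    "dctype": "http://purl.org/dc/dcmitype/",
    "foaf": "http://xmlns.com/foaf/0.1/",
    "skos": "http://www.w3.org/2004/02/skos/core#",
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
}


def expand(curie: str) -> str:
    """Expand a prefixed name such as 'dcat:Dataset' into a full URI."""
    prefix, sep, local = curie.partition(":")
    if not sep or prefix not in NAMESPACES:
        raise ValueError(f"unknown prefix in {curie!r}")
    return NAMESPACES[prefix] + local


def clark(curie: str) -> str:
    """Same name in '{namespace}local' form; raises KeyError for unknown prefixes."""
    prefix, _, local = curie.partition(":")
    return "{" + NAMESPACES[prefix] + "}" + local
```

clark() returns the notation that Python's xml.etree.ElementTree uses for tag names, which is convenient when reading the RDF/XML output with the standard library.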

Classes:

  • Catalogue (dcat:Catalog): the local catalogue or any harvested node.
     <!-- First, the local catalog description using dcat:Catalog.
            "Typically, a web-based data catalog is represented as a single instance of this class."
            ... also describe harvested catalogues if harvested records are in the current dump.
        -->
        <dcat:Catalog rdf:about="http://localhost:8080/geonetwork">
            <!-- A name given to the catalog. -->
            <dct:title xml:lang="en">My GeoNetwork geospatial metadata catalogue</dct:title>
            
            <!-- free-text account of the catalog. -->
            <dct:description></dct:description>
            
            <rdfs:label xml:lang="en">My GeoNetwork geospatial metadata catalogue</rdfs:label>
            
            <!-- The homepage of the catalog -->
            <foaf:homepage>http://localhost:8080/geonetwork</foaf:homepage>
            
            <!-- FIXME : void:Dataset -->
            <void:openSearchDescription>http://localhost:8080/geonetwork/srv/eng/portal.opensearch</void:openSearchDescription>
            <void:uriLookupEndpoint>http://localhost:8080/geonetwork/search/rdf?any=</void:uriLookupEndpoint>
            
            
            <!-- The entity responsible for making the catalog online. -->
            <dct:publisher rdf:resource="http://localhost:8080/geonetwork/organization/1"/>
            
            <!-- The knowledge organization system (KOS) used to classify catalog's datasets. -->
            <dcat:themes rdf:resource="http://localhost:8080/geonetwork/thesaurus/id"/>
            
            <!-- The language of the catalog. This refers to the language used 
                in the textual metadata describing titles, descriptions, etc. 
                of the datasets in the catalog. 
            
                http://www.ietf.org/rfc/rfc3066.txt
                
                Multiple values can be used. The publisher might also choose to describe 
                the language on the dataset level (see dataset language).
            -->
            <dct:language>en</dct:language>
            
            
            <!-- This describes the license under which the catalog can be used/reused and not the datasets. 
                Even if the license of the catalog applies to all of its datasets it should be 
                replicated on each dataset.-->
            <dct:license>
                <!-- TODO using VoID-->
            </dct:license>
            
            <!-- The geographical area covered by the catalog. -->
            <dct:spatial>
                <dct:Location>
                    <!-- TODO -->
                </dct:Location>
            </dct:spatial>
            
            <!-- List all catalogue records -->
            <dcat:dataset rdf:resource="http://localhost:8080/geonetwork/dataset/1"/>
            <dcat:record rdf:resource="http://localhost:8080/geonetwork/metadata/1"/>
        </dcat:Catalog>
        
    
  • Organization (foaf:Organization)
        <!-- Organization description. 
            An organization can be linked to a catalogue or a catalogue record.
            
            xpath: //gmd:organisationName
        -->
        <foaf:Organization rdf:about="http://localhost:8080/geonetwork/organization/1">
            <foaf:name></foaf:name>
            <!-- xpath: gmd:organisationName/gco:CharacterString -->
            <foaf:member rdf:resource="http://localhost:8080/geonetwork/person/ID"/>
        </foaf:Organization>
        
        <!-- Organization member
            
            xpath: //gmd:CI_ResponsibleParty-->
        <foaf:Person rdf:about="http://localhost:8080/geonetwork/person/ID">
            <foaf:name></foaf:name>
            <!-- xpath: gmd:individualName/gco:CharacterString -->
            <foaf:phone></foaf:phone>
            <!-- xpath: gmd:contactInfo/gmd:CI_Contact/gmd:phone/gmd:CI_Telephone/gmd:voice/gco:CharacterString -->
            <foaf:mbox></foaf:mbox>
            <!-- xpath: gmd:contactInfo/gmd:CI_Contact/gmd:address/gmd:CI_Address/gmd:electronicMailAddress/gco:CharacterString -->
        </foaf:Person>
        
    
  • Dataset (dcat:CatalogRecord+dcat:Dataset)
        
        <!-- Catalogue records
            "A record in a data catalog, describing a single dataset."        
            
            xpath: //gmd:MD_Metadata|//*[@gco:isoType='gmd:MD_Metadata']
        -->
        <dcat:CatalogRecord>
            <!-- Link to a dcat:Dataset, or to an rdf:Description for services and feature catalogues. -->
            <foaf:primaryTopic rdf:resource="http://localhost:8080/geonetwork/metadata/uuid"/>
            
            <!-- Metadata change date.
                "The date is encoded as a literal in "YYYY-MM-DD" form (ISO 8601 Date and Time Formats)." -->
            <dct:issued></dct:issued>
            <dct:modified></dct:modified>
            <!-- xpath: gmd:dateStamp/gco:DateTime -->
        </dcat:CatalogRecord>
    
    
    
        <!-- Dataset
            "A collection of data, published or curated by a single source, and available for access or 
            download in one or more formats."
            
            xpath: //gmd:MD_DataIdentification|//*[@gco:isoType='gmd:MD_DataIdentification']
        -->
        <dcat:Dataset rdf:about="http://localhost:8080/geonetwork/metadata/uuid">
            
            <!-- "A unique identifier of the dataset." -->
            <dct:identifier></dct:identifier>
            <!-- xpath: gmd:identificationInfo/*/gmd:citation/*/gmd:identifier/*/gmd:code --> 
    
    
            <dc:title></dc:title>
            <!-- xpath: gmd:identificationInfo/*/gmd:citation/*/gmd:title/gco:CharacterString -->
            
            
            <dct:abstract></dct:abstract>
            <!-- xpath: gmd:identificationInfo/*/gmd:abstract/gco:CharacterString -->
            
                        
            <!-- "A keyword or tag describing the dataset."
                Create dcat:keyword if no thesaurus name information is available.
            -->
            <dcat:keyword></dcat:keyword>
            <!-- xpath: gmd:identificationInfo/*/gmd:descriptiveKeywords/gmd:MD_Keywords[not(gmd:thesaurusName)]/gmd:keyword/gco:CharacterString -->
            
            
            <!-- "The main category of the dataset. A dataset can have multiple themes." 
                Create dcat:theme for gmx:Anchor keywords, GEMET concepts or INSPIRE themes.
            -->
            <dcat:theme rdf:resource="http://localhost:8080/geonetwork/thesaurus/thesaurusId/subjectId"/>
            <!-- xpath: gmd:identificationInfo/*/gmd:descriptiveKeywords/gmd:MD_Keywords/gmd:keyword/gmx:Anchor -->
            <!-- xpath: gmd:identificationInfo/*/gmd:descriptiveKeywords/gmd:MD_Keywords/gmd:keyword/gco:CharacterString -->
            <!-- xpath: gmd:identificationInfo/*/gmd:topicCategory/gmd:MD_TopicCategoryCode -->
            
            
            <!-- Thumbnail -->
            <foaf:thumbnail rdf:resource=""/>
            <!-- xpath: gmd:identificationInfo/*/gmd:graphicOverview/gmd:MD_BrowseGraphic/gmd:fileName/gco:CharacterString -->
            
            
            <!-- "Spatial coverage of the dataset." -->
            <dct:spatial>
                <ogc:Polygon xmlns:ogc="http://www.opengis.net/rdf#">
                    <ogc:asWKT rdf:datatype="http://www.opengis.net/rdf#WKTLiteral">
                        &lt;http://www.opengis.net/def/crs/OGC/1.3/CRS84&gt; Polygon((13.208233 50.71671, 13.208233 51.24864, 14.40099 51.24864, 14.40099 50.71671, 13.208233 50.71671))
                    </ogc:asWKT>
                </ogc:Polygon>
            </dct:spatial>
            <!-- xpath: gmd:identificationInfo/*/gmd:extent/*/gmd:geographicElement/gmd:EX_GeographicBoundingBox --> 
            
            
            <!-- "The temporal period that the dataset covers." -->
            <dct:temporal></dct:temporal>
            <!-- xpath: gmd:identificationInfo/*/gmd:extent/*/gmd:temporalElement --> 
            
            
            <dct:issued></dct:issued>
            <dct:modified></dct:modified>
            
            <!-- "An entity responsible for making the dataset available" -->
            <dct:publisher rdf:resource="http://localhost:8080/geonetwork/organization/contactId"/>
            <!-- xpath: gmd:identificationInfo/*/gmd:pointOfContact -->
            
            
            <!-- "The frequency with which dataset is published." See placetime.com intervals. -->
            <dct:accrualPeriodicity></dct:accrualPeriodicity>
            <!-- xpath: gmd:identificationInfo/*/gmd:resourceMaintenance/gmd:MD_MaintenanceInformation/gmd:maintenanceAndUpdateFrequency/gmd:MD_MaintenanceFrequencyCode/@codeListValue -->
            
            <!-- "This is usually geographical or temporal but can also be other dimension" ??? -->
            <dcat:granularity></dcat:granularity>
            <!-- xpath: gmd:identificationInfo/*/gmd:spatialResolution/gmd:MD_Resolution/gmd:equivalentScale/gmd:MD_RepresentativeFraction/gmd:denominator/gco:Integer -->
            
            
            <!-- 
                "The language of the dataset."
                "This overrides the value of the catalog language in case of conflict"
            -->
            <dct:language></dct:language>
            <!-- xpath: gmd:identificationInfo/*/gmd:language/gmd:LanguageCode/@codeListValue -->
            
            
            <!-- "The license under which the dataset is published and can be reused." -->
            <dct:license></dct:license>
            <!-- xpath: gmd:identificationInfo/*/gmd:resourceConstraints/??? -->
            
            <dcat:distribution rdf:resource=""/>
            <!-- xpath: gmd:distributionInfo/*/gmd:transferOptions/*/gmd:onLine/gmd:CI_OnlineResource -->
            
            
            <!-- ISO19110 relation 
                "This usually consists of a table providing explanation of columns meaning, values interpretation and acronyms/codes used in the data."
            -->
            <dcat:dataDictionary rdf:resource=""/>
            <!-- xpath: gmd:contentInfo/gmd:MD_FeatureCatalogueDescription/gmd:featureCatalogueCitation/@uuidref -->
            
            <!-- 
                "A related document such as technical documentation, agency program page, citation, etc."            
            -->
            <dct:references rdf:resource="url?"/>
            <!-- xpath: gmd:identificationInfo/*/gmd:citation/*/gmd:otherCitationDetails/gco:CharacterString -->
            
            
            <!-- "describes the quality of data." -->
            <dcat:dataQuality>
                <!-- rdfs:literal -->
            </dcat:dataQuality>
            <!-- xpath: gmd:dataQualityInfo/*/gmd:lineage/gmd:LI_Lineage/gmd:statement/gco:CharacterString -->
            
            
            <!-- FIXME ? 
                <void:dataDump></void:dataDump>-->
        </dcat:Dataset>
    
    
    
  • Series (dcat:CatalogRecord+dcat:Dataset+dc:relation)
  • Service (dcat:CatalogRecord+rdf:Description+dc:relation)
        <!-- Service 
            Create a simple rdf:Description. To be improved.
            
            xpath: //srv:SV_ServiceIdentification|//*[@gco:isoType='srv:SV_ServiceIdentification']
        -->
        <rdf:Description rdf:about="http://localhost:8080/geonetwork/metadata/uuid">
            <!-- Same properties as dcat:Dataset, without the dcat:* ones -->
        </rdf:Description>
    
  • Feature catalogue (rdf:Description+dc:relation)
        <!-- Feature Catalogue     
            Create a simple rdf:Description. To be improved.
            
            
            xpath: //gfc:FC_FeatureCatalogue
        -->
        <rdf:Description rdf:about="http://localhost:8080/geonetwork/metadata/uuid">
            <dc:title></dc:title>
        </rdf:Description>
    
  • Thesaurus (skos:ConceptScheme)
        <!-- A ConceptScheme describes each thesaurus available in the catalogue.
        * The resource identifier is a local identifier for a local thesaurus, or its public URI if external.
        -->
        <skos:ConceptScheme rdf:about="http://localhost:8080/geonetwork/thesaurus/external.theme.inspire-theme">
            <dc:title>GEMET - INSPIRE themes, version 1.0</dc:title>
            <dc:description>INSPIRE themes thesaurus for GeoNetwork opensource.</dc:description>
            <dc:creator>
                <foaf:Organization>
                    <foaf:name>EEA</foaf:name>
                </foaf:Organization>
            </dc:creator>
            <dc:uri>http://www.eionet.europa.eu/gemet/about?langcode=en</dc:uri>
            <dc:rights>http://www.eionet.europa.eu/gemet/about?langcode=en</dc:rights>
            <dct:issued>2008-06-01</dct:issued>
            <dct:modified>2008-06-01</dct:modified>
        </skos:ConceptScheme>
        
    
  • Keyword (skos:Concept)
        
        <!-- Keywords -->
        <skos:Concept rdf:about="http://localhost:8080/geonetwork/thesaurus/thesaurusId/subjectId">
            <skos:inScheme rdf:resource="http://localhost:8080/geonetwork/thesaurus/id"/>
            <skos:prefLabel></skos:prefLabel>
        </skos:Concept>
        
    
  • Online resources (dcat:Distribution)
        <!-- Distribution 
            "Represents a specific available form of a dataset. Each dataset might be available in different 
            forms, these forms might represent different formats of the dataset, different endpoints,... 
            Examples of Distribution include a downloadable CSV file, an XLS file representing the dataset, 
            an RSS feed ..."
            
            Download, WebService, Feed
            
            xpath: //gmd:distributionInfo/*/gmd:transferOptions/*/gmd:onLine/gmd:CI_OnlineResource
        -->
        <dcat:Distribution rdf:about="accessURL ?">
            <!-- 
                "points to the location of a distribution. This can be a direct download link, a link 
                to an HTML page containing a link to the actual data, Feed, Web Service etc. 
                the semantic is determined by its domain (Distribution, Feed, WebService, Download)." 
            -->
            <dcat:accessURL></dcat:accessURL>
            <!-- xpath: gmd:linkage/gmd:URL -->
            
            <dct:title></dct:title>
            <!-- xpath: gmd:name/gco:CharacterString -->
            
            <!-- "The size of a distribution.":N/A 
            <dcat:size></dcat:size>
            -->
            
            
            <dct:format>
                <!-- 
                    "the file format of the distribution." 
                    
                    "MIME type is used for values. A list of MIME types URLs can be found at IANA. 
                    However ESRI Shape files have no specific MIME type (A Shape distribution is actually 
                    a collection of files), currently this is still an open question?"
                    
                    In our case, Shapefiles will be zipped.
                    
                    Map the protocol list to MIME types when needed.
                -->
                <dct:IMT>
                    <rdf:value>text/csv</rdf:value>
                    <rdfs:label>CSV</rdfs:label>
                </dct:IMT>
            </dct:format>
            <!-- xpath: gmd:protocol/gco:CharacterString -->
            
        </dcat:Distribution>
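The dct:spatial example in the Dataset description turns an ISO 19139 EX_GeographicBoundingBox into a WKT polygon prefixed with the CRS84 URI. In GeoNetwork this mapping lives in the schema's rdf.xsl; the following is only a Python sketch of the same logic, assuming gco:Decimal coordinates and a box that does not cross the antimeridian. bbox_from_iso and bbox_to_wkt are hypothetical names:

```python
import xml.etree.ElementTree as ET

NS = {
    "gmd": "http://www.isotc211.org/2005/gmd",
    "gco": "http://www.isotc211.org/2005/gco",
}
CRS84 = "<http://www.opengis.net/def/crs/OGC/1.3/CRS84>"


def bbox_from_iso(box: ET.Element) -> tuple:
    """Read (west, east, south, north) from a gmd:EX_GeographicBoundingBox."""
    def value(name: str) -> float:
        return float(box.findtext(f"gmd:{name}/gco:Decimal", namespaces=NS))
    return (value("westBoundLongitude"), value("eastBoundLongitude"),
            value("southBoundLatitude"), value("northBoundLatitude"))


def bbox_to_wkt(west: float, east: float, south: float, north: float) -> str:
    """WKT literal for dct:spatial: closed ring, lon/lat axis order (CRS84)."""
    if west > east or south > north:
        raise ValueError("antimeridian-crossing or inverted boxes are not handled")
    # Start at the south-west corner and repeat it at the end to close the ring.
    ring = [(west, south), (west, north), (east, north), (east, south), (west, south)]
    coords = ", ".join(f"{lon} {lat}" for lon, lat in ring)
    return f"{CRS84} Polygon(({coords}))"
```

With west 13.208233, east 14.40099, south 50.71671 and north 51.24864, bbox_to_wkt yields the same literal as the dct:spatial example. When serialized into RDF/XML, the leading "<" of the CRS URI must be escaped.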
        
    

Formats

  • RDF/XML is the output format of the new services.
  • RDFa is used to add annotations to HTML pages.
  • The sitemap is an XML file using the Semantic Crawling extension (see #81).

Services

New services:

  • Metadata service: http://<server_host>:<server_port>/<catalogue>/srv/eng/rdf.metadata.get?uuid=<uuid>
  • RDF search service: http://<server_host>:<server_port>/<catalogue>/srv/eng/rdf.search?
    • All GeoNetwork search criteria can be used to extract a subset of the catalogue.
  • Sitemap: http://<server_host>:<server_port>/<catalogue>/srv/eng/portal.sitemap?format=rdf
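The service URLs above can be assembled programmatically. A sketch assuming base is the catalogue root (for example http://localhost:8080/geonetwork) and the eng interface language; only uuid, format=rdf and the any full-text criterion appear in this proposal, so any other search parameter is left to the caller. The function names are hypothetical:

```python
from urllib.parse import urlencode


def metadata_url(base: str, uuid: str, lang: str = "eng") -> str:
    """RDF description of one record (rdf.metadata.get)."""
    return f"{base}/srv/{lang}/rdf.metadata.get?" + urlencode({"uuid": uuid})


def search_url(base: str, lang: str = "eng", **criteria: str) -> str:
    """RDF search (rdf.search); criteria are regular GeoNetwork search parameters."""
    return f"{base}/srv/{lang}/rdf.search?" + urlencode(criteria)


def sitemap_url(base: str, lang: str = "eng") -> str:
    """Semantic sitemap (portal.sitemap with format=rdf)."""
    return f"{base}/srv/{lang}/portal.sitemap?" + urlencode({"format": "rdf"})
```

urlencode percent-encodes the values, so free-text criteria such as "water quality" are passed safely.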

URL rewriting rules provide simpler URLs:

  • http://<server_host>:<server_port>/<catalogue>/metadata/<uuid>.rdf
  • http://<server_host>:<server_port>/<catalogue>/search/rdf?
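The rewriting itself is done by the web application. The sketch below only illustrates the intended mapping from the simple URLs to the services, assuming the eng language and a first path segment naming the catalogue; RULES and rewrite are hypothetical and do not reflect GeoNetwork's actual rewrite configuration:

```python
import re

# Hypothetical rewrite table: simple URL path -> service path.
RULES = [
    (re.compile(r"^/(?P<cat>[^/]+)/metadata/(?P<uuid>[^/]+)\.rdf$"),
     r"/\g<cat>/srv/eng/rdf.metadata.get?uuid=\g<uuid>"),
    (re.compile(r"^/(?P<cat>[^/]+)/search/rdf/?$"),
     r"/\g<cat>/srv/eng/rdf.search"),
]


def rewrite(path: str, query: str = "") -> str:
    """Map a simple URL path (plus optional query string) to the service URL."""
    for pattern, target in RULES:
        match = pattern.match(path)
        if match:
            url = match.expand(target)
            if query:
                # Append the original query, e.g. search criteria.
                url += ("&" if "?" in url else "?") + query
            return url
    return path  # not a simple RDF URL: leave unchanged
```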

Conversion:

  • <schema>/convert/rdf.xsl

Supported schemas:

  • ISO19139
  • ISO19110
  • dublin-core
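Each supported schema provides its own rdf.xsl, so a caller has to pick the converter from the record type. A Python sketch, assuming the root elements listed in SCHEMA_BY_ROOT (in particular simpledc for GeoNetwork's Dublin Core records) and treating ISO profiles flagged with gco:isoType as ISO19139; the mapping and converter_for are illustrative only:

```python
import xml.etree.ElementTree as ET

GCO_ISO_TYPE = "{http://www.isotc211.org/2005/gco}isoType"

# Assumed root element of each supported schema (hypothetical mapping).
SCHEMA_BY_ROOT = {
    "{http://www.isotc211.org/2005/gmd}MD_Metadata": "iso19139",
    "{http://www.isotc211.org/2005/gfc}FC_FeatureCatalogue": "iso19110",
    "simpledc": "dublin-core",
}


def converter_for(record_xml: str) -> str:
    """Return the <schema>/convert/rdf.xsl path to apply to a record."""
    root = ET.fromstring(record_xml)
    schema = SCHEMA_BY_ROOT.get(root.tag)
    # ISO profiles declare their base type with gco:isoType (as in the
    # catalogue record xpath); this sketch handles them as ISO19139.
    if schema is None and root.get(GCO_ISO_TYPE) == "gmd:MD_Metadata":
        schema = "iso19139"
    if schema is None:
        raise ValueError(f"no RDF conversion for root element {root.tag}")
    return f"{schema}/convert/rdf.xsl"
```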

Site map

A sitemap entry using the Semantic Crawling extension is added to the existing XML sitemap.

<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"
        xmlns:sc="http://sw.deri.org/2007/07/sitemapextension/scschema.xsd">
  <sc:dataset>
    <sc:datasetLabel>My GeoNetwork full content catalogue for Linked Data spiders (RDF)</sc:datasetLabel>

    <!-- The 5 most recently updated records: -->
    <sc:sampleURI>http://<server_host>:<server_port>/<catalogue>/metadata/<uuid>.rdf</sc:sampleURI>
    <!-- Either link to a full dump using the search API: -->
    <sc:dataDumpLocation>http://<server_host>:<server_port>/<catalogue>/search/rdf/</sc:dataDumpLocation>
    <!-- or provide a link for each catalogue record: -->
    <sc:dataDumpLocation>http://<server_host>:<server_port>/<catalogue>/metadata/<uuid>.rdf</sc:dataDumpLocation>
    <changefreq>daily</changefreq>
  </sc:dataset>
</urlset>
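The sitemap entry above can be produced with Python's standard library. A sketch, assuming the caller passes record UUIDs already sorted by change date (newest first) and uses the search API as the data dump location; semantic_sitemap is a hypothetical name, and the server-side implementation is not described in this proposal:

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
SC_NS = "http://sw.deri.org/2007/07/sitemapextension/scschema.xsd"


def semantic_sitemap(base: str, latest_uuids: list, label: str) -> str:
    """Build a sitemap with one sc:dataset entry for the RDF catalogue."""
    # Keep the usual prefixes in the output (this is a module-wide setting).
    ET.register_namespace("", SITEMAP_NS)
    ET.register_namespace("sc", SC_NS)

    urlset = ET.Element(f"{{{SITEMAP_NS}}}urlset")
    dataset = ET.SubElement(urlset, f"{{{SC_NS}}}dataset")
    ET.SubElement(dataset, f"{{{SC_NS}}}datasetLabel").text = label
    # Sample URIs: the 5 most recently updated records.
    for uuid in latest_uuids[:5]:
        ET.SubElement(dataset, f"{{{SC_NS}}}sampleURI").text = f"{base}/metadata/{uuid}.rdf"
    # Full dump through the RDF search API.
    ET.SubElement(dataset, f"{{{SC_NS}}}dataDumpLocation").text = f"{base}/search/rdf/"
    ET.SubElement(dataset, f"{{{SITEMAP_NS}}}changefreq").text = "daily"
    return ET.tostring(urlset, encoding="unicode")
```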

The sitemap is accessible through the existing sitemap service, with format=rdf as parameter: http://<server_host>:<server_port>/geonetwork/srv/eng/portal.sitemap?format=rdf

In robots.txt, the following line is added:

Sitemap: http://<server_host>:<server_port>/geonetwork/srv/eng/portal.sitemap?format=rdf
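Crawlers discover the semantic sitemap through that robots.txt line. Python's urllib.robotparser (3.8 or later) exposes Sitemap directives through RobotFileParser.site_maps(); a small check, assuming the robots.txt content has already been downloaded (sitemaps_from_robots is a hypothetical helper):

```python
from urllib.robotparser import RobotFileParser


def sitemaps_from_robots(robots_txt: str) -> list:
    """Return the Sitemap URLs declared in a robots.txt document."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    # site_maps() returns None when no Sitemap line is present.
    return parser.site_maps() or []
```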

Using RDF outputs

Save metadata as xml

A "Save as RDF" action is added to the metadata menu.

Visualization tools

Running a search with the rdf.search service returns a full or partial view of the catalogue, which can then be analyzed with visualization tools.
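For graph visualization tools, the rdf.search output can be reduced to edges between resources. A sketch, assuming the flat RDF/XML style of the examples above (top-level nodes with rdf:about, links given as rdf:resource); it is not a general RDF/XML parser, and Graphviz DOT is just one possible target format. resource_links and to_dot are hypothetical names:

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"


def resource_links(rdf_xml: str) -> list:
    """(subject, predicate, object) edges whose object is an rdf:resource."""
    root = ET.fromstring(rdf_xml)  # the rdf:RDF element
    edges = []
    for node in root:
        subject = node.get(f"{{{RDF_NS}}}about")
        if subject is None:
            continue  # blank nodes are skipped in this sketch
        for prop in node:
            target = prop.get(f"{{{RDF_NS}}}resource")
            if target:  # empty template values are ignored
                # '{namespace}local' -> 'namespacelocal', i.e. the property URI
                tag = prop.tag
                predicate = tag[1:].replace("}", "", 1) if tag.startswith("{") else tag
                edges.append((subject, predicate, target))
    return edges


def to_dot(edges: list) -> str:
    """Serialize the edges as a Graphviz DOT digraph."""
    lines = ["digraph catalogue {"]
    lines += [f'  "{s}" -> "{o}" [label="{p}"];' for s, p, o in edges]
    lines.append("}")
    return "\n".join(lines)
```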

SPARQL queries

Once loaded into a SPARQL endpoint, the catalogue content can be queried using SPARQL:

  • Get metadata titles
    sparql select ?title where {?s <http://purl.org/dc/elements/1.1/title> ?title};
    
  • Get metadata related to the "Transport network" theme
    sparql
    PREFIX  dc: <http://purl.org/dc/elements/1.1/>
    PREFIX  dcat: <http://www.w3.org/ns/dcat#>
    PREFIX  skos: <http://www.w3.org/2004/02/skos/core#>
    SELECT  ?title ?label
    WHERE   { ?x dc:title ?title .
              ?x dcat:theme ?theme .
              ?theme skos:prefLabel ?label
              FILTER(str(?label) = "Transport network")
            };
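The same queries can be sent to an endpoint over HTTP, without the leading sparql keyword and trailing semicolon of the console examples. A sketch following the SPARQL 1.1 Protocol (query passed as the query parameter of a GET request, JSON results requested through the Accept header); the endpoint URL and function names are hypothetical:

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen

TITLES_QUERY = """PREFIX dc: <http://purl.org/dc/elements/1.1/>
SELECT ?title WHERE { ?s dc:title ?title }"""


def build_request(endpoint: str, query: str) -> Request:
    """GET request following the SPARQL 1.1 Protocol, asking for JSON results."""
    return Request(endpoint + "?" + urlencode({"query": query}),
                   headers={"Accept": "application/sparql-results+json"})


def column(results: dict, var: str) -> list:
    """Values bound to one variable in a SPARQL JSON results document."""
    return [b[var]["value"] for b in results["results"]["bindings"] if var in b]


def run(endpoint: str, query: str) -> dict:
    """Execute the query; needs a reachable endpoint."""
    with urlopen(build_request(endpoint, query)) as response:
        return json.load(response)
```

For example, column(run("http://localhost:8890/sparql", TITLES_QUERY), "title") would list the metadata titles, assuming an endpoint is running at that hypothetical address.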
    

Risks

Future improvement

This proposal does not cover the following items which could be addressed in future works:

  • multilingual RDF output for multilingual metadata records

Participants

  • Francois Prunayre
Last modified on Sep 20, 2012 6:35:14 AM
