Version 6 (modified by fxp, 12 years ago)

GeoNetwork Data Catalog Vocabulary services

Date 2012/05/10
Contact(s) François Prunayre, Paul Hasenohr
Last edited
Status Ongoing
Assigned to release 2.x
Resources Available (funding EEA)
Ticket # #912

  1. Overview
    1. Proposal Type
    2. Links
  2. Proposal
    1. RDF Model
    2. Formats
    3. Services
    4. Site map
  3. Example of use of RDF outputs
    1. Visualization tools
  4. Risks
  5. Participants

Overview

Data Catalog Vocabulary services in GeoNetwork opensource increase discoverability and make it easier for applications to consume metadata. These services provide information about three types of objects:

  • the catalogue,
  • the datasets and services in the catalogue,
  • the links to distributed resources.

The descriptions include relations to thesauri (e.g. GEMET), keywords and organizations. The resulting documents can be used in a linked data context.

The output produced by the services is based on DCAT, an RDF vocabulary. Two types of services are created:

  • a metadata service to access a single metadata record,
  • a search service to search the catalogue and retrieve a set of metadata records.

The Data Catalog Vocabulary services can be used by Semantic Web tools to harvest, search (e.g. using SPARQL) and link catalogue content with other interlinked resources.

A semantic sitemap for the portal is created so that crawlers can harvest the catalogue.

Proposal Type

  • Type: Discoverability
  • App: GeoNetwork
  • Module: Metadata and search services

Proposal

RDF Model

An RDF model is defined for the ISO19139, ISO19110 and Dublin Core standards in order to cover most of the metadata in the catalogue. The model is based on DCAT, which is "an RDF vocabulary designed to facilitate interoperability between data catalogs published on the Web".

Vocabularies:

Prefix   Specification                                          Namespace
dcat     http://www.w3.org/TR/vocab-dcat/#class--catalog        http://www.w3.org/ns/dcat#
void     http://www.w3.org/TR/void/                             http://rdfs.org/ns/void#
dc       http://dublincore.org/                                 http://purl.org/dc/elements/1.1/
dct      -                                                      http://purl.org/dc/terms/
dctype   -                                                      http://purl.org/dc/dcmitype/
foaf     http://xmlns.com/foaf/spec/                            http://xmlns.com/foaf/0.1/
skos     http://www.w3.org/2009/08/skos-reference/skos.html#    http://www.w3.org/2004/02/skos/core#
rdf      -                                                      http://www.w3.org/1999/02/22-rdf-syntax-ns#
rdfs     -                                                      http://www.w3.org/2000/01/rdf-schema#

<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
    xmlns:foaf="http://xmlns.com/foaf/0.1/"
    xmlns:void="http://rdfs.org/ns/void#"
    xmlns:dcat="http://www.w3.org/ns/dcat#"
    xmlns:dc="http://purl.org/dc/elements/1.1/"
    xmlns:dct="http://purl.org/dc/terms/"
    xmlns:dctype="http://purl.org/dc/dcmitype/"
    xmlns:skos="http://www.w3.org/2004/02/skos/core#"
    >

Classes:

  • Catalogue (dcat:Catalog) is the local catalogue or any harvested nodes.
     <!-- First, the local catalog description using dcat:Catalog.
            "Typically, a web-based data catalog is represented as a single instance of this class."
            ... also describe harvested catalogues if harvested records are in the current dump.
        -->
        <dcat:Catalog rdf:about="http://localhost:8080/geonetwork">
            <!-- A name given to the catalog. -->
            <dct:title xml:lang="en">My GeoNetwork geospatial metadata catalogue</dct:title>
            
            <!-- free-text account of the catalog. -->
            <dct:description></dct:description>
            
            <rdfs:label xml:lang="en">My GeoNetwork geospatial metadata catalogue</rdfs:label>
            
            <!-- The homepage of the catalog -->
            <foaf:homepage>http://localhost:8080/geonetwork</foaf:homepage>
            
            <!-- FIXME : void:Dataset -->
            <void:openSearchDescription>http://localhost:8080/geonetwork/srv/eng/portal.opensearch</void:openSearchDescription>
            <void:uriLookupEndpoint>http://localhost:8080/geonetwork/search/rdf?any=</void:uriLookupEndpoint>
            
            
            <!-- The entity responsible for making the catalog online. -->
            <dct:publisher rdf:resource="http://localhost:8080/geonetwork/organization/1"/>
            
            <!-- The knowledge organization system (KOS) used to classify catalog's datasets. -->
            <dcat:themes rdf:resource="http://localhost:8080/geonetwork/thesaurus/id"/>
            
            <!-- The language of the catalog. This refers to the language used 
                in the textual metadata describing titles, descriptions, etc. 
                of the datasets in the catalog. 
            
                http://www.ietf.org/rfc/rfc3066.txt
                
                Multiple values can be used. The publisher might also choose to describe 
                the language on the dataset level (see dataset language).
            -->
            <dct:language>en</dct:language>
            
            
            <!-- This describes the license under which the catalog can be used/reused and not the datasets. 
                Even if the license of the catalog applies to all of its datasets it should be 
                replicated on each dataset.-->
            <dct:license>
                <!-- TODO using VoID-->
            </dct:license>
            
            <!-- The geographical area covered by the catalog. -->
            <dct:spatial>
                <!-- TODO: dct:Location -->
            </dct:spatial>
            
            <!-- List all catalogue records -->
            <dcat:dataset rdf:resource="http://localhost:8080/geonetwork/dataset/1"/>
            <dcat:record rdf:resource="http://localhost:8080/geonetwork/metadata/1"/>
        </dcat:Catalog>
        
    
  • Organization (foaf:Organization)
        <!-- Organization description. 
            Organization could be linked to a catalogue, a catalogue record.
            
            xpath: //gmd:organisationName
        -->
        <foaf:Organization rdf:about="http://localhost:8080/geonetwork/organization/1">
            <foaf:name></foaf:name>
            <!-- xpath: gmd:organisationName/gco:CharacterString -->
            <!-- Link to a foaf:Person described below -->
            <foaf:member rdf:resource=""/>
        </foaf:Organization>
        
        <!-- Organization member
            
            xpath: //gmd:CI_ResponsibleParty-->
        <foaf:Person rdf:about="http://localhost:8080/geonetwork/person/ID">
            <foaf:name></foaf:name>
            <!-- xpath: gmd:individualName/gco:CharacterString -->
            <foaf:phone></foaf:phone>
            <!-- xpath: gmd:contactInfo/gmd:CI_Contact/gmd:phone/gmd:CI_Telephone/gmd:voice/gco:CharacterString -->
            <foaf:mbox></foaf:mbox>
            <!-- xpath: gmd:contactInfo/gmd:CI_Contact/gmd:address/gmd:CI_Address/gmd:electronicMailAddress/gco:CharacterString -->
        </foaf:Person>
        
    
  • Dataset (dcat:CatalogRecord+dcat:Dataset)
        
        <!-- Catalogue records
            "A record in a data catalog, describing a single dataset."        
            
            xpath: //gmd:MD_Metadata|//*[@gco:isoType='gmd:MD_Metadata']
        -->
        <dcat:CatalogRecord>
            <!-- Link to a dcat:Dataset or a rdf:Description for services and feature catalogue. -->
            <foaf:primaryTopic rdf:resource="http://localhost:8080/geonetwork/metadata/uuid"/>
            
            <!-- Metadata change date.
                "The date is encoded as a literal in "YYYY-MM-DD" form (ISO 8601 Date and Time Formats)." -->
            <dct:issued></dct:issued>
            <dct:modified></dct:modified>
            <!-- xpath: gmd:dateStamp/gco:DateTime -->
        </dcat:CatalogRecord>
    
    
    
        <!-- Dataset
            "A collection of data, published or curated by a single source, and available for access or 
            download in one or more formats."
            
            xpath: //gmd:MD_DataIdentification|//*[@gco:isoType='gmd:MD_DataIdentification']
        -->
        <dcat:Dataset rdf:about="http://localhost:8080/geonetwork/metadata/uuid">
            
            <!-- "A unique identifier of the dataset." -->
            <dct:identifier></dct:identifier>
            <!-- xpath: gmd:identificationInfo/*/gmd:citation/*/gmd:identifier/*/gmd:code --> 
    
    
            <dc:title></dc:title>
            <!-- xpath: gmd:identificationInfo/*/gmd:citation/*/gmd:title/gco:CharacterString -->
            
            
            <dct:abstract></dct:abstract>
            <!-- xpath: gmd:identificationInfo/*/gmd:abstract/gco:CharacterString -->
            
                        
            <!-- "A keyword or tag describing the dataset."
                Create dcat:keyword if no thesaurus name information available.
            -->
            <dcat:keyword></dcat:keyword>
            <!-- xpath: gmd:identificationInfo/*/gmd:descriptiveKeywords/gmd:MD_Keywords[not(gmd:thesaurusName)]/gmd:keyword/gco:CharacterString -->
            
            
            <!-- "The main category of the dataset. A dataset can have multiple themes." 
                Create dcat:theme if gmx:Anchor or GEMET concepts or INSPIRE themes
            -->
            <dcat:theme rdf:resource="http://localhost:8080/geonetwork/thesaurus/thesaurusId/subjectId"/>
            <!-- xpath: gmd:identificationInfo/*/gmd:descriptiveKeywords/gmd:MD_Keywords/gmd:keyword/gmx:Anchor --> 
            <!-- xpath: gmd:identificationInfo/*/gmd:descriptiveKeywords/gmd:MD_Keywords/gmd:keyword/gco:CharacterString -->
            <!-- xpath: gmd:identificationInfo/*/gmd:topicCategory/gmd:MD_TopicCategoryCode -->
            
            
            <!-- Thumbnail -->
            <foaf:thumbnail rdf:resource=""/>
            <!-- xpath: gmd:identificationInfo/*/gmd:graphicOverview/gmd:MD_BrowseGraphic/gmd:fileName/gco:CharacterString -->
            
            
            <!-- "Spatial coverage of the dataset." -->
            <dct:spatial>Polygon(...)</dct:spatial>
            <!-- xpath: gmd:identificationInfo/*/gmd:extent/*/gmd:geographicElement/gmd:EX_GeographicBoundingBox --> 
            
            
            <!-- "The temporal period that the dataset covers." -->
            <dct:temporal></dct:temporal>
            <!-- xpath: gmd:identificationInfo/*/gmd:extent/*/gmd:temporalElement --> 
            
            
            <dct:issued></dct:issued>
            <!-- FIXME : dct:updated is not a DCMI term, dct:modified carries the update date -->
            <dct:modified></dct:modified>
            
            <!-- "An entity responsible for making the dataset available" -->
            <dct:publisher rdf:resource="http://localhost:8080/geonetwork/organization/contactId"/>
            <!-- xpath: gmd:identificationInfo/*/gmd:pointOfContact -->
            
            
            <!-- "The frequency with which dataset is published." See placetime.com intervals. -->
            <dct:accrualPeriodicity></dct:accrualPeriodicity>
            <!-- xpath: gmd:identificationInfo/*/gmd:resourceMaintenance/gmd:MD_MaintenanceInformation/gmd:maintenanceAndUpdateFrequency/gmd:MD_MaintenanceFrequencyCode/@codeListValue -->
            
            <!-- "This is usually geographical or temporal but can also be other dimension" ??? -->
            <dcat:granularity></dcat:granularity>
            <!-- xpath: gmd:identificationInfo/*/gmd:spatialResolution/gmd:MD_Resolution/gmd:equivalentScale/gmd:MD_RepresentativeFraction/gmd:denominator/gco:Integer -->
            
            
            <!-- 
                "The language of the dataset."
                "This overrides the value of the catalog language in case of conflict"
            -->
            <dct:language></dct:language>
            <!-- xpath: gmd:identificationInfo/*/gmd:language/gmd:LanguageCode/@codeListValue -->
            
            
            <!-- "The license under which the dataset is published and can be reused." -->
            <dct:license></dct:license>
            <!-- xpath: gmd:identificationInfo/*/gmd:resourceConstraints/??? -->
            
            <dcat:distribution rdf:resource=""/>
            <!-- xpath: gmd:distributionInfo/*/gmd:transferOptions/*/gmd:onLine/gmd:CI_OnlineResource -->
            
            
            <!-- ISO19110 relation 
                "This usually consists of a table providing explanation of columns meaning, values interpretation and acronyms/codes used in the data."
            -->
            <dcat:dataDictionary rdf:resource=""/>
            <!-- xpath: gmd:contentInfo/gmd:MD_FeatureCatalogueDescription/gmd:featureCatalogueCitation/@uuidref -->
            
            <!-- 
                "A related document such as technical documentation, agency program page, citation, etc."            
            -->
            <dct:references rdf:resource="url?"/>
            <!-- xpath: gmd:identificationInfo/*/gmd:citation/*/gmd:otherCitationDetails/gco:CharacterString -->
            
            
            <!-- "describes the quality of data." -->
            <dcat:dataQuality>
                <!-- rdfs:Literal -->
            </dcat:dataQuality>
            <!-- xpath: gmd:dataQualityInfo/*/gmd:lineage/gmd:LI_Lineage/gmd:statement/gco:CharacterString -->
            
            
            <!-- FIXME ? 
                <void:dataDump></void:dataDump>-->
        </dcat:Dataset>
    
    
    
  • Series (dcat:CatalogRecord+dcat:Dataset+dc:relation)
  • Service (dcat:CatalogRecord+rdf:Description+dc:relation)
        <!-- Service 
            Create a simple rdf:Description. To be improved.
            
            xpath: //srv:SV_ServiceIdentification|//*[@gco:isoType='srv:SV_ServiceIdentification']
        -->
        <rdf:Description rdf:about="http://localhost:8080/geonetwork/metadata/uuid">
            <!-- Same as dcat:Dataset without dcat:* -->
        </rdf:Description>
    
  • Feature catalogue (rdf:Description+dc:relation)
        <!-- Feature Catalogue     
            Create a simple rdf:Description. To be improved.
            
            
            xpath: //gfc:FC_FeatureCatalogue
        -->
        <rdf:Description rdf:about="http://localhost:8080/geonetwork/metadata/uuid">
            <dc:title></dc:title>
        </rdf:Description>
    
  • Thesaurus (skos:ConceptScheme)
        <!-- ConceptScheme elements describe all the thesauri available in the catalogue
        * The resource identifier is a local identifier for a local thesaurus, or the public URI if the thesaurus is external
        -->
        <skos:ConceptScheme rdf:about="http://localhost:8080/geonetwork/thesaurus/external.theme.inspire-theme">
            <dc:title>GEMET - INSPIRE themes, version 1.0</dc:title>
            <dc:description>INSPIRE themes thesaurus for GeoNetwork opensource.</dc:description>
            <dc:creator>
                <foaf:Organization>
                    <foaf:name>EEA</foaf:name>
                </foaf:Organization>
            </dc:creator>
            <dc:uri>http://www.eionet.europa.eu/gemet/about?langcode=en</dc:uri>
            <dc:rights>http://www.eionet.europa.eu/gemet/about?langcode=en</dc:rights>
            <dct:issued>2008-06-01</dct:issued>
            <dct:modified>2008-06-01</dct:modified>
        </skos:ConceptScheme>
        
    
  • Keyword (skos:Concept)
        
        <!-- Keywords -->
        <skos:Concept rdf:about="http://localhost:8080/geonetwork/thesaurus/thesaurusId/subjectId">
            <skos:inScheme rdf:resource="http://localhost:8080/geonetwork/thesaurus/id"/>
            <skos:prefLabel></skos:prefLabel>
        </skos:Concept>
        
    
  • Online resources (dcat:Distribution)
        <!-- Distribution 
            "Represents a specific available form of a dataset. Each dataset might be available in different 
            forms, these forms might represent different formats of the dataset, different endpoints,... 
            Examples of Distribution include a downloadable CSV file, an XLS file representing the dataset, 
            an RSS feed ..."
            
            Download, WebService, Feed
            
            xpath: //gmd:distributionInfo/*/gmd:transferOptions/*/gmd:onLine/gmd:CI_OnlineResource
        -->
        <dcat:Distribution rdf:about="accessURL ?">
            <!-- 
                "points to the location of a distribution. This can be a direct download link, a link 
                to an HTML page containing a link to the actual data, Feed, Web Service etc. 
                the semantic is determined by its domain (Distribution, Feed, WebService, Download)." 
            -->
            <dcat:accessURL></dcat:accessURL>
            <!-- xpath: gmd:linkage/gmd:URL -->
            
            <dct:title></dct:title>
            <!-- xpath: gmd:name/gco:CharacterString -->
            
            <!-- "The size of a distribution.":N/A 
            <dcat:size></dcat:size>
            -->
            
            
            <dct:format>
                <!-- 
                    "the file format of the distribution." 
                    
                    "MIME type is used for values. A list of MIME types URLs can be found at IANA. 
                    However ESRI Shape files have no specific MIME type (A Shape distribution is actually 
                    a collection of files), currently this is still an open question?"
                    
                    In our case, shapefiles will be zipped.
                    
                    Mapping between protocol list and mime/type when needed
                -->
                <dct:IMT>
                    <rdf:value>text/csv</rdf:value>
                    <rdfs:label>CSV</rdfs:label>
                </dct:IMT>
            </dct:format>
            <!-- xpath: gmd:protocol/gco:CharacterString -->
            
        </dcat:Distribution>
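
The model above can be read back by a client. The following Python sketch is one possible way to do it, assuming the service returns `dcat:Dataset` elements as direct children of `rdf:RDF`, as in the examples above. The sample document, its title and keyword values, and the function name `read_datasets` are made up for illustration. A real client would more likely use a full RDF parser, since RDF/XML allows several equivalent serializations that this element-based reading does not handle.

```python
import xml.etree.ElementTree as ET

# Namespaces taken from the vocabulary table above.
NS = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "dcat": "http://www.w3.org/ns/dcat#",
    "dc": "http://purl.org/dc/elements/1.1/",
}

# Hypothetical service output, reduced to the elements read below.
SAMPLE = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:dcat="http://www.w3.org/ns/dcat#"
    xmlns:dc="http://purl.org/dc/elements/1.1/">
  <dcat:Dataset rdf:about="http://localhost:8080/geonetwork/metadata/uuid">
    <dc:title>Sample dataset</dc:title>
    <dcat:keyword>water</dcat:keyword>
    <dcat:theme rdf:resource="http://localhost:8080/geonetwork/thesaurus/thesaurusId/subjectId"/>
  </dcat:Dataset>
</rdf:RDF>"""


def read_datasets(rdf_xml):
    """Return URI, title, keywords and theme URIs for each dcat:Dataset."""
    root = ET.fromstring(rdf_xml)
    rdf = "{%s}" % NS["rdf"]
    datasets = []
    for node in root.findall("dcat:Dataset", NS):
        datasets.append({
            "uri": node.get(rdf + "about"),
            "title": node.findtext("dc:title", default="", namespaces=NS),
            "keywords": [k.text for k in node.findall("dcat:keyword", NS)],
            "themes": [t.get(rdf + "resource") for t in node.findall("dcat:theme", NS)],
        })
    return datasets


for dataset in read_datasets(SAMPLE):
    print(dataset["uri"], dataset["title"], dataset["keywords"], dataset["themes"])
```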
        
    

Formats

  • RDF/XML is the output format for the new services.
  • RDFa is used to add annotations to HTML pages.
  • The sitemap is an XML file that uses the Semantic Crawling extension (see #81).

Services

New services:

  • Metadata service: http://<server_host>:<server_port>/<catalogue>/srv/eng/rdf.metadata.get?uuid=<uuid>
  • RDF search service: http://<server_host>:<server_port>/<catalogue>/srv/eng/rdf.search?
    • All GeoNetwork search criteria can be used to extract a subset of the catalogue.
  • Sitemap: http://<server_host>:<server_port>/<catalogue>/srv/eng/portal.sitemap?type=rdf

Rewrite rules for simple URLs:

  • http://<server_host>:<server_port>/<catalogue>/metadata/<uuid>.rdf
  • http://<server_host>:<server_port>/<catalogue>/search/rdf?
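
To make the URL patterns above concrete, the following Python sketch builds them for one record. The helper name `rdf_service_urls` and the sample host, port, catalogue name and UUID are assumptions made for the example. Only the path patterns come from this proposal.

```python
from urllib.parse import quote, urlencode


def rdf_service_urls(host, port, catalogue, uuid):
    """Build the RDF service URLs and their simplified forms (illustrative helper)."""
    base = f"http://{host}:{port}/{catalogue}"
    return {
        # Full service URLs
        "metadata": f"{base}/srv/eng/rdf.metadata.get?" + urlencode({"uuid": uuid}),
        "search": f"{base}/srv/eng/rdf.search?",
        "sitemap": f"{base}/srv/eng/portal.sitemap?type=rdf",
        # Simplified URLs produced by the rewrite rules
        "metadata_simple": f"{base}/metadata/{quote(uuid, safe='')}.rdf",
        "search_simple": f"{base}/search/rdf?",
    }


# Sample values only: a local catalogue and a made-up record identifier.
urls = rdf_service_urls("localhost", 8080, "geonetwork", "abc-123")
for name, url in urls.items():
    print(name, url)
```

Search criteria would be appended to the two search URLs as ordinary query parameters.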

Conversion:

  • <schema>/convert/rdf.xsl

Supported schemas:

  • ISO19139
  • ISO19110
  • dublin-core

Site map

A sitemap using the semantic crawling extension is added to the existing XML sitemap.

<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"
        xmlns:sc="http://sw.deri.org/2007/07/sitemapextension/scschema.xsd">
  <sc:dataset>
    <sc:datasetLabel>My GeoNetwork full content catalogue for Linked Data spiders (RDF)</sc:datasetLabel>

For the 5 latest updates:
    <sc:sampleURI>http://<server_host>:<server_port>/<catalogue>/metadata/<uuid>.rdf</sc:sampleURI>


Link to a full dump using the search API:
    <sc:dataDumpLocation>http://<server_host>:<server_port>/<catalogue>/search/rdf/</sc:dataDumpLocation>
or provide a link for each catalogue record using:
    <sc:dataDumpLocation>http://<server_host>:<server_port>/<catalogue>/metadata/<uuid>.rdf</sc:dataDumpLocation>
    <changefreq>daily</changefreq>
  </sc:dataset>
</urlset>

The sitemap will be accessible through the existing sitemap service with type=rdf as a parameter: http://<server_host>:<server_port>/geonetwork/srv/eng/portal.sitemap?type=rdf

In robots.txt, the following line is added:

Sitemap: http://<server_host>:<server_port>/geonetwork/srv/eng/portal.sitemap?type=rdf
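
A crawler could read the semantic sitemap roughly as follows. This Python sketch assumes a sitemap shaped like the example above. The sample URLs (local host, made-up UUID) and the function name `read_semantic_sitemap` are illustrative only.

```python
import xml.etree.ElementTree as ET

# Namespace of the semantic crawling extension, as declared in the sitemap above.
SC = "{http://sw.deri.org/2007/07/sitemapextension/scschema.xsd}"

# Hypothetical sitemap with placeholders replaced by sample values.
SITEMAP = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"
        xmlns:sc="http://sw.deri.org/2007/07/sitemapextension/scschema.xsd">
  <sc:dataset>
    <sc:datasetLabel>My GeoNetwork full content catalogue for Linked Data spiders (RDF)</sc:datasetLabel>
    <sc:sampleURI>http://localhost:8080/geonetwork/metadata/abc-123.rdf</sc:sampleURI>
    <sc:dataDumpLocation>http://localhost:8080/geonetwork/search/rdf/</sc:dataDumpLocation>
    <changefreq>daily</changefreq>
  </sc:dataset>
</urlset>"""


def read_semantic_sitemap(sitemap_xml):
    """Collect the label, sample URIs and dump locations of each sc:dataset."""
    root = ET.fromstring(sitemap_xml)
    datasets = []
    for ds in root.iter(SC + "dataset"):
        datasets.append({
            "label": ds.findtext(SC + "datasetLabel", default=""),
            "sample_uris": [e.text for e in ds.findall(SC + "sampleURI")],
            "dumps": [e.text for e in ds.findall(SC + "dataDumpLocation")],
        })
    return datasets


for entry in read_semantic_sitemap(SITEMAP):
    print(entry["label"], entry["sample_uris"], entry["dumps"])
```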

Example of use of RDF outputs

Visualization tools

Running a search on the catalogue with the rdf.search service provides a full or partial view of the catalogue, which can then be analyzed in visualization tools.
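
As one possible preparation step for such tools, the Python sketch below turns the dataset-to-theme links of an RDF/XML search result into Graphviz DOT text. The sample input and the function name `to_dot` are assumptions for illustration. The proposal does not prescribe any particular visualization tool or format.

```python
import xml.etree.ElementTree as ET

RDF = "{http://www.w3.org/1999/02/22-rdf-syntax-ns#}"
DCAT = "{http://www.w3.org/ns/dcat#}"

# Hypothetical search result containing one dataset linked to one theme.
RESULT = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:dcat="http://www.w3.org/ns/dcat#">
  <dcat:Dataset rdf:about="http://localhost:8080/geonetwork/metadata/uuid">
    <dcat:theme rdf:resource="http://localhost:8080/geonetwork/thesaurus/thesaurusId/subjectId"/>
  </dcat:Dataset>
</rdf:RDF>"""


def to_dot(rdf_xml):
    """Emit one DOT edge per dcat:theme link found on a dcat:Dataset."""
    root = ET.fromstring(rdf_xml)
    lines = ["digraph catalogue {"]
    for dataset in root.iter(DCAT + "Dataset"):
        source = dataset.get(RDF + "about")
        for theme in dataset.findall(DCAT + "theme"):
            target = theme.get(RDF + "resource")
            lines.append('  "%s" -> "%s" [label="dcat:theme"];' % (source, target))
    lines.append("}")
    return "\n".join(lines)


print(to_dot(RESULT))
```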

Risks

Participants

  • Francois Prunayre
