Changes between Initial Version and Version 1 of proposals/DCATandRDFServices


Timestamp:
May 22, 2012, 10:52:05 PM (12 years ago)
Author:
fxp
     1
     2= GeoNetwork Data Catalog Vocabulary services =
     3
     4|| '''Date''' || 2012/05/10 ||
     5|| '''Contact(s)''' || François Prunayre, Paul Hasenohr ||
     6|| '''Last edited''' || ||
     7|| '''Status''' || Ongoing ||
     8|| '''Assigned to release''' || 2.x ||
     9|| '''Resources''' || Available (funding EEA) ||
     10|| '''Ticket #''' || TODO ||
     11
     12
     13== Overview ==
     14
     15Data Catalog Vocabulary services in !GeoNetwork opensource increase discoverability and make it easy for applications to consume metadata. These services provide information about three types of objects:
     16 * the catalogue,
     17 * the datasets and services in the catalogue
     18 * and the link to distributed resources.
     19
     20The description includes relations to thesauri (e.g. GEMET), keywords and organizations. The resulting document can be used in a linked data context.
     21
     22The output format produced by the services is based on DCAT, an RDF vocabulary. Two types of services are created:
     23 * Metadata service to access a single metadata record
     24 * Search service to search the catalogue and retrieve a set of metadata records
     25
     26The Data Catalog Vocabulary services can be used by Semantic Web tools to harvest, search (e.g. using SPARQL) and link catalogue content with other interlinked resources.
     27
     28A semantic portal sitemap is created so that crawlers can harvest the catalogue.
     29
     30=== Proposal Type ===
     31 * '''Type''': Discoverability
     32 * '''App''': !GeoNetwork
     33 * '''Module''': Metadata and search services
     34
     35=== Links ===
     36 * '''Documents''':
     37  * Data Catalog Vocabulary (DCAT) http://www.w3.org/TR/vocab-dcat/#property--data-dictionary
     38  * Vocabulary of Interlinked Datasets (VoID) http://www.w3.org/TR/void/
     39  * http://dvcs.w3.org/hg/gld/raw-file/default/dcat/index.html
     40  * Semantic Web Crawling: A Sitemap Extension http://sw.deri.org/2007/07/sitemapextension/
     41  * http://geovocab.org/doc/neogeo.html
     42
     43
     44
     45== Proposal ==
     46
     47=== RDF Model ===
     48An RDF model is defined for the ISO19139, ISO19110 and Dublin Core standards in order to cover most of the metadata in the catalogue. The model is based on DCAT, which is "an RDF vocabulary designed to facilitate interoperability between data catalogs published on the Web".
     49
     50Vocabularies:
     51
     52|| Prefix  || Specification                                       || Namespace                                    ||
     53|| dcat    || http://www.w3.org/TR/vocab-dcat/#class--catalog     || http://www.w3.org/ns/dcat#                   ||
     54|| void    || http://www.w3.org/TR/void/                          || http://rdfs.org/ns/void#                     ||
     55|| dc      || http://dublincore.org/                              || http://purl.org/dc/elements/1.1/             ||
     56|| dct     ||                                                     || http://purl.org/dc/terms/                    ||
     57|| dctype  ||                                                     || http://purl.org/dc/dcmitype/                 ||
     58|| foaf    || http://xmlns.com/foaf/spec/                         || http://xmlns.com/foaf/0.1/                   ||
     59|| skos    || http://www.w3.org/2009/08/skos-reference/skos.html# || http://www.w3.org/2004/02/skos/core#         ||
     60|| rdf     ||                                                     || http://www.w3.org/1999/02/22-rdf-syntax-ns#  ||
     61|| rdfs    ||                                                     || http://www.w3.org/2000/01/rdf-schema#        ||
     62
     63
     64{{{
     65<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
     66    xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
     67    xmlns:foaf="http://xmlns.com/foaf/0.1/"
     68    xmlns:void="http://rdfs.org/ns/void#"
     69    xmlns:dcat="http://www.w3.org/ns/dcat#"
     70    xmlns:dc="http://purl.org/dc/elements/1.1/"
     71    xmlns:dct="http://purl.org/dc/terms/"
     72    xmlns:dctype="http://purl.org/dc/dcmitype/"
     73    xmlns:skos="http://www.w3.org/2004/02/skos/core#"
     74    >
     75}}}
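
The prefix declarations above can also be kept in one place on the client side. The following is a minimal Python sketch, not part of the proposal: it assumes the namespaces listed in the vocabulary table (with the `dct` prefix used in the RDF header), and the helper name `qname` is hypothetical.

```python
import xml.etree.ElementTree as ET

# Prefix-to-namespace map taken from the vocabulary table of this proposal.
NAMESPACES = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
    "foaf": "http://xmlns.com/foaf/0.1/",
    "void": "http://rdfs.org/ns/void#",
    "dcat": "http://www.w3.org/ns/dcat#",
    "dc": "http://purl.org/dc/elements/1.1/",
    "dct": "http://purl.org/dc/terms/",
    "dctype": "http://purl.org/dc/dcmitype/",
    "skos": "http://www.w3.org/2004/02/skos/core#",
}


def qname(prefixed):
    """Expand 'dcat:Catalog' into ElementTree's '{namespace}local' form (hypothetical helper)."""
    prefix, local = prefixed.split(":", 1)
    return "{%s}%s" % (NAMESPACES[prefix], local)


# Register the prefixes so that serialized output uses them instead of ns0, ns1, ...
for prefix, uri in NAMESPACES.items():
    ET.register_namespace(prefix, uri)

# Build an empty catalogue description to check that the prefixes are applied.
root = ET.Element(qname("rdf:RDF"))
catalog = ET.SubElement(root, qname("dcat:Catalog"))
catalog.set(qname("rdf:about"), "http://localhost:8080/geonetwork")
serialized = ET.tostring(root, encoding="unicode")
```

Only the namespaces actually used by an element or attribute are written to the serialized document, so unused prefixes do no harm.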
     76
     77Classes:
     78 * Catalogue (dcat:Catalog) is the local catalogue or any harvested nodes.
     79{{{
     80 <!-- First, the local catalog description using dcat:Catalog.
     81        "Typically, a web-based data catalog is represented as a single instance of this class."
     82        ... also describe harvested catalogues if harvested records are in the current dump.
     83    -->
     84    <dcat:Catalog rdf:about="http://localhost:8080/geonetwork">
     85        <!-- A name given to the catalog. -->
     86        <dct:title xml:lang="en">My GeoNetwork geospatial metadata catalogue</dct:title>
     87       
     88        <!-- free-text account of the catalog. -->
     89        <dct:description></dct:description>
     90       
     91        <rdfs:label xml:lang="en">My GeoNetwork geospatial metadata catalogue</rdfs:label>
     92       
     93        <!-- The homepage of the catalog -->
     94        <foaf:homepage>http://localhost:8080/geonetwork</foaf:homepage>
     95       
     96        <!-- FIXME : void:Dataset -->
     97        <void:openSearchDescription>http://localhost:8080/geonetwork/srv/eng/portal.opensearch</void:openSearchDescription>
     98        <void:uriLookupEndpoint>http://localhost:8080/geonetwork/search/rdf?any=</void:uriLookupEndpoint>
     99       
     100       
     101        <!-- The entity responsible for making the catalog online. -->
     102        <dct:publisher rdf:resource="http://localhost:8080/geonetwork/organization/1"/>
     103       
     104        <!-- The knowledge organization system (KOS) used to classify catalog's datasets. -->
     105        <dcat:themes rdf:resource="http://localhost:8080/geonetwork/thesaurus/id"/>
     106       
     107        <!-- The language of the catalog. This refers to the language used
     108            in the textual metadata describing titles, descriptions, etc.
     109            of the datasets in the catalog.
     110       
     111            http://www.ietf.org/rfc/rfc3066.txt
     112           
     113            Multiple values can be used. The publisher might also choose to describe
     114            the language on the dataset level (see dataset language).
     115        -->
     116        <dct:language>en</dct:language>
     117       
     118       
     119        <!-- This describes the license under which the catalog can be used/reused and not the datasets.
     120            Even if the license of the catalog applies to all of its datasets it should be
     121            replicated on each dataset.-->
     122        <dct:license>
     123            <!-- TODO using VoID-->
     124        </dct:license>
     125       
     126        <!-- The geographical area covered by the catalog. -->
     127        <dct:spatial>
     128            <!-- TODO: describe the area as a dct:Location -->
     129        </dct:spatial>
     130       
     131        <!-- List all catalogue records -->
     132        <dcat:dataset rdf:resource="http://localhost:8080/geonetwork/dataset/1"/>
     133        <dcat:record rdf:resource="http://localhost:8080/geonetwork/metadata/1"/>
     134        <!-- TODO : series, service, feature catalogue -->
     135    </dcat:Catalog>
     136   
     137}}}
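
A consumer could read such a catalogue description with a plain XML parser. The sketch below is only an illustration under stated assumptions: the sample document is a trimmed copy of the example above, the function name `summarize_catalog` is hypothetical, and a real client would more likely use an RDF library than element paths.

```python
import xml.etree.ElementTree as ET

NS = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "dcat": "http://www.w3.org/ns/dcat#",
    "dct": "http://purl.org/dc/terms/",
}

# Trimmed copy of the catalogue example, used as test input only.
SAMPLE = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:dcat="http://www.w3.org/ns/dcat#"
    xmlns:dct="http://purl.org/dc/terms/">
  <dcat:Catalog rdf:about="http://localhost:8080/geonetwork">
    <dct:title xml:lang="en">My GeoNetwork geospatial metadata catalogue</dct:title>
    <dcat:dataset rdf:resource="http://localhost:8080/geonetwork/dataset/1"/>
    <dcat:record rdf:resource="http://localhost:8080/geonetwork/metadata/1"/>
  </dcat:Catalog>
</rdf:RDF>"""

RDF_ABOUT = "{%s}about" % NS["rdf"]
RDF_RESOURCE = "{%s}resource" % NS["rdf"]


def summarize_catalog(rdf_xml):
    """Return the catalogue URI, its title and the linked dataset and record URIs."""
    root = ET.fromstring(rdf_xml)
    catalog = root.find("dcat:Catalog", NS)
    return {
        "uri": catalog.get(RDF_ABOUT),
        "title": catalog.findtext("dct:title", namespaces=NS),
        "datasets": [e.get(RDF_RESOURCE) for e in catalog.findall("dcat:dataset", NS)],
        "records": [e.get(RDF_RESOURCE) for e in catalog.findall("dcat:record", NS)],
    }


summary = summarize_catalog(SAMPLE)
```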
     138 * Organization (foaf:Organization)
     139{{{
     140    <!-- Organization description.
     141        Organization could be linked to a catalogue, a catalogue record.
     142       
     143        xpath: //gmd:organisationName
     144    -->
     145    <foaf:Organization rdf:about="http://localhost:8080/geonetwork/organization/1">
     146        <foaf:name></foaf:name>
     147        <!-- xpath: gmd:organisationName/gco:CharacterString -->
     148        <!-- Link to the foaf:Person described below -->
     149        <foaf:member rdf:resource=""/>
     151    </foaf:Organization>
     152   
     153    <!-- Organization member
     154       
     155        xpath: //gmd:CI_ResponsibleParty-->
     156    <foaf:Person rdf:about="http://localhost:8080/geonetwork/person/ID">
     157        <foaf:name></foaf:name>
     158        <!-- xpath: gmd:individualName/gco:CharacterString -->
     159        <foaf:phone></foaf:phone>
     160        <!-- xpath: gmd:contactInfo/gmd:CI_Contact/gmd:phone/gmd:CI_Telephone/gmd:voice/gco:CharacterString -->
     161        <foaf:mbox></foaf:mbox>
     162        <!-- xpath: gmd:contactInfo/gmd:CI_Contact/gmd:address/gmd:CI_Address/gmd:electronicMailAddress/gco:CharacterString -->
     163    </foaf:Person>
     164   
     165}}}
     166 * Dataset (dcat:CatalogRecord+dcat:Dataset)
     167{{{
     168
     169   
     170    <!-- Catalogue records
     171        "A record in a data catalog, describing a single dataset."       
     172       
     173        xpath: //gmd:MD_Metadata|//*[@gco:isoType='gmd:MD_Metadata']
     174    -->
     175    <dcat:CatalogRecord>
     176        <!-- Link to a dcat:Dataset or a rdf:Description for services and feature catalogue. -->
     177        <foaf:primaryTopic rdf:resource="http://localhost:8080/geonetwork/metadata/uuid"/>
     178       
     179        <!-- Metadata change date.
     180            "The date is encoded as a literal in "YYYY-MM-DD" form (ISO 8601 Date and Time Formats)." -->
     181        <dct:issued></dct:issued>
     182        <dct:modified></dct:modified>
     183        <!-- xpath: gmd:dateStamp/gco:DateTime -->
     184    </dcat:CatalogRecord>
     185
     186
     187
     188    <!-- Dataset
     189        "A collection of data, published or curated by a single source, and available for access or
     190        download in one or more formats."
     191       
     192        xpath: //gmd:MD_DataIdentification|//*[@gco:isoType='gmd:MD_DataIdentification']
     193    -->
     194    <dcat:Dataset rdf:about="http://localhost:8080/geonetwork/metadata/uuid">
     195       
     196        <!-- "A unique identifier of the dataset." -->
     197        <dct:identifier></dct:identifier>
     198        <!-- xpath: gmd:identificationInfo/*/gmd:citation/*/gmd:identifier/*/gmd:code -->
     199
     200
     201        <dc:title></dc:title>
     202        <!-- xpath: gmd:identificationInfo/*/gmd:citation/*/gmd:title/gco:CharacterString -->
     203       
     204       
     205        <dct:abstract></dct:abstract>
     206        <!-- xpath: gmd:identificationInfo/*/gmd:abstract/gco:CharacterString -->
     207       
     208                   
     209        <!-- "A keyword or tag describing the dataset."
     210            Create dcat:keyword if no thesaurus name information available.
     211        -->
     212        <dcat:keyword></dcat:keyword>
     213        <!-- xpath: gmd:identificationInfo/*/gmd:descriptiveKeywords/gmd:MD_Keywords[not(gmd:thesaurusName)]/gmd:keyword/gco:CharacterString -->
     214       
     215       
     216        <!-- "The main category of the dataset. A dataset can have multiple themes."
     217            Create dcat:theme if gmx:Anchor or GEMET concepts or INSPIRE themes
     218        -->
     219        <dcat:theme rdf:resource="http://localhost:8080/geonetwork/thesaurus/thesaurusId/subjectId"/>
     220        <!-- xpath: gmd:identificationInfo/*/gmd:descriptiveKeywords/gmd:MD_Keywords/gmd:keyword/gmx:Anchor -->
     221        <!-- xpath: gmd:identificationInfo/*/gmd:descriptiveKeywords/gmd:MD_Keywords/gmd:keyword/gco:CharacterString -->
     222        <!-- xpath: gmd:identificationInfo/*/gmd:topicCategory/gmd:MD_TopicCategoryCode -->
     223       
     224       
     225        <!-- Thumbnail -->
     226        <foaf:thumbnail rdf:resource=""/>
     227        <!-- xpath: gmd:identificationInfo/*/gmd:graphicOverview/gmd:MD_BrowseGraphic/gmd:fileName/gco:CharacterString -->
     228       
     229       
     230        <!-- "Spatial coverage of the dataset." -->
     231        <dct:spatial>Polygon(...)</dct:spatial>
     232        <!-- xpath: gmd:identificationInfo/*/gmd:extent/*/gmd:geographicElement/gmd:EX_GeographicBoundingBox -->
     233       
     234       
     235        <!-- "The temporal period that the dataset covers." -->
     236        <dct:temporal></dct:temporal>
     237        <!-- xpath: gmd:identificationInfo/*/gmd:extent/*/gmd:temporalElement -->
     238       
     239       
     240        <dct:issued></dct:issued>
     241        <!-- dct:modified carries the update date (DCMI Terms defines no "updated" property) -->
     242        <dct:modified></dct:modified>
     243       
     244        <!-- "An entity responsible for making the dataset available" -->
     245        <dct:publisher rdf:resource="http://localhost:8080/geonetwork/organization/contactId"/>
     246        <!-- xpath: gmd:identificationInfo/*/gmd:pointOfContact -->
     247       
     248       
     249        <!-- "The frequency with which dataset is published." See placetime.com intervals. -->
     250        <dct:accrualPeriodicity></dct:accrualPeriodicity>
     251        <!-- xpath: gmd:identificationInfo/*/gmd:resourceMaintenance/gmd:MD_MaintenanceInformation/gmd:maintenanceAndUpdateFrequency/gmd:MD_MaintenanceFrequencyCode/@codeListValue -->
     252       
     253        <!-- "This is usually geographical or temporal but can also be other dimension" ??? -->
     254        <dcat:granularity></dcat:granularity>
     255        <!-- xpath: gmd:identificationInfo/*/gmd:spatialResolution/gmd:MD_Resolution/gmd:equivalentScale/gmd:MD_RepresentativeFraction/gmd:denominator/gco:Integer -->
     256       
     257       
     258        <!--
     259            "The language of the dataset."
     260            "This overrides the value of the catalog language in case of conflict"
     261        -->
     262        <dct:language></dct:language>
     263        <!-- xpath: gmd:identificationInfo/*/gmd:language/gmd:LanguageCode/@codeListValue -->
     264       
     265       
     266        <!-- "The license under which the dataset is published and can be reused." -->
     267        <dct:license></dct:license>
     268        <!-- xpath: gmd:identificationInfo/*/gmd:resourceConstraints/??? -->
     269       
     270        <dcat:distribution rdf:resource=""/>
     271        <!-- xpath: gmd:distributionInfo/*/gmd:transferOptions/*/gmd:onLine/gmd:CI_OnlineResource -->
     272       
     273       
     274        <!-- ISO19110 relation
     275            "This usually consists of a table providing explanation of columns meaning, values interpretation and acronyms/codes used in the data."
     276        -->
     277        <dcat:dataDictionary rdf:resource=""/>
     278        <!-- xpath: gmd:contentInfo/gmd:MD_FeatureCatalogueDescription/gmd:featureCatalogueCitation/@uuidref -->
     279       
     280        <!--
     281            "A related document such as technical documentation, agency program page, citation, etc."           
     282        -->
     283        <dct:references rdf:resource="url?"/>
     284        <!-- xpath: gmd:identificationInfo/*/gmd:citation/*/gmd:otherCitationDetails/gco:CharacterString -->
     285       
     286       
     287        <!-- "describes the quality of data." -->
     288        <dcat:dataQuality>
     289            <!-- rdfs:Literal -->
     290        </dcat:dataQuality>
     291        <!-- xpath: gmd:dataQualityInfo/*/gmd:lineage/gmd:LI_Lineage/gmd:statement/gco:CharacterString -->
     292       
     293       
     294        <!-- FIXME ?
     295            <void:dataDump></void:dataDump>-->
     296    </dcat:Dataset>
     297
     298
     299}}}
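
The XPath comments in the dataset example describe a mapping from ISO19139 elements to DCAT properties. The proposal performs this conversion in XSLT; the following Python sketch only illustrates the mapping for the title and the free keywords. The sample record and the function name `extract_dataset_fields` are assumptions made for this example. ElementTree does not support the `not(...)` predicate, so the thesaurus test is done in Python.

```python
import xml.etree.ElementTree as ET

ISO_NS = {
    "gmd": "http://www.isotc211.org/2005/gmd",
    "gco": "http://www.isotc211.org/2005/gco",
}

# Paths taken from the xpath comments of the dataset example.
TITLE_XPATH = "gmd:identificationInfo/*/gmd:citation/*/gmd:title/gco:CharacterString"
KEYWORD_BLOCK_XPATH = "gmd:identificationInfo/*/gmd:descriptiveKeywords/gmd:MD_Keywords"

# Minimal, made-up ISO19139 record used as test input only.
SAMPLE = """<gmd:MD_Metadata xmlns:gmd="http://www.isotc211.org/2005/gmd"
    xmlns:gco="http://www.isotc211.org/2005/gco">
  <gmd:identificationInfo>
    <gmd:MD_DataIdentification>
      <gmd:citation><gmd:CI_Citation>
        <gmd:title><gco:CharacterString>Sample rivers dataset</gco:CharacterString></gmd:title>
      </gmd:CI_Citation></gmd:citation>
      <gmd:descriptiveKeywords><gmd:MD_Keywords>
        <gmd:keyword><gco:CharacterString>rivers</gco:CharacterString></gmd:keyword>
      </gmd:MD_Keywords></gmd:descriptiveKeywords>
      <gmd:descriptiveKeywords><gmd:MD_Keywords>
        <gmd:keyword><gco:CharacterString>Hydrography</gco:CharacterString></gmd:keyword>
        <gmd:thesaurusName/>
      </gmd:MD_Keywords></gmd:descriptiveKeywords>
    </gmd:MD_DataIdentification>
  </gmd:identificationInfo>
</gmd:MD_Metadata>"""


def extract_dataset_fields(iso_xml):
    """Map a few ISO19139 elements to the DCAT properties named in the example."""
    root = ET.fromstring(iso_xml)
    title = root.findtext(TITLE_XPATH, default="", namespaces=ISO_NS)
    free_keywords = []
    for block in root.findall(KEYWORD_BLOCK_XPATH, ISO_NS):
        # Keywords with a thesaurus become dcat:theme, so skip them here.
        if block.find("gmd:thesaurusName", ISO_NS) is not None:
            continue
        for keyword in block.findall("gmd:keyword/gco:CharacterString", ISO_NS):
            if keyword.text:
                free_keywords.append(keyword.text)
    return {"dc:title": title, "dcat:keyword": free_keywords}


fields = extract_dataset_fields(SAMPLE)
```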
     300 * Series (dcat:CatalogRecord+dcat:Dataset+(void:linksets or void:inDataset or dct:isPartOf ?))
     301{{{
     302        TODO
     303}}}
     304 * Service (dcat:CatalogRecord+(dcat:? or rdf:Description))
     305{{{
     306
     307    <!-- Service
     308        Create a simple rdf:Description. To be improved.
     309       
     310        xpath: //srv:SV_ServiceIdentification|//*[@gco:isoType='srv:SV_ServiceIdentification']
     311    -->
     312    <rdf:Description rdf:about="http://localhost:8080/geonetwork/metadata/uuid">
     313        <!-- Same as dcat:Dataset without dcat:* -->
     314    </rdf:Description>
     315}}}
     316 * Feature catalogue (dcat:CatalogRecord+dcat:Dataset or rdf:Description)
     317{{{
     318
     319    <!-- Feature Catalogue     
     320        Create a simple rdf:Description. To be improved.
     321       
     322       
     323        xpath: //gfc:FC_FeatureCatalogue
     324    -->
     325    <rdf:Description rdf:about="http://localhost:8080/geonetwork/metadata/uuid">
     326        <dc:title></dc:title>
     327    </rdf:Description>
     328}}}
     329 * Thesaurus (skos:ConceptScheme)
     330{{{
     331    <!-- ConceptScheme describes all thesauri available in the catalogue
     332    * Resource identifier is a local identifier for local thesaurus or public URI if external
     333    -->
     334    <skos:ConceptScheme rdf:about="http://localhost:8080/geonetwork/thesaurus/external.theme.inspire-theme">
     335        <dc:title>GEMET - INSPIRE themes, version 1.0</dc:title>
     336        <dc:description>INSPIRE themes thesaurus for GeoNetwork opensource.</dc:description>
     337        <dc:creator>
     338            <foaf:Organization>
     339                <foaf:name>EEA</foaf:name>
     340            </foaf:Organization>
     341        </dc:creator>
     342        <dc:uri>http://www.eionet.europa.eu/gemet/about?langcode=en</dc:uri>
     343        <dc:rights>http://www.eionet.europa.eu/gemet/about?langcode=en</dc:rights>
     344        <dct:issued>2008-06-01</dct:issued>
     345        <dct:modified>2008-06-01</dct:modified>
     346    </skos:ConceptScheme>
     347   
     348}}}
     349 * Keyword (skos:Concept)
     350{{{
     351   
     352    <!-- Keywords -->
     353    <skos:Concept rdf:about="http://localhost:8080/geonetwork/thesaurus/thesaurusId/subjectId">
     354        <skos:inScheme rdf:resource="http://localhost:8080/geonetwork/thesaurus/id"/>
     355        <skos:prefLabel></skos:prefLabel>
     356    </skos:Concept>
     357   
     358}}}
     359 * Online resources (dcat:Distribution)
     360{{{
     361
     362    <!-- Distribution
     363        "Represents a specific available form of a dataset. Each dataset might be available in different
     364        forms, these forms might represent different formats of the dataset, different endpoints,...
     365        Examples of Distribution include a downloadable CSV file, an XLS file representing the dataset,
     366        an RSS feed ..."
     367       
     368        Download, WebService, Feed
     369       
     370        xpath: //gmd:distributionInfo/*/gmd:transferOptions/*/gmd:onLine/gmd:CI_OnlineResource
     371    -->
     372    <dcat:Distribution rdf:about="accessURL ?">
     373        <!--
     374            "points to the location of a distribution. This can be a direct download link, a link
     375            to an HTML page containing a link to the actual data, Feed, Web Service etc.
     376            the semantic is determined by its domain (Distribution, Feed, WebService, Download)."
     377        -->
     378        <dcat:accessURL></dcat:accessURL>
     379        <!-- xpath: gmd:linkage/gmd:URL -->
     380       
     381        <dct:title></dct:title>
     382        <!-- xpath: gmd:name/gco:CharacterString -->
     383       
     384        <!-- "The size of a distribution.":N/A
     385        <dcat:size></dcat:size>
     386        -->
     387       
     388       
     389        <dct:format>
     390            <!--
     391                "the file format of the distribution."
     392               
     393                "MIME type is used for values. A list of MIME types URLs can be found at IANA.
     394                However ESRI Shape files have no specific MIME type (A Shape distribution is actually
     395                a collection of files), currently this is still an open question?"
     396               
     397                In our case, Shapefile will be zipped !
     398               
     399                Mapping between protocol list and mime/type when needed
     400            -->
     401            <dct:IMT>
     402                <rdf:value>text/csv</rdf:value>
     403                <rdfs:label>CSV</rdfs:label>
     404            </dct:IMT>
     405        </dct:format>
     406        <!-- xpath: gmd:protocol/gco:CharacterString -->
     407       
     408    </dcat:Distribution>
     409   
     410}}}
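
The distribution example mentions a mapping between the protocol list and MIME types. The proposal does not define that mapping, so the tables below are purely hypothetical placeholders, as is the function name `guess_mime`. The sketch only shows one possible lookup order: protocol first, then the file extension of the linkage URL (with zipped Shapefiles reported as ZIP archives).

```python
import posixpath
from urllib.parse import urlparse

# Hypothetical lookup tables: neither the keys nor the values come from the proposal.
MIME_BY_PROTOCOL = {
    "OGC:WMS-1.1.1-http-get-map": "application/vnd.ogc.wms_xml",
}
MIME_BY_EXTENSION = {
    ".csv": "text/csv",
    ".zip": "application/zip",  # e.g. a zipped Shapefile
}


def guess_mime(protocol, url):
    """Return a MIME type for a gmd:CI_OnlineResource, or None when unknown."""
    if protocol in MIME_BY_PROTOCOL:
        return MIME_BY_PROTOCOL[protocol]
    extension = posixpath.splitext(urlparse(url).path)[1].lower()
    return MIME_BY_EXTENSION.get(extension)


wms_mime = guess_mime("OGC:WMS-1.1.1-http-get-map", "http://localhost:8080/geoserver/wms")
zip_mime = guess_mime("WWW:DOWNLOAD-1.0-http--download", "http://localhost:8080/data/rivers.zip")
unknown_mime = guess_mime("WWW:LINK-1.0-http--link", "http://localhost:8080/about")
```

Returning `None` for unknown resources lets the caller decide whether to omit `dct:format` entirely.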
     411
     412
     413=== Formats ===
     414 * RDF/XML is the output format for new services.
     415 * RDFa is used to add annotations to HTML pages.
     416 * The sitemap is an XML file that uses the Semantic Crawling extension (see #81).
     417
     418=== Services ===
     419
     420New services:
     421 * Metadata service: http://<server_host>:<server_port>/<catalogue>/srv/eng/rdf.metadata.get?uuid=<uuid>
     422 * RDF search service: http://<server_host>:<server_port>/<catalogue>/srv/eng/rdf.search?
     423  * All !GeoNetwork search criteria can be used to extract a subset of the catalogue.
     424 * Sitemap: http://<server_host>:<server_port>/<catalogue>/srv/eng/portal.sitemap?type=rdf
     425 
     426Rewriting rules for simple URL:
     427 * http://<server_host>:<server_port>/<catalogue>/metadata/<uuid>.rdf
     428 * http://<server_host>:<server_port>/<catalogue>/search/rdf?
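
The URL patterns above can be sketched in Python as follows. This is an illustration only: the function names are hypothetical, the `eng` language code is taken from the examples, and the rewriting itself would normally be configured in the web server or servlet container rather than in application code.

```python
import re
from urllib.parse import urlencode


def metadata_service_url(base, uuid, lang="eng"):
    """Build the metadata service URL for one record."""
    return "%s/srv/%s/rdf.metadata.get?%s" % (base.rstrip("/"), lang, urlencode({"uuid": uuid}))


def search_service_url(base, criteria, lang="eng"):
    """Build the RDF search service URL from a dict of search criteria."""
    return "%s/srv/%s/rdf.search?%s" % (base.rstrip("/"), lang, urlencode(criteria))


_METADATA_PATH = re.compile(r"^/metadata/(?P<uuid>[^/]+)\.rdf$")


def rewrite_simple_path(path, query=""):
    """Map a simple path (relative to the catalogue root) to the service path, or None."""
    match = _METADATA_PATH.match(path)
    if match:
        return "/srv/eng/rdf.metadata.get?uuid=" + match.group("uuid")
    if path == "/search/rdf":
        return "/srv/eng/rdf.search?" + query
    return None


base = "http://localhost:8080/geonetwork"
record_url = metadata_service_url(base, "abc-123")
search_url = search_service_url(base, {"any": "water"})
```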
     429
     430Conversion (to be improved):
     431 * <schema>/convert/dcat-rdf.xsl
     432
     433
     434=== Site map ===
     435A sitemap using the semantic crawling extension is added to the existing XML sitemap.
     436
     437{{{
     438<?xml version="1.0" encoding="UTF-8"?>
     439<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"
     440        xmlns:sc="http://sw.deri.org/2007/07/sitemapextension/scschema.xsd">
     441  <sc:dataset>
     442    <sc:datasetLabel>My GeoNetwork full content catalogue for Linked Data spiders (RDF)</sc:datasetLabel>
     443
     444    <!-- For the 5 latest updates: -->
     445    <sc:sampleURI>http://<server_host>:<server_port>/<catalogue>/metadata/<uuid>.rdf</sc:sampleURI>
     446
     447
     448    <!-- Link to a full dump using the search API -->
     449    <sc:dataDumpLocation>http://<server_host>:<server_port>/<catalogue>/search/rdf/</sc:dataDumpLocation>
     450    <!-- or provide a link for every catalogue record using -->
     451    <sc:dataDumpLocation>http://<server_host>:<server_port>/<catalogue>/metadata/<uuid>.rdf</sc:dataDumpLocation>
     452    <changefreq>daily</changefreq>
     453  </sc:dataset>
     454</urlset>
     455}}}
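
A sitemap of this shape could be generated as in the Python sketch below. It is not the catalogue's implementation: the function name `build_semantic_sitemap` and its parameters are hypothetical, and it uses the full-dump variant with a single `sc:dataDumpLocation` pointing at the search service.

```python
import xml.etree.ElementTree as ET

SM = "http://www.sitemaps.org/schemas/sitemap/0.9"
SC = "http://sw.deri.org/2007/07/sitemapextension/scschema.xsd"


def build_semantic_sitemap(base, label, sample_uuids):
    """Return a semantic sitemap as a string, with one sc:sampleURI per UUID."""
    # Sitemap elements use the default namespace, extension elements the sc prefix.
    ET.register_namespace("", SM)
    ET.register_namespace("sc", SC)
    urlset = ET.Element("{%s}urlset" % SM)
    dataset = ET.SubElement(urlset, "{%s}dataset" % SC)
    ET.SubElement(dataset, "{%s}datasetLabel" % SC).text = label
    for uuid in sample_uuids:
        sample = ET.SubElement(dataset, "{%s}sampleURI" % SC)
        sample.text = "%s/metadata/%s.rdf" % (base, uuid)
    ET.SubElement(dataset, "{%s}dataDumpLocation" % SC).text = base + "/search/rdf/"
    ET.SubElement(dataset, "{%s}changefreq" % SM).text = "daily"
    return ET.tostring(urlset, encoding="unicode")


sitemap_xml = build_semantic_sitemap(
    "http://localhost:8080/geonetwork",
    "My GeoNetwork full content catalogue for Linked Data spiders (RDF)",
    ["uuid-1", "uuid-2"],
)
```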
     456
     457The sitemap will be accessible through the existing sitemap service with type=rdf as a parameter: http://<server_host>:<server_port>/geonetwork/srv/eng/portal.sitemap?type=rdf
     458
     459
     460In robots.txt, the following line is added:
     461{{{
     462Sitemap: http://<server_host>:<server_port>/geonetwork/srv/eng/portal.sitemap?type=rdf
     463}}}
     464
     465
     466== Risks ==
     467
     468== Participants ==
     469 * François Prunayre
     470