GeoNetwork Data Catalog Vocabulary services
Date | 2012/05/10 |
Contact(s) | François Prunayre, Paul Hasenohr |
Last edited | |
Status | Motion passed - Done |
Assigned to release | 2.9 |
Resources | Available (funding EEA) |
Ticket # | #912 |
Github source | https://github.com/fxprunayre/core-geonetwork/tree/feature/dcat-rdf |
Overview
Data Catalog Vocabulary services in GeoNetwork opensource increase discoverability and enable applications to consume metadata easily. These services provide information about three types of objects:
- the catalogue,
- the datasets and services in the catalogue
- and the link to distributed resources.
The description contains relations to thesauri (e.g. GEMET), keywords and organizations. The document can be used in a linked data context.
The output format produced by the services is based on DCAT, an RDF vocabulary. Two types of services are created:
- Metadata service to access a single metadata record
- Search service to search the catalogue and retrieve a set of metadata
The Data Catalog Vocabulary services can be used by Semantic Web tools to harvest, search (e.g. using SPARQL) and link catalogue content with other interlinked resources.
A semantic portal sitemap is created so that the catalogue can be harvested.
Proposal Type
- Type: Discoverability
- App: GeoNetwork
- Module: Metadata and search services
Links
- Documents:
- Data Catalog Vocabulary (DCAT) http://www.w3.org/TR/vocab-dcat/#property--data-dictionary
- Vocabulary of interlinked Dataset (VoID) http://www.w3.org/TR/void/
- http://dvcs.w3.org/hg/gld/raw-file/default/dcat/index.html
- Semantic Web Crawling: A Sitemap Extension http://sw.deri.org/2007/07/sitemapextension/
- http://geovocab.org/doc/neogeo.html
Voting history
Vote proposed by Francois on 2012/07/04, result was
- +1 from Jeroen, Simon, Francois
Proposal
RDF Model
An RDF model is defined for the ISO19139, ISO19110 and Dublin Core standards in order to cover most of the metadata in the catalogue. The model is based on DCAT, which is "an RDF vocabulary designed to facilitate interoperability between data catalogs published on the Web".
Vocabularies:
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
         xmlns:foaf="http://xmlns.com/foaf/0.1/"
         xmlns:void="http://www.w3.org/TR/void/"
         xmlns:dcat="http://www.w3.org/ns/dcat#"
         xmlns:dc="http://purl.org/dc/elements/1.1/"
         xmlns:dct="http://purl.org/dc/terms/"
         xmlns:dctype="http://purl.org/dc/dcmitype/"
         xmlns:skos="http://www.w3.org/2004/02/skos/core#">
Classes:
- Catalogue (dcat:Catalog) is the local catalogue or any harvested nodes.
<!-- First, the local catalog description using dcat:Catalog.
     "Typically, a web-based data catalog is represented as a single instance of this class."
     ... also describe harvested catalogues if harvested records are in the current dump. -->
<dcat:Catalog rdf:about="http://localhost:8080/geonetwork">
  <!-- A name given to the catalog. -->
  <dct:title xml:lang="en">My GeoNetwork geospatial metadata catalogue</dct:title>
  <!-- Free-text account of the catalog. -->
  <dct:description></dct:description>
  <rdfs:label xml:lang="en">My GeoNetwork geospatial metadata catalogue</rdfs:label>
  <!-- The homepage of the catalog -->
  <foaf:homepage>http://localhost:8080/geonetwork</foaf:homepage>
  <!-- FIXME : void:Dataset -->
  <void:openSearchDescription>http://localhost:8080/geonetwork/srv/eng/portal.opensearch</void:openSearchDescription>
  <void:uriLookupEndpoint>http://localhost:8080/geonetwork/search/rdf?any=</void:uriLookupEndpoint>
  <!-- The entity responsible for making the catalog online. -->
  <dct:publisher rdf:resource="http://localhost:8080/geonetwork/organization/1"/>
  <!-- The knowledge organization system (KOS) used to classify catalog's datasets. -->
  <dcat:themes rdf:resource="http://localhost:8080/geonetwork/thesaurus/id"/>
  <!-- The language of the catalog. This refers to the language used in the textual metadata
       describing titles, descriptions, etc. of the datasets in the catalog.
       http://www.ietf.org/rfc/rfc3066.txt
       Multiple values can be used. The publisher might also choose to describe the language
       on the dataset level (see dataset language). -->
  <dct:language>en</dct:language>
  <!-- This describes the license under which the catalog can be used/reused and not the datasets.
       Even if the license of the catalog applies to all of its datasets it should be replicated
       on each dataset. -->
  <dct:license>
    <!-- TODO using VoID -->
  </dct:license>
  <!-- The geographical area covered by the catalog. -->
  <dct:Location>
    <!-- TODO -->
  </dct:Location>
  <!-- List all catalogue records -->
  <dcat:dataset rdf:resource="http://localhost:8080/geonetwork/dataset/1"/>
  <dcat:record rdf:resource="http://localhost:8080/geonetwork/metadata/1"/>
</dcat:Catalog>
- Organization (foaf:Organization)
<!-- Organization description.
     Organization could be linked to a catalogue, a catalogue record.
     xpath: //gmd:organisationName -->
<foaf:Organization rdf:about="http://localhost:8080/geonetwork/organization/1">
  <foaf:name></foaf:name>
  <!-- xpath: gmd:organisationName/gco:CharacterString -->
  <foaf:member>
    <foaf:Person rdf:resource=""/>
  </foaf:member>
</foaf:Organization>
<!-- Organization member
     xpath: //gmd:CI_ResponsibleParty -->
<foaf:Person rdf:about="http://localhost:8080/geonetwork/person/ID">
  <foaf:name></foaf:name>
  <!-- xpath: gmd:individualName/gco:CharacterString -->
  <foaf:phone></foaf:phone>
  <!-- xpath: gmd:contactInfo/gmd:CI_Contact/gmd:phone/gmd:CI_Telephone/gmd:voice/gco:CharacterString -->
  <foaf:mbox></foaf:mbox>
  <!-- xpath: gmd:contactInfo/gmd:CI_Contact/gmd:address/gmd:CI_Address/gmd:electronicMailAddress/gco:CharacterString -->
</foaf:Person>
- Dataset (dcat:CatalogRecord+dcat:Dataset)
<!-- Catalogue records
     "A record in a data catalog, describing a single dataset."
     xpath: //gmd:MD_Metadata|//*[@gco:isoType='gmd:MD_Metadata'] -->
<dcat:CatalogRecord>
  <!-- Link to a dcat:Dataset or a rdf:Description for services and feature catalogue. -->
  <foaf:primaryTopic rdf:resource="http://localhost:8080/geonetwork/metadata/uuid"/>
  <!-- Metadata change date.
       "The date is encoded as a literal in "YYYY-MM-DD" form (ISO 8601 Date and Time Formats)." -->
  <dct:issued></dct:issued>
  <dct:modified></dct:modified>
  <!-- xpath: gmd:dateStamp/gco:DateTime -->
</dcat:CatalogRecord>
<!-- Dataset
     "A collection of data, published or curated by a single source, and available for access
     or download in one or more formats."
     xpath: //gmd:MD_DataIdentification|//*[@gco:isoType='gmd:MD_DataIdentification'] -->
<dcat:Dataset rdf:about="http://localhost:8080/geonetwork/metadata/uuid">
  <!-- "A unique identifier of the dataset." -->
  <dct:identifier></dct:identifier>
  <!-- xpath: gmd:identificationInfo/*/gmd:citation/*/gmd:identifier/*/gmd:code -->
  <dc:title></dc:title>
  <!-- xpath: gmd:identificationInfo/*/gmd:citation/*/gmd:title/gco:CharacterString -->
  <dc:abstract></dc:abstract>
  <!-- xpath: gmd:identificationInfo/*/gmd:abstract/gco:CharacterString -->
  <!-- "A keyword or tag describing the dataset."
       Create dcat:keyword if no thesaurus name information available. -->
  <dcat:keyword></dcat:keyword>
  <!-- xpath: gmd:identificationInfo/*/gmd:descriptiveKeywords/gmd:MD_Keywords[not(gmd:thesaurusName)]/gmd:keyword/gco:CharacterString -->
  <!-- "The main category of the dataset. A dataset can have multiple themes."
       Create dcat:theme if gmx:Anchor or GEMET concepts or INSPIRE themes -->
  <dcat:theme rdf:resource="http://localhost:8080/geonetwork/thesaurus/thesaurusId/subjectId"/>
  <!-- xpath: gmd:identificationInfo/*/gmd:descriptiveKeywords/gmd:MD_Keywords/gmd:keyword/gmx:Anchor -->
  <!-- xpath: gmd:identificationInfo/*/gmd:descriptiveKeywords/gmd:MD_Keywords/gmd:keyword/gco:CharacterString -->
  <!-- xpath: gmd:identificationInfo/*/gmd:topicCategory/gmd:MD_TopicCategoryCode -->
  <!-- Thumbnail -->
  <foaf:thumbnail rdf:resource=""/>
  <!-- xpath: gmd:identificationInfo/*/gmd:graphicOverview/gmd:MD_BrowseGraphic/gmd:fileName/gco:CharacterString -->
  <!-- "Spatial coverage of the dataset." -->
  <dct:spatial>
    <ogc:Polygon xmlns:ogc="http://www.opengis.net/rdf#">
      <ogc:asWKT rdf:datatype="http://www.opengis.net/rdf#WKTLiteral">
        &lt;http://www.opengis.net/def/crs/OGC/1.3/CRS84&gt;
        Polygon((13.208233 50.71671, 13.208233 51.24864, 14.40099 51.24864, 14.40099 50.71671, 13.208233 50.71671))
      </ogc:asWKT>
    </ogc:Polygon>
  </dct:spatial>
  <!-- xpath: gmd:identificationInfo/*/gmd:extent/*/gmd:geographicElement/gmd:EX_GeographicBoundingBox -->
  <!-- "The temporal period that the dataset covers." -->
  <dct:temporal></dct:temporal>
  <!-- xpath: gmd:identificationInfo/*/gmd:extent/*/gmd:temporalElement -->
  <dct:issued></dct:issued>
  <dct:updated></dct:updated>
  <dct:modified></dct:modified>
  <!-- "An entity responsible for making the dataset available" -->
  <dct:publisher rdf:resource="http://localhost:8080/geonetwork/organization/contactId"/>
  <!-- xpath: gmd:identificationInfo/*/gmd:pointOfContact -->
  <!-- "The frequency with which dataset is published." See placetime.com intervals. -->
  <dct:accrualPeriodicity></dct:accrualPeriodicity>
  <!-- xpath: gmd:identificationInfo/*/gmd:resourceMaintenance/gmd:MD_MaintenanceInformation/gmd:maintenanceAndUpdateFrequency/gmd:MD_MaintenanceFrequencyCode/@codeListValue -->
  <!-- "This is usually geographical or temporal but can also be other dimension" ??? -->
  <dcat:granularity></dcat:granularity>
  <!-- xpath: gmd:identificationInfo/*/gmd:spatialResolution/gmd:MD_Resolution/gmd:equivalentScale/gmd:MD_RepresentativeFraction/gmd:denominator/gco:Integer -->
  <!-- "The language of the dataset."
       "This overrides the value of the catalog language in case of conflict" -->
  <dct:language></dct:language>
  <!-- xpath: gmd:identificationInfo/*/gmd:language/gmd:LanguageCode/@codeListValue -->
  <!-- "The license under which the dataset is published and can be reused." -->
  <dct:license></dct:license>
  <!-- xpath: gmd:identificationInfo/*/gmd:resourceConstraints/??? -->
  <dcat:Distribution rdf:resource=""/>
  <!-- xpath: gmd:distributionInfo/*/gmd:transferOptions/*/gmd:onLine/gmd:CI_OnlineResource -->
  <!-- ISO19110 relation
       "This usually consists of a table providing explanation of columns meaning, values
       interpretation and acronyms/codes used in the data." -->
  <dcat:dataDictionary rdf:resource=""/>
  <!-- xpath: gmd:contentInfo/gmd:MD_FeatureCatalogueDescription/gmd:featureCatalogueCitation/@uuidref -->
  <!-- "A related document such as technical documentation, agency program page, citation, etc." -->
  <dct:reference rdf:resource="url?"/>
  <!-- xpath: gmd:identificationInfo/*/gmd:citation/*/gmd:otherCitationDetails/gco:CharacterString -->
  <!-- "describes the quality of data." -->
  <dcat:dataQuality>
    <!-- rdfs:literal -->
  </dcat:dataQuality>
  <!-- xpath: gmd:dataQualityInfo/*/gmd:lineage/gmd:LI_Lineage/gmd:statement/gco:CharacterString -->
  <!-- FIXME ? <void:dataDump></void:dataDump> -->
</dcat:Dataset>
- Series (dcat:CatalogRecord+dcat:Dataset+dc:relation)
- Service (dcat:CatalogRecord+rdf:Description+dc:relation)
<!-- Service
     Create a simple rdf:Description. To be improved.
     xpath: //srv:SV_ServiceIdentification|//*[@gco:isoType='srv:SV_ServiceIdentification'] -->
<rdf:Description rdf:about="http://localhost:8080/geonetwork/metadata/uuid">
  <!-- Same as dcat:Dataset without dcat:* -->
</rdf:Description>
- Feature catalogue (rdf:Description+dc:relation)
<!-- Feature Catalogue
     Create a simple rdf:Description. To be improved.
     xpath: //gfc:FC_FeatureCatalogue -->
<rdf:Description rdf:about="http://localhost:8080/geonetwork/metadata/uuid">
  <dc:title></dc:title>
</rdf:Description>
- Thesaurus (skos:ConceptScheme)
<!-- ConceptScheme describes all thesaurus available in the catalogue
     * Resource identifier is a local identifier for local thesaurus or public URI if external -->
<skos:ConceptScheme rdf:about="http://localhost:8080/geonetwork/thesaurus/external.theme.inspire-theme">
  <dc:title>GEMET - INSPIRE themes, version 1.0</dc:title>
  <dc:description>INSPIRE themes thesaurus for GeoNetwork opensource.</dc:description>
  <dc:creator>
    <foaf:Organization>
      <foaf:name>EEA</foaf:name>
    </foaf:Organization>
  </dc:creator>
  <dc:uri>http://www.eionet.europa.eu/gemet/about?langcode=en</dc:uri>
  <dc:rights>http://www.eionet.europa.eu/gemet/about?langcode=en</dc:rights>
  <dct:issued>2008-06-01</dct:issued>
  <dct:modified>2008-06-01</dct:modified>
</skos:ConceptScheme>
- Keyword (skos:Concept)
<!-- Keywords -->
<skos:Concept rdf:about="http://localhost:8080/geonetwork/thesaurus/thesaurusId/subjectId">
  <skos:inScheme rdf:resource="http://localhost:8080/geonetwork/thesaurus/id"/>
  <skos:prefLabel></skos:prefLabel>
</skos:Concept>
- Online resources (dcat:Distribution)
<!-- Distribution
     "Represents a specific available form of a dataset. Each dataset might be available in
     different forms, these forms might represent different formats of the dataset, different
     endpoints,... Examples of Distribution include a downloadable CSV file, an XLS file
     representing the dataset, an RSS feed ..."
     Download, WebService, Feed
     xpath: //gmd:distributionInfo/*/gmd:transferOptions/*/gmd:onLine/gmd:CI_OnlineResource -->
<dcat:Distribution rdf:about="accessURL ?">
  <!-- "points to the location of a distribution. This can be a direct download link, a link
       to an HTML page containing a link to the actual data, Feed, Web Service etc. the semantic
       is determined by its domain (Distribution, Feed, WebService, Download)." -->
  <dcat:accessURL></dcat:accessURL>
  <!-- xpath: gmd:linkage/gmd:URL -->
  <dct:title></dct:title>
  <!-- xpath: gmd:name/gco:CharacterString -->
  <!-- "The size of a distribution.":N/A <dcat:size></dcat:size> -->
  <dct:format>
    <!-- "the file format of the distribution."
         "MIME type is used for values. A list of MIME types URLs can be found at IANA. However
         ESRI Shape files have no specific MIME type (A Shape distribution is actually a
         collection of files), currently this is still an open question?"
         In our case, Shapefile will be zipped !
         Mapping between protocol list and mime/type when needed -->
    <dct:IMT>
      <rdf:value>text/csv</rdf:value>
      <rdfs:label>CSV</rdfs:label>
    </dct:IMT>
  </dct:format>
  <!-- xpath: gmd:protocol/gco:CharacterString -->
</dcat:Distribution>
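The dct:spatial value in the Dataset class above encodes the ISO bounding box (gmd:EX_GeographicBoundingBox) as a WKT polygon prefixed by the CRS84 URI. The following Python sketch shows one way such a literal could be derived from the four bounding box coordinates. The function name bbox_to_wkt is hypothetical, and the corner ordering (west/south first, then clockwise) is an assumption read off the sample value, not a description of the actual XSL conversion.

```python
# Hypothetical helper: not part of GeoNetwork, for illustration only.
CRS84 = "http://www.opengis.net/def/crs/OGC/1.3/CRS84"


def bbox_to_wkt(west, south, east, north, crs=CRS84):
    """Return a WKT polygon literal for a geographic bounding box.

    Corner order follows the sample in the Dataset class:
    (west south), (west north), (east north), (east south), back to start.
    """
    if south > north:
        raise ValueError("south must not be greater than north")
    ring = [
        (west, south),
        (west, north),
        (east, north),
        (east, south),
        (west, south),  # close the ring
    ]
    coords = ", ".join(f"{x} {y}" for x, y in ring)
    return f"<{crs}> Polygon(({coords}))"


if __name__ == "__main__":
    # Bounding box taken from the sample dct:spatial value.
    print(bbox_to_wkt(13.208233, 50.71671, 14.40099, 51.24864))
```

When the result is embedded in RDF/XML, the angle brackets around the CRS URI have to be escaped (for example with xml.sax.saxutils.escape).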
Formats
- RDF/XML is the output format for the new services.
- RDFa is used to add annotations to HTML pages.
- The sitemap is an XML file that uses the Semantic Crawling extension (see #81).
Services
New services:
- Metadata service: http://<server_host>:<server_port>/<catalogue>/srv/eng/rdf.metadata.get?uuid=<uuid>
- RDF search service: http://<server_host>:<server_port>/<catalogue>/srv/eng/rdf.search?
- All GeoNetwork search criteria can be used to extract a subset of the catalogue.
- Sitemap: http://<server_host>:<server_port>/<catalogue>/srv/eng/portal.sitemap?format=rdf
Rewriting rules for simple URL:
- http://<server_host>:<server_port>/<catalogue>/metadata/<uuid>.rdf
- http://<server_host>:<server_port>/<catalogue>/search/rdf?
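As an illustration of the service and simple URL patterns listed above, the following Python sketch builds the corresponding request URLs. The function names are hypothetical. The base URL is whatever http://<server_host>:<server_port>/<catalogue> resolves to on a given installation, and the any search criterion is taken from the uriLookupEndpoint example earlier in this proposal.

```python
from urllib.parse import quote, urlencode


def metadata_service_url(base, uuid, lang="eng"):
    """URL of the metadata service for one record (rdf.metadata.get)."""
    return f"{base}/srv/{lang}/rdf.metadata.get?{urlencode({'uuid': uuid})}"


def metadata_simple_url(base, uuid):
    """Rewritten (simple) URL for one record."""
    return f"{base}/metadata/{quote(uuid, safe='')}.rdf"


def search_service_url(base, lang="eng", **criteria):
    """URL of the RDF search service; criteria are GeoNetwork search parameters."""
    return f"{base}/srv/{lang}/rdf.search?{urlencode(criteria)}"


def search_simple_url(base, **criteria):
    """Rewritten (simple) URL for the RDF search service."""
    return f"{base}/search/rdf?{urlencode(criteria)}"


if __name__ == "__main__":
    base = "http://localhost:8080/geonetwork"
    uuid = "23a9d577-f875-4acf-8634-a77241a71176"
    print(metadata_service_url(base, uuid))
    print(metadata_simple_url(base, uuid))
    print(search_service_url(base, any="transport"))
    print(search_simple_url(base, any="transport"))
```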
Conversion:
- <schema>/convert/rdf.xsl
Schema supported:
- ISO19139
- ISO19110
- dublin-core
Site map
A sitemap using the semantic crawling extension is added to the existing XML sitemap.
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"
        xmlns:sc="http://sw.deri.org/2007/07/sitemapextension/scschema.xsd">
  <sc:dataset>
    <sc:datasetLabel>My GeoNetwork full content catalogue for Linked Data spiders (RDF)</sc:datasetLabel>
    <!-- For the 5 latest updates: -->
    <sc:sampleURI>http://<server_host>:<server_port>/<catalogue>/metadata/<uuid>.rdf</sc:sampleURI>
    <!-- Link to a full dump using the search API -->
    <sc:dataDumpLocation>http://<server_host>:<server_port>/<catalogue>/search/rdf/</sc:dataDumpLocation>
    <!-- or provide, for every catalogue record, a link using -->
    <sc:dataDumpLocation>http://<server_host>:<server_port>/<catalogue>/metadata/<uuid>.rdf</sc:dataDumpLocation>
    <changefreq>daily</changefreq>
  </sc:dataset>
</urlset>
The sitemap will be accessible through the existing sitemap service with format=rdf as parameter: http://<server_host>:<server_port>/geonetwork/srv/eng/portal.sitemap?format=rdf
In robots.txt, the following line is added:
Sitemap: http://<server_host>:<server_port>/geonetwork/srv/eng/portal.sitemap?format=rdf
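The structure of the semantic sitemap described in this section can be sketched in Python as follows. This is only an illustration of the document layout (one sc:dataset with sample URIs for recent records and a data dump location pointing at the search service). The function name and its parameters are hypothetical, and the real sitemap is produced by the existing sitemap service, not by this code.

```python
import xml.etree.ElementTree as ET

SM = "http://www.sitemaps.org/schemas/sitemap/0.9"
SC = "http://sw.deri.org/2007/07/sitemapextension/scschema.xsd"


def build_semantic_sitemap(base, label, sample_uuids):
    """Return a semantic sitemap (as a string) for the catalogue at `base`.

    `sample_uuids` is assumed to hold the UUIDs of the latest updated records.
    """
    # Serialize the sitemap namespace as default and the extension as "sc".
    ET.register_namespace("", SM)
    ET.register_namespace("sc", SC)

    urlset = ET.Element(f"{{{SM}}}urlset")
    dataset = ET.SubElement(urlset, f"{{{SC}}}dataset")
    ET.SubElement(dataset, f"{{{SC}}}datasetLabel").text = label
    for uuid in sample_uuids:
        sample = ET.SubElement(dataset, f"{{{SC}}}sampleURI")
        sample.text = f"{base}/metadata/{uuid}.rdf"
    # Full dump through the search API (simple URL form).
    ET.SubElement(dataset, f"{{{SC}}}dataDumpLocation").text = f"{base}/search/rdf/"
    ET.SubElement(dataset, f"{{{SM}}}changefreq").text = "daily"
    return ET.tostring(urlset, encoding="unicode")


if __name__ == "__main__":
    print(build_semantic_sitemap(
        "http://localhost:8080/geonetwork",
        "My GeoNetwork full content catalogue for Linked Data spiders (RDF)",
        ["23a9d577-f875-4acf-8634-a77241a71176"],
    ))
```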
Using RDF outputs
Save metadata as xml
A "Save as RDF" action is added to the metadata menu (see the attachment rdf-save-as-action.png).
Visualization tools
Running a search on the catalogue using the rdf.search service will provide a full or partial view of the catalogue which could be analyzed in visualization tools.
SPARQL queries
Once loaded in a SPARQL endpoint, the catalogue content could be queried using SPARQL:
- Get metadata titles
sparql select ?title where {?s <http://purl.org/dc/elements/1.1/title> ?title};
- Get metadata about transport network
sparql
PREFIX dc: <http://purl.org/dc/elements/1.1/>
PREFIX dcat: <http://www.w3.org/ns/dcat#>
PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
SELECT ?title ?label
WHERE {
  ?x dc:title ?title .
  ?x dcat:theme ?theme .
  ?theme skos:prefLabel ?label
  FILTER(?label = "Transport network")
};
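Where no SPARQL endpoint is at hand, the join performed by the second query can be approximated directly on the RDF/XML output. The Python sketch below is a rough equivalent under stated assumptions: it only understands the element layout shown in the RDF model above (dcat:Dataset with dc:title and dcat:theme, skos:Concept with skos:prefLabel), it only looks at dcat:Dataset resources, and it is not a general RDF parser. The function name and the sample data are hypothetical.

```python
import xml.etree.ElementTree as ET

NS = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "dc": "http://purl.org/dc/elements/1.1/",
    "dcat": "http://www.w3.org/ns/dcat#",
    "skos": "http://www.w3.org/2004/02/skos/core#",
}
RDF_ABOUT = "{%s}about" % NS["rdf"]
RDF_RESOURCE = "{%s}resource" % NS["rdf"]

# Hypothetical sample data following the RDF model of this proposal.
SAMPLE = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:dc="http://purl.org/dc/elements/1.1/"
    xmlns:dcat="http://www.w3.org/ns/dcat#"
    xmlns:skos="http://www.w3.org/2004/02/skos/core#">
  <dcat:Dataset rdf:about="http://localhost:8080/geonetwork/metadata/1">
    <dc:title>Road network</dc:title>
    <dcat:theme rdf:resource="http://localhost:8080/geonetwork/thesaurus/t/7"/>
  </dcat:Dataset>
  <dcat:Dataset rdf:about="http://localhost:8080/geonetwork/metadata/2">
    <dc:title>Soil map</dc:title>
    <dcat:theme rdf:resource="http://localhost:8080/geonetwork/thesaurus/t/9"/>
  </dcat:Dataset>
  <skos:Concept rdf:about="http://localhost:8080/geonetwork/thesaurus/t/7">
    <skos:prefLabel>Transport network</skos:prefLabel>
  </skos:Concept>
  <skos:Concept rdf:about="http://localhost:8080/geonetwork/thesaurus/t/9">
    <skos:prefLabel>Soil</skos:prefLabel>
  </skos:Concept>
</rdf:RDF>"""


def titles_by_theme_label(rdf_xml, wanted_label):
    """Return titles of datasets having a theme whose prefLabel equals wanted_label."""
    root = ET.fromstring(rdf_xml)

    # Map each concept URI to the set of its preferred labels.
    labels = {}
    for concept in root.iter("{%s}Concept" % NS["skos"]):
        uri = concept.get(RDF_ABOUT)
        for pref in concept.findall("skos:prefLabel", NS):
            labels.setdefault(uri, set()).add(pref.text)

    # Keep the titles of datasets that reference a matching concept.
    titles = []
    for dataset in root.iter("{%s}Dataset" % NS["dcat"]):
        theme_uris = [t.get(RDF_RESOURCE) for t in dataset.findall("dcat:theme", NS)]
        if any(wanted_label in labels.get(uri, ()) for uri in theme_uris):
            titles.extend(t.text for t in dataset.findall("dc:title", NS))
    return titles


if __name__ == "__main__":
    print(titles_by_theme_label(SAMPLE, "Transport network"))
```

For anything beyond a quick check, loading the output into an RDF store and using the SPARQL queries above remains the more robust option.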
Risks
Future improvement
This proposal does not cover the following items, which could be addressed in future work:
- multilingual RDF output for multilingual metadata records
Participants
- Francois Prunayre
Attachments (6)
- rdf-visual-ex.png (106.3 KB)
- rdf-visual-ex-by-foaf-organization.png (54.9 KB)
- rdf-visual-ex-by-inspire-themes.png (22.4 KB)
- 23a9d577-f875-4acf-8634-a77241a71176.rdf (11.0 KB): Sample RDF file
- rdf-visual-metadata.png (16.6 KB)
- rdf-save-as-action.png (24.4 KB)