
Add Support for Inspire OpenSearch (Atom)

Date: 2012/09/18
Contact(s): Paul van Genuchten
Last edited: 2013/06/06
Status: OnGoing
Assigned to release: 2.12
Resources:
Ticket #:
Github source:

Proposal

Generate Atom

Inspire requires either a WFS service or an Atom XML document that links to the download in order to make datasets downloadable. More information can be found in the Technical Guidance for Download Services, version 3.0 (http://inspire.jrc.ec.europa.eu/documents/Network_Services/Technical_Guidance_Download_Services_3.0.pdf) and at http://wiki.geonovum.nl/index.php/Download_Service_via_Atom_feed (in Dutch).

Data owners currently create Atom documents for the downloads they provide and link to them from the online resource section of an iso19115 record. The Atom document contains roughly the same information as the Inspire iso19115 record in GN, so it seems useful to generate this feed document as an output format of GeoNetwork (most of this is already implemented in the current GN OpenSearch implementation). However, we have also received requests from governments stating that, from a legal point of view, GeoNetwork cannot create these documents; instead it should link to them and include their content, fetched from the original location, in the Lucene index. The reason is that the Atom specification is part of the "Inspire Download services", whereas the GeoNetwork catalogue is part of the "Inspire Discovery service", and within a country the responsibility for each of these services can be delegated to a different legal entity.

I therefore propose a setting in a config-override that supports both use cases.

Both implementations will share the following features:

  • The Atom search queries the standard Lucene index (limited by configuration to records that comply with the Inspire standard) and presents the results in an Atom document; from this document the individual Atom documents can be accessed.
  • If an iso19119 metadata record identifier is provided in the URL, the search is limited to that record plus all datasets related to it (the download service).
  • For each iso19119 record an OpenSearch Description document should be available that lists all dataset identifiers available in the Atom feed.
  • The "Describe Spatial Data Set" operation provides a single Atom document for a dataset (inputs: identifier, language).
  • The "Get Spatial Data Set" operation provides an attached spatial data file (inputs: identifier, language, crs).
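
As a rough illustration of how the two operations and the plain search could share one endpoint, the sketch below classifies a request by its query parameters. This is only an assumption about how the dispatch might work: the function name `classify_request` is hypothetical, the parameter names are borrowed from the OpenSearch URL templates shown later in this proposal, and the rule "crs present means Get Spatial Data Set" is inferred from the operation inputs listed above rather than taken from an existing GeoNetwork service.

```python
from urllib.parse import parse_qs


def classify_request(query_string: str) -> str:
    """Guess which operation a query string asks for (illustrative only)."""
    # parse_qs returns a list per key; keep the first value of each.
    params = {key: values[0] for key, values in parse_qs(query_string).items()}

    # A dataset is addressed by identifier code plus namespace.
    has_dataset_id = (
        "spatial_dataset_identifier_code" in params
        and "spatial_dataset_identifier_namespace" in params
    )

    if has_dataset_id and "crs" in params:
        # Assumption: identifier + crs means the data file itself is wanted.
        return "get_spatial_data_set"
    if has_dataset_id:
        # Identifier without crs: return the Atom document for the dataset.
        return "describe_spatial_data_set"
    # Anything else is treated as a free-text search over the index.
    return "search"
```

A real implementation would probably also look at the requested media type, since the templates in this proposal distinguish results by `type` as well.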

How to extend what we have now

In http://trac.osgeo.org/geonetwork/ticket/333 some work was done to introduce OpenSearch. This proposal adds some extra fields and functionality to the existing implementation, and/or is implemented as a series of overrides so that the current implementation does not become too complex.

/geonetwork/srv/dut/portal.opensearch

This URL opens the OpenSearch Description document. Some extra fields should be added, and a filter on an iso19119 uuid should be implemented: if such a filter is provided, a list of all dataset identifiers in that service should be displayed.

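<OpenSearchDescription xmlns="http://a9.com/-/spec/opensearch/1.1/" xmlns:inspire_dls="http://inspire.ec.europa.eu/schemas/inspire_dls/1.0">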
        <ShortName>[AtomServiceFeed:feed.title]</ShortName>
        <Description>[AtomServiceFeed:feed.title]</Description>
        <!--URL of this document-->
        <Url type="application/OpenSearchDescription+xml" rel="self" template="http://nationaalgeoregister.nl/opensearch/[ServiceMetadata:fileIdentifier]/OpenSearchDescription.xml"/>
        <!--Generic URL template for browser integration-->
        <Url type="application/atom+xml" rel="results" template="http://nationaalgeoregister.nl/opensearch/[ServiceMetadata:fileIdentifier]/search?q={searchTerms}"/>

<!-- repeat for each Atom dataset feed -->
<!--Describe Spatial Data Set Operation request URL template to be used
        in order to retrieve the Description of Spatial Object Types in a Spatial
        Dataset-->
        <Url type="application/atom+xml" rel="describedby" template="http://nationaalgeoregister.nl/opensearch/[ServiceMetadata:fileIdentifier]/search?spatial_dataset_identifier_code={inspire_dls:spatial_dataset_identifier_code?}&amp;spatial_dataset_identifier_namespace={inspire_dls:spatial_dataset_identifier_namespace?}&amp;language={language?}&amp;q={searchTerms?}"/>

<!-- repeat step for each attached dataset -->
        <!--Get Spatial Data Set Operation request URL template to be used in
        order to retrieve a Spatial Dataset-->
        <!-- For download of GML files, use this template. -->
        <Url type="application/gml+xml;version=3.2" rel="results" template="http://nationaalgeoregister.nl/opensearch/[ServiceMetadata:fileIdentifier]/search?spatial_dataset_identifier_code={inspire_dls:spatial_dataset_identifier_code?}&amp;spatial_dataset_identifier_namespace={inspire_dls:spatial_dataset_identifier_namespace?}&amp;crs={inspire_dls:crs?}&amp;language={language?}&amp;q={searchTerms?}"/>
        <!-- format differentiation. If multiple formats are supported (for the same CRS), return an Atom feed containing multiple links. In that case use results="application/atom+xml" for multiple downloadable files. -->
        <Url type="application/atom+xml" rel="results" template="http://nationaalgeoregister.nl/opensearch/[ServiceMetadata:fileIdentifier]/search?spatial_dataset_identifier_code={inspire_dls:spatial_dataset_identifier_code?}&amp;spatial_dataset_identifier_namespace={inspire_dls:spatial_dataset_identifier_namespace?}&amp;crs={inspire_dls:crs?}&amp;language={language?}&amp;q={searchTerms?}"/>
<!-- Contact details for each Service Feed -->
        <Contact>[AtomServiceFeed:author.name]</Contact>
        <Tags>[ServiceMetadata.Keywords]</Tags>
        <LongName>[AtomServiceFeed:feed.subtitle]</LongName>

<!-- repeat for each dataset and crs -->
	<Query role="example" inspire_dls:spatial_dataset_identifier_namespace="[AtomServiceFeed:feed.entry.inspire_dls:spatial_dataset_identifier_namespace]" inspire_dls:spatial_dataset_identifier_code="[AtomServiceFeed:feed.entry.inspire_dls:spatial_dataset_identifier_code]" inspire_dls:crs="[AtomDatasetFeed:feed.entry.category@term]" language="[AtomDatasetFeed:feed.entry.link[rel='alternate']@xml:lang]" title="[AtomDatasetFeed:feed.entry.title]" count="1"/>

<!-- per Atom Service feed / Service metadata record combination: -->
        <Developer>[AtomServiceFeed:author.name]</Developer>
        <!--Languages supported by the service. The first language is the default language-->
        <Language>[AtomServiceFeed:feed.title@xml:lang]</Language>
</OpenSearchDescription>
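
To make the URL templates above more concrete, here is a minimal sketch of how a client might fill one in. It handles only the simple `{name}` and `{name?}` placeholders used in this document, not the full OpenSearch template syntax, and the function name `expand_template` and the `example.org` URL are made up for illustration. Note that the `&amp;` in the XML becomes a plain `&` once the description document has been parsed.

```python
import re
from urllib.parse import quote

# Matches {name} and {name?}; the name may contain a prefix such as inspire_dls:
_PLACEHOLDER = re.compile(r"\{([^{}?]+)(\?)?\}")


def expand_template(template: str, values: dict) -> str:
    """Fill an OpenSearch-style URL template (simplified, illustrative)."""

    def substitute(match):
        name, optional = match.group(1), match.group(2)
        if name in values:
            # Percent-encode everything so the value is safe in a query string.
            return quote(str(values[name]), safe="")
        if optional:
            # Optional parameters without a value become empty strings.
            return ""
        raise KeyError(f"missing required template parameter: {name}")

    return _PLACEHOLDER.sub(substitute, template)


if __name__ == "__main__":
    template = (
        "http://example.org/opensearch/search"
        "?spatial_dataset_identifier_code={inspire_dls:spatial_dataset_identifier_code?}"
        "&crs={inspire_dls:crs?}&q={searchTerms?}"
    )
    print(expand_template(template, {
        "inspire_dls:spatial_dataset_identifier_code": "06b6c650-cdb1-11dd-ad8b-0800200c9a79",
        "inspire_dls:crs": "EPSG:4258",
    }))
```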

/geonetwork/srv/dut/rss.search?any=

This service queries the index and shows the results. Some extra fields should be implemented. The link should not reference the iso19115 record in GN but an Atom document describing the dataset. The URL for this could look like:

/geonetwork/srv/eng/rss.detail?uuid={uuid}&lang={lang}

This could also become an implementation of the "Describe Spatial Data Set" operation. Note, however, that this operation uses the dataset identifier and namespace, not the metadata identifier.

GN will return a document like:

<feed xmlns="http://www.w3.org/2005/Atom"
    xmlns:georss="http://www.georss.org/georss"
    xmlns:inspire_dls="http://inspire.ec.europa.eu/schemas/inspire_dls/1.0">
    <!-- feed title -->
    <title xml:lang="nl">Demonstratie INSPIRE Download Service 3.0, ATOM - Service Feed</title>
    <!-- feed subtitle -->
    <subtitle xml:lang="nl">INSPIRE Download Service van Geonovum als demonstratie van een Download Service met voorgedefinieerde datasets voor Geografische namen en Administratieve eenheden</subtitle>
    <!-- links to metadata and alternative representations -->
    <link href="http://s01.geonovum.site4u.nl/download/metadata_atom_servicefeed.xml" rel="describedby" type="application/vnd.iso.19139+xml"/>
    <link href="http://s01.geonovum.site4u.nl/download/downloadservice_atom_servicefeed.xml" rel="self" type="application/atom+xml"
        hreflang="nl" title="Demonstratie INSPIRE Download Service 3.0 - Service Feed"/>
    <link rel="search" href="http://s01.geonovum.site4u.nl/download/opensearch_description.xml" type="application/opensearchdescription+xml" title="Open Search Description voor Demonstratie INSPIRE Download Service 3.0, ATOM - Service Feed"/>
    <!-- identifier -->
    <id>http://s01.geonovum.site4u.nl/download/downloadservice_atom_servicefeed.xml</id>
    <!-- rights, access restrictions -->
    <rights>geen toegangsbeperkingen</rights>
    <!-- date/time of last update of feed -->
    <updated>2012-06-18T15:35:06Z</updated>
    <!-- author info -->
    <author>
        <name>Geonovum</name>
        <email>jj@jj.org</email>
    </author>
    <entry>
        <!-- title for pre-defined dataset -->
        <title xml:lang="nl">Geografische namen (DEMO) NamedPlaces - Parent Feed (CRS)</title>

        <!-- Spatial Dataset Unique Resource Identifier for the dataset -->
        <inspire_dls:spatial_dataset_identifier_code>06b6c650-cdb1-11dd-ad8b-0800200c9a79</inspire_dls:spatial_dataset_identifier_code>
        <!-- Geonovum: the namespace for the code, applicable to the dataset. Further details are still to follow. -->
        <inspire_dls:spatial_dataset_identifier_namespace>http://s01.geonovum.site4u.nl/download</inspire_dls:spatial_dataset_identifier_namespace>
        <link href="http://nationaalgeoregister.nl/geonetwork/srv/nl/iso19139.xml?uuid=81ff84ec-42a4-4481-840b-12713bbb5d38" rel="describedby" type="application/xml"/>
        <!-- Link to the Dataset feed -->
        <link href="http://s01.geonovum.site4u.nl/download/downloadservice_atom_dataset1.xml" rel="alternate" type="application/atom+xml"
            hreflang="nl" title="Geografische namen (DEMO) - Download Service voorgedefinieerde dataset"/>
        <id>http://s01.geonovum.site4u.nl/download/downloadservice_atom_dataset1.xml</id>
        <updated>2012-06-18T15:35:04Z</updated>
        <!-- Optional: a summary / description -->
        <summary>Download the dataset Geografische namen (DEMO) NamedPlaces, via this feed</summary>
        <!-- The service feed contains the boundingbox, in polygon format -->
        <georss:polygon>50.7539 3.37087 50.7539 7.21097 53.4658 7.21097 53.4658 3.37087 50.7539 3.37087</georss:polygon>
        <!-- For each entry provide CRS information -->
        <category term="http://www.opengis.net/def/crs/EPSG:4258"
            label="ETRS89"/>
        <category term="http://www.opengis.net/def/crs/EPSG:4326"
            label="WGS84"/>
    </entry>
</feed>
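
A consumer of such a service feed mainly needs the dataset identifier, the link to the dataset feed and the available CRSs of each entry. The following sketch shows one way to read these with Python's standard library; the function name `list_datasets` and the shortened `SAMPLE` feed are illustrative assumptions, not part of GeoNetwork.

```python
import xml.etree.ElementTree as ET

NS = {
    "atom": "http://www.w3.org/2005/Atom",
    "inspire_dls": "http://inspire.ec.europa.eu/schemas/inspire_dls/1.0",
}

# Shortened version of the service feed above, for demonstration only.
SAMPLE = """<feed xmlns="http://www.w3.org/2005/Atom"
      xmlns:inspire_dls="http://inspire.ec.europa.eu/schemas/inspire_dls/1.0">
  <entry>
    <title xml:lang="nl">Geografische namen (DEMO)</title>
    <inspire_dls:spatial_dataset_identifier_code>06b6c650-cdb1-11dd-ad8b-0800200c9a79</inspire_dls:spatial_dataset_identifier_code>
    <inspire_dls:spatial_dataset_identifier_namespace>http://s01.geonovum.site4u.nl/download</inspire_dls:spatial_dataset_identifier_namespace>
    <link href="http://s01.geonovum.site4u.nl/download/downloadservice_atom_dataset1.xml" rel="alternate" type="application/atom+xml"/>
    <category term="http://www.opengis.net/def/crs/EPSG:4258" label="ETRS89"/>
    <category term="http://www.opengis.net/def/crs/EPSG:4326" label="WGS84"/>
  </entry>
</feed>"""


def list_datasets(feed_xml: str) -> list:
    """Return one dict per entry of an Inspire Atom service feed."""
    root = ET.fromstring(feed_xml)
    datasets = []
    for entry in root.findall("atom:entry", NS):
        # The rel="alternate" link points at the dataset feed.
        feed_link = entry.find("atom:link[@rel='alternate']", NS)
        datasets.append({
            "title": entry.findtext("atom:title", default="", namespaces=NS),
            "code": entry.findtext(
                "inspire_dls:spatial_dataset_identifier_code", default="", namespaces=NS),
            "namespace": entry.findtext(
                "inspire_dls:spatial_dataset_identifier_namespace", default="", namespaces=NS),
            "dataset_feed": feed_link.get("href") if feed_link is not None else None,
            # Each category element carries one CRS in its term attribute.
            "crs": [c.get("term") for c in entry.findall("atom:category", NS)],
        })
    return datasets


if __name__ == "__main__":
    for dataset in list_datasets(SAMPLE):
        print(dataset["code"], dataset["crs"])
```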

Implementation with harvested ATOM will require additional functionality

Collect ATOM

When the data provider supplies its own Atom document, GeoNetwork should link to that document rather than to an Atom document generated by the catalogue. To include the Atom contents in the Lucene index, the Atom document needs to be harvested at regular intervals, similar to a WMS capabilities harvest. Such an Atom harvest would collect the contents of the Atom feed and store them as a field in the metadata table, so that they can be added to the Lucene index.
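
As a sketch of the "collect" step, the function below flattens a fetched Atom document into a single text string that could be stored in a metadata field and indexed. Fetching the document and scheduling the harvest are deliberately left out, and the function name `atom_to_index_text` is hypothetical; how GeoNetwork would actually store the field is not decided in this proposal.

```python
import xml.etree.ElementTree as ET


def atom_to_index_text(feed_xml: str) -> str:
    """Concatenate all element text of an Atom document for full-text indexing."""
    root = ET.fromstring(feed_xml)
    parts = []
    # iter() walks the tree in document order, including the root itself.
    for element in root.iter():
        text = (element.text or "").strip()
        if text:
            # Collapse internal whitespace so line breaks do not end up in the index.
            parts.append(" ".join(text.split()))
    return " ".join(parts)
```

Attribute values such as link titles are ignored here; if they matter for searching, they would have to be collected separately.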

Harvest ATOM

A use case to consider is an Atom harvester that can harvest iso19115 and iso19119 metadata from Atom feeds. Comment by Simon: you could even harvest a WFS service and package it with a GeoNetwork-generated Atom document.

Validate Atom

Before Atom feeds can be collected or harvested, we will probably need an Atom XSD in GN.
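
Until a schema is available, a lightweight structural check could catch the most obvious problems. The sketch below is not XSD validation; it only verifies that the feed and each entry contain exactly one `id`, `title` and `updated` element, which the Atom format (RFC 4287) requires. The function name `check_atom_feed` and the message wording are assumptions for illustration.

```python
import xml.etree.ElementTree as ET

ATOM = "{http://www.w3.org/2005/Atom}"
REQUIRED = ("id", "title", "updated")


def check_atom_feed(feed_xml: str) -> list:
    """Return a list of problems found; an empty list means the checks passed."""
    try:
        root = ET.fromstring(feed_xml)
    except ET.ParseError as error:
        return [f"not well-formed XML: {error}"]

    if root.tag != ATOM + "feed":
        return [f"root element is {root.tag}, expected an Atom feed"]

    # Check the feed itself and then every entry, numbered from 1.
    containers = [("feed", root)] + [
        (f"entry {index}", entry)
        for index, entry in enumerate(root.findall(ATOM + "entry"), start=1)
    ]

    problems = []
    for label, element in containers:
        for name in REQUIRED:
            count = len(element.findall(ATOM + name))
            if count != 1:
                problems.append(
                    f"{label}: expected exactly one <{name}>, found {count}")
    return problems
```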

Display Atom Contents

The Atom link can be referred to from the Inspire iso19115 and Inspire iso19119 records in the catalogue. We might add a suggestion button here that automatically adds the GeoNetwork link, or lets the user add a link to a local server:

<srv:connectPoint><gmd:CI_OnlineResource><gmd:linkage><gmd:URL>/geonetwork/srv/eng/rss.detail?uuid=a3d33-...</gmd:URL></gmd:linkage></gmd:CI_OnlineResource></srv:connectPoint>

An example record can be viewed at: http://www.nationaalgeoregister.nl/geonetwork/srv/nl/iso19139.xml?id=448130

Also, if GN finds such an Atom feed URL in the gmd:URL field, the metadata record view could fetch the feed contents and present the datasets linked inside the Atom document as hyperlinks.

Link to Inspire thesaurus

A reference should be made from the Atom feed to a SKOS/RDF thesaurus on the JRC website (http://inspire-registry.jrc.ec.europa.eu/registers/FCD). This thesaurus has a format that GeoNetwork currently does not support (each term is in a separate web location; the central document only has a list of links/identifiers). We might be able to support the format with an upgrade of Sesame; otherwise we should transform the thesaurus into a readable format. Each record that should have an Atom document generated by GeoNetwork should include at least one keyword from this thesaurus. Most probably a new version of the discovery service specification will require a link to this thesaurus anyway.

Other challenges when generating Inspire-compliant Atom documents

  • The Atom feed should give some indication of the file size of the download. We might be able to find this with a Java function (if the file resides on the GeoNetwork server). This information can also be entered in iso19115 (transferSize), but there it appears to be a total for all files attached to the record.
  • Multilingual support: how to register the language of the external resource (proposal: gmd:online@xlink:role).
  • Projection (crs) of the download: GeoNetwork does not have "epsg:xxxx" in rs_identifier, and the crs seems to be registered for all gmd_online.
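
For the file size point, the Atom `link` element has a `length` attribute that gives an advisory size in bytes. The sketch below shows the idea for a file stored locally; it is written in Python rather than Java purely for illustration, and the function name `download_link` and its parameters are hypothetical.

```python
import os
import xml.etree.ElementTree as ET

ATOM_NS = "http://www.w3.org/2005/Atom"


def download_link(file_path: str, href: str, media_type: str) -> ET.Element:
    """Build an Atom link element whose length is the size of a local file."""
    link = ET.Element(f"{{{ATOM_NS}}}link")
    link.set("href", href)
    link.set("type", media_type)
    # os.path.getsize returns the size in bytes, which is what `length` expects.
    link.set("length", str(os.path.getsize(file_path)))
    return link
```

This only works when the file resides on the catalogue server; for remote downloads the size would have to come from elsewhere, for example from the metadata record.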

Proposal Type

  • Type: Inspire download service improvement
  • App: GeoNetwork
  • Module: Inspire

Voting history

Vote proposed by Paul on 2013/06/06; the result was:

  • ?

Risks

  • ?

Participants

  • Paul van Genuchten
  • Steven Smolders
  • Heikki Doeleman
  • Jose Garcia
  • Thijs Brentjens / Ine de Visser
Last modified on Jun 7, 2013, 1:18:50 AM