
Remote & Distributed Search

Date 2009/12/20
Contact(s) A. Warnock (A/WWW Enterprises), D. Nebert (USGS/FGDC), L. Miao (GMU), Z. Li (GMU), H. Wu (GMU)
Last edited Timestamp
Status Proposed for voting
Assigned to release 2.5
Resources No additional resources needed

Overview

Some collections may be too large or too dynamic to harvest directly while still being desirable to search within the context of the general search capability. This proposal will add the ability to refer some search requests to remote collections and return results along with results from local collections.

Proposal Type

  • Type: Core Change
  • App: GeoNetwork
  • Module: Search Interface
  • Documents:
  • Email discussions: See thread on Proposal: Local harvesting and remote search on geonetwork-devel list
  • Other wiki discussions:

Voting History

  • Vote proposed by A. Warnock on 2009-12-20.

Motivations

The current configuration implements search only on locally held collections. To provide full functionality as either a geospatial portal or a clearinghouse, GeoNetwork should be able to search remote collections (at least via CSW or Z39.50) without harvesting them locally.

Proposal

A number of metadata sources in GEOSS are either too large (250,000 to 1 million records or more) or too dynamic (several updates per hour, perhaps related to emergency conditions) to harvest and hold locally. In these cases, we anticipate that a search through the clearinghouse instance should be performed as a distributed search against the original collection rather than against a locally held harvested copy, and that this process should be transparent to the end user. That is, while such collections may be presented to end users as optional sources to search, end users should not be expected to know which collections are held in the clearinghouse and which are searched remotely, nor should they be directed away from the clearinghouse site to search these remote sites separately.

We are fully cognizant of the network latencies involved in such a scenario, having had direct experience with them in the FGDC Clearinghouse network in years past. Nonetheless, support for distributed, remote searching is seen as unavoidable within the GEOSS framework. The basic client functions for distributed, remote search are already in GeoNetwork; we propose to implement remote search as part of the search interface, at least through the CSW API. Note that, in GEOSS at least, clearinghouse and portal functions are separate: portals provide the user interface, and clearinghouses provide the programmatic search interface to the portals via CSW.
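To make the intended behaviour concrete, the following is a minimal sketch of a transparent distributed search, not GeoNetwork code. It assumes a hypothetical `Source` abstraction that could wrap either the local index or a remote catalog (for example one reached through CSW), sends the query to every source in parallel, and gives each source a time budget so that a slow or unreachable remote collection cannot stall the whole response. The names `FederatedSearch`, `Source`, and `timeoutMillis` are illustrative assumptions, and results are simply concatenated in source order; ranking and de-duplication are left out.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.TimeUnit;

// Illustrative sketch only: none of these names exist in GeoNetwork.
final class FederatedSearch {

    /** Hypothetical abstraction over a local index or a remote catalog. */
    @FunctionalInterface
    interface Source {
        List<String> search(String query) throws Exception;
    }

    /**
     * Sends the query to every source in parallel and concatenates the hits
     * in source order. A source that fails, or that has not answered within
     * timeoutMillis, contributes an empty list instead of blocking the caller.
     */
    static List<String> search(String query, List<Source> sources, long timeoutMillis) {
        List<CompletableFuture<List<String>>> pending = new ArrayList<>();
        for (Source source : sources) {
            pending.add(CompletableFuture
                    .supplyAsync(() -> searchOrEmpty(source, query))
                    // Network latency guard: fall back to "no hits" on timeout.
                    .completeOnTimeout(List.of(), timeoutMillis, TimeUnit.MILLISECONDS));
        }
        List<String> merged = new ArrayList<>();
        for (CompletableFuture<List<String>> future : pending) {
            merged.addAll(future.join());
        }
        return merged;
    }

    // Turns any failure of a single source into an empty result.
    private static List<String> searchOrEmpty(Source source, String query) {
        try {
            return source.search(query);
        } catch (Exception e) {
            return List.of();
        }
    }
}
```

In this sketch the caller never learns which hits came from a harvested collection and which from a remote one, which matches the transparency requirement above. A real implementation would probably also want to report to the user which remote sources timed out.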

Backwards Compatibility Issues

None anticipated.

Risks

None foreseen. Development will take place on a separate branch so that testing can take place before merging into the trunk.

Participants

  • A. Warnock (A/WWW Enterprises)
  • D. Nebert (USGS/FGDC)
  • L. Miao (GMU)
  • Z. Li (GMU)
  • H. Wu (GMU)
Last modified on Dec 21, 2009, 10:00:22 AM