| 1 | = Remote & Distributed Search = |
| 2 | |
| 3 | || '''Date''' || 2009/12/20 || |
| 4 | || '''Contact(s)''' || A. Warnock (A/WWW Enterprises), D. Nebert (USGS/FGDC), L. Miao (GMU), Z. Li (GMU), H. Wu (GMU) || |
| 5 | || '''Last edited''' || [[Timestamp]] || |
| 6 | || '''Status''' || Proposed for voting || |
| 7 | || '''Assigned to release''' || 2.5 || |
| 8 | || '''Resources''' || No additional resources needed || |
| 9 | |
| 10 | == Overview == |
| 11 | Some collections may be too large or too dynamic to harvest directly, yet it is still desirable to |
| 12 | search them within the general search capability. This proposal adds the ability to forward |
| 13 | selected search requests to remote collections and return their results alongside results from local collections. |
| 14 | |
| 15 | === Proposal Type === |
| 16 | * '''Type''': Core Change |
| 17 | * '''App''': !GeoNetwork |
| 18 | * '''Module''': Search Interface |
| 19 | |
| 20 | === Links === |
| 21 | * '''Documents''': |
| 22 | * '''Email discussions''': See the thread ''Proposal: Local harvesting and remote search'' on the geonetwork-devel list |
| 23 | * '''Other wiki discussions''': |
| 24 | |
| 25 | === Voting History === |
| 26 | * Vote proposed by A. Warnock on 2009-12-20. |
| 27 | |
| 28 | ---- |
| 29 | |
| 30 | == Motivations == |
| 31 | The current configuration searches only locally held collections. To provide |
| 32 | full functionality as either a geospatial portal or a clearinghouse, !GeoNetwork |
| 33 | should be able to search remote collections (at least via CSW or Z39.50) without harvesting them |
| 34 | locally. |
| 35 | |
| 36 | == Proposal == |
| 37 | A number of metadata sources in GEOSS are either too large |
| 38 | (250,000 to 1 million records or more) or too dynamic (several |
| 39 | updates per hour, perhaps related to emergency conditions) to harvest and |
| 40 | hold locally. In these cases, we anticipate that a search through the |
| 41 | clearinghouse instance should be performed as a distributed search |
| 42 | against the original collection, rather than against a locally held |
| 43 | harvested collection, and further, that this process should be |
| 44 | transparent to the end user. That is, while such collections may be |
| 45 | presented to the end user as an optional source to be searched, the end user |
| 46 | should not be expected to know which collections are held in the |
| 47 | clearinghouse and which are searched remotely, nor should the end user |
| 48 | be directed away from the clearinghouse site to search these remote |
| 49 | sites separately. |
| 50 | |
| 51 | We are fully cognizant of the network latencies involved in such a |
| 52 | scenario, having had direct experience with it in the FGDC Clearinghouse |
| 53 | network in years past. Nonetheless, support for distributed, remote |
| 54 | searching is seen as unavoidable within the GEOSS framework. The |
| 55 | basic client functions for distributed, remote search are already |
| 56 | in !GeoNetwork; we propose to implement distributed search as part of the search |
| 57 | interface, at least through the CSW API. Note that, at least in GEOSS, |
| 58 | clearinghouse and portal functions are separate: portals provide the |
| 59 | user interface, while clearinghouses provide the programmatic search interface |
| 60 | to the portals via CSW. |
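
As a rough illustration of the behaviour described above, the following Python sketch fans a query out to local and remote sources in parallel, merges whatever arrives within a time limit, and reports the sources that were skipped. It is only a sketch under stated assumptions, not !GeoNetwork code: `federated_search`, `SearchFn`, and the `timeout_s` parameter are hypothetical names, and each source is modelled as a plain callable where a real deployment would wrap the local index or a CSW or Z39.50 client.

```python
from concurrent.futures import ThreadPoolExecutor, wait
from typing import Callable, Dict, List, Tuple

# Hypothetical type: a source is any callable that takes a query string and
# returns a list of record titles. Real sources would wrap the local index
# or a remote CSW / Z39.50 client.
SearchFn = Callable[[str], List[str]]


def federated_search(
    query: str,
    sources: Dict[str, SearchFn],
    timeout_s: float = 5.0,
) -> Tuple[List[Tuple[str, str]], List[str]]:
    """Query every source in parallel and merge whatever arrives in time.

    Returns (hits, skipped): hits are (source_name, record) pairs, and
    skipped lists the sources that failed or exceeded the timeout.
    """
    if not sources:
        return [], []

    pool = ThreadPoolExecutor(max_workers=len(sources))
    futures = {pool.submit(fn, query): name for name, fn in sources.items()}
    done, not_done = wait(list(futures), timeout=timeout_s)
    # Do not block on slow sources; their threads finish in the background.
    pool.shutdown(wait=False)

    hits: List[Tuple[str, str]] = []
    skipped = [futures[f] for f in not_done]
    for future in done:
        name = futures[future]
        try:
            hits.extend((name, record) for record in future.result())
        except Exception:
            # A failing remote source must not break the whole search.
            skipped.append(name)
    return sorted(hits), sorted(skipped)
```

The caller sees one merged result list regardless of where each record is held, which matches the transparency requirement. The time limit bounds the latency that a slow remote collection can add to a search.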
| 61 | |
| 62 | === Backwards Compatibility Issues === |
| 63 | None anticipated. |
| 64 | |
| 65 | == Risks == |
| 66 | None foreseen. Development will take place on a separate branch so that the changes can be tested before they are merged into the trunk. |
| 67 | |
| 68 | == Participants == |
| 69 | * A. Warnock (A/WWW Enterprises) |
| 70 | * D. Nebert (USGS/FGDC) |
| 71 | * L. Miao (GMU) |
| 72 | * Z. Li (GMU) |
| 73 | * H. Wu (GMU) |