Metadata Crawler

This page describes a tool that we have wanted for a long time: a platform-independent tool that automatically generates metadata for geographic data.

Purpose: Geographic data requires good metadata so that it can be found and its usability judged. The tool should use all information embedded in the data, and available in the context it operates in, to reduce manual editing as much as possible.

Description: A daemon process that automatically generates metadata for spatial data resources. The application crawls a directory structure, extracts data properties and other auxiliary metadata, and writes an ISO 19115/19139 compliant metadata record. It will also be able to synchronize with a catalog application, such as GeoNetwork opensource, to retrieve metadata that was updated elsewhere (for example through the web interface).
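
A minimal sketch of the crawl-and-extract step, assuming the GDAL/OGR Python bindings (the osgeo package) are available. The returned property names are illustrative only, and the raster bounding box assumes a north-up image:

  import os
  from osgeo import gdal, ogr

  def extract_properties(path):
      """Return spatial properties of a raster or vector file,
      or None if GDAL/OGR does not recognise the file."""
      ds = gdal.Open(path)               # None if not a readable raster
      if ds is not None:
          gt = ds.GetGeoTransform()      # assumes no rotation terms
          minx, maxy = gt[0], gt[3]
          maxx = gt[0] + ds.RasterXSize * gt[1]
          miny = gt[3] + ds.RasterYSize * gt[5]   # gt[5] is negative
          return {"type": "raster",
                  "projection": ds.GetProjection(),
                  "bbox": (minx, miny, maxx, maxy)}
      src = ogr.Open(path)               # None if not a readable vector
      if src is not None and src.GetLayerCount() > 0:
          layer = src.GetLayer(0)
          minx, maxx, miny, maxy = layer.GetExtent()
          return {"type": "vector",
                  "feature_count": layer.GetFeatureCount(),
                  "bbox": (minx, miny, maxx, maxy)}
      return None

  def crawl(root):
      """Recursively scan a folder tree and yield (path, properties)
      for every file GDAL/OGR can read."""
      for dirpath, _dirs, files in os.walk(root):
          for name in files:
              path = os.path.join(dirpath, name)
              props = extract_properties(path)
              if props is not None:
                  yield path, props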

Requirements:

  • scan one or more folders recursively (see the crawl sketch above)
  • generate metadata XML in an ISO 19115/19139 compliant format (see the first sketch after this list)
    • extract spatial properties from the datasets found, using GDAL/OGR
    • add contact information etc. found in a related text file to the metadata (incl. default privileges)
    • add information on the data location
    • add information on other services
  • record a timestamp for syncing
  • create MEF files that can be harvested by GeoNetwork or imported into gvSIG, ArcMap (using a plug-in), or any other tool able to read the MEF format (see the MEF sketch after this list)
  • synchronize with the catalog (GeoNetwork opensource) in both directions, depending on timestamps (see the synchronization sketch after this list)
  • provide a mechanism to define default metadata content for properties that cannot be extracted directly from the resources
  • flag folders containing spatial data that lacks the minimum required metadata
  • be platform independent (preferably written in Python, using GDAL/OGR and/or other FOSS libraries)
  • run as a daemon (see the polling-loop sketch after this list)
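
A heavily abridged sketch of writing the ISO 19139 XML with the standard library. A real ISO 19115/19139 record requires many more mandatory elements (contact, identification info, reference system, extent); the namespace URIs are the published ISO/TC 211 ones, while the identifier and date below are placeholders:

  import xml.etree.ElementTree as ET

  GMD = "http://www.isotc211.org/2005/gmd"
  GCO = "http://www.isotc211.org/2005/gco"
  ET.register_namespace("gmd", GMD)
  ET.register_namespace("gco", GCO)

  def build_record(file_identifier, date_stamp):
      """Build a skeletal gmd:MD_Metadata element tree."""
      md = ET.Element(f"{{{GMD}}}MD_Metadata")
      ident = ET.SubElement(md, f"{{{GMD}}}fileIdentifier")
      ET.SubElement(ident, f"{{{GCO}}}CharacterString").text = file_identifier
      stamp = ET.SubElement(md, f"{{{GMD}}}dateStamp")
      ET.SubElement(stamp, f"{{{GCO}}}DateTime").text = date_stamp
      return ET.ElementTree(md)

  # Usage: serialize the skeleton record to disk.
  build_record("example-uuid", "2007-01-01T00:00:00").write(
      "metadata.xml", encoding="utf-8", xml_declaration=True)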
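
A minimal sketch of packaging a record as a MEF file. MEF is a ZIP archive; this assumes the simplest layout, the ISO record plus an info.xml index, and omits thumbnails and the public/private data directories:

  import zipfile

  def write_mef(mef_path, metadata_xml, info_xml):
      """Write a minimal MEF archive that GeoNetwork can harvest
      or import."""
      with zipfile.ZipFile(mef_path, "w", zipfile.ZIP_DEFLATED) as z:
          z.writestr("metadata.xml", metadata_xml)
          z.writestr("info.xml", info_xml)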
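
A sketch of the timestamp rule behind the two-way synchronization. The push and pull callables stand in for whatever catalog API is used (e.g. GeoNetwork's web services), which this page does not specify:

  def synchronize(local_mtime, catalog_mtime, push, pull):
      """Decide the sync direction from the two timestamps.
      push/pull are placeholders for calls into the catalog."""
      if catalog_mtime is None or local_mtime > catalog_mtime:
          push()  # local record is newer: send it to the catalog
      elif local_mtime < catalog_mtime:
          pull()  # record was edited in the web interface: fetch it back
      # equal timestamps: already in sync, nothing to do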
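
Finally, a sketch of the daemon as a plain polling loop. crawl() is the function sketched earlier, and process() is a hypothetical stand-in for building the record, packaging the MEF, and synchronizing:

  import time

  def run(root, interval_seconds=3600):
      """Poll the folder tree at a fixed interval. A production daemon
      would also detach from the terminal, handle signals, and log."""
      while True:
          for path, props in crawl(root):   # crawl() from the sketch above
              process(path, props)          # hypothetical glue function
          time.sleep(interval_seconds)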