
Metadata Crawler

This page describes a tool that we have wanted for a long time: a platform-independent tool that automatically generates metadata for geographic data.

Purpose: Geographic data needs good metadata so that users can find it and judge its usability. The tool should use all the information embedded in the data, and in the context in which it operates, to reduce manual editing as much as possible.

Description: A daemon process that automatically generates metadata for spatial data resources. The application must crawl a directory structure and extract data properties and other auxiliary metadata. It will write an ISO 19115/19139-compliant metadata record. The application will be able to synchronize data, metadata and basic privileges with a catalog application. Eventually it will also be able to provide the information required to deploy OGC web services of different flavors (WMS, WCS, WFS), including, for example, symbology files (SLD).
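The crawl-and-write step described above can be sketched in Python, the language the requirements prefer. This is a minimal sketch under stated assumptions: candidate datasets are recognized by file extension (the hypothetical SPATIAL_EXTENSIONS set) instead of by opening them with GDAL/OGR, and the bounding box is passed in by the caller instead of being read from the data. The names find_spatial_files and build_iso19139 are illustrative only. The record uses the ISO 19139 gmd/gco namespaces but is not schema-valid, because mandatory elements such as gmd:contact are left out.

```python
import os
import uuid
import xml.etree.ElementTree as ET
from datetime import datetime, timezone

# ISO 19139 XML namespaces: gmd holds metadata elements, gco basic types.
GMD = "http://www.isotc211.org/2005/gmd"
GCO = "http://www.isotc211.org/2005/gco"
ET.register_namespace("gmd", GMD)
ET.register_namespace("gco", GCO)

# Hypothetical: extensions treated as spatial data. A real crawler would
# instead try to open every file with GDAL/OGR.
SPATIAL_EXTENSIONS = {".shp", ".tif", ".tiff", ".gml", ".kml"}


def find_spatial_files(roots):
    """Recursively scan one or more folders and yield candidate dataset paths."""
    for root in roots:
        for dirpath, _dirnames, filenames in os.walk(root):
            for name in sorted(filenames):
                if os.path.splitext(name)[1].lower() in SPATIAL_EXTENSIONS:
                    yield os.path.join(dirpath, name)


def _child(parent, ns, tag):
    """Append and return a namespaced child element."""
    return ET.SubElement(parent, f"{{{ns}}}{tag}")


def build_iso19139(bbox, file_id=None):
    """Return a minimal gmd:MD_Metadata element (not schema-valid).

    bbox is (west, east, south, north) in decimal degrees; in the real
    tool it would come from GDAL/OGR, which is not used here.
    """
    md = ET.Element(f"{{{GMD}}}MD_Metadata")
    ident = _child(_child(md, GMD, "fileIdentifier"), GCO, "CharacterString")
    ident.text = file_id or str(uuid.uuid4())
    # dateStamp records when this record was generated (UTC).
    stamp = _child(_child(md, GMD, "dateStamp"), GCO, "DateTime")
    stamp.text = datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    # The extent lives under identificationInfo/.../EX_GeographicBoundingBox.
    node = md
    for tag in ("identificationInfo", "MD_DataIdentification", "extent",
                "EX_Extent", "geographicElement", "EX_GeographicBoundingBox"):
        node = _child(node, GMD, tag)
    sides = ("westBoundLongitude", "eastBoundLongitude",
             "southBoundLatitude", "northBoundLatitude")
    for side, value in zip(sides, bbox):
        _child(_child(node, GMD, side), GCO, "Decimal").text = str(float(value))
    return md
```

In a fuller crawler, each path yielded by find_spatial_files would be opened with GDAL/OGR to obtain its extent, and the resulting element saved with ET.ElementTree(record).write(path, encoding="utf-8", xml_declaration=True).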

Requirements:

  • scan one or more folders recursively
  • generate a metadata XML document in ISO 19115/19139-compliant format
    • extract spatial properties from the datasets found, using GDAL/OGR
    • add contact information and similar details (including default privileges) read from a related text file
    • add information on the data location
    • add information on other services
  • record timestamps for synchronization
  • create MEF files that can be harvested by GeoNetwork or imported into gvSIG, ArcMap (using a plug-in), or any other tool that can read the MEF format
  • synchronize with the catalog (GeoNetwork opensource) in both directions, depending on timestamps
  • provide a mechanism to define default metadata content for properties that cannot be extracted directly from the resources
  • flag folders with spatial data that lack the minimum required metadata
  • be platform-independent (preferably written in Python, using GDAL/OGR and/or other FOSS libraries)
  • run as a daemon
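The requirements for default metadata content, contact information from a related text file, and flagging incomplete folders could fit together as sketched below. This assumes a hypothetical per-folder file named metadata.cfg with a [metadata] section in INI syntax, read with Python's configparser. Defaults cascade from parent to child folders, non-empty values extracted from the data take precedence over defaults, and REQUIRED_FIELDS is a hypothetical stand-in for the catalog's minimum profile. The extracted properties are passed in as a dictionary because the GDAL/OGR extraction step is not shown.

```python
import configparser
import os

# Hypothetical name of the per-folder text file with contact details,
# default privileges and other default metadata values.
DEFAULTS_FILE = "metadata.cfg"

# Hypothetical minimum set of fields a record needs to count as complete.
REQUIRED_FIELDS = ("title", "abstract", "contact_name", "contact_email")


def load_defaults(folder, inherited=None):
    """Overlay the folder's [metadata] section on defaults from parent folders."""
    values = dict(inherited or {})
    path = os.path.join(folder, DEFAULTS_FILE)
    if os.path.isfile(path):
        # interpolation=None so values such as "50%" are read literally.
        parser = configparser.ConfigParser(interpolation=None)
        parser.read(path, encoding="utf-8")
        if parser.has_section("metadata"):
            values.update(parser.items("metadata"))
    return values


def merge_record(extracted, defaults):
    """Non-empty values extracted from the data win; defaults fill the gaps."""
    merged = dict(defaults)
    merged.update({k: v for k, v in extracted.items() if v})
    return merged


def missing_fields(record):
    """Return the required fields that are absent or empty, in order."""
    return [f for f in REQUIRED_FIELDS if not record.get(f)]


def flag_incomplete(root, extracted_by_folder):
    """Return {folder: missing fields} for data folders still lacking metadata.

    extracted_by_folder maps a folder path to the properties extracted from
    its datasets (a stand-in for the GDAL/OGR step, which is not shown).
    """
    flagged = {}
    inherited = {root: load_defaults(root)}
    for dirpath, dirnames, _filenames in os.walk(root):
        defaults = inherited[dirpath]
        # Top-down walk: children inherit this folder's merged defaults.
        for name in dirnames:
            child = os.path.join(dirpath, name)
            inherited[child] = load_defaults(child, defaults)
        if dirpath in extracted_by_folder:
            gaps = missing_fields(merge_record(extracted_by_folder[dirpath], defaults))
            if gaps:
                flagged[dirpath] = gaps
    return flagged
```

Cascading the defaults means a single file at the top of a data tree can supply the contact and privilege information for every folder below it, while subfolders override only what differs.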
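Two-way synchronization by timestamp and MEF packaging can be sketched with the standard library alone. The decision rule is an assumption: a side counts as changed if its modification time is later than the last successful synchronization, and when both sides changed the function reports a conflict instead of silently picking one. The MEF writer assumes the MEF version 1 layout (metadata.xml and info.xml at the archive root, with public/ holding downloadable files such as thumbnails); the caller supplies info.xml as-is, since its schema is defined by GeoNetwork and is not reproduced here. The function names are hypothetical.

```python
import zipfile


def sync_direction(local_modified, catalog_modified, last_sync):
    """Choose a sync direction from three comparable timestamps.

    Works with datetime objects or POSIX times. Returns "push" (local copy
    to catalog), "pull" (catalog to local copy), "conflict" when both sides
    changed since the last sync, or "none".
    """
    local_changed = local_modified > last_sync
    catalog_changed = catalog_modified > last_sync
    if local_changed and catalog_changed:
        return "conflict"
    if local_changed:
        return "push"
    if catalog_changed:
        return "pull"
    return "none"


def write_mef(target, metadata_xml, info_xml, public_files=None):
    """Package one record as a zip archive in the assumed MEF v1 layout.

    target is a path or a writable binary file object. info_xml is passed
    in unchanged because its schema is defined by GeoNetwork.
    """
    with zipfile.ZipFile(target, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        zf.writestr("metadata.xml", metadata_xml)
        zf.writestr("info.xml", info_xml)
        # Publicly downloadable files (e.g. thumbnails) go under public/.
        for name, data in (public_files or {}).items():
            zf.writestr("public/" + name, data)
```

Reporting conflicts rather than letting the newer copy win avoids overwriting edits made in the catalog, at the cost of requiring a manual resolution step.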
Last modified on May 29, 2007, 2:50:15 AM