Version 1 (modified by ticheler, 17 years ago) ( diff )

Metadata Crawler

This page describes a tool that we have wanted for a long time: a platform-independent tool that automatically generates metadata for geographic data.

Purpose: Geographic data needs good metadata so that it can be found and its usability can be judged. The tool should use all information embedded in the data, and in the context it operates in, to reduce manual editing as much as possible.

Description: A daemon process that automatically generates metadata for spatial data resources. The application crawls a directory structure and extracts data properties and other auxiliary metadata, then writes an ISO 19115/19139 compliant metadata record. It can also synchronize with a catalog application, for example GeoNetwork opensource, to retrieve metadata that was updated through the catalog's web interface.
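
The crawling step described above could look roughly like the following. This is only a minimal sketch using the Python standard library: the function name find_spatial_datasets and the SPATIAL_EXTENSIONS set are assumptions made for illustration, and recognising datasets by file extension stands in for the GDAL/OGR based inspection that the tool is expected to use.

```python
import os
from pathlib import Path

# Assumption: a small set of file extensions used to recognise spatial
# datasets. The real tool is expected to rely on GDAL/OGR instead.
SPATIAL_EXTENSIONS = {".shp", ".tif", ".tiff", ".gpkg", ".geojson"}


def find_spatial_datasets(roots):
    """Walk each root folder recursively.

    Returns a dict mapping every folder that contains at least one
    recognised dataset to the sorted list of dataset paths in it.
    """
    found = {}
    for root in roots:
        for dirpath, _dirnames, filenames in os.walk(root):
            datasets = sorted(
                Path(dirpath) / name
                for name in filenames
                if Path(name).suffix.lower() in SPATIAL_EXTENSIONS
            )
            if datasets:
                found[Path(dirpath)] = datasets
    return found
```

Grouping the results per folder fits the rest of the page, where contact information and missing-metadata flags are handled at folder level.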

Requirements:

  • scan one or more folders recursively
  • generate a metadata XML record
    • extract spatial properties from the datasets found, using GDAL/OGR
    • add minimal contact information and similar details (including default privileges) taken from a related text file
    • add information on the data location
    • add information on other services
  • record a timestamp for synchronization
  • synchronize metadata with the metadata catalog (GeoNetwork opensource) in both directions, depending on timestamps
  • use a default contact information document when no text file is available in the data folder
  • flag folders with spatial data that lack the minimum required metadata
  • be platform independent
  • run as a daemon
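
Two of the requirements above, timestamp-based synchronization in two directions and the fallback to a default contact document, can be sketched as follows. This is an illustration under stated assumptions, not a design decision: the names sync_direction, resolve_contact_file and CONTACT_FILENAME are hypothetical, and "the newer timestamp wins" is one possible reading of "depending on time stamps".

```python
from pathlib import Path

# Hypothetical name of the per-folder contact text file; the page does not
# specify one.
CONTACT_FILENAME = "contact.txt"


def sync_direction(local_modified, catalog_modified):
    """Decide which way to synchronize one metadata record.

    Both arguments are datetime objects, or None when the record does not
    exist on that side. Returns "push" (local to catalog), "pull" (catalog
    to local) or "none". Assumption: the newer timestamp wins.
    """
    if local_modified is None and catalog_modified is None:
        return "none"
    if catalog_modified is None:
        return "push"
    if local_modified is None:
        return "pull"
    if local_modified > catalog_modified:
        return "push"
    if catalog_modified > local_modified:
        return "pull"
    return "none"


def resolve_contact_file(data_folder, default_contact):
    """Return the contact file in the data folder, or the default document
    when the folder has none."""
    candidate = Path(data_folder) / CONTACT_FILENAME
    if candidate.is_file():
        return candidate
    return Path(default_contact)
```

Keeping the direction decision in a small pure function makes it easy to test independently of the catalog connection.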