Changes between Version 8 and Version 9 of PerformanceEnhancements


Timestamp:
Mar 5, 2010, 7:48:23 AM (14 years ago)
Author:
simonp
Comment:

--

  • PerformanceEnhancements

    v8 v9  
    4343      * multiple threads to schedule documents that need to be written to the index without blocking
    4444   
    45  * Speeding up spatial indexing using !PostGIS: !GeoNetwork uses a shapefile to hold spatial extents for searches that contain spatial queries eg. touch, intersect, contains, overlap etc. At present only the CSW service uses the spatial index for these queries, the web search interface uses boxes and !Lucene. The spatial index needs to be maintained when records are added and deleted through import, harvesting, massive delete etc. Unfortunately the shapefile is not efficient for this purpose as the number of records in the catalog goes over 40,000 odd. In particular as the mechanism for deleting extents from the shapefile uses an attribute of the extent and these are not indexable meaning that there is a considerable cost for maintenance operations on the shapefile. To support fast maintenance and search of the spatial index for larger catalogs, it was decided to adopt the !PostGIS implementation for the spatial index written for the geocat.ch sandbox by Jessie Eichar and to fall back to using a shapefile when the catalog is not using !PostGIS for its database. An option has been added to GAST to allow the user to specify PostGIS as the database and to build the spatial index table when the Database->Setup option is used. When GeoTools 2.6.x is adopted, we will very likely to also allow the spatial index in Oracle.
     45 * Speeding up spatial indexing using !PostGIS: !GeoNetwork uses a shapefile to hold spatial extents for searches that contain spatial queries, e.g. touch, intersect, contains, overlap etc. At present only the CSW service uses the spatial index for these queries; the web search interface uses boxes and !Lucene. The spatial index needs to be maintained when records are added and deleted through import, harvesting, massive delete etc. Unfortunately the shapefile is not efficient for this purpose once the number of records in the catalog goes over about 40,000. In particular, the mechanism for deleting extents from the shapefile uses an attribute of the extent, and these attributes are not indexable, so maintenance operations on the shapefile carry a considerable cost. To support fast maintenance and search of the spatial index for larger catalogs, it was decided to adopt the !PostGIS implementation of the spatial index written for the geocat.ch sandbox by Jessie Eichar and to fall back to using a shapefile when the catalog is not using !PostGIS for its database. An option has been added to GAST to allow the user to specify !PostGIS as the database and to build the spatial index table when the Database->Setup option is used. When !GeoTools 2.6.x is adopted, we will very likely also allow the spatial index in !Oracle for those who must use it.
    4646
    47 The net result of these two fixes is much faster load, harvest, reindex and massive operations in GeoNetwork. For example, in one case doing a file system harvest of 20,000 records was taking 10-12 hours without these modifications. With these modifications, the harvest now takes approx 30 minutes. 
     47 The net result of these two fixes is much faster load, harvest, reindex and massive operations in !GeoNetwork. For example, in one case a file system harvest of 20,000 records was taking 10-12 hours without these modifications. With the modifications described in this proposal, the same harvest now takes approximately 30 minutes.
    4848
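The maintenance cost described in the spatial index item above can be illustrated with a small sketch. It is not !GeoNetwork or !GeoTools code: `ScanStore` and `KeyedStore` are hypothetical stand-ins, the first for a store where deleting by a non-indexed attribute means looking at every row (as described for the shapefile), the second for a store that has an index on the record id (as a database table can provide). The counter only tallies rows examined; it does not model real I/O.

```python
"""Illustrative sketch only: hypothetical classes, not GeoNetwork or GeoTools APIs."""


class ScanStore:
    """Stand-in for a store with no index on the record id (assumption)."""

    def __init__(self):
        self.rows = []          # list of (record_id, extent) tuples
        self.rows_examined = 0  # how many rows the deletes had to look at

    def add(self, record_id, extent):
        self.rows.append((record_id, extent))

    def delete(self, record_id):
        # Every row is inspected because the id attribute is not indexed.
        self.rows_examined += len(self.rows)
        self.rows = [row for row in self.rows if row[0] != record_id]


class KeyedStore:
    """Stand-in for a store with an index on the record id (assumption)."""

    def __init__(self):
        self.rows = {}          # record_id -> extent
        self.rows_examined = 0

    def add(self, record_id, extent):
        self.rows[record_id] = extent

    def delete(self, record_id):
        # A keyed lookup touches only the matching entry, if there is one.
        if self.rows.pop(record_id, None) is not None:
            self.rows_examined += 1


def bulk_delete_cost(store, n):
    """Add n extents, delete them one by one, and return the rows examined."""
    for i in range(n):
        store.add(i, (0.0, 0.0, 1.0, 1.0))
    for i in range(n):
        store.delete(i)
    return store.rows_examined
```

For n records deleted one at a time, the scanning store examines n(n+1)/2 rows in total while the keyed store examines n, so the gap widens quickly as the catalog grows.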
    4949=== Backwards Compatibility Issues ===
     50
     51 * Single-record (or few-record?) transactions in CSW need to be examined to make sure they are not slower
    5052
    5153== Risks ==
     
    5355== Participants ==
    5456 * Doug Nebert, Archie Warnock and team - testing and reporting
    55  * Timo Proescholdt (provided some timing analysis for the search problem)
    56  * geocat.ch developers
     57 * Craig Jones and eMII team - testing and reporting
     58 * Timo Proescholdt - provided some timing analysis for the search problem
     59 * Jose Garcia - provided some timing of Lucene Index Writer speed ups, feedback and discussion
     60 * geocat.ch developers - provided changes to the spatial index code necessary to support PostGIS
    5761