Changes between Version 22 and Version 23 of PerformanceEnhancements


Timestamp:
Mar 13, 2010, 2:37:14 AM (14 years ago)
Author:
simonp
Comment:

--

  • PerformanceEnhancements

    v22 v23  
    40 40      * code that intends to write a number of documents to the !Lucene Index to keep the !IndexWriter open, and thus take advantage of the more sophisticated !IndexWriter implementation
    41 41      * multiple threads to schedule documents that need to be written to the index without blocking
    42       * reduce the number of times !GeoNetwork attempts to optimize the Lucene Index for search speed: optimizing the search index for maximum search speed is a very costly operation, particularly when the index has tens of thousands of documents in it. The !GeoNetwork approach is to optimize after a certain number of operations or a set period of time has passed (see lazyOptimize method in the new !IndexWriter implementation). The advice from the Lucene web page is to optimize when no more documents are to be added to the index for a while. Operations that expect to write many documents to the Index will now only call optimize when they complete. The default number of operations and timeout period have also been increased so that optimizes occur less frequently (this is in line with up-to-date advice from the Lucene IndexWriter javadoc).
     42      * reduce the number of times !GeoNetwork attempts to optimize the Lucene Index for search speed: optimizing the search index for maximum search speed is a very costly operation, particularly when the index has tens of thousands of documents in it. The !GeoNetwork approach is to optimize after a certain number of operations or a set period of time has passed (see lazyOptimize method in the new !IndexWriter implementation). The advice from the Lucene web page is to optimize when no more documents are to be added to the index for a while. Operations that expect to write many documents to the Index will now only call optimize when they complete. The default number of operations and timeout period have also been increased so that optimizes occur less frequently (this is in line with up-to-date advice from the Lucene !IndexWriter javadoc).
    43 43   
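
The "optimize after a certain number of operations or a set period of time" rule described in the list above can be sketched as a small policy object. This is only an illustration under stated assumptions: the class name `LazyOptimizePolicy`, its methods and its thresholds are hypothetical and are not taken from the !GeoNetwork source, and the real lazyOptimize method may be structured differently. The caller passes in the current time so the logic can be exercised without waiting.

```java
import java.util.concurrent.TimeUnit;

/**
 * Hypothetical helper (not GeoNetwork code): decides when a deferred
 * index optimize is due, based on an operation count and an elapsed time.
 */
class LazyOptimizePolicy {
    private final int maxOperations;
    private final long maxIntervalNanos;
    private int operationsSinceOptimize = 0;
    private long lastOptimizeNanos;

    LazyOptimizePolicy(int maxOperations, long maxInterval, TimeUnit unit, long nowNanos) {
        this.maxOperations = maxOperations;
        this.maxIntervalNanos = unit.toNanos(maxInterval);
        this.lastOptimizeNanos = nowNanos;
    }

    /** Records one index write and reports whether an optimize is now due. */
    synchronized boolean recordOperation(long nowNanos) {
        operationsSinceOptimize++;
        return isDue(nowNanos);
    }

    /**
     * An optimize is due once enough writes have accumulated, or once the
     * interval has elapsed and at least one write is pending.
     */
    synchronized boolean isDue(long nowNanos) {
        if (operationsSinceOptimize >= maxOperations) {
            return true;
        }
        return operationsSinceOptimize > 0
                && nowNanos - lastOptimizeNanos >= maxIntervalNanos;
    }

    /** Called after the (costly) optimize has actually been run. */
    synchronized void markOptimized(long nowNanos) {
        operationsSinceOptimize = 0;
        lastOptimizeNanos = nowNanos;
    }
}
```

In such a design, a bulk operation (an import or a harvest, for example) would skip the check while it is writing and run a single optimize when it completes, which matches the advice to optimize only when no more documents are expected for a while.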
    44 44 * Speeding up spatial indexing using PostGIS: !GeoNetwork uses a shapefile to hold spatial extents for searches that contain spatial queries, e.g. touch, intersect, contains, overlap etc. At present only the CSW service uses the spatial index for these queries; the web search interface uses boxes and ranges in Lucene. The spatial index needs to be maintained when records are added and deleted through import, harvesting, massive delete etc. Unfortunately the shapefile is not efficient for this purpose once the catalog holds more than about 40,000 records. In particular, the mechanism for deleting extents from the shapefile uses an attribute of the extent, and shapefile attributes are not indexable. This means that there is a considerable cost for maintenance operations on the shapefile. To support fast maintenance and search of the spatial index for larger catalogs, it was decided to adopt the PostGIS implementation for the spatial index written for the geocat.ch sandbox by Jessie Eichar and to fall back to using a shapefile when the catalog is not using PostGIS for its database. An option has been added to GAST to allow the user to specify PostGIS as the database (the spatial index table will be built by the create-db-postgis.sql script when the Database->Setup option is used). When !GeoTools 2.6.x is adopted, we will very likely be able to also allow the spatial index in Oracle for those who must use that.
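
The maintenance cost described above comes from deleting by an attribute that cannot be indexed. The following in-memory sketch is only an analogy: it uses neither the shapefile nor the PostGIS code, and every name in it (`ExtentStoreSketch`, `ScanStore`, `IndexedStore`, `Extent`) is hypothetical. It contrasts a delete that has to examine records one by one with a delete that goes through a key lookup, which is roughly what an indexed database column provides.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.Iterator;
import java.util.List;
import java.util.Map;

/** Hypothetical illustration only; not the GeoNetwork spatial index code. */
class ExtentStoreSketch {

    /** A metadata record id together with its bounding box. */
    static final class Extent {
        final String metadataId;
        final double minX, minY, maxX, maxY;

        Extent(String metadataId, double minX, double minY, double maxX, double maxY) {
            this.metadataId = metadataId;
            this.minX = minX;
            this.minY = minY;
            this.maxX = maxX;
            this.maxY = maxY;
        }
    }

    /** No attribute index: a delete scans records until it finds the id. */
    static final class ScanStore {
        private final List<Extent> extents = new ArrayList<>();
        long recordsExamined = 0;

        void add(Extent e) {
            extents.add(e);
        }

        boolean deleteById(String id) {
            for (Iterator<Extent> it = extents.iterator(); it.hasNext();) {
                recordsExamined++;
                if (it.next().metadataId.equals(id)) {
                    it.remove();
                    return true;
                }
            }
            return false;
        }

        int size() {
            return extents.size();
        }
    }

    /** Keyed by id: a delete is a single lookup, whatever the store size. */
    static final class IndexedStore {
        private final Map<String, Extent> byId = new HashMap<>();

        void add(Extent e) {
            byId.put(e.metadataId, e);
        }

        boolean deleteById(String id) {
            return byId.remove(id) != null;
        }

        int size() {
            return byId.size();
        }
    }
}
```

With 40,000 extents loaded, deleting the most recently added id from `ScanStore` examines every record, while `IndexedStore` removes it with one keyed lookup. Repeating the scan for each record in a harvest or a massive delete is what makes the unindexed case expensive.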