Changes between Initial Version and Version 1 of rfc26_blockcache


Ignore:
Timestamp:
Dec 6, 2009, 2:10:05 AM (14 years ago)
Author:
tamas
Comment:

--

Legend:

Unmodified
Added
Removed
Modified
  • rfc26_blockcache

    v1 v1  
     1= RFC 26: GDAL Block Cache Improvements =
     2
     3Author: Tamas Szekeres[[BR]]
     4Contact: szekerest@gmail.com[[BR]]
     5Status: Proposed
     6
     7== Summary ==
     8
     9GDAL maintains an in-memory cache for the raster blocks fetched from the drivers and ensures that the second attempt
     10to access the same block will be served from the cache instead of the driver. This cache is maintained in a per-band fashion
     11and an array is allocated for the pointers for each blocks (or sub-blocks). This approach is not sufficient with large
     12raster dimensions (or virtually large rasters ie. with the WMS/TMS driver) which may cause out of memory errors in
     13GDALRasterBand::InitBlockInfo.
     14The purpose of this RFC would be to support switching between an array based and a new hashtable based block cache approach
     15depending on the raster size, which would optimize between the performance and the memory utilization in all cases.
     16
     17This RFC would also address the multithreading issues discovered in #3324 and #3325.
     18
     19
     20== Main concepts ==
     21
     22The infrastructure of the hashtable has already been established in cpl_hash_set.h / cpl_hash_set.cpp however this
     23implementation is not thread safe by default. With regards to the thread-safety we should at least consider the following
     24issues:
     25
     261. In gdalrasterblock.cpp a static linked list is maintained so as to track the access order of the blocks and keep the
     27size of the cache within a desired limit by dropping the oldest blocks out of the list. This linked list is shared among all the
     28datasets and bands in GDAL (protected by a hRBMutex) and a thread on each band may also trigger a
     29GDALRasterBand::FlushBlock call on an other band within the scope of this mutex.
     30GDALRasterBand::FlushBlock will also access the data structure of the band level cache by removing the corresponding tile from the
     31array or the hashtable.
     32
     332. Destroying the datasets/bands in many environments may also happen withing the scope of a different thread
     34(ie. by the garbage collector thread) which may also trigger a GDALRasterBand::FlushBlock or eventually the band level
     35block cache is also destroyed.
     36
     37
     38Protecting the band level cache from the access of multiple threads may happen by using a per-band a mutex of the
     39raster band, however using this second mutex may easily lead to deadlock situations as pointed out in #3324. To avoid this situation it would be much
     40safer to use hRBMutex to protect the simoultaneous access to the per-band cache as well. In most cases when accessing the per-band
     41cache, hRBMutex is also involved in various operations (like GDALRasterBlock:Touch, GDALRasterBlock:Internalize,
     42GDALRasterBlock::SafeLockBlock etc.) it would only require to extend the scope of this mutex by exposing hRBMutex to be
     43accessed by GDALRasterBand.
     44
     45To select between the array based and the hashtable based approaches a nMaxCacheSize parameter is introduced for each band.
     46When the array would exceed this size limit the hashtable based approach will be selected, and the hashtable is allocated
     47to this initial size by default.
     48
     49
     50== Implementation ==
     51
     52To implement the addition the following changes is made in the GDAL codebase:
     53
     541. In cpl_hash_set.cpp / cpl_hash_set.h a new creator function is added (CPLHashSetNew2) to accept the initial hash size.
     55
     562. In cpl_hash_set.cpp / cpl_hash_set.h CPLHashSetRemoveAll is added to remove all the elements in one operation.
     57
     583. In gdalrasterblock.cpp / gdal_priv.h  'static void **GethRBMutex( void )' is added to GDALRasterBlock in order to expose the mutex.
     59
     604. In gdalrasterband.cpp a new structure is added (BlockCacheCtx) which is the wrapper of the blocks stored in the hashtable.
     61
     625. In gdalrasterband.cpp static callback functions are added to support the hashtable operations (GDALBlockCacheHashFunc, GDALBlockCacheEqualFunc, 
     63
     64GDALBlockCacheFreeFunc)
     65
     666. In GDALRasterBand::InitBlockInfo the block cache array or the hashtable is allocated according to the raster size and the nMaxCacheSize parameter.
     67
     687. In GDALRasterBand::~GDALRasterBand the array and the hashtable is destroyed protected by hRBMutex.
     69
     708. In GDALRasterBand::AdoptBlock a block is added to the cache protected by hRBMutex.
     71
     729. GDALRasterBand::FlushCache is modified to remove all the elements of the hashtable in addition to the array approach.
     73
     7410. GDALRasterBand::FlushBlock removes an element from the cache is now protected by hRBMutex.
     75
     7611. *GDALRasterBand::TryGetLockedBlockRef tries to access the hashtable in addition to the array protected by hRBMutex.
     77
     78
     79== Backward Compatibility ==
     80
     81This implementation retains the backward compatibility with the existing API.
     82 
     83
     84== Documentation ==
     85
     86This change doesn't affect the existing user documentation.
     87
     88
     89== Implementation Staffing ==
     90
     91Tamas Szekeres will implement the RFC in the development version.
     92
     93
     94== References ==
     95
     96 * Tracking bug for this feature (containing a patch with the related changes): #3264
     97
     98== Voting History ==
     99
     100None