|Version 1 (modified by tamas, 3 years ago)|
RFC 26: GDAL Block Cache Improvements
Author: Tamas Szekeres
GDAL maintains an in-memory cache for the raster blocks fetched from the drivers and ensures that the second attempt to access the same block will be served from the cache instead of the driver. This cache is maintained in a per-band fashion and an array is allocated for the pointers for each blocks (or sub-blocks). This approach is not sufficient with large raster dimensions (or virtually large rasters ie. with the WMS/TMS driver) which may cause out of memory errors in GDALRasterBand::InitBlockInfo?. The purpose of this RFC would be to support switching between an array based and a new hashtable based block cache approach depending on the raster size, which would optimize between the performance and the memory utilization in all cases.
The infrastructure of the hashtable has already been established in cpl_hash_set.h / cpl_hash_set.cpp however this implementation is not thread safe by default. With regards to the thread-safety we should at least consider the following issues:
1. In gdalrasterblock.cpp a static linked list is maintained so as to track the access order of the blocks and keep the size of the cache within a desired limit by dropping the oldest blocks out of the list. This linked list is shared among all the datasets and bands in GDAL (protected by a hRBMutex) and a thread on each band may also trigger a GDALRasterBand::FlushBlock? call on an other band within the scope of this mutex. GDALRasterBand::FlushBlock? will also access the data structure of the band level cache by removing the corresponding tile from the array or the hashtable.
2. Destroying the datasets/bands in many environments may also happen withing the scope of a different thread (ie. by the garbage collector thread) which may also trigger a GDALRasterBand::FlushBlock? or eventually the band level block cache is also destroyed.
Protecting the band level cache from the access of multiple threads may happen by using a per-band a mutex of the raster band, however using this second mutex may easily lead to deadlock situations as pointed out in #3324. To avoid this situation it would be much safer to use hRBMutex to protect the simoultaneous access to the per-band cache as well. In most cases when accessing the per-band cache, hRBMutex is also involved in various operations (like GDALRasterBlock:Touch, GDALRasterBlock:Internalize, GDALRasterBlock::SafeLockBlock? etc.) it would only require to extend the scope of this mutex by exposing hRBMutex to be accessed by GDALRasterBand.
To select between the array based and the hashtable based approaches a nMaxCacheSize parameter is introduced for each band. When the array would exceed this size limit the hashtable based approach will be selected, and the hashtable is allocated to this initial size by default.
To implement the addition the following changes is made in the GDAL codebase:
1. In cpl_hash_set.cpp / cpl_hash_set.h a new creator function is added (CPLHashSetNew2) to accept the initial hash size.
2. In cpl_hash_set.cpp / cpl_hash_set.h CPLHashSetRemoveAll is added to remove all the elements in one operation.
3. In gdalrasterblock.cpp / gdal_priv.h 'static void **GethRBMutex( void )' is added to GDALRasterBlock in order to expose the mutex.
4. In gdalrasterband.cpp a new structure is added (BlockCacheCtx?) which is the wrapper of the blocks stored in the hashtable.
5. In gdalrasterband.cpp static callback functions are added to support the hashtable operations (GDALBlockCacheHashFunc, GDALBlockCacheEqualFunc,
6. In GDALRasterBand::InitBlockInfo? the block cache array or the hashtable is allocated according to the raster size and the nMaxCacheSize parameter.
7. In GDALRasterBand::~GDALRasterBand the array and the hashtable is destroyed protected by hRBMutex.
8. In GDALRasterBand::AdoptBlock? a block is added to the cache protected by hRBMutex.
9. GDALRasterBand::FlushCache? is modified to remove all the elements of the hashtable in addition to the array approach.
10. GDALRasterBand::FlushBlock? removes an element from the cache is now protected by hRBMutex.
11. *GDALRasterBand::TryGetLockedBlockRef? tries to access the hashtable in addition to the array protected by hRBMutex.
This implementation retains the backward compatibility with the existing API.
This change doesn't affect the existing user documentation.
Tamas Szekeres will implement the RFC in the development version.
- Tracking bug for this feature (containing a patch with the related changes): #3264