wiki:rfc26_blockcache

Version 3 (modified by tamas, 14 years ago) ( diff )

--

RFC 26: GDAL Block Cache Improvements

Author: Tamas Szekeres
Contact: szekerest@…
Status: Development

Summary

GDAL maintains an in-memory cache for the raster blocks fetched from the drivers and ensures that the second attempt to access the same block will be served from the cache instead of the driver. This cache is maintained in a per-band fashion and an array is allocated for the pointers for each blocks (or sub-blocks). This approach is not sufficient with large raster dimensions (or virtually large rasters ie. with the WMS/TMS driver) which may cause out of memory errors in GDALRasterBand::InitBlockInfo. The purpose of this RFC would be to support switching between an array based and a new hashtable based block cache approach depending on the raster size, which would optimize between the performance and the memory utilization in all cases.

This RFC would also address the multithreading issues discovered in #3324 and #3325.

Main concepts

The infrastructure of the hashtable has already been established in cpl_hash_set.h / cpl_hash_set.cpp however this implementation is not thread safe by default. With regards to the thread-safety we should at least consider the following issues:

  1. In gdalrasterblock.cpp a static linked list is maintained so as to track the access order of the blocks and keep the

size of the cache within a desired limit by dropping the oldest blocks out of the list. This linked list is shared among all the datasets and bands in GDAL (protected by a hRBMutex) and a thread on each band may also trigger a GDALRasterBand::FlushBlock call on an other band within the scope of this mutex. GDALRasterBand::FlushBlock will also access the data structure of the band level cache by removing the corresponding tile from the array or the hashtable.

  1. Destroying the datasets/bands in many environments may also happen withing the scope of a different thread

(ie. by the garbage collector thread) which may also trigger a GDALRasterBand::FlushBlock or eventually the band level block cache is also destroyed.

Protecting the band level cache from the access of multiple threads may happen by using a per-band mutex of the raster band, however using this second mutex may easily lead to deadlock situations as pointed out in #3324. To avoid this situation it would be much safer to use hRBMutex to protect the simoultaneous access to the per-band cache as well. In most cases when accessing the per-band cache, hRBMutex is also involved in various operations (like GDALRasterBlock:Touch, GDALRasterBlock:Internalize, GDALRasterBlock::SafeLockBlock etc.) it would only be required to extend the scope of this mutex by exposing hRBMutex to be accessed by GDALRasterBand.

To select between the array based and the hashtable based approaches a nMaxCacheSize parameter is introduced at the band. When the array would exceed this size limit the hashtable based approach is selected, and the hashtable is allocated to this initial size by default.

Implementation

To implement the addition the following changes is made in the GDAL codebase:

  1. In cpl_hash_set.cpp / cpl_hash_set.h a new creator function is added (CPLHashSetNew2) to accept the initial hash size.
  1. In cpl_hash_set.cpp / cpl_hash_set.h CPLHashSetRemoveAll is added to remove all the elements in one operation.
  1. In gdalrasterblock.cpp / gdal_priv.h 'static void GethRBMutex( void )' is added to GDALRasterBlock in order to expose the mutex.
  1. In gdalrasterband.cpp a new structure is added (BlockCacheCtx) which is the wrapper of the blocks stored in the hashtable.
  1. In gdalrasterband.cpp static callback functions are added to support the hashtable operations (GDALBlockCacheHashFunc, GDALBlockCacheEqualFunc,

GDALBlockCacheFreeFunc)

  1. In GDALRasterBand::InitBlockInfo the block cache array or the hashtable is allocated according to the raster size and the nMaxCacheSize parameter.
  1. In GDALRasterBand::~GDALRasterBand the array and the hashtable is destroyed protected by hRBMutex.
  1. In GDALRasterBand::AdoptBlock a block is added to the cache protected by hRBMutex.
  1. GDALRasterBand::FlushCache is modified to remove all the elements of the hashtable in addition to the array approach.
  1. GDALRasterBand::FlushBlock removes an element from the cache is now protected by hRBMutex.
  1. *GDALRasterBand::TryGetLockedBlockRef tries to access the hashtable in addition to the array protected by hRBMutex.

Backward Compatibility

This implementation retains the backward compatibility with the existing API.

Documentation

This change doesn't affect the existing user documentation.

Implementation Staffing

Tamas Szekeres will implement the RFC in the development version.

References

  • Tracking bug for this feature (containing a patch with the related changes): #3264

Voting History

None

Note: See TracWiki for help on using the wiki.