Changes between Version 4 and Version 5 of rfc26_blockcache


Ignore:
Timestamp:
Jun 4, 2015, 6:54:42 AM (9 years ago)
Author:
Even Rouault
Comment:

Update of RFC 26

Legend:

Unmodified
Added
Removed
Modified
  • rfc26_blockcache

    v4 v5  
    11= RFC 26: GDAL Block Cache Improvements =
    22
    3 Author: Tamas Szekeres[[BR]]
    4 Contact: szekerest@gmail.com[[BR]]
    5 Status: Development
     3Authors: Tamas Szekeres, Even Rouault[[BR]]
     4Contact: szekerest@gmail.com, even.rouault at spatialys.com[[BR]]
     5Status: Development[[BR]]
     6Target version: GDAL 2.1
    67
    7 == Summary ==
     8== Summary and rationale ==
    89
    910GDAL maintains an in-memory cache for the raster blocks fetched from the drivers and ensures that the second attempt
    1011to access the same block will be served from the cache instead of the driver. This cache is maintained in a per-band fashion
    1112and an array is allocated for the pointers for each blocks (or sub-blocks). This approach is not sufficient with large
    12 raster dimensions (or virtually large rasters ie. with the WMS/TMS driver) which may cause out of memory errors in
    13 GDALRasterBand::InitBlockInfo.
    14 The purpose of this RFC would be to support switching between an array based and a new hashtable based block cache approach
    15 depending on the raster size, which would optimize between the performance and the memory utilization in all cases.
     13raster dimensions (or large virtual rasters ie. with the WMS/TMS driver), which may cause out of memory errors in
     14GDALRasterBand::InitBlockInfo, as raised in #3224
    1615
    17 This RFC would also address the multithreading issues discovered in #3224 and #3225.
     16For example, a band of a dataset at level 21 with a GoogleMaps tiling requires 2097152x2097152 tiles of 256x256 pixels. This means that GDAL will try to allocate an array of 32768x32768 = 1 billion elements (32768 = 2097152 / 64). The size of this array is 4 GB on a 32-bit build, so it cannot be allocated at all. And it is 8 GB on a 64-bit build (even if the size is generally only virtual memory but not actually allocated as physical pages of memory due to over-commit mechanism of the operating system). At dataset closing, this means that those 1 billion cells will have to be explored to discover remaining cached blocks. In reality, all above figures must be multiplied by 3 for a RGB (or 4 for a RGBA) dataset.
    1817
     18In the hash set implementation, memory allocation depends directly on the number of cached blocks. Typically with the default GDAL_CACHEMAX size of 40 MB, only 640 blocks of 256x256 pixels can be simultaneously cached (for all datasets).
    1919
    2020== Main concepts ==
    2121
    22 The infrastructure of the hashtable has already been established in cpl_hash_set.h / cpl_hash_set.cpp however this
    23 implementation is not thread safe by default. With regards to the thread-safety we should at least consider the following
    24 issues:
     22Awareness of thread-safety issues is crucial in the design of block caching. In gdalrasterblock.cpp, a static linked list is maintained so as to track the access order of the blocks and keep the size of the cache within a desired limit by dropping the oldest blocks out of the list. This linked list is shared among all the datasets and bands in GDAL (protected by a hRBMutex) and a thread on each band, when reading a new block, may also trigger a GDALRasterBand::UnreferenceBlock call on another band within the scope of this mutex. GDALRasterBand::FlushBlock will also access the data structure of the band level cache by removing the corresponding tile from the array or the hashtable.
    2523
    26 1. In gdalrasterblock.cpp a static linked list is maintained so as to track the access order of the blocks and keep the
    27 size of the cache within a desired limit by dropping the oldest blocks out of the list. This linked list is shared among all the
    28 datasets and bands in GDAL (protected by a hRBMutex) and a thread on each band may also trigger a
    29 GDALRasterBand::FlushBlock call on an other band within the scope of this mutex.
    30 GDALRasterBand::FlushBlock will also access the data structure of the band level cache by removing the corresponding tile from the
    31 array or the hashtable.
     24In GDAL 2.0, some issues related to thread-safety (#3225, #3226) have been fixed and this RFC still preserves those scenarios as safe.
    3225
    33 2. Destroying the datasets/bands in many environments may also happen withing the scope of a different thread
    34 (ie. by the garbage collector thread) which may also trigger a GDALRasterBand::FlushBlock or eventually the band level
    35 block cache is also destroyed.
     26The changes of this RFC consist in moving away from the GDALRasterBand class the logic to access to a cached block, to add or remove it. This is done with the new GDALAbstractBandBlockCache class. The current array based logic is moved into the new GDALArrayBandBlockCache class, and the new hashset based logic in GDALHashsetBandBlockCache.
    3627
     28For the array based implementation, due to the "static" nature of the hosting structure (an array), no special care is needed when reading or writing a cell from concurrent threads. The only special care that must be taken is to prevent a given cell (block) to be accessed concurrently. For example we want to avoid TryGetLockedBlockRef() to return a block that is being freed by another thread from GDALRasterBlock::FlushCacheBlock() or Internalize(). For that, the nRefCount member of GDALRasterBlock is now accessed and modified only through atomic functions to increase, decrease or compare-and-swap its value.
    3729
    38 Protecting the band level cache from the access of multiple threads may happen by using a per-band mutex of the
    39 raster band, however using this second mutex may easily lead to deadlock situations as pointed out in #3224. To avoid this situation it would be much
    40 safer to use hRBMutex to protect the simoultaneous access to the per-band cache as well. In most cases when accessing the per-band
    41 cache, hRBMutex is also involved in various operations (like GDALRasterBlock:Touch, GDALRasterBlock:Internalize,
    42 GDALRasterBlock::SafeLockBlock etc.) it would only be required to extend the scope of this mutex by exposing hRBMutex to be
    43 accessed by GDALRasterBand.
     30For the hash set based implementation, the base implementation of hash set data structure done in in cpl_hash_set.h / cpl_hash_set.cpp
     31is not thread safe by default. So GDALHashsetBandBlockCache has a dedicated mutex to protect all reads, additions and removals from the hash set. No dead-lock with the hRBMutex can occurs since no operations done under the hashset mutex involves calling any method from GDALRasterBlock.
    4432
    45 To select between the array based and the hashtable based approaches a nMaxCacheSize parameter is introduced at the band.
    46 When the array would exceed this size limit the hashtable based approach is selected, and the hashtable is allocated
    47 to this initial size by default.
     33We could potentially have reused the hRBMutex to protect the hash set, but this would have increased the contention of the hRBMutex unnecessarily.
    4834
     35By default, the selection between the array based and the hashtable based approaches is based on the following rule: if the dataset has more than 1 million blocks, the hashset based implementation is used, otherwise the array based implementation is used.
     36The new GDAL_OF_ARRAY_BLOCK_ACCESS and GDAL_OF_HASHSET_BLOCK_ACCESS open flags can also be passed to GDALOpenEx() to override this choice.
     37The GDAL_BAND_BLOCK_CACHE configuration option can also be set to ARRAY or HASHSET.
     38
     39The hashset based implementation could potentially be the default implementation in all cases (performance comparisons done with the autotest/cpp/testblockcache utility with 4 or 8 cores show non measurable differences), but in theory the array based implementation offers less contention of the hRBMutex, so should be more scalable when using lots of cores. And as work has been done during GDAL 2.0 to improve the scalability, it might be prudent for now to remain on the array based implementation on rasters of modest size.
     40
     41Not completely linked with this RFC, a few changes have been done to limit the number of allocation/deallocation of objects (GDALRasterBlock instances, as well as an internal element of CPLHashSet), which has an effect on scalability since memory allocation routines involve synchronization between threads.
    4942
    5043== Implementation ==
     
    5245To implement the addition the following changes is made in the GDAL codebase:
    5346
    54 1. In cpl_hash_set.cpp / cpl_hash_set.h a new creator function is added (CPLHashSetNew2) to accept the initial hash size.
     47  * port/cpl_hash_set.cpp / port/cpl_hash_set.h: CPLHashSetClear() function added to remove all the elements in one operation.
    5548
    56 2. In cpl_hash_set.cpp / cpl_hash_set.h CPLHashSetRemoveAll is added to remove all the elements in one operation.
     49  * port/cpl_hash_set.cpp / port/cpl_hash_set.h: CPLHashSetRemoveDeferRehash() function added to remove one element quickly. That is to say the potential resizing of the array used internally is defered to a later operation
    5750
    58 3. In gdalrasterblock.cpp / gdal_priv.h  'static void **GethRBMutex( void )' is added to GDALRasterBlock in order to expose the mutex.
     51  * port/cpl_hash_set.cpp / port/cpl_hash_set.h: improvements to "recycle" links from the linked lists and avoid useless malloc()/free().
    5952
    60 4. In gdalrasterband.cpp a new structure is added (BlockCacheCtx) which is the wrapper of the blocks stored in the hashtable.
     53  * port/cpl_atomic_ops.cpp: addition of CPLAtomicCompareAndExchange()
    6154
    62 5. In gdalrasterband.cpp static callback functions are added to support the hashtable operations (GDALBlockCacheHashFunc, GDALBlockCacheEqualFunc, 
     55  * gcore/gdal.h: additions of GDAL_OF_DEFAULT_BLOCK_ACCESS, GDAL_OF_ARRAY_BLOCK_ACCESS and GDAL_OF_HASHSET_BLOCK_ACCESS values.
    6356
    64 GDALBlockCacheFreeFunc)
     57  * gcore/gdal_priv.h: definition of GDALAbstractBandBlockCache class, and GDALArrayBandBlockCacheCreate() and GDALHashSetBandBlockCacheCreate() functions. Modifications of GDALRasterBand, GDALDataset and GDALRasterBlock definitions.
    6558
    66 6. In GDALRasterBand::InitBlockInfo the block cache array or the hashtable is allocated according to the raster size and the nMaxCacheSize parameter.
     59  * gcore/gdalrasterband.cpp: InitBlockInfo() instanciates the appropriate band block cache implementation.
    6760
    68 7. In GDALRasterBand::~GDALRasterBand the array and the hashtable is destroyed protected by hRBMutex.
     61  * gcore/gdalrasterband.cpp: the AdoptBlock(), UnreferenceBlock(), FlushBlock() and TryGetLockedBlockRef() methods delegate to the actual band block cache implementation.
    6962
    70 8. In GDALRasterBand::AdoptBlock a block is added to the cache protected by hRBMutex.
     63  * gcore/gdalrasterband.cpp: AddBlockToFreeList() is added and delegate to GDALAbstractBandBlockCache
    7164
    72 9. GDALRasterBand::FlushCache is modified to remove all the elements of the hashtable in addition to the array approach.
     65  * gcore/gdalrasterblock.cpp: SafeLockBlock() is replaced by TakeLock()
    7366
    74 10. GDALRasterBand::FlushBlock removes an element from the cache is now protected by hRBMutex.
     67  * gcore/gdalrasterblock.cpp: RecycleFor() method added to recycle an existing block object to save a few new/delete calls (used by GDALAbstractBandBlockCache::CreateBlock())
    7568
    76 11. *GDALRasterBand::TryGetLockedBlockRef tries to access the hashtable in addition to the array protected by hRBMutex.
     69  * gcore/gdalrasterblock.cpp: Internalize() or FlushCacheBlock() no longer directly free a block (they still free or recycle its pData member), but provide it to GDALRasterBand::AddBlockToFreeList() for layer reuse.
    7770
     71  * gcore/gdalrasterblock.cpp: DropLockForRemovalFromStorage() is added to avoid racing destruction of blocks between GDALRasterBand::FlushCache() or FlushBlock() with GDALRasterBlock::Internalize() or FlushCacheBlock().
     72
     73  * gcore/gdalabstractbandblockcache.cpp: added. Contains logic to keep instanciated GDALRasterBlock that were discarded by the global block manager for their later reuse. Saves a few new/delete calls.
     74
     75  * gcore/gdalarraybandblockcache.cpp: the GDALArrayBandBlockCache class implementation with mostly the existing code
     76
     77  * gcore/gdalhashsetbandblockcache.cpp: the new GDALHashsetBandBlockCache class implementation
    7878
    7979== Backward Compatibility ==
    8080
    81 This implementation retains the backward compatibility with the existing API.
     81This implementation retains the backward compatibility with the existing API. The C++ ABI of GDALRasterBand, GDALDataset and GDALRasterBlock is modified.
    8282 
     83== Performance impacts ==
    8384
     85The array based implementation after this RFC should still show the same performance than the current implementation (potentially very slightly improved with the recycling of blocks). Confirmed by tests with autotest/cpp/testblockcache.
     86 
    8487== Documentation ==
    8588
    8689This change doesn't affect the existing user documentation.
    8790
     91== Testing ==
    8892
    89 == Implementation Staffing ==
     93The autotest/cpp/testblockcache utility is now run by the "quick_test" target of autotest/cpp/Makefile with GDAL_BAND_BLOCK_CACHE=HASHSET in additions to the array based implementation.
    9094
    91 Tamas Szekeres will implement the RFC in the development version.
     95A new autotest/cpp/testblockcachelimits utility has been developed to test a few racing situations. As races are hard to trigger, the code of GDALRasterBlock has been instrumented to allow sleeping in particular places, enabling races to be reliably simulated.
    9296
     97== Implementation ==
     98
     99Tamas Szekeres had provided an initial version of this RFC. It has been restructured and ported on GDAL 2.0 by Even Rouault (sponsored by [http://www.linz.govt.nz/ LINZ (Land Information New Zealand)])
    93100
    94101== References ==
    95102
    96  * Tracking bug for this feature (containing a patch with the related changes): #3264
     103The proposed implementation lies in the "rfc26_bandblockcache" branch of the
     104https://github.com/rouault/gdal2/tree/rfc26_bandblockcache repository.
     105
     106The list of changes: https://github.com/rouault/gdal2/compare/rfc26_bandblockcache
     107
     108Related bugs: #3264, #3224.
    97109
    98110== Voting History ==