Changes between Version 4 and Version 5 of rfc26_blockcache
Timestamp: Jun 4, 2015, 6:54:42 AM
= RFC 26: GDAL Block Cache Improvements =

Authors: Tamas Szekeres, Even Rouault[[BR]]
Contact: szekerest@gmail.com, even.rouault at spatialys.com[[BR]]
Status: Development[[BR]]
Target version: GDAL 2.1

== Summary and rationale ==

GDAL maintains an in-memory cache for the raster blocks fetched from the drivers and ensures that a second attempt to access the same block will be served from the cache instead of the driver. This cache is maintained in a per-band fashion, and an array is allocated to hold the pointers to each block (or sub-block). This approach is not sufficient with large raster dimensions (or large virtual rasters, e.g. with the WMS/TMS driver), which may cause out-of-memory errors in GDALRasterBand::InitBlockInfo, as raised in #3224.

For example, a band of a dataset at level 21 with a GoogleMaps tiling requires 2097152x2097152 tiles of 256x256 pixels. This means that GDAL will try to allocate an array of 32768x32768 = 1 billion elements (32768 = 2097152 / 64). The size of this array is 4 GB on a 32-bit build, so it cannot be allocated at all, and it is 8 GB on a 64-bit build (even if this size is generally only virtual memory, not actually allocated as physical pages, due to the over-commit mechanism of the operating system).
At dataset closing, those 1 billion cells have to be explored to discover the remaining cached blocks. In reality, all the above figures must be multiplied by 3 for an RGB (or 4 for an RGBA) dataset.

In the hash set implementation, memory allocation depends directly on the number of cached blocks. Typically, with the default GDAL_CACHEMAX size of 40 MB, only 640 blocks of 256x256 pixels can be simultaneously cached (for all datasets together).

== Main concepts ==

Awareness of thread-safety issues is crucial in the design of block caching. In gdalrasterblock.cpp, a static linked list is maintained to track the access order of the blocks and keep the size of the cache within a desired limit by dropping the oldest blocks out of the list. This linked list is shared among all the datasets and bands in GDAL (protected by the hRBMutex), and a thread reading a new block on one band may also trigger a GDALRasterBand::UnreferenceBlock call on another band within the scope of this mutex. GDALRasterBand::FlushBlock also accesses the data structure of the band-level cache by removing the corresponding tile from the array or the hashtable.
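As a minimal illustration of this design (not GDAL's actual code; the class and member names below are invented stand-ins for GDALRasterBlock, hRBMutex and the global block manager), the shared LRU list protected by a single mutex can be sketched as:

```cpp
#include <list>
#include <mutex>

// Hypothetical stand-in for a GDALRasterBlock.
struct Block { int id; };

class BlockManager {
    std::mutex mtx;            // plays the role of the global hRBMutex
    std::list<Block*> lru;     // most recently used block at the front
    size_t maxBlocks;          // stands in for the GDAL_CACHEMAX limit
public:
    explicit BlockManager(size_t n) : maxBlocks(n) {}

    // Called on each block access: move the block to the front and
    // drop the oldest blocks when the cache exceeds its limit.
    void Touch(Block* b) {
        std::lock_guard<std::mutex> lock(mtx);
        lru.remove(b);
        lru.push_front(b);
        while (lru.size() > maxBlocks)
            lru.pop_back();    // evict the least recently used block
    }

    size_t Size() {
        std::lock_guard<std::mutex> lock(mtx);
        return lru.size();
    }
};
```

In GDAL itself, eviction additionally has to call back into the owning band to remove the block from the band-level structure, which is why the band-level cache interacts with the same mutex.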
In GDAL 2.0, some issues related to thread-safety (#3225, #3226) have been fixed, and this RFC keeps those scenarios safe.

The changes of this RFC consist in moving the logic to access, add or remove a cached block out of the GDALRasterBand class. This is done with the new GDALAbstractBandBlockCache class. The current array-based logic is moved into the new GDALArrayBandBlockCache class, and the new hashset-based logic into GDALHashsetBandBlockCache.

For the array-based implementation, due to the "static" nature of the hosting structure (an array), no special care is needed when reading or writing a cell from concurrent threads. The only special care that must be taken is to prevent a given cell (block) from being accessed concurrently. For example, we want to avoid TryGetLockedBlockRef() returning a block that is being freed by another thread from GDALRasterBlock::FlushCacheBlock() or Internalize(). For that, the nRefCount member of GDALRasterBlock is now accessed and modified only through atomic functions that increase, decrease or compare-and-swap its value.
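The compare-and-swap pattern on the reference count can be sketched with std::atomic (GDAL uses its own CPLAtomic* functions; the structure and method names here are illustrative only):

```cpp
#include <atomic>

// Illustrative sketch: take a reference on a block only if it is still alive,
// i.e. only if its reference count has not already dropped to zero.
struct BlockRefCount {
    std::atomic<int> nRefCount{1};

    bool TryAddRef() {
        int cur = nRefCount.load();
        while (cur > 0) {
            // Atomically replace cur with cur+1 if nRefCount still equals cur;
            // on failure, cur is reloaded and the loop retries.
            if (nRefCount.compare_exchange_weak(cur, cur + 1))
                return true;   // reference successfully taken
        }
        return false;          // block is being freed by another thread
    }

    // Returns true if this call dropped the count to zero (caller must free).
    bool Release() { return nRefCount.fetch_sub(1) == 1; }
};
```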
For the hashset-based implementation, the base hash set data structure in cpl_hash_set.h / cpl_hash_set.cpp is not thread-safe by default, so GDALHashsetBandBlockCache has a dedicated mutex to protect all reads, additions and removals from the hash set. No deadlock with the hRBMutex can occur, since no operation done under the hashset mutex involves calling any method of GDALRasterBlock.

We could potentially have reused the hRBMutex to protect the hash set, but this would have increased the contention on the hRBMutex unnecessarily.

By default, the selection between the array-based and the hashset-based approaches follows this rule: if the dataset has more than 1 million blocks, the hashset-based implementation is used; otherwise the array-based implementation is used. The new GDAL_OF_ARRAY_BLOCK_ACCESS and GDAL_OF_HASHSET_BLOCK_ACCESS open flags can also be passed to GDALOpenEx() to override this choice, and the GDAL_BAND_BLOCK_CACHE configuration option can be set to ARRAY or HASHSET.
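The default selection rule can be expressed as a tiny sketch (the 1-million threshold is from this RFC; the function and enum are hypothetical, not GDAL's actual API):

```cpp
#include <cstdint>

enum class BlockCacheKind { Array, HashSet };

// Hypothetical helper: pick the band block cache implementation from the
// total number of blocks of the dataset, per the RFC's default rule.
BlockCacheKind ChooseBlockCache(uint64_t nBlocksX, uint64_t nBlocksY) {
    const uint64_t kThreshold = 1000000;  // 1 million blocks
    return (nBlocksX * nBlocksY > kThreshold) ? BlockCacheKind::HashSet
                                              : BlockCacheKind::Array;
}
```

With this rule, the level-21 example above (32768x32768 array cells) falls well beyond the threshold and gets the hashset implementation, while ordinary rasters keep the array implementation.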
The hashset-based implementation could potentially be the default in all cases (performance comparisons done with the autotest/cpp/testblockcache utility with 4 or 8 cores show no measurable difference), but in theory the array-based implementation causes less contention on the hRBMutex, so it should be more scalable when using many cores. As work has been done during GDAL 2.0 to improve scalability, it seems prudent for now to keep the array-based implementation for rasters of modest size.

Not directly linked with this RFC, a few changes have also been made to limit the number of allocations/deallocations of objects (GDALRasterBlock instances, as well as an internal element of CPLHashSet), which has an effect on scalability since memory allocation routines involve synchronization between threads.

== Implementation ==

To implement the addition, the following changes are made in the GDAL codebase:

* port/cpl_hash_set.cpp / port/cpl_hash_set.h: CPLHashSetClear() function added to remove all the elements in one operation.

* port/cpl_hash_set.cpp / port/cpl_hash_set.h: CPLHashSetRemoveDeferRehash() function added to remove one element quickly; the potential resizing of the array used internally is deferred to a later operation.

* port/cpl_hash_set.cpp / port/cpl_hash_set.h: improvements to "recycle" links from the linked lists and avoid useless malloc()/free().
* port/cpl_atomic_ops.cpp: addition of CPLAtomicCompareAndExchange().

* gcore/gdal.h: addition of the GDAL_OF_DEFAULT_BLOCK_ACCESS, GDAL_OF_ARRAY_BLOCK_ACCESS and GDAL_OF_HASHSET_BLOCK_ACCESS values.

* gcore/gdal_priv.h: definition of the GDALAbstractBandBlockCache class and of the GDALArrayBandBlockCacheCreate() and GDALHashSetBandBlockCacheCreate() functions; modifications of the GDALRasterBand, GDALDataset and GDALRasterBlock definitions.

* gcore/gdalrasterband.cpp: InitBlockInfo() instantiates the appropriate band block cache implementation.

* gcore/gdalrasterband.cpp: the AdoptBlock(), UnreferenceBlock(), FlushBlock() and TryGetLockedBlockRef() methods delegate to the actual band block cache implementation.

* gcore/gdalrasterband.cpp: AddBlockToFreeList() is added and delegates to GDALAbstractBandBlockCache.

* gcore/gdalrasterblock.cpp: SafeLockBlock() is replaced by TakeLock().

* gcore/gdalrasterblock.cpp: RecycleFor() method added to recycle an existing block object and save a few new/delete calls (used by GDALAbstractBandBlockCache::CreateBlock()).
* gcore/gdalrasterblock.cpp: Internalize() and FlushCacheBlock() no longer directly free a block (they still free or recycle its pData member), but provide it to GDALRasterBand::AddBlockToFreeList() for later reuse.

* gcore/gdalrasterblock.cpp: DropLockForRemovalFromStorage() is added to avoid racing destruction of blocks between GDALRasterBand::FlushCache() or FlushBlock() and GDALRasterBlock::Internalize() or FlushCacheBlock().

* gcore/gdalabstractbandblockcache.cpp: added. Contains the logic to keep instantiated GDALRasterBlock objects that were discarded by the global block manager, for their later reuse. Saves a few new/delete calls.

* gcore/gdalarraybandblockcache.cpp: the GDALArrayBandBlockCache class implementation, with mostly the existing code.

* gcore/gdalhashsetbandblockcache.cpp: the new GDALHashsetBandBlockCache class implementation.

== Backward Compatibility ==

This implementation retains backward compatibility with the existing API. The C++ ABI of GDALRasterBand, GDALDataset and GDALRasterBlock is modified.

== Performance impacts ==

The array-based implementation after this RFC should still show the same performance as the current implementation (potentially very slightly improved by the recycling of blocks). This has been confirmed by tests with autotest/cpp/testblockcache.

== Documentation ==

This change doesn't affect the existing user documentation.

== Testing ==

The autotest/cpp/testblockcache utility is now run by the "quick_test" target of autotest/cpp/Makefile with GDAL_BAND_BLOCK_CACHE=HASHSET, in addition to the array-based implementation.
A new autotest/cpp/testblockcachelimits utility has been developed to test a few racing situations. As races are hard to trigger, the code of GDALRasterBlock has been instrumented to allow sleeping in particular places, enabling races to be reliably simulated.

== Implementation ==

Tamas Szekeres provided an initial version of this RFC. It has been restructured and ported to GDAL 2.0 by Even Rouault (sponsored by [http://www.linz.govt.nz/ LINZ (Land Information New Zealand)]).

== References ==

The proposed implementation lies in the "rfc26_bandblockcache" branch of the https://github.com/rouault/gdal2/tree/rfc26_bandblockcache repository.

The list of changes: https://github.com/rouault/gdal2/compare/rfc26_bandblockcache

Related bugs: #3264, #3224.

== Voting History ==