Changes between Version 5 and Version 6 of rfc47_dataset_caching
- Timestamp: Jul 29, 2014, 9:00:27 AM
rfc47_dataset_caching

When using GDAL in multithreaded code, it was found that the limiting portion of the code was often the LRU block cache within GDAL. This is an attempt to make the LRU cache more efficient in multithreaded situations by making it possible to have a separate LRU cache per dataset, and by optimizing when locking occurs.

== Two Different Solutions ==

Two different ways of solving this problem are being proposed, and both have been coded up (test code for each is still to be written). However, both share some common changes. First I will go over the changes common to the two methods, then the ways in which the two solutions differ.

== Pull Requests ==

* [https://github.com/OSGeo/gdal/pull/38 Pull Request #1] - SOLUTION 1 (Dataset RW Locking)
* [https://github.com/OSGeo/gdal/pull/39 Pull Request #2] - SOLUTION 2 (Block RW Locking)

== Common Solution ==

=== Dataset Caching ===

The static global mutex that is limiting performance is located within gcore/gdalrasterblock.cpp. This mutex protects the maximum cache size setting, the LRU cache itself, and the current size of the cache. Because of its current scope, the mutex stays locked for extended periods once the cache is full and new memory is being initialized in GDALRasterBlock::Internalize().

To reduce how often this LRU cache must be locked, a new global config option, "GDAL_DATASET_CACHING", is introduced. When it is set to "YES", the LRU cache is kept per dataset rather than as a single global cache. The default is "NO".
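
The switch between one global cache and one cache per dataset can be sketched in a few lines of C++. This is a hypothetical illustration, not code from either pull request. BlockManager, ParseDatasetCaching, DatasetCachingFromEnv, GlobalManager and Dataset are invented names. The byte-budget LRU is a simplified stand-in for the real cache manager, and std::getenv stands in for GDAL's own config-option lookup. With "YES", each dataset owns a private LRU cache and mutex. Otherwise, every dataset shares one global cache.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <list>
#include <memory>
#include <mutex>
#include <string>
#include <unordered_map>

// Hypothetical stand-in for a block cache manager: an LRU list of block ids
// with a byte budget, guarded by the manager's own mutex.
class BlockManager {
public:
    explicit BlockManager(std::size_t maxBytes) : maxBytes_(maxBytes) {
        assert(maxBytes > 0);
    }

    // Marks a block as most recently used, inserting it if it is new, then
    // evicts least recently used blocks until the budget is met (the newest
    // block is never evicted). Returns how many blocks were evicted.
    int Touch(int blockId, std::size_t bytes) {
        std::lock_guard<std::mutex> lock(mutex_);
        auto it = index_.find(blockId);
        if (it != index_.end()) {
            lru_.splice(lru_.begin(), lru_, it->second);  // move to front
            return 0;
        }
        lru_.push_front(Entry{blockId, bytes});
        index_[blockId] = lru_.begin();
        used_ += bytes;
        int evicted = 0;
        while (used_ > maxBytes_ && lru_.size() > 1) {
            const Entry victim = lru_.back();
            used_ -= victim.bytes;
            index_.erase(victim.id);
            lru_.pop_back();
            ++evicted;
        }
        return evicted;
    }

    std::size_t Used() {
        std::lock_guard<std::mutex> lock(mutex_);
        return used_;
    }

private:
    struct Entry {
        int id;
        std::size_t bytes;
    };
    std::mutex mutex_;
    std::size_t maxBytes_;
    std::size_t used_ = 0;
    std::list<Entry> lru_;
    std::unordered_map<int, std::list<Entry>::iterator> index_;
};

// "YES" selects per-dataset caching; anything else (including unset) keeps
// the global default. GDAL's real parsing may accept more spellings.
bool ParseDatasetCaching(const char* value) {
    return value != nullptr && std::string(value) == "YES";
}

// std::getenv is a stand-in for GDAL's config-option lookup in this sketch.
bool DatasetCachingFromEnv() {
    return ParseDatasetCaching(std::getenv("GDAL_DATASET_CACHING"));
}

// One shared cache for every dataset when per-dataset caching is off.
BlockManager& GlobalManager() {
    static BlockManager global(64 * 1024 * 1024);
    return global;
}

// Hypothetical dataset: owns a private manager only in per-dataset mode.
struct Dataset {
    Dataset(bool perDataset, std::size_t maxBytes)
        : own(perDataset ? std::make_unique<BlockManager>(maxBytes) : nullptr),
          manager(perDataset ? own.get() : &GlobalManager()) {}

    std::unique_ptr<BlockManager> own;
    BlockManager* manager;
};
```

For example, Dataset ds(DatasetCachingFromEnv(), 32u << 20); gives ds a private 32 MiB cache when GDAL_DATASET_CACHING=YES. Otherwise, ds.manager points at the shared global cache. Two threads working on different datasets therefore contend for the same manager mutex only in the global case.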

Per-dataset caching also lets threaded applications flush only the cache of a single dataset, which can improve performance for two reasons. First, the cache of a more commonly used dataset can be managed separately from those of other datasets, so its blocks are more likely to remain cached. Second, without a common global mutex, two threads working on different datasets are less likely to contend for the same mutex.

To manage the different caches, a GDALRasterBlockManager class is introduced. This class is responsible for managing the cache in both the global and the per-dataset situations.

==== GDALRasterBlockManager ====

{{{
…
}}}

Many of the operations originally done by statics within GDALRasterBlock have now moved into GDALRasterBlockManager.

==== GDALDataset ====

Every GDALDataset now has a:

…
}}}

==== GDALRasterBand ====

To make caching safer and more efficient, a mutex is also introduced in GDALRasterBand. The job of this mutex is to protect the per-band RasterBlock array (papoBlocks).

== Thread Safety and the Two Solutions ==

Multithreading in GDAL is complicated. While these changes do seek to '''improve''' threading within GDAL, they do not '''solve''' threading problems within GDAL or make it truly thread safe.
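
The distinction between protecting the cache and protecting everything is easiest to see in the per-band mutex from the GDALRasterBand change above. Below is a minimal C++ sketch of it. Band, Block, GetBlockRef, FlushBlock and CachedCount are hypothetical names, and std::shared_ptr is a simplification chosen for this sketch, not how GDALRasterBlock tracks its users. The band mutex makes lookups in the block pointer array (the papoBlocks role) safe. It is never held while pixel data is copied.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <mutex>
#include <vector>

// Hypothetical block: a buffer of pixel bytes (zero-initialized).
struct Block {
    explicit Block(std::size_t bytes) : data(bytes) {}
    std::vector<unsigned char> data;
};

// Hypothetical band holding a papoBlocks-style array of lazily created
// blocks. The band mutex guards the array only, never the pixel data.
class Band {
public:
    Band(int nBlocks, std::size_t blockBytes)
        : blocks_(static_cast<std::size_t>(nBlocks)), blockBytes_(blockBytes) {
        assert(nBlocks > 0);
    }

    // Returns the block at 'index', creating it on first access. The lock is
    // held only for the lookup/insert; copying pixel data needs a separate
    // lock (for example the Dataset RW Mutex described below).
    std::shared_ptr<Block> GetBlockRef(int index) {
        std::lock_guard<std::mutex> lock(bandMutex_);
        std::shared_ptr<Block>& slot = blocks_.at(static_cast<std::size_t>(index));
        if (!slot) {
            slot = std::make_shared<Block>(blockBytes_);
        }
        return slot;
    }

    // Drops the band's reference (e.g. on eviction); threads still holding
    // the shared_ptr keep the old block alive until they release it.
    void FlushBlock(int index) {
        std::lock_guard<std::mutex> lock(bandMutex_);
        blocks_.at(static_cast<std::size_t>(index)).reset();
    }

    // Number of blocks currently referenced by the array.
    std::size_t CachedCount() {
        std::lock_guard<std::mutex> lock(bandMutex_);
        std::size_t n = 0;
        for (const auto& block : blocks_) {
            if (block) ++n;
        }
        return n;
    }

private:
    std::mutex bandMutex_;
    std::vector<std::shared_ptr<Block>> blocks_;  // plays the role of papoBlocks
    std::size_t blockBytes_;
};
```

Several threads may call GetBlockRef on the same Band and will receive the same Block for a given index. Nothing in this sketch stops two of them from writing that block's data at once, which is exactly the gap the remaining mutexes address.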

The goal of this change is simply to make the cache thread safe. To achieve this, three mutexes are used. Where these three mutexes are located differs between the two proposed solutions.

=== Solution 1 (RW Mutex in GDALDataset) ===

For Solution 1 the three mutexes are:

* Dataset RW Mutex (per GDALDataset)
* Band Mutex (per GDALRasterBand)
* RBM Mutex (per GDALRasterBlockManager)

To prevent deadlocks, the mutexes are given a priority in the order they are listed. For example, if you hold the Band Mutex, you may not obtain the Dataset RW Mutex unless you obtained it before the Band Mutex. However, the goal should always be never to hold more than one mutex at a time!

=== Dataset RW Mutex ===

The objective of the Dataset RW Mutex is to protect the data stored within the GDALRasterBlocks associated with a dataset, and to lock during large Read or Write operations. This prevents two different threads from using memcpy on the same GDALRasterBlock at the same time.
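
The priority rule and the memcpy protection can be sketched together in C++. This is a hypothetical illustration, not the code in either pull request. Level, LevelGuard, tlsHighestHeld, ReadBlock and WriteBlock are invented names, and the thread_local check is only one way to catch ordering mistakes. The sketch also models the Dataset RW Mutex as a plain std::mutex held for both reads and writes. That is one reading of "lock during large Read or Write operations"; the text does not say whether "RW" means a reader/writer lock.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <mutex>
#include <stdexcept>
#include <vector>

// Hypothetical priority levels, in the order the mutexes are listed:
// Dataset RW Mutex first, then Band Mutex, then RBM Mutex.
enum class Level { DatasetRW = 0, Band = 1, RBM = 2 };

// Highest level currently held by this thread (-1 means none).
thread_local int tlsHighestHeld = -1;

// Checks the ordering rule before a mutex is taken: a thread may only move
// to a strictly later level. Refusing two locks of the same level is
// stricter than the rule in the text.
class LevelGuard {
public:
    explicit LevelGuard(Level level) : previous_(tlsHighestHeld) {
        const int wanted = static_cast<int>(level);
        if (wanted <= tlsHighestHeld) {
            throw std::logic_error("mutex priority order violated");
        }
        tlsHighestHeld = wanted;
    }
    ~LevelGuard() { tlsHighestHeld = previous_; }
    LevelGuard(const LevelGuard&) = delete;
    LevelGuard& operator=(const LevelGuard&) = delete;

private:
    int previous_;
};

// Hypothetical dataset with one block buffer and its Dataset RW Mutex.
struct Dataset {
    std::mutex rwMutex;
    std::vector<unsigned char> blockData;
};

// Copies the block out; 'out' must hold blockData.size() bytes. Holding
// rwMutex keeps any other thread's memcpy on this block from overlapping.
void ReadBlock(Dataset& ds, unsigned char* out) {
    LevelGuard order(Level::DatasetRW);
    std::lock_guard<std::mutex> lock(ds.rwMutex);
    std::memcpy(out, ds.blockData.data(), ds.blockData.size());
}

// Copies new contents into the block under the same mutex.
void WriteBlock(Dataset& ds, const unsigned char* in) {
    LevelGuard order(Level::DatasetRW);
    std::lock_guard<std::mutex> lock(ds.rwMutex);
    std::memcpy(ds.blockData.data(), in, ds.blockData.size());
}
```

Taking the Dataset RW Mutex and then a Band Mutex passes the check. Calling ReadBlock while already holding a Band Mutex throws instead of risking a deadlock. A std::shared_mutex would let concurrent readers copy out of the same block at once, but the simpler exclusive lock matches the text's stated aim of keeping two memcpy calls on one block from overlapping.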

The Dataset RW Mutex normally lies within the GDALDataset. For a standalone GDALRasterBand (one not attached to a dataset), a new mutex on the band is used instead.

=== Band Mutex ===