Changes between Version 5 and Version 6 of rfc47_dataset_caching


Timestamp:
Jul 29, 2014, 9:00:27 AM
Author:
flippmoke

  • rfc47_dataset_caching

    v5 v6  
    99When utilizing GDAL in multithreaded code, it was found that the limiting portion of the code was often the LRU block cache within GDAL. This is an attempt to make the LRU cache more efficient in multithreaded situations by making it possible to have multiple LRU caches, one per dataset, and by optimizing when locking occurs.
    1010
    11 == Pull Request ==
     11== Two Different Solutions ==
    1212
    13 [https://github.com/OSGeo/gdal/pull/38 Pull Request]
     13Two different ways of solving this problem are being proposed, and both have been coded up (test code for each is still to be written). However, both share some common changes. First I will go over the changes common to the two methods, then the ways in which the two solutions differ.
    1414
    15 == Dataset Caching ==
     15== Pull Requests ==
     16
     17 * [https://github.com/OSGeo/gdal/pull/38 Pull Request #1] - SOLUTION 1 (Dataset RW Locking)
     18 * [https://github.com/OSGeo/gdal/pull/39 Pull Request #2] - SOLUTION 2 (Block RW Locking)
     19
     20== Common Solution ==
     21
     22=== Dataset Caching ===
    1623
    1724The static global mutex that is limiting performance is located within gcore/gdalrasterblock.cpp. This mutex protects the setting of the maximum cache size, the LRU cache itself, and the current size of the cache. The current scope of this mutex makes it lock for extended periods once the cache is full and new memory is being initialized in GDALRasterBlock::Internalize().
    1825
    19 In order to reduce how often this LRU cache needs to be locked, a new global config option "GDAL_DATASET_CACHING" causes the LRU cache to be per dataset when set to "YES", rather than a global cache ("NO", the default).
     26In order to reduce how often this LRU cache needs to be locked, a new global config option, "GDAL_DATASET_CACHING", is introduced. This causes the LRU cache to be per dataset when set to "YES", rather than a global cache ("NO", the default). Doing this will also allow threaded applications to flush only the cache for a single dataset, improving performance in some situations for two reasons. First, the cache of a more commonly used dataset can be set separately from other datasets, meaning that it is more likely to remain cached. Second, without a common global mutex, two threads are less likely to contend for the same mutex when operations are being performed on different datasets.
    2027
    2128In order to have management of the different caches, a GDALRasterBlockManager class is introduced. This class is responsible for the management of the cache in the global or per dataset situations.
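
The idea of one cache manager per dataset, each guarded by its own mutex, can be sketched as follows. This is only an illustration under stated assumptions and not the GDALRasterBlockManager from the pull requests: the class name BlockCacheSketch, its methods, and the byte-based eviction policy are all hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <list>
#include <mutex>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical stand-in for a block cache manager; not the GDAL class.
class BlockCacheSketch {
public:
    explicit BlockCacheSketch(std::size_t maxBytes) : maxBytes_(maxBytes) {}

    // Insert or replace a block, then evict least recently used blocks
    // until the cache fits within maxBytes_ (the newest block is kept).
    void Put(int blockId, std::vector<unsigned char> data) {
        std::lock_guard<std::mutex> lock(mutex_);
        auto it = index_.find(blockId);
        if (it != index_.end()) {
            usedBytes_ -= it->second->data.size();
            lru_.erase(it->second);
            index_.erase(it);
        }
        usedBytes_ += data.size();
        lru_.push_front(Entry{blockId, std::move(data)});
        index_[blockId] = lru_.begin();
        while (usedBytes_ > maxBytes_ && lru_.size() > 1) {
            const Entry &victim = lru_.back();
            assert(usedBytes_ >= victim.data.size());
            usedBytes_ -= victim.data.size();
            index_.erase(victim.blockId);
            lru_.pop_back();
        }
    }

    // Returns true and marks the block as most recently used if present.
    bool Touch(int blockId) {
        std::lock_guard<std::mutex> lock(mutex_);
        auto it = index_.find(blockId);
        if (it == index_.end()) return false;
        lru_.splice(lru_.begin(), lru_, it->second);
        return true;
    }

    // Drop every block held by this manager, and only this manager.
    void Flush() {
        std::lock_guard<std::mutex> lock(mutex_);
        lru_.clear();
        index_.clear();
        usedBytes_ = 0;
    }

    std::size_t UsedBytes() const {
        std::lock_guard<std::mutex> lock(mutex_);
        return usedBytes_;
    }

private:
    struct Entry {
        int blockId;
        std::vector<unsigned char> data;
    };

    mutable std::mutex mutex_;  // one mutex per manager, not a global one
    std::size_t maxBytes_;
    std::size_t usedBytes_ = 0;
    std::list<Entry> lru_;      // front = most recently used
    std::unordered_map<int, std::list<Entry>::iterator> index_;
};
```

With one such object per dataset, filling or flushing the cache of one dataset never takes the lock that guards the cache of another; a single shared instance would correspond to the global case.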
    2229
    23 === GDALRasterBlockManager ===
     30==== GDALRasterBlockManager ====
    2431
    2532{{{
     
    5158}}}
    5259
    53 Many of the operations originally done by statics within GDALRasterBlock are now moved into the RasterBlockManager.
     60Many of the operations originally done by statics within GDALRasterBlock are now moved into the GDALRasterBlockManager.
    5461
    55 === GDALDataset ===
     62==== GDALDataset ====
    5663
    5764Every GDALDataset now has a:
     
    7683}}}
    7784
    78 === GDALRasterBand ===
     85==== GDALRasterBand ====
    7986
    80 In order to make caching safer and more efficient, a mutex is also introduced in GDALRasterBand. The job of this mutex is to protect individual GDALRasterBlocks and to protect the RasterBlock array per band (papoBlocks).
     87In order to make caching safer and more efficient, a mutex is also introduced in GDALRasterBand. The job of this mutex is to protect the RasterBlock array per band (papoBlocks).
    8188
    82 == Thread Safety ==
     89== Thread Safety and the Two Solutions ==
    8390
    84 Multithreading in GDAL is complicated. While these changes do seek to '''improve''' threading within GDAL, they do not '''solve''' threading problems within GDAL or make it truly thread safe. The goal of this change is simply to make the cache thread safe; in order to achieve this, three mutexes are utilized.
     91Multithreading in GDAL is complicated. While these changes do seek to '''improve''' threading within GDAL, they do not '''solve''' threading problems within GDAL or make it truly thread safe. The goal of this change is simply to make the cache thread safe; in order to achieve this, three mutexes are utilized. Where these three mutexes are located differs between the two proposed solutions.
    8592
    86  * RW Mutex (per GDALDataset)
     93=== Solution 1 (RW Mutex in GDALDataset) ===
     94
     95For solution one the three mutexes are:
     96
     97 * Dataset RW Mutex (per GDALDataset)
    8798 * Band Mutex (per GDALRasterBand)
    8899 * RBM Mutex (per GDALRasterBlockManager)
    89100
    90 In order to prevent deadlocks, a priority of the mutexes is established in the order they are listed. For example, if you hold the Band Mutex, you may not obtain the RW Mutex unless it was obtained prior to the Band Mutex being obtained.
     101In order to prevent deadlocks, a priority of the mutexes is established in the order they are listed. For example, if you hold the Band Mutex, you may not obtain the Dataset RW Mutex unless it was obtained prior to the Band Mutex being obtained. However, the goal should always be to never hold more than one mutex at a time!
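
The priority rule above can be made checkable at run time. The following is a minimal sketch, assuming a per-thread record of the highest-priority lock currently held; LockRank, t_heldRank and RankedLock are hypothetical names for illustration and do not appear in the pull requests. For brevity all three locks are plain std::mutex objects here, although the dataset lock in the proposal is a read/write mutex.

```cpp
#include <cassert>
#include <mutex>
#include <stdexcept>

// Priority order from the list above: lower values must be taken first.
enum class LockRank { None = 0, DatasetRW = 1, Band = 2, BlockManager = 3 };

// Rank of the most recently acquired lock on the calling thread.
thread_local LockRank t_heldRank = LockRank::None;

// RAII guard that throws instead of locking out of priority order.
class RankedLock {
public:
    RankedLock(std::mutex &m, LockRank rank) : mutex_(m), previous_(t_heldRank) {
        if (rank <= previous_)
            throw std::logic_error("lock order violation");
        mutex_.lock();
        t_heldRank = rank;
    }
    ~RankedLock() {
        t_heldRank = previous_;  // guards are released in reverse order
        mutex_.unlock();
    }
    RankedLock(const RankedLock &) = delete;
    RankedLock &operator=(const RankedLock &) = delete;

private:
    std::mutex &mutex_;
    LockRank previous_;
};

// Taking all three mutexes in priority order is accepted.
bool InOrderIsAccepted() {
    std::mutex datasetRW, band, manager;
    RankedLock a(datasetRW, LockRank::DatasetRW);
    RankedLock b(band, LockRank::Band);
    RankedLock c(manager, LockRank::BlockManager);
    return t_heldRank == LockRank::BlockManager;
}

// Holding the Band Mutex and then asking for the Dataset RW Mutex is rejected.
bool OutOfOrderIsRejected() {
    std::mutex datasetRW, band;
    RankedLock bandLock(band, LockRank::Band);
    try {
        RankedLock datasetLock(datasetRW, LockRank::DatasetRW);
    } catch (const std::logic_error &) {
        return true;
    }
    return false;
}
```

A check like this only detects ordering mistakes; it does not replace the stated goal of holding a single mutex at a time wherever possible.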
    91102
    92 === RW Mutex ===
     103=== Dataset RW Mutex ===
    93104
    92 The objective of the RW Mutex is to protect the data stored within the GDALRasterBlocks associated with a dataset, and to lock during large Read or Write operations. This prevents two different threads from using memcpy on the same GDALRasterBlock at the same time. This mutex normally lies within the GDALDataset, but in the case of a standalone GDALRasterBand, it utilizes a new mutex on the Band.
     105The objective of the Dataset RW Mutex is to protect the data stored within the GDALRasterBlocks associated with a dataset, and to lock during large Read or Write operations. This prevents two different threads from using memcpy on the same GDALRasterBlock at the same time. This mutex normally lies within the GDALDataset, but in the case of a standalone GDALRasterBand, it utilizes a new mutex on the Band.
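
As a rough illustration of the read/write locking described above, the sketch below guards a single block buffer with a C++17 std::shared_mutex. It assumes that readers may copy out concurrently while a writer needs exclusive access; the pull requests may define the exact semantics differently. DatasetSketch and its methods are hypothetical names and are not GDAL API.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <mutex>
#include <shared_mutex>
#include <vector>

// Hypothetical stand-in for a dataset that owns one cached block.
class DatasetSketch {
public:
    explicit DatasetSketch(std::size_t blockBytes) : block_(blockBytes, 0) {}

    // Readers take the lock in shared mode, so several threads may
    // copy data out of the block at the same time.
    void ReadBlock(void *dst, std::size_t n) const {
        assert(n <= block_.size());
        std::shared_lock<std::shared_mutex> lock(rwMutex_);
        std::memcpy(dst, block_.data(), n);
    }

    // A writer takes the lock exclusively, so no other thread can run
    // memcpy on the same block while it is being overwritten.
    void WriteBlock(const void *src, std::size_t n) {
        assert(n <= block_.size());
        std::unique_lock<std::shared_mutex> lock(rwMutex_);
        std::memcpy(block_.data(), src, n);
    }

private:
    mutable std::shared_mutex rwMutex_;  // one RW mutex per dataset
    std::vector<unsigned char> block_;
};
```

Because the lock is taken once around the whole copy, a large read or write holds it for the duration of the operation, which matches the intent of locking during large Read or Write operations.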
    95106
    96107=== Band Mutex ===