Changes between Initial Version and Version 1 of rfc45_virtualmem


Timestamp:
Dec 17, 2013, 11:41:45 AM (10 years ago)
Author:
Even Rouault
= RFC 45: GDAL datasets and raster bands as virtual memory objects =

Authors: Even Rouault[[BR]]
Contact: even dot rouault at mines dash paris dot org[[BR]]
Status: Development

== Summary ==

This document proposes additions to GDAL so that the image data of GDAL datasets and
raster bands can be seen as virtual memory objects, which should make them simpler to use.

== Rationale ==

When one wants to read or write image data from/into a GDAL dataset or raster
band, one must use the RasterIO() interface for the regions of interest that
are read or written. For small images, the most convenient solution is usually
to read/write the whole image in a single request where the region of interest
is the full raster extent. For larger images, particularly when they do not
fit entirely in RAM, this is not possible, and if one wants to operate on the
whole image, one must use a windowing strategy to avoid memory issues: typically
by proceeding scanline by scanline (or by groups of scanlines), or by blocks for tiled
images. This can complicate the writing of algorithms that need
to access a neighbourhood of pixels around each pixel of interest, since the size of this
extra window must be taken into account, leading to overlapping regions of
interest. None of this is insurmountable, but it requires additional
thought that distracts from the main purpose of the algorithm.

The addition proposed by this RFC is to make the image data appear as a single
array accessed through a pointer, without being limited by the size of RAM with
respect to the size of the dataset (apart from limitations imposed by the CPU
architecture and the operating system).

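To illustrate what the single-pointer view buys, here is a minimal sketch of a neighbourhood operation written against a flat row-major array. The helper name box_sum_3x3 and the plain in-memory array are assumptions made for illustration only; with the proposed API, the pointer would instead come from the virtual memory mapping, and no explicit windowing code would be needed.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper (not part of GDAL): sum of the 3x3 neighbourhood
 * centred on (x, y) in a row-major image reachable through one pointer.
 * The caller must keep (x, y) at least one pixel away from the borders. */
static int box_sum_3x3(const unsigned char *img, size_t width,
                       size_t x, size_t y)
{
    int sum = 0;
    assert(x >= 1 && y >= 1 && x + 1 < width);
    for (ptrdiff_t dy = -1; dy <= 1; dy++)
    {
        for (ptrdiff_t dx = -1; dx <= 1; dx++)
        {
            size_t row = (size_t)((ptrdiff_t)y + dy);
            size_t col = (size_t)((ptrdiff_t)x + dx);
            sum += img[row * width + col];
        }
    }
    return sum;
}
```

The function never needs to know which scanlines or blocks are currently loaded: the neighbourhood is addressed with plain index arithmetic.
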
=== Technical solution ===

==== Low-level machinery : cpl_virtualmem.h ====

The low-level machinery to support this new capability is a CPLVirtualMem object
that represents an area of virtual memory (on Linux, an area of virtual memory
allocated by the mmap() function). This virtual memory area is initially just
reserved in terms of virtual memory space, but has no actual allocation in
physical memory. This reserved virtual memory space is protected with an access
permission that causes any attempt to access it to result in an exception: a
page fault, which on POSIX systems triggers a SIGSEGV signal (segmentation fault).
Fortunately, segmentation faults can be caught by the software with a signal
handler. When such a segmentation fault occurs, our specialized signal handler
checks whether it occurred in a virtual memory region under its responsibility and,
if so, fills the part (a "page") of the virtual memory area
that has been accessed with sensible values (thanks to a user-provided callback).
It then sets appropriate permissions on the page (read-only or read-write),
before retrying the instruction that triggered the segmentation fault.
From the point of view of the user code that accesses the memory mapping, this
is completely transparent, and equivalent to the whole virtual memory
area having been filled from the start.

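The mechanism described above can be sketched in a few lines of Linux-specific C. This is only an illustration under simplifying assumptions, not the CPLVirtualMem implementation: all names (g_base, on_segv, lazy_fill_demo) are hypothetical, there is a single mapping, a single thread, no eviction, and the "callback" simply fills each page with its index plus one. Note also that mprotect() is not formally async-signal-safe, although calling it from a SIGSEGV handler works on Linux in practice.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <assert.h>
#include <signal.h>
#include <stdint.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/* State of the single demo mapping (hypothetical names, not GDAL's). */
static unsigned char *g_base = NULL;
static size_t g_size = 0;
static size_t g_page = 0;
static volatile sig_atomic_t g_faults = 0;

/* SIGSEGV handler: if the fault lies inside our mapping, make the page
 * accessible and fill it; otherwise restore the default action. */
static void on_segv(int sig, siginfo_t *info, void *uctx)
{
    uintptr_t addr = (uintptr_t)info->si_addr;
    uintptr_t base = (uintptr_t)g_base;
    unsigned char *page;
    (void)uctx;
    if (g_base == NULL || addr < base || addr >= base + g_size)
    {
        signal(sig, SIG_DFL); /* not ours: the retried access will crash */
        return;
    }
    page = g_base + ((addr - base) / g_page) * g_page;
    if (mprotect(page, g_page, PROT_READ | PROT_WRITE) != 0)
        _exit(1);
    /* Stand-in for the user callback: fill with (page index + 1). */
    memset(page, (int)((size_t)(page - g_base) / g_page) + 1, g_page);
    g_faults++;
    /* Returning from the handler retries the faulting instruction. */
}

/* Reserve nPages inaccessible pages, read two bytes in each, and return
 * the number of faults handled, or -1 on error or unexpected content. */
static int lazy_fill_demo(size_t nPages)
{
    struct sigaction sa, old;
    int ok = 1;
    assert(nPages > 0 && nPages < 255);
    g_page = (size_t)sysconf(_SC_PAGESIZE);
    g_size = nPages * g_page;
    g_faults = 0;
    g_base = mmap(NULL, g_size, PROT_NONE,
                  MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (g_base == MAP_FAILED) { g_base = NULL; return -1; }

    memset(&sa, 0, sizeof(sa));
    sa.sa_sigaction = on_segv;
    sa.sa_flags = SA_SIGINFO;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGSEGV, &sa, &old) != 0)
    {
        munmap(g_base, g_size);
        g_base = NULL;
        return -1;
    }
    for (size_t i = 0; i < nPages; i++)
    {
        volatile unsigned char *p = g_base + i * g_page;
        if (p[0] != (unsigned char)(i + 1)) ok = 0;          /* faults once */
        if (p[g_page - 1] != (unsigned char)(i + 1)) ok = 0; /* no fault */
    }
    sigaction(SIGSEGV, &old, NULL); /* restore the previous handler */
    munmap(g_base, g_size);
    g_base = NULL;
    return ok ? (int)g_faults : -1;
}
```

Calling lazy_fill_demo(4) touches four pages and should report four handled faults: one per page on first access, and none on the second access to the same page.
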
For very large mappings that are larger than RAM, this would still cause disk
swapping to occur at some point. To avoid that, the segmentation fault
handler evicts the least recently used pages once a threshold defined at the
creation of the CPLVirtualMem object has been reached.

For write support, another callback can be passed. It is called before a
page is evicted so that user code has a chance to flush its content to more
persistent storage.

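The eviction policy and the write-back callback can be sketched as the following bookkeeping. This is a toy model under stated assumptions: the names PageCache, cache_touch and log_flush are hypothetical, the cache holds only two pages to force evictions, and pages are identified by index rather than by address. The real implementation may track recency differently.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define CACHE_SLOTS 2 /* assumed tiny cache, to force evictions */

/* Called just before a dirty page is evicted (stand-in for pfnUnCachePage). */
typedef void (*FlushCbk)(size_t pageIndex, void *userData);

typedef struct
{
    size_t page[CACHE_SLOTS];         /* page index held by each slot */
    int used[CACHE_SLOTS];            /* slot currently holds a page  */
    int dirty[CACHE_SLOTS];           /* page was written to          */
    unsigned long stamp[CACHE_SLOTS]; /* tick of the last access      */
    unsigned long tick;
    FlushCbk flush;
    void *userData;
} PageCache;

static void cache_init(PageCache *c, FlushCbk flush, void *userData)
{
    memset(c, 0, sizeof(*c));
    c->flush = flush;
    c->userData = userData;
}

/* Record an access to pageIndex. Returns 1 on a hit, 0 when the page had
 * to be loaded, possibly after evicting the least recently used page. */
static int cache_touch(PageCache *c, size_t pageIndex, int isWrite)
{
    int victim = 0;
    c->tick++;
    for (int i = 0; i < CACHE_SLOTS; i++)
    {
        if (c->used[i] && c->page[i] == pageIndex)
        {
            c->stamp[i] = c->tick;
            c->dirty[i] |= isWrite;
            return 1;
        }
    }
    /* Miss: take a free slot if any, else the least recently used one. */
    for (int i = 0; i < CACHE_SLOTS; i++)
    {
        if (!c->used[i]) { victim = i; break; }
        if (c->stamp[i] < c->stamp[victim]) victim = i;
    }
    if (c->used[victim] && c->dirty[victim] && c->flush != NULL)
        c->flush(c->page[victim], c->userData); /* save before eviction */
    c->page[victim] = pageIndex;
    c->used[victim] = 1;
    c->dirty[victim] = isWrite;
    c->stamp[victim] = c->tick;
    return 0;
}

/* Example flush callback: remembers the last flushed page and counts calls. */
typedef struct { size_t last; int count; } FlushLog;

static void log_flush(size_t pageIndex, void *userData)
{
    FlushLog *log = (FlushLog *)userData;
    log->last = pageIndex;
    log->count++;
}
```

Clean pages are dropped silently, while a dirty page triggers the flush callback exactly once, just before its slot is reused.
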
==== High-level usage ====

Four new API functions are introduced (detailed in a later section):

* GDALDatasetGetVirtualMem(): takes almost the same arguments as GDALDatasetRasterIO(), with the notable exception of the pData buffer. It returns a CPLVirtualMem* object, from which the base address of the virtual memory mapping can be obtained with CPLVirtualMemGetAddr().

* GDALRasterBandGetVirtualMem(): equivalent of GDALDatasetGetVirtualMem() that operates on a raster band object rather than a dataset object.

* GDALDatasetGetTiledVirtualMem(): a less conventional API. Instead of presenting a 2D view of the image data (i.e. organized row by row), the mapping exposes it as an array of tiles, which is better suited, performance-wise, to datasets that are themselves tiled.

                            [ INSERT SCHEMA HERE ]

  When there are several bands, 3 different organizations of band components are possible. To the best of our knowledge, there is no standard naming for those organizations, so they are best illustrated by the following schemas:

                            [ INSERT SCHEMAS HERE ]

* GDALRasterBandGetTiledVirtualMem(): equivalent of GDALDatasetGetTiledVirtualMem() that operates on a raster band object rather than a dataset object.

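As a rough illustration of what "an array of tiles" means for addressing, the sketch below computes the element index of a pixel in a single-band tiled buffer. The layout is an assumption made for this example (tiles stored left to right then top to bottom, each tile a full row-major block, partial tiles padded to full size), and the helper name tiled_index is hypothetical; the API documentation remains the reference for the layout actually produced.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: element index of pixel (x, y) in a single-band
 * buffer organized by tiles. Assumed layout: tiles stored left to right,
 * top to bottom; each tile is a full tileW x tileH row-major block. */
static size_t tiled_index(size_t x, size_t y, size_t width,
                          size_t tileW, size_t tileH)
{
    size_t tilesPerRow, tile, inTile;
    assert(tileW > 0 && tileH > 0 && x < width);
    tilesPerRow = (width + tileW - 1) / tileW;   /* partial tile counts */
    tile = (y / tileH) * tilesPerRow + (x / tileW);
    inTile = (y % tileH) * tileW + (x % tileW);
    return tile * tileW * tileH + inTile;
}
```

With a 5-pixel-wide region and 2x2 tiles, each tile row holds three tiles (the last one padded), so pixel (4, 1) lands at index 10 and pixel (0, 2) starts the fourth tile at index 12.
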
== Details of new API ==

=== Implemented by cpl_virtualmem.cpp ===

{{{

/**
 * \file cpl_virtualmem.h
 *
 * Virtual memory management.
 *
 * This file provides a mechanism to define virtual memory mappings, whose content
 * is allocated transparently and filled on-the-fly. Those virtual memory mappings
 * can be much larger than the available RAM, but only parts of the virtual
 * memory mapping, within the limit of the allowed cache size, will actually be
 * physically allocated.
 *
 * This exploits low-level mechanisms of the operating system (virtual memory
 * allocation, page protection and handling of virtual memory exceptions).
 *
 * The current implementation is Linux only.
 */

/** Opaque type that represents a virtual memory mapping. */
typedef struct CPLVirtualMem CPLVirtualMem;

/** Callback triggered when a still unmapped page of virtual memory is accessed.
  * The callback has the responsibility of filling the page with relevant values.
  *
  * @param ctxt virtual memory handle.
  * @param nOffset offset of the page in the memory mapping.
  * @param pPageToFill address of the page to fill. Note that the address might
  *                    be a temporary location, and not at CPLVirtualMemGetAddr() + nOffset.
  * @param nToFill number of bytes of the page.
  * @param pUserData user data that was passed to CPLVirtualMemNew().
  */
typedef void (*CPLVirtualMemCachePageCbk)(CPLVirtualMem* ctxt,
                                    size_t nOffset,
                                    void* pPageToFill,
                                    size_t nToFill,
                                    void* pUserData);

/** Callback triggered when a dirty mapped page is going to be freed
  * (saturation of cache, or termination of the virtual memory mapping).
  *
  * @param ctxt virtual memory handle.
  * @param nOffset offset of the page in the memory mapping.
  * @param pPageToBeEvicted address of the page that will be flushed. Note that the address might
  *                    be a temporary location, and not at CPLVirtualMemGetAddr() + nOffset.
  * @param nToBeEvicted number of bytes of the page.
  * @param pUserData user data that was passed to CPLVirtualMemNew().
  */
typedef void (*CPLVirtualMemUnCachePageCbk)(CPLVirtualMem* ctxt,
                                      size_t nOffset,
                                      const void* pPageToBeEvicted,
                                      size_t nToBeEvicted,
                                      void* pUserData);

/** Callback that can be used to free the user data passed to CPLVirtualMemNew(). */
typedef void (*CPLVirtualMemFreeUserData)(void* pUserData);

/** Access mode of a virtual memory mapping. */
typedef enum
{
    /*! The mapping is meant to be read-only, but writes will not be prevented.
        Note that any content written will be lost. */
    VIRTUALMEM_READONLY,
    /*! The mapping is meant to be read-only, and this will be enforced
        through the operating system page protection mechanism. */
    VIRTUALMEM_READONLY_ENFORCED,
    /*! The mapping is meant to be read-write, and modified pages can be saved
        thanks to the pfnUnCachePage callback. */
    VIRTUALMEM_READWRITE
} CPLVirtualMemAccessMode;


/** Return the size of a page of virtual memory.
 *
 * @return the page size.
 *
 * @since GDAL 2.0
 */
size_t CPL_DLL CPLGetPageSize(void);

/** Create a new virtual memory mapping.
 *
 * This will reserve an area of virtual memory of size nSize, which
 * might be much larger than the physical memory available. Initially,
 * no physical memory will be allocated. As soon as memory pages are accessed,
 * they will be allocated transparently and filled with the pfnCachePage callback.
 * When the allowed cache size is reached, the least recently used pages will
 * be deallocated.
 *
 * On Linux AMD64 platforms, the maximum value for nSize is 128 TB.
 * On Linux x86 platforms, the maximum value for nSize is 2 GB.
 *
 * Only supported on Linux for now.
 *
 * Note that on Linux, this function will install a SIGSEGV handler. The
 * original handler will be restored by CPLVirtualMemManagerTerminate().
 *
 * @param nSize size in bytes of the virtual memory mapping.
 * @param nCacheSize   size in bytes of the maximum memory that will be really
 *                     allocated (must ideally fit into RAM).
 * @param nPageSizeHint hint for the page size. Must be a multiple of the
 *                      system page size, returned by CPLGetPageSize().
 *                      Minimum value is generally 4096. Might be set to 0 to
 *                      let the function determine a default page size.
 * @param bSingleThreadUsage set to TRUE if there will be no concurrent threads
 *                           that will access the virtual memory mapping. This can
 *                           optimize performance a bit.
 * @param eAccessMode permission to use for the virtual memory mapping.
 * @param pfnCachePage callback triggered when a still unmapped page of virtual
 *                     memory is accessed. The callback has the responsibility
 *                     of filling the page with relevant values.
 * @param pfnUnCachePage callback triggered when a dirty mapped page is going to
 *                       be freed (saturation of cache, or termination of the
 *                       virtual memory mapping). Might be NULL.
 * @param pfnFreeUserData callback that can be used to free pCbkUserData. Might be
 *                        NULL.
 * @param pCbkUserData user data passed to pfnCachePage and pfnUnCachePage.
 *
 * @return a virtual memory object that must be freed by CPLVirtualMemFree(),
 *         or NULL in case of failure.
 *
 * @since GDAL 2.0
 */

CPLVirtualMem CPL_DLL *CPLVirtualMemNew(size_t nSize,
                                        size_t nCacheSize,
                                        size_t nPageSizeHint,
                                        int bSingleThreadUsage,
                                        CPLVirtualMemAccessMode eAccessMode,
                                        CPLVirtualMemCachePageCbk pfnCachePage,
                                        CPLVirtualMemUnCachePageCbk pfnUnCachePage,
                                        CPLVirtualMemFreeUserData pfnFreeUserData,
                                        void *pCbkUserData);

/** Free a virtual memory mapping.
 *
 * The pointer returned by CPLVirtualMemGetAddr() will no longer be valid.
 * If the virtual memory mapping was created with read/write permissions and
 * there are dirty (i.e. modified) pages, they will be flushed through the
 * pfnUnCachePage callback before being freed.
 *
 * @param ctxt context returned by CPLVirtualMemNew().
 *
 * @since GDAL 2.0
 */
void CPL_DLL CPLVirtualMemFree(CPLVirtualMem* ctxt);

/** Return the pointer to the start of a virtual memory mapping.
 *
 * The bytes in the range [p:p+CPLVirtualMemGetSize()-1] where p is the pointer
 * returned by this function will be valid, until CPLVirtualMemFree() is called.
 *
 * Note that if a range of bytes used as an argument of a system call
 * (such as read() or write()) contains pages that have not been "realized", the
 * system call will fail with EFAULT. CPLVirtualMemPin() can be used to work
 * around this issue.
 *
 * @param ctxt context returned by CPLVirtualMemNew().
 * @return the pointer to the start of a virtual memory mapping.
 *
 * @since GDAL 2.0
 */
void CPL_DLL *CPLVirtualMemGetAddr(CPLVirtualMem* ctxt);

/** Return the size of the virtual memory mapping.
 *
 * @param ctxt context returned by CPLVirtualMemNew().
 * @return the size of the virtual memory mapping.
 *
 * @since GDAL 2.0
 */
size_t CPL_DLL CPLVirtualMemGetSize(CPLVirtualMem* ctxt);

/** Return the page size associated with a virtual memory mapping.
 *
 * The value returned will be at least CPLGetPageSize(), but potentially
 * larger.
 *
 * @param ctxt context returned by CPLVirtualMemNew().
 * @return the page size.
 *
 * @since GDAL 2.0
 */
size_t CPL_DLL CPLVirtualMemGetPageSize(CPLVirtualMem* ctxt);

/** Return TRUE if this memory mapping can be accessed safely from concurrent
 *  threads.
 *
 * The situation that can cause problems is when several threads try to access
 * a page of the mapping that is not yet mapped.
 *
 * The return value of this function depends on whether bSingleThreadUsage has
 * been set or not in CPLVirtualMemNew() and/or the implementation.
 *
 * On Linux, this will always return TRUE if bSingleThreadUsage = FALSE.
 *
 * @param ctxt context returned by CPLVirtualMemNew().
 * @return TRUE if this memory mapping can be accessed safely from concurrent
 *         threads.
 *
 * @since GDAL 2.0
 */
int CPL_DLL CPLVirtualMemIsAccessThreadSafe(CPLVirtualMem* ctxt);

/** Declare that a thread will access a virtual memory mapping.
 *
 * This function must be called by a thread that wants to access the
 * content of a virtual memory mapping, except if the virtual memory mapping has
 * been created with bSingleThreadUsage = TRUE.
 *
 * This function must be paired with CPLVirtualMemUnDeclareThread().
 *
 * @param ctxt context returned by CPLVirtualMemNew().
 *
 * @since GDAL 2.0
 */
void CPL_DLL CPLVirtualMemDeclareThread(CPLVirtualMem* ctxt);

/** Declare that a thread will stop accessing a virtual memory mapping.
 *
 * This function must be called by a thread that will no longer access the
 * content of a virtual memory mapping, except if the virtual memory mapping has
 * been created with bSingleThreadUsage = TRUE.
 *
 * This function must be paired with CPLVirtualMemDeclareThread().
 *
 * @param ctxt context returned by CPLVirtualMemNew().
 *
 * @since GDAL 2.0
 */
void CPL_DLL CPLVirtualMemUnDeclareThread(CPLVirtualMem* ctxt);

/** Make sure that a region of virtual memory will be realized.
 *
 * Calling this function is not required, but might be useful when debugging
 * a process with tools like gdb or valgrind that do not naturally like
 * segmentation fault signals.
 *
 * It is also needed when wanting to provide part of a virtual memory mapping
 * to a system call such as read() or write(). If read() or write() is called
 * on a memory region not yet realized, the call will fail with EFAULT.
 *
 * @param ctxt context returned by CPLVirtualMemNew().
 * @param pAddr the memory region to pin.
 * @param nSize the size of the memory region.
 * @param bWriteOp set to TRUE if the memory area will be accessed in write mode.
 *
 * @since GDAL 2.0
 */
void CPL_DLL CPLVirtualMemPin(CPLVirtualMem* ctxt,
                              void* pAddr, size_t nSize, int bWriteOp);

/** Cleanup any resource and handlers related to virtual memory.
 *
 * This function must be called after the last CPLVirtualMem object has
 * been freed.
 *
 * @since GDAL 2.0
 */
void CPL_DLL CPLVirtualMemManagerTerminate(void);

}}}

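The nPageSizeHint constraint documented above (a multiple of the system page size, or 0 to let the library choose) can be satisfied with a small rounding helper. The sketch below is an assumption-laden illustration: round_up_to_page and system_page_size are hypothetical names, and sysconf(_SC_PAGESIZE) is used as a stand-in for CPLGetPageSize(), whose implementation is not shown in this document.

```c
#include <assert.h>
#include <stddef.h>
#include <unistd.h>

/* Hypothetical stand-in for CPLGetPageSize(): ask the OS, and fall back
 * to 4096 (the "generally" minimum value mentioned above) on failure. */
static size_t system_page_size(void)
{
    long n = sysconf(_SC_PAGESIZE);
    return n > 0 ? (size_t)n : 4096;
}

/* Round a desired page size up to the next multiple of sysPage.
 * A hint of 0 is passed through unchanged, since 0 means "let the
 * function determine a default page size". */
static size_t round_up_to_page(size_t hint, size_t sysPage)
{
    assert(sysPage > 0);
    if (hint == 0)
        return 0;
    return ((hint + sysPage - 1) / sysPage) * sysPage;
}
```

For example, with a 4096-byte system page, a desired page size of 4097 bytes would be rounded up to 8192 before being passed as nPageSizeHint.
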
=== Implemented by gdalvirtualmem.cpp ===

{{{


/** Create a CPLVirtualMem object from a GDAL dataset object.
 *
 * Only supported on Linux for now.
 *
 * This method allows creating a virtual memory object for a region of one
 * or more GDALRasterBands from this dataset. The content of the virtual
 * memory object is automatically filled from dataset content when a virtual
 * memory page is first accessed, and it is released (or flushed in case of a
 * "dirty" page) when the cache size limit has been reached.
 *
 * The pointer to access the virtual memory object is obtained with
 * CPLVirtualMemGetAddr(). It remains valid until CPLVirtualMemFree() is called.
 *
 * If p is such a pointer and base_type the C type matching eBufType, for default
 * values of spacing parameters, the element of image coordinates (x, y)
 * (relative to xOff, yOff) for band b can be accessed with
 * ((base_type*)p)[x + y * nBufXSize + (b-1)*nBufXSize*nBufYSize].
 *
 * Note that the mechanism used to transparently fill memory pages when they are
 * accessed is the same (but in a controlled way) as what occurs when a memory
 * error occurs in a program. Debugging software will generally interrupt program
 * execution when that happens. If needed, CPLVirtualMemPin() can be used to avoid
 * that by ensuring memory pages are allocated before being accessed.
 *
 * The size of the region that can be mapped as a virtual memory object depends
 * on hardware and operating system limitations.
 * On Linux AMD64 platforms, the maximum value is 128 TB.
 * On Linux x86 platforms, the maximum value is 2 GB.
 *
 * Data type translation is automatically done if the data type
 * (eBufType) of the buffer is different from
 * that of the GDALRasterBand.
 *
 * Image decimation / replication is currently not supported, i.e. the
 * size of the region being accessed (nXSize x nYSize) must be equal to the
 * buffer size (nBufXSize x nBufYSize).
 *
 * The nPixelSpace, nLineSpace and nBandSpace parameters allow reading into or
 * writing from various organizations of buffers. Arbitrary values for the spacing
 * parameters are not supported. Those values must be multiples of the size of the
 * buffer data type, and must describe either a band sequential organization (typically
 * nPixelSpace = GDALGetDataTypeSize(eBufType) / 8, nLineSpace = nPixelSpace * nBufXSize,
 * nBandSpace = nLineSpace * nBufYSize), or a pixel-interleaved organization
 * (typically nPixelSpace = nBandSpace * nBandCount, nLineSpace = nPixelSpace * nBufXSize,
 * nBandSpace = GDALGetDataTypeSize(eBufType) / 8).
 *
 * @param hDS Dataset object
 *
 * @param eRWFlag Either GF_Read to read a region of data, or GF_Write to
 * write a region of data.
 *
 * @param nXOff The pixel offset to the top left corner of the region
 * of the band to be accessed.  This would be zero to start from the left side.
 *
 * @param nYOff The line offset to the top left corner of the region
 * of the band to be accessed.  This would be zero to start from the top.
 *
 * @param nXSize The width of the region of the band to be accessed in pixels.
 *
 * @param nYSize The height of the region of the band to be accessed in lines.
 *
 * @param nBufXSize the width of the buffer image into which the desired region
 * is to be read, or from which it is to be written.
 *
 * @param nBufYSize the height of the buffer image into which the desired
 * region is to be read, or from which it is to be written.
 *
 * @param eBufType the type of the pixel values in the data buffer. The
 * pixel values will automatically be translated to/from the GDALRasterBand
 * data type as needed.
 *
 * @param nBandCount the number of bands being read or written.
 *
 * @param panBandMap the list of nBandCount band numbers being read/written.
 * Note band numbers are 1 based. This may be NULL to select the first
 * nBandCount bands.
 *
 * @param nPixelSpace The byte offset from the start of one pixel value in
 * the buffer to the start of the next pixel value within a scanline. If defaulted
 * (0) the size of the datatype eBufType is used.
 *
 * @param nLineSpace The byte offset from the start of one scanline in
 * the buffer to the start of the next. If defaulted (0) the size of the datatype
 * eBufType * nBufXSize is used.
 *
 * @param nBandSpace the byte offset from the start of one band's data to the
 * start of the next. If defaulted (0) the value will be
 * nLineSpace * nBufYSize implying band sequential organization
 * of the data buffer.
 *
 * @param nCacheSize   size in bytes of the maximum memory that will be really
 *                     allocated (must ideally fit into RAM)
 *
 * @param nPageSizeHint hint for the page size. Must be a multiple of the
 *                      system page size, returned by CPLGetPageSize().
 *                      Minimum value is generally 4096. Might be set to 0 to
 *                      let the function determine a default page size.
 *
 * @param bSingleThreadUsage set to TRUE if there will be no concurrent threads
 *                           that will access the virtual memory mapping. This can
 *                           optimize performance a bit. If set to FALSE,
 *                           CPLVirtualMemDeclareThread() must be called.
 *
 * @param papszOptions NULL terminated list of options. Unused for now.
 *
 * @return a virtual memory object that must be freed by CPLVirtualMemFree(),
 *         or NULL in case of failure.
 *
 * @since GDAL 2.0
 */

CPLVirtualMem CPL_DLL* GDALDatasetGetVirtualMem( GDALDatasetH hDS,
                                         GDALRWFlag eRWFlag,
                                         int nXOff, int nYOff,
                                         int nXSize, int nYSize,
                                         int nBufXSize, int nBufYSize,
                                         GDALDataType eBufType,
                                         int nBandCount, int* panBandMap,
                                         int nPixelSpace,
                                         GIntBig nLineSpace,
                                         GIntBig nBandSpace,
                                         size_t nCacheSize,
                                         size_t nPageSizeHint,
                                         int bSingleThreadUsage,
                                         char **papszOptions );



/** Create a CPLVirtualMem object from a GDAL raster band object.
 *
 * Only supported on Linux for now.
 *
 * This method allows creating a virtual memory object for a region of a
 * GDALRasterBand. The content of the virtual
 * memory object is automatically filled from dataset content when a virtual
 * memory page is first accessed, and it is released (or flushed in case of a
 * "dirty" page) when the cache size limit has been reached.
 *
 * The pointer to access the virtual memory object is obtained with
 * CPLVirtualMemGetAddr(). It remains valid until CPLVirtualMemFree() is called.
 *
 * If p is such a pointer and base_type the C type matching eBufType, for default
 * values of spacing parameters, the element of image coordinates (x, y)
 * (relative to xOff, yOff) can be accessed with
 * ((base_type*)p)[x + y * nBufXSize].
 *
 * Note that the mechanism used to transparently fill memory pages when they are
 * accessed is the same (but in a controlled way) as what occurs when a memory
 * error occurs in a program. Debugging software will generally interrupt program
 * execution when that happens. If needed, CPLVirtualMemPin() can be used to avoid
 * that by ensuring memory pages are allocated before being accessed.
 *
 * The size of the region that can be mapped as a virtual memory object depends
 * on hardware and operating system limitations.
 * On Linux AMD64 platforms, the maximum value is 128 TB.
 * On Linux x86 platforms, the maximum value is 2 GB.
 *
 * Data type translation is automatically done if the data type
 * (eBufType) of the buffer is different from
 * that of the GDALRasterBand.
 *
 * Image decimation / replication is currently not supported, i.e. the
 * size of the region being accessed (nXSize x nYSize) must be equal to the
 * buffer size (nBufXSize x nBufYSize).
 *
 * The nPixelSpace and nLineSpace parameters allow reading into or
 * writing from various organizations of buffers. Arbitrary values for the spacing
 * parameters are not supported. Those values must be multiples of the size of the
 * buffer data type and must be such that nLineSpace >= nPixelSpace * nBufXSize.
 *
 * @param hBand Raster band object
 *
 * @param eRWFlag Either GF_Read to read a region of data, or GF_Write to
 * write a region of data.
 *
 * @param nXOff The pixel offset to the top left corner of the region
 * of the band to be accessed.  This would be zero to start from the left side.
 *
 * @param nYOff The line offset to the top left corner of the region
 * of the band to be accessed.  This would be zero to start from the top.
 *
 * @param nXSize The width of the region of the band to be accessed in pixels.
 *
 * @param nYSize The height of the region of the band to be accessed in lines.
 *
 * @param nBufXSize the width of the buffer image into which the desired region
 * is to be read, or from which it is to be written.
 *
 * @param nBufYSize the height of the buffer image into which the desired
 * region is to be read, or from which it is to be written.
 *
 * @param eBufType the type of the pixel values in the data buffer. The
 * pixel values will automatically be translated to/from the GDALRasterBand
 * data type as needed.
 *
 * @param nPixelSpace The byte offset from the start of one pixel value in
 * the buffer to the start of the next pixel value within a scanline. If defaulted
 * (0) the size of the datatype eBufType is used.
 *
 * @param nLineSpace The byte offset from the start of one scanline in
 * the buffer to the start of the next. If defaulted (0) the size of the datatype
 * eBufType * nBufXSize is used.
 *
 * @param nCacheSize   size in bytes of the maximum memory that will be really
 *                     allocated (must ideally fit into RAM)
 *
 * @param nPageSizeHint hint for the page size. Must be a multiple of the
 *                      system page size, returned by CPLGetPageSize().
 *                      Minimum value is generally 4096. Might be set to 0 to
 *                      let the function determine a default page size.
 *
 * @param bSingleThreadUsage set to TRUE if there will be no concurrent threads
 *                           that will access the virtual memory mapping. This can
 *                           optimize performance a bit. If set to FALSE,
 *                           CPLVirtualMemDeclareThread() must be called.
 *
 * @param papszOptions NULL terminated list of options. Unused for now.
 *
 * @return a virtual memory object that must be freed by CPLVirtualMemFree(),
 *         or NULL in case of failure.
 *
 * @since GDAL 2.0
 */

CPLVirtualMem CPL_DLL* GDALRasterBandGetVirtualMem( GDALRasterBandH hBand,
                                         GDALRWFlag eRWFlag,
                                         int nXOff, int nYOff,
                                         int nXSize, int nYSize,
                                         int nBufXSize, int nBufYSize,
                                         GDALDataType eBufType,
                                         int nPixelSpace,
                                         GIntBig nLineSpace,
                                         size_t nCacheSize,
                                         size_t nPageSizeHint,
                                         int bSingleThreadUsage,
                                         char **papszOptions );

/** Ways of organizing pixels within/across tiles of a tiled virtual memory
 * mapping. */
typedef enum
{
    /*! Tile Interleaved by Pixel: tile (0,0) with internal band interleaved
        by pixel organization, tile (1, 0), ...  */
    GTO_TIP,
    /*! Band Interleaved by Tile : tile (0,0) of first band, tile (0,0) of second
        band, ... tile (1,0) of first band, tile (1,0) of second band, ... */
    GTO_BIT,
    /*! Band SeQuential : all the tiles of first band, all the tiles of following band... */
    GTO_BSQ
} GDALTileOrganization;


     603/** Create a CPLVirtualMem object from a GDAL dataset object, with tiling
     604 * organization
     605 *
     606 * Only supported on Linux for now.
     607 *
     608 * This method allows creating a virtual memory object for a region of one
     609 * or more GDALRasterBands from  this dataset. The content of the virtual
     610 * memory object is automatically filled from dataset content when a virtual
     611 * memory page is first accessed, and it is released (or flushed in case of a
     612 * "dirty" page) when the cache size limit has been reached.
     613 *
     614 * Contrary to GDALDatasetGetVirtualMem(), pixels will be organized by tiles
     615 * instead of scanlines. Different ways of organizing pixels within/across tiles
     616 * can be selected with the eTileOrganization parameter.
     617 *
     618 * If nXSize is not a multiple of nTileXSize or nYSize is not a multiple of
     619 * nTileYSize, partial tiles will exist at the right and/or bottom of the region
     620 * of interest. Those partial tiles will also have nTileXSize * nTileYSize dimension,
     621 * with padding pixels.
     622 *
     623 * The pointer to access the virtual memory object is obtained with
     624 * CPLVirtualMemGetAddr(). It remains valid until CPLVirtualMemFree() is called.
     625 *
     626 * If p is such a pointer and base_type the C type matching eBufType, for default
     627 * values of spacing parameters, the element of image coordinates (x, y)
     628 * (relative to xOff, yOff) for band b can be accessed with :
     629 *  - for eTileOrganization = GTO_TIP, ((base_type*)p)[tile_number(x,y)*nBandCount*tile_size + offset_in_tile(x,y)*nBandCount + (b-1)].
     630 *  - for eTileOrganization = GTO_BIT, ((base_type*)p)[(tile_number(x,y)*nBandCount + (b-1)) * tile_size + offset_in_tile(x,y)].
     631 *  - for eTileOrganization = GTO_BSQ, ((base_type*)p)[(tile_number(x,y) + (b-1)*nTilesCount) * tile_size + offset_in_tile(x,y)].
     632 *
     633 * where nTilesPerRow = ceil(nXSize / nTileXSize)
     634 *       nTilesPerCol = ceil(nYSize / nTileYSize)
     635 *       nTilesCount = nTilesPerRow * nTilesPerCol
     636 *       tile_number(x,y) = (y / nTileYSize) * nTilesPerRow + (x / nTileXSize)
     637 *       offset_in_tile(x,y) = (y % nTileYSize) * nTileXSize  + (x % nTileXSize)
     638 *       tile_size = nTileXSize * nTileYSize
     639 *
     640 * Note that for a single band request, all tile organizations are equivalent.
     641 *
     642 * Note that the mechanism used to transparently fill memory pages when they are
     643 * accessed is the same (but in a controlled way) as what occurs when a memory
     644 * error occurs in a program. Debugging software will generally interrupt program
     645 * execution when that happens. If needed, CPLVirtualMemPin() can be used to avoid
     646 * that by ensuring memory pages are allocated before being accessed.
     647 *
     648 * The size of the region that can be mapped as a virtual memory object depends
     649 * on hardware and operating system limitations.
     650 * On Linux AMD64 platforms, the maximum value is 128 TB.
     651 * On Linux x86 platforms, the maximum value is 2 GB.
     652 *
     653 * Data type translation is automatically done if the data type
     654 * (eBufType) of the buffer is different than
     655 * that of the GDALRasterBand.
     656 *
     657 * @param hDS Dataset object
     658 *
     659 * @param eRWFlag Either GF_Read to read a region of data, or GF_Write to
     660 * write a region of data.
     661 *
     662 * @param nXOff The pixel offset to the top left corner of the region
     663 * of the band to be accessed.  This would be zero to start from the left side.
     664 *
     665 * @param nYOff The line offset to the top left corner of the region
     666 * of the band to be accessed.  This would be zero to start from the top.
     667 *
     668 * @param nXSize The width of the region of the band to be accessed in pixels.
     669 *
     670 * @param nYSize The height of the region of the band to be accessed in lines.
     671 *
     672 * @param nTileXSize the width of the tiles.
     673 *
     674 * @param nTileYSize the height of the tiles.
     675 *
     676 * @param eBufType the type of the pixel values in the data buffer. The
     677 * pixel values will automatically be translated to/from the GDALRasterBand
     678 * data type as needed.
     679 *
     680 * @param nBandCount the number of bands being read or written.
     681 *
     682 * @param panBandMap the list of nBandCount band numbers being read/written.
     683 * Note band numbers are 1 based. This may be NULL to select the first
     684 * nBandCount bands.
     685 *
     686 * @param eTileOrganization tile organization.
     687 *
     688 * @param nCacheSize   size in bytes of the maximum memory that will be really
     689 *                     allocated (must ideally fit into RAM)
     690 *
     691 * @param bSingleThreadUsage set to TRUE if there will be no concurrent threads
     692 *                           that will access the virtual memory mapping. This can
     693 *                           optimize performance a bit. If set to FALSE,
     694 *                           CPLVirtualMemDeclareThread() must be called.
     695 *
     696 * @param papszOptions NULL terminated list of options. Unused for now.
     697 *
     698 * @return a virtual memory object that must be freed by CPLVirtualMemFree(),
     699 *         or NULL in case of failure.
     700 *
     701 * @since GDAL 2.0
     702 */
     703
     704CPLVirtualMem CPL_DLL* GDALDatasetGetTiledVirtualMem( GDALDatasetH hDS,
     705                                              GDALRWFlag eRWFlag,
     706                                              int nXOff, int nYOff,
     707                                              int nXSize, int nYSize,
     708                                              int nTileXSize, int nTileYSize,
     709                                              GDALDataType eBufType,
     710                                              int nBandCount, int* panBandMap,
     711                                              GDALTileOrganization eTileOrganization,
     712                                              size_t nCacheSize,
     713                                              int bSingleThreadUsage,
     714                                              char **papszOptions );
     715
     716
     717/** Create a CPLVirtualMem object from a GDAL rasterband object, with tiling
     718 * organization
     719 *
     720 * Only supported on Linux for now.
     721 *
     722 * This method allows creating a virtual memory object for a region of one
     723 * GDALRasterBand. The content of the virtual
     724 * memory object is automatically filled from dataset content when a virtual
     725 * memory page is first accessed, and it is released (or flushed in case of a
     726 * "dirty" page) when the cache size limit has been reached.
     727 *
     728 * Contrary to GDALDatasetGetVirtualMem(), pixels will be organized by tiles
     729 * instead of scanlines.
     730 *
     731 * If nXSize is not a multiple of nTileXSize or nYSize is not a multiple of
     732 * nTileYSize, partial tiles will exist at the right and/or bottom of the region
     733 * of interest. Those partial tiles will also have nTileXSize * nTileYSize dimension,
     734 * with padding pixels.
     735 *
     736 * The pointer to access the virtual memory object is obtained with
     737 * CPLVirtualMemGetAddr(). It remains valid until CPLVirtualMemFree() is called.
     738 *
     739 * If p is such a pointer and base_type the C type matching eBufType, for default
     740 * values of spacing parameters, the element of image coordinates (x, y)
     741 * (relative to xOff, yOff) can be accessed with :
     742 *  ((base_type*)p)[tile_number(x,y)*tile_size + offset_in_tile(x,y)].
     743 *
     744 * where nTilesPerRow = ceil(nXSize / nTileXSize)
     745 *       nTilesCount = nTilesPerRow * nTilesPerCol, with nTilesPerCol = ceil(nYSize / nTileYSize)
     746 *       tile_number(x,y) = (y / nTileYSize) * nTilesPerRow + (x / nTileXSize)
     747 *       offset_in_tile(x,y) = (y % nTileYSize) * nTileXSize  + (x % nTileXSize)
     748 *       tile_size = nTileXSize * nTileYSize
     749 *
     750 * Note that the mechanism used to transparently fill memory pages when they are
     751 * accessed is the same (but in a controlled way) as what occurs when a memory
     752 * error occurs in a program. Debugging software will generally interrupt program
     753 * execution when that happens. If needed, CPLVirtualMemPin() can be used to avoid
     754 * that by ensuring memory pages are allocated before being accessed.
     755 *
     756 * The size of the region that can be mapped as a virtual memory object depends
     757 * on hardware and operating system limitations.
     758 * On Linux AMD64 platforms, the maximum value is 128 TB.
     759 * On Linux x86 platforms, the maximum value is 2 GB.
     760 *
     761 * Data type translation is automatically done if the data type
     762 * (eBufType) of the buffer is different than
     763 * that of the GDALRasterBand.
     764 *
     765 * @param hBand Rasterband object
     766 *
     767 * @param eRWFlag Either GF_Read to read a region of data, or GF_Write to
     768 * write a region of data.
     769 *
     770 * @param nXOff The pixel offset to the top left corner of the region
     771 * of the band to be accessed.  This would be zero to start from the left side.
     772 *
     773 * @param nYOff The line offset to the top left corner of the region
     774 * of the band to be accessed.  This would be zero to start from the top.
     775 *
     776 * @param nXSize The width of the region of the band to be accessed in pixels.
     777 *
     778 * @param nYSize The height of the region of the band to be accessed in lines.
     779 *
     780 * @param nTileXSize the width of the tiles.
     781 *
     782 * @param nTileYSize the height of the tiles.
     783 *
     784 * @param eBufType the type of the pixel values in the data buffer. The
     785 * pixel values will automatically be translated to/from the GDALRasterBand
     786 * data type as needed.
     787 *
     788 * @param nCacheSize   size in bytes of the maximum memory that will be really
     789 *                     allocated (must ideally fit into RAM)
     790 *
     791 * @param bSingleThreadUsage set to TRUE if there will be no concurrent threads
     792 *                           that will access the virtual memory mapping. This can
     793 *                           optimize performance a bit. If set to FALSE,
     794 *                           CPLVirtualMemDeclareThread() must be called.
     795 *
     796 * @param papszOptions NULL terminated list of options. Unused for now.
     797 *
     798 * @return a virtual memory object that must be freed by CPLVirtualMemFree(),
     799 *         or NULL in case of failure.
     800 *
     801 * @since GDAL 2.0
     802 */
     803
     804CPLVirtualMem CPL_DLL* GDALRasterBandGetTiledVirtualMem( GDALRasterBandH hBand,
     805                                              GDALRWFlag eRWFlag,
     806                                              int nXOff, int nYOff,
     807                                              int nXSize, int nYSize,
     808                                              int nTileXSize, int nTileYSize,
     809                                              GDALDataType eBufType,
     810                                              size_t nCacheSize,
     811                                              int bSingleThreadUsage,
     812                                              char **papszOptions );
     813}}}
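
The indexing formulas documented for GDALDatasetGetTiledVirtualMem() can be checked with a small sketch. The following is only an illustration written in plain Python: the function name tiled_element_index() and its string-valued organization argument are hypothetical and not part of the proposed API. It returns the element index to apply to ((base_type*)p).

```python
def tiled_element_index(x, y, band, organization,
                        xsize, ysize, tile_xsize, tile_ysize, band_count):
    """Flat element index of pixel (x, y) for a 1-based band number.

    Hypothetical helper that transcribes the formulas of the
    GDALDatasetGetTiledVirtualMem() documentation.
    """
    # ceil() of the integer ratio, computed without floating point
    tiles_per_row = -(-xsize // tile_xsize)
    tiles_per_col = -(-ysize // tile_ysize)
    tiles_count = tiles_per_row * tiles_per_col
    tile_size = tile_xsize * tile_ysize

    tile_number = (y // tile_ysize) * tiles_per_row + (x // tile_xsize)
    offset_in_tile = (y % tile_ysize) * tile_xsize + (x % tile_xsize)

    if organization == "GTO_TIP":
        return (tile_number * band_count * tile_size
                + offset_in_tile * band_count + (band - 1))
    if organization == "GTO_BIT":
        return ((tile_number * band_count + (band - 1)) * tile_size
                + offset_in_tile)
    if organization == "GTO_BSQ":
        return ((tile_number + (band - 1) * tiles_count) * tile_size
                + offset_in_tile)
    raise ValueError("unknown tile organization: %r" % organization)
```

For a 500 x 300 region with 256 x 256 tiles and 3 bands, pixel (300, 10) falls in tile number 1 at offset 2604 within the tile. With a single band the three organizations give the same index, as stated in the documentation.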
     814
     815== Portability ==
     816
     817The CPLVirtualMem low-level machinery is only implemented for Linux now. It
     818assumes that returning from a SIGSEGV handler is possible, which is a blatant
     819violation of POSIX, but in practice it seems that most POSIX (and non
     820POSIX such as Windows) systems should be able to resume execution after a
     821segmentation fault.
     822
     823Porting to other POSIX operating systems such as MacOSX should be doable with moderate
     824effort. Windows has APIs that offer capabilities similar to the
     825POSIX API, with VirtualAlloc(), VirtualProtect() and SetUnhandledExceptionFilter(),
     826although the porting would undoubtedly require more effort.
     827
     828The existence of [http://www.gnu.org/software/libsigsegv libsigsegv], which runs on
     829various operating systems, is evidence that this approach can be ported to other platforms.
     830
     831The trickiest part is ensuring that things will work reliably when two concurrent
     832threads try to access the same initially unmapped page. Without special care, one
     833thread could manage to access the page that is being filled by the other thread,
     834before it is completely filled. On Linux this can be easily avoided with the
     835mremap() call. When a page is filled, we don't actually pass the target page to
     836the user callback, but a temporary page. When the callback has finished its job,
     837this temporary page is mremap()'ed to its target location, which is an atomic
     838operation. An alternative implementation for POSIX systems that don't have this
     839mremap() call has been tested : any declared threads that can access the memory
     840mapping are paused before the temporary page is memcpy'ed to its target location,
     841and are resumed afterwards. This requires threads to declare their
     842"interest" in a memory mapping beforehand with CPLVirtualMemDeclareThread(). Pausing a
     843thread is interestingly non-obvious : the solution found to do so is to
     844send it a SIGUSR1 signal and make it wait in a signal handler for this SIGUSR1
     845signal... It has not been investigated if/how this could be done on Windows.
     846CPLVirtualMemIsAccessThreadSafe() has been introduced for that purpose.
     847
     848== Performance ==
     849
     850No miraculous performance gain should be expected from this new capability,
     851when compared to code that carefully uses GDALRasterIO(). Handling
     852segmentation faults has a cost (the operating system catches a hardware
     853exception, then calls the user program segmentation fault handler, which does
     854the normal GDAL I/O operations, and plays with page mappings and permissions
     855which invalidate some CPU caches, etc.). However, when a page has been realized,
     856access to it should be really fast, so with appropriate access patterns and
     857cache size, good performance should be expected.
     858
     859It should also be noted that in the current implementation, the realization of
     860pages is done in a serialized way, that is to say if 2 threads which use 2 different
     861memory mappings cause a segmentation fault at the same time, they will not be
     862dealt with by 2 different threads, but one after the other.
     863
     864== Limitations ==
     865
     866The maximum size of the virtual memory space (and thus a virtual memory mapping)
     867depends on the CPU architecture and OS limitations :
     868  * On Linux AMD64, 128 TB.
     869  * On Linux x86, 2 GB.
     870  * On Windows AMD64 (unsupported by the current implementation), 8 TB.
     871  * On Windows x86 (unsupported by the current implementation), 2 GB.
     872
     873Clearly, the main interest of this new functionality is for AMD64 platforms.
     874
     875On a Linux AMD64 machine with 4 GB RAM, the Python binding of
     876GDALDatasetGetTiledVirtualMem() has been successfully used to access random points
     877on the new [http://www.eea.europa.eu/data-and-maps/data/eu-dem/ Europe 3'' DEM dataset],
     878which is a 20 GB compressed GeoTIFF (and 288000 * 180000 * 4 = 193 GB uncompressed).
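
The uncompressed size quoted above can be reproduced with a few lines of arithmetic. This is only a sketch: it assumes 4 bytes per pixel (the "* 4" factor in the text) and that "GB" is meant as 2^30 bytes, which is what yields the 193 figure.

```python
# Dimensions and sample size taken from the figures quoted in the text
width, height, bytes_per_pixel = 288000, 180000, 4

uncompressed_bytes = width * height * bytes_per_pixel
# Assumption: "GB" in the text is counted as 2**30 bytes
uncompressed_gib = uncompressed_bytes / 2**30

# Ratio between the uncompressed raster and the 4 GB of RAM of the test machine
ratio_to_ram = uncompressed_gib / 4

print(uncompressed_bytes, round(uncompressed_gib, 1), round(ratio_to_ram))
```

The uncompressed raster is therefore roughly 48 times larger than the RAM of the test machine, which is the situation the virtual memory mapping is designed for.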
     879
     880== Related thoughts ==
     881
     882With an uncompressed GeoTIFF file (where strips or tiles are sequentially written
     883to disk), GDALDatasetGetVirtualMem() and GDALDatasetGetTiledVirtualMem(), with
     884appropriate input parameters, could potentially just mmap() the file itself, which
     885would save any GDAL overhead. It is not clear however how old accessed pages can
     886be evicted from RAM since Linux does not seem to discard them, which tends to cause
     887undesirable disk swapping when the memory mapping is bigger than RAM.
     888
     889Some issues with system calls such as read() or write(), or easier multi-threading
     890could potentially be solved by making a FUSE (File system in USEr space) driver that
     891would expose a GDAL dataset as a file, and then mmap()'ing the file itself. The
     892issue raised in the previous paragraph would still apply. Plus the fact that
     893FUSE drivers are only available on POSIX OS, and need root privilege to be
     894mounted (a FUSE filesystem does not need root privilege to run, but the mounting
     895operation does).
     896
     897== Open questions ==
     898
     899Due to the fact that it currently only works on Linux, should we mark the API
     900as experimental for now ?
     901
     902== Backward compatibility issues ==
     903
     904None: new API.
     905
     906== SWIG bindings ==
     907
     908The high level API (dataset and raster band) is available in Python bindings.
     909
     910GDALDatasetGetVirtualMem() is mapped as Dataset.GetVirtualMemArray(), which
     911returns a NumPy array.
     912
     913{{{
     914    def GetVirtualMemArray(self, eAccess = gdalconst.GF_Read, xoff=0, yoff=0,
     915                           xsize=None, ysize=None, bufxsize=None, bufysize=None,
     916                           datatype = None, band_list = None, band_sequential = True,
     917                           cache_size = 10 * 1024 * 1024, page_size_hint = 0):
     918        """Return a NumPy array for the dataset, seen as a virtual memory mapping.
     919           If there are several bands and band_sequential = True, an element is
     920           accessed with array[band][y][x].
     921           If there are several bands and band_sequential = False, an element is
     922           accessed with array[y][x][band].
     923           If there is only one band, an element is accessed with array[y][x].
     924        """
     925}}}
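
To make the two layouts of the docstring above concrete, here is a minimal sketch of the flat offset behind each indexing order. It assumes a contiguous, row-major array, which is an assumption of this illustration rather than a statement about the binding; flat_offset() is a hypothetical helper and band is the 0-based array index.

```python
def flat_offset(band, y, x, nbands, ysize, xsize, band_sequential=True):
    """Element offset in a contiguous row-major array (hypothetical helper).

    band, y and x are 0-based array indices.
    """
    if band_sequential:
        # array[band][y][x]: all pixels of one band are stored together
        return (band * ysize + y) * xsize + x
    # array[y][x][band]: all bands of one pixel are stored together
    return (y * xsize + x) * nbands + band
```

With band_sequential = True, moving to the next x moves by one element, so per-band processing reads memory sequentially. With band_sequential = False, the bands of a pixel are adjacent, which suits per-pixel computations across bands.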
     926
     927Similarly for GDALDatasetGetTiledVirtualMem() :
     928
     929{{{
     930    def GetTiledVirtualMemArray(self, eAccess = gdalconst.GF_Read, xoff=0, yoff=0,
     931                           xsize=None, ysize=None, tilexsize=256, tileysize=256,
     932                           datatype = None, band_list = None, tile_organization = gdalconst.GTO_BSQ,
     933                           cache_size = 10 * 1024 * 1024):
     934        """Return a NumPy array for the dataset, seen as a virtual memory mapping with
     935           a tile organization.
     936           If there are several bands and tile_organization = gdal.GTO_TIP, an element is
     937           accessed with array[tiley][tilex][y][x][band].
     938           If there are several bands and tile_organization = gdal.GTO_BIT, an element is
     939           accessed with array[tiley][tilex][band][y][x].
     940           If there are several bands and tile_organization = gdal.GTO_BSQ, an element is
     941           accessed with array[band][tiley][tilex][y][x].
     942           If there is only one band, an element is accessed with array[tiley][tilex][y][x].
     943        """
     944}}}
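
The following sketch shows how image coordinates could be converted into the index tuple expected by a tiled array. It is an illustration only: tiled_array_index() is a hypothetical helper, the organization is passed as a string named after the GDALTileOrganization enum values (GTO_TIP, GTO_BIT, GTO_BSQ), band is the 0-based array index, and x, y are relative to xoff, yoff.

```python
def tiled_array_index(x, y, band, tilexsize=256, tileysize=256,
                      tile_organization="GTO_BSQ"):
    """Index tuple for a multi-band tiled array (hypothetical helper)."""
    # Split each coordinate into a tile index and a position inside the tile
    tiley, y_in_tile = divmod(y, tileysize)
    tilex, x_in_tile = divmod(x, tilexsize)

    if tile_organization == "GTO_TIP":
        return (tiley, tilex, y_in_tile, x_in_tile, band)
    if tile_organization == "GTO_BIT":
        return (tiley, tilex, band, y_in_tile, x_in_tile)
    if tile_organization == "GTO_BSQ":
        return (band, tiley, tilex, y_in_tile, x_in_tile)
    raise ValueError("unknown tile organization: %r" % tile_organization)
```

Assuming array was returned by GetTiledVirtualMemArray() with the same tile sizes and organization, an element would then be read with array[tiled_array_index(x, y, band)], since NumPy accepts a tuple as a multi-dimensional index.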
     945
     946And the Band object has the following 2 methods :
     947
     948{{{
     949  def GetVirtualMemArray(self, eAccess = gdalconst.GF_Read, xoff=0, yoff=0,
     950                         xsize=None, ysize=None, bufxsize=None, bufysize=None,
     951                         datatype = None,
     952                         cache_size = 10 * 1024 * 1024, page_size_hint = 0):
     953        """Return a NumPy array for the band, seen as a virtual memory mapping.
     954           An element is accessed with array[y][x].
     955        """
     956
     957  def GetTiledVirtualMemArray(self, eAccess = gdalconst.GF_Read, xoff=0, yoff=0,
     958                           xsize=None, ysize=None, tilexsize=256, tileysize=256,
     959                           datatype = None,
     960                           cache_size = 10 * 1024 * 1024):
     961        """Return a NumPy array for the band, seen as a virtual memory mapping with
     962           a tile organization.
     963           An element is accessed with array[tiley][tilex][y][x].
     964        """
     965}}}
     966
     967Note: Dataset/Band.GetVirtualMem()/GetTiledVirtualMem() methods are also available.
     968They return a VirtualMem Python object that has a GetAddr() method that returns
     969a Python memoryview object (Python 2.7 or later required). However, using such
     970an object does not seem practical for non-Byte data types.
     971
     972== Test Suite ==
     973
     974The autotest suite will be extended to test the Python API of this RFC. In
     975autotest/cpp, a test_virtualmem.cpp file tests concurrent access to the same
     976pages by 2 threads.
     977
     978== Implementation ==
     979
     980Implementation will be done by Even Rouault in GDAL/OGR trunk. The proposed
     981implementation is attached as a patch.