Changes between Version 5 and Version 6 of rfc45_virtualmem


Timestamp:
Jan 8, 2014, 4:44:41 AM (10 years ago)
Author:
Even Rouault
Comment:

Addition of GetVirtualMemAuto() and memory file mapping

  • rfc45_virtualmem

    v5 v6  
    6262persistent storage.
    6363
     64We also offer an alternative way of creating a CPLVirtualMem object, by using
     65memory file mapping mechanisms. This may be used by "raw" datasets (EHdr driver
     66for example) where the organization of data on disk directly matches the
     67organization of an in-memory array.
     68
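As a rough illustration of the file mapping idea (not GDAL code), the sketch below uses Python's standard `mmap` module in place of the CPLVirtualMem API. The raster size, the headerless one-byte-per-pixel layout, and the helper names `write_raw_raster` and `read_pixel_mapped` are assumptions made for the example.

```python
import mmap
import os
import tempfile

WIDTH, HEIGHT = 4, 3  # hypothetical raster dimensions, one byte per pixel


def write_raw_raster(path):
    """Write a headerless, row-major Byte raster where pixel (x, y) = y * 10 + x."""
    with open(path, "wb") as f:
        for y in range(HEIGHT):
            f.write(bytes(y * 10 + x for x in range(WIDTH)))


def read_pixel_mapped(path, x, y):
    """Map the file read-only and fetch one pixel straight from the mapping."""
    with open(path, "rb") as f:
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as view:
            # The on-disk layout equals the in-memory layout, so the byte
            # offset of a pixel is simply its row-major index.
            return view[y * WIDTH + x]


fd, raw_path = tempfile.mkstemp(suffix=".raw")
os.close(fd)
try:
    write_raw_raster(raw_path)
    pixel = read_pixel_mapped(raw_path, 2, 1)
finally:
    os.remove(raw_path)
```

No read loop or intermediate buffer is involved: the operating system pages the file content in when the mapping is indexed, which is what makes this approach attractive for "raw" formats.
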
    6469==== High-level usage ====
    6570
     
    8489* GDALRasterBandGetTiledVirtualMem(): equivalent of GDALDatasetGetTiledVirtualMem() that operates on a raster band object rather than a dataset object.
    8590
     91* GDALGetVirtualMemAuto(): simplified version of GDALRasterBandGetVirtualMem() where
     92  the user only specifies the access mode. The pixel spacing and line spacing are
     93  returned by the function. This is implemented as a virtual method at the GDALRasterBand
     94  level, so that drivers have a chance of overriding the base implementation. The
     95  base implementation just uses GDALRasterBandGetVirtualMem(). Overridden implementations
     96  may use the memory file mapping mechanism instead. Such implementations will be done
     97  in the RawRasterBand object and in the GeoTIFF driver.
     98
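To make the role of the returned pixel spacing and line spacing concrete, here is a small sketch in plain Python that does not call GDAL. It applies the addressing rule `p + x * pixel_space + y * line_space` to an in-memory buffer. The buffer contents, the little-endian UInt16 type, and the helper name `read_element` are assumptions chosen for the example.

```python
import struct


def read_element(buf, x, y, pixel_space, line_space, fmt="<H"):
    """Read element (x, y) from a buffer, given byte spacings.

    Mirrors *(base_type*)((GByte*)p + x * pixel_space + y * line_space).
    """
    offset = x * pixel_space + y * line_space
    return struct.unpack_from(fmt, buf, offset)[0]


# Hypothetical 3x2 UInt16 band stored pixel-interleaved with one other band:
# each pixel holds (band1, band2), so the pixel spacing is 4 bytes, not 2.
width, height = 3, 2
pixel_space = 4
line_space = width * pixel_space
buf = bytearray(line_space * height)
for y in range(height):
    for x in range(width):
        struct.pack_into("<HH", buf, x * pixel_space + y * line_space,
                         100 * y + x, 9999)

value = read_element(buf, 2, 1, pixel_space, line_space)
```

Because the spacings are outputs rather than inputs, a caller should not assume that the pixel spacing equals the size of the data type. A driver that maps a pixel-interleaved file directly would report a larger value, as in this sketch.
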
    8699== Details of new API ==
    87100
     
    89102
    90103{{{
    91 
    92104/**
    93105 * \file cpl_virtualmem.h
     
    103115 * This exploits low-level mechanisms of the operating system (virtual memory
    104116 * allocation, page protection and handler of virtual memory exceptions).
     117 *
     118 * It is also possible to create a virtual memory mapping from a file or part
     119 * of a file.
    105120 *
    106121 * The current implementation is Linux only.
     
    142157                                      void* pUserData);
    143158
     159/** Callback triggered when a virtual memory mapping is destroyed.
     160  * @param pUserData user data that was passed to CPLVirtualMemNew().
     161 */
    144162typedef void (*CPLVirtualMemFreeUserData)(void* pUserData);
    145163
     
    221239                                        void *pCbkUserData);
    222240
     241
     242/** Return if virtual memory mapping of a file is available.
     243 *
     244 * @return TRUE if virtual memory mapping of a file is available.
     245 * @since GDAL 2.0
     246 */
     247int CPL_DLL CPLIsVirtualMemFileMapAvailable(void);
     248
     249/** Create a new virtual memory mapping from a file.
     250 *
     251 * The file must be a "real" file recognized by the operating system, and not
     252 * a VSI extended virtual file.
     253 *
     254 * In VIRTUALMEM_READWRITE mode, updates to the memory mapping will be written
     255 * in the file.
     256 *
     257 * On Linux AMD64 platforms, the maximum value for nLength is 128 TB.
     258 * On Linux x86 platforms, the maximum value for nLength is 2 GB.
     259 *
     260 * Only supported on Linux for now.
     261 *
     262 * @param  fp       Virtual file handle.
     263 * @param  nOffset  Offset in the file to start the mapping from.
     264 * @param  nLength  Length of the portion of the file to map into memory.
     265 * @param eAccessMode Permission to use for the virtual memory mapping. This must
     266 *                    be consistent with how the file has been opened.
     267 * @param pfnFreeUserData callback that is called when the object is destroyed.
     268 * @param pCbkUserData user data passed to pfnFreeUserData.
     269 * @return a virtual memory object that must be freed by CPLVirtualMemFree(),
     270 *         or NULL in case of failure.
     271 *
     272 * @since GDAL 2.0
     273 */
     274CPLVirtualMem CPL_DLL *CPLVirtualMemFileMapNew( VSILFILE* fp,
     275                                                vsi_l_offset nOffset,
     276                                                vsi_l_offset nLength,
     277                                                CPLVirtualMemAccessMode eAccessMode,
     278                                                CPLVirtualMemFreeUserData pfnFreeUserData,
     279                                                void *pCbkUserData );
     280
     281/** Create a new virtual memory mapping derived from another virtual memory
     282 *  mapping.
     283 *
     284 * This may be useful when creating a mapping for pixel interleaved data.
     285 *
     286 * The new mapping takes a reference on the base mapping.
     287 *
     288 * @param pVMemBase Base virtual memory mapping
     289 * @param nOffset   Offset in the base virtual memory mapping from which to start
     290 *                  the new mapping.
     291 * @param nSize     Size of the base virtual memory mapping to expose in the
     292 *                  new mapping.
     293 * @param pfnFreeUserData callback that is called when the object is destroyed.
     294 * @param pCbkUserData user data passed to pfnFreeUserData.
     295 * @return a virtual memory object that must be freed by CPLVirtualMemFree(),
     296 *         or NULL in case of failure.
     297 *
     298 * @since GDAL 2.0
     299 */
     300CPLVirtualMem CPL_DLL *CPLVirtualMemDerivedNew(CPLVirtualMem* pVMemBase,
     301                                               vsi_l_offset nOffset,
     302                                               vsi_l_offset nSize,
     303                                               CPLVirtualMemFreeUserData pfnFreeUserData,
     304                                               void *pCbkUserData);
     305
    223306/** Free a virtual memory mapping.
    224307 *
     
    260343size_t CPL_DLL CPLVirtualMemGetSize(CPLVirtualMem* ctxt);
    261344
     345/** Return if the virtual memory mapping is a direct file mapping.
     346 *
     347 * @param ctxt context returned by CPLVirtualMemNew().
     348 * @return TRUE if the virtual memory mapping is a direct file mapping.
     349 *
     350 * @since GDAL 2.0
     351 */
     352int CPL_DLL CPLVirtualMemIsFileMapping(CPLVirtualMem* ctxt);
     353
     354/** Return the access mode of the virtual memory mapping.
     355 *
     356 * @param ctxt context returned by CPLVirtualMemNew().
     357 * @return the access mode of the virtual memory mapping.
     358 *
     359 * @since GDAL 2.0
     360 */
     361CPLVirtualMemAccessMode CPL_DLL CPLVirtualMemGetAccessMode(CPLVirtualMem* ctxt);
     362
    262363/** Return the page size associated to a virtual memory mapping.
    263364 *
     
    353454
    354455{{{
    355 
    356456
    357457/** Create a CPLVirtualMem object from a GDAL dataset object.
     
    367467 * The pointer to access the virtual memory object is obtained with
    368468 * CPLVirtualMemGetAddr(). It remains valid until CPLVirtualMemFree() is called.
     469 * CPLVirtualMemFree() must be called before the dataset object is destroyed.
    369470 *
    370471 * If p is such a pointer and base_type the C type matching eBufType, for default
     
    481582                                         char **papszOptions );
    482583
    483 
    484 
    485 /** Create a CPLVirtualMem object from a GDAL dataset object.
     584/** Create a CPLVirtualMem object from a GDAL raster band object.
    486585 *
    487586 * Only supported on Linux for now.
     
    495594 * The pointer to access the virtual memory object is obtained with
    496595 * CPLVirtualMemGetAddr(). It remains valid until CPLVirtualMemFree() is called.
     596 * CPLVirtualMemFree() must be called before the raster band object is destroyed.
    497597 *
    498598 * If p is such a pointer and base_type the C type matching eBufType, for default
     
    578678 * @since GDAL 2.0
    579679 */
    580 
    581 typedef enum
    582 {
    583     /*! Tile Interleaved by Pixel: tile (0,0) with internal band interleaved
    584         by pixel organization, tile (1, 0), ...  */
    585     GTO_TIP,
    586     /*! Band Interleaved by Tile : tile (0,0) of first band, tile (0,0) of second
    587         band, ... tile (1,0) of fisrt band, tile (1,0) of second band, ... */
    588     GTO_BIT,
    589     /*! Band SeQuential : all the tiles of first band, all the tiles of following band... */
    590     GTO_BSQ
    591 } GDALTileOrganization;
    592680
    593681CPLVirtualMem CPL_DLL* GDALRasterBandGetVirtualMem( GDALRasterBandH hBand,
     
    604692                                         char **papszOptions );
    605693
     694typedef enum
     695{
     696    /*! Tile Interleaved by Pixel: tile (0,0) with internal band interleaved
     697        by pixel organization, tile (1, 0), ...  */
     698    GTO_TIP,
     699    /*! Band Interleaved by Tile : tile (0,0) of first band, tile (0,0) of second
     700        band, ... tile (1,0) of first band, tile (1,0) of second band, ... */
     701    GTO_BIT,
     702    /*! Band SeQuential : all the tiles of first band, all the tiles of following band... */
     703    GTO_BSQ
     704} GDALTileOrganization;
    606705
    607706/** Create a CPLVirtualMem object from a GDAL dataset object, with tiling
     
    627726 * The pointer to access the virtual memory object is obtained with
    628727 * CPLVirtualMemGetAddr(). It remains valid until CPLVirtualMemFree() is called.
     728 * CPLVirtualMemFree() must be called before the dataset object is destroyed.
    629729 *
    630730 * If p is such a pointer and base_type the C type matching eBufType, for default
     
    718818                                              char **papszOptions );
    719819
    720 
    721820/** Create a CPLVirtualMem object from a GDAL rasterband object, with tiling
    722821 * organization
     
    740839 * The pointer to access the virtual memory object is obtained with
    741840 * CPLVirtualMemGetAddr(). It remains valid until CPLVirtualMemFree() is called.
     841 * CPLVirtualMemFree() must be called before the raster band object is destroyed.
    742842 *
    743843 * If p is such a pointer and base_type the C type matching eBufType, for default
     
    814914                                              size_t nCacheSize,
    815915                                              int bSingleThreadUsage,
     916                                              char **papszOptions );
     917
     918}}}
     919
     920=== Implemented by gdalrasterband.cpp ===
     921
     922{{{
     923
     924/** \brief Create a CPLVirtualMem object from a GDAL raster band object.
     925 *
     926 * Only supported on Linux for now.
     927 *
     928 * This method allows creating a virtual memory object for a GDALRasterBand,
     929 * that exposes the whole image data as a virtual array.
     930 *
     931 * The default implementation relies on GDALRasterBandGetVirtualMem(), but specialized
     932 * implementations, such as for raw files, may also directly use mechanisms of the
     933 * operating system to create a view of the underlying file into virtual memory
     934 * (CPLVirtualMemFileMapNew()).
     935 *
     936 * At the time of writing, the GeoTIFF driver and "raw" drivers (EHdr, ...) offer
     937 * a specialized implementation with direct file mapping, provided that some
     938 * requirements are met :
     939 *   - for all drivers, the dataset must be backed by a "real" file in the file
     940 *     system, and the byte ordering of multi-byte datatypes (Int16, etc.)
     941 *     must match the native ordering of the CPU.
     942 *   - in addition, for the GeoTIFF driver, the GeoTIFF file must be uncompressed, scanline
     943 *     oriented (i.e. not tiled). Strips must be organized in the file in sequential
     944 *     order, and be equally spaced (which is generally the case). Only power-of-two
     945 *     bit depths are supported (8 for GDT_Byte, 16 for GDT_Int16/GDT_UInt16,
     946 *     32 for GDT_Float32 and 64 for GDT_Float64)
     947 *
     948 * The pointer returned remains valid until CPLVirtualMemFree() is called.
     949 * CPLVirtualMemFree() must be called before the raster band object is destroyed.
     950 *
     951 * If p is such a pointer and base_type the type matching GDALGetRasterDataType(),
     952 * the element of image coordinates (x, y) can be accessed with
     953 * *(base_type*) ((GByte*)p + x * *pnPixelSpace + y * *pnLineSpace)
     954 *
     955 * This method is the same as the C GDALGetVirtualMemAuto() function.
     956 *
     957 * @param eRWFlag Either GF_Read to read the band, or GF_Write to
     958 * read/write the band.
     959 *
     960 * @param pnPixelSpace Output parameter giving the byte offset from the start of one pixel value in
     961 * the buffer to the start of the next pixel value within a scanline.
     962 *
     963 * @param pnLineSpace Output parameter giving the byte offset from the start of one scanline in
     964 * the buffer to the start of the next.
     965 *
     966 * @param papszOptions NULL terminated list of options.
     967 *                     If a specialized implementation exists, defining USE_DEFAULT_IMPLEMENTATION=YES
     968 *                     will cause the default implementation to be used.
     969 *                     When requiring or falling back to the default implementation, the following
     970 *                     options are available : CACHE_SIZE (in bytes, defaults to 40 MB),
     971 *                     PAGE_SIZE_HINT (in bytes),
     972 *                     SINGLE_THREAD ("FALSE" / "TRUE", defaults to FALSE)
     973 *
     974 * @return a virtual memory object that must be unreferenced by CPLVirtualMemFree(),
     975 *         or NULL in case of failure.
     976 *
     977 * @since GDAL 2.0
     978 */
     979
     980CPLVirtualMem  *GDALRasterBand::GetVirtualMemAuto( GDALRWFlag eRWFlag,
     981                                                   int *pnPixelSpace,
     982                                                   GIntBig *pnLineSpace,
     983                                                   char **papszOptions );
     984
     985CPLVirtualMem CPL_DLL* GDALGetVirtualMemAuto( GDALRasterBandH hBand,
     986                                              GDALRWFlag eRWFlag,
     987                                              int *pnPixelSpace,
     988                                              GIntBig *pnLineSpace,
    816989                                              char **papszOptions );
    817990}}}
     
    8501023CPLVirtualMemIsAccessThreadSafe() has been introduced for that purpose.
    8511024
     1025As far as CPLVirtualMemFileMapNew() is concerned, memory file mapping on POSIX
     1026systems with mmap() should be portable. Windows has CreateFileMapping() and
     1027MapViewOfFile() APIs that have capabilities similar to mmap().
     1028
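As an illustration of that portability, and not of the GDAL implementation, Python's standard `mmap` module exposes one interface over the platform's native file mapping facility. The sketch below shows the read/write behaviour described for VIRTUALMEM_READWRITE, where updates made through the mapping end up in the file. The file content and the helper name `patch_byte_via_mapping` are hypothetical.

```python
import mmap
import os
import tempfile


def patch_byte_via_mapping(path, offset, value):
    """Change one byte of a file through a read/write memory mapping."""
    with open(path, "r+b") as f:
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_WRITE) as view:
            view[offset] = value  # write-through: modifies the file itself
            view.flush()          # ask the OS to write dirty pages back


fd, path = tempfile.mkstemp()
os.close(fd)
try:
    with open(path, "wb") as f:
        f.write(bytes(8))         # eight zero bytes
    patch_byte_via_mapping(path, 5, 42)
    with open(path, "rb") as f:
        content = f.read()
finally:
    os.remove(path)
```

The file must be opened in a mode compatible with the requested access (here `r+b` for a writable mapping), which parallels the requirement that eAccessMode be consistent with how the file was opened.
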
    8521029== Performance ==
    8531030
     
    8661043dealt by 2 different threads, but one after the other one.
    8671044
     1045The overhead of virtual memory objects returned by GetVirtualMemAuto(), when
     1046using the memory file mapping, should be lower than that of the manual management of
     1047page faults. However, GDAL has no control over the strategy used by the operating
     1048system to cache pages.
     1049
    8681050== Limitations ==
    8691051
     
    8841066== Related thoughts ==
    8851067
    886 With an uncompressed GeoTIFF file (where strips or tiles are sequentially written
    887 to disk), GDALDatasetGetVirtualMem() and GDALDatasetGetTiledVirtualMem(), with
    888 appropriate input parameters, could potentially just mmap() the file itself, which
    889 would save any GDAL overhead. It is no clear however how old accessed pages can
    890 be evicted from RAM since Linux does not seem to discard them, which tend to cause
    891 undesirable disk swapping when the memory mapping is bigger than RAM.
    892 
    8931068Some issues with system calls such as read() or write(), or easier multi-threading
    8941069could potentially be solved by making a FUSE (File system in USEr space) driver that
    895 would expose a GDAL dataset as a file, and the mmap()'ing the file itself. The
    896 issue raised in the previous paragraph would still apply. Plus the fact that
     1070would expose a GDAL dataset as a file, and then mmap()'ing the file itself. However
    8971071FUSE drivers are only available on POSIX OS, and need root privilege to be
    8981072mounted (a FUSE filesystem does not need root privilege to run, but the mounting
     
    9191093                           xsize=None, ysize=None, bufxsize=None, bufysize=None,
    9201094                           datatype = None, band_list = None, band_sequential = True,
    921                            cache_size = 10 * 1024 * 1024, page_size_hint = 0):
     1095                           cache_size = 10 * 1024 * 1024, page_size_hint = 0, options = None):
    9221096        """Return a NumPy array for the dataset, seen as a virtual memory mapping.
    9231097           If there are several bands and band_sequential = True, an element is
     
    9261100           accessed with array[y][x][band].
    9271101           If there is only one band, an element is accessed with array[y][x].
     1102           Any reference to the array must be dropped before the last reference to the
     1103           related dataset is also dropped.
    9281104        """
    9291105}}}
     
    9351111                           xsize=None, ysize=None, tilexsize=256, tileysize=256,
    9361112                           datatype = None, band_list = None, tile_organization = gdalconst.GTO_BSQ,
    937                            cache_size = 10 * 1024 * 1024):
     1113                           cache_size = 10 * 1024 * 1024, options = None):
    9381114        """Return a NumPy array for the dataset, seen as a virtual memory mapping with
    9391115           a tile organization.
     
    9451121           accessed with array[band][tiley][tilex][y][x].
    9461122           If there is only one band, an element is accessed with array[tiley][tilex][y][x].
     1123           Any reference to the array must be dropped before the last reference to the
     1124           related dataset is also dropped.
    9471125        """
    9481126}}}
    9491127
    950 And the Band object has the following 2 methods :
     1128And the Band object has the following 3 methods :
    9511129
    9521130{{{
     
    9541132                         xsize=None, ysize=None, bufxsize=None, bufysize=None,
    9551133                         datatype = None,
    956                          cache_size = 10 * 1024 * 1024, page_size_hint = 0):
     1134                         cache_size = 10 * 1024 * 1024, page_size_hint = 0, options = None):
    9571135        """Return a NumPy array for the band, seen as a virtual memory mapping.
    9581136           An element is accessed with array[y][x].
     1137           Any reference to the array must be dropped before the last reference to the
     1138           related dataset is also dropped.
    9591139        """
     1140
     1141  def GetVirtualMemAutoArray(self, eAccess = gdalconst.GF_Read, options = None):
     1142        """Return a NumPy array for the band, seen as a virtual memory mapping.
     1143           An element is accessed with array[y][x].
    9601144
    9611145  def GetTiledVirtualMemArray(self, eAccess = gdalconst.GF_Read, xoff=0, yoff=0,
    9621146                           xsize=None, ysize=None, tilexsize=256, tileysize=256,
    9631147                           datatype = None,
    964                            cache_size = 10 * 1024 * 1024):
     1148                           cache_size = 10 * 1024 * 1024, options = None):
    9651149        """Return a NumPy array for the band, seen as a virtual memory mapping with
    9661150           a tile organization.
    9671151           An element is accessed with array[tiley][tilex][y][x].
     1152           Any reference to the array must be dropped before the last reference to the
     1153           related dataset is also dropped.
    9681154        """
    9691155}}}