Changes between Version 4 and Version 5 of CloudOptimizedGeoTIFF


Timestamp:
Jan 31, 2017, 6:45:31 AM (7 years ago)
Author:
Even Rouault
Comment:

Mention validate_cloud_optimized_geotiff.py and some performance testing

  • CloudOptimizedGeoTIFF

    v4 v5  
    3030For an image of 4096x4096 with 4 overview levels, the 5 IFDs and their TileOffsets and TileByteCounts tag data fit into the first 6KB of the file.
    3131
    32 (Note: for JPEG compression, the above method produce cloud optimized files only if using GDAL 2.2 (or a dev version >= r36849). For older
    33 versions, the IFD of the overviews will be written towards the end of the file.)
     32Note: for JPEG compression, the above method produces cloud optimized files only if using GDAL 2.2 (or a dev version >= r36879). For older
     33versions, the IFDs of the overviews will be written towards the end of the file. A recent version of GDAL (2.2 or dev version >= r37257)
     34built against internal libtiff (or libtiff >= 4.0.8, unreleased at time of writing) will also help reduce the number of bytes read for
     35JPEG compressed files with YCbCr subsampling.
    3436
    3537== How to read it with GDAL ==
     
    4648
    4749For files hosted on Amazon S3 storage, with non-public sharing rights, [http://www.gdal.org/cpl__vsi_8h.html#a5b4754999acd06444bfda172ff2aaa16 /vsis3/] can be used.
     50
     51== How to check if a GeoTIFF has a cloud optimized internal organization? ==
     52
     53The [https://svn.osgeo.org/gdal/trunk/gdal/swig/python/samples/validate_cloud_optimized_geotiff.py validate_cloud_optimized_geotiff.py] script can be used to check that a (GeoTIFF) file follows the file structure described above.
     54
     55{{{
     56$ python validate_cloud_optimized_geotiff.py test.tif
     57}}}
     58
     59or
     60
     61{{{
     62$ python
     63import validate_cloud_optimized_geotiff
     64validate_cloud_optimized_geotiff.validate('test.tif')
     65}}}
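
To illustrate what "IFDs at the beginning of the file" means at the byte level, here is a minimal sketch using only the Python standard library. It is not the logic of validate_cloud_optimized_geotiff.py, which performs its own checks. It assumes a classic (non-BigTIFF) file, and the names `list_ifd_offsets` and `demo_tiff` are hypothetical helpers introduced for this example. The 16 KB threshold in the last line is likewise an arbitrary illustration value.

```python
import struct


def list_ifd_offsets(data):
    """Return the byte offsets of all IFDs in a classic (non-BigTIFF) TIFF."""
    if data[:2] == b"II":
        endian = "<"  # little-endian
    elif data[:2] == b"MM":
        endian = ">"  # big-endian
    else:
        raise ValueError("not a TIFF file")
    magic, offset = struct.unpack(endian + "HI", data[2:8])
    if magic != 42:
        raise ValueError("this sketch only handles classic TIFF (magic 42)")
    offsets = []
    while offset != 0:
        if offset in offsets:
            raise ValueError("loop in IFD chain")
        offsets.append(offset)
        (n_entries,) = struct.unpack(endian + "H", data[offset:offset + 2])
        # Each IFD entry is 12 bytes; the next-IFD pointer follows the entries.
        pointer_pos = offset + 2 + 12 * n_entries
        (offset,) = struct.unpack(endian + "I", data[pointer_pos:pointer_pos + 4])
    return offsets


def demo_tiff():
    """Build a tiny synthetic two-IFD byte string (no pixel data) for illustration."""
    entry = struct.pack("<HHII", 256, 3, 1, 1)  # ImageWidth, SHORT, count 1, value 1
    header = b"II" + struct.pack("<HI", 42, 8)  # first IFD at offset 8
    ifd0 = struct.pack("<H", 1) + entry + struct.pack("<I", 26)  # next IFD at 26
    ifd1 = struct.pack("<H", 1) + entry + struct.pack("<I", 0)   # end of chain
    return header + ifd0 + ifd1


offsets = list_ifd_offsets(demo_tiff())
print(offsets, "all within first 16 KB:", max(offsets) < 16384)
```

On a real file, one would pass the first bytes read from disk. If an IFD lies beyond the buffer, the unpack call fails, which is itself a hint that the IFDs are not grouped at the start.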
     66
     67
     68== Performance testing ==
     69
     70Done with GDAL trunk r37259 with internal libtiff.
     71
     72=== Preparation ===
     73
     74The source image is the True Color Image of a Sentinel 2A L1C product (10980x10980 pixels, RGB bands of type Byte)
     75
     76Creation of a "regular" GeoTIFF with overviews:
     77
     78{{{
     79gdal_translate SENTINEL2_L1C:S2A_MSIL1C_20170102T111442_N0204_R137_T30TXT_20170102T111441.SAFE/MTD_MSIL1C.xml:TCI:EPSG_32630 S2A_MSIL1C_20170102T111442_N0204_R137_T30TXT_20170102T111441_TCI_regular_with_ovr_2.tif -co TILED=YES -co COMPRESS=JPEG -co PHOTOMETRIC=YCBCR
     80gdaladdo -r average S2A_MSIL1C_20170102T111442_N0204_R137_T30TXT_20170102T111441_TCI_regular_with_ovr_2.tif 2 4 8 16 32
     81}}}
     82
     83Creation of a cloud optimized GeoTIFF:
     84
     85{{{
     86gdal_translate SENTINEL2_L1C:S2A_MSIL1C_20170102T111442_N0204_R137_T30TXT_20170102T111441.SAFE/MTD_MSIL1C.xml:TCI:EPSG_32630 S2A_MSIL1C_20170102T111442_N0204_R137_T30TXT_20170102T111441_TCI.tif -co TILED=YES -co COMPRESS=DEFLATE
     87gdaladdo -r average  S2A_MSIL1C_20170102T111442_N0204_R137_T30TXT_20170102T111441_TCI.tif 2 4 8 16 32
     88gdal_translate S2A_MSIL1C_20170102T111442_N0204_R137_T30TXT_20170102T111441_TCI.tif S2A_MSIL1C_20170102T111442_N0204_R137_T30TXT_20170102T111441_TCI_cloudoptimized_2.tif -co TILED=YES -co COMPRESS=JPEG -co PHOTOMETRIC=YCBCR -co COPY_SRC_OVERVIEWS=YES
     89}}}
     90
     91=== Reading a single pixel ===
     92
     93* Regular GeoTIFF:
     94
     95{{{
     96$ time GDAL_DISABLE_READDIR_ON_OPEN=YES CPL_VSIL_CURL_ALLOWED_EXTENSIONS=.tif gdallocationinfo --debug on \
     97   /vsicurl/http://even.rouault.free.fr/gtiff_test/S2A_MSIL1C_20170102T111442_N0204_R137_T30TXT_20170102T111441_TCI_regular_with_ovr_2.tif 5000 5000
     98VSICURL: GetFileSize(http://even.rouault.free.fr/gtiff_test/S2A_MSIL1C_20170102T111442_N0204_R137_T30TXT_20170102T111441_TCI_regular_with_ovr_2.tif)=3363607  response_code=200
     99VSICURL: Downloading 0-16383 (http://even.rouault.free.fr/gtiff_test/S2A_MSIL1C_20170102T111442_N0204_R137_T30TXT_20170102T111441_TCI_regular_with_ovr_2.tif)...
     100VSICURL: Got response_code=206
     101VSICURL: Downloading 16384-49151 (http://even.rouault.free.fr/gtiff_test/S2A_MSIL1C_20170102T111442_N0204_R137_T30TXT_20170102T111441_TCI_regular_with_ovr_2.tif)...
     102VSICURL: Got response_code=206
     103GDAL: GDALOpen(/vsicurl/http://even.rouault.free.fr/gtiff_test/S2A_MSIL1C_20170102T111442_N0204_R137_T30TXT_20170102T111441_TCI_regular_with_ovr_2.tif, this=0x7a84c0) succeeds as GTiff.
     104Report:
     105  Location: (5000P,5000L)
     106  Band 1:
     107GDAL: GDAL_CACHEMAX = 791 MB
     108VSICURL: Downloading 1556480-1572863 (http://even.rouault.free.fr/gtiff_test/S2A_MSIL1C_20170102T111442_N0204_R137_T30TXT_20170102T111441_TCI_regular_with_ovr_2.tif)...
     109VSICURL: Got response_code=206
     110    Value: 255
     111  Band 2:
     112    Value: 255
     113  Band 3:
     114    Value: 255
     115GDAL: GDALClose(/vsicurl/http://even.rouault.free.fr/gtiff_test/S2A_MSIL1C_20170102T111442_N0204_R137_T30TXT_20170102T111441_TCI_regular_with_ovr_2.tif, this=0x7a84c0)
     116
     117real    0m0.520s
     118user    0m0.080s
     119sys     0m0.012s
     120}}}
     121
     122* Cloud optimized GeoTIFF:
     123
     124{{{
     125$ time GDAL_DISABLE_READDIR_ON_OPEN=YES CPL_VSIL_CURL_ALLOWED_EXTENSIONS=.tif gdallocationinfo --debug on \
     126   /vsicurl/http://even.rouault.free.fr/gtiff_test/S2A_MSIL1C_20170102T111442_N0204_R137_T30TXT_20170102T111441_TCI_cloudoptimized_2.tif  5000 5000
     127VSICURL: GetFileSize(http://even.rouault.free.fr/gtiff_test/S2A_MSIL1C_20170102T111442_N0204_R137_T30TXT_20170102T111441_TCI_cloudoptimized_2.tif)=3355470  response_code=200
     128VSICURL: Downloading 0-16383 (http://even.rouault.free.fr/gtiff_test/S2A_MSIL1C_20170102T111442_N0204_R137_T30TXT_20170102T111441_TCI_cloudoptimized_2.tif)...
     129VSICURL: Got response_code=206
     130VSICURL: Downloading 16384-49151 (http://even.rouault.free.fr/gtiff_test/S2A_MSIL1C_20170102T111442_N0204_R137_T30TXT_20170102T111441_TCI_cloudoptimized_2.tif)...
     131VSICURL: Got response_code=206
     132GDAL: GDALOpen(/vsicurl/http://even.rouault.free.fr/gtiff_test/S2A_MSIL1C_20170102T111442_N0204_R137_T30TXT_20170102T111441_TCI_cloudoptimized_2.tif, this=0x1c544c0) succeeds as GTiff.
     133Report:
     134  Location: (5000P,5000L)
     135  Band 1:
     136GDAL: GDAL_CACHEMAX = 791 MB
     137VSICURL: Downloading 2785280-2801663 (http://even.rouault.free.fr/gtiff_test/S2A_MSIL1C_20170102T111442_N0204_R137_T30TXT_20170102T111441_TCI_cloudoptimized_2.tif)...
     138VSICURL: Got response_code=206
     139    Value: 255
     140  Band 2:
     141    Value: 255
     142  Band 3:
     143    Value: 255
     144GDAL: GDALClose(/vsicurl/http://even.rouault.free.fr/gtiff_test/S2A_MSIL1C_20170102T111442_N0204_R137_T30TXT_20170102T111441_TCI_cloudoptimized_2.tif, this=0x1c544c0)
     145
     146real    0m0.527s
     147user    0m0.088s
     148sys     0m0.024s
     149}}}
     150
     151There is no significant time difference (individual runs may differ by a few tens of milliseconds), and both runs perform the same amount of I/O (64 KB read). Note the use of CPL_VSIL_CURL_ALLOWED_EXTENSIONS=.tif to avoid reading any sidecar files (.aux.xml, etc.) and GDAL_DISABLE_READDIR_ON_OPEN=YES to avoid any attempt to list the files in the same directory.
     152
     153The same conclusions hold when hosting on AWS S3, with both GDAL_DISABLE_READDIR_ON_OPEN=YES and CPL_VSIL_CURL_ALLOWED_EXTENSIONS=.tif defined as well.
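
The 64 KB figure can be cross-checked from the debug traces. The following is a small sketch, assuming the `VSICURL: Downloading first-last` line format shown above. `bytes_downloaded` is a hypothetical helper, and the URLs in the sample log are shortened for readability.

```python
import re

# Matches the inclusive byte range in lines such as
# "VSICURL: Downloading 0-16383 (http://...)..."
RANGE_RE = re.compile(r"VSICURL: Downloading (\d+)-(\d+) ")


def bytes_downloaded(log_text):
    """Sum the sizes of all byte ranges requested in a --debug on trace."""
    total = 0
    for match in RANGE_RE.finditer(log_text):
        first, last = int(match.group(1)), int(match.group(2))
        total += last - first + 1  # ranges are inclusive
    return total


# Ranges copied from the single-pixel read of the regular GeoTIFF (URLs shortened).
LOG = """\
VSICURL: Downloading 0-16383 (http://...)...
VSICURL: Got response_code=206
VSICURL: Downloading 16384-49151 (http://...)...
VSICURL: Got response_code=206
VSICURL: Downloading 1556480-1572863 (http://...)...
VSICURL: Got response_code=206
"""

total = bytes_downloaded(LOG)
print(total, "bytes =", total // 1024, "KB")
```

The same helper can be applied to the traces of the other tests to compare the volume of data fetched by each file layout.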
     154
     155
     156=== Reading a block of pixels at full resolution ===
     157
     158* Regular GeoTIFF:
     159
     160{{{
     161$ time GDAL_DISABLE_READDIR_ON_OPEN=YES CPL_VSIL_CURL_ALLOWED_EXTENSIONS=.tif gdal_translate --debug on \
     162   /vsicurl/http://even.rouault.free.fr/gtiff_test/S2A_MSIL1C_20170102T111442_N0204_R137_T30TXT_20170102T111441_TCI_regular_with_ovr_2.tif \
     163   -srcwin 1024 1024 256 256 out.tif
     164VSICURL: GetFileSize(http://even.rouault.free.fr/gtiff_test/S2A_MSIL1C_20170102T111442_N0204_R137_T30TXT_20170102T111441_TCI_regular_with_ovr_2.tif)=3363607  response_code=200
     165VSICURL: Downloading 0-16383 (http://even.rouault.free.fr/gtiff_test/S2A_MSIL1C_20170102T111442_N0204_R137_T30TXT_20170102T111441_TCI_regular_with_ovr_2.tif)...
     166VSICURL: Got response_code=206
     167VSICURL: Downloading 16384-49151 (http://even.rouault.free.fr/gtiff_test/S2A_MSIL1C_20170102T111442_N0204_R137_T30TXT_20170102T111441_TCI_regular_with_ovr_2.tif)...
     168VSICURL: Got response_code=206
     169GDAL: GDALOpen(/vsicurl/http://even.rouault.free.fr/gtiff_test/S2A_MSIL1C_20170102T111442_N0204_R137_T30TXT_20170102T111441_TCI_regular_with_ovr_2.tif, this=0xc6d620) succeeds as GTiff.
     170Input file size is 10980, 10980
     171GTiff: ScanDirectories()
     172VSICURL: Downloading 2113536-2129919 (http://even.rouault.free.fr/gtiff_test/S2A_MSIL1C_20170102T111442_N0204_R137_T30TXT_20170102T111441_TCI_regular_with_ovr_2.tif)...
     173VSICURL: Got response_code=206
     174VSICURL: Downloading 2129920-2162687 (http://even.rouault.free.fr/gtiff_test/S2A_MSIL1C_20170102T111442_N0204_R137_T30TXT_20170102T111441_TCI_regular_with_ovr_2.tif)...
     175VSICURL: Got response_code=206
     176GTiff: Opened 5490x5490 overview.
     177GTiff: Opened 2745x2745 overview.
     178GTiff: Opened 1373x1373 overview.
     179GTiff: Opened 687x687 overview.
     180GTiff: Opened 344x344 overview.
     181GDAL: GDALDefaultOverviews::OverviewScan()
     182VSICURL: Downloading 196608-212991 (http://even.rouault.free.fr/gtiff_test/S2A_MSIL1C_20170102T111442_N0204_R137_T30TXT_20170102T111441_TCI_regular_with_ovr_2.tif)...
     183VSICURL: Got response_code=206
     184...10...20...30...40...50...60...70...80...90...100 - done.
     185
     186real    0m0.757s
     187user    0m0.100s
     188sys     0m0.032s
     189}}}
     190
     191One can see that a directory scan is done (GTiff: ScanDirectories() trace), despite a few optimizations done in r37258 and r37259. This is due to gdal_translate trying to copy mask bands, which requires scanning directories to find a potential internal mask band. This scan can be avoided by adding the -mask none switch.
     192
     193{{{
     194$ time GDAL_DISABLE_READDIR_ON_OPEN=YES CPL_VSIL_CURL_ALLOWED_EXTENSIONS=.tif gdal_translate --debug on \
     195   /vsicurl/http://even.rouault.free.fr/gtiff_test/S2A_MSIL1C_20170102T111442_N0204_R137_T30TXT_20170102T111441_TCI_regular_with_ovr_2.tif \
     196   -srcwin 1024 1024 256 256 -mask none out.tif
     197VSICURL: GetFileSize(http://even.rouault.free.fr/gtiff_test/S2A_MSIL1C_20170102T111442_N0204_R137_T30TXT_20170102T111441_TCI_regular_with_ovr_2.tif)=3363607  response_code=200
     198VSICURL: Downloading 0-16383 (http://even.rouault.free.fr/gtiff_test/S2A_MSIL1C_20170102T111442_N0204_R137_T30TXT_20170102T111441_TCI_regular_with_ovr_2.tif)...
     199VSICURL: Got response_code=206
     200VSICURL: Downloading 16384-49151 (http://even.rouault.free.fr/gtiff_test/S2A_MSIL1C_20170102T111442_N0204_R137_T30TXT_20170102T111441_TCI_regular_with_ovr_2.tif)...
     201VSICURL: Got response_code=206
     202GDAL: GDALOpen(/vsicurl/http://even.rouault.free.fr/gtiff_test/S2A_MSIL1C_20170102T111442_N0204_R137_T30TXT_20170102T111441_TCI_regular_with_ovr_2.tif, this=0x1078620) succeeds as GTiff.
     203Input file size is 10980, 10980
     204GDAL: GDALDefaultOverviews::OverviewScan()
     205GDAL: GDALDatasetCopyWholeRaster(): 256*256 swaths, bInterleave=1
     206VSICURL: Downloading 196608-212991 (http://even.rouault.free.fr/gtiff_test/S2A_MSIL1C_20170102T111442_N0204_R137_T30TXT_20170102T111441_TCI_regular_with_ovr_2.tif)...
     207VSICURL: Got response_code=206
     208...10...20...30...40...50...60...70...80...90...100 - done.
     209
     210real    0m0.518s
     211user    0m0.092s
     212sys     0m0.012s
     213}}}
     214
     215Best timing on an AWS S3 bucket (us-east-1 region, accessed from France): ~ 1.9s
     216
     217This is also the performance one gets with:
     218
     219{{{
     220$ time GDAL_DISABLE_READDIR_ON_OPEN=YES CPL_VSIL_CURL_ALLOWED_EXTENSIONS=.tif \
     221    python -c 'from osgeo import gdal; ds = gdal.Open("/vsicurl/http://even.rouault.free.fr/gtiff_test/S2A_MSIL1C_20170102T111442_N0204_R137_T30TXT_20170102T111441_TCI_regular_with_ovr_2.tif"); ds.ReadAsArray(1024,1024,256,256)'
     222}}}
     223
     224* Cloud optimized GeoTIFF:
     225
     226{{{
     227$ time GDAL_DISABLE_READDIR_ON_OPEN=YES CPL_VSIL_CURL_ALLOWED_EXTENSIONS=.tif gdal_translate --debug on \
     228  /vsicurl/http://even.rouault.free.fr/gtiff_test/S2A_MSIL1C_20170102T111442_N0204_R137_T30TXT_20170102T111441_TCI_cloudoptimized_2.tif \
     229  -srcwin 1024 1024 256 256 out.tif
     230VSICURL: GetFileSize(http://even.rouault.free.fr/gtiff_test/S2A_MSIL1C_20170102T111442_N0204_R137_T30TXT_20170102T111441_TCI_cloudoptimized_2.tif)=3355470  response_code=200
     231VSICURL: Downloading 0-16383 (http://even.rouault.free.fr/gtiff_test/S2A_MSIL1C_20170102T111442_N0204_R137_T30TXT_20170102T111441_TCI_cloudoptimized_2.tif)...
     232VSICURL: Got response_code=206
     233VSICURL: Downloading 16384-49151 (http://even.rouault.free.fr/gtiff_test/S2A_MSIL1C_20170102T111442_N0204_R137_T30TXT_20170102T111441_TCI_cloudoptimized_2.tif)...
     234VSICURL: Got response_code=206
     235GDAL: GDALOpen(/vsicurl/http://even.rouault.free.fr/gtiff_test/S2A_MSIL1C_20170102T111442_N0204_R137_T30TXT_20170102T111441_TCI_cloudoptimized_2.tif, this=0xdeb620) succeeds as GTiff.
     236Input file size is 10980, 10980
     237GTiff: ScanDirectories()
     238GTiff: Opened 5490x5490 overview.
     239GTiff: Opened 2745x2745 overview.
     240GTiff: Opened 1373x1373 overview.
     241GTiff: Opened 687x687 overview.
     242GTiff: Opened 344x344 overview.
     243VSICURL: Downloading 1425408-1441791 (http://even.rouault.free.fr/gtiff_test/S2A_MSIL1C_20170102T111442_N0204_R137_T30TXT_20170102T111441_TCI_cloudoptimized_2.tif)...
     244VSICURL: Got response_code=206
     245...10...20...30...40...50...60...70...80...90...100 - done.
     246
     247real    0m0.519s
     248user    0m0.096s
     249sys     0m0.008s
     250}}}
     251
     252Best timing on an AWS S3 bucket: ~ 1.9s
     253
     254There is no need to specify -mask none to get the maximum performance: as the IFDs are at the beginning of the file, they have already been fetched by the first 2 HTTP GET requests.
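
The reasoning can be sketched as follows. In the traces above, the first two requests cover bytes 0-49151 and later requests start on 16384-byte boundaries. This alignment is inferred from the logs only and is not a documented guarantee. `chunk_range` and `already_fetched` are hypothetical helpers for illustration.

```python
CHUNK = 16384  # chunk size assumed from the traces above

# Byte ranges covered by the two initial GET requests in the traces.
INITIAL_RANGES = [(0, 16383), (16384, 49151)]


def chunk_range(offset, chunk=CHUNK):
    """Return the inclusive, chunk-aligned byte range containing offset."""
    start = (offset // chunk) * chunk
    return start, start + chunk - 1


def already_fetched(offset, fetched_ranges=INITIAL_RANGES):
    """Tell whether offset lies inside one of the ranges already downloaded."""
    return any(first <= offset <= last for first, last in fetched_ranges)


# An IFD near the start of the file needs no extra request...
print(already_fetched(6000))
# ...whereas data at offset 2113536 (the scan of the regular file) does,
# and would be requested as this aligned range:
print(already_fetched(2113536), chunk_range(2113536))
```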
     255
     256=== Getting a subsampled version of the image ===
     257
     258* Regular GeoTIFF:
     259
     260{{{
     261$ time GDAL_DISABLE_READDIR_ON_OPEN=YES CPL_VSIL_CURL_ALLOWED_EXTENSIONS=.tif gdal_translate --debug on \
     262  /vsicurl/http://even.rouault.free.fr/gtiff_test/S2A_MSIL1C_20170102T111442_N0204_R137_T30TXT_20170102T111441_TCI_regular_with_ovr_2.tif \
     263  out.tif -outsize 1% 1%
     264
     265VSICURL: GetFileSize(http://even.rouault.free.fr/gtiff_test/S2A_MSIL1C_20170102T111442_N0204_R137_T30TXT_20170102T111441_TCI_regular_with_ovr_2.tif)=3363607  response_code=200
     266VSICURL: Downloading 0-16383 (http://even.rouault.free.fr/gtiff_test/S2A_MSIL1C_20170102T111442_N0204_R137_T30TXT_20170102T111441_TCI_regular_with_ovr_2.tif)...
     267VSICURL: Got response_code=206
     268VSICURL: Downloading 16384-49151 (http://even.rouault.free.fr/gtiff_test/S2A_MSIL1C_20170102T111442_N0204_R137_T30TXT_20170102T111441_TCI_regular_with_ovr_2.tif)...
     269VSICURL: Got response_code=206
     270GDAL: GDALOpen(/vsicurl/http://even.rouault.free.fr/gtiff_test/S2A_MSIL1C_20170102T111442_N0204_R137_T30TXT_20170102T111441_TCI_regular_with_ovr_2.tif, this=0x23d7610) succeeds as GTiff.
     271Input file size is 10980, 10980
     272GTiff: ScanDirectories()
     273VSICURL: Downloading 2113536-2129919 (http://even.rouault.free.fr/gtiff_test/S2A_MSIL1C_20170102T111442_N0204_R137_T30TXT_20170102T111441_TCI_regular_with_ovr_2.tif)...
     274VSICURL: Got response_code=206
     275VSICURL: Downloading 2129920-2162687 (http://even.rouault.free.fr/gtiff_test/S2A_MSIL1C_20170102T111442_N0204_R137_T30TXT_20170102T111441_TCI_regular_with_ovr_2.tif)...
     276VSICURL: Got response_code=206
     277GTiff: Opened 5490x5490 overview.
     278GTiff: Opened 2745x2745 overview.
     279GTiff: Opened 1373x1373 overview.
     280GTiff: Opened 687x687 overview.
     281GTiff: Opened 344x344 overview.
     282VSICURL: Downloading 3342336-3358719 (http://even.rouault.free.fr/gtiff_test/S2A_MSIL1C_20170102T111442_N0204_R137_T30TXT_20170102T111441_TCI_regular_with_ovr_2.tif)...
     283VSICURL: Got response_code=206
     284VSICURL: Downloading 3358720-3363606 (http://even.rouault.free.fr/gtiff_test/S2A_MSIL1C_20170102T111442_N0204_R137_T30TXT_20170102T111441_TCI_regular_with_ovr_2.tif)...
     285VSICURL: Got response_code=206
     286...10...20...30...40...50...60...70...80...90...100 - done.
     287
     288real    0m0.810s
     289user    0m0.108s
     290sys     0m0.020s
     291}}}
     292
     293Best timing on an AWS S3 bucket: ~ 2.5s
     294
     295A full scan of the IFDs is necessary to find the appropriate overview level.
     296
     297* Cloud optimized GeoTIFF:
     298
     299{{{
     300$ time GDAL_DISABLE_READDIR_ON_OPEN=YES CPL_VSIL_CURL_ALLOWED_EXTENSIONS=.tif gdal_translate --debug on \
     301  /vsicurl/http://even.rouault.free.fr/gtiff_test/S2A_MSIL1C_20170102T111442_N0204_R137_T30TXT_20170102T111441_TCI_cloudoptimized_2.tif \
     302  out.tif -outsize 1% 1%
     303VSICURL: GetFileSize(http://even.rouault.free.fr/gtiff_test/S2A_MSIL1C_20170102T111442_N0204_R137_T30TXT_20170102T111441_TCI_cloudoptimized_2.tif)=3355470  response_code=200
     304VSICURL: Downloading 0-16383 (http://even.rouault.free.fr/gtiff_test/S2A_MSIL1C_20170102T111442_N0204_R137_T30TXT_20170102T111441_TCI_cloudoptimized_2.tif)...
     305VSICURL: Got response_code=206
     306VSICURL: Downloading 16384-49151 (http://even.rouault.free.fr/gtiff_test/S2A_MSIL1C_20170102T111442_N0204_R137_T30TXT_20170102T111441_TCI_cloudoptimized_2.tif)...
     307VSICURL: Got response_code=206
     308GDAL: GDALOpen(/vsicurl/http://even.rouault.free.fr/gtiff_test/S2A_MSIL1C_20170102T111442_N0204_R137_T30TXT_20170102T111441_TCI_cloudoptimized_2.tif, this=0x1847610) succeeds as GTiff.
     309Input file size is 10980, 10980
     310GTiff: ScanDirectories()
     311GTiff: Opened 5490x5490 overview.
     312GTiff: Opened 2745x2745 overview.
     313GTiff: Opened 1373x1373 overview.
     314GTiff: Opened 687x687 overview.
     315GTiff: Opened 344x344 overview.
     316GDAL: GDALDatasetCopyWholeRaster(): 109*109 swaths, bInterleave=1
     317...10...20...30...40...50...60...70...80...90...100 - done.
     318
     319real    0m0.435s
     320user    0m0.088s
     321sys     0m0.028s
     322}}}
     323
     324Best timing on an AWS S3 bucket: ~ 1.5s
     325
     326As the IFDs are at the beginning of the file, as well as the pixel data for the smallest overview, the request can be completed with the first 2 HTTP GET requests (this is a bit of an extreme case, of course).