Version 5 (modified by Even Rouault, 7 years ago)

Mention validate_cloud_optimized_geotiff.py and some performance testing

Cloud optimized GeoTIFF

Definition

A cloud optimized GeoTIFF is a regular GeoTIFF file, aimed at being hosted on an HTTP file server, whose internal organization is friendly for consumption by clients issuing HTTP GET range requests ("Range: bytes=start_offset-end_offset" HTTP header).
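
As an illustration of the range request mentioned above, here is a minimal sketch using only the Python standard library. The helper name range_request is hypothetical (it is not part of GDAL), and the request is only built, not sent; actually sending it assumes a server that honours range requests.

```python
import urllib.request

def range_request(url, start_offset, end_offset):
    """Build (without sending) a GET request for bytes start_offset..end_offset, inclusive."""
    if start_offset < 0 or end_offset < start_offset:
        raise ValueError("invalid byte range")
    req = urllib.request.Request(url)
    # HTTP byte ranges are inclusive on both ends.
    req.add_header("Range", "bytes=%d-%d" % (start_offset, end_offset))
    return req

req = range_request("http://example.com/path/to/some.tif", 0, 16383)
print(req.get_header("Range"))  # bytes=0-16383
```

A server that supports the request answers with status 206 (Partial Content), which is the response code visible in the debug traces further down this page.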

It contains at its beginning the metadata of the full resolution imagery, followed by the optional overview metadata, and finally the imagery itself. To make it friendly to streaming and progressive rendering, we recommend starting with the imagery of the smallest overview and finishing with the imagery of the full resolution level.

More formally, the structure of such a file is:

  • TIFF / BigTIFF signature
  • IFD (Image File Directory) of full resolution image
  • Values of TIFF tags that don't fit inline in the IFD directory, such as TileOffsets, TileByteCounts and GeoTIFF keys
  • Optional: IFD (Image File Directory) of first overview (typically subsampled by a factor of 2), followed by the values of its tags that don't fit inline
  • Optional: IFD (Image File Directory) of second overview (typically subsampled by a factor of 4), followed by the values of its tags that don't fit inline
  • ...
  • Optional: IFD (Image File Directory) of last overview (typically subsampled by a factor of 2^N), followed by the values of its tags that don't fit inline
  • Optional: tile content of last overview level
  • ...
  • Optional: tile content of first overview level
  • Tile content of full resolution image.
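
The first item of the list, the TIFF / BigTIFF signature, can be decoded with a few lines of Python. This is only a sketch: the function name read_tiff_header is hypothetical, and a small first-IFD offset is a necessary but not a sufficient sign of the layout described above (checking the rest of the structure is the job of the validation script mentioned later on this page).

```python
import struct

def read_tiff_header(buf):
    """Return (is_bigtiff, byte_order, first_ifd_offset) from the first bytes of a TIFF."""
    if buf[:2] == b"II":
        order = "<"   # little-endian
    elif buf[:2] == b"MM":
        order = ">"   # big-endian
    else:
        raise ValueError("not a TIFF file")
    (magic,) = struct.unpack(order + "H", buf[2:4])
    if magic == 42:
        # Classic TIFF: 4-byte offset of the first IFD.
        (ifd_offset,) = struct.unpack(order + "I", buf[4:8])
        return False, order, ifd_offset
    if magic == 43:
        # BigTIFF: offset size (8), padding (0), then 8-byte offset of the first IFD.
        offset_size, pad = struct.unpack(order + "HH", buf[4:8])
        if offset_size != 8 or pad != 0:
            raise ValueError("malformed BigTIFF header")
        (ifd_offset,) = struct.unpack(order + "Q", buf[8:16])
        return True, order, ifd_offset
    raise ValueError("unknown TIFF magic number")

# Synthetic little-endian classic header whose first IFD starts right after the signature.
header = b"II" + struct.pack("<H", 42) + struct.pack("<I", 8)
print(read_tiff_header(header))  # (False, '<', 8)
```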

How to generate it with GDAL

Given an input dataset in.tif with already generated internal or external overviews, a cloud optimized GeoTIFF can be generated with:

gdal_translate in.tif out.tif -co TILED=YES -co COPY_SRC_OVERVIEWS=YES -co COMPRESS=DEFLATE

This will result in images with tiles of 256x256 pixels for the main resolution, and tiles of 128x128 pixels for the overviews.

For an image of 4096x4096 with 4 overview levels, the 5 IFDs and their TileOffsets and TileByteCounts tag data fit into the first 6KB of the file.
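
The order of magnitude of that figure can be checked with a rough calculation. The sketch below makes several assumptions that are not stated above: a classic (non-BigTIFF) file with 4-byte TileOffsets and TileByteCounts entries, pixel-interleaved bands (one tile index per level), overview factors of 2, 4, 8 and 16, and the 256 / 128 pixel tile sizes quoted earlier. The function names are hypothetical.

```python
def tile_count(size, tile):
    """Number of tiles needed to cover a size x size image with tile x tile tiles."""
    per_axis = -(-size // tile)  # ceiling division
    return per_axis * per_axis

def tile_index_bytes(size=4096, overview_levels=4, entry_size=4):
    """Bytes used by TileOffsets + TileByteCounts over all levels (assumptions above)."""
    total = 0
    for level in range(overview_levels + 1):
        level_size = -(-size // (2 ** level))
        tile = 256 if level == 0 else 128
        # One TileOffsets entry plus one TileByteCounts entry per tile.
        total += 2 * entry_size * tile_count(level_size, tile)
    return total

print(tile_index_bytes())  # 4768
```

Under these assumptions the tile indexes take 4768 bytes, which leaves a little over 1 KB of the 6 KB for the five IFDs themselves and the GeoTIFF keys.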

Note: for JPEG compression, the above method produces cloud optimized files only if using GDAL 2.2 (or a dev version >= r36879). For older versions, the IFDs of the overviews will be written towards the end of the file. A recent version of GDAL (2.2 or dev version >= r37257) built against internal libtiff (or libtiff >= 4.0.8, unreleased at time of writing) will also help reduce the number of bytes read for JPEG compressed files with YCbCr subsampling.

How to read it with GDAL

GDAL includes special filesystems that can read a file hosted on an HTTP/FTP server in chunks.

The base filesystem is /vsicurl/ (Virtual System Interface for Curl) and the filenames it accepts are of the form "/vsicurl/http://example.com/path/to/some.tif". They can be used wherever GDAL expects a dataset / filename to be passed: gdalinfo, gdal_translate, GDALOpen() API, etc.

Currently /vsicurl/ uses 16 KB as the minimum unit for downloading with HTTP range requests, and an in-memory cache of up to 1000 blocks of 16 KB, with a least recently used eviction strategy.
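
To make the chunking and eviction behaviour concrete, here is a simplified Python model. It is not GDAL's implementation: the names blocks_for_range and BlockCache are hypothetical, and only the two figures quoted above (16 KB blocks, 1000 cached blocks) come from this page.

```python
from collections import OrderedDict

BLOCK_SIZE = 16 * 1024   # download unit quoted above
MAX_BLOCKS = 1000        # cache capacity quoted above

def blocks_for_range(offset, length, block_size=BLOCK_SIZE):
    """Indices of the fixed-size blocks that intersect [offset, offset + length)."""
    if length <= 0:
        return []
    first = offset // block_size
    last = (offset + length - 1) // block_size
    return list(range(first, last + 1))

class BlockCache:
    """Least-recently-used cache keyed by block index."""

    def __init__(self, max_blocks=MAX_BLOCKS):
        self.max_blocks = max_blocks
        self._blocks = OrderedDict()

    def get(self, index):
        if index not in self._blocks:
            return None
        self._blocks.move_to_end(index)  # mark as most recently used
        return self._blocks[index]

    def put(self, index, data):
        self._blocks[index] = data
        self._blocks.move_to_end(index)
        while len(self._blocks) > self.max_blocks:
            self._blocks.popitem(last=False)  # evict the least recently used block

# A 1000-byte read starting at offset 16000 straddles two blocks.
print(blocks_for_range(16000, 1000))  # [0, 1]
```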

To minimize the number of HTTP requests made to anything other than the target GeoTIFF file, setting the GDAL_DISABLE_READDIR_ON_OPEN=YES and CPL_VSIL_CURL_ALLOWED_EXTENSIONS=tif environment variables/configuration options is recommended, so that side-car files (such as .ovr, .aux.xml, .aux, etc.) are not probed.
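
When GDAL utilities are launched from a script, these two settings can be passed through the process environment. The sketch below uses only the Python standard library; the helper names vsicurl_path and cog_read_environment are hypothetical, and the commented subprocess call assumes gdalinfo is installed and on the PATH.

```python
import os

def vsicurl_path(url):
    """Prefix a URL so that GDAL opens it through /vsicurl/."""
    return "/vsicurl/" + url

def cog_read_environment(base=None):
    """Copy of the environment with the two options recommended above set."""
    env = dict(os.environ if base is None else base)
    env["GDAL_DISABLE_READDIR_ON_OPEN"] = "YES"
    env["CPL_VSIL_CURL_ALLOWED_EXTENSIONS"] = "tif"
    return env

env = cog_read_environment()
path = vsicurl_path("http://example.com/path/to/some.tif")
# The pair could then be handed to a GDAL utility, for example:
#   subprocess.run(["gdalinfo", path], env=env)
print(path)  # /vsicurl/http://example.com/path/to/some.tif
```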

Running gdalinfo or GDALOpen() on such a cloud optimized GeoTIFF will retrieve all the metadata with a single HTTP request of 16 KB. When reading pixels in a tile, only the blocks of 16 KB intersecting the range of the tile will be downloaded.

For files hosted on Amazon S3 storage, with non-public sharing rights, /vsis3/ can be used.

How to check if a GeoTIFF has a cloud optimized internal organization?

The validate_cloud_optimized_geotiff.py script can be used to check that a (GeoTIFF) file follows the file structure described above:

$ python validate_cloud_optimized_geotiff.py test.tif

or

$ python
import validate_cloud_optimized_geotiff
validate_cloud_optimized_geotiff.validate('test.tif')

Performance testing

Done with GDAL trunk r37259 with internal libtiff.

Preparation

The source image is the True Color Image of a Sentinel 2A L1C product (10980x10980 pixels, RGB bands of type Byte)

Creation of a "regular" GeoTIFF with overviews:

gdal_translate SENTINEL2_L1C:S2A_MSIL1C_20170102T111442_N0204_R137_T30TXT_20170102T111441.SAFE/MTD_MSIL1C.xml:TCI:EPSG_32630 S2A_MSIL1C_20170102T111442_N0204_R137_T30TXT_20170102T111441_TCI_regular_with_ovr_2.tif -co TILED=YES -co COMPRESS=JPEG -co PHOTOMETRIC=YCBCR
gdaladdo -r average S2A_MSIL1C_20170102T111442_N0204_R137_T30TXT_20170102T111441_TCI_regular_with_ovr_2.tif 2 4 8 16 32

Creation of a cloud optimized GeoTIFF:

gdal_translate SENTINEL2_L1C:S2A_MSIL1C_20170102T111442_N0204_R137_T30TXT_20170102T111441.SAFE/MTD_MSIL1C.xml:TCI:EPSG_32630 S2A_MSIL1C_20170102T111442_N0204_R137_T30TXT_20170102T111441_TCI.tif -co TILED=YES -co COMPRESS=DEFLATE
gdaladdo -r average  S2A_MSIL1C_20170102T111442_N0204_R137_T30TXT_20170102T111441_TCI.tif 2 4 8 16 32
gdal_translate S2A_MSIL1C_20170102T111442_N0204_R137_T30TXT_20170102T111441_TCI.tif S2A_MSIL1C_20170102T111442_N0204_R137_T30TXT_20170102T111441_TCI_cloudoptimized_2.tif -co TILED=YES -co COMPRESS=JPEG -co PHOTOMETRIC=YCBCR -co COPY_SRC_OVERVIEWS=YES

Reading a single pixel

  • Regular GeoTIFF:
$ time GDAL_DISABLE_READDIR_ON_OPEN=YES CPL_VSIL_CURL_ALLOWED_EXTENSIONS=.tif gdallocationinfo --debug on \
   /vsicurl/http://even.rouault.free.fr/gtiff_test/S2A_MSIL1C_20170102T111442_N0204_R137_T30TXT_20170102T111441_TCI_regular_with_ovr_2.tif 5000 5000
VSICURL: GetFileSize(http://even.rouault.free.fr/gtiff_test/S2A_MSIL1C_20170102T111442_N0204_R137_T30TXT_20170102T111441_TCI_regular_with_ovr_2.tif)=3363607  response_code=200
VSICURL: Downloading 0-16383 (http://even.rouault.free.fr/gtiff_test/S2A_MSIL1C_20170102T111442_N0204_R137_T30TXT_20170102T111441_TCI_regular_with_ovr_2.tif)...
VSICURL: Got response_code=206
VSICURL: Downloading 16384-49151 (http://even.rouault.free.fr/gtiff_test/S2A_MSIL1C_20170102T111442_N0204_R137_T30TXT_20170102T111441_TCI_regular_with_ovr_2.tif)...
VSICURL: Got response_code=206
GDAL: GDALOpen(/vsicurl/http://even.rouault.free.fr/gtiff_test/S2A_MSIL1C_20170102T111442_N0204_R137_T30TXT_20170102T111441_TCI_regular_with_ovr_2.tif, this=0x7a84c0) succeeds as GTiff.
Report:
  Location: (5000P,5000L)
  Band 1:
GDAL: GDAL_CACHEMAX = 791 MB
VSICURL: Downloading 1556480-1572863 (http://even.rouault.free.fr/gtiff_test/S2A_MSIL1C_20170102T111442_N0204_R137_T30TXT_20170102T111441_TCI_regular_with_ovr_2.tif)...
VSICURL: Got response_code=206
    Value: 255
  Band 2:
    Value: 255
  Band 3:
    Value: 255
GDAL: GDALClose(/vsicurl/http://even.rouault.free.fr/gtiff_test/S2A_MSIL1C_20170102T111442_N0204_R137_T30TXT_20170102T111441_TCI_regular_with_ovr_2.tif, this=0x7a84c0)

real	0m0.520s
user	0m0.080s
sys	0m0.012s
  • Cloud optimized GeoTIFF:
$ time GDAL_DISABLE_READDIR_ON_OPEN=YES CPL_VSIL_CURL_ALLOWED_EXTENSIONS=.tif gdallocationinfo --debug on \
   /vsicurl/http://even.rouault.free.fr/gtiff_test/S2A_MSIL1C_20170102T111442_N0204_R137_T30TXT_20170102T111441_TCI_cloudoptimized_2.tif  5000 5000
VSICURL: GetFileSize(http://even.rouault.free.fr/gtiff_test/S2A_MSIL1C_20170102T111442_N0204_R137_T30TXT_20170102T111441_TCI_cloudoptimized_2.tif)=3355470  response_code=200
VSICURL: Downloading 0-16383 (http://even.rouault.free.fr/gtiff_test/S2A_MSIL1C_20170102T111442_N0204_R137_T30TXT_20170102T111441_TCI_cloudoptimized_2.tif)...
VSICURL: Got response_code=206
VSICURL: Downloading 16384-49151 (http://even.rouault.free.fr/gtiff_test/S2A_MSIL1C_20170102T111442_N0204_R137_T30TXT_20170102T111441_TCI_cloudoptimized_2.tif)...
VSICURL: Got response_code=206
GDAL: GDALOpen(/vsicurl/http://even.rouault.free.fr/gtiff_test/S2A_MSIL1C_20170102T111442_N0204_R137_T30TXT_20170102T111441_TCI_cloudoptimized_2.tif, this=0x1c544c0) succeeds as GTiff.
Report:
  Location: (5000P,5000L)
  Band 1:
GDAL: GDAL_CACHEMAX = 791 MB
VSICURL: Downloading 2785280-2801663 (http://even.rouault.free.fr/gtiff_test/S2A_MSIL1C_20170102T111442_N0204_R137_T30TXT_20170102T111441_TCI_cloudoptimized_2.tif)...
VSICURL: Got response_code=206
    Value: 255
  Band 2:
    Value: 255
  Band 3:
    Value: 255
GDAL: GDALClose(/vsicurl/http://even.rouault.free.fr/gtiff_test/S2A_MSIL1C_20170102T111442_N0204_R137_T30TXT_20170102T111441_TCI_cloudoptimized_2.tif, this=0x1c544c0)

real	0m0.527s
user	0m0.088s
sys	0m0.024s

No significant time difference (individual runs may differ by a few tens of milliseconds). Same amount of I/O (64 KB read). Note the use of CPL_VSIL_CURL_ALLOWED_EXTENSIONS=.tif to avoid reading any side car files (.aux.xml, etc...) and GDAL_DISABLE_READDIR_ON_OPEN=YES to avoid any attempt of listing the files in the same directory.

The same conclusions hold when the files are hosted on AWS S3, again with both GDAL_DISABLE_READDIR_ON_OPEN=YES and CPL_VSIL_CURL_ALLOWED_EXTENSIONS=.tif defined.
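
The 64 KB figure can be re-derived from the "VSICURL: Downloading" lines of the traces. The following Python sketch sums the inclusive byte ranges; the function name bytes_downloaded is hypothetical, and the sample trace reuses the three ranges of the regular GeoTIFF run above with the URLs abbreviated to "(...)".

```python
import re

DOWNLOAD_RE = re.compile(r"VSICURL: Downloading (\d+)-(\d+) ")

def bytes_downloaded(log_text):
    """Sum the sizes of the inclusive byte ranges reported in a --debug on trace."""
    total = 0
    for start, end in DOWNLOAD_RE.findall(log_text):
        total += int(end) - int(start) + 1
    return total

# Ranges taken from the regular GeoTIFF trace above; URLs abbreviated to "(...)".
trace = """\
VSICURL: Downloading 0-16383 (...)...
VSICURL: Downloading 16384-49151 (...)...
VSICURL: Downloading 1556480-1572863 (...)...
"""
print(bytes_downloaded(trace) // 1024, "KB")  # 64 KB
```

The cloud optimized trace contains the same three range sizes (16 KB, 32 KB and 16 KB), hence the identical total.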

Reading a block of pixels at full resolution

  • Regular GeoTIFF:
$ time GDAL_DISABLE_READDIR_ON_OPEN=YES CPL_VSIL_CURL_ALLOWED_EXTENSIONS=.tif gdal_translate --debug on \
   /vsicurl/http://even.rouault.free.fr/gtiff_test/S2A_MSIL1C_20170102T111442_N0204_R137_T30TXT_20170102T111441_TCI_regular_with_ovr_2.tif \
   -srcwin 1024 1024 256 256 out.tif
VSICURL: GetFileSize(http://even.rouault.free.fr/gtiff_test/S2A_MSIL1C_20170102T111442_N0204_R137_T30TXT_20170102T111441_TCI_regular_with_ovr_2.tif)=3363607  response_code=200
VSICURL: Downloading 0-16383 (http://even.rouault.free.fr/gtiff_test/S2A_MSIL1C_20170102T111442_N0204_R137_T30TXT_20170102T111441_TCI_regular_with_ovr_2.tif)...
VSICURL: Got response_code=206
VSICURL: Downloading 16384-49151 (http://even.rouault.free.fr/gtiff_test/S2A_MSIL1C_20170102T111442_N0204_R137_T30TXT_20170102T111441_TCI_regular_with_ovr_2.tif)...
VSICURL: Got response_code=206
GDAL: GDALOpen(/vsicurl/http://even.rouault.free.fr/gtiff_test/S2A_MSIL1C_20170102T111442_N0204_R137_T30TXT_20170102T111441_TCI_regular_with_ovr_2.tif, this=0xc6d620) succeeds as GTiff.
Input file size is 10980, 10980
GTiff: ScanDirectories()
VSICURL: Downloading 2113536-2129919 (http://even.rouault.free.fr/gtiff_test/S2A_MSIL1C_20170102T111442_N0204_R137_T30TXT_20170102T111441_TCI_regular_with_ovr_2.tif)...
VSICURL: Got response_code=206
VSICURL: Downloading 2129920-2162687 (http://even.rouault.free.fr/gtiff_test/S2A_MSIL1C_20170102T111442_N0204_R137_T30TXT_20170102T111441_TCI_regular_with_ovr_2.tif)...
VSICURL: Got response_code=206
GTiff: Opened 5490x5490 overview.
GTiff: Opened 2745x2745 overview.
GTiff: Opened 1373x1373 overview.
GTiff: Opened 687x687 overview.
GTiff: Opened 344x344 overview.
GDAL: GDALDefaultOverviews::OverviewScan()
VSICURL: Downloading 196608-212991 (http://even.rouault.free.fr/gtiff_test/S2A_MSIL1C_20170102T111442_N0204_R137_T30TXT_20170102T111441_TCI_regular_with_ovr_2.tif)...
VSICURL: Got response_code=206
...10...20...30...40...50...60...70...80...90...100 - done.

real	0m0.757s
user	0m0.100s
sys	0m0.032s

One can see that a directory scan is done (GTiff: ScanDirectories() trace), despite a few optimizations done in r37258 and r37259. This is due to gdal_translate trying to copy mask bands, which requires scanning the directories to find a potential internal mask band. This scan can be avoided by adding the -mask none switch.

$ time GDAL_DISABLE_READDIR_ON_OPEN=YES CPL_VSIL_CURL_ALLOWED_EXTENSIONS=.tif gdal_translate --debug on \
   /vsicurl/http://even.rouault.free.fr/gtiff_test/S2A_MSIL1C_20170102T111442_N0204_R137_T30TXT_20170102T111441_TCI_regular_with_ovr_2.tif \
   -srcwin 1024 1024 256 256 -mask none out.tif
VSICURL: GetFileSize(http://even.rouault.free.fr/gtiff_test/S2A_MSIL1C_20170102T111442_N0204_R137_T30TXT_20170102T111441_TCI_regular_with_ovr_2.tif)=3363607  response_code=200
VSICURL: Downloading 0-16383 (http://even.rouault.free.fr/gtiff_test/S2A_MSIL1C_20170102T111442_N0204_R137_T30TXT_20170102T111441_TCI_regular_with_ovr_2.tif)...
VSICURL: Got response_code=206
VSICURL: Downloading 16384-49151 (http://even.rouault.free.fr/gtiff_test/S2A_MSIL1C_20170102T111442_N0204_R137_T30TXT_20170102T111441_TCI_regular_with_ovr_2.tif)...
VSICURL: Got response_code=206
GDAL: GDALOpen(/vsicurl/http://even.rouault.free.fr/gtiff_test/S2A_MSIL1C_20170102T111442_N0204_R137_T30TXT_20170102T111441_TCI_regular_with_ovr_2.tif, this=0x1078620) succeeds as GTiff.
Input file size is 10980, 10980
GDAL: GDALDefaultOverviews::OverviewScan()
GDAL: GDALDatasetCopyWholeRaster(): 256*256 swaths, bInterleave=1
VSICURL: Downloading 196608-212991 (http://even.rouault.free.fr/gtiff_test/S2A_MSIL1C_20170102T111442_N0204_R137_T30TXT_20170102T111441_TCI_regular_with_ovr_2.tif)...
VSICURL: Got response_code=206
...10...20...30...40...50...60...70...80...90...100 - done.

real	0m0.518s
user	0m0.092s
sys	0m0.012s

Best timing on an AWS S3 bucket (us-east-1 region, accessed from France): ~ 1.9s

This is also the performance one gets with:

$ time GDAL_DISABLE_READDIR_ON_OPEN=YES CPL_VSIL_CURL_ALLOWED_EXTENSIONS=.tif \
    python -c 'from osgeo import gdal; ds = gdal.Open("/vsicurl/http://even.rouault.free.fr/gtiff_test/S2A_MSIL1C_20170102T111442_N0204_R137_T30TXT_20170102T111441_TCI_regular_with_ovr_2.tif"); ds.ReadAsArray(1024,1024,256,256)'
  • Cloud optimized GeoTIFF:
$ time GDAL_DISABLE_READDIR_ON_OPEN=YES CPL_VSIL_CURL_ALLOWED_EXTENSIONS=.tif gdal_translate --debug on \
  /vsicurl/http://even.rouault.free.fr/gtiff_test/S2A_MSIL1C_20170102T111442_N0204_R137_T30TXT_20170102T111441_TCI_cloudoptimized_2.tif \
  -srcwin 1024 1024 256 256 out.tif
VSICURL: GetFileSize(http://even.rouault.free.fr/gtiff_test/S2A_MSIL1C_20170102T111442_N0204_R137_T30TXT_20170102T111441_TCI_cloudoptimized_2.tif)=3355470  response_code=200
VSICURL: Downloading 0-16383 (http://even.rouault.free.fr/gtiff_test/S2A_MSIL1C_20170102T111442_N0204_R137_T30TXT_20170102T111441_TCI_cloudoptimized_2.tif)...
VSICURL: Got response_code=206
VSICURL: Downloading 16384-49151 (http://even.rouault.free.fr/gtiff_test/S2A_MSIL1C_20170102T111442_N0204_R137_T30TXT_20170102T111441_TCI_cloudoptimized_2.tif)...
VSICURL: Got response_code=206
GDAL: GDALOpen(/vsicurl/http://even.rouault.free.fr/gtiff_test/S2A_MSIL1C_20170102T111442_N0204_R137_T30TXT_20170102T111441_TCI_cloudoptimized_2.tif, this=0xdeb620) succeeds as GTiff.
Input file size is 10980, 10980
GTiff: ScanDirectories()
GTiff: Opened 5490x5490 overview.
GTiff: Opened 2745x2745 overview.
GTiff: Opened 1373x1373 overview.
GTiff: Opened 687x687 overview.
GTiff: Opened 344x344 overview.
VSICURL: Downloading 1425408-1441791 (http://even.rouault.free.fr/gtiff_test/S2A_MSIL1C_20170102T111442_N0204_R137_T30TXT_20170102T111441_TCI_cloudoptimized_2.tif)...
VSICURL: Got response_code=206
...10...20...30...40...50...60...70...80...90...100 - done.

real	0m0.519s
user	0m0.096s
sys	0m0.008s

Best timing on an AWS S3 bucket: ~ 1.9s

There is no need to specify -mask none to get the maximum performance: as the IFDs are at the beginning of the file, they have already been fetched with the first two HTTP GET requests.

Getting a subsampled version of the image

  • Regular GeoTIFF:
$ time GDAL_DISABLE_READDIR_ON_OPEN=YES CPL_VSIL_CURL_ALLOWED_EXTENSIONS=.tif gdal_translate --debug on \
  /vsicurl/http://even.rouault.free.fr/gtiff_test/S2A_MSIL1C_20170102T111442_N0204_R137_T30TXT_20170102T111441_TCI_regular_with_ovr_2.tif \
  out.tif -outsize 1% 1%

VSICURL: GetFileSize(http://even.rouault.free.fr/gtiff_test/S2A_MSIL1C_20170102T111442_N0204_R137_T30TXT_20170102T111441_TCI_regular_with_ovr_2.tif)=3363607  response_code=200
VSICURL: Downloading 0-16383 (http://even.rouault.free.fr/gtiff_test/S2A_MSIL1C_20170102T111442_N0204_R137_T30TXT_20170102T111441_TCI_regular_with_ovr_2.tif)...
VSICURL: Got response_code=206
VSICURL: Downloading 16384-49151 (http://even.rouault.free.fr/gtiff_test/S2A_MSIL1C_20170102T111442_N0204_R137_T30TXT_20170102T111441_TCI_regular_with_ovr_2.tif)...
VSICURL: Got response_code=206
GDAL: GDALOpen(/vsicurl/http://even.rouault.free.fr/gtiff_test/S2A_MSIL1C_20170102T111442_N0204_R137_T30TXT_20170102T111441_TCI_regular_with_ovr_2.tif, this=0x23d7610) succeeds as GTiff.
Input file size is 10980, 10980
GTiff: ScanDirectories()
VSICURL: Downloading 2113536-2129919 (http://even.rouault.free.fr/gtiff_test/S2A_MSIL1C_20170102T111442_N0204_R137_T30TXT_20170102T111441_TCI_regular_with_ovr_2.tif)...
VSICURL: Got response_code=206
VSICURL: Downloading 2129920-2162687 (http://even.rouault.free.fr/gtiff_test/S2A_MSIL1C_20170102T111442_N0204_R137_T30TXT_20170102T111441_TCI_regular_with_ovr_2.tif)...
VSICURL: Got response_code=206
GTiff: Opened 5490x5490 overview.
GTiff: Opened 2745x2745 overview.
GTiff: Opened 1373x1373 overview.
GTiff: Opened 687x687 overview.
GTiff: Opened 344x344 overview.
VSICURL: Downloading 3342336-3358719 (http://even.rouault.free.fr/gtiff_test/S2A_MSIL1C_20170102T111442_N0204_R137_T30TXT_20170102T111441_TCI_regular_with_ovr_2.tif)...
VSICURL: Got response_code=206
VSICURL: Downloading 3358720-3363606 (http://even.rouault.free.fr/gtiff_test/S2A_MSIL1C_20170102T111442_N0204_R137_T30TXT_20170102T111441_TCI_regular_with_ovr_2.tif)...
VSICURL: Got response_code=206
...10...20...30...40...50...60...70...80...90...100 - done.

real	0m0.810s
user	0m0.108s
sys	0m0.020s

Best timing on an AWS S3 bucket: ~ 2.5s

A full scan of the IFDs is necessary to find the appropriate overview level.

  • Cloud optimized GeoTIFF:
$ time GDAL_DISABLE_READDIR_ON_OPEN=YES CPL_VSIL_CURL_ALLOWED_EXTENSIONS=.tif gdal_translate --debug on \
  /vsicurl/http://even.rouault.free.fr/gtiff_test/S2A_MSIL1C_20170102T111442_N0204_R137_T30TXT_20170102T111441_TCI_cloudoptimized_2.tif \
  out.tif -outsize 1% 1%
VSICURL: GetFileSize(http://even.rouault.free.fr/gtiff_test/S2A_MSIL1C_20170102T111442_N0204_R137_T30TXT_20170102T111441_TCI_cloudoptimized_2.tif)=3355470  response_code=200
VSICURL: Downloading 0-16383 (http://even.rouault.free.fr/gtiff_test/S2A_MSIL1C_20170102T111442_N0204_R137_T30TXT_20170102T111441_TCI_cloudoptimized_2.tif)...
VSICURL: Got response_code=206
VSICURL: Downloading 16384-49151 (http://even.rouault.free.fr/gtiff_test/S2A_MSIL1C_20170102T111442_N0204_R137_T30TXT_20170102T111441_TCI_cloudoptimized_2.tif)...
VSICURL: Got response_code=206
GDAL: GDALOpen(/vsicurl/http://even.rouault.free.fr/gtiff_test/S2A_MSIL1C_20170102T111442_N0204_R137_T30TXT_20170102T111441_TCI_cloudoptimized_2.tif, this=0x1847610) succeeds as GTiff.
Input file size is 10980, 10980
GTiff: ScanDirectories()
GTiff: Opened 5490x5490 overview.
GTiff: Opened 2745x2745 overview.
GTiff: Opened 1373x1373 overview.
GTiff: Opened 687x687 overview.
GTiff: Opened 344x344 overview.
GDAL: GDALDatasetCopyWholeRaster(): 109*109 swaths, bInterleave=1
...10...20...30...40...50...60...70...80...90...100 - done.

real	0m0.435s
user	0m0.088s
sys	0m0.028s

Best timing on an AWS S3 bucket: ~ 1.5s

As the IFDs are at the beginning of the file, as well as the pixel data for the smallest overview, the request can be completed with the first two HTTP GET requests (this is a bit of an extreme case of course).
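
The overview dimensions reported in the traces (5490, 2745, 1373, 687, 344) are consistent with dividing the 10980 pixel width by each gdaladdo factor and rounding up. The sketch below reproduces them and applies a deliberately simplified selection rule, assumed here for illustration only: take the smallest overview that is still at least as large as the requested output. GDAL's actual selection logic has more nuances, and the function names are hypothetical.

```python
def overview_sizes(full_size, factors=(2, 4, 8, 16, 32)):
    """Overview widths obtained by ceiling division of the full resolution width."""
    return [-(-full_size // f) for f in factors]

def pick_overview(full_size, target_size, factors=(2, 4, 8, 16, 32)):
    """Smallest overview width >= target_size, or None if full resolution is needed.

    Simplified rule assumed for illustration, not GDAL's exact algorithm.
    """
    candidates = [s for s in overview_sizes(full_size, factors) if s >= target_size]
    return min(candidates) if candidates else None

print(overview_sizes(10980))      # [5490, 2745, 1373, 687, 344]
print(pick_overview(10980, 109))  # 344
```

With a 109 pixel wide output (the swath size shown in the trace above), this rule lands on the 344 pixel overview, whose tiles sit right after the IFDs in a cloud optimized file.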
