
Version 4 (modified by Even Rouault, 16 years ago)


Reading a GDAL dataset in a .gz file or a .zip archive

Summary

Starting with GDAL 1.6.0, it is possible to access a GDAL dataset inside a compressed archive and read it on the fly. Two archive formats are handled: single gzipped files (ending with .gz) and ZIP archives (ending with .zip).

This is implemented as two virtual file systems, /vsigzip and /vsizip (/vsimem is another example of such a file system). Only read-only access is supported. Note that performance will not be very impressive, since random access inside gzip data is slow by nature. Some optimizations nevertheless make it faster and generally usable: "snapshots" of the gzip decompression state are taken from time to time, so a later access to a given offset inside the file only needs to restart decompression from the nearest preceding snapshot.
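The snapshot idea can be sketched in Python with the standard zlib module. This is an illustration under assumptions, not GDAL's C implementation: the class name GzipSnapshotReader, the snapshot interval and the 1 KiB input chunk size are arbitrary choices made here. While the stream is decompressed once, a copy of the decompressor state is saved roughly every snapshot_every uncompressed bytes; a later read copies the nearest earlier snapshot and resumes from there instead of from the start of the file.

```python
import zlib


class GzipSnapshotReader:
    """Random access into gzip data by resuming from saved decompressor states.

    Illustrative sketch of the snapshot technique, not GDAL's implementation.
    """

    CHUNK = 1024  # compressed bytes fed to zlib per step (arbitrary choice)

    def __init__(self, compressed: bytes, snapshot_every: int = 64 * 1024):
        self.data = compressed
        # Each snapshot: (uncompressed offset, compressed offset, zlib state copy).
        self.snapshots = []
        d = zlib.decompressobj(wbits=16 + zlib.MAX_WBITS)  # 16+: expect a gzip header
        out_pos, next_snap = 0, 0
        for in_pos in range(0, len(compressed), self.CHUNK):
            if out_pos >= next_snap:
                self.snapshots.append((out_pos, in_pos, d.copy()))
                next_snap = out_pos + snapshot_every
            out_pos += len(d.decompress(compressed[in_pos:in_pos + self.CHUNK]))
        self.size = out_pos + len(d.flush())

    def read(self, offset: int, length: int) -> bytes:
        # Resume from the last snapshot at or before the requested offset.
        out_pos, in_pos, state = max(
            (s for s in self.snapshots if s[0] <= offset), key=lambda s: s[0])
        d = state.copy()  # copy again so the stored snapshot stays reusable
        buf = bytearray()
        while out_pos + len(buf) < offset + length and in_pos < len(self.data):
            buf += d.decompress(self.data[in_pos:in_pos + self.CHUNK])
            in_pos += self.CHUNK
        start = offset - out_pos
        return bytes(buf[start:start + length])
```

Smaller snapshot intervals make seeks cheaper at the cost of memory, since each saved zlib state carries its own 32 KiB history window.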

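Note that .tar.gz archives are not supported, and there are no plans to support them: the format is very poorly suited to seeking, since a tar archive has no central index and reaching a member means decompressing everything stored before it.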

How to use this capability with a gzip file?

For example:

gdalinfo /vsigzip/path/to/the/file.gz

where path/to/the/file.gz is a relative or absolute path.

If the path is absolute, it should begin with / on a Unix-like OS (or with something like C:\ on Windows), so the argument looks like /vsigzip//home/gdal/...
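From Python, the same path can be passed to gdal.Open() in the GDAL bindings. The sketch below treats the osgeo.gdal package as optional; vsigzip_path() and open_gzipped() are hypothetical helper names. Building the path is plain string concatenation, which is why an absolute Unix path ends up with a double slash.

```python
def vsigzip_path(path: str) -> str:
    """Build a /vsigzip/ path from a relative or absolute path (hypothetical helper)."""
    # No slash is stripped: "/home/gdal/f.gz" becomes "/vsigzip//home/gdal/f.gz".
    return "/vsigzip/" + path


def open_gzipped(path: str):
    """Open a gzipped dataset with GDAL, or return None if the bindings are missing."""
    try:
        from osgeo import gdal  # GDAL Python bindings, assumed optional here
    except ImportError:
        return None
    # gdal.Open() returns None on failure unless gdal.UseExceptions() is active.
    return gdal.Open(vsigzip_path(path))
```

For instance, open_gzipped("/home/gdal/file.gz") opens /vsigzip//home/gdal/file.gz, and the returned dataset exposes the usual RasterXSize, RasterYSize and RasterCount properties.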

The first time a .gz file is read, a small .gz.properties file is generated (if possible) to record the uncompressed data size. This makes subsequent openings of that dataset much faster.
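The caching behaviour can be sketched as follows. The sidecar file name follows the .gz.properties convention described above, but its content here (a single uncompressed_size=N line) is an assumption: GDAL's actual file may store different or additional keys, so the two are not interchangeable. The function name cached_uncompressed_size() is likewise hypothetical.

```python
import zlib


def cached_uncompressed_size(gz_path: str) -> int:
    """Return the uncompressed size of gz_path, caching it in a sidecar file.

    The "<file>.gz.properties" name mirrors GDAL's convention; the
    "uncompressed_size=N" content is an assumption made for this sketch.
    """
    props = gz_path + ".properties"
    try:
        with open(props) as f:
            for line in f:
                key, _, value = line.strip().partition("=")
                if key == "uncompressed_size":
                    return int(value)
    except (OSError, ValueError):
        pass  # no usable cache: fall through and compute the size
    # Full decompression pass: slow, which is why the result is cached.
    d = zlib.decompressobj(wbits=16 + zlib.MAX_WBITS)
    size = 0
    with open(gz_path, "rb") as f:
        while chunk := f.read(1 << 16):
            size += len(d.decompress(chunk))
    size += len(d.flush())
    try:
        with open(props, "w") as f:
            f.write(f"uncompressed_size={size}\n")
    except OSError:
        pass  # only write the cache "if possible", e.g. skip read-only dirs
    return size
```

The sketch does not detect a stale cache if the .gz file is later replaced; a production reader would need to check that.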

A VSIStatL("/vsigzip/...") call will return the uncompressed size of the file.
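For contrast with the compressed size on disk, a gzip stream also records a cheap hint of its uncompressed length: per RFC 1952, the last four bytes of a member hold ISIZE, the uncompressed size modulo 2^32 in little-endian order. The hypothetical helper gzip_isize() below reads it. Because the value wraps for inputs of 4 GiB or more and only describes the last member of a multi-member file, a robust reader cannot rely on it alone.

```python
import struct


def gzip_isize(gz_bytes: bytes) -> int:
    """Return the ISIZE trailer field of a gzip stream (hypothetical helper).

    ISIZE is the uncompressed size modulo 2**32 (RFC 1952), stored little-endian
    in the final 4 bytes of the last gzip member.
    """
    # A gzip member has at least a 10-byte header and an 8-byte trailer.
    if len(gz_bytes) < 18:
        raise ValueError("data too short to be a gzip stream")
    (isize,) = struct.unpack("<I", gz_bytes[-4:])
    return isize
```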

How to use this capability with a ZIP archive?

gdalinfo /vsizip/path/to/the/file.zip/path/inside/the/zip/file

where path/to/the/file.zip is a relative or absolute path and path/inside/the/zip/file is the path of the file relative to the root of the archive.

If the path is absolute, it should begin with / on a Unix-like OS (or with something like C:\ on Windows), so the argument looks like /vsizip//home/gdal/...

For example:

gdalinfo /vsizip/myarchive.zip/subdir1/file1.tif
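Splitting such a path into the archive and the member is mostly a matter of finding the .zip component. The sketch below then reads the member with Python's standard zipfile module, which is conceptually similar to what /vsizip/ exposes. split_vsizip_path() and read_vsizip_member() are hypothetical names, and splitting at the first ".zip/" is a simplification that ignores nested archives and Windows backslashes.

```python
import zipfile
from typing import Tuple


def split_vsizip_path(path: str) -> Tuple[str, str]:
    """Split "/vsizip/<archive>.zip/<inner path>" into (archive, inner path)."""
    prefix = "/vsizip/"
    if not path.startswith(prefix):
        raise ValueError("not a /vsizip/ path")
    rest = path[len(prefix):]
    # Simplification: the archive ends at the first ".zip/" (case-insensitive).
    idx = rest.lower().find(".zip/")
    if idx == -1:
        if rest.lower().endswith(".zip"):
            return rest, ""  # no inner path given
        raise ValueError("no .zip archive found in path")
    end = idx + len(".zip")
    return rest[:end], rest[end + 1:]


def read_vsizip_member(path: str) -> bytes:
    """Read the uncompressed bytes of the member named by a /vsizip/ path."""
    archive, inner = split_vsizip_path(path)
    with zipfile.ZipFile(archive) as zf:
        return zf.read(inner)
```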

The ReadDir() method is implemented for .zip archives, so a driver can find files located relative to the given file inside the archive. For example, you can read a CADRG dataset from a ZIP archive containing the a.toc file and all its NITF tiles.
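To see why ReadDir() matters, consider a driver that opens a.toc and then needs the tiles stored beside it. The hypothetical list_siblings() function below, built on Python's zipfile module, lists the entries directly under the directory that contains a given member, including subdirectories that only exist implicitly through their files' paths. With the GDAL Python bindings, gdal.ReadDir() on a /vsizip/ directory path gives the real listing; this sketch only illustrates the idea.

```python
import posixpath
import zipfile


def list_siblings(zf: zipfile.ZipFile, member: str) -> list:
    """List entries stored in the same archive directory as `member`.

    Rough analogue of ReadDir() on the directory containing a /vsizip/ file;
    not GDAL code.
    """
    parent = posixpath.dirname(member)
    prefix = parent + "/" if parent else ""
    names = set()
    for name in zf.namelist():
        if name.startswith(prefix) and name != prefix:
            # Keep only the first path component below `parent`.
            names.add(name[len(prefix):].split("/", 1)[0])
    return sorted(names)
```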

A VSIStatL("/vsizip/...") call will return the uncompressed size of the file. Directories inside the ZIP file can be distinguished from regular files with the VSI_ISDIR(stat.st_mode) macro, just as on regular file systems.
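The same distinction can be sketched with Python's zipfile module. ZIP archives do not always store explicit directory entries, so the hypothetical stat_zip_entry() below treats a name as a directory when other entries live under it, and otherwise reports the member's uncompressed file_size rather than its compress_size. Reporting 0 as the size of a directory is a choice made for this sketch.

```python
import zipfile
from typing import NamedTuple


class ZipStat(NamedTuple):
    size: int     # uncompressed size in bytes (0 for directories in this sketch)
    is_dir: bool


def stat_zip_entry(zf: zipfile.ZipFile, inner: str) -> ZipStat:
    """Return stat-like information for a path inside an open ZIP archive."""
    inner = inner.rstrip("/")
    # A directory may exist only implicitly, through the paths of its files.
    if any(name.startswith(inner + "/") for name in zf.namelist()):
        return ZipStat(size=0, is_dir=True)
    try:
        info = zf.getinfo(inner)
    except KeyError:
        raise FileNotFoundError(inner) from None
    return ZipStat(size=info.file_size, is_dir=False)
```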

A bit of syntactic sugar: if the .zip file contains only one file, located at its root, simply specifying "/vsizip/path/to/the/file.zip" will work.
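The shortcut amounts to checking that the archive holds exactly one file and that it sits at the root. A minimal sketch with Python's zipfile module follows; resolve_single_member() is a hypothetical name, and GDAL's own rule may differ in edge cases such as archives that also contain empty directory entries (ignored here).

```python
import zipfile


def resolve_single_member(zf: zipfile.ZipFile) -> str:
    """Return the only file in the archive if it sits at the root."""
    # Explicit directory entries are not files, so they are skipped.
    files = [info.filename for info in zf.infolist() if not info.is_dir()]
    if len(files) == 1 and "/" not in files[0]:
        return files[0]
    raise ValueError("archive does not contain exactly one file at its root")
```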

Drivers supporting that capability

Because this capability is implemented as virtual file systems, it only works with GDAL drivers that support the "large file API". Drivers known to support it include:

  • PNG
  • JPEG
  • ILWIS
  • GTiff
  • GIF
  • JP2KAK
  • NITF
  • ADRG
  • DTED
  • SRTMHGT
  • BMP
  • LCP
  • HFA (Erdas Imagine)
  • AAIGRID
  • EHdr

Other drivers may work too (I only looked for those advertising the GDAL_DCAP_VIRTUALIO capability).
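With a current GDAL installation, the list can be regenerated the same way, by querying each registered driver for the GDAL_DCAP_VIRTUALIO capability, whose metadata key is the string "DCAP_VIRTUALIO". The sketch assumes the osgeo.gdal Python bindings and returns an empty list when they are not installed; virtual_io_drivers() is a hypothetical name. Its output depends on the GDAL version and build options, so it will not match the list above exactly.

```python
def virtual_io_drivers() -> list:
    """Return short names of drivers advertising DCAP_VIRTUALIO=YES."""
    try:
        from osgeo import gdal  # GDAL Python bindings, assumed optional here
    except ImportError:
        return []  # bindings not installed: nothing to query
    names = []
    for i in range(gdal.GetDriverCount()):
        drv = gdal.GetDriver(i)
        # GDAL_DCAP_VIRTUALIO is defined as the metadata key "DCAP_VIRTUALIO".
        if drv.GetMetadataItem("DCAP_VIRTUALIO") == "YES":
            names.append(drv.ShortName)
    return sorted(names)
```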
