Reading a GDAL dataset in a .gz file or a .zip archive

Summary

Since GDAL 1.6.0, it is possible to access a GDAL dataset inside a compressed archive and read it on the fly. The supported archive formats are single gzipped files (ending with .gz) and ZIP archives (ending with .zip).

This is implemented as two virtual file systems (similar to /vsimem), /vsigzip and /vsizip. Access to .zip files is read-only, while the /vsigzip handler supports reading and sequential writing. Note that performance will not be very impressive, as random access inside gzip data is slow by nature. Some optimizations make it faster and generally usable: "snapshots" of the gzip state are taken from time to time, so that a later access to a given offset inside the file only needs to restart decompression from the nearest snapshot.
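The snapshot idea can be illustrated with a short Python sketch. This is not GDAL's implementation: the class name SnapshotGzipReader, the chunk size, and the choice to take one snapshot per input chunk are all assumptions made for illustration. It relies on the standard zlib module, whose decompressor objects can be copied to save their state.

```python
import bisect
import zlib


class SnapshotGzipReader:
    """Random access into gzip data using periodic decompressor snapshots.

    Illustrative only: assumes non-empty, single-member gzip data held in memory.
    """

    def __init__(self, compressed, chunk_size=4096):
        self.compressed = compressed
        self.chunk_size = chunk_size
        self.out_offsets = []  # uncompressed offset at each snapshot
        self.snapshots = []    # (compressed offset, saved decompressor state)
        decomp = zlib.decompressobj(31)  # wbits=31 selects the gzip container
        out_pos = 0
        for in_pos in range(0, len(compressed), chunk_size):
            # Save the state before consuming this chunk.
            self.out_offsets.append(out_pos)
            self.snapshots.append((in_pos, decomp.copy()))
            out_pos += len(decomp.decompress(compressed[in_pos:in_pos + chunk_size]))
        self.size = out_pos  # total uncompressed size

    def read_at(self, offset, length):
        """Return up to `length` uncompressed bytes starting at `offset` (>= 0)."""
        # Find the nearest snapshot at or before the requested offset.
        i = bisect.bisect_right(self.out_offsets, offset) - 1
        out_pos = self.out_offsets[i]
        in_pos, saved = self.snapshots[i]
        decomp = saved.copy()  # copy so the stored snapshot stays reusable
        end = offset + length
        result = bytearray()
        while out_pos < end and in_pos < len(self.compressed):
            data = decomp.decompress(self.compressed[in_pos:in_pos + self.chunk_size])
            in_pos += self.chunk_size
            # Keep only the part of this output that overlaps [offset, end).
            start = max(offset - out_pos, 0)
            result += data[start:end - out_pos]
            out_pos += len(data)
        return bytes(result)
```

Without snapshots, every read_at() call would have to decompress from the beginning of the stream; with them, only the data between the nearest snapshot and the requested range is decompressed again. The price is the memory held by each saved state.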

Since GDAL 1.8.0, it is also possible to access a file inside a .tar (uncompressed) or a .tar.gz/.tgz (compressed) archive.

How to use that capability with a gzip file?

For example:

gdalinfo /vsigzip/path/to/the/file.gz

where path/to/the/file.gz is relative or absolute.

If the path is absolute, it should begin with a / on a Unix-like OS (or C:\ on Windows), so the line looks like /vsigzip//home/gdal/...
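The double slash for absolute paths is simply the result of concatenating the /vsigzip/ prefix with a path that itself starts with /. A minimal Python sketch, where the helper name vsigzip_path and the sample file names are hypothetical:

```python
def vsigzip_path(gz_path):
    """Prefix a relative or absolute .gz path with /vsigzip/.

    Hypothetical helper: the path is used as-is, so an absolute Unix path
    keeps its leading slash and yields "/vsigzip//...".
    """
    return "/vsigzip/" + gz_path


if __name__ == "__main__":
    print(vsigzip_path("data/file.gz"))        # relative path
    print(vsigzip_path("/home/gdal/file.gz"))  # absolute path, double slash
```

The resulting string can be given to gdalinfo on the command line or, when using the GDAL Python bindings, passed to gdal.Open() like any other dataset name.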

The first time a .gz file is read, a small .gz.properties file is generated (if possible) to record the uncompressed data size. This makes subsequent openings of that dataset much faster.

A VSIStatL("/vsigzip/...") call will return the uncompressed size of the file.
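Why a sidecar file helps can be sketched in Python: finding the uncompressed size of a gzip stream requires decompressing it once, so caching the result avoids repeating that work. This is an illustration only. The function name uncompressed_size and the uncompressed_size=<n> line written to the sidecar are assumptions, not a description of the exact file GDAL writes, and the sketch does not check whether the .gz file has changed since the sidecar was created.

```python
import gzip
import os


def uncompressed_size(gz_path):
    """Return the uncompressed size of a .gz file, caching it in a sidecar file."""
    sidecar = gz_path + ".properties"  # e.g. file.gz -> file.gz.properties
    if os.path.exists(sidecar):
        # Fast path: reuse the size recorded on a previous call.
        with open(sidecar) as f:
            for line in f:
                key, _, value = line.strip().partition("=")
                if key == "uncompressed_size":  # assumed key name
                    return int(value)
    # Slow path: decompress the whole stream once just to count the bytes.
    size = 0
    with gzip.open(gz_path, "rb") as f:
        while True:
            block = f.read(1 << 16)
            if not block:
                break
            size += len(block)
    try:
        with open(sidecar, "w") as f:
            f.write("uncompressed_size=%d\n" % size)
    except OSError:
        pass  # read-only location: continue without a cache
    return size
```

The first call pays the full decompression cost; later calls only read a few bytes from the sidecar.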

The special file handler is VSIInstallGZipFileHandler().

How to use that capability with a ZIP archive?

gdalinfo /vsizip/path/to/the/file.zip/path/inside/the/zip/file

where path/to/the/file.zip is relative or absolute and path/inside/the/zip/file is the relative path to the file inside the archive.

If the path is absolute, it should begin with a / on a Unix-like OS (or C:\ on Windows), so the line looks like /vsizip//home/gdal/...

For example gdalinfo /vsizip/myarchive.zip/subdir1/file1.tif

The ReadDir() method is implemented for .zip archives, so a driver can find files relative to the given file inside the archive. For example, you can read a CADRG dataset from a zipped archive containing the a.toc file and all its NITF tiles.

A VSIStatL("/vsizip/...") call will return the uncompressed size of the file. Directories inside the ZIP file can be distinguished from regular files with the VSI_ISDIR(stat.st_mode) macro as for regular file systems.
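To find out which inner paths an archive offers, the ZIP directory can be listed before building the /vsizip/ name. The following Python sketch uses only the standard zipfile module, not GDAL itself; the function name list_vsizip_entries is hypothetical. For each entry it reports the same kind of information described above: whether it is a directory and what its uncompressed size is.

```python
import zipfile


def list_vsizip_entries(zip_path):
    """Return (vsizip path, is_directory, uncompressed size) for each entry."""
    entries = []
    with zipfile.ZipFile(zip_path) as archive:
        for info in archive.infolist():
            # Directory entries are stored with a trailing slash; drop it.
            inner = info.filename.rstrip("/")
            vsi_path = "/vsizip/" + zip_path + "/" + inner
            entries.append((vsi_path, info.is_dir(), info.file_size))
    return entries
```

Any non-directory path returned this way has the form /vsizip/path/to/the/file.zip/path/inside/the/zip/file and can be passed to gdalinfo.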

A bit of syntactic sugar: if the .zip file contains only one file, located at its root, just mentioning "/vsizip/path/to/the/file.zip" will work.

The special file handler is VSIInstallZipFileHandler().

How to use that capability with a .tar or .tar.gz/.tgz archive?

Since GDAL 1.8.0, it is possible to access a file inside a .tar (uncompressed) or a .tar.gz/.tgz (compressed) archive. The syntax is very similar to the /vsizip case:

gdalinfo /vsitar/path/to/the/file.tar/path/inside/the/tar/file

or

gdalinfo /vsitar/path/to/the/file.tar.gz/path/inside/the/targz/file

Note that reading a file in a .tar.gz archive is far less efficient than in a .zip file, because in the .tar.gz case the whole archive is compressed as a single stream, whereas in the .zip case files are compressed individually. Reading the last file in a .tar.gz archive therefore requires decompressing all the files located before it.

Reading a file in an uncompressed .tar file does not incur that penalty.
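The cost difference can be made concrete with Python's standard tarfile module (this is an illustration, not GDAL code; the function name bytes_to_skip is hypothetical). The position of a member's data within the tar stream is the amount of data that precedes it. In a plain .tar a reader can seek straight to that position, whereas in a .tar.gz all of those bytes must be decompressed first.

```python
import tarfile


def bytes_to_skip(tar_path, member_name):
    """Return how many bytes of the uncompressed tar stream precede a member's data."""
    # "r:*" lets tarfile detect transparently whether the archive is gzipped.
    with tarfile.open(tar_path, "r:*") as archive:
        return archive.getmember(member_name).offset_data
```

For the last member of a large .tar.gz archive this value is close to the total uncompressed size of the archive, which is why placing frequently read files first, or using .zip or plain .tar, can be preferable.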

The special file handler is VSIInstallTarFileHandler().

Drivers supporting that capability

Because this capability is implemented as virtual file systems, it only works for GDAL drivers that support the "large file API". A non-exhaustive list of such drivers is:

  • PNG
  • JPEG
  • ILWIS
  • GTiff
  • GIF
  • JP2KAK
  • NITF
  • ADRG
  • DTED
  • SRTMHGT
  • BMP
  • LCP
  • HFA (Erdas Imagine)
  • AAIGRID
  • EHdr

The full list can be obtained by looking at the drivers marked with 'v' when running 'gdalinfo --formats'.
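Picking out the 'v' drivers can be automated by filtering the text printed by 'gdalinfo --formats'. The Python sketch below is an assumption-laden illustration: it expects lines shaped like "  GTiff (rw+v): GeoTIFF", tolerates an optional "-raster-" style tag between the name and the flags, and assumes short driver names contain no spaces. The function name drivers_with_virtual_io is hypothetical, and the exact output layout may differ between GDAL versions.

```python
import re

# Matches lines such as "  GTiff (rw+v): GeoTIFF". The optional middle group
# tolerates a "-raster-" style tag between the driver name and the flags.
_FORMAT_LINE = re.compile(r"^\s*(\S+)\s+(?:-[^()]*-\s+)?\(([^)]*)\):")


def drivers_with_virtual_io(formats_text):
    """Return driver short names whose capability flags contain 'v'."""
    names = []
    for line in formats_text.splitlines():
        match = _FORMAT_LINE.match(line)
        if match and "v" in match.group(2):
            names.append(match.group(1))
    return names
```

The text to filter would typically be captured from the command, for example with subprocess.run(["gdalinfo", "--formats"], capture_output=True, text=True).stdout on a machine where GDAL is installed.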

vsicurl - to read from HTTP or FTP files (partial downloading)

  • According to the 1.8.0 release notes, vsicurl is one of the new virtual file system handlers.
  • It allows partial HTTP or FTP downloading (e.g. running ogrinfo on a very large shapefile over the internet).
  • More information is available on the VSIInstallCurlFileHandler() page.

Example

ogrinfo a shapefile on the internet:

ogrinfo -ro -al -so /vsicurl/http://svn.osgeo.org/gdal/trunk/autotest/ogr/data/poly.shp

Complex example (combining with vsizip)

ogrinfo a shapefile in a zip file on the internet:

ogrinfo -ro -al -so /vsizip/vsicurl/http://svn.osgeo.org/gdal/trunk/autotest/ogr/data/poly.zip

Complex example (combining with vsizip and password on ftp)

ogrinfo a shapefile in a zip file on an FTP server:

ogrinfo -ro -al -so /vsizip/vsicurl/ftp://user:password@example.com/foldername/file.zip/example.shp 
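The chained names in these examples follow one pattern: the /vsizip/ prefix, then vsicurl/ and the URL of the archive, then optionally the path of the file inside the archive. A minimal Python sketch of that pattern, with a hypothetical helper name vsizip_over_vsicurl:

```python
def vsizip_over_vsicurl(url, inner_path=None):
    """Build a /vsizip/vsicurl/... name for a remote .zip archive.

    Hypothetical helper mirroring the command lines above; it does no
    validation or escaping of the URL.
    """
    path = "/vsizip/vsicurl/" + url
    if inner_path:
        # Path of the file inside the archive, relative to its root.
        path += "/" + inner_path.lstrip("/")
    return path


if __name__ == "__main__":
    print(vsizip_over_vsicurl(
        "ftp://user:password@example.com/foldername/file.zip", "example.shp"))
```

Leaving out inner_path corresponds to the single-file shortcut used in the poly.zip example.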

Notes