Reading a GDAL dataset in a .gz file or a .zip archive
From GDAL 1.6.0, it is possible to access a GDAL dataset inside a compressed archive and read it on-the-fly. The supported archive formats are single gzip-compressed files (ending with .gz) and ZIP archives (ending with .zip).
This is implemented as two virtual file systems (like /vsimem, for example): /vsigzip and /vsizip. Access to .zip files is read-only, while the /vsigzip handler supports reading and sequential writing. Note that performance will not be very impressive, because random access inside gzip data is inherently slow. Some optimizations make it faster and generally usable: "snapshots" of the gzip state are taken from time to time, so that a later access to a given offset inside the file only needs to restart decompression from the nearest snapshot.
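The snapshot idea can be illustrated with Python's standard zlib module. This is only a sketch of the general technique, not GDAL's actual implementation: the class name SnapshotGzipReader, the chunk size and the test data are all assumptions made for the example.

```python
import bisect
import gzip
import random
import zlib


class SnapshotGzipReader:
    """Random reads in gzip data using saved decompressor states (illustrative only)."""

    def __init__(self, compressed, chunk=4096):
        self.compressed = compressed
        self.chunk = chunk
        decomp = zlib.decompressobj(wbits=31)  # 31 selects the gzip container
        # Each snapshot: (uncompressed offset, compressed offset, saved state)
        self.snapshots = [(0, 0, decomp.copy())]
        produced = 0
        for pos in range(0, len(compressed), chunk):
            produced += len(decomp.decompress(compressed[pos:pos + chunk]))
            if not decomp.eof:
                self.snapshots.append((produced, pos + chunk, decomp.copy()))
        self.size = produced  # total uncompressed size
        self._offsets = [snap[0] for snap in self.snapshots]

    def read(self, offset, length):
        # Pick the nearest snapshot at or before the requested uncompressed offset.
        idx = bisect.bisect_right(self._offsets, offset) - 1
        out_pos, in_pos, state = self.snapshots[idx]
        decomp = state.copy()  # copy so the stored snapshot stays reusable
        needed = offset - out_pos + length
        buf = bytearray()
        while len(buf) < needed and in_pos < len(self.compressed):
            buf += decomp.decompress(self.compressed[in_pos:in_pos + self.chunk])
            in_pos += self.chunk
        start = offset - out_pos
        return bytes(buf[start:start + length])


# Hypothetical test data: pseudo-random bytes so the gzip stream spans many chunks.
rng = random.Random(0)
data = bytes(rng.getrandbits(8) for _ in range(100_000))
reader = SnapshotGzipReader(gzip.compress(data), chunk=1024)
sample = reader.read(90_000, 16)
```

A read near the end of the data restarts from the closest saved state rather than from the start of the stream, at the cost of keeping one decompressor copy per snapshot in memory.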
From GDAL 1.8.0, it is also possible to access a file inside a .tar (uncompressed) or a .tar.gz/.tgz (compressed) archive.
How to use that capability with a gzip file?
For example:
gdalinfo /vsigzip/path/to/the/file.gz
where path/to/the/file.gz is relative or absolute.
If the path is absolute, it should begin with a / on a Unix-like OS (or C:\ on Windows), so the line looks like /vsigzip//home/gdal/...
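The path rule amounts to plain string concatenation, which is why an absolute Unix path produces a double slash. A minimal sketch, assuming a hypothetical helper named vsigzip_path and made-up file names:

```python
def vsigzip_path(path):
    """Prefix a relative or absolute file path with /vsigzip/ (hypothetical helper)."""
    return "/vsigzip/" + path


# File names below are invented for illustration.
relative = vsigzip_path("data/dem.tif.gz")
absolute = vsigzip_path("/home/gdal/dem.tif.gz")  # note the resulting double slash
```

The resulting string can be passed wherever a dataset name is expected, for example as the argument to gdalinfo.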
The first time a .gz file is read, a small .gz.properties file is generated (if possible) to record the uncompressed data size. This makes subsequent opening of that dataset much faster.
A VSIStatL("/vsigzip/...") call will return the uncompressed size of the file.
The special file handler is VSIInstallGZipFileHandler()
How to use that capability with a ZIP archive?
The syntax is:
/vsizip/path/to/the/file.zip/path/inside/the/zip/file
where path/to/the/file.zip is relative or absolute and path/inside/the/zip/file is the relative path to the file inside the archive.
If the path is absolute, it should begin with a / on a Unix-like OS (or C:\ on Windows), so the line looks like /vsizip//home/gdal/...
For example:
gdalinfo /vsizip/myarchive.zip/subdir1/file1.tif
The ReadDir() method is implemented for .zip archives, so a driver is able to find files relative to the given file inside the archive. For example, you can read a CADRG dataset from a zipped archive containing the a.toc file and all its NITF tiles.
A VSIStatL("/vsizip/...") call will return the uncompressed size of the file. Directories inside the ZIP file can be distinguished from regular files with the VSI_ISDIR(stat.st_mode) macro as for regular file systems.
A small piece of syntactic sugar: if the .zip file contains only one file located at its root, just mentioning "/vsizip/path/to/the/file.zip" will work.
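To find out which /vsizip/ paths an archive offers, the member list can be inspected before calling GDAL. The sketch below uses Python's standard zipfile module rather than GDAL itself. The helper name vsizip_targets is hypothetical, and the archive is a throwaway file created only for the demonstration.

```python
import os
import tempfile
import zipfile


def vsizip_targets(zip_path):
    """List /vsizip/ paths for the regular files in an archive (hypothetical helper)."""
    with zipfile.ZipFile(zip_path) as zf:
        # Skip directory entries; only regular files can be opened as datasets.
        files = [info.filename for info in zf.infolist() if not info.is_dir()]
    if len(files) == 1 and "/" not in files[0]:
        # Single file at the archive root: the short form is enough.
        return ["/vsizip/" + zip_path]
    return ["/vsizip/" + zip_path + "/" + name for name in files]


# Demonstration with a temporary archive; the contents are placeholders, not real TIFFs.
with tempfile.TemporaryDirectory() as tmp:
    archive = os.path.join(tmp, "myarchive.zip")
    with zipfile.ZipFile(archive, "w") as zf:
        zf.writestr("subdir1/", "")  # explicit directory entry
        zf.writestr("subdir1/file1.tif", b"placeholder")
        zf.writestr("subdir1/file2.tif", b"placeholder")
    targets = vsizip_targets(archive)

    single = os.path.join(tmp, "single.zip")
    with zipfile.ZipFile(single, "w") as zf:
        zf.writestr("only.tif", b"placeholder")
    short_form = vsizip_targets(single)
```

Because the temporary archive path is absolute, the generated strings contain the double slash described above on Unix-like systems.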
The special file handler is VSIInstallZipFileHandler()
How to use that capability with a .tar or .tar.gz/.tgz archive?
Since GDAL 1.8.0, it is possible to access a file inside a .tar (uncompressed) or a .tar.gz/.tgz (compressed) archive. The syntax is very similar to the /vsizip case:
/vsitar/path/to/the/file.tar/path/inside/the/tar/file
Note that reading a file in a .tar.gz archive is far less efficient than in a .zip file, because in the .tar.gz case the whole archive is compressed as one stream, whereas in the .zip case files are compressed individually. Reading the last file in a .tar.gz archive therefore requires uncompressing all the files located before it.
Reading a file in an uncompressed .tar file does not incur that penalty.
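The difference between the two layouts can be observed with Python's standard zipfile, tarfile and zlib modules. This sketch is independent of GDAL, and the member names and payloads are invented. It inflates the last ZIP member directly from its own offset, then reports how many uncompressed bytes precede the last member in a .tar.gz stream.

```python
import io
import struct
import tarfile
import zipfile
import zlib

first = b"A" * 100_000               # stand-in for a large first file
last = b"last file payload " * 10    # the member we actually want

# ZIP: every member is compressed on its own.
zbuf = io.BytesIO()
with zipfile.ZipFile(zbuf, "w", zipfile.ZIP_DEFLATED) as zf:
    zf.writestr("first.bin", first)
    zf.writestr("last.bin", last)
raw = zbuf.getvalue()
with zipfile.ZipFile(io.BytesIO(raw)) as zf:
    info = zf.getinfo("last.bin")
# Local file header: 30 fixed bytes; name length at byte 26, extra length at byte 28.
name_len, extra_len = struct.unpack(
    "<HH", raw[info.header_offset + 26:info.header_offset + 30])
start = info.header_offset + 30 + name_len + extra_len
# Inflate only this member's raw deflate data; first.bin is never touched.
zip_last = zlib.decompress(raw[start:start + info.compress_size], wbits=-15)

# TAR.GZ: one gzip stream wraps the whole tar file.
tbuf = io.BytesIO()
with tarfile.open(fileobj=tbuf, mode="w:gz") as tf:
    for name, payload in (("first.bin", first), ("last.bin", last)):
        entry = tarfile.TarInfo(name)
        entry.size = len(payload)
        tf.addfile(entry, io.BytesIO(payload))
tbuf.seek(0)
with tarfile.open(fileobj=tbuf, mode="r:gz") as tf:
    # Offset of the member's data inside the uncompressed tar stream, i.e. the
    # number of bytes that must be decompressed before reaching it.
    bytes_before_last = tf.getmember("last.bin").offset_data
```

In the ZIP case the cost of reaching last.bin does not depend on the size of first.bin, whereas in the .tar.gz case bytes_before_last grows with everything stored ahead of it.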
The special file handler is VSIInstallTarFileHandler()
Note for multi-file data types
...such as "EHdr/ESRI .hdr Labelled"
/vsigzip simply cannot work for datasets made of multiple files.
For /vsizip, you must explicitly point to the image file within the zip file: use /vsizip/test.zip/data.flt instead of /vsizip/test.zip.
Drivers supporting that capability
Because this capability is implemented as virtual file systems, it only works for GDAL drivers supporting the "large file API". A non-exhaustive list of such drivers is:
- HFA (Erdas Imagine)
The full list can be obtained by looking at the drivers marked with 'v' when running 'gdalinfo --formats'.
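Filtering that listing can be automated. The sketch below assumes each driver line has the shape "NAME (flags): long name", optionally with a "-raster-" style marker before the flags as printed by newer GDAL versions. The sample text is illustrative, not captured output, and the EXAMPLE driver line is hypothetical.

```python
import re

# Assumed line shape: "  NAME (flags): Long name" or "  NAME -raster- (flags): Long name"
DRIVER_LINE = re.compile(r"^\s*(\S+)\s+(?:-\S+-\s+)?\(([^)]*)\):\s*(.*)$")


def virtual_io_drivers(formats_output):
    """Return short names of drivers whose flags contain 'v' (hypothetical helper)."""
    names = []
    for line in formats_output.splitlines():
        match = DRIVER_LINE.match(line)
        if match and "v" in match.group(2):
            names.append(match.group(1))
    return names


# Illustrative input only; the last line is a made-up driver without the 'v' flag.
sample_output = """Supported Formats:
  GTiff (rw+v): GeoTIFF
  HFA (rw+v): Erdas Imagine Images (.img)
  EXAMPLE (ro): Hypothetical driver without virtual I/O
"""
drivers = virtual_io_drivers(sample_output)
```

With GDAL installed, the real listing could be captured with subprocess.run(["gdalinfo", "--formats"], capture_output=True, text=True).stdout and passed to the same function.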
vsicurl - to read from HTTP or FTP files (partial downloading)
- According to the 1.8.0 release notes, vsicurl is one of the virtual file system handlers.
- This allows partial HTTP or FTP downloading (e.g. running ogrinfo on a very large shapefile over the internet).
- More information is available on the VSIInstallCurlFileHandler() page
ogrinfo a shapefile on the internet:
ogrinfo -ro -al -so /vsicurl/http://svn.osgeo.org/gdal/trunk/autotest/ogr/data/poly.shp
Complex example (combining with vsizip)
ogrinfo a shapefile in a zip file on the internet:
ogrinfo -ro -al -so /vsizip/vsicurl/http://svn.osgeo.org/gdal/trunk/autotest/ogr/data/poly.zip
Complex example (combining with vsizip and password on ftp)
ogrinfo a shapefile in a zip file on an ftp:
ogrinfo -ro -al -so /vsizip/vsicurl/ftp://user:password@example.com/foldername/file.zip/example.shp
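The chained paths in these examples follow one pattern: handler names are listed outermost first, followed by the URL or file path, then optionally the path inside the archive. A minimal sketch, assuming a hypothetical helper named chain_vsi:

```python
def chain_vsi(handlers, target, inner=""):
    """Build a chained virtual path, outermost handler first (hypothetical helper)."""
    prefix = "".join("/" + name for name in handlers)
    path = prefix + "/" + target
    # Append the path inside the archive only when one is given.
    return path + "/" + inner if inner else path


remote_shp = chain_vsi(
    ["vsicurl"],
    "http://svn.osgeo.org/gdal/trunk/autotest/ogr/data/poly.shp")
remote_zip = chain_vsi(
    ["vsizip", "vsicurl"],
    "http://svn.osgeo.org/gdal/trunk/autotest/ogr/data/poly.zip")
```

The order matters: /vsizip comes first because the ZIP handler reads the archive through the file that /vsicurl exposes.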
- In GDAL 1.x, there is no easy way to know whether an OGR driver supports VSI virtual file handlers such as vsizip, vsicurl, etc. See the discussion on the mailing list: http://lists.osgeo.org/pipermail/gdal-dev/2011-April/028384.html. With the GDAL/OGR unification in GDAL 2.0, vector drivers can report whether they support the "large file API".