= Reading a GDAL dataset in a .gz file or a .zip archive =

== Summary ==

From GDAL 1.6.0, it is possible to access a GDAL dataset inside a compressed archive and read it on-the-fly.

The archive formats that are handled are single gzip'ed files (ending with .gz) and ZIP archives (ending with .zip). This is implemented as two virtual file systems (like /vsimem, for example): /vsigzip and /vsizip. Only read-only access is supported for .zip files, while /vsigzip supports reading and sequential writing.

Note that performance will not be very impressive, as random access inside gzip data is slow by nature. Some optimizations have been made to make it faster and generally usable: "snapshots" of the gzip state are taken from time to time, so that further access to a given offset inside the file just needs to restart decompression from the nearest snapshot.

From GDAL 1.8.0, it is also possible to access a file inside a .tar (uncompressed) or a .tar.gz/.tgz (compressed) archive.

== How to use that capability with a gzip file? ==

For example:
{{{
gdalinfo /vsigzip/path/to/the/file.gz
}}}

where path/to/the/file.gz is relative or absolute. If the path is absolute, it should begin with a / on a Unix-like OS (or C:\ on Windows), so the line looks like /vsigzip//home/gdal/...

The first time that a .gz file is read, a small .gz.properties file will be generated (if possible) to capture the uncompressed data size. This makes subsequent openings of that dataset much faster.

A VSIStatL("/vsigzip/...") call will return the uncompressed size of the file.

The special file handler is [http://gdal.org/cpl__vsi_8h.html#3cde09f204df6f417653b7af4761178e VSIInstallGZipFileHandler()]

== How to use that capability with a ZIP archive? ==

{{{
gdalinfo /vsizip/path/to/the/file.zip/path/inside/the/zip/file
}}}

where path/to/the/file.zip is relative or absolute and path/inside/the/zip/file is the relative path to the file inside the archive.
If the path is absolute, it should begin with a / on a Unix-like OS (or C:\ on Windows), so the line looks like /vsizip//home/gdal/...

For example: gdalinfo /vsizip/myarchive.zip/subdir1/file1.tif

The ReadDir() method is implemented for .zip archives, so a driver is able to find files relative to the given file inside the archive. For example, you can read a CADRG dataset from a zipped archive containing the a.toc file and all its NITF tiles.

A VSIStatL("/vsizip/...") call will return the uncompressed size of the file. Directories inside the ZIP file can be distinguished from regular files with the VSI_ISDIR(stat.st_mode) macro, as for regular file systems.

A little syntactic sugar: if the .zip file contains only one file located at its root, just mentioning "/vsizip/path/to/the/file.zip" will work.

The special file handler is [http://gdal.org/cpl__vsi_8h.html#884fac3cd6be2c09deb807e959d78b3a VSIInstallZipFileHandler()]

== How to use that capability with a .tar or .tar.gz/.tgz archive? ==

Since GDAL 1.8.0, it is possible to access a file inside a .tar (uncompressed) or a .tar.gz/.tgz (compressed) archive. The syntax is very similar to the /vsizip case:
{{{
gdalinfo /vsitar/path/to/the/file.tar/path/inside/the/tar/file
}}}
or
{{{
gdalinfo /vsitar/path/to/the/file.tar.gz/path/inside/the/targz/file
}}}

Note that reading a file in a .tar.gz archive is far less efficient than in a .zip file, because in the .tar.gz case the whole archive is compressed as one stream, whereas in the .zip case files are compressed individually. Reading the last file in a .tar.gz archive therefore requires decompressing all the files located before it. Reading a file in an uncompressed .tar file does not incur that penalty.

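
The difference between the two archive layouts can be seen without GDAL at all. The following is a minimal sketch using only the Python standard library; the member names and contents are made up for illustration, and it says nothing about how the /vsizip or /vsitar handlers are implemented internally. It only shows that a ZIP archive records a byte offset for each member in the compressed file, while the member offsets of a .tar.gz exist only in the decompressed stream.

```python
import gzip
import io
import tarfile
import zipfile

# Two hypothetical members; names and contents are made up for illustration.
MEMBERS = {"first.txt": b"a" * 10_000, "last.txt": b"b" * 10_000}


def build_zip(members):
    """Return the bytes of a ZIP archive with each member deflated separately."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for name, data in members.items():
            zf.writestr(name, data)
    return buf.getvalue()


def build_targz(members):
    """Return the bytes of a .tar.gz archive: one gzip stream around a tar."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w:gz") as tf:
        for name, data in members.items():
            info = tarfile.TarInfo(name)
            info.size = len(data)
            tf.addfile(info, io.BytesIO(data))
    return buf.getvalue()


zip_bytes = build_zip(MEMBERS)
targz_bytes = build_targz(MEMBERS)

# ZIP: the central directory records where each member starts in the archive
# file itself, so a reader can seek to "last.txt" and inflate only that member.
with zipfile.ZipFile(io.BytesIO(zip_bytes)) as zf:
    zip_offsets = {i.filename: i.header_offset for i in zf.infolist()}

# tar.gz: member offsets only exist in the *uncompressed* tar stream, so
# everything stored before "last.txt" must be decompressed to reach it.
tar_bytes = gzip.decompress(targz_bytes)
with tarfile.open(fileobj=io.BytesIO(tar_bytes)) as tf:
    tar_offsets = {m.name: m.offset_data for m in tf.getmembers()}

print("zip member offsets (in the compressed file):", zip_offsets)
print("tar member offsets (in the decompressed stream):", tar_offsets)
```

For an uncompressed .tar, the offsets in `tar_offsets` apply directly to the file on disk, which is why the penalty disappears in that case.
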
The special file handler is [http://gdal.org/cpl__vsi_8h.html#d6dd983338849e7da4eaa88f6458ab64 VSIInstallTarFileHandler()]

== Drivers supporting that capability ==

Because this capability is implemented as virtual file systems, it only works for GDAL drivers that support the "large file API". A non-exhaustive list of such drivers is:
 * PNG
 * JPEG
 * ILWIS
 * GTiff
 * GIF
 * JP2KAK
 * NITF
 * ADRG
 * DTED
 * SRTMHGT
 * BMP
 * LCP
 * HFA (Erdas Imagine)
 * AAIGRID
 * EHdr

The full list can be obtained by looking at the drivers marked with 'v' when running 'gdalinfo --formats'.

= vsicurl - to read from HTTP or FTP files (partial downloading) =

 * According to the [http://trac.osgeo.org/gdal/wiki/Release/1.8.0-News 1.8.0 release notes], vsicurl is one of the virtual file system handlers.
 * This allows partial HTTP or FTP downloading (e.g. running ogrinfo on a very large shapefile over the internet).
 * More information is available on the [http://gdal.org/cpl__vsi_8h.html#4f791960f2d86713d16e99e9c0c36258 VSIInstallCurlFileHandler()] page.

== Example ==

ogrinfo a shapefile on the internet:
{{{
ogrinfo -ro -al -so /vsicurl/http://svn.osgeo.org/gdal/trunk/autotest/ogr/data/poly.shp
}}}

== Complex example (combining with vsizip) ==

ogrinfo a shapefile in a zip file on the internet:
{{{
ogrinfo -ro -al -so /vsizip/vsicurl/http://svn.osgeo.org/gdal/trunk/autotest/ogr/data/poly.zip
}}}

== Complex example (combining with vsizip and password on ftp) ==

ogrinfo a shapefile in a zip file on an FTP server:
{{{
ogrinfo -ro -al -so /vsizip/vsicurl/ftp://user:password@example.com/foldername/file.zip/example.shp
}}}

== Notes ==

 * There is no easy way of knowing whether an OGR driver supports VSI virtual file handlers such as vsizip, vsicurl, etc. See more detail about this on the mailing list: [http://lists.osgeo.org/pipermail/gdal-dev/2011-April/028384.html]
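
The path conventions used throughout this page (a handler prefix, then the archive path, then an optional path inside the archive, with handlers chained as in the /vsizip/vsicurl/ examples) can be sketched as plain string composition. This is only an illustration: `vsi_path` and `vsicurl` are hypothetical helper names, not part of GDAL, and /home/gdal/file.gz is a made-up path.

```python
def vsicurl(url):
    """Prefix an HTTP or FTP URL with the /vsicurl/ handler."""
    return "/vsicurl/" + url


def vsi_path(handler, archive, inner=None):
    """Compose '/<handler>/<archive>[/<inner>]' as written on this page.

    handler: "vsigzip", "vsizip" or "vsitar" (no slashes).
    archive: relative path, absolute path, or another /vsi... path.
    inner:   relative path of the file inside the archive, if any.
    """
    if archive.startswith("/vsi"):
        # Chained handlers, written as in the examples: /vsizip/vsicurl/...
        path = "/" + handler + archive
    else:
        # An absolute Unix path keeps its leading '/', giving /vsizip//home/...
        path = "/" + handler + "/" + archive
    if inner is not None:
        path += "/" + inner.lstrip("/")
    return path


# Absolute path to a gzip file (hypothetical location).
print(vsi_path("vsigzip", "/home/gdal/file.gz"))
# File inside a local ZIP archive, as in the example above.
print(vsi_path("vsizip", "myarchive.zip", "subdir1/file1.tif"))
# Shapefile inside a ZIP archive on an FTP server.
print(vsi_path(
    "vsizip",
    vsicurl("ftp://user:password@example.com/foldername/file.zip"),
    "example.shp",
))
```

The resulting strings are what gets passed to gdalinfo or ogrinfo on the command line. Assuming the GDAL Python bindings are installed, the same strings can also be given to `gdal.Open()` or `ogr.Open()`.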