= Reading a GDAL dataset in a .gz file or a .zip archive =

== Summary ==

Starting with GDAL 1.6.0, it is possible to access a GDAL dataset inside a compressed archive and read it on the fly.
The supported archive formats are single gzipped files (ending with .gz) and ZIP archives (ending with .zip).

This is implemented as two virtual file systems (like /vsimem, for
example): /vsigzip and /vsizip. Only read-only access is supported.
Note that performance will not be very impressive, as random access inside gzip data
is slow by nature. Some optimizations have been made to make it
faster and generally usable: "snapshots" of the gzip state are taken from time to time, so
that a later access to a given offset inside the file only needs to restart decompression
from the nearest snapshot.
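The snapshot idea can be illustrated with a small, self-contained Python sketch. It is not GDAL's actual implementation: the class name `SnapshotGzipReader`, its parameters, and the snapshot interval are all assumptions made up for illustration, and it works on gzip data held in memory. It only relies on the standard `zlib` module, whose decompression objects can be copied.

```python
import bisect
import gzip
import zlib


class SnapshotGzipReader:
    """Toy random-access reader over in-memory gzip data (illustrative only)."""

    def __init__(self, compressed, chunk_size=1024, snapshot_every=64 * 1024):
        self.compressed = compressed
        self.chunk_size = chunk_size
        d = zlib.decompressobj(wbits=31)  # 31 = expect a gzip header and trailer
        # Each snapshot: (uncompressed offset, compressed offset, saved state).
        self.snapshots = [(0, 0, d.copy())]
        out_pos = 0
        for in_pos in range(0, len(compressed), chunk_size):
            out_pos += len(d.decompress(compressed[in_pos:in_pos + chunk_size]))
            far_enough = out_pos - self.snapshots[-1][0] >= snapshot_every
            if far_enough and not d.eof:
                self.snapshots.append((out_pos, in_pos + chunk_size, d.copy()))
        self.size = out_pos  # total uncompressed size

    def read_at(self, offset, length):
        """Return up to `length` uncompressed bytes starting at `offset`."""
        # Pick the last snapshot taken at or before the requested offset.
        starts = [s[0] for s in self.snapshots]
        idx = bisect.bisect_right(starts, offset) - 1
        out_pos, in_pos, saved = self.snapshots[idx]
        d = saved.copy()  # keep the stored snapshot reusable
        result = bytearray()
        while in_pos < len(self.compressed) and len(result) < length:
            data = d.decompress(self.compressed[in_pos:in_pos + self.chunk_size])
            in_pos += self.chunk_size
            # Discard the bytes located before the requested offset.
            skip = max(0, offset - out_pos)
            result += data[skip:]
            out_pos += len(data)
        return bytes(result[:length])


payload = b"".join(b"%d\n" % i for i in range(200000))
reader = SnapshotGzipReader(gzip.compress(payload))
assert reader.read_at(1000000, 20) == payload[1000000:1000020]
```

In this sketch a read near the end of the data restarts from the closest earlier snapshot instead of from the beginning of the stream, at the cost of keeping one copied decompressor state per snapshot in memory.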

Note that .tar.gz archives are not supported, and there is no plan to support them,
as the format is very poorly suited to seeking.

== How to use that capability with a gzip file? ==

For example:
{{{
gdalinfo /vsigzip/path/to/the/file.gz
}}}
where path/to/the/file.gz is relative or absolute. If the path is absolute, it should begin with a /, so the argument looks like /vsigzip//home/gdal/...
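The path rule above can be sketched in Python. The helper names `vsigzip_path` and `open_gz` are hypothetical (they are not part of GDAL), and `open_gz` assumes the GDAL Python bindings (`osgeo.gdal`) are installed; the file names are made up.

```python
def vsigzip_path(gz_path):
    """Prefix a .gz path with /vsigzip/ (hypothetical helper, not a GDAL API).

    An absolute path keeps its leading slash, which yields a double slash.
    """
    return "/vsigzip/" + gz_path


def open_gz(gz_path):
    """Open a gzipped dataset read-only. Assumes the GDAL Python bindings."""
    from osgeo import gdal  # imported lazily so the helper above works without GDAL
    return gdal.Open(vsigzip_path(gz_path))


print(vsigzip_path("data/dem.tif.gz"))        # /vsigzip/data/dem.tif.gz
print(vsigzip_path("/home/gdal/dem.tif.gz"))  # /vsigzip//home/gdal/dem.tif.gz
```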

The first time a .gz file is read, a small .gz.properties file is
generated (if possible) to record the uncompressed data size. This makes
subsequent opening of that dataset much faster.

A VSIStatL("/vsigzip/...") call returns the uncompressed size of the file.

== How to use that capability with a ZIP archive? ==

{{{
gdalinfo /vsizip/path/to/the/file.zip/path/inside/the/zip/file
}}}

where path/to/the/file.zip is relative or absolute and path/inside/the/zip/file is
the relative path to the file inside the archive.

For example: gdalinfo /vsizip/myarchive.zip/subdir1/file1.tif

The ReadDir() method is implemented for .zip archives, so a driver is
able to find files relative to the given file inside the archive. For
example, you can read a CADRG dataset from a zipped archive containing the a.toc
file and all its NITF tiles.

A VSIStatL("/vsizip/...") call returns the uncompressed size of the file.
Directories inside the ZIP file can be distinguished from regular files with
the VSI_ISDIR(stat.st_mode) macro, as for regular file systems.
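From Python, the same listing and stat information can be obtained roughly as follows. This is a sketch that assumes the GDAL Python bindings, where `gdal.ReadDir()` returns a list of names (or None) and `gdal.VSIStatL()` returns an object exposing `size` and `IsDirectory()` (or None). The function name `describe_entries` is hypothetical, and the `gdal` module is passed in as a parameter only so that the sketch can be exercised without GDAL installed.

```python
def describe_entries(vsi_dir, gdal):
    """Map each entry of a /vsizip/ directory to "dir" or its uncompressed size.

    `gdal` is expected to be the osgeo.gdal module (or an object mimicking
    its ReadDir and VSIStatL functions).
    """
    entries = {}
    for name in gdal.ReadDir(vsi_dir) or []:
        st = gdal.VSIStatL(vsi_dir + "/" + name)
        if st is None:
            continue  # entry could not be stat'ed, skip it
        # IsDirectory() plays the role of VSI_ISDIR(stat.st_mode) in C.
        entries[name] = "dir" if st.IsDirectory() else st.size
    return entries
```

With GDAL available, a call could look like `from osgeo import gdal` followed by `describe_entries("/vsizip/myarchive.zip", gdal)`.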

A bit of syntactic sugar: if the .zip file contains only one file, located at its
root, just specifying "/vsizip/path/to/the/file.zip" will work.
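The two ways of addressing a ZIP archive can be sketched as a small Python helper. The name `vsizip_path` is hypothetical (not a GDAL function). The check for the single-file shorthand uses the standard `zipfile` module and is only an approximation of the rule stated above, not GDAL's own logic.

```python
import zipfile


def vsizip_path(zip_path, inner_path=None):
    """Build a /vsizip/ path (hypothetical helper, not a GDAL API).

    With inner_path, return /vsizip/<zip_path>/<inner_path>.
    Without it, return the shorthand /vsizip/<zip_path>, but only if the
    archive holds exactly one file located at its root.
    """
    if inner_path is not None:
        return "/vsizip/" + zip_path + "/" + inner_path
    with zipfile.ZipFile(zip_path) as zf:
        names = zf.namelist()
    if len(names) != 1 or "/" in names[0]:
        raise ValueError("shorthand needs exactly one file at the archive root")
    return "/vsizip/" + zip_path


print(vsizip_path("myarchive.zip", "subdir1/file1.tif"))
```

Raising an error when the archive holds several files is a design choice of this sketch: it makes the caller name the inner file explicitly instead of guessing.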

== Drivers supporting that capability ==

Because this capability is implemented as virtual file systems,
it only works for GDAL drivers that support the "large file API". Such
drivers include: PNG, JPEG, ILWIS, GTiff, GIF, JP2KAK, NITF, ADRG,
DTED, SRTMHGT, BMP, LCP, HFA (Erdas Imagine), AAIGRID. Other drivers may work
too (I only looked for those advertising the GDAL_DCAP_VIRTUALIO capability).
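
A list like the one above can be regenerated for a given GDAL build with a short Python sketch. It assumes the GDAL Python bindings, where driver objects expose `ShortName` and `GetMetadataItem()`, and where the GDAL_DCAP_VIRTUALIO macro corresponds to the metadata key "DCAP_VIRTUALIO" with the value "YES". Both function names are hypothetical.

```python
def virtualio_driver_names(drivers):
    """Return the sorted short names of drivers advertising DCAP_VIRTUALIO."""
    return sorted(
        drv.ShortName
        for drv in drivers
        if drv.GetMetadataItem("DCAP_VIRTUALIO") == "YES"
    )


def installed_gdal_drivers():
    """List the drivers of the local GDAL build. Assumes the Python bindings."""
    from osgeo import gdal  # imported lazily; only needed for a real query
    return [gdal.GetDriver(i) for i in range(gdal.GetDriverCount())]
```

With GDAL installed, `virtualio_driver_names(installed_gdal_drivers())` should give the short names for the local build; the result depends on the GDAL version and on which drivers were compiled in.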