Changes between Version 17 and Version 18 of UserDocs/ReadInZip


Timestamp:
Oct 9, 2017, 3:05:09 PM
Author:
Even Rouault
Comment:

Content moved to official documentation

= Reading GDAL or OGR datasets in archive files =

== Summary ==

Since GDAL 1.6.0, it has been possible to access a GDAL dataset inside a compressed archive and read it on the fly.
The supported archive formats are single gzip-compressed files (ending with .gz) and ZIP archives (ending with .zip).

This is implemented as two virtual file systems (like `/vsimem`), `/vsigzip` and `/vsizip`. Only read access is supported for .zip files, while the `/vsigzip` handler supports reading and sequential writing.
Note that performance is not very impressive, because random access inside gzip data is inherently slow. Some optimizations make it faster and generally usable: "snapshots" of the gzip state are taken from time to time, so that a later access to a given offset inside the file only needs to restart decompression from the nearest snapshot.
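
To make the snapshot idea concrete, here is a minimal Python sketch built on the standard `zlib` module. It is not GDAL's implementation: the class name `SnapshotGzipReader`, the 64 KiB snapshot interval, the 4 KiB feed size and holding the whole .gz file in memory are all assumptions chosen for illustration.

```python
import bisect
import gzip
import zlib


class SnapshotGzipReader:
    """Random-access reads inside an in-memory .gz blob (illustrative sketch).

    While scanning the stream once, a copy of the zlib decompressor state is
    saved roughly every `interval` uncompressed bytes. A later read restarts
    decompression from the nearest saved state at or before the requested
    offset instead of from the start of the file.
    """

    def __init__(self, blob, interval=64 * 1024, feed=4096):
        self.blob = blob
        self.feed = feed              # compressed bytes fed per step
        self.offsets = []             # uncompressed offset of each snapshot
        self.snapshots = []           # (compressed offset, decompressor copy)
        self.last_decompressed = 0    # bytes inflated by the most recent read()
        d = zlib.decompressobj(16 + zlib.MAX_WBITS)  # 16 + MAX_WBITS: gzip wrapper
        out_pos = in_pos = next_snap = 0
        while in_pos < len(blob):
            if out_pos >= next_snap:
                self.offsets.append(out_pos)
                self.snapshots.append((in_pos, d.copy()))
                next_snap = out_pos + interval
            chunk = blob[in_pos:in_pos + feed]
            in_pos += len(chunk)
            out_pos += len(d.decompress(chunk))
        self.size = out_pos           # total uncompressed size

    def read(self, offset, size):
        # Nearest snapshot whose uncompressed offset is <= the requested offset.
        idx = bisect.bisect_right(self.offsets, offset) - 1
        start_out = self.offsets[idx]
        in_pos, saved = self.snapshots[idx]
        d = saved.copy()              # keep the stored snapshot untouched
        buf = bytearray()
        while start_out + len(buf) < offset + size and in_pos < len(self.blob):
            chunk = self.blob[in_pos:in_pos + self.feed]
            in_pos += len(chunk)
            buf += d.decompress(chunk)
        self.last_decompressed = len(buf)
        skip = offset - start_out
        return bytes(buf[skip:skip + size])


if __name__ == "__main__":
    data = b"".join((i * 2654435761 % 2**32).to_bytes(4, "little") for i in range(1 << 18))
    reader = SnapshotGzipReader(gzip.compress(data))
    assert reader.read(900_000, 100) == data[900_000:900_100]
    print(reader.last_decompressed, "of", reader.size, "bytes inflated")
```

A read near the end of the file then inflates roughly one snapshot interval of data rather than everything before the offset. Each stored decompressor copy carries its own history window (up to 32 KiB), so the interval trades memory for speed.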

Since GDAL 1.8.0, it is also possible to access a file inside a .tar (uncompressed) or
a .tar.gz/.tgz (compressed) archive.

== How to use this capability with a gzip file? ==

For example:
{{{
gdalinfo /vsigzip/path/to/the/file.gz
}}}
where path/to/the/file.gz is a relative or absolute path.

If the path is absolute, it should begin with a `/` on a Unix-like OS (or `C:\` on Windows), so the line looks like `/vsigzip//home/gdal/...`.

The first time a .gz file is read, a small .gz.properties file is
generated (if possible) to record the uncompressed data size. This makes
subsequent openings of that dataset much faster.

A `VSIStatL("/vsigzip/...")` call returns the uncompressed size of the file.
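
The caching of the uncompressed size can be sketched in Python as follows. Only the sidecar naming (`<file>.gz.properties`) comes from the text above; the function name `uncompressed_size` and the `compressed_size=`/`uncompressed_size=` key/value layout are hypothetical choices for this sketch, not GDAL's actual file format.

```python
import gzip
import os


def uncompressed_size(gz_path):
    """Return the uncompressed size of a .gz file, caching it in a sidecar.

    The sidecar is named "<gz_path>.properties" like GDAL's .gz.properties
    file, but its "key=value" content here is a hypothetical format.
    """
    props_path = gz_path + ".properties"
    compressed_size = os.path.getsize(gz_path)
    try:
        with open(props_path, encoding="ascii") as f:
            props = dict(line.strip().split("=", 1) for line in f if "=" in line)
        # Trust the cache only if it describes a file of the same compressed size.
        if int(props["compressed_size"]) == compressed_size:
            return int(props["uncompressed_size"])
    except (OSError, KeyError, ValueError):
        pass  # no usable cache: decompress once below

    total = 0
    with gzip.open(gz_path, "rb") as f:
        while chunk := f.read(1 << 16):
            total += len(chunk)

    try:
        with open(props_path, "w", encoding="ascii") as f:
            f.write(f"compressed_size={compressed_size}\nuncompressed_size={total}\n")
    except OSError:
        pass  # "if possible": e.g. a read-only directory just means no cache
    return total
```

The first call pays for a full decompression; later calls only read the small sidecar, which is the effect the .gz.properties file is described as having.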

The handler is installed by [http://gdal.org/cpl__vsi_8h.html#3cde09f204df6f417653b7af4761178e VSIInstallGZipFileHandler()].

== How to use this capability with a ZIP archive? ==

{{{
gdalinfo /vsizip/path/to/the/file.zip/path/inside/the/zip/file
}}}

where path/to/the/file.zip is a relative or absolute path and path/inside/the/zip/file is
the relative path to the file inside the archive.

If the path is absolute, it should begin with a `/` on a Unix-like OS (or `C:\` on Windows), so the line looks like `/vsizip//home/gdal/...`.

For example: `gdalinfo /vsizip/myarchive.zip/subdir1/file1.tif`
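
A small helper can make the path syntax explicit. `vsi_archive_path` is a hypothetical name used only for illustration: it concatenates the prefix, the archive path and the inner path verbatim, which is why an absolute Unix archive path yields the double slash mentioned above.

```python
def vsi_archive_path(archive, inner="", prefix="vsizip"):
    """Compose "/<prefix>/<archive>[/<inner>]" (illustrative helper).

    The archive path is used verbatim, so an absolute Unix path such as
    "/home/gdal/data.zip" produces "/vsizip//home/gdal/data.zip/...".
    """
    path = "/" + prefix + "/" + archive
    inner = inner.lstrip("/")  # inner paths are relative to the archive root
    return path + "/" + inner if inner else path


if __name__ == "__main__":
    print(vsi_archive_path("myarchive.zip", "subdir1/file1.tif"))
    # /vsizip/myarchive.zip/subdir1/file1.tif
```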

The ReadDir() method is implemented for .zip archives, so a driver is
able to find files relative to the given file inside the archive. For
example, you can read a CADRG dataset from a ZIP archive containing the a.toc
file and all its NITF tiles.
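
The effect of ReadDir() on a ZIP archive can be approximated with Python's `zipfile` module: given the directory of the file a driver opened, list the names next to it. This is a sketch, not GDAL code; `read_dir` is a hypothetical helper and the CADRG-like member names in the demo are made up.

```python
import io
import zipfile


def read_dir(zf, directory=""):
    """List the immediate children of `directory` inside an open ZipFile.

    Directories that only exist implicitly, as a prefix of member names,
    are reported as well, since ZIP files need not store directory entries.
    """
    base = directory.strip("/")
    prefix = base + "/" if base else ""
    children = set()
    for name in zf.namelist():
        if name.startswith(prefix) and name != prefix:
            children.add(name[len(prefix):].split("/", 1)[0])
    return sorted(children)


if __name__ == "__main__":
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        for name in ("cadrg/a.toc", "cadrg/tile1.ntf", "cadrg/sub/tile2.ntf", "readme.txt"):
            zf.writestr(name, b"dummy")
    with zipfile.ZipFile(buf) as zf:
        print(read_dir(zf, "cadrg"))  # ['a.toc', 'sub', 'tile1.ntf']
```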

A `VSIStatL("/vsizip/...")` call returns the uncompressed size of the file.
Directories inside the ZIP file can be distinguished from regular files with
the `VSI_ISDIR(stat.st_mode)` macro, as with regular file systems.
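
A rough Python analogue of this stat behaviour can be written with `zipfile`: `ZipInfo.file_size` holds the uncompressed size and `ZipInfo.is_dir()` (Python 3.6+) flags explicit directory entries. The helper `zip_stat` is hypothetical, and treating name prefixes as implicit directories is an assumption of this sketch.

```python
import io
import zipfile


def zip_stat(zf, path):
    """Return (uncompressed_size, is_dir) for `path` inside an open ZipFile."""
    path = path.strip("/")
    names = set(zf.namelist())
    if path in names:
        info = zf.getinfo(path)
        return info.file_size, info.is_dir()
    if path + "/" in names:          # explicit directory entry, e.g. "empty/"
        return 0, True
    if any(n.startswith(path + "/") for n in names):
        return 0, True               # implicit directory (assumption of this sketch)
    raise FileNotFoundError(path)


if __name__ == "__main__":
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        zf.writestr("subdir1/file1.tif", b"\0" * 1000)
        zf.writestr("empty/", b"")
    with zipfile.ZipFile(buf) as zf:
        print(zip_stat(zf, "subdir1/file1.tif"))  # (1000, False)
        print(zip_stat(zf, "subdir1"))            # (0, True)
```

Even though the member is stored deflated, the reported size is the uncompressed one, matching what the text says about `VSIStatL()`.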

Syntactic sugar: if the .zip file contains only one file located at its
root, just specifying `/vsizip/path/to/the/file.zip` will work.
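
The shortcut can be expressed as a small resolution rule. In this hypothetical helper, `/vsizip/<archive>` is expanded only when the archive holds exactly one file (directory entries are ignored, an assumption of this sketch) and that file sits at its root; otherwise the shortcut does not apply and `None` is returned.

```python
import io
import zipfile


def resolve_single_file(zf, archive):
    """Expand "/vsizip/<archive>" to "/vsizip/<archive>/<name>" when the
    archive holds exactly one file and that file is at its root.
    Returns None when the shortcut does not apply."""
    files = [info.filename for info in zf.infolist() if not info.is_dir()]
    if len(files) == 1 and "/" not in files[0]:
        return "/vsizip/" + archive + "/" + files[0]
    return None


if __name__ == "__main__":
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        zf.writestr("file1.tif", b"dummy")
    with zipfile.ZipFile(buf) as zf:
        print(resolve_single_file(zf, "myarchive.zip"))  # /vsizip/myarchive.zip/file1.tif
```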

The handler is installed by [http://gdal.org/cpl__vsi_8h.html#884fac3cd6be2c09deb807e959d78b3a VSIInstallZipFileHandler()].

== How to use this capability with a .tar or .tar.gz/.tgz archive? ==

Since GDAL 1.8.0, it is possible to access a file inside a .tar (uncompressed) or
a .tar.gz/.tgz (compressed) archive. The syntax is very similar to the `/vsizip` case:

{{{
gdalinfo /vsitar/path/to/the/file.tar/path/inside/the/tar/file
}}}

or

{{{
gdalinfo /vsitar/path/to/the/file.tar.gz/path/inside/the/targz/file
}}}

Note that reading a file in a .tar.gz archive is far less efficient than in a .zip file,
because in the .tar.gz case the whole archive is compressed as a single stream, whereas in the .zip case files
are compressed individually. So reading the last file in a .tar.gz archive requires
decompressing all the files located before it.

Reading a file in an uncompressed .tar archive does not incur that penalty.
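
The cost difference can be measured with Python's `tarfile` and `gzip` modules. The sketch below streams a .tar.gz built in memory and counts how many decompressed bytes must pass through the gzip layer before one member has been read in full; the helper names and member sizes are illustrative assumptions, and this is not how GDAL itself reads tar archives.

```python
import gzip
import io
import tarfile


class CountingReader:
    """Minimal file-like wrapper counting the bytes read through it."""

    def __init__(self, raw):
        self.raw = raw
        self.count = 0

    def read(self, size=-1):
        data = self.raw.read(size)
        self.count += len(data)
        return data


def make_targz(members):
    """Build an in-memory .tar.gz from a {name: bytes} mapping."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w:gz") as tf:
        for name, payload in members.items():
            info = tarfile.TarInfo(name)
            info.size = len(payload)
            tf.addfile(info, io.BytesIO(payload))
    return buf.getvalue()


def bytes_inflated_to_read(targz, member_name):
    """Decompressed bytes consumed (via the gzip layer) to read one member."""
    counter = CountingReader(gzip.GzipFile(fileobj=io.BytesIO(targz)))
    # "r|" = sequential stream mode: like a gzip stream, no random access.
    with tarfile.open(fileobj=counter, mode="r|") as tf:
        for member in tf:
            if member.name == member_name:
                tf.extractfile(member).read()
                return counter.count
    raise KeyError(member_name)


if __name__ == "__main__":
    blob = make_targz({n: n.encode() * 25_000 for n in ("a.bin", "b.bin", "c.bin")})
    for name in ("a.bin", "c.bin"):
        print(name, bytes_inflated_to_read(blob, name))
```

Reading c.bin requires inflating the contents of a.bin and b.bin first, whereas a.bin is reached almost immediately.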

The handler is installed by [http://gdal.org/cpl__vsi_8h.html#d6dd983338849e7da4eaa88f6458ab64 VSIInstallTarFileHandler()].

== Note on multi-file formats ==

This applies to formats such as ''"EHdr/ESRI .hdr Labelled"'', whose datasets are made of several files.

`/vsigzip` cannot work for datasets made of multiple files, since a .gz file holds a single compressed file.

For `/vsizip`, you must explicitly point to the image file within the ZIP file, so use `/vsizip/test.zip/data.flt` rather than `/vsizip/test.zip`.

== Drivers supporting this capability ==

Because this capability is implemented as virtual file systems, it only works
with GDAL or OGR drivers that support the "large file API".
The full list of these formats can be obtained by looking for the drivers marked with 'v' in the output of either `gdalinfo --formats` or `ogrinfo --formats`.
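
Checking for the 'v' flag can be automated by parsing the `--formats` output. This sketch assumes lines shaped like `  GTiff -raster- (rw+vs): GeoTIFF`, with the `-raster-`/`-vector-` part optional so that shorter `NAME (flags): Long name` lines also match; the exact layout varies between GDAL versions, so verify it against your own output. The helper name and the sample lines in the test are illustrative.

```python
import re

# Assumed line shape: "  <short name> [-<kind>-] (<flags>): <long name>"
FORMAT_LINE = re.compile(r"^\s*(.+?)\s+(?:-[^-]+-\s+)?\(([a-z+]*)\):\s*(.*)$")


def drivers_with_virtual_io(formats_output):
    """Return the short names of drivers whose capability flags include 'v'."""
    names = []
    for line in formats_output.splitlines():
        m = FORMAT_LINE.match(line)
        if m and "v" in m.group(2):
            names.append(m.group(1))
    return names
```

The input text could come, for instance, from `subprocess.run(["gdalinfo", "--formats"], capture_output=True, text=True).stdout`.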

A non-exhaustive list of such drivers is:
 * GDAL raster drivers
   * ADRG
   * AAIGrid
   * BMP
   * DTED
   * EHdr
   * GIF
   * GTiff
   * HFA
   * ILWIS
   * JP2KAK
   * JPEG
   * LCP
   * NITF
   * PNG
   * SRTMHGT
 * OGR vector drivers
   * CSV
   * DXF
   * ESRI Shapefile
   * GeoJSON
   * GeoRSS
   * GML
   * GPX
   * KML
   * MapInfo File
   * OpenFileGDB
   * OSM
   * SQLite
   * SVG
   * WFS
   * XLSX

= vsicurl - reading files over HTTP or FTP (partial downloading) =
 * According to the [http://trac.osgeo.org/gdal/wiki/Release/1.8.0-News 1.8.0 release notes], vsicurl is one of the virtual file system handlers introduced in that release.
 * It allows partial HTTP or FTP downloading (e.g. running ogrinfo on a very large shapefile over the Internet without downloading the whole file).
 * More information is available on the [http://gdal.org/cpl__vsi_8h.html#4f791960f2d86713d16e99e9c0c36258 VSIInstallCurlFileHandler()] page.
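
Partial downloading over HTTP relies on range requests. The sketch below, written with Python's `urllib`, widens each read to whole blocks as a simple block cache might. The 16 KiB `CHUNK_SIZE`, the helper names and the block-alignment policy are assumptions for illustration, not a description of how `/vsicurl/` chooses its requests, and FTP is not covered.

```python
import urllib.request

CHUNK_SIZE = 16 * 1024  # assumed block size for this sketch


def aligned_range(offset, length, chunk=CHUNK_SIZE):
    """HTTP Range header value covering [offset, offset + length), widened to whole blocks."""
    if offset < 0 or length <= 0:
        raise ValueError("need offset >= 0 and length > 0")
    start = offset // chunk * chunk
    end = -(-(offset + length) // chunk) * chunk - 1  # inclusive, rounded up
    return f"bytes={start}-{end}"


def read_range(url, offset, length, chunk=CHUNK_SIZE):
    """Fetch `length` bytes at `offset` from an HTTP(S) URL with one range request."""
    request = urllib.request.Request(url, headers={"Range": aligned_range(offset, length, chunk)})
    with urllib.request.urlopen(request) as response:
        body = response.read()
        partial = response.status == 206  # 206 Partial Content: range honoured
    # With a 206 the body starts at the block boundary; with a 200 it is the whole file.
    skip = offset - offset // chunk * chunk if partial else offset
    return body[skip:skip + length]
```

For example, `read_range(url, 100, 20)` sends `Range: bytes=0-16383` and returns the 20 bytes starting at offset 100, so only one small block crosses the network instead of the whole file.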

== Example ==
Running ogrinfo on a shapefile on the Internet:

{{{
ogrinfo -ro -al -so /vsicurl/http://svn.osgeo.org/gdal/trunk/autotest/ogr/data/poly.shp
}}}

== Complex example (combining with vsizip) ==
Running ogrinfo on a shapefile inside a ZIP file on the Internet:

{{{
ogrinfo -ro -al -so /vsizip/vsicurl/http://svn.osgeo.org/gdal/trunk/autotest/ogr/data/poly.zip
}}}

== Complex example (combining with vsizip and a password-protected FTP server) ==
Running ogrinfo on a shapefile inside a ZIP file on an FTP server:

{{{
ogrinfo -ro -al -so /vsizip/vsicurl/ftp://user:password@example.com/foldername/file.zip/example.shp
}}}
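
When such a chained path is built programmatically, special characters in the credentials (for example `@`, `:` or `/` in a password) would break the URL, so this hypothetical helper percent-encodes the user name and password with `urllib.parse.quote`. It assumes the credentials are percent-decoded before being sent to the server, which libcurl normally does for credentials embedded in a URL; check this against your own setup.

```python
from urllib.parse import quote


def zip_over_ftp(user, password, host, archive, inner):
    """Build "/vsizip/vsicurl/ftp://user:password@host/<archive>/<inner>".

    User name and password are percent-encoded so that characters such as
    "@", ":" or "/" cannot be mistaken for URL delimiters.
    """
    credentials = quote(user, safe="") + ":" + quote(password, safe="")
    url = f"ftp://{credentials}@{host}/{archive.lstrip('/')}"
    return f"/vsizip/vsicurl/{url}/{inner.lstrip('/')}"


if __name__ == "__main__":
    print(zip_over_ftp("user", "p@ss:w/rd", "example.com", "foldername/file.zip", "example.shp"))
    # /vsizip/vsicurl/ftp://user:p%40ss%3Aw%2Frd@example.com/foldername/file.zip/example.shp
```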

== Notes ==
 * In GDAL 1.x, there is no easy way of knowing whether an OGR driver supports VSI virtual file handlers such as vsizip, vsicurl, etc. See the [http://lists.osgeo.org/pipermail/gdal-dev/2011-April/028384.html gdal-dev mailing list thread] for more details. With the GDAL/OGR unification in GDAL 2.0, vector drivers can now report whether they support the "large file API".

Previous content is now available in the [http://gdal.org/gdal_virtual_file_systems.html GDAL Virtual File Systems (compressed, network hosted, etc...): /vsimem, /vsizip, /vsitar, /vsicurl, ...] documentation page.