#1369 closed enhancement (fixed)
read gzipped files in situ (e.g. raster.tif.gz)
Reported by: | Owned by: | warmerdam | |
---|---|---|---|
Priority: | low | Milestone: | 1.6.0 |
Component: | GDAL_Raster | Version: | 1.4.0 |
Severity: | minor | Keywords: | zip gzip |
Cc: |
Description (last modified by )
I've run into a fair amount of raster imagery, OnEarth for example, which is available gzipped instead of using an internal compression. It would be very convenient if GDAL could read gzipped files in situ. Having to decompress the files to a scratch space first means having to keep a lot of extra room around to maneuver. In other applications which do read .gz files, such as VTP, it also seems to be a lot faster to read/write a .gz than an uncompressed file. Perhaps because of less disk use (?).
Attachments (3)
Change History (13)
comment:4 by , 17 years ago
Description: | modified (diff) |
---|---|
Priority: | highest → low |
comment:5 by , 16 years ago
Here's a patch that adds a new class, VSIGZipHandle, which implements the large file API and enables the transparent use of gzip'd files by drivers using the large file API. It only supports reading.
In VSIFOpenL, we look at the first 2 bytes. If they match the magic header of a GZip file, we wrap the poVirtualHandle with a VSIGZipHandle object.
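As an illustration of the detection step only (not the patch itself), here is a minimal Python sketch. The magic bytes 0x1f 0x8b are the standard gzip signature; the helper names `looks_like_gzip` and `open_transparently` are hypothetical and stand in for what VSIFOpenL and VSIGZipHandle do in the C++ code.

```python
import gzip

GZIP_MAGIC = b"\x1f\x8b"  # first two bytes of every gzip member


def looks_like_gzip(path):
    """Return True if the file at `path` starts with the gzip magic bytes."""
    with open(path, "rb") as f:
        return f.read(2) == GZIP_MAGIC


def open_transparently(path):
    """Hypothetical helper: return a read-only binary file object that
    yields uncompressed bytes whether or not `path` is gzipped."""
    if looks_like_gzip(path):
        # Wrap the raw file in a decompressing reader.
        return gzip.open(path, "rb")
    return open(path, "rb")
```

A caller would then read from `open_transparently("raster.tif.gz")` exactly as from an uncompressed file, which is the behaviour the drivers see through the large file API.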
It required a change in GDALOpenInfo to first try to open the file with the large file API, instead of first trying with the old file API. This way the drivers will see the uncompressed stream as pabyHeader instead of the compressed one.
Tested successfully with GTiff and NITF drivers. A bit slower than access to an uncompressed file on a MODIS and an Ikonos TIFF image, but usable. A gdalinfo on a .tif.gz is slow since for some reason the GTiff driver seeks nearly to the end of the TIF file.
I didn't make any effort to make seeking fast, so depending on the way the file is accessed, it may be really slow. I've tested it on OnEarth images, and... it's actually slow.
by , 16 years ago
Attachment: | gdal_svn_trunk_gzip.patch added |
---|
Add capability of reading .gz files transparently from GDAL drivers
comment:6 by , 16 years ago
Even,
I think we should avoid the changes in GDALOpen and CPLReadDir. Instead I think gzip files should be accessed with the filename prefix /vsizip/ with the remainder of the path being the real path. So the abc.png file in /usr/data/def.zip would have a virtual filename "/vsizip/usr/data/def.zip/abc.png".
This wouldn't address how to navigate into a zip file directly or how to GDALOpen() it directly, but I think that aspect needs to be carefully considered.
Access to TIFF files in .zip files will generally suck because the TIFF format uses a lot of seeking, and typically the directory (the first thing read) is at the end of the file.
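To make the TIFF remark concrete: the 8-byte header of a classic TIFF only stores the offset of the first directory (IFD), and that offset may point anywhere in the file, including near the end. A minimal sketch, where the function name `first_ifd_offset` is made up for illustration:

```python
import struct


def first_ifd_offset(header):
    """Return the byte offset of the first IFD from a classic TIFF header.

    `header` must hold at least the first 8 bytes of the file.
    """
    order = header[:2]
    if order == b"II":
        endian = "<"  # little-endian
    elif order == b"MM":
        endian = ">"  # big-endian
    else:
        raise ValueError("not a TIFF header")
    # Bytes 2-3: magic number 42; bytes 4-7: offset of the first IFD.
    magic, offset = struct.unpack(endian + "HI", header[2:8])
    if magic != 42:
        raise ValueError("not a classic TIFF")
    return offset
```

When the returned offset is close to the file size, a reader working on a compressed stream has to decompress almost everything before it can read the directory.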
comment:7 by , 16 years ago
I'm attaching an improved version of the patch that adds quite fast random seek. It is done thanks to the regular creation of 'snapshots' by using inflateCopy, which can dump the gzip state. Performance when using .tif.gz in openev is thus much improved (after the initial seek to the end of the file, which can take some time, but I don't see how this could be avoided) and makes them usable on the test cases I mentioned in my first post.
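The snapshot idea can be sketched outside GDAL with Python's zlib module, where `decompressobj().copy()` plays a role similar to inflateCopy. This is only an illustration under assumptions: the class name `SnapshotGzipReader`, the snapshot interval, the chunk size and the in-memory input are all hypothetical and not taken from the patch.

```python
import zlib


class SnapshotGzipReader:
    """Sketch of random access into gzip data held in memory."""

    def __init__(self, compressed, snapshot_every=1 << 20, chunk=16384):
        self._data = compressed
        self._every = snapshot_every  # uncompressed bytes between snapshots
        self._chunk = chunk           # compressed bytes fed per step
        # Each snapshot: (uncompressed offset, compressed offset, state copy).
        # wbits=31 tells zlib to expect a gzip header and trailer.
        self._snapshots = [(0, 0, zlib.decompressobj(wbits=31))]

    def read_at(self, offset, size):
        """Return up to `size` uncompressed bytes starting at `offset`."""
        # Resume from the latest snapshot at or before the requested offset.
        out_pos, in_pos, saved = max(
            (s for s in self._snapshots if s[0] <= offset),
            key=lambda s: s[0],
        )
        dec = saved.copy()  # keep the stored snapshot reusable
        result = bytearray()
        end = offset + size
        while out_pos < end and in_pos < len(self._data) and not dec.eof:
            piece = dec.decompress(self._data[in_pos:in_pos + self._chunk])
            in_pos += self._chunk
            # Keep only the part of this piece that overlaps [offset, end).
            lo = max(offset - out_pos, 0)
            hi = min(end - out_pos, len(piece))
            if hi > lo:
                result += piece[lo:hi]
            out_pos += len(piece)
            # Record a new snapshot once enough output has gone by.
            if not dec.eof and out_pos - self._snapshots[-1][0] >= self._every:
                self._snapshots.append((out_pos, in_pos, dec.copy()))
        return bytes(result)
```

The first read near the end of the stream still has to decompress everything before it, which matches the slow initial seek described above; later reads restart from the nearest snapshot instead of from the beginning.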
Frank, as far as your remarks are concerned, this version doesn't take them into account yet as I've a few remarks/questions too :
- I think we should rather use /vsigzip for GZIP files and keep /vsizip for ZIP (PkZIP) files
- The syntax /vsizip/usr/data/def.zip/abc.png seems a good idea for ZIP files. But for gzip ones, IMHO, it doesn't make much sense, except if the .gz is in fact a .tar.gz
- The syntax /vsigzip/ should be kept internal to GDAL or for advanced users. I think we must find a way such that a "openev foo.xxx.gz" works
- For a .tar.gz, the user should be able to do "openev /usr/data/def.tar.gz/abc.png"
- For a .zip, the user should be able to do "openev /usr/data/def.zip/abc.png". But if the .zip is made of a single file (SRTM HGT files can be downloaded as single file ZIP archive for example), "openev /usr/data/def.zip" should work too.
All in all, the main question is : where to put the magic to translate from the "user friendly filename" to the GDAL virtual filename.
by , 16 years ago
Attachment: | gdal_svn_trunk_gzip_faster_random_seek.patch added |
---|
comment:8 by , 16 years ago
Another improved version of the patch that adds support for .zip files. Reading of zip files is done through unzip.c from contrib/minizip in zlib-1.2.3. With the zip file handler, I can read GTiff, BT and CADRG datasets included in .zip files.
I've also added /vsigzip and /vsizip FilesystemHandler.
However I've still kept/added in VSIFOpenL, VSIFStatL and VSIReadDir the logic that automagically translates from the natural filename (/usr/data/def.zip/abc.png, foo.xxx.gz) to the virtual filename (/vsizip/usr/data/def.zip/abc.png, /vsigzip/foo.xxx.gz). I don't really see how to do it differently.
To sum up the current possibilities, one can do :
- gdalinfo foo.gz
- gdalinfo /vsigzip/foo.gz
- gdalinfo foo.zip (if the zip file contains only 1 file)
- gdalinfo /vsizip/foo.zip (if the zip file contains only 1 file)
- gdalinfo foo.zip/foo.XXX
- gdalinfo /vsizip/foo.zip/foo.XXX
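The translation from natural to virtual filenames could look roughly like the following Python sketch. It is not the logic from the patch: the function name `to_virtual_filename` is hypothetical, the rules are inferred only from the examples listed above, and the handling of a leading slash simply mirrors the /usr/data/def.zip/abc.png example.

```python
def to_virtual_filename(path):
    """Map a 'natural' path to a /vsigzip/ or /vsizip/ virtual filename.

    Paths that match neither rule are returned unchanged.
    """
    if path.startswith(("/vsigzip/", "/vsizip/")):
        return path  # already a virtual filename
    parts = path.split("/")
    # A component ending in .zip is either the archive itself
    # or the archive followed by a member path inside it.
    if any(p.lower().endswith(".zip") for p in parts):
        return "/vsizip/" + path.lstrip("/")
    if path.lower().endswith(".gz"):
        return "/vsigzip/" + path.lstrip("/")
    return path
```

A purely name-based rule like this cannot tell a real directory called something.zip from an archive, which is one reason the placement of this "magic" is debated in this ticket.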
comment:9 by , 16 years ago
Patch updated with the changes required in srtmhgtdataset.cpp so that .hgt.zip files downloaded from ftp://e0srp01u.ecs.nasa.gov/srtm/version2 can be read as such.
by , 16 years ago
Attachment: | gdal_svn_trunk_gzip_and_zip.patch added |
---|
comment:11 by , 15 years ago
Keywords: | zip gzip added |
---|---|
Milestone: | → 1.6.0 |
Resolution: | → fixed |
Status: | new → closed |
I've committed, from r15211 to r15221, the necessary code to read data directly from .gz and .zip files. It is 100% based on gdal_svn_trunk_gzip_and_zip.patch plus some minor improvements.
However, I didn't commit the "magic" parts in cpl_vsil.cpp that autodetect that the passed file is a .gz or .zip and prepend the right prefix to the filename.
So the following will work :
- gdalinfo /vsigzip/foo.gz
- gdalinfo /vsizip/foo.zip (if the zip file contains only 1 file)
- gdalinfo /vsizip/foo.zip/foo.XXX
Important note : the drivers must support the VSI*L API to use those new capabilities.