Opened 12 years ago
Closed 5 years ago
#4658 closed enhancement (wontfix)
[PATCH] display VSIFILE contents in gdalinfo and ogrinfo and add function VSIReadirRecursive to API
Reported by: | etourigny | Owned by: | warmerdam |
---|---|---|---|
Priority: | normal | Milestone: | closed_because_of_github_migration |
Component: | Utilities | Version: | svn-trunk |
Severity: | normal | Keywords: | gdalinfo vsi |
Cc: |
Description
This patch has 2 parts
1) show files inside a VSIFILE (/vsizip/ and /vsitar/) when gdalinfo and ogrinfo fail to open file directly. This uses VSIReadDirRecursive().
There could even be a commandline argument to force read vsi directory, in the case when the file is opened (esp. Shapefile driver which opens the file even if there are other datasets, see attached file).
2) Add VSIReaddirRecursive() function to API as a convenience function. I found this was lacking from the API. This way the listing of a vsi directory is the same as 'unzip file.zip' or 'tar tvf file.tar'.
This also works with /vsizip/vsicurlfile.zip , but only if server supports download by parts.
example output:
$ gdalinfo /vsizip//home/src/qgis/qgis-etiennesky/tests/testdata/testzip.zip ERROR 4: `/vsizip//home/src/qgis/qgis-etiennesky/tests/testdata/testzip.zip' not recognised as a supported file format. gdalinfo failed - unable to open '/vsizip//home/src/qgis/qgis-etiennesky/tests/testdata/testzip.zip'. VSIFILE contains: /vsizip//home/src/qgis/qgis-etiennesky/tests/testdata/testzip.zip/landsat_b1.tif /vsizip//home/src/qgis/qgis-etiennesky/tests/testdata/testzip.zip/landsat_b2.tif /vsizip//home/src/qgis/qgis-etiennesky/tests/testdata/testzip.zip/points.dbf /vsizip//home/src/qgis/qgis-etiennesky/tests/testdata/testzip.zip/points.prj /vsizip//home/src/qgis/qgis-etiennesky/tests/testdata/testzip.zip/points.qml /vsizip//home/src/qgis/qgis-etiennesky/tests/testdata/testzip.zip/points.shp /vsizip//home/src/qgis/qgis-etiennesky/tests/testdata/testzip.zip/points.shx /vsizip//home/src/qgis/qgis-etiennesky/tests/testdata/testzip.zip/tmp1.xml /vsizip//home/src/qgis/qgis-etiennesky/tests/testdata/testzip.zip/tmp1.txt /vsizip//home/src/qgis/qgis-etiennesky/tests/testdata/testzip.zip/tmp1.tif /vsizip//home/src/qgis/qgis-etiennesky/tests/testdata/testzip.zip/folder/ /vsizip//home/src/qgis/qgis-etiennesky/tests/testdata/testzip.zip/folder/landsat_b1.tif
Attachments (6)
Change History (19)
by , 12 years ago
Attachment: | vsireaddirrec.patch added |
---|
comment:1 by , 12 years ago
Summary: | display VSIFILE contents in gdalinfo and ogrinfo and add function VSIReadirRecursive to API → [PATCH] display VSIFILE contents in gdalinfo and ogrinfo and add function VSIReadirRecursive to API |
---|
comment:2 by , 12 years ago
Component: | default → Utilities |
---|---|
Keywords: | gdalinfo vsi added |
For what it's worth, I don't really feel this is an appropriate function for gdalinfo, but I won't stand in the way of adding it.
I have no objection to the addition of VSIReaddirRecursive().
comment:3 by , 12 years ago
Frank - do you mean the function declaration inside gdalinfo (which I will remove in favor of the one in port/cpl_vsi.h) is not appropriate, or the functionality of printing files inside the vsifile? Does this apply also to ogrinfo?
I find it useful, especially when loading remote files or when the gdal/ogrinfo fails, this lets the user/dev know what's inside.
comment:4 by , 12 years ago
I just looked at the patch, and I see you actually add VSIReaddirRecursive1() inside gdalinfo.c even though you also add VSIReaddirRecursive() in gdal/port. The VSIReaddirRecursive1() is unnecessary and I presume this was an accident.
But my comment earlier was actually that I don't really feel that gdalinfo (or ogrinfo) ought to be producing file lists from inside .zip files. I feel that it out of scope for them.
But I don't feel very strongly and I won't interfere in your adding the capability.
follow-up: 6 comment:5 by , 12 years ago
Not commenting about the usefulness of the capability in gdalinfo/ogrinfo, but just about the patch itself.
- It would need a bit of cleanup to remove all the unused commented code. - There's an irrelevant change in gdalinfo (printf( "End = (%.15f,%.15f)\n", ...) - The message "VSIFILE contains" isn't very user friendly. Something like "The archive contains" might be clearer. - about VSIReadDirRecursive itself : * char szTemp1[1096], char szTemp2[1096] seems to be too small. On most systems, filenames can be longer. But I wouldn't fix that by increasing the size of the array on the stack, because due to the recursive nature of the function, there's a risk of stack overflow if encountering a specialy crafted file with a depth of several thousands of directories. "a/a/a/a/a/a/a..." ( on Win32, the stack size is 1 MB, so you 'just' need a depth of 500 to blow it up. I'd rather use dynamic allocation or CPLString. * The cast in (char *) "/" is unnecessary. And I think you can safely use "/" even on windows, because it accepts slash too. * You should avoid using CSLCount() as the stop condition a loop because it has a linear complexity, making the loop quadratic * In the same idea, you could use the CPLStringList class recently integrated by Frank, which will save the linear cost of insertion of CSLAddString(). See VSIUnixStdioFilesystemHandler::ReadDir() as an example how to use it.
You can for example use the following script to generate such a pathological .zip file :
from osgeo import gdal fmain = gdal.VSIFOpenL('/vsizip/bigdepthzip.zip', 'wb') filename = "a" for i in range(4096): filename = filename + "/a" finside = gdal.VSIFOpenL('/vsizip/bigdepthzip.zip/' + filename, 'wb') gdal.VSIFCloseL(finside) gdal.VSIFCloseL(fmain)
By the way, while testing it, I discovered that r24421 was necessary.
follow-up: 7 comment:6 by , 12 years ago
Replying to rouault:
Not commenting about the usefulness of the capability in gdalinfo/ogrinfo, but just about the patch itself.
Can you please comment on that also? I was pondering adding a commandline option for this feature.
- It would need a bit of cleanup to remove all the unused commented code.
it was still a first prototype, will clean it up
- There's an irrelevant change in gdalinfo (printf( "End = (%.15f,%.15f)\n", ...)
the "End" string is unrelated (left it there by accident) but I thought it would be a nice addition, because it gives the bounding box with full precision instead of what is given at the end of gdalinfo
- The message "VSIFILE contains" isn't very user friendly. Something like "The archive contains" might be clearer.
- about VSIReadDirRecursive itself :
- char szTemp1[1096], char szTemp2[1096] seems to be too small. On most systems, filenames can be longer. But I wouldn't fix that by increasing the size of the array on the stack, because due to the recursive nature of the function, there's a risk of stack overflow if encountering a specialy crafted file with a depth of several thousands of directories. "a/a/a/a/a/a/a..." ( on Win32, the stack size is 1 MB, so you 'just' need a depth of 500 to blow it up. I'd rather use dynamic allocation or CPLString.
- The cast in (char *) "/" is unnecessary. And I think you can safely use "/" even on windows, because it accepts slash too.
this was for a gcc compilation warning,I think - I prefer to avoid them as much as possible
- You should avoid using CSLCount() as the stop condition a loop because it has a linear complexity, making the loop quadratic
Will use an int variable to store CSLCount()
- In the same idea, you could use the CPLStringList class recently integrated by Frank, which will save the linear cost of insertion of CSLAddString(). See VSIUnixStdioFilesystemHandler::ReadDir() as an example how to use it.
I was not aware that CPLStringList was more efficient thant other CPLString/CSLString functions, but rather a convenience. Will try that.
Is it ok to use CPLStringList inside each recursion, and then return StealList() - in terms of efficiency? This is probably better than change the prototype to CPLStringList VSIReadDirRecursive( const char *pszPath )
}}}
You can for example use the following script to generate such a pathological .zip file :
from osgeo import gdal fmain = gdal.VSIFOpenL('/vsizip/bigdepthzip.zip', 'wb') filename = "a" for i in range(4096): filename = filename + "/a" finside = gdal.VSIFOpenL('/vsizip/bigdepthzip.zip/' + filename, 'wb') gdal.VSIFCloseL(finside) gdal.VSIFCloseL(fmain)
I will add that as a test script, as well as a valid zip using existing test data.
By the way, while testing it, I discovered that r24421 was necessary.
Many thanks for your comments!
comment:7 by , 12 years ago
Can you please comment on that also? I was pondering adding a commandline option for this feature.
I don't have a strong preference either, but I don't think a commandline option is necessary if you add it.
- There's an irrelevant change in gdalinfo (printf( "End = (%.15f,%.15f)\n", ...)
the "End" string is unrelated (left it there by accident) but I thought it would be a nice addition, because it gives the bounding box with full precision instead of what is given at the end of gdalinfo
I don't see the interest of that redundant information. Might be better to increase the precision of the corner coordinates instead if you feel it is not sufficient enough.
- The cast in (char *) "/" is unnecessary. And I think you can safely use "/" even on windows, because it accepts slash too.
this was for a gcc compilation warning,I think - I prefer to avoid them as much as possible
I've tested with different gcc versions. No warning
Is it ok to use CPLStringList inside each recursion, and then return StealList() - in terms of efficiency?
I think that should do it. This was just an opportunity to advertize CPLStringList.
comment:8 by , 12 years ago
Attaching another patch with your recommendations.
I also updated swig bindings (only tested with python, see autotest), but that generated quite a few changes, mainly in php. Don't know how that is updated normaly (I changed swig/include/cpl.i and did make clean; make from swig/ directory).
Let me know what you think, and if it's ok I will update the doc and commit (I'll wait your comments for the bindings).
A
by , 12 years ago
Attachment: | vsireaddirrec2.patch added |
---|
by , 12 years ago
Attachment: | vsireaddirrec-test.patch added |
---|
by , 12 years ago
Attachment: | testzip.2.zip added |
---|
comment:9 by , 12 years ago
Looks fine to me. You can revert the changes in swig/php in your commit. The SWIG PHP bindings are kind of abandonned now and nobody has taken care of refreshing the corresponding swig generated files since years.
comment:10 by , 12 years ago
ok merci,
I just noticed an "odd" behaviour of ogr when dealing with zip files.
It seems that if there is one shapefile inside a zipfile, as well as other raster datasets, shapefile driver picks it up. Try with attached files testzip.zip and testzip3.zip.
$ ogrinfo -ro /vsizip/testzip3.zip INFO: Open of `/vsizip/testzip3.zip' using driver `ESRI Shapefile' successful. 1: points (Point) $ unzip -l testzip3.zip Archive: testzip3.zip Length Date Time Name --------- ---------- ----- ---- 40684 2012-04-19 10:13 landsat_b1.tif 40684 2012-04-19 11:39 landsat_b2.tif 2237 2012-03-08 18:30 points.dbf 3328 2012-05-16 08:31 points.geojson 143 2012-03-08 18:30 points.prj 3855 2012-03-23 18:08 points.qml 576 2012-03-08 18:30 points.shp 236 2012-03-08 18:30 points.shx --------- ------- 91743 8 files
Another point, if there are several raster datasets, but no vector datasets, then it prints out ALL of the files inside the zip, using VSIArchiveFilesystemHandler::OpenArchiveFile().
$ ogrinfo -ro -so /vsizip/gcore/data/testzip.zip ERROR 6: Support only 1 file in archive file gcore/data/testzip.zip when no explicit in-archive filename is specified You could try one of the following : /vsizip/gcore/data/testzip.zip/subdir /vsizip/gcore/data/testzip.zip/subdir/subdir /vsizip/gcore/data/testzip.zip/subdir/subdir/uint16.tif /vsizip/gcore/data/testzip.zip/subdir/subdir/test_rpc.txt /vsizip/gcore/data/testzip.zip/subdir/test_rpc.txt /vsizip/gcore/data/testzip.zip/test_rpc.txt /vsizip/gcore/data/testzip.zip/uint16.tif FAILURE: Unable to open datasource `/vsizip/gcore/data/testzip.zip' with the following drivers.
The first could be considered a bug (I think I sould file a detailed report), whereas the second means that VSIArchiveFilesystemHandler::OpenArchiveFile() could be used instead of VSIReadDir() but there is not c api for that - correct?
This also has the side-effect that changes my proposed changes to ogrinfo can generate extra output:
$ ogrinfo -ro -so /vsizip/gcore/data/twofileinsubdir.zip ERROR 6: Support only 1 file in archive file gcore/data/twofileinsubdir.zip when no explicit in-archive filename is specified You could try one of the following : /vsizip/gcore/data/twofileinsubdir.zip/twofileinsubdir /vsizip/gcore/data/twofileinsubdir.zip/twofileinsubdir/byte.tif /vsizip/gcore/data/twofileinsubdir.zip/twofileinsubdir/dummy Unable to open source `/vsizip/gcore/data/twofileinsubdir.zip' directly. The archive contains: /vsizip/gcore/data/twofileinsubdir.zip/twofileinsubdir/ /vsizip/gcore/data/twofileinsubdir.zip/twofileinsubdir/byte.tif /vsizip/gcore/data/twofileinsubdir.zip/twofileinsubdir/dummy
I'll hold off commiting the changes to ogrinfo in the meantime.
by , 12 years ago
Attachment: | testzip3.zip added |
---|
comment:12 by , 12 years ago
Etienne,
The behaviour with ogrinfo -ro /vsizip/testzip3.zip is expected. Technically /vsizip/testzip3.zip is seen as a directory by the shapefile driver, and a directory where thera are .shp files is handled by the shapefile driver. You can try unzipping the file in a directory and pass the directory name to ogrinfo.
Hum, when reading your ticket, it somehow reminded me of something, but I forgot I had implemented that in VSIArchiveFilesystemHandler::OpenArchiveFile()... I think you can now remove the following code from OpenArchiveFile() to avoid duplicating with your new stuff :
CPLString msg; msg.Printf("Support only 1 file in archive file %s when no explicit in-archive filename is specified", archiveFilename); const VSIArchiveContent* content = GetContentOfArchive(archiveFilename, poReader); if (content) { int i; msg += "\nYou could try one of the following :\n"; for(i=0;i<content->nEntries;i++) { msg += CPLString().Printf(" %s/%s/%s\n", GetPrefix(), archiveFilename, content->entries[i].fileName); } } CPLError(CE_Failure, CPLE_NotSupported, "%s", msg.c_str());
comment:13 by , 5 years ago
Milestone: | → closed_because_of_github_migration |
---|---|
Resolution: | → wontfix |
Status: | new → closed |
This ticket has been automatically closed because Trac is no longer used for GDAL bug tracking, since the project has migrated to GitHub. If you believe this ticket is still valid, you may file it to https://github.com/OSGeo/gdal/issues if it is not already reported there.
patch