Opened 12 years ago

Closed 5 years ago

#4658 closed enhancement (wontfix)

[PATCH] display VSIFILE contents in gdalinfo and ogrinfo and add function VSIReadirRecursive to API

Reported by: etourigny Owned by: warmerdam
Priority: normal Milestone: closed_because_of_github_migration
Component: Utilities Version: svn-trunk
Severity: normal Keywords: gdalinfo vsi
Cc:

Description

This patch has 2 parts

1) show files inside a VSIFILE (/vsizip/ and /vsitar/) when gdalinfo and ogrinfo fail to open file directly. This uses VSIReadDirRecursive().

There could even be a commandline argument to force read vsi directory, in the case when the file is opened (esp. Shapefile driver which opens the file even if there are other datasets, see attached file).

2) Add VSIReaddirRecursive() function to API as a convenience function. I found this was lacking from the API. This way the listing of a vsi directory is the same as 'unzip file.zip' or 'tar tvf file.tar'.

This also works with /vsizip/vsicurlfile.zip , but only if server supports download by parts.

example output:

$   gdalinfo  /vsizip//home/src/qgis/qgis-etiennesky/tests/testdata/testzip.zip
ERROR 4: `/vsizip//home/src/qgis/qgis-etiennesky/tests/testdata/testzip.zip' not recognised as a supported file format.

gdalinfo failed - unable to open '/vsizip//home/src/qgis/qgis-etiennesky/tests/testdata/testzip.zip'.
VSIFILE contains:
       /vsizip//home/src/qgis/qgis-etiennesky/tests/testdata/testzip.zip/landsat_b1.tif
       /vsizip//home/src/qgis/qgis-etiennesky/tests/testdata/testzip.zip/landsat_b2.tif
       /vsizip//home/src/qgis/qgis-etiennesky/tests/testdata/testzip.zip/points.dbf
       /vsizip//home/src/qgis/qgis-etiennesky/tests/testdata/testzip.zip/points.prj
       /vsizip//home/src/qgis/qgis-etiennesky/tests/testdata/testzip.zip/points.qml
       /vsizip//home/src/qgis/qgis-etiennesky/tests/testdata/testzip.zip/points.shp
       /vsizip//home/src/qgis/qgis-etiennesky/tests/testdata/testzip.zip/points.shx
       /vsizip//home/src/qgis/qgis-etiennesky/tests/testdata/testzip.zip/tmp1.xml
       /vsizip//home/src/qgis/qgis-etiennesky/tests/testdata/testzip.zip/tmp1.txt
       /vsizip//home/src/qgis/qgis-etiennesky/tests/testdata/testzip.zip/tmp1.tif
       /vsizip//home/src/qgis/qgis-etiennesky/tests/testdata/testzip.zip/folder/
       /vsizip//home/src/qgis/qgis-etiennesky/tests/testdata/testzip.zip/folder/landsat_b1.tif

Attachments (6)

vsireaddirrec.patch (10.7 KB ) - added by etourigny 12 years ago.
patch
testzip.zip (42.0 KB ) - added by etourigny 12 years ago.
example zip with subdir
vsireaddirrec2.patch (60.8 KB ) - added by etourigny 12 years ago.
vsireaddirrec-test.patch (2.3 KB ) - added by etourigny 12 years ago.
testzip.2.zip (5.3 KB ) - added by etourigny 12 years ago.
testzip3.zip (30.4 KB ) - added by etourigny 12 years ago.

Download all attachments as: .zip

Change History (19)

by etourigny, 12 years ago

Attachment: vsireaddirrec.patch added

patch

by etourigny, 12 years ago

Attachment: testzip.zip added

example zip with subdir

comment:1 by etourigny, 12 years ago

Summary: display VSIFILE contents in gdalinfo and ogrinfo and add function VSIReadirRecursive to API[PATCH] display VSIFILE contents in gdalinfo and ogrinfo and add function VSIReadirRecursive to API

comment:2 by warmerdam, 12 years ago

Component: defaultUtilities
Keywords: gdalinfo vsi added

For what it's worth, I don't really feel this is an appropriate function for gdalinfo, but I won't stand in the way of adding it.

I have no objection to the addition of VSIReaddirRecursive().

comment:3 by etourigny, 12 years ago

Frank - do you mean the function declaration inside gdalinfo (which I will remove in favor of the one in port/cpl_vsi.h) is not appropriate, or the functionality of printing files inside the vsifile? Does this apply also to ogrinfo?

I find it useful, especially when loading remote files or when the gdal/ogrinfo fails, this lets the user/dev know what's inside.

comment:4 by warmerdam, 12 years ago

I just looked at the patch, and I see you actually add VSIReaddirRecursive1() inside gdalinfo.c even though you also add VSIReaddirRecursive() in gdal/port. The VSIReaddirRecursive1() is unnecessary and I presume this was an accident.

But my comment earlier was actually that I don't really feel that gdalinfo (or ogrinfo) ought to be producing file lists from inside .zip files. I feel that it out of scope for them.

But I don't feel very strongly and I won't interfere in your adding the capability.

comment:5 by Even Rouault, 12 years ago

Not commenting about the usefulness of the capability in gdalinfo/ogrinfo, but just about the patch itself.

- It would need a bit of cleanup to remove all the unused commented code.
- There's an irrelevant change in gdalinfo (printf( "End    = (%.15f,%.15f)\n", ...)
- The message "VSIFILE contains" isn't very user friendly. Something like "The archive contains" might be clearer.
- about VSIReadDirRecursive itself :
   * char szTemp1[1096], char szTemp2[1096] seems to be too small. On most systems, filenames can be longer. But I wouldn't fix that by increasing the size of the array on the stack, because due to the recursive nature of the function, there's a risk of stack overflow if encountering a specialy crafted file with a depth of several thousands of directories. "a/a/a/a/a/a/a..." ( on Win32, the stack size is 1 MB, so you 'just' need a depth of 500 to blow it up. I'd rather use dynamic allocation or CPLString.
   * The cast in (char *) "/" is unnecessary. And I think you can safely use "/" even on windows, because it accepts slash too.
   * You should avoid using CSLCount() as the stop condition a loop because it has a linear complexity, making the loop quadratic
   * In the same idea, you could use the CPLStringList class recently integrated by Frank, which will save the linear cost of insertion of CSLAddString(). See VSIUnixStdioFilesystemHandler::ReadDir() as an example how to use it.

You can for example use the following script to generate such a pathological .zip file :

from osgeo import gdal
fmain = gdal.VSIFOpenL('/vsizip/bigdepthzip.zip', 'wb')
filename = "a"
for i in range(4096):
    filename = filename + "/a"
finside = gdal.VSIFOpenL('/vsizip/bigdepthzip.zip/' + filename, 'wb')
gdal.VSIFCloseL(finside)
gdal.VSIFCloseL(fmain)

By the way, while testing it, I discovered that r24421 was necessary.

in reply to:  5 ; comment:6 by etourigny, 12 years ago

Replying to rouault:

Not commenting about the usefulness of the capability in gdalinfo/ogrinfo, but just about the patch itself.

Can you please comment on that also? I was pondering adding a commandline option for this feature.

- It would need a bit of cleanup to remove all the unused commented code.

it was still a first prototype, will clean it up

  • There's an irrelevant change in gdalinfo (printf( "End = (%.15f,%.15f)\n", ...)

the "End" string is unrelated (left it there by accident) but I thought it would be a nice addition, because it gives the bounding box with full precision instead of what is given at the end of gdalinfo

  • The message "VSIFILE contains" isn't very user friendly. Something like "The archive contains" might be clearer.
  • about VSIReadDirRecursive itself :
    • char szTemp1[1096], char szTemp2[1096] seems to be too small. On most systems, filenames can be longer. But I wouldn't fix that by increasing the size of the array on the stack, because due to the recursive nature of the function, there's a risk of stack overflow if encountering a specialy crafted file with a depth of several thousands of directories. "a/a/a/a/a/a/a..." ( on Win32, the stack size is 1 MB, so you 'just' need a depth of 500 to blow it up. I'd rather use dynamic allocation or CPLString.
    • The cast in (char *) "/" is unnecessary. And I think you can safely use "/" even on windows, because it accepts slash too.

this was for a gcc compilation warning,I think - I prefer to avoid them as much as possible

  • You should avoid using CSLCount() as the stop condition a loop because it has a linear complexity, making the loop quadratic

Will use an int variable to store CSLCount()

  • In the same idea, you could use the CPLStringList class recently integrated by Frank, which will save the linear cost of insertion of CSLAddString(). See VSIUnixStdioFilesystemHandler::ReadDir() as an example how to use it.

I was not aware that CPLStringList was more efficient thant other CPLString/CSLString functions, but rather a convenience. Will try that.

Is it ok to use CPLStringList inside each recursion, and then return StealList() - in terms of efficiency? This is probably better than change the prototype to CPLStringList VSIReadDirRecursive( const char *pszPath )

}}}

You can for example use the following script to generate such a pathological .zip file :

from osgeo import gdal
fmain = gdal.VSIFOpenL('/vsizip/bigdepthzip.zip', 'wb')
filename = "a"
for i in range(4096):
    filename = filename + "/a"
finside = gdal.VSIFOpenL('/vsizip/bigdepthzip.zip/' + filename, 'wb')
gdal.VSIFCloseL(finside)
gdal.VSIFCloseL(fmain)

I will add that as a test script, as well as a valid zip using existing test data.

By the way, while testing it, I discovered that r24421 was necessary.

Many thanks for your comments!

in reply to:  6 comment:7 by Even Rouault, 12 years ago

Can you please comment on that also? I was pondering adding a commandline option for this feature.

I don't have a strong preference either, but I don't think a commandline option is necessary if you add it.

  • There's an irrelevant change in gdalinfo (printf( "End = (%.15f,%.15f)\n", ...)

the "End" string is unrelated (left it there by accident) but I thought it would be a nice addition, because it gives the bounding box with full precision instead of what is given at the end of gdalinfo

I don't see the interest of that redundant information. Might be better to increase the precision of the corner coordinates instead if you feel it is not sufficient enough.

  • The cast in (char *) "/" is unnecessary. And I think you can safely use "/" even on windows, because it accepts slash too.

this was for a gcc compilation warning,I think - I prefer to avoid them as much as possible

I've tested with different gcc versions. No warning

Is it ok to use CPLStringList inside each recursion, and then return StealList() - in terms of efficiency?

I think that should do it. This was just an opportunity to advertize CPLStringList.

comment:8 by etourigny, 12 years ago

Attaching another patch with your recommendations.

I also updated swig bindings (only tested with python, see autotest), but that generated quite a few changes, mainly in php. Don't know how that is updated normaly (I changed swig/include/cpl.i and did make clean; make from swig/ directory).

Let me know what you think, and if it's ok I will update the doc and commit (I'll wait your comments for the bindings).

A

by etourigny, 12 years ago

Attachment: vsireaddirrec2.patch added

by etourigny, 12 years ago

Attachment: vsireaddirrec-test.patch added

by etourigny, 12 years ago

Attachment: testzip.2.zip added

comment:9 by Even Rouault, 12 years ago

Looks fine to me. You can revert the changes in swig/php in your commit. The SWIG PHP bindings are kind of abandonned now and nobody has taken care of refreshing the corresponding swig generated files since years.

comment:10 by etourigny, 12 years ago

ok merci,

I just noticed an "odd" behaviour of ogr when dealing with zip files.

It seems that if there is one shapefile inside a zipfile, as well as other raster datasets, shapefile driver picks it up. Try with attached files testzip.zip and testzip3.zip.

$ ogrinfo -ro /vsizip/testzip3.zip
INFO: Open of `/vsizip/testzip3.zip'
      using driver `ESRI Shapefile' successful.
1: points (Point)

$ unzip -l testzip3.zip 
Archive:  testzip3.zip
  Length      Date    Time    Name
---------  ---------- -----   ----
    40684  2012-04-19 10:13   landsat_b1.tif
    40684  2012-04-19 11:39   landsat_b2.tif
     2237  2012-03-08 18:30   points.dbf
     3328  2012-05-16 08:31   points.geojson
      143  2012-03-08 18:30   points.prj
     3855  2012-03-23 18:08   points.qml
      576  2012-03-08 18:30   points.shp
      236  2012-03-08 18:30   points.shx
---------                     -------
    91743                     8 files

Another point, if there are several raster datasets, but no vector datasets, then it prints out ALL of the files inside the zip, using VSIArchiveFilesystemHandler::OpenArchiveFile().

$ ogrinfo -ro -so /vsizip/gcore/data/testzip.zip 
ERROR 6: Support only 1 file in archive file gcore/data/testzip.zip when no explicit in-archive filename is specified
You could try one of the following :
  /vsizip/gcore/data/testzip.zip/subdir
  /vsizip/gcore/data/testzip.zip/subdir/subdir
  /vsizip/gcore/data/testzip.zip/subdir/subdir/uint16.tif
  /vsizip/gcore/data/testzip.zip/subdir/subdir/test_rpc.txt
  /vsizip/gcore/data/testzip.zip/subdir/test_rpc.txt
  /vsizip/gcore/data/testzip.zip/test_rpc.txt
  /vsizip/gcore/data/testzip.zip/uint16.tif

FAILURE:
Unable to open datasource `/vsizip/gcore/data/testzip.zip' with the following drivers.

The first could be considered a bug (I think I sould file a detailed report), whereas the second means that VSIArchiveFilesystemHandler::OpenArchiveFile() could be used instead of VSIReadDir() but there is not c api for that - correct?

This also has the side-effect that changes my proposed changes to ogrinfo can generate extra output:

$ ogrinfo -ro -so /vsizip/gcore/data/twofileinsubdir.zip
ERROR 6: Support only 1 file in archive file gcore/data/twofileinsubdir.zip when no explicit in-archive filename is specified
You could try one of the following :
  /vsizip/gcore/data/twofileinsubdir.zip/twofileinsubdir
  /vsizip/gcore/data/twofileinsubdir.zip/twofileinsubdir/byte.tif
  /vsizip/gcore/data/twofileinsubdir.zip/twofileinsubdir/dummy

Unable to open source `/vsizip/gcore/data/twofileinsubdir.zip' directly.
The archive contains:
       /vsizip/gcore/data/twofileinsubdir.zip/twofileinsubdir/
       /vsizip/gcore/data/twofileinsubdir.zip/twofileinsubdir/byte.tif
       /vsizip/gcore/data/twofileinsubdir.zip/twofileinsubdir/dummy

I'll hold off commiting the changes to ogrinfo in the meantime.

by etourigny, 12 years ago

Attachment: testzip3.zip added

comment:11 by etourigny, 12 years ago

commited to trunk r24423 (code, excluding ogrinfo) and r24424 (tests)

comment:12 by Even Rouault, 12 years ago

Etienne,

The behaviour with ogrinfo -ro /vsizip/testzip3.zip is expected. Technically /vsizip/testzip3.zip is seen as a directory by the shapefile driver, and a directory where thera are .shp files is handled by the shapefile driver. You can try unzipping the file in a directory and pass the directory name to ogrinfo.

Hum, when reading your ticket, it somehow reminded me of something, but I forgot I had implemented that in VSIArchiveFilesystemHandler::OpenArchiveFile()... I think you can now remove the following code from OpenArchiveFile() to avoid duplicating with your new stuff :

            CPLString msg;
            msg.Printf("Support only 1 file in archive file %s when no explicit in-archive filename is specified",
                       archiveFilename);
            const VSIArchiveContent* content = GetContentOfArchive(archiveFilename, poReader);
            if (content)
            {
                int i;
                msg += "\nYou could try one of the following :\n";
                for(i=0;i<content->nEntries;i++)
                {
                    msg += CPLString().Printf("  %s/%s/%s\n", GetPrefix(), archiveFilename, content->entries[i].fileName);
                }
            }

            CPLError(CE_Failure, CPLE_NotSupported, "%s", msg.c_str());

comment:13 by Even Rouault, 5 years ago

Milestone: closed_because_of_github_migration
Resolution: wontfix
Status: newclosed

This ticket has been automatically closed because Trac is no longer used for GDAL bug tracking, since the project has migrated to GitHub. If you believe this ticket is still valid, you may file it to https://github.com/OSGeo/gdal/issues if it is not already reported there.

Note: See TracTickets for help on using tickets.