Opened 2 months ago

Closed 2 months ago

Last modified 2 months ago

#7143 closed defect (fixed)

vsis3: keys with spaces and special characters cause bad behaviour

Reported by: tveastman
Owned by: warmerdam
Priority: normal
Milestone: 2.2.3
Component: default
Version: unspecified
Severity: normal
Keywords: s3 vsis3 url encoding
Cc: robert.coup@…

Description

My test bucket contains the following tree (the bucket is public, so please feel welcome to access it yourself for testing):

$ aws s3 ls --recursive public-bucket-gdal-vsis3-tests

2017-11-10 14:15:19        237 alpha
2017-11-06 11:30:47        699 alpha/a-zip-file.zip
2017-11-03 15:17:30         30 alpha/delta.txt
2017-11-03 15:17:30         23 alpha/gamma.txt
2017-11-06 11:30:47        699 beta/a-zip-file.zip
2017-11-03 15:17:30          8 beta/eta.txt
2017-11-10 12:11:34         32 directory with spaces/file.txt
2017-11-10 14:13:29         77 directory with spaces/some text.txt
2017-11-10 13:39:41         32 directory with spaces/space file.txt
2017-11-10 13:24:09         32 directory+with+plus/file.txt
2017-11-10 14:13:29          9 mācrons/some-text.txt
2017-11-10 13:24:09         32 paren()thesis/file.txt
2017-11-10 13:45:09         77 some text.txt
2017-11-10 13:24:09          9 some+text.txt
2017-11-10 13:24:09          9 some-text(1).txt
2017-11-10 12:37:59          9 some-text.txt
2017-11-10 14:15:18          9 ümlaut/some-text.txt

Using gdal.ReadDirRecursive returns:

ReadDirRecursive(/vsis3/public-bucket-gdal-vsis3-tests)
['alpha',
 'some text.txt',
 'some+text.txt',
 'some-text(1).txt',
 'some-text.txt',
 'alpha/',
 'alpha/a-zip-file.zip',
 'alpha/delta.txt',
 'alpha/gamma.txt',
 'beta/',
 'beta/a-zip-file.zip',
 'beta/eta.txt',
 'directory with spaces/',
 'directory+with+plus/',
 u'm\u0101crons/',
 'paren()thesis/',
 u'\xfcmlaut/']

First issue: ReadDir and ReadDirRecursive fail to read the contents of directories with surprising characters in them.
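
To make the gap concrete, the two listings above can be compared directly. This is a minimal sketch in Python 3 that hard-codes the keys from the `aws s3 ls` output and the entries from the ReadDirRecursive output; the variable names are illustrative, and nothing here calls GDAL or S3.

```python
# Keys reported by `aws s3 ls --recursive` (copied from the listing above).
s3_keys = [
    "alpha",
    "alpha/a-zip-file.zip",
    "alpha/delta.txt",
    "alpha/gamma.txt",
    "beta/a-zip-file.zip",
    "beta/eta.txt",
    "directory with spaces/file.txt",
    "directory with spaces/some text.txt",
    "directory with spaces/space file.txt",
    "directory+with+plus/file.txt",
    "m\u0101crons/some-text.txt",
    "paren()thesis/file.txt",
    "some text.txt",
    "some+text.txt",
    "some-text(1).txt",
    "some-text.txt",
    "\xfcmlaut/some-text.txt",
]

# Entries returned by ReadDirRecursive (copied from the output above).
gdal_entries = [
    "alpha", "some text.txt", "some+text.txt", "some-text(1).txt",
    "some-text.txt", "alpha/", "alpha/a-zip-file.zip", "alpha/delta.txt",
    "alpha/gamma.txt", "beta/", "beta/a-zip-file.zip", "beta/eta.txt",
    "directory with spaces/", "directory+with+plus/", "m\u0101crons/",
    "paren()thesis/", "\xfcmlaut/",
]

# Entries ending in "/" are directories; everything else is a file.
returned_files = {e for e in gdal_entries if not e.endswith("/")}

# Keys that exist in the bucket but never show up in the GDAL listing.
missing = sorted(set(s3_keys) - returned_files)
for key in missing:
    print(key)
```

Every key this prints lives under one of the five directories whose names contain a space, a plus sign, parentheses, or a non-ASCII character.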

Attempting to read these files can have unpredictable results. Some can be accessed directly if I URL-encode the path to the file, but others still fail.

I think the underlying issue here is that paths will need to be URL-Encoded by GDAL before they get added to the request being sent to S3.
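
For illustration, this is roughly what URL-encoding (percent-encoding) the keys in this bucket looks like. It is a sketch using Python 3's standard library, not GDAL's actual fix; `encode_key` is a hypothetical helper, and keeping "/" unescaped is an assumption about how the key maps onto the request path.

```python
from urllib.parse import quote


def encode_key(key):
    """Percent-encode an object key, leaving "/" as the path separator.

    Hypothetical helper for illustration only.
    """
    return quote(key, safe="/")


for key in [
    "some text.txt",
    "some+text.txt",
    "some-text(1).txt",
    "directory with spaces/space file.txt",
]:
    print(key, "->", encode_key(key))
```

With this encoding a space becomes `%20`, a plus sign becomes `%2B`, and parentheses become `%28` and `%29`, so none of them can be misread on the way to the server.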

Note that the listings for the Unicode 'umlaut' and 'macron' directories get the names right but fail to list the directories' contents. A related problem: to access a file in a directory with a Unicode key name, I need to encode the path as UTF-8 first. I wonder if GDAL should be doing that for me?
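
As a sketch of the UTF-8 step, assuming Python 3 and only the standard library: the key is first turned into UTF-8 bytes, and those bytes are then percent-encoded. Whether GDAL should do this internally is exactly the open question; the code only shows what the encoded form of one key from this bucket looks like.

```python
from urllib.parse import quote, unquote

# Key from the listing above; \u0101 is "a with macron".
key = "m\u0101crons/some-text.txt"

# Step 1: encode the text as UTF-8 bytes (the macron becomes two bytes).
utf8_bytes = key.encode("utf-8")

# Step 2: percent-encode the bytes, keeping "/" as the separator.
encoded = quote(utf8_bytes, safe="/")
print(encoded)

# Decoding reverses both steps and gives back the original key.
print(unquote(encoded) == key)
```

The same two steps apply to the 'umlaut' directory, where the single character U+00FC also becomes two percent-encoded bytes.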

Desired behaviour:

  • Directory listings should successfully include files in directories with special characters.
  • GDAL should transparently URL-encode paths if necessary (or even if not necessary?)
  • (Maybe?) GDAL should transparently handle the UTF-8 encoding of paths with Unicode characters?

Change History (4)

comment:1 Changed 2 months ago by Even Rouault

Resolution: fixed
Status: new → closed

In 40676:

/vsis3, /vsigs, /vsioss, /vsiaz: fix support of non-ASCII characters in keys (fixes #7143)

comment:2 Changed 2 months ago by Even Rouault

In 40677:

/vsis3/: fix support of non-ASCII characters in keys (fixes #7143)

comment:3 Changed 2 months ago by Even Rouault

Milestone: 2.2.3

comment:4 Changed 2 months ago by tveastman

I'll give this build a test and get back to you.

I am beyond impressed at how quickly you respond to filed issues with comments and patches. Thank you so much!
