#7143 closed defect (fixed)
vsis3: keys with spaces and special characters cause bad behaviour
Reported by: | tveastman | Owned by: | warmerdam |
---|---|---|---|
Priority: | normal | Milestone: | 2.2.3 |
Component: | default | Version: | unspecified |
Severity: | normal | Keywords: | s3 vsis3 url encoding |
Cc: | robert.coup@… |
Description
My test bucket contains the following tree (this bucket is public, please feel welcome to access it yourself for testing):
$ aws s3 ls --recursive public-bucket-gdal-vsis3-tests
2017-11-10 14:15:19        237 alpha
2017-11-06 11:30:47        699 alpha/a-zip-file.zip
2017-11-03 15:17:30         30 alpha/delta.txt
2017-11-03 15:17:30         23 alpha/gamma.txt
2017-11-06 11:30:47        699 beta/a-zip-file.zip
2017-11-03 15:17:30          8 beta/eta.txt
2017-11-10 12:11:34         32 directory with spaces/file.txt
2017-11-10 14:13:29         77 directory with spaces/some text.txt
2017-11-10 13:39:41         32 directory with spaces/space file.txt
2017-11-10 13:24:09         32 directory+with+plus/file.txt
2017-11-10 14:13:29          9 mācrons/some-text.txt
2017-11-10 13:24:09         32 paren()thesis/file.txt
2017-11-10 13:45:09         77 some text.txt
2017-11-10 13:24:09          9 some+text.txt
2017-11-10 13:24:09          9 some-text(1).txt
2017-11-10 12:37:59          9 some-text.txt
2017-11-10 14:15:18          9 ümlaut/some-text.txt
Using gdal.ReadDirRecursive returns:
ReadDirRecursive(/vsis3/public-bucket-gdal-vsis3-tests)
['alpha', 'some text.txt', 'some+text.txt', 'some-text(1).txt', 'some-text.txt', 'alpha/', 'alpha/a-zip-file.zip', 'alpha/delta.txt', 'alpha/gamma.txt', 'beta/', 'beta/a-zip-file.zip', 'beta/eta.txt', 'directory with spaces/', 'directory+with+plus/', u'm\u0101crons/', 'paren()thesis/', u'\xfcmlaut/']
First issue: ReadDir and ReadDirRecursive fail to read the contents of directories with surprising characters in them.
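To make the gap concrete, here is a small sketch that compares the two listings quoted above and prints the keys that the recursive listing dropped. The data is copied from this ticket; the helper name `missing_files` is made up for illustration and is not part of GDAL.

```python
# Keys reported by `aws s3 ls --recursive` in this ticket.
# Non-ASCII characters are written as escapes to avoid source-encoding surprises.
expected_keys = {
    "alpha",
    "alpha/a-zip-file.zip",
    "alpha/delta.txt",
    "alpha/gamma.txt",
    "beta/a-zip-file.zip",
    "beta/eta.txt",
    "directory with spaces/file.txt",
    "directory with spaces/some text.txt",
    "directory with spaces/space file.txt",
    "directory+with+plus/file.txt",
    "m\u0101crons/some-text.txt",
    "paren()thesis/file.txt",
    "some text.txt",
    "some+text.txt",
    "some-text(1).txt",
    "some-text.txt",
    "\u00fcmlaut/some-text.txt",
}

# Entries returned by gdal.ReadDirRecursive in this ticket.
returned_entries = [
    "alpha", "some text.txt", "some+text.txt", "some-text(1).txt",
    "some-text.txt", "alpha/", "alpha/a-zip-file.zip", "alpha/delta.txt",
    "alpha/gamma.txt", "beta/", "beta/a-zip-file.zip", "beta/eta.txt",
    "directory with spaces/", "directory+with+plus/", "m\u0101crons/",
    "paren()thesis/", "\u00fcmlaut/",
]


def missing_files(expected, returned):
    """Return the expected keys that are absent from the returned listing.

    Hypothetical helper: entries ending in '/' are directory markers,
    so they are ignored when comparing against object keys.
    """
    returned_files = {entry for entry in returned if not entry.endswith("/")}
    return sorted(set(expected) - returned_files)


missing = missing_files(expected_keys, returned_entries)
for key in missing:
    print(key)
```

Every missing key sits under a directory whose name contains a space, a plus sign, parentheses, or a non-ASCII character, while everything under alpha/ and beta/ is listed.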
Attempting to read these files can have unpredictable results. Some can be accessed directly if I URL-encode the path to the file, but others still fail.
I think the underlying issue here is that paths will need to be URL-Encoded by GDAL before they get added to the request being sent to S3.
Note that the Unicode 'umlaut' and 'macron' listings get the directory name correct, but fail to list the contents of the directory. A similar problem: in order to access a file in a directory with a Unicode key name, I need to encode the path as UTF-8 first. I wonder if GDAL should be doing that for me?
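As a rough illustration of the encoding I have in mind, the sketch below percent-encodes a few of the keys from this bucket using Python's standard library. It is not GDAL's implementation; `encode_s3_key` is a made-up name, and leaving '/' unescaped is my assumption about how the key should appear in the request path.

```python
from urllib.parse import quote, unquote


def encode_s3_key(key: str) -> str:
    """Percent-encode an S3 object key for use in a request path.

    Hypothetical helper, not part of GDAL. The key is encoded as UTF-8,
    then every byte outside the unreserved set is escaped. '/' is kept
    so the directory structure of the key survives (an assumption).
    """
    return quote(key, safe="/", encoding="utf-8")


# A few keys taken from the bucket listing in this ticket.
keys = [
    "directory with spaces/some text.txt",
    "directory+with+plus/file.txt",
    "paren()thesis/file.txt",
    "m\u0101crons/some-text.txt",
    "\u00fcmlaut/some-text.txt",
    "alpha/delta.txt",
]

for key in keys:
    encoded = encode_s3_key(key)
    # Decoding must give back the original key exactly.
    assert unquote(encoded) == key
    print(f"{key!r} -> {encoded}")
```

A key that contains only unreserved characters, such as alpha/delta.txt, comes out unchanged, so encoding every path unconditionally would be harmless as long as it is done exactly once. Encoding an already-encoded path a second time would turn each '%' into '%25' and point at a different key.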
Desired behaviour:
- Directory listings should successfully include files in directories with special characters.
- GDAL should transparently URL-encode paths if necessary (or even if not necessary?)
- (Maybe?) GDAL should transparently handle the UTF-8 encoding of paths with Unicode characters?
Change History (4)
comment:1 by , 6 years ago
Resolution: | → fixed |
---|---|
Status: | new → closed |
comment:3 by , 6 years ago
Milestone: | → 2.2.3 |
---|
comment:4 by , 6 years ago
I'll give this build a test and get back to you.
I am beyond impressed at how quickly you respond to filed issues with comments and patches. Thank you so much!
In 40676: