#5361 closed defect (fixed)
/vsizip: troubles with cyrillic filenames
Reported by: | oleinik | Owned by: | warmerdam |
---|---|---|---|
Priority: | normal | Milestone: | 1.11.0 |
Component: | GDAL_Raster | Version: | 1.10.1 |
Severity: | normal | Keywords: | |
Cc: |
Description (last modified by )
Most programs on Windows create ZIP archives with filenames in the DOS CP866 charset. Sometimes ZIP archives also contain the filenames in Unicode. GDAL doesn't convert between the DOS (OEM) and ANSI charsets and doesn't use the Unicode version, which causes trouble. To open the dataset '/vsizip/archive.zip/image_name' with a Cyrillic name, we first need to convert image_name from ANSI to OEM, then concatenate it with the prefix and archive name, and then convert the result to UTF-8. This is not easy to use. Moreover, different parts of the resulting dataset description end up in different charsets, so the PAM .aux.xml and .ovr filenames are not suitable for the OS.
Attachments (5)
Change History (16)
comment:1 by , 10 years ago
Description: | modified (diff) |
---|
follow-up: 3 comment:2 by , 10 years ago
comment:3 by , 10 years ago
Look at "APPENDIX D" of the latest ZIP specification (http://www.pkware.com/documents/casestudies/APPNOTE.TXT). An application can choose how to store filenames in Unicode. I tried creating ZIP archives on Windows and Ubuntu Linux with Linux zip, the Windows archiver, 7zip, WinRAR and WinZip. 7zip and Linux zip use general purpose bit 11 to indicate UTF-8 encoding. The Windows archiver, WinRAR and WinZip store names in the CP866 charset. Additionally, WinRAR and WinZip store UTF-8 names in an extra field.
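To reproduce this kind of survey, general purpose bit 11 can be inspected per archive member. The following is a minimal sketch using Python's standard `zipfile` module rather than GDAL; the helper name `utf8_flag_by_member` and the sample member names are made up for illustration. It relies on Python's own ZIP writer, which sets bit 11 only for names that are not pure ASCII.

```python
import io
import zipfile

UTF8_FLAG = 0x0800  # general purpose bit 11: name and comment are UTF-8


def utf8_flag_by_member(zip_bytes: bytes) -> dict:
    """Map each member name to whether general purpose bit 11 is set."""
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as zf:
        return {
            info.filename: bool(info.flag_bits & UTF8_FLAG)
            for info in zf.infolist()
        }


# Build a small in-memory archive with one Cyrillic and one ASCII name.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("снимок.tif", b"")
    zf.writestr("image.tif", b"")

flags = utf8_flag_by_member(buf.getvalue())
for name, is_utf8 in flags.items():
    print(name, "bit 11 set" if is_utf8 else "bit 11 not set")
```

For archives produced by other tools, pass the file contents read in binary mode to the same helper.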
comment:4 by , 10 years ago
WinRAR and WinZip store the UTF-8 name in the 0x7075 extra field (Info-ZIP Unicode Path Extra Field).
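As an illustration of how the 0x7075 field can be read, here is a minimal Python sketch, not the code that went into GDAL. It assumes the layout given in the ZIP specification (a 1-byte version, a 4-byte CRC-32 of the name stored in the standard header, then the UTF-8 name); the function name `unicode_path_from_extra` is hypothetical.

```python
import struct
import zlib
from typing import Optional

UNICODE_PATH_ID = 0x7075  # Info-ZIP Unicode Path Extra Field


def unicode_path_from_extra(extra: bytes, raw_name: bytes) -> Optional[str]:
    """Return the UTF-8 name from a 0x7075 extra field, or None.

    `extra` is the raw extra-field area of a ZIP entry and `raw_name`
    is the undecoded name from the standard header.
    """
    pos = 0
    while pos + 4 <= len(extra):
        # Each extra-field record is: header ID, data size, data.
        header_id, size = struct.unpack_from("<HH", extra, pos)
        data = extra[pos + 4 : pos + 4 + size]
        pos += 4 + size
        if header_id != UNICODE_PATH_ID or len(data) < 5:
            continue
        version, name_crc = struct.unpack_from("<BI", data, 0)
        # The field only applies if its CRC matches the header name.
        if version == 1 and name_crc == zlib.crc32(raw_name):
            return data[5:].decode("utf-8")
    return None


# Hand-built sample record: CP866 header name plus a matching 0x7075 field.
raw = "снимок.tif".encode("cp866")
payload = struct.pack("<BI", 1, zlib.crc32(raw)) + "снимок.tif".encode("utf-8")
extra = struct.pack("<HH", UNICODE_PATH_ID, len(payload)) + payload
print(unicode_path_from_extra(extra, raw))
```

The CRC comparison guards against archives where a tool rewrote the header name without updating the extra field; in that case the sketch returns None and the caller has to fall back to the header name.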
follow-up: 7 comment:6 by , 10 years ago
Milestone: | → 2.0 |
---|---|
Resolution: | → fixed |
Status: | new → closed |
comment:7 by , 10 years ago
Resolution: | fixed |
---|---|
Status: | closed → reopened |
What about general purpose bit 11? Did you add an extra conversion to UTF-8 for the case where this bit isn't set?
follow-up: 9 comment:8 by , 10 years ago
In the a.zip and b.zip samples, the filenames are already in UTF-8. In the c.zip and d.zip samples, there's a 0x7075 extra field with the filenames in UTF-8. That's what I've implemented.
So conversion to UTF-8 would be needed if there's no 0x7075 and the general purpose bit 11 is not set ? Do you have such samples ?
And you mention CP866, but the ZIP spec only mentions CP437 as the default encoding.
comment:9 by , 10 years ago
Replying to rouault:

> So conversion to UTF-8 would be needed if there's no 0x7075 and the general purpose bit 11 is not set ?

YES.

> And you mention CP866, but the ZIP spec only mentions CP437 as the default encoding.

I think the best way (at least on Windows) is to use the CP_OEMCP codepage as the source codepage during conversion to UTF-8.
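The decision order discussed in this exchange can be sketched as follows. This is an illustrative Python outline under stated assumptions, not GDAL's implementation: the function name `decode_member_name` and its parameters are hypothetical, and the OEM codepage is passed explicitly because it cannot be derived from the archive. It defaults to CP437 as named in the ZIP specification.

```python
from typing import Optional

UTF8_FLAG = 0x0800  # general purpose bit 11


def decode_member_name(
    raw_name: bytes,
    flag_bits: int,
    unicode_path: Optional[str] = None,
    oem_codepage: str = "cp437",
) -> str:
    """Pick a decoding for a ZIP member name.

    1. Bit 11 set: the header name is already UTF-8.
    2. Otherwise, a name taken from a 0x7075 extra field wins if present.
    3. Otherwise, fall back to the assumed OEM codepage.
    """
    if flag_bits & UTF8_FLAG:
        return raw_name.decode("utf-8")
    if unicode_path is not None:
        return unicode_path
    return raw_name.decode(oem_codepage)


# A CP866 name with neither bit 11 nor a 0x7075 field, as produced on a
# Russian-locale Windows system.
raw = "снимок.tif".encode("cp866")
print(decode_member_name(raw, flag_bits=0, oem_codepage="cp866"))
```

On Windows, Python also offers an `oem` codec that follows the system's OEM codepage, which roughly corresponds to the CP_OEMCP suggestion; it is not available on other platforms, so the sketch takes the codepage name as an argument instead.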
comment:10 by , 10 years ago
Resolution: | → fixed |
---|---|
Status: | reopened → closed |
comment:11 by , 10 years ago
Milestone: | 2.0 → 1.11.0 |
---|
Is the encoding of filenames inside a ZIP file indicated somewhere in the ZIP ? If not, there's no way to address that issue automatically.