#5048 closed defect (fixed)
Support for lexical level 2 (UCS-2) NATF attributes in s57 reader
Reported by: | julius6 | Owned by: | warmerdam |
---|---|---|---|
Priority: | high | Milestone: | 1.10.0 |
Component: | OGR_SF | Version: | 1.9.2 |
Severity: | critical | Keywords: | UCS-2 NATF lexical NALL |
Cc: |
Description
Hi,
we would like to know how in GDAL 1.9.2 we could handle LL2 languages in the S57 driver: as far as we understand the driver does not check LL 0, 1 or 2 before parsing string attributes. We read here and there (either in code or tickets) of "patches" attempting to solve the problem by looking at the unit terminator, but it does not seem to be present a definitive solution to this problem.
We tried a temporary fix with success:
- Modify S57Reader to get the DSID record and read the LL (NALL or AALL)
- Accordingly, recode strings from either UCS-2 or ISO-8859-1 to UTF-8
We put our fix in S57Reader::ApplyObjectClassAttributes, when it reads the string attributes for ATTF and NATF, we intercept the strings and recode them.
Is there any plan to do something official about this issue or do you have alternative suggestions to get around this problem?
Feel free to ask for the code if you are interested
Thanks
Attachments (6)
Change History (17)
comment:1 by , 11 years ago
by , 11 years ago
Attachment: | 5048.patch added |
---|
comment:2 by , 11 years ago
Attached as a cumulative patch file. Unfortunately my datasets are not public but you can easily test it with sample LL 0,1,2 free datasets. The code was tested on Windows only, but it does not assume wchar_t to be 16-bit, so it should work on Linux, provided CPL recoding functions are OK. The difference in ogrshapedatasource.cpp is loosely related to this issue. I have added deletion of the cpg file when deleting a shape dataset.
comment:3 by , 11 years ago
Hum the patch cannot be examined easily. Not sure why. If not already done, could you generate it with "svn diff" or "diff -u" so it is more readable ? Or it might be an end-of-line character local change on your side.
Can you point to a URL with those sample LL 0,1,2 free datasets ?
by , 11 years ago
Attachment: | 5048_v2.patch added |
---|
comment:4 by , 11 years ago
Hi,
sorry for the late reply, it is a busy time :-)
Yes, it was a mix of DOS and UNIX style line endings. Now the patch file should be OK. About free datasets, the entrypoint could be www.iho.int From there you could follow pointers to far-east hydrographic offices' sites to get LL2 datasets, while NOAA should be the right source for LL0, LL1 samples.
Please let me know if you can do the research yourself, otherwise I can manage to find samples that can be released to the public.
BTW, shouldn't the GDAL test suite already contain such free samples?
Thank you and keep up your great work
comment:5 by , 11 years ago
Sorry, but how do you apply this patch ? Please rebase on trunk and use "diff -u" or "svn diff" to generate it. I tried "patch s57reader.cpp < ../../../5048_v2.patch" on trunk but there are rejects since I think there are changes in several files in the patch but no filename is indicated in it.
comment:6 by , 11 years ago
I used WinMerge. When creating the patch, I chose the option to append changes to the same patch file. I can create separate .patch files for each of the 4 difference files if you believe. Other than this, can you point me to some document which explains how to create the patch the right way?
comment:7 by , 11 years ago
Well, I don't know WinMerge but the option to append changes from different file in the same patch file without including the filenames seems pretty useless to me (try applying that patch yourself and you'll see ;-) !). Well, if you can't make it otherwise, then 4 patch files will be at least manageable. Otherwise if you work from a SVN checkout, then it's pretty simple to generate a patch with TortoiseSVN (Create patch) or with svn command line "svn diff".
by , 11 years ago
Attachment: | ogrs57datasource.cpp.patch added |
---|
by , 11 years ago
Attachment: | ogrshapedatasource.cpp.patch added |
---|
by , 11 years ago
Attachment: | s57.h.patch added |
---|
by , 11 years ago
Attachment: | s57reader.cpp.patch added |
---|
comment:8 by , 11 years ago
I have attached the 4 patch files, still created by WinMerge. Let me know if it works for you. The reference (what I am comparing against) is the last officially released gdal (1.9.2)
comment:9 by , 11 years ago
Component: | default → OGR_SF |
---|---|
Milestone: | → 1.10.0 |
Resolution: | → fixed |
Status: | new → closed |
r25905 "S57: add RECODE_BY_DSS suboption to OGR_S57_OPTIONS configuration option that can be set to YES so that the attribute values are recoded to UTF-8, from the character encoding specified in the S57 DSSI record (#5048, #3421, adapted from patch by julius6)"
I've add to modify your patch to take into account the fact that wchar_t is not 2 bytes large on Linux, and to also skip the UTF16 BOM found in the datasets at https://www1.kaiho.mlit.go.jp/KOKAI/ENC/English/sample.html
comment:10 by , 11 years ago
Hi,
thanks for accepting the fix and making it more robust, too.
I guess you included this fix in some GDAL "development" version (BTW, where is it?). Do you plan to include it in 1.9.3?
Thanks again julius6
comment:11 by , 11 years ago
GDAL 1.10 should be released in a few days with the patch. See http://lists.osgeo.org/pipermail/gdal-dev/2013-April/035953.html . At that point, and due to my relative unfamiliarity with S57 encodings, I don't plan to backport it in 1.9 branch.
Please attach your changes as a patch to the ticket. Sample datasets that illustrate the issue would be appreciated too.