Opened 11 years ago

Closed 11 years ago

Last modified 11 years ago

#5048 closed defect (fixed)

Support for lexical level 2 (UCS-2) NATF attributes in s57 reader

Reported by: julius6 Owned by: warmerdam
Priority: high Milestone: 1.10.0
Component: OGR_SF Version: 1.9.2
Severity: critical Keywords: UCS-2 NATF lexical NALL
Cc:

Description

Hi,

we would like to know how in GDAL 1.9.2 we could handle LL2 languages in the S57 driver: as far as we understand the driver does not check LL 0, 1 or 2 before parsing string attributes. We read here and there (either in code or tickets) of "patches" attempting to solve the problem by looking at the unit terminator, but it does not seem to be present a definitive solution to this problem.

We tried a temporary fix with success:

  • Modify S57Reader to get the DSID record and read the LL (NALL or AALL)
  • Accordingly, recode strings from either UCS-2 or ISO-8859-1 to UTF-8

We put our fix in S57Reader::ApplyObjectClassAttributes, when it reads the string attributes for ATTF and NATF, we intercept the strings and recode them.

Is there any plan to do something official about this issue or do you have alternative suggestions to get around this problem?

Feel free to ask for the code if you are interested

Thanks

Attachments (6)

5048.patch (391.1 KB ) - added by julius6 11 years ago.
5048_v2.patch (4.7 KB ) - added by julius6 11 years ago.
ogrs57datasource.cpp.patch (287 bytes ) - added by julius6 11 years ago.
ogrshapedatasource.cpp.patch (68 bytes ) - added by julius6 11 years ago.
s57.h.patch (407 bytes ) - added by julius6 11 years ago.
s57reader.cpp.patch (4.0 KB ) - added by julius6 11 years ago.

Download all attachments as: .zip

Change History (17)

comment:1 by Even Rouault, 11 years ago

Please attach your changes as a patch to the ticket. Sample datasets that illustrate the issue would be appreciated too.

by julius6, 11 years ago

Attachment: 5048.patch added

comment:2 by julius6, 11 years ago

Attached as a cumulative patch file. Unfortunately my datasets are not public but you can easily test it with sample LL 0,1,2 free datasets. The code was tested on Windows only, but it does not assume wchar_t to be 16-bit, so it should work on Linux, provided CPL recoding functions are OK. The difference in ogrshapedatasource.cpp is loosely related to this issue. I have added deletion of the cpg file when deleting a shape dataset.

comment:3 by Even Rouault, 11 years ago

Hum the patch cannot be examined easily. Not sure why. If not already done, could you generate it with "svn diff" or "diff -u" so it is more readable ? Or it might be an end-of-line character local change on your side.

Can you point to a URL with those sample LL 0,1,2 free datasets ?

by julius6, 11 years ago

Attachment: 5048_v2.patch added

comment:4 by julius6, 11 years ago

Hi,

sorry for the late reply, it is a busy time :-)

Yes, it was a mix of DOS and UNIX style line endings. Now the patch file should be OK. About free datasets, the entrypoint could be www.iho.int From there you could follow pointers to far-east hydrographic offices' sites to get LL2 datasets, while NOAA should be the right source for LL0, LL1 samples.

Please let me know if you can do the research yourself, otherwise I can manage to find samples that can be released to the public.

BTW, shouldn't the GDAL test suite already contain such free samples?

Thank you and keep up your great work

comment:5 by Even Rouault, 11 years ago

Sorry, but how do you apply this patch ? Please rebase on trunk and use "diff -u" or "svn diff" to generate it. I tried "patch s57reader.cpp < ../../../5048_v2.patch" on trunk but there are rejects since I think there are changes in several files in the patch but no filename is indicated in it.

comment:6 by julius6, 11 years ago

I used WinMerge. When creating the patch, I chose the option to append changes to the same patch file. I can create separate .patch files for each of the 4 difference files if you believe. Other than this, can you point me to some document which explains how to create the patch the right way?

comment:7 by Even Rouault, 11 years ago

Well, I don't know WinMerge but the option to append changes from different file in the same patch file without including the filenames seems pretty useless to me (try applying that patch yourself and you'll see ;-) !). Well, if you can't make it otherwise, then 4 patch files will be at least manageable. Otherwise if you work from a SVN checkout, then it's pretty simple to generate a patch with TortoiseSVN (Create patch) or with svn command line "svn diff".

by julius6, 11 years ago

Attachment: ogrs57datasource.cpp.patch added

by julius6, 11 years ago

by julius6, 11 years ago

Attachment: s57.h.patch added

by julius6, 11 years ago

Attachment: s57reader.cpp.patch added

comment:8 by julius6, 11 years ago

I have attached the 4 patch files, still created by WinMerge. Let me know if it works for you. The reference (what I am comparing against) is the last officially released gdal (1.9.2)

comment:9 by Even Rouault, 11 years ago

Component: defaultOGR_SF
Milestone: 1.10.0
Resolution: fixed
Status: newclosed

r25905 "S57: add RECODE_BY_DSS suboption to OGR_S57_OPTIONS configuration option that can be set to YES so that the attribute values are recoded to UTF-8, from the character encoding specified in the S57 DSSI record (#5048, #3421, adapted from patch by julius6)"

I've add to modify your patch to take into account the fact that wchar_t is not 2 bytes large on Linux, and to also skip the UTF16 BOM found in the datasets at https://www1.kaiho.mlit.go.jp/KOKAI/ENC/English/sample.html

comment:10 by julius6, 11 years ago

Hi,

thanks for accepting the fix and making it more robust, too.

I guess you included this fix in some GDAL "development" version (BTW, where is it?). Do you plan to include it in 1.9.3?

Thanks again julius6

comment:11 by Even Rouault, 11 years ago

GDAL 1.10 should be released in a few days with the patch. See http://lists.osgeo.org/pipermail/gdal-dev/2013-April/035953.html . At that point, and due to my relative unfamiliarity with S57 encodings, I don't plan to backport it in 1.9 branch.

Note: See TracTickets for help on using tickets.