Opened 10 years ago

Closed 6 months ago

Last modified 6 months ago

#455 closed defect (fixed)

OGR Provider: Investigate switching string conversion functions to always assume UTF-8 encoded strings

Reported by: traianstanev
Owned by: traianstanev
Priority: major
Milestone:
Component: OGR Provider
Version: 3.4.0
Severity: 3
Keywords:
Cc:
External ID:

Description

The OGR provider currently uses locale-dependent string conversion routines to go from OGR (char*) to FDO (wchar_t*). This task is to check whether it makes sense to switch the code to always assume OGR strings are UTF-8 and to use UTF-8 -> UTF-16/32 conversion when exposing OGR strings via the FDO provider. The current code produces incorrect results when the provider is used against PostGIS, for example, but some other sources (like SHP) could produce wrong results if we assume UTF-8 encoding. However, reading SHP with the OGR provider is not a priority, since there is a dedicated SHP provider.
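To illustrate why the assumed encoding matters, here is a minimal standalone sketch, not the provider's code. The helper names DecodeUtf8 and WidenLatin1 are hypothetical. The first decodes bytes as UTF-8 into UTF-32 code points; the second widens each byte on its own, which is what a conversion assuming a single-byte encoding such as ISO-8859-1 would effectively do.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>

// Hypothetical helper (illustration only): decode UTF-8 bytes into UTF-32
// code points. It validates lead and continuation bytes but, to stay short,
// does not reject overlong forms or surrogate code points.
std::u32string DecodeUtf8(const std::string& in)
{
    std::u32string out;
    for (std::size_t i = 0; i < in.size();)
    {
        const unsigned char lead = static_cast<unsigned char>(in[i]);
        char32_t cp = 0;
        std::size_t extra = 0; // number of continuation bytes expected

        if (lead < 0x80)                { cp = lead;        extra = 0; }
        else if ((lead & 0xE0) == 0xC0) { cp = lead & 0x1F; extra = 1; }
        else if ((lead & 0xF0) == 0xE0) { cp = lead & 0x0F; extra = 2; }
        else if ((lead & 0xF8) == 0xF0) { cp = lead & 0x07; extra = 3; }
        else throw std::invalid_argument("invalid UTF-8 lead byte");

        if (extra >= in.size() - i)
            throw std::invalid_argument("truncated UTF-8 sequence");

        for (std::size_t k = 1; k <= extra; ++k)
        {
            const unsigned char c = static_cast<unsigned char>(in[i + k]);
            if ((c & 0xC0) != 0x80)
                throw std::invalid_argument("invalid UTF-8 continuation byte");
            cp = (cp << 6) | (c & 0x3F);
        }

        out.push_back(cp);
        i += extra + 1;
    }
    return out;
}

// Hypothetical helper (illustration only): treat every byte as one
// character, i.e. interpret the input as ISO-8859-1.
std::u32string WidenLatin1(const std::string& in)
{
    std::u32string out;
    for (unsigned char c : in)
        out.push_back(static_cast<char32_t>(c));
    return out;
}
```

For the two bytes 0xC3 0xA9, DecodeUtf8 yields the single code point U+00E9, while WidenLatin1 yields the two code points U+00C3 and U+00A9. Applying the wrong interpretation in either direction scrambles non-ASCII text, which is the trade-off between PostGIS and SHP sources described above.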

Change History (2)

comment:1 Changed 6 months ago by jng

Resolution: fixed
Status: new → closed

In 7670:

Replace wchar_t <-> char conversion routines in the OGR provider with calls to GDAL's CPLRecodeXXX family of string conversion functions instead. In addition, add support for a new optional "DataSourceEncoding" connection parameter that allows a user to declare the encoding for a data source where such encoding cannot be inferred by the underlying OGR driver. When set, this declares the encoding of the [char] side of any wchar_t <-> char conversion. If not set, the [char] encoding defaults to UTF-8.
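The defaulting rule described above can be sketched as follows. This is not the code from revision 7670. The helper name ResolveCharEncoding and the use of a std::map to hold connection properties are assumptions made for illustration, as is treating an empty value the same as an unset one.

```cpp
#include <map>
#include <string>

// Hypothetical helper (illustration only): choose the encoding used for the
// char side of a wchar_t <-> char conversion. If the optional
// "DataSourceEncoding" connection parameter is present, use its value;
// otherwise fall back to UTF-8.
// Assumption: an empty value is treated as "not set".
std::string ResolveCharEncoding(const std::map<std::string, std::string>& connectionProps)
{
    const auto it = connectionProps.find("DataSourceEncoding");
    if (it == connectionProps.end() || it->second.empty())
        return "UTF-8";
    return it->second;
}
```

In a real provider, the resolved name would presumably be passed as the char-side encoding argument to a GDAL recode call such as CPLRecodeToWChar or CPLRecodeFromWChar. The ISO-8859-1 TAB file mentioned in this change would then be opened with DataSourceEncoding set to ISO-8859-1.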

New unit tests have been added (with a test MapInfo tab file graciously provided by Geograf A/S) to verify that TAB files with Unicode characters in property names and property values are not scrambled when connecting with the encoding (ISO-8859-1 for this tab file) specified up front. All other existing tests still pass with this change.

Special thanks to Hans Milling of Geograf A/S for testing and reviewing these changes and for providing the test data to add to the ever-growing test suite for the OGR provider.

Fixes #455
Fixes #66

comment:2 Changed 6 months ago by jng

In 7671:

Merged revision(s) 7670 from trunk:
Replace wchar_t <-> char conversion routines in the OGR provider with calls to GDAL's CPLRecodeXXX family of string conversion functions instead. In addition, add support for a new optional "DataSourceEncoding" connection parameter that allows a user to declare the encoding for a data source where such encoding cannot be inferred by the underlying OGR driver. When set, this declares the encoding of the [char] side of any wchar_t <-> char conversion. If not set, the [char] encoding defaults to UTF-8.

New unit tests have been added (with a test MapInfo tab file graciously provided by Geograf A/S) to verify that TAB files with Unicode characters in property names and property values are not scrambled when connecting with the encoding (ISO-8859-1 for this tab file) specified up front. All other existing tests still pass with this change.

Special thanks to Hans Milling of Geograf A/S for testing and reviewing these changes and for providing the test data to add to the ever-growing test suite for the OGR provider.

Fixes #455
Fixes #66
........
