Opened 12 years ago

Closed 5 years ago

#4808 closed defect (wontfix)

[PATCH] Shapefile: interpreting LDID/87 not as ISO-8859-1 but as no codepage specified

Reported by: akaginch Owned by: warmerdam
Priority: normal Milestone: closed_because_of_github_migration
Component: OGR_SF Version: unspecified
Severity: normal Keywords: Language driver ID

Description (last modified by akaginch)

What does LDID/87 means? The page cited in the source code (ogrshapelayer.cpp) says LDID/87 is "Current ANSI Codepage", but the Shapefile Driver treats it as ISO-8859-1.

In Shapefile creation, default LDID is "87". If we create a Shapefile without specifying ENCODING option, a DBF file whose LDID is this value will be generated. Then OGR Shapefile driver recodes the attribute strings from UTF-8 to ISO-8859-1 when the user writes features into the shapefile,

Encoding conversion ability of OGR is useful, but a problem arises because many applications using GDAL have not adapted to the imporovement of the ability. So attribute strings output from such applications are garbled.

Now, I would like to propose that Shapefile driver should interpret LDID/87 as no codepage specified. If it does so, without specifying ENCODING option, the driver doesn't convert character encodings. Additionally it makes ability to handle a Shapefile that has "87" in the LDID field and attribute strings in the encoding other than ISO-8859-1, without any problem.

CPLString OGRShapeLayer::ConvertCodePage( const char *pszCodePage )
line 221
-          case 87: return CPL_ENC_ISO8859_1;
+          case 87: return osEncoding;

Related to #4787 and #4739

Attachments (1)

ogr_encoding.diff (24.9 KB ) - added by akaginch 11 years ago.

Download all attachments as: .zip

Change History (7)

comment:1 by akaginch, 12 years ago

Description: modified (diff)

comment:2 by akaginch, 11 years ago

I created this ticket because I thought that it's not preferred to convert encoding by default when outputting shapefile. Applications using OGR can disable the encoding conversion by setting empty string to SHAPE_ENCODING, but I feel there is still room for improvement.

A re-design proposal for encoding conversion of OGR:

  1. OGR Layers don't convert character encoding Strings passed from applications are outputted just as it is (without encoding conversion).
  1. OGR Layers have the ability to detect encoding Applications using OGR can get encoding of layers with GetEncoding() of OGR layer. The function of newly created layer returns the default character encoding of the layer.
  1. ogr2ogr converts character encoding instead of OGR Layers This can make encoding handling of various drivers flexible. s_enc and t_enc options are added to ogr2ogr. If both encodings are not detected and not specified by user, ogr2ogr does not convert encoding. It will help users by avoiding unintended encoding conversions and giving users a common way to specify how to treat character encoding across various drivers.
  1. Moreover, shapefile layer should interpret LDID/87 properly In Windows, GetACP() returns current codepage. Considering other OSes, it's better to provide a way to specify it with an environment variable.

comment:3 by akaginch, 11 years ago

I tried to implement the above design. Please see and try the attached diff file if you are interested in it.

by akaginch, 11 years ago

Attachment: ogr_encoding.diff added

comment:4 by Even Rouault, 11 years ago

Summary: Shapefile: interpreting LDID/87 not as ISO-8859-1 but as no codepage specified[PATCH] Shapefile: interpreting LDID/87 not as ISO-8859-1 but as no codepage specified

The proposed patch is an alternate/complementary approach to what tried to be solved by . This RFC should probably be revised if your solution is adopted.

Perhaps this is a topic you would want to add to ?

in reply to:  4 comment:5 by akaginch, 11 years ago

Replying to rouault:

Perhaps this is a topic you would want to add to ?

It will be nice if this topic is added to it and considered.

I suppose that if this is achieved, it will give advanced solution to the following issues.

comment:6 by Even Rouault, 5 years ago

Milestone: closed_because_of_github_migration
Resolution: wontfix
Status: newclosed

This ticket has been automatically closed because Trac is no longer used for GDAL bug tracking, since the project has migrated to GitHub. If you believe this ticket is still valid, you may file it to if it is not already reported there.

Note: See TracTickets for help on using tickets.