Opened 12 years ago

Last modified 5 years ago

#4808 closed defect

Shapefile: interpreting LDID/87 not as ISO-8859-1 but as no codepage specified — at Version 1

Reported by: akaginch Owned by: warmerdam
Priority: normal Milestone: closed_because_of_github_migration
Component: OGR_SF Version: unspecified
Severity: normal Keywords: Language driver ID
Cc:

Description (last modified by akaginch)

What does LDID/87 mean? The page cited in the source code (ogrshapelayer.cpp) says LDID/87 is "Current ANSI Codepage", but the Shapefile driver treats it as ISO-8859-1.
http://www.autopark.ru/ASBProgrammerGuide/DBFSTRUC.HTM

When a Shapefile is created, the default LDID is "87". If we create a Shapefile without specifying the ENCODING option, a DBF file with this LDID value is generated. The OGR Shapefile driver then recodes the attribute strings from UTF-8 to ISO-8859-1 when the user writes features into the shapefile.

OGR's encoding conversion ability is useful, but a problem arises because many applications using GDAL have not yet adapted to this improvement. As a result, attribute strings output from such applications are garbled.
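To illustrate the garbling, here is a minimal sketch of a UTF-8 to ISO-8859-1 recode. It is a simplified stand-in written for this ticket, not GDAL's actual recoding code, and the name Utf8ToLatin1Sketch is hypothetical. It assumes that unrepresentable code points and malformed input are replaced with '?'.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical stand-in for a UTF-8 -> ISO-8859-1 recode (NOT GDAL's code).
// Assumption: code points above U+00FF and malformed bytes become '?'.
std::string Utf8ToLatin1Sketch(const std::string &in)
{
    std::string out;
    std::size_t i = 0;
    while (i < in.size())
    {
        const unsigned char c = static_cast<unsigned char>(in[i]);
        std::uint32_t cp = 0;
        std::size_t len = 0;
        if (c < 0x80) { cp = c; len = 1; }
        else if ((c & 0xE0) == 0xC0) { cp = c & 0x1F; len = 2; }
        else if ((c & 0xF0) == 0xE0) { cp = c & 0x0F; len = 3; }
        else if ((c & 0xF8) == 0xF0) { cp = c & 0x07; len = 4; }
        else { out += '?'; ++i; continue; }  // stray continuation or invalid lead byte

        if (i + len > in.size()) { out += '?'; break; }  // truncated sequence

        bool ok = true;
        for (std::size_t k = 1; k < len; ++k)
        {
            const unsigned char cc = static_cast<unsigned char>(in[i + k]);
            if ((cc & 0xC0) != 0x80) { ok = false; break; }
            cp = (cp << 6) | (cc & 0x3F);
        }
        if (!ok) { out += '?'; ++i; continue; }  // malformed: skip the lead byte

        // Only U+0000..U+00FF exist in ISO-8859-1.
        out += (cp <= 0xFF) ? static_cast<char>(static_cast<unsigned char>(cp)) : '?';
        i += len;
    }
    return out;
}
```

Under these assumptions, a Latin-1 character such as "é" survives the conversion, but a Japanese character passed as UTF-8 is lost, and bytes in a local encoding such as Shift-JIS that an application hands over as if they were UTF-8 are lost as well.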

I would therefore like to propose that the Shapefile driver interpret LDID/87 as "no codepage specified". With that change, the driver would not convert character encodings unless the ENCODING option is specified. It would also make it possible to handle, without any problem, a Shapefile that has "87" in the LDID field and attribute strings in an encoding other than ISO-8859-1.

gdal/ogr/ogrsf_frmts/shape/ogrshapelayer.cpp
CPLString OGRShapeLayer::ConvertCodePage( const char *pszCodePage )
line 221
-          case 87: return CPL_ENC_ISO8859_1;
+          case 87: return osEncoding;
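
The effect of the one-line change can be modelled with a small standalone sketch. This is not the real OGRShapeLayer::ConvertCodePage: the function name, the "proposed" flag, and the assumption that osEncoding is empty when no ENCODING option was given are all hypothetical, and every LDID other than 87 is omitted.

```cpp
#include <string>

// Hypothetical model of the LDID lookup for the single case discussed here.
// currentEncoding plays the role of osEncoding; it is assumed to be empty
// when the user did not pass an ENCODING option.
std::string LdidToEncodingSketch(int ldid,
                                 const std::string &currentEncoding,
                                 bool proposed)
{
    if (ldid == 87)
    {
        if (proposed)
            return currentEncoding;  // proposed: "no codepage specified"
        return "ISO-8859-1";         // current: value of CPL_ENC_ISO8859_1
    }
    // The real function maps many other LDID values; they are left out here.
    return currentEncoding;
}
```

With the current behaviour, LdidToEncodingSketch(87, "", false) yields "ISO-8859-1", so strings are recoded. With the proposed behaviour, LdidToEncodingSketch(87, "", true) yields an empty encoding, which under the assumption above means no conversion takes place.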

Related to #4787 and #4739

Change History (1)

comment:1 by akaginch, 12 years ago

Description: modified (diff)