Opened 12 years ago
Last modified 5 years ago
#4808 closed defect
Shapefile: interpreting LDID/87 not as ISO-8859-1 but as no codepage specified — at Version 1
Reported by: | akaginch | Owned by: | warmerdam |
---|---|---|---|
Priority: | normal | Milestone: | closed_because_of_github_migration |
Component: | OGR_SF | Version: | unspecified |
Severity: | normal | Keywords: | Language driver ID |
Cc: |
Description
What does LDID/87 mean? The page cited in the source code (ogrshapelayer.cpp) describes LDID/87 as "Current ANSI Codepage", but the Shapefile driver treats it as ISO-8859-1.
http://www.autopark.ru/ASBProgrammerGuide/DBFSTRUC.HTM
When a Shapefile is created, the default LDID is 87. If we create a Shapefile without specifying the ENCODING option, a DBF file with this LDID value is generated. The OGR Shapefile driver then recodes attribute strings from UTF-8 to ISO-8859-1 when the user writes features into the shapefile.
OGR's encoding conversion is useful, but a problem arises because many applications using GDAL have not yet adapted to this improvement. As a result, attribute strings output by such applications are garbled.
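To illustrate one way the garbling can arise, here is a minimal sketch of a lossy UTF-8 to ISO-8859-1 recode. The helper `utf8ToLatin1Lossy` is hypothetical and is not a GDAL function; the '?' substitution is an assumption made for the example and is not a statement about what OGR's recoder emits. It only shows that text outside Latin-1, or bytes that are not valid UTF-8 in the first place, cannot survive such a conversion.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper (not part of GDAL): recode UTF-8 to ISO-8859-1.
// Anything Latin-1 cannot represent, and any malformed input, becomes '?'.
std::string utf8ToLatin1Lossy(const std::string &in)
{
    std::string out;
    std::size_t i = 0;
    while (i < in.size())
    {
        const unsigned char c = static_cast<unsigned char>(in[i]);
        if (c < 0x80)
        {
            // Plain ASCII is identical in both encodings.
            out.push_back(static_cast<char>(c));
            i += 1;
        }
        else if ((c == 0xC2 || c == 0xC3) && i + 1 < in.size() &&
                 (static_cast<unsigned char>(in[i + 1]) & 0xC0) == 0x80)
        {
            // Two-byte sequences with lead byte C2 or C3 cover U+0080..U+00FF,
            // which is exactly the upper half of ISO-8859-1.
            const unsigned char c2 = static_cast<unsigned char>(in[i + 1]);
            out.push_back(static_cast<char>(((c & 0x03) << 6) | (c2 & 0x3F)));
            i += 2;
        }
        else
        {
            // Code point above U+00FF or an invalid byte: skip the lead byte
            // and any continuation bytes, then emit a single placeholder.
            i += 1;
            while (i < in.size() &&
                   (static_cast<unsigned char>(in[i]) & 0xC0) == 0x80)
                i += 1;
            out.push_back('?');
        }
    }
    return out;
}
```

With this sketch, the UTF-8 bytes C3 A9 ("é") become the single Latin-1 byte E9, while the UTF-8 bytes E3 81 82 (the hiragana "あ") collapse to a placeholder. An application that hands the driver strings in its local codepage rather than UTF-8 hits the last branch in the same way.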
I would therefore like to propose that the Shapefile driver interpret LDID/87 as "no codepage specified". With that change, the driver does not convert character encodings unless the ENCODING option is specified. It also makes it possible to handle, without any problem, a Shapefile that has 87 in the LDID field but whose attribute strings are in an encoding other than ISO-8859-1.
gdal/ogr/ogrsf_frmts/shape/ogrshapelayer.cpp
CPLString OGRShapeLayer::ConvertCodePage( const char *pszCodePage )
line 221
- case 87: return CPL_ENC_ISO8859_1;
+ case 87: return osEncoding;
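The effect of the one-line change can be sketched outside GDAL as follows. This is a simplified stand-in, not the actual `OGRShapeLayer::ConvertCodePage`: the function name `convertCodePageSketch`, the `treat87AsUnspecified` flag, and the use of an empty string to mean "no codepage specified" are assumptions made for illustration, and every LDID value other than 87 is left out.

```cpp
#include <cstdlib>
#include <cstring>
#include <string>

// Simplified, hypothetical stand-in for the LDID/87 branch only.
// An empty return value is assumed to mean "no codepage specified",
// i.e. the driver would not recode attribute strings.
std::string convertCodePageSketch(const char *pszCodePage,
                                  bool treat87AsUnspecified)
{
    const std::string osEncoding; // empty: no encoding determined

    if (pszCodePage == nullptr ||
        std::strncmp(pszCodePage, "LDID/", 5) != 0)
        return osEncoding;

    const int nLdid = std::atoi(pszCodePage + 5);
    if (nLdid == 87)
    {
        // Current behaviour: ISO-8859-1. Proposed behaviour: unspecified.
        return treat87AsUnspecified ? osEncoding
                                    : std::string("ISO-8859-1");
    }

    // Other LDID values are outside the scope of this sketch.
    return osEncoding;
}
```

Calling `convertCodePageSketch("LDID/87", false)` models the current mapping to ISO-8859-1, while passing `true` models the proposal, where the result is empty and no conversion would be triggered unless the user supplies ENCODING.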