Opened 12 years ago
Closed 5 years ago
#4808 closed defect (wontfix)
[PATCH] Shapefile: interpreting LDID/87 not as ISO-8859-1 but as no codepage specified
Reported by: | akaginch | Owned by: | warmerdam |
---|---|---|---|
Priority: | normal | Milestone: | closed_because_of_github_migration |
Component: | OGR_SF | Version: | unspecified |
Severity: | normal | Keywords: | Language driver ID |
Cc: |
Description
What does LDID/87 mean? The page cited in the source code (ogrshapelayer.cpp) says LDID/87 is "Current ANSI Codepage", but the Shapefile driver treats it as ISO-8859-1.
http://www.autopark.ru/ASBProgrammerGuide/DBFSTRUC.HTM
When a Shapefile is created, the default LDID is "87". If we create a Shapefile without specifying the ENCODING option, a DBF file with this LDID value is generated. The OGR Shapefile driver then recodes attribute strings from UTF-8 to ISO-8859-1 when the user writes features into the shapefile.
OGR's encoding conversion ability is useful, but a problem arises because many applications using GDAL have not yet adapted to this improvement. As a result, attribute strings output from such applications are garbled.
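One way text can be lost in such a conversion is sketched below. This is only an illustration: `Utf8ToLatin1Sketch` is a hypothetical helper, not a GDAL function, and GDAL's real recoding routine may handle malformed or unrepresentable input differently. The sketch assumes that anything outside U+0000..U+00FF, and any malformed byte sequence, is written as '?'.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper (not part of GDAL): converts UTF-8 to ISO-8859-1.
// Code points above U+00FF and malformed sequences are written as '?'.
std::string Utf8ToLatin1Sketch(const std::string &osUtf8)
{
    std::string osOut;
    for (std::size_t i = 0; i < osUtf8.size();)
    {
        const unsigned char c = static_cast<unsigned char>(osUtf8[i]);
        if (c < 0x80)
        {
            // Plain ASCII is identical in both encodings.
            osOut += static_cast<char>(c);
            ++i;
        }
        else if ((c == 0xC2 || c == 0xC3) && i + 1 < osUtf8.size() &&
                 (static_cast<unsigned char>(osUtf8[i + 1]) & 0xC0) == 0x80)
        {
            // Two-byte sequence encoding U+0080..U+00FF: fits in one byte.
            const unsigned char c2 =
                static_cast<unsigned char>(osUtf8[i + 1]);
            osOut += static_cast<char>(((c & 0x03) << 6) | (c2 & 0x3F));
            i += 2;
        }
        else
        {
            // Not representable in ISO-8859-1 (or malformed input).
            osOut += '?';
            ++i;
            // Skip any continuation bytes belonging to this sequence.
            while (i < osUtf8.size() &&
                   (static_cast<unsigned char>(osUtf8[i]) & 0xC0) == 0x80)
                ++i;
        }
    }
    return osOut;
}
```

With this sketch, a Latin-1 character such as "é" survives the conversion, while a CJK character collapses to a single '?'. Bytes in a non-UTF-8 encoding passed in by an application that has not adapted would likewise fall into the last branch.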
I would therefore like to propose that the Shapefile driver interpret LDID/87 as "no codepage specified". With that change, the driver does not convert character encodings unless the ENCODING option is specified. It also becomes possible to handle, without any problem, a Shapefile that has "87" in the LDID field and attribute strings in an encoding other than ISO-8859-1.
gdal/ogr/ogrsf_frmts/shape/ogrshapelayer.cpp
CPLString OGRShapeLayer::ConvertCodePage( const char *pszCodePage )
line 221
- case 87: return CPL_ENC_ISO8859_1;
+ case 87: return osEncoding;
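The effect of the one-line change above can be sketched in a standalone form. This is a simplified illustration, not the actual GDAL function: `ConvertCodePageSketch` and its `osLayerEncoding` parameter are hypothetical stand-ins for the member function and for the `osEncoding` value in the patch, only two LDID values are handled, and the LDID/88 to CP1252 mapping is included as an assumed example of an unaffected entry.

```cpp
#include <cstdlib>
#include <cstring>
#include <string>

// Simplified, standalone sketch of the LDID lookup (not the GDAL code).
// osLayerEncoding stands in for the encoding already set on the layer;
// an empty string is taken to mean "do not recode".
std::string ConvertCodePageSketch(const char *pszCodePage,
                                  const std::string &osLayerEncoding)
{
    if (pszCodePage == nullptr ||
        std::strncmp(pszCodePage, "LDID/", 5) != 0)
        return osLayerEncoding;

    switch (std::atoi(pszCodePage + 5))
    {
        case 87:
            // Before the patch: return "ISO-8859-1";
            // After the patch: behave as if no codepage were specified.
            return osLayerEncoding;
        case 88:
            // Assumed example of an entry the patch leaves untouched.
            return "CP1252";
        default:
            // All other LDID values are omitted from this sketch.
            return osLayerEncoding;
    }
}
```

Under these assumptions, a layer with no encoding set gets an empty result for "LDID/87" and is therefore not recoded, while a layer with an explicitly chosen encoding keeps that choice.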
Attachments (1)
Change History (7)
comment:1 by , 12 years ago
Description: | modified (diff) |
---|
comment:2 by , 12 years ago
comment:3 by , 12 years ago
I tried to implement the above design. Please see and try the attached diff file if you are interested in it.
by , 12 years ago
Attachment: | ogr_encoding.diff added |
---|
follow-up: 5 comment:4 by , 12 years ago
Summary: | Shapefile: interpreting LDID/87 not as ISO-8859-1 but as no codepage specified → [PATCH] Shapefile: interpreting LDID/87 not as ISO-8859-1 but as no codepage specified |
---|
The proposed patch is an alternative/complementary approach to the problem that http://trac.osgeo.org/gdal/wiki/rfc23_ogr_unicode tried to solve. That RFC should probably be revised if your solution is adopted.
Perhaps this is a topic you would want to add to http://trac.osgeo.org/gdal/wiki/GDAL20Changes ?
comment:5 by , 12 years ago
Replying to rouault:
Perhaps this is a topic you would want to add to http://trac.osgeo.org/gdal/wiki/GDAL20Changes ?
It would be nice if this topic were added there and considered.
I suppose that, if this is achieved, it would offer a better solution to the following issues.
- [gdal-dev] Character Encoding Problem http://lists.osgeo.org/pipermail/gdal-dev/2012-November/034840.html
- Quantum GIS Desktop - Bug #6500: Language Encoding very broken in 1.8 Lisboa - QGIS Issue Tracking http://hub.qgis.org/issues/6500
comment:6 by , 5 years ago
Milestone: | → closed_because_of_github_migration |
---|---|
Resolution: | → wontfix |
Status: | new → closed |
This ticket has been automatically closed because Trac is no longer used for GDAL bug tracking, since the project has migrated to GitHub. If you believe this ticket is still valid, you may file it to https://github.com/OSGeo/gdal/issues if it is not already reported there.
I created this ticket because I think converting the encoding by default when writing a shapefile is not desirable. Applications using OGR can disable the encoding conversion by setting SHAPE_ENCODING to an empty string, but I feel there is still room for improvement.
A re-design proposal for encoding conversion of OGR:
GetEncoding() of OGR layer. For a newly created layer, the function returns the layer's default character encoding.