Opened 19 years ago

Closed 13 years ago

Last modified 13 years ago

#882 closed enhancement (fixed)

Unicode support in OGR Shape/DBF

Reported by: magnus@… Owned by: warmerdam
Priority: normal Milestone: 1.9.0
Component: OGR_SF Version: unspecified
Severity: normal Keywords: Shape
Cc: Markus Neteler, alexbruy, gislab, Jeff McKenna

Description (last modified by bishop)


Attachments (1)

ogr_encodings.patch (14.8 KB ) - added by bishop 14 years ago.
Patch to Get & Set Encodings for OGRLayers


Change History (14)

comment:1 by warmerdam, 19 years ago

I would add that the DBF reference lists byte 30 (offset 29) as the
language driver code.  There is no listed value for Unicode, though.

http://www.clicketyclick.dk/databases/xbase/format/dbf.html

In OGR, for creation, we should support a layer creation option to set
the language code in the shapefile driver.  

There is no obvious means of reporting language code when reading since 
OGR has no metadata facility.  
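
For illustration, a minimal Python sketch of reading and setting the language driver byte mentioned above. This is not OGR code: the helper names `read_ldid` and `write_ldid` are hypothetical, and the sketch assumes only what the referenced format description states, namely that the language driver ID sits at zero-based offset 29 of the fixed 32-byte DBF header prefix.

```python
LDID_OFFSET = 29      # zero-based offset of the language driver ID (byte 30)
HEADER_PREFIX = 32    # size of the fixed part of the DBF header


def read_ldid(dbf_bytes: bytes) -> int:
    """Return the language driver ID stored in a DBF header."""
    if len(dbf_bytes) < HEADER_PREFIX:
        raise ValueError("DBF header must be at least 32 bytes long")
    return dbf_bytes[LDID_OFFSET]


def write_ldid(dbf_bytes: bytes, ldid: int) -> bytes:
    """Return a copy of the DBF bytes with the language driver ID replaced."""
    if len(dbf_bytes) < HEADER_PREFIX:
        raise ValueError("DBF header must be at least 32 bytes long")
    if not 0 <= ldid <= 255:
        raise ValueError("language driver ID must fit in one byte")
    patched = bytearray(dbf_bytes)
    patched[LDID_OFFSET] = ldid
    return bytes(patched)
```

A layer creation option in the shapefile driver would amount to choosing the value passed as `ldid` here when the DBF header is first written.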


comment:2 by magnus@…, 19 years ago

One more ref on i18n in Qt:
http://doc.trolltech.com/3.3/i18n.html

comment:3 by neteler@…, 18 years ago

(From update of attachment 296)
sorry, submitted to the wrong bug number. Please delete here.

comment:5 by warmerdam, 17 years ago

Description: modified (diff)
Priority: high → normal
Severity: major → normal
Type: defect → enhancement

An RFC is under development to address this:

http://www.gdal.org/rfc5_unicode.html

Adding Andrey as a cc: in case the information in this report is helpful.

Reclassifying as an enhancement.

comment:6 by Mateusz Łoskot, 17 years ago

Description: modified (diff)

by bishop, 14 years ago

Attachment: ogr_encodings.patch added

Patch to Get & Set Encodings for OGRLayers

comment:8 by bishop, 14 years ago

I propose this patch to solve this issue. With it, the programmer is expected to perform the character set conversion themselves.

comment:9 by gislab, 14 years ago

Cc: alexbruy gislab added

comment:10 by warmerdam, 13 years ago

Keywords: Shape added
Milestone: 1.9.0
Summary: Unicode support in OGR → Unicode support in OGR Shape/DBF

I am working on incorporation of support for shapefile encoding, including some ability to override encodings when they are not specified.

comment:11 by warmerdam, 13 years ago

Resolution: fixed
Status: assigned → closed

I have made a preliminary pass implementing support for converting to UTF-8 on read, and from UTF-8 on write in trunk (r22176). Note that LDID/87 (the default) is treated as ISO8859_1 currently rather than "local encoding" which is apparently what it should be. The SHAPE_ENCODING configuration variable can be used to override the interpretation. CPG values are not used as I don't know what would appear in the CPG file. It would be nice if we could at least handle UTF-8 via CPG.

Testing welcome!
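
A rough Python sketch of the behaviour described in this comment, for readers who want to reason about it outside of GDAL. It is not the code from r22176. The names `LDID_TO_CODEC`, `resolve_codec`, `field_to_utf8` and `field_from_utf8` are hypothetical, the `override` argument merely stands in for the SHAPE_ENCODING configuration variable, and the table is deliberately tiny: only the LDID/87 to ISO8859_1 mapping comes from this ticket, while the other two entries are commonly documented LDID values added as assumptions.

```python
from typing import Optional

# Partial, illustrative table. Only the entry for 87 is taken from the ticket.
LDID_TO_CODEC = {
    87: "iso-8859-1",  # LDID/87, treated as ISO8859_1 per comment:11
    3: "cp1252",       # assumption: Windows ANSI
    201: "cp1251",     # assumption: Russian Windows
}


def resolve_codec(ldid: int, override: Optional[str] = None) -> Optional[str]:
    """Pick the codec for a DBF: an explicit override wins over the LDID."""
    if override:
        return override
    return LDID_TO_CODEC.get(ldid)


def field_to_utf8(raw: bytes, ldid: int, override: Optional[str] = None) -> bytes:
    """Convert a raw field value to UTF-8 on read."""
    codec = resolve_codec(ldid, override)
    if codec is None:
        return raw  # unknown encoding: pass the bytes through unchanged
    return raw.decode(codec).encode("utf-8")


def field_from_utf8(utf8: bytes, ldid: int, override: Optional[str] = None) -> bytes:
    """Convert a UTF-8 value back to the DBF encoding on write."""
    codec = resolve_codec(ldid, override)
    if codec is None:
        return utf8
    return utf8.decode("utf-8").encode(codec)
```

With this sketch, a file whose header says LDID/87 but whose data is really CP1251 would be read correctly by passing `override="cp1251"`, which is the role the SHAPE_ENCODING variable plays in the comment above.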

comment:12 by Jeff McKenna, 13 years ago

Cc: Jeff McKenna added

comment:13 by bishop, 13 years ago

Description: modified (diff)

comment:14 by bishop, 13 years ago

Description: modified (diff)

The CPG file is the user's last chance to set the required encoding. If the producer left the encoding at the default (LDID/87) but the data is actually in another (local) encoding, it is much easier to create a simple CPG file than to re-encode the whole DBF. The CPG file should therefore take precedence over the other (internal) encoding indicators.
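
The precedence proposed here could look roughly like the following Python sketch. It is only an illustration of the idea, not driver code. As noted in comment:11, the actual contents of CPG files are uncertain, so the sketch assumes the simplest case: the sidecar holds a single encoding name that Python's codec registry recognises (for example `UTF-8`). The helper names `read_cpg_codec` and `choose_codec` are hypothetical, and only a lowercase `.cpg` extension is checked.

```python
import codecs
from pathlib import Path
from typing import Optional


def read_cpg_codec(dbf_path: str) -> Optional[str]:
    """Return the codec named in the .cpg sidecar, or None if unusable."""
    cpg_path = Path(dbf_path).with_suffix(".cpg")
    if not cpg_path.is_file():
        return None
    name = cpg_path.read_text(encoding="ascii", errors="ignore").strip()
    if not name:
        return None
    try:
        # Normalise the name and reject anything Python does not know.
        return codecs.lookup(name).name
    except LookupError:
        return None


def choose_codec(dbf_path: str, ldid_codec: Optional[str]) -> Optional[str]:
    """Prefer the CPG file; fall back to the codec derived from the LDID."""
    return read_cpg_codec(dbf_path) or ldid_codec
```

Here `ldid_codec` is whatever the header's language driver ID resolved to, so a user can correct a wrongly labelled DBF by dropping a one-line CPG file next to it.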
