Opened 12 years ago

Closed 6 years ago

Last modified 6 years ago

#882 closed enhancement (fixed)

Unicode support in OGR Shape/DBF

Reported by: magnus@… Owned by: warmerdam
Priority: normal Milestone: 1.9.0
Component: OGR_SF Version: unspecified
Severity: normal Keywords: Shape
Cc: neteler, alexbruy, gislab, Jeff McKenna

Attachments (1)

ogr_encodings.patch (14.8 KB) - added by bishop 7 years ago.
Patch to Get & Set Encodings for OGRLayers


Change History (14)

comment:1 Changed 12 years ago by warmerdam

I would add that the DBF reference lists byte 30 (offset 29) of the header as
the language driver code.  There is no listed value for Unicode though.

In OGR, for creation, we should support a layer creation option to set
the language code in the shapefile driver.  

There is no obvious means of reporting language code when reading since 
OGR has no metadata facility.  
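
For anyone who wants to check what a given file declares, the language driver byte can be read directly from the header. This is a minimal sketch, assuming the standard 32-byte DBF header; `read_ldid` is a hypothetical helper name and not part of OGR.

```python
def read_ldid(header: bytes) -> int:
    """Return the language driver ID (LDID) from a raw DBF file header.

    Assumes `header` holds at least the first 32 bytes of the .dbf file.
    """
    if len(header) < 32:
        raise ValueError("DBF header must be at least 32 bytes")
    # Byte 30 of the header, i.e. zero-based offset 29.
    return header[29]
```

To try it on a real file, open the .dbf in binary mode and pass the first 32 bytes, e.g. `read_ldid(f.read(32))`.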

comment:2 Changed 12 years ago by magnus@…

One more ref on i18n in Qt:

comment:3 Changed 11 years ago by neteler@…

(From update of attachment 296)
sorry, submitted to the wrong bug number. Please delete here.

comment:5 Changed 10 years ago by warmerdam

Description: modified (diff)
Priority: high → normal
Severity: major → normal
Type: defect → enhancement

An RFC is under development to address this:

Adding Andrey as a cc: in case the information in this report is helpful.

Reclassifying as an enhancement.

comment:6 Changed 10 years ago by Mateusz Łoskot

Description: modified (diff)

Changed 7 years ago by bishop

Attachment: ogr_encodings.patch added

Patch to Get & Set Encodings for OGRLayers

comment:8 Changed 7 years ago by bishop

I propose this patch to address the issue. With this approach, the programmer is expected to perform the character set conversion himself.

comment:9 Changed 7 years ago by gislab

Cc: alexbruy gislab added

comment:10 Changed 6 years ago by warmerdam

Keywords: Shape added
Milestone: 1.9.0
Summary: Unicode support in OGR → Unicode support in OGR Shape/DBF

I am working on incorporation of support for shapefile encoding, including some ability to override encodings when they are not specified.

comment:11 Changed 6 years ago by warmerdam

Resolution: fixed
Status: assigned → closed

I have made a preliminary pass implementing support for converting to UTF-8 on read, and from UTF-8 on write in trunk (r22176). Note that LDID/87 (the default) is treated as ISO8859_1 currently rather than "local encoding" which is apparently what it should be. The SHAPE_ENCODING configuration variable can be used to override the interpretation. CPG values are not used as I don't know what would appear in the CPG file. It would be nice if we could at least handle UTF-8 via CPG.
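
The read/write behaviour described above can be illustrated roughly as follows. This is only a sketch in Python, not the code committed in trunk: the `LDID_TO_CODEC` table is an illustrative subset, the function names are hypothetical, the `override` argument stands in for the SHAPE_ENCODING setting, and falling back to ISO8859_1 for unknown LDID values is an assumption.

```python
from typing import Optional

# Illustrative subset of LDID -> Python codec names (assumption, not OGR's table).
LDID_TO_CODEC = {
    0x03: "cp1252",
    0x57: "latin-1",  # LDID/87 treated as ISO8859_1, as noted in the comment
    0x65: "cp866",
    0xC9: "cp1251",
}


def pick_codec(ldid: int, override: Optional[str] = None) -> str:
    """Choose the source codec; an explicit override wins over the LDID."""
    if override:
        return override
    # Assumption: unknown LDID values fall back to ISO8859_1.
    return LDID_TO_CODEC.get(ldid, "latin-1")


def dbf_to_utf8(raw: bytes, ldid: int, override: Optional[str] = None) -> bytes:
    """On read: recode a DBF field value to UTF-8."""
    return raw.decode(pick_codec(ldid, override)).encode("utf-8")


def utf8_to_dbf(value: bytes, ldid: int, override: Optional[str] = None) -> bytes:
    """On write: recode a UTF-8 value to the DBF's encoding."""
    return value.decode("utf-8").encode(pick_codec(ldid, override))
```

With a file that declares LDID/87 but actually holds CP1251 data, passing `override="cp1251"` gives the intended text, while the default interpretation does not.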

Testing welcome!

comment:12 Changed 6 years ago by Jeff McKenna

Cc: Jeff McKenna added

comment:13 Changed 6 years ago by bishop

Description: modified (diff)

comment:14 Changed 6 years ago by bishop

Description: modified (diff)

The CPG file is the user's last chance to set the needed encoding. If the producer left the encoding at the default (LDID/87) but the data is actually in another encoding (some local data), it is much easier to create a simple CPG file than to re-encode the whole DBF. So the encoding given in the CPG file should take precedence over the others (the internal ones).
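
The precedence proposed here could look something like the sketch below. It is only an illustration of the suggestion, not what the driver currently does; `choose_encoding` and its parameters are hypothetical names, and it assumes the CPG file simply contains an encoding name as text.

```python
from typing import Optional


def choose_encoding(cpg_text: Optional[str],
                    ldid_encoding: Optional[str],
                    default: str = "ISO-8859-1") -> str:
    """Pick the encoding for a DBF, preferring the CPG file over the LDID.

    cpg_text      -- contents of the .cpg sidecar file, or None if absent
    ldid_encoding -- encoding name derived from the DBF's LDID, or None
    """
    # A non-empty CPG file overrides whatever the DBF header says.
    if cpg_text and cpg_text.strip():
        return cpg_text.strip()
    # Otherwise use the internal (LDID-derived) encoding if there is one.
    if ldid_encoding:
        return ldid_encoding
    return default
```

So for a DBF whose header says LDID/87 but which ships with a CPG file containing `UTF-8`, the result would be `UTF-8`.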
