Opened 12 years ago

Closed 6 years ago

Last modified 6 years ago

#882 closed enhancement (fixed)

Unicode support in OGR Shape/DBF

Reported by: magnus@… Owned by: warmerdam
Priority: normal Milestone: 1.9.0
Component: OGR_SF Version: unspecified
Severity: normal Keywords: Shape
Cc: neteler, alexbruy, gislab, jmckenna

Attachments (1)

ogr_encodings.patch (14.8 KB) - added by bishop 7 years ago.
Patch to Get & Set Encodings for OGRLayers


Change History (14)

comment:1 Changed 12 years ago by warmerdam

I would add that the DBF reference lists byte 30 (offset 29) as the
language driver code.  There is no listed value for Unicode, though.

http://www.clicketyclick.dk/databases/xbase/format/dbf.html
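
To make the header layout concrete, here is a minimal sketch of reading that byte. The function name `read_ldid` and the synthetic header are made up for this illustration; only the position (byte 30, zero-based offset 29) comes from the reference above, and the value 0x57 is just a sample.

```python
def read_ldid(header: bytes) -> int:
    """Return the language driver ID stored at zero-based offset 29 of a DBF header."""
    if len(header) < 32:
        raise ValueError("the fixed part of a DBF header is 32 bytes long")
    return header[29]


# Synthetic 32-byte header for demonstration only. With a real file the
# bytes would come from something like open(path, "rb").read(32).
sample_header = bytearray(32)
sample_header[0] = 0x03   # version byte (dBASE III without memo)
sample_header[29] = 0x57  # sample language driver ID (87 decimal)

ldid = read_ldid(bytes(sample_header))
print(ldid)
```

The sketch only reads the code; mapping it to an actual character encoding is a separate step.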

In OGR, for creation, we should support a layer creation option to set
the language code in the shapefile driver.  

There is no obvious means of reporting the language code when reading,
since OGR has no metadata facility.


comment:2 Changed 12 years ago by magnus@…

One more ref on i18n in Qt:
http://doc.trolltech.com/3.3/i18n.html

comment:3 Changed 11 years ago by neteler@…

(From update of attachment 296)
Sorry, submitted to the wrong bug number. Please delete this.

comment:5 Changed 10 years ago by warmerdam

  • Description modified (diff)
  • Priority changed from high to normal
  • Severity changed from major to normal
  • Type changed from defect to enhancement

An RFC is under development to address this:

http://www.gdal.org/rfc5_unicode.html

Adding Andrey as a cc: in case the information in this report is helpful.

Reclassifying as an enhancement.

comment:6 Changed 9 years ago by mloskot

  • Description modified (diff)

Changed 7 years ago by bishop

Patch to Get & Set Encodings for OGRLayers

comment:8 Changed 7 years ago by bishop

I propose this patch to solve this issue. Under this proposal, the programmer performs the character set conversion themselves.

comment:9 Changed 7 years ago by gislab

  • Cc alexbruy gislab added

comment:10 Changed 6 years ago by warmerdam

  • Keywords Shape added
  • Milestone set to 1.9.0
  • Summary changed from Unicode support in OGR to Unicode support in OGR Shape/DBF

I am working on incorporating support for shapefile encodings, including some ability to override the encoding when it is not specified.

comment:11 Changed 6 years ago by warmerdam

  • Resolution set to fixed
  • Status changed from assigned to closed

I have made a preliminary pass at implementing support for converting to UTF-8 on read, and from UTF-8 on write, in trunk (r22176). Note that LDID/87 (the default) is currently treated as ISO8859_1 rather than "local encoding", which is apparently what it should be. The SHAPE_ENCODING configuration variable can be used to override the interpretation. CPG values are not used, as I don't know what would appear in the CPG file. It would be nice if we could at least handle UTF-8 via CPG.

Testing welcome!
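
The behaviour described in this comment can be sketched in plain Python. This is not the code from r22176, only an illustration of the idea: the helper names (`pick_codec`, `read_field`, `write_field`), the one-entry mapping table, and the fallback for unknown LDID values are all assumptions made for the example.

```python
from typing import Optional

# Only the mapping mentioned in the comment: LDID/87 is read as ISO 8859-1
# (Python codec name "latin_1").
LDID_TO_CODEC = {87: "latin_1"}


def pick_codec(ldid: int, shape_encoding: Optional[str] = None) -> str:
    """An explicit override (the role SHAPE_ENCODING plays) wins over the LDID."""
    if shape_encoding:
        return shape_encoding
    # Falling back to latin_1 for unknown LDIDs is an assumption of this sketch.
    return LDID_TO_CODEC.get(ldid, "latin_1")


def read_field(raw: bytes, codec: str) -> bytes:
    """Read direction: bytes in the file's encoding become UTF-8 bytes."""
    return raw.decode(codec).encode("utf-8")


def write_field(utf8: bytes, codec: str) -> bytes:
    """Write direction: UTF-8 bytes are converted back to the file's encoding."""
    return utf8.decode("utf-8").encode(codec)


# A producer stored CP1251 text but left the header at LDID/87.
raw = "Москва".encode("cp1251")
wrong = read_field(raw, pick_codec(87))                           # mojibake
right = read_field(raw, pick_codec(87, shape_encoding="cp1251"))  # correct text
```

Without the override the bytes still decode, because ISO 8859-1 accepts every byte value, but the result is the wrong text. That silent failure is why an override mechanism is useful.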

comment:12 Changed 6 years ago by jmckenna

  • Cc jmckenna added

comment:13 Changed 6 years ago by bishop

  • Description modified (diff)

comment:14 Changed 6 years ago by bishop

  • Description modified (diff)

The CPG file is the user's last chance to set the needed encoding. If the producer left the encoding at the default (LDID/87) but the data is actually in another encoding (some local data), it is much easier to create a simple CPG file than to re-encode the whole DBF. So the encoding given in the CPG file should take precedence over the others (the internal ones).
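
The precedence proposed here could look like the sketch below. It is a hypothetical illustration, not what the shapefile driver does: the function `choose_encoding`, its parameters, and the `latin_1` default are invented for the example, and it assumes the CPG file holds a single encoding name as plain text, which an earlier comment notes is not certain.

```python
from typing import Optional


def choose_encoding(cpg_text: Optional[str],
                    ldid_encoding: Optional[str],
                    default: str = "latin_1") -> str:
    """Prefer the CPG sidecar value, then the DBF's internal one, then a default."""
    if cpg_text is not None and cpg_text.strip():
        return cpg_text.strip()
    if ldid_encoding:
        return ldid_encoding
    return default


# The CPG content overrides whatever the DBF header implied.
print(choose_encoding("UTF-8\n", "latin_1"))  # UTF-8
# With no CPG file, the internal value is used.
print(choose_encoding(None, "latin_1"))       # latin_1
```

Putting the sidecar file first means a user can correct a mislabeled DBF by adding one small text file, without touching the data.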
