Unicode support in OGR Shape/DBF

Patch to Get & Set Encodings for OGRLayers

I would add that the DBF reference lists byte 30 (offset 29) as the
language driver code.  No value is listed for Unicode, though.
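
For illustration, a minimal Python sketch of reading that byte from a DBF header. The helper name `read_ldid` and the synthetic header are hypothetical; the only facts used are the offset quoted above and the 32-byte length of the fixed DBF header.

```python
LDID_OFFSET = 29  # zero-based offset of the language driver code (byte 30)


def read_ldid(header: bytes) -> int:
    """Return the language driver ID (LDID) stored in a DBF header."""
    if len(header) < 32:
        raise ValueError("the fixed part of a DBF header is 32 bytes long")
    return header[LDID_OFFSET]


# Synthetic 32-byte header carrying LDID 87 (0x57), for illustration only.
sample_header = bytearray(32)
sample_header[LDID_OFFSET] = 87
print(read_ldid(bytes(sample_header)))
```

For a real file, the same function could be applied to the first 32 bytes read from the .dbf in binary mode.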

In OGR, for creation, we should support a layer creation option to set
the language code in the shapefile driver.  

There is no obvious means of reporting the language code when reading, since
OGR has no metadata facility.

One more ref on i18n in Qt:

An RFC is under development to address this:

Adding Andrey as a cc: in case the information in this report is helpful.

Reclassifying as an enhancement.

Patch to Get & Set Encodings for OGRLayers

I propose this patch to solve this issue. With it, the programmer is expected to perform the character set conversion themselves.

I am working on incorporating support for shapefile encodings, including some ability to override the encoding when it is not specified.

I have made a preliminary pass in trunk (r22176) at converting to UTF-8 on read and from UTF-8 on write. Note that LDID/87 (the default) is currently treated as ISO8859_1 rather than as "local encoding", which is apparently what it should be. The SHAPE_ENCODING configuration variable can be used to override the interpretation. CPG values are not used, as I don't know what would appear in the CPG file. It would be nice if we could at least handle UTF-8 via CPG.
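
The read/write conversion described above can be sketched roughly as follows. This is a Python illustration, not the code committed to trunk. The function names and the `override` parameter (standing in for SHAPE_ENCODING) are hypothetical, the table holds only the LDID/87 mapping mentioned above, and the fallback for unknown LDIDs is an assumption.

```python
import codecs
from typing import Optional

# Only the LDID/87 -> ISO-8859-1 entry is taken from the comment above;
# a real table would need many more entries.
LDID_TO_CODEC = {87: "iso8859-1"}


def codec_for(ldid: int, override: Optional[str] = None) -> str:
    """Pick a codec name: an explicit override wins over the LDID."""
    # Assumption: unknown LDIDs fall back to ISO-8859-1.
    name = override or LDID_TO_CODEC.get(ldid, "iso8859-1")
    return codecs.lookup(name).name  # raises LookupError if unknown


def dbf_to_utf8(raw: bytes, ldid: int, override: Optional[str] = None) -> bytes:
    """Recode a DBF field value to UTF-8 (the read direction)."""
    return raw.decode(codec_for(ldid, override)).encode("utf-8")


def utf8_to_dbf(utf8: bytes, ldid: int, override: Optional[str] = None) -> bytes:
    """Recode a UTF-8 value to the DBF encoding (the write direction)."""
    return utf8.decode("utf-8").encode(codec_for(ldid, override))
```

With this sketch, a file that claims LDID/87 but actually holds CP1251 data would be read with `dbf_to_utf8(raw, 87, override="cp1251")`.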

Testing welcome!

The CPG file is the user's last chance to set the required encoding. If the producer left the encoding at the default (LDID/87) but the data is actually in another encoding (some local one), it is much easier to create a simple CPG file than to re-encode the whole DBF. So the CPG file should take precedence over the other (internal) encoding settings.
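
A minimal Python sketch of that precedence, under a loud assumption: the CPG file is taken to contain a single encoding name (such as "UTF-8") that Python's codec registry recognizes. What real CPG files contain is not established in this ticket, and the function name and the fallback for unknown LDIDs are hypothetical.

```python
import codecs
from typing import Optional

# Only the LDID/87 -> ISO-8859-1 entry is taken from this ticket.
LDID_TO_CODEC = {87: "iso8859-1"}


def resolve_encoding(cpg_text: Optional[str], ldid: int) -> str:
    """Prefer a usable CPG value; otherwise fall back to the DBF LDID."""
    if cpg_text is not None:
        name = cpg_text.strip()
        if name:
            try:
                return codecs.lookup(name).name
            except LookupError:
                pass  # unrecognized CPG content: use the LDID instead
    # Assumption: unknown LDIDs are treated like the default (ISO-8859-1).
    return codecs.lookup(LDID_TO_CODEC.get(ldid, "iso8859-1")).name
```

Here `resolve_encoding("UTF-8\n", 87)` selects UTF-8 even though the DBF header says LDID/87, while a missing or unrecognized CPG value leaves the LDID interpretation in place.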

