Ticket #1494 (new defect)

Opened 1 year ago

Last modified 1 year ago

Erroneous character encoding in xml declaration of gml files created by ogr2ogr

Reported by: peter.rushforth@gmail.com
Assigned to: warmerdam
Priority: low
Milestone:
Component: OGR_SF
Version: 1.4.0
Severity: normal
Keywords:
Cc: neteler

Description

When reading data files containing accented characters and writing
GML files, ogr2ogr incorrectly declares the data as UTF-8 in the
XML declaration.  This could be difficult to resolve, since different
formats have different means of identifying their data encodings.
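
To illustrate why the wrong label matters, here is a minimal Python sketch. It assumes the source data is ISO-8859-1, as in my case; the helper name is_valid_utf8 is made up for this example. An accented character stored as a single ISO-8859-1 byte is not a valid UTF-8 sequence, so a parser that trusts the declaration is likely to reject or garble the file.

```python
def is_valid_utf8(data: bytes) -> bool:
    """Return True if the bytes decode cleanly as UTF-8 (hypothetical helper)."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


# "café" as a legacy ISO-8859-1 source would store it: é is the single byte 0xE9.
latin1_bytes = "caf\u00e9".encode("iso-8859-1")

# The same text as real UTF-8: é becomes the two bytes 0xC3 0xA9.
utf8_bytes = "caf\u00e9".encode("utf-8")

print(is_valid_utf8(latin1_bytes))  # the ISO-8859-1 bytes are not valid UTF-8
print(is_valid_utf8(utf8_bytes))
```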

A workaround I have used is to strip out the first line of a
GML file that I know contains Western European characters and replace
it with an XML declaration for ISO-8859-1, using head, sed, or another
utility that does not parse XML.
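
The same workaround can be sketched in Python. This is only an illustration, not part of ogr2ogr: the function name relabel_xml_encoding is made up, and it assumes the file starts with the XML declaration (no byte order mark) and that the data really is in the encoding you pass in. It works on raw bytes, so the feature data is never decoded or altered.

```python
import re

# Matches an XML declaration at the very start of the data. Working on raw
# bytes means the (mislabelled) content is never decoded.
_XML_DECL = re.compile(rb"^<\?xml[^>]*\?>")


def relabel_xml_encoding(data: bytes, encoding: str = "ISO-8859-1") -> bytes:
    """Return data with its XML declaration replaced by one naming `encoding`.

    Only the declaration changes; all other bytes are left untouched.
    If no declaration is found, one is prepended on its own line.
    """
    decl = b'<?xml version="1.0" encoding="' + encoding.encode("ascii") + b'"?>'
    if _XML_DECL.match(data):
        # A callable replacement avoids any backslash interpretation.
        return _XML_DECL.sub(lambda m: decl, data, count=1)
    return decl + b"\n" + data


# Usage sketch ("output.gml" is a placeholder file name):
#
#     with open("output.gml", "rb") as f:
#         fixed = relabel_xml_encoding(f.read())
#     with open("output.gml", "wb") as f:
#         f.write(fixed)
```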

Change History

02/19/07 17:18:57 changed by warmerdam

Peter,

This is a known problem.  Basically the OGR library does not know anything
about encodings or code pages.  It is hoped that we will be able to address
it with:

  http://www.gdal.org/rfc5_unicode.html

I have added a link to this bug in the RFC for better tracking.  There are
no plans for a proper solution to this bug until RFC 5 is finalized
and implemented.

05/08/07 07:47:56 changed by neteler

  • cc set to neteler.

I am also interested, as a European who works with accented characters daily.