Opened 13 years ago

Closed 8 years ago

#1494 closed defect (fixed)

Erroneous character encoding in xml declaration of gml files created by ogr2ogr

Reported by: peter.rushforth@…
Owned by: warmerdam
Priority: low
Milestone:
Component: OGR_SF
Version: 1.4.0
Severity: normal
Keywords:
Cc: Markus Neteler

Description (last modified by warmerdam)

When ogr2ogr reads data files containing accented characters and writes
GML files, the XML declaration incorrectly identifies the data as UTF-8.
This could be difficult to resolve, since different formats have
different means of identifying their data encodings.

A workaround I have used is to strip out the first line of a
GML file I know contains Western European characters and replace it
with an XML declaration for ISO-8859-1, using head, sed, or another
utility that does not parse XML.
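
For illustration, the workaround described above could be scripted roughly as follows. This is only a sketch: `redeclare_encoding` is a hypothetical helper, not part of GDAL/OGR, and it assumes the file content really is ISO-8859-1 (or whatever encoding is passed in). It works on raw bytes and never parses the XML.

```python
import re

# Hypothetical helper, not part of GDAL/OGR.
# Matches an XML declaration at the very start of the data.
_XML_DECL = re.compile(rb'^<\?xml[^>]*\?>')


def redeclare_encoding(data: bytes, encoding: str = "ISO-8859-1") -> bytes:
    """Replace (or prepend) the XML declaration so that it names `encoding`.

    The payload bytes are left untouched; only the declaration changes.
    The caller is assumed to know the real encoding of the data.
    """
    new_decl = (
        b'<?xml version="1.0" encoding="' + encoding.encode("ascii") + b'"?>'
    )
    if _XML_DECL.match(data):
        # A function replacement avoids any backslash interpretation.
        return _XML_DECL.sub(lambda m: new_decl, data, count=1)
    # No declaration found: prepend one on its own line.
    return new_decl + b"\n" + data
```

Typical use would be to read the GML file in binary mode, pass the bytes through `redeclare_encoding`, and write the result back out in binary mode.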

Change History (4)

comment:1 Changed 13 years ago by warmerdam

Peter,

This is a known problem.  Basically the OGR library does not know anything
about encodings or code pages.  It is hoped that we will be able to address
it with:

  http://www.gdal.org/rfc5_unicode.html

I have added a link to this bug in the RFC for better tracking.  In the meantime,
there are no plans for a proper fix for this bug until RFC 5 is finalized
and implemented.

comment:2 Changed 12 years ago by Markus Neteler

Cc: Markus Neteler added

I am also interested, as a European who works with accented characters daily.

comment:3 Changed 10 years ago by Even Rouault

See #2971, which forces XML output to be ASCII if it is not valid UTF-8.
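
As a rough illustration of the idea only (this is not the code from #2971, and the function name and the `?` replacement character are assumptions made for the example), a "keep valid UTF-8, otherwise fall back to ASCII" check could look like:

```python
def to_safe_xml_bytes(raw: bytes) -> bytes:
    """Return `raw` unchanged if it is valid UTF-8, else an ASCII-only copy.

    Hypothetical sketch: every non-ASCII byte is replaced with '?'.
    The actual replacement strategy used by GDAL may differ.
    """
    try:
        raw.decode("utf-8")  # validation only; the decoded text is discarded
        return raw
    except UnicodeDecodeError:
        # Not valid UTF-8: keep bytes below 0x80, replace everything else.
        return bytes(b if b < 0x80 else ord("?") for b in raw)
```

With this approach the output is always consistent with a UTF-8 declaration, at the cost of losing accented characters whose source encoding is unknown.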

comment:4 Changed 8 years ago by warmerdam

Description: modified (diff)
Resolution: fixed
Status: new → closed

I believe this is now largely addressed by the OGR UTF8 RFC and the change Even mentions above.
