Opened 6 years ago

Closed 6 years ago

#4300 closed defect (fixed)

libkml driver generates ill-formed XML

Reported by: peifer
Owned by: warmerdam
Priority: normal
Milestone: 1.9.0
Component: OGR_SF
Version: svn-trunk
Severity: normal
Keywords: libkml
Cc:

Description

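ISO-8859-1 encoded text is copied unchanged into the KML file, which makes it ill-formed: the output carries no XML declaration, so parsers assume UTF-8 and reject the raw ISO-8859-1 bytes. See below.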

$ cat test.csv
Latitude,Longitude,Name
48.1,0.25,"AaEeUu"
49.2,1.15,"ÂâÉéÜü"
$
$ file test.csv
test.csv: ISO-8859 text
$
$ cat test.vrt
<OGRVRTDataSource>
    <OGRVRTLayer name="test">
        <SrcDataSource>test.csv</SrcDataSource>
        <GeometryType>wkbPoint</GeometryType>
        <LayerSRS>WGS84</LayerSRS>
        <GeometryField encoding="PointFromColumns" x="Longitude" y="Latitude"/>
    </OGRVRTLayer>
</OGRVRTDataSource>
$
$ ogr2ogr -f libkml test.kml test.vrt
$ 
$ xmlwf test.kml
test.kml:26:14: not well-formed (invalid token)
$ 
$ nl test.kml
     1  <kml>
     2    <Document>
     3      <Schema id="test.schema">
     4        <SimpleField name="Latitude" type="string"/>
     5        <SimpleField name="Longitude" type="string"/>
     6      </Schema>
     7      <Document>
     8        <name>test</name>
     9        <Placemark>
    10          <name>AaEeUu</name>
    11          <ExtendedData>
    12            <SchemaData schemaUrl="#test.schema">
    13              <SimpleData name="Latitude">
    14  48.1            </SimpleData>
    15              <SimpleData name="Longitude">
    16  0.25            </SimpleData>
    17            </SchemaData>
    18          </ExtendedData>
    19          <Point>
    20            <coordinates>
    21              0.25,48.1,0
    22            </coordinates>
    23          </Point>
    24        </Placemark>
    25        <Placemark>
    26          <name>ÂâÉéÜü</name>    <===== Line 26, column 14
    27          <ExtendedData>
    28            <SchemaData schemaUrl="#test.schema">
    29              <SimpleData name="Latitude">
    30  49.2            </SimpleData>
    31              <SimpleData name="Longitude">
    32  1.15            </SimpleData>
    33            </SchemaData>
    34          </ExtendedData>
    35          <Point>
    36            <coordinates>
    37              1.15,49.2,0
    38            </coordinates>
    39          </Point>
    40        </Placemark>
    41      </Document>
    42    </Document>
    43  </kml>
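
The failure above can be reproduced without GDAL. The following is only a minimal sketch, assuming a cut-down KML document and a hypothetical helper `is_well_formed`; it illustrates that an XML parser treats a document without an encoding declaration as UTF-8, so the same name is rejected as ISO-8859-1 bytes and accepted once re-encoded.

```python
import xml.etree.ElementTree as ET


def is_well_formed(xml_bytes: bytes) -> bool:
    """Return True if the bytes parse as XML (hypothetical helper)."""
    try:
        ET.fromstring(xml_bytes)
    except ET.ParseError:
        return False
    return True


name = "ÂâÉéÜü"
# Cut-down stand-in for test.kml; like the real output, it has no XML declaration.
template = "<kml><Document><Placemark><name>{}</name></Placemark></Document></kml>"

latin1_doc = template.format(name).encode("iso-8859-1")  # what the driver wrote
utf8_doc = latin1_doc.decode("iso-8859-1").encode("utf-8")  # re-encoded copy

print(is_well_formed(latin1_doc))
print(is_well_formed(utf8_doc))
```

As a workaround on affected versions, the same re-encoding step can be applied to the source CSV before running ogr2ogr.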

Change History (3)

comment:1 Changed 6 years ago by Even Rouault

Milestone: 1.9.0

r23256 /trunk/gdal/ogr/ogrsf_frmts/libkml/ogrlibkmlfield.cpp: LIBKML: check that string values put in fields are valid UTF-8 (#4300)
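
The kind of check described in the commit message can be sketched as follows. This is not the code from r23256; the function names and the ISO-8859-1 fallback are assumptions made for illustration, and the actual driver may handle invalid strings differently.

```python
def is_valid_utf8(raw: bytes) -> bool:
    """Return True if the byte string decodes cleanly as UTF-8."""
    try:
        raw.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


def to_safe_text(raw: bytes, fallback_encoding: str = "iso-8859-1") -> str:
    """Decode as UTF-8 when valid; otherwise use the assumed fallback encoding."""
    if is_valid_utf8(raw):
        return raw.decode("utf-8")
    # Assumption: the caller knows the source encoding. Without that
    # knowledge, replacing the offending bytes would be the safer choice.
    return raw.decode(fallback_encoding)


ascii_name = b"AaEeUu"
latin1_name = "ÂâÉéÜü".encode("iso-8859-1")

print(is_valid_utf8(ascii_name))
print(is_valid_utf8(latin1_name))
print(to_safe_text(latin1_name))
```

Validating before writing keeps the output well-formed even when the source encoding is unknown, at the cost of possibly altering the offending characters.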

comment:2 Changed 6 years ago by peifer

Your fix works fine for me. Is there any reason for not closing the ticket?

While thinking about potential encoding issues: could it perhaps make sense to expand the scope of the config option SHAPE_ENCODING towards a more generic SOURCE_ENCODING option? (If yes, I would open an enhancement ticket.)

comment:3 Changed 6 years ago by Even Rouault

Resolution: fixed
Status: new → closed

I just forgot to close it.

As far as encoding is concerned, I agree that the situation isn't fully satisfactory, but a broader discussion than a simple ticket would be needed. You should have a look at http://trac.osgeo.org/gdal/wiki/rfc23_ogr_unicode .
