Opened 12 years ago

Closed 12 years ago

Last modified 12 years ago

#4299 closed defect (fixed)

KML and GML drivers swallow non-ASCII characters

Reported by: peifer
Owned by: warmerdam
Priority: normal
Milestone:
Component: OGR_SF
Version: svn-trunk
Severity: normal
Keywords: KML, GML
Cc:

Description

The KML and GML drivers swallow all non-ASCII characters, while the libkml driver works fine; see below.

$ cat test.csv
Latitude,Longitude,Name
48.1,0.25,"AaEeUu"
49.2,1.15,"ÂâÉéÜü"
$
$ cat test.vrt
<OGRVRTDataSource>
    <OGRVRTLayer name="test">
        <SrcDataSource>test.csv</SrcDataSource>
        <GeometryType>wkbPoint</GeometryType>
        <LayerSRS>WGS84</LayerSRS>
        <GeometryField encoding="PointFromColumns" x="Longitude" y="Latitude"/>
    </OGRVRTLayer>
</OGRVRTDataSource>
$
$ ogr2ogr -f kml kml.kml test.vrt
$ ogr2ogr -f libkml libkml.kml test.vrt
$ ogr2ogr -f gml gml.gml test.vrt
$
$ cat kml.kml
<?xml version="1.0" encoding="utf-8" ?>
<kml xmlns="http://www.opengis.net/kml/2.2">
<Document><Folder><name>test</name>
<Schema name="test" id="test">
        <SimpleField name="Name" type="string"></SimpleField>
        <SimpleField name="Description" type="string"></SimpleField>
        <SimpleField name="Latitude" type="string"></SimpleField>
        <SimpleField name="Longitude" type="string"></SimpleField>
</Schema>
  <Placemark>
        <name>AaEeUu</name>
        <ExtendedData><SchemaData schemaUrl="#test">
                <SimpleData name="Name">AaEeUu</SimpleData>
                <SimpleData name="Latitude">48.1</SimpleData>
                <SimpleData name="Longitude">0.25</SimpleData>
        </SchemaData></ExtendedData>
      <Point><coordinates>0.25,48.1</coordinates></Point>
  </Placemark>
  <Placemark>
        <name></name>
        <ExtendedData><SchemaData schemaUrl="#test">
                <SimpleData name="Name"></SimpleData>    <=========
                <SimpleData name="Latitude">49.2</SimpleData>
                <SimpleData name="Longitude">1.15</SimpleData>
        </SchemaData></ExtendedData>
      <Point><coordinates>1.15,49.2</coordinates></Point>
  </Placemark>
</Folder></Document></kml>
$
$ cat libkml.kml
<kml>
  <Document>
    <Schema id="test.schema">
      <SimpleField name="Latitude" type="string"/>
      <SimpleField name="Longitude" type="string"/>
    </Schema>
    <Document>
      <name>test</name>
      <Placemark>
        <name>AaEeUu</name>
        <ExtendedData>
          <SchemaData schemaUrl="#test.schema">
            <SimpleData name="Latitude">
48.1            </SimpleData>
            <SimpleData name="Longitude">
0.25            </SimpleData>
          </SchemaData>
        </ExtendedData>
        <Point>
          <coordinates>
            0.25,48.1,0
          </coordinates>
        </Point>
      </Placemark>
      <Placemark>
        <name>ÂâÉéÜü</name>  <=========
        <ExtendedData>
          <SchemaData schemaUrl="#test.schema">
            <SimpleData name="Latitude">
49.2            </SimpleData>
            <SimpleData name="Longitude">
1.15            </SimpleData>
          </SchemaData>
        </ExtendedData>
        <Point>
          <coordinates>
            1.15,49.2,0
          </coordinates>
        </Point>
      </Placemark>
    </Document>
  </Document>
</kml>
$
$ cat gml.gml
<?xml version="1.0" encoding="utf-8" ?>
<ogr:FeatureCollection
     xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
     xsi:schemaLocation="http://ogr.maptools.org/ gml.xsd"
     xmlns:ogr="http://ogr.maptools.org/"
     xmlns:gml="http://www.opengis.net/gml">
  <gml:boundedBy>
    <gml:Box>
      <gml:coord><gml:X>0.25</gml:X><gml:Y>48.1</gml:Y></gml:coord>
      <gml:coord><gml:X>1.15</gml:X><gml:Y>49.2</gml:Y></gml:coord>
    </gml:Box>
  </gml:boundedBy>
                                                                                                                              
  <gml:featureMember>
    <ogr:test fid="test.0">
      <ogr:geometryProperty><gml:Point srsName="EPSG:4326"><gml:coordinates>0.25,48.1</gml:coordinates></gml:Point></ogr:geometryProperty>
      <ogr:Latitude>48.1</ogr:Latitude>
      <ogr:Longitude>0.25</ogr:Longitude>
      <ogr:Name>AaEeUu</ogr:Name>
    </ogr:test>
  </gml:featureMember>
  <gml:featureMember>
    <ogr:test fid="test.1">
      <ogr:geometryProperty><gml:Point srsName="EPSG:4326"><gml:coordinates>1.15,49.2</gml:coordinates></gml:Point></ogr:geometryProperty>
      <ogr:Latitude>49.2</ogr:Latitude>
      <ogr:Longitude>1.15</ogr:Longitude>
      <ogr:Name></ogr:Name>     <=========
    </ogr:test>
  </gml:featureMember>
</ogr:FeatureCollection>
$

Change History (4)

comment:1 by Even Rouault, 12 years ago

Which encoding are your characters in? If it is UTF-8, then it should work with the GML and KML drivers. If it is not UTF-8, then the GML and KML drivers will indeed sanitize it (meaning remove non-ASCII chars) and issue a warning saying so, so that the resulting XML file can validate. It is possible that the LIBKML driver is less strict on that, but in that case the resulting KML will not validate...

To me, what you describe is a bug of the LIBKML driver, which should also swallow those characters if they are not valid UTF-8.
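
The sanitizing behaviour described above can be illustrated with a minimal Python sketch. This is not the drivers' actual code; the function name `sanitize_for_xml` is hypothetical, and the choice of `?` as the replacement byte is taken from the output reported later in this ticket.

```python
def sanitize_for_xml(raw: bytes, replacement: bytes = b"?") -> bytes:
    """Illustrative only: pass valid UTF-8 through, force anything else to ASCII."""
    try:
        # Valid UTF-8 is safe to write into an XML file declared as utf-8.
        raw.decode("utf-8")
        return raw
    except UnicodeDecodeError:
        # Not valid UTF-8: replace every byte >= 128 so the output stays valid XML.
        return bytes(b if b < 128 else replacement[0] for b in raw)


name = "ÂâÉéÜü"
utf8_result = sanitize_for_xml(name.encode("utf-8"))         # unchanged
latin1_result = sanitize_for_xml(name.encode("iso-8859-1"))  # non-ASCII bytes replaced
```

With UTF-8 input the bytes come back untouched, which is the expected behaviour that the report says is broken on trunk.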

comment:2 by peifer, 12 years ago

test.csv *is* UTF-8 encoded and the accented characters are just silently swallowed, which looks to me like a bug.

$ file test.csv
test.csv: UTF-8 Unicode text

I just recoded test.csv to ISO-8859-1, and now the GML and KML drivers do what you describe:

Warning 1: ▒▒▒▒▒▒ is not a valid UTF-8 string. Forcing it to ASCII.
If you still want the original string and change the XML file encoding
afterwards, you can define OGR_FORCE_ASCII=NO as configuration option.
This warning won't be issued anymore

...

    <SimpleData name="Name">??????</SimpleData>
...
    <ogr:Name>??????</ogr:Name>

The libkml driver doesn't seem to care and simply dumps the ISO-8859-1 encoded text into the KML file; the resulting XML is obviously not well-formed.

$ ogr2ogr -f libkml libkml.kml test.vrt
$ xmlwf libkml.kml
libkml.kml:26:14: not well-formed (invalid token)
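
The same well-formedness check can be sketched in Python with the standard library's expat-based parser, for cases where `xmlwf` is not at hand. The helper name `is_well_formed` and the tiny documents are made up for illustration; they are not the files from this ticket.

```python
import xml.etree.ElementTree as ET


def is_well_formed(xml_bytes: bytes) -> bool:
    """Return True if the bytes parse as XML, False on a parse error."""
    try:
        ET.fromstring(xml_bytes)
        return True
    except ET.ParseError:
        return False


name = "ÂâÉéÜü"
# Without an encoding declaration the parser assumes UTF-8.
utf8_doc = b"<kml><name>" + name.encode("utf-8") + b"</name></kml>"
latin1_doc = b"<kml><name>" + name.encode("iso-8859-1") + b"</name></kml>"

utf8_ok = is_well_formed(utf8_doc)      # UTF-8 bytes parse cleanly
latin1_ok = is_well_formed(latin1_doc)  # raw ISO-8859-1 bytes are rejected
```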

comment:3 by Even Rouault, 12 years ago

Resolution: fixed
Status: new → closed

Thanks for reporting. This was indeed a (trunk) regression. (I have left the LIBKML driver's lack of checking as is; it would deserve a specific ticket.)

r23254 /trunk/gdal/port/cpl_string.cpp: CPLEscapeString(, CPLES_XML): avoid dropping bytes >= 128 (fix regression of #4117 - trunk only - raised in #4299)
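
To illustrate what "avoid dropping bytes >= 128" means for XML escaping, here is a small Python sketch. It is not the code of `CPLEscapeString`; both function names are hypothetical, the set of escaped characters is an assumption, and the second function only mimics the reported symptom rather than the actual regressed logic.

```python
# Assumed set of characters to escape; the real function may handle more cases.
_XML_ESCAPES = {
    ord("&"): b"&amp;",
    ord("<"): b"&lt;",
    ord(">"): b"&gt;",
    ord('"'): b"&quot;",
}


def escape_xml(raw: bytes) -> bytes:
    """Escape XML special characters; copy every other byte, including >= 128."""
    out = bytearray()
    for b in raw:
        out += _XML_ESCAPES.get(b, bytes([b]))
    return bytes(out)


def escape_xml_dropping_high_bytes(raw: bytes) -> bytes:
    """Mimic the symptom from this ticket: bytes >= 128 silently vanish."""
    return escape_xml(bytes(b for b in raw if b < 128))


name_utf8 = "ÂâÉéÜü".encode("utf-8")
kept = escape_xml(name_utf8)                         # multi-byte UTF-8 survives
dropped = escape_xml_dropping_high_bytes(name_utf8)  # empty, as in <name></name>
```

Because UTF-8 encodes every non-ASCII character using only bytes >= 128, copying those bytes through unchanged keeps the text intact, while the XML special characters are all ASCII and can be escaped byte by byte.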

comment:4 by Even Rouault, 12 years ago

r23255 /trunk/autotest/ogr/ogr_gml_read.py: Test writing non-ASCII UTF-8 content (#4117, #4299)
