#4299 closed defect (fixed)
KML and GML drivers swallow non-ASCII characters
Reported by: | peifer | Owned by: | warmerdam |
---|---|---|---|
Priority: | normal | Milestone: | |
Component: | OGR_SF | Version: | svn-trunk |
Severity: | normal | Keywords: | KML, GML |
Cc: |
Description
The KML and GML drivers swallow all non-ASCII characters, while the libkml driver works fine; see below.
$ cat test.csv
Latitude,Longitude,Name
48.1,0.25,"AaEeUu"
49.2,1.15,"ÂâÉéÜü"
$
$ cat test.vrt
<OGRVRTDataSource>
  <OGRVRTLayer name="test">
    <SrcDataSource>test.csv</SrcDataSource>
    <GeometryType>wkbPoint</GeometryType>
    <LayerSRS>WGS84</LayerSRS>
    <GeometryField encoding="PointFromColumns" x="Longitude" y="Latitude"/>
  </OGRVRTLayer>
</OGRVRTDataSource>
$
$ ogr2ogr -f kml kml.kml test.vrt
$ ogr2ogr -f libkml libkml.kml test.vrt
$ ogr2ogr -f gml gml.gml test.vrt
$
$ cat kml.kml
<?xml version="1.0" encoding="utf-8" ?>
<kml xmlns="http://www.opengis.net/kml/2.2">
<Document><Folder><name>test</name>
<Schema name="test" id="test">
  <SimpleField name="Name" type="string"></SimpleField>
  <SimpleField name="Description" type="string"></SimpleField>
  <SimpleField name="Latitude" type="string"></SimpleField>
  <SimpleField name="Longitude" type="string"></SimpleField>
</Schema>
  <Placemark>
  <name>AaEeUu</name>
  <ExtendedData><SchemaData schemaUrl="#test">
    <SimpleData name="Name">AaEeUu</SimpleData>
    <SimpleData name="Latitude">48.1</SimpleData>
    <SimpleData name="Longitude">0.25</SimpleData>
  </SchemaData></ExtendedData>
  <Point><coordinates>0.25,48.1</coordinates></Point>
  </Placemark>
  <Placemark>
  <name></name>
  <ExtendedData><SchemaData schemaUrl="#test">
    <SimpleData name="Name"></SimpleData> <=========
    <SimpleData name="Latitude">49.2</SimpleData>
    <SimpleData name="Longitude">1.15</SimpleData>
  </SchemaData></ExtendedData>
  <Point><coordinates>1.15,49.2</coordinates></Point>
  </Placemark>
</Folder></Document></kml>
$
$ cat libkml.kml
<kml>
  <Document>
    <Schema id="test.schema">
      <SimpleField name="Latitude" type="string"/>
      <SimpleField name="Longitude" type="string"/>
    </Schema>
    <Document>
      <name>test</name>
      <Placemark>
        <name>AaEeUu</name>
        <ExtendedData>
          <SchemaData schemaUrl="#test.schema">
            <SimpleData name="Latitude"> 48.1 </SimpleData>
            <SimpleData name="Longitude"> 0.25 </SimpleData>
          </SchemaData>
        </ExtendedData>
        <Point>
          <coordinates> 0.25,48.1,0 </coordinates>
        </Point>
      </Placemark>
      <Placemark>
        <name>ÂâÉéÜü</name> <=========
        <ExtendedData>
          <SchemaData schemaUrl="#test.schema">
            <SimpleData name="Latitude"> 49.2 </SimpleData>
            <SimpleData name="Longitude"> 1.15 </SimpleData>
          </SchemaData>
        </ExtendedData>
        <Point>
          <coordinates> 1.15,49.2,0 </coordinates>
        </Point>
      </Placemark>
    </Document>
  </Document>
</kml>
$
$ cat gml.gml
<?xml version="1.0" encoding="utf-8" ?>
<ogr:FeatureCollection
     xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
     xsi:schemaLocation="http://ogr.maptools.org/ gml.xsd"
     xmlns:ogr="http://ogr.maptools.org/"
     xmlns:gml="http://www.opengis.net/gml">
  <gml:boundedBy>
    <gml:Box>
      <gml:coord><gml:X>0.25</gml:X><gml:Y>48.1</gml:Y></gml:coord>
      <gml:coord><gml:X>1.15</gml:X><gml:Y>49.2</gml:Y></gml:coord>
    </gml:Box>
  </gml:boundedBy>
  <gml:featureMember>
    <ogr:test fid="test.0">
      <ogr:geometryProperty><gml:Point srsName="EPSG:4326"><gml:coordinates>0.25,48.1</gml:coordinates></gml:Point></ogr:geometryProperty>
      <ogr:Latitude>48.1</ogr:Latitude>
      <ogr:Longitude>0.25</ogr:Longitude>
      <ogr:Name>AaEeUu</ogr:Name>
    </ogr:test>
  </gml:featureMember>
  <gml:featureMember>
    <ogr:test fid="test.1">
      <ogr:geometryProperty><gml:Point srsName="EPSG:4326"><gml:coordinates>1.15,49.2</gml:coordinates></gml:Point></ogr:geometryProperty>
      <ogr:Latitude>49.2</ogr:Latitude>
      <ogr:Longitude>1.15</ogr:Longitude>
      <ogr:Name></ogr:Name> <=========
    </ogr:test>
  </gml:featureMember>
</ogr:FeatureCollection>
$
Change History (4)
comment:1 by , 12 years ago
comment:2 by , 12 years ago
test.csv *is* UTF-8 encoded and the accented characters are just silently swallowed, which looks to me like a bug.
$ file test.csv
test.csv: UTF-8 Unicode text
I just recoded test.csv to ISO-8859-1, and now the GML and KML drivers do what you describe:
Warning 1: ▒▒▒▒▒▒ is not a valid UTF-8 string. Forcing it to ASCII. If you still want the original string and change the XML file encoding afterwards, you can define OGR_FORCE_ASCII=NO as configuration option. This warning won't be issued anymore
...
<SimpleData name="Name">??????</SimpleData>
...
<ogr:Name>??????</ogr:Name>
The libkml driver doesn't seem to care and simply dumps the ISO-8859-1 encoded text into the KML file, so the resulting XML is obviously not well-formed.
$ ogr2ogr -f libkml libkml.kml test.vrt
$ xmlwf libkml.kml
libkml.kml:26:14: not well-formed (invalid token)
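For anyone reproducing this, the difference between the two versions of test.csv can be checked without GDAL. The following is only a minimal Python sketch; the helper name `is_valid_utf8` is made up for illustration and is not part of OGR. It shows that the accented name is valid UTF-8 in the original file but not after recoding to ISO-8859-1, which is why an XML file declared as utf-8 stops being well-formed when those bytes are written unchanged.

```python
def is_valid_utf8(data: bytes) -> bool:
    """Return True if the byte string decodes cleanly as UTF-8.

    Hypothetical helper for illustration only, not an OGR function.
    """
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


name = "ÂâÉéÜü"
utf8_bytes = name.encode("utf-8")         # bytes as in the original test.csv
latin1_bytes = name.encode("iso-8859-1")  # bytes after recoding the file

print(is_valid_utf8(utf8_bytes))    # True
print(is_valid_utf8(latin1_bytes))  # False
```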
comment:3 by , 12 years ago
Resolution: | → fixed |
---|---|
Status: | new → closed |
Thanks for reporting. This was indeed a (trunk) regression. (I have left the lack of checking in the LIBKML driver untouched; that would deserve a specific ticket.)
r23254 /trunk/gdal/port/cpl_string.cpp: CPLEscapeString(, CPLES_XML): avoid dropping bytes >= 128 (fix regression of #4117 - trunk only - raised in #4299)
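To illustrate the behaviour the commit message describes, here is a simplified Python sketch of byte-wise XML escaping that passes bytes >= 128 through unchanged. It is not the actual CPLEscapeString code; the function name `escape_xml_bytes` and the set of escaped characters are assumptions made for the example.

```python
def escape_xml_bytes(data: bytes) -> bytes:
    """Escape XML special characters, leaving all other bytes untouched.

    Simplified, hypothetical stand-in for XML escaping; bytes >= 128
    (for example the bytes of a UTF-8 multi-byte sequence) are copied as-is.
    """
    replacements = {
        ord("&"): b"&amp;",
        ord("<"): b"&lt;",
        ord(">"): b"&gt;",
        ord('"'): b"&quot;",
    }
    out = bytearray()
    for byte in data:
        # Unknown bytes, including those >= 128, fall through unchanged.
        out += replacements.get(byte, bytes([byte]))
    return bytes(out)


escaped = escape_xml_bytes("ÂâÉéÜü <test>".encode("utf-8"))
print(escaped.decode("utf-8"))  # ÂâÉéÜü &lt;test&gt;
```

Dropping the fall-through case for bytes >= 128 would produce the empty `<name></name>` element shown in the description.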
In which encoding are your characters encoded? If it is UTF-8, then it should work with the GML and KML drivers. If it is not UTF-8, then the GML and KML drivers will indeed sanitize it (meaning they remove the non-ASCII characters) and issue a warning saying so, so that the resulting XML file can validate. It is possible that the LIBKML driver is less strict about that, but in that case the resulting KML will not validate...
To me, what you describe is a bug in the LIBKML driver, which should also swallow those characters if they are not valid UTF-8.
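The sanitizing step can be sketched as follows. This is a rough Python approximation, not the drivers' code: the function name `sanitize_for_xml` is hypothetical, and the `?` replacement character is inferred from the `??????` output quoted in this ticket.

```python
def sanitize_for_xml(data: bytes, replacement: bytes = b"?") -> bytes:
    """Return data unchanged if it is valid UTF-8, else force it to ASCII.

    Hypothetical helper: every byte >= 128 of an invalid string is replaced
    by the single-byte replacement character (assumed to be '?').
    """
    try:
        data.decode("utf-8")
        return data
    except UnicodeDecodeError:
        return bytes(b if b < 128 else replacement[0] for b in data)


print(sanitize_for_xml("ÂâÉéÜü".encode("iso-8859-1")))  # b'??????'
print(sanitize_for_xml("ÂâÉéÜü".encode("utf-8")).decode("utf-8"))  # ÂâÉéÜü
```

In the real drivers, the OGR_FORCE_ASCII=NO configuration option mentioned in the warning disables this replacement.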