Opened 12 years ago

Closed 12 years ago

Last modified 12 years ago

#2264 closed defect (fixed)

KML Driver creates invalid output

Reported by: darkblueB Owned by: condit
Priority: normal Milestone:
Component: OGR_SF Version: unspecified
Severity: normal Keywords: KML
Cc: warmerdam

Description

When converting a specific file, the current KML code creates a file which has a (null) tag for the DATE element in the Schema section, and contains damaged UTF-8 elsewhere... (522 damaged lines, for example line 146 in the attached output)

Change History (9)

comment:1 Changed 12 years ago by darkblueB

comment:2 Changed 12 years ago by Even Rouault

Cc: warmerdam added
Owner: changed from warmerdam to condit

comment:3 Changed 12 years ago by condit

Status: changed from new to assigned

Right, currently KML lacks support for date and/or time types. I'll make sure they go out as strings. As for the UTF-8 problem, I'm not seeing it. In your example file I fixed the EST_DATE line (18) to: <SimpleField name="EST_DATE" type="string"></SimpleField> and everything renders correctly for me in Google Earth, including the umlaut on line 146. Why do you think the UTF-8 is faulty?

I'll check in a patch to resolve that first issue ASAP... -Chris

comment:4 in reply to:  3 Changed 12 years ago by condit

OK - I've checked in ogrkmllayer.cpp which should resolve the first part of this problem. Please check when you get a chance...

As per Brian's email:

The UTF-8 is an unsolved mystery, then. I have a high-end text editor which checks the UTF-8 on opening a file. It clearly warns of an encoding problem... and to make things weirder, my Google Earth shows the character as damaged in the HTML balloon, too.

Mac OS 10.4.10, BBEdit 8.6.2, Google Earth 4.2.205.5730, PowerPC processor

For me: Windows XP SP 2, Notepad++, Google Earth 4.2.0181.2634 (beta). And the file renders correctly in GE and my text editor...
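The kind of check the text editor performs can be approximated with a short script. This is a minimal sketch, not part of the ticket: the function name `find_invalid_utf8_lines` and the sample bytes are hypothetical, and it simply tries to decode each line as UTF-8 and reports the lines that fail.

```python
from typing import List


def find_invalid_utf8_lines(data: bytes) -> List[int]:
    """Return the 1-based numbers of lines that are not valid UTF-8."""
    bad = []
    for number, line in enumerate(data.split(b"\n"), start=1):
        try:
            line.decode("utf-8")
        except UnicodeDecodeError:
            bad.append(number)
    return bad


# Hypothetical sample: line 2 holds a lone 0xF6 byte (ISO-8859-1 style),
# line 3 holds the two-byte UTF-8 form of the same character.
sample = b"<name>ok</name>\n<name>K\xf6ln</name>\n<name>K\xc3\xb6ln</name>\n"
print(find_invalid_utf8_lines(sample))
```

Run against the attached output (read with `open(path, "rb").read()`), a script like this should list the damaged lines independently of any editor or viewer.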

comment:5 Changed 12 years ago by condit

per email: Brian, you're right. I looked at the hex for this file and now I see the problem. Currently, the o w/umlaut (ö) is encoded as the single byte f6, which would work on Windows. If this were correctly UTF-8 encoded, it would be c3 b6 (two bytes). I'm not overly familiar with character encodings in the GDAL API, so I'll look into this and get back to you.

What happens on mac / linux if you change the xml declaration to: <?xml version="1.0" encoding="ISO-8859-1" ?>

I think that should solve the problem. If so, please let me know and I'll adjust the output from the KML driver.
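The byte values mentioned above can be reproduced directly. The following is a small illustration, assuming the character in question is U+00F6 (the code point that the byte f6 maps to in ISO-8859-1); the variable names are made up for the example.

```python
# U+00F6 is the code point that a single 0xF6 byte represents in ISO-8859-1.
text = "\u00f6"

latin1_bytes = text.encode("iso-8859-1")  # one byte
utf8_bytes = text.encode("utf-8")         # two bytes

print(latin1_bytes.hex(), utf8_bytes.hex())

# The single-byte form is not valid UTF-8 on its own, which is why a strict
# reader rejects a file that declares UTF-8 but contains it.
try:
    latin1_bytes.decode("utf-8")
    decodes_as_utf8 = True
except UnicodeDecodeError:
    decodes_as_utf8 = False
print(decodes_as_utf8)
```

This is also why changing the declaration to ISO-8859-1 makes the existing bytes readable: the declaration then matches what is actually in the file.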

comment:6 Changed 12 years ago by Even Rouault

Chris, I think you shouldn't tweak the encoding parameter in the KML output too much. Basically, OGR doesn't know which encoding its input data comes in, so there's nothing we can really do to make it bulletproof. Brian's data happens here to be ISO-8859-1, but it could have been another encoding.

RFC 5 deals with encoding, but it has not yet been adopted. One of the difficulties I can see is that in some/most cases we have no way to know which encoding is used in the data we read. The exceptions are XML-based formats like KML...
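To illustrate why the source encoding has to come from the user rather than from the data, here is a minimal sketch of a transcoding step done outside OGR. The helper `transcode_to_utf8` and the sample bytes are hypothetical; nothing here reflects an existing GDAL/OGR API.

```python
def transcode_to_utf8(data: bytes, source_encoding: str) -> bytes:
    """Re-encode bytes as UTF-8, trusting the caller's claim about the source."""
    return data.decode(source_encoding).encode("utf-8")


# Hypothetical attribute value stored as single-byte text.
raw = b"K\xf6ln"

as_latin1 = transcode_to_utf8(raw, "iso-8859-1")
as_cp1251 = transcode_to_utf8(raw, "cp1251")

# Both calls succeed, but they yield different characters: the bytes alone
# do not say which interpretation is the right one.
print(as_latin1, as_cp1251)
```

Both conversions run without error, so a wrong guess is silent. That is the reason a driver cannot safely pick the encoding on the user's behalf.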

comment:7 in reply to:  6 Changed 12 years ago by condit

Resolution: fixed
Status: changed from assigned to closed

That's absolutely correct. I hadn't read much about RFC 5 and thought there was some sort of UTF-8 support in OGR, but not yet. I'm going to close this ticket since the main problem is resolved (although date types aren't supported in KML), and if/when OGR gets encoding support, the output can be corrected...

comment:8 Changed 12 years ago by crschmidt

The KML output declares the output encoding, which is (in my opinion) the bug here. Since GDAL has no knowledge about the encoding the KML should be in:

  • The default encoding should be the one most likely to be used in the data. (I believe this is currently true.)
  • The user should be able to use their knowledge of the data they are converting to override the encoding. (I don't believe this is currently true.)

A format/datasource creation option such as KML_ENCODING (or just OUTPUT_ENCODING) to override the string used in the KML output seems to me like it would allow users to get what they need without the need for deeper support in OGR...

(Was writing this before condit's comment; commenting for the record, but not reopening.)

comment:9 Changed 12 years ago by condit

I think users who are aware of encoding issues will be savvy enough to open the file and edit the xml declaration. But if there's a need for this, please file a feature request and I'll add it in the next round of code changes...
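For anyone who prefers to script the manual edit rather than open the file by hand, the following is a rough sketch. The function `set_declared_encoding` is hypothetical and is not part of the KML driver; it only rewrites the `encoding` attribute of an existing XML declaration and leaves the rest of the bytes untouched.

```python
import re

# Matches an XML declaration at the very start of the data and captures the
# text up to "encoding=", the quote character, and the current value.
_DECLARATION = re.compile(rb"""^(<\?xml[^>]*?encoding=)(["'])([^"']*)\2""")


def set_declared_encoding(data: bytes, encoding: str) -> bytes:
    """Replace the declared encoding; return data unchanged if none is declared."""
    def _swap(match):
        quote = match.group(2)
        return match.group(1) + quote + encoding.encode("ascii") + quote

    return _DECLARATION.sub(_swap, data, count=1)


original = b'<?xml version="1.0" encoding="UTF-8" ?>\n<kml/>'
print(set_declared_encoding(original, "ISO-8859-1"))
```

Note that this changes only what the file claims about itself, not the bytes of the attribute values, so it is correct only when the caller actually knows the encoding of the source data.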
