#4117 closed defect (fixed)
GML Driver writes illegal control characters but can't read them.
Reported by: | warmerdam | Owned by: | warmerdam |
---|---|---|---|
Priority: | normal | Milestone: | |
Component: | OGR_SF | Version: | unspecified |
Severity: | normal | Keywords: | gml xml |
Cc: | chaitanya, Even Rouault |
Description
The GML driver will write control characters such as 0xB (vertical tab), but it cannot read them back, at least with Xerces, which fails with an error like:
ERROR 1: XML Parsing Error: invalid character 0xB
A review of the XML specification http://www.w3.org/TR/2008/REC-xml-20081126/#charsets seems to support the contention of the Xerces library FAQ that most control characters are not legal in XML. In particular, the only characters allowed below 0x20 are 0x9, 0xA and 0xD.
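For reference, the character rule cited above can be checked mechanically. The following is a minimal sketch of the XML 1.0 `Char` production as a Python predicate; the function name `illegal_xml_chars` is hypothetical and is not part of GDAL or Xerces.

```python
def illegal_xml_chars(text):
    """Return (index, code point) pairs for characters that fall outside
    the XML 1.0 Char production (hypothetical helper, not a GDAL API)."""
    def is_legal(cp):
        return (
            cp in (0x9, 0xA, 0xD)          # tab, line feed, carriage return
            or 0x20 <= cp <= 0xD7FF
            or 0xE000 <= cp <= 0xFFFD
            or 0x10000 <= cp <= 0x10FFFF
        )
    return [(i, ord(ch)) for i, ch in enumerate(text) if not is_legal(ord(ch))]


# The vertical tab (0xB) is reported; the tab (0x9) is allowed.
print(illegal_xml_chars("name\x0bvalue\tok"))  # [(4, 11)]
```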
Change History (3)
comment:1 by , 13 years ago
Resolution: | → fixed |
---|---|
Status: | new → closed |
comment:2 by , 13 years ago
Your assumption is correct. UTF-8 is designed so that bytes below 128 are always ASCII; every byte of a non-ASCII character's multi-byte sequence is 128 or above (MSB set).
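This property is easy to confirm for a few sample characters. The sketch below uses a hypothetical helper, `multibyte_bytes_have_msb_set`, and Python's built-in UTF-8 encoder; it illustrates the claim for these samples and is not a proof.

```python
def multibyte_bytes_have_msb_set(ch):
    """True if ch encodes to more than one UTF-8 byte and every one of
    those bytes is >= 0x80 (hypothetical helper for illustration)."""
    encoded = ch.encode("utf-8")
    return len(encoded) > 1 and all(b >= 0x80 for b in encoded)


# Two-, three- and four-byte sequences: no byte falls in the ASCII range,
# so a filter that only drops bytes below 0x20 cannot split a sequence.
for ch in ["\u00e9", "\u20ac", "\U0001d11e"]:
    print([hex(b) for b in ch.encode("utf-8")], multibyte_bytes_have_msb_set(ch))
```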
comment:3 by , 13 years ago
Digging around, I have chosen CPLEscapeString() with the CPLES_XML scheme as the place to discard illegal low control characters. I'm not sure this is entirely appropriate, nor whether it will interfere with multi-byte UTF-8 sequences or have other unexpected side effects.
Note that my research has not suggested that these control characters are illegal in Unicode, though they might be.
Applied in trunk (r22526).
If folks are pretty confident in the change (I'm not yet), it could be backported to the 1.8 branch.
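To illustrate the approach described in this comment, here is a rough byte-wise sketch of XML escaping that also discards illegal low control characters. It is written in Python for brevity and is not the code committed in r22526; the function name `escape_xml_bytes` and the exact set of escaped characters are assumptions.

```python
def escape_xml_bytes(data: bytes) -> bytes:
    """Escape XML special characters and drop bytes below 0x20 other than
    0x9, 0xA and 0xD (illustrative sketch, not GDAL's implementation)."""
    replacements = {
        ord("&"): b"&amp;",
        ord("<"): b"&lt;",
        ord(">"): b"&gt;",
        ord('"'): b"&quot;",
    }
    out = bytearray()
    for b in data:
        if b < 0x20 and b not in (0x9, 0xA, 0xD):
            continue  # illegal in XML 1.0: discard rather than write it
        out += replacements.get(b, bytes([b]))
    return bytes(out)


# The vertical tab is dropped; the two-byte UTF-8 sequence passes through intact.
print(escape_xml_bytes("a\x0bb<\u00e9>".encode("utf-8")).decode("utf-8"))
```

Because the filter only touches bytes below 0x20, and every byte of a multi-byte UTF-8 sequence is 0x80 or above, it should leave such sequences untouched.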