Opened 13 years ago

Closed 13 years ago

Last modified 12 years ago

#4117 closed defect (fixed)

GML Driver writes illegal control characters but can't read them.

Reported by: warmerdam
Owned by: warmerdam
Priority: normal
Milestone:
Component: OGR_SF
Version: unspecified
Severity: normal
Keywords: gml xml
Cc: chaitanya, Even Rouault

Description

The GML driver will write control characters such as 0xB (vertical tab), but it cannot read them back, at least with Xerces, which fails with an error like:

ERROR 1: XML Parsing Error: invalid character 0xB

A review of the XML specification http://www.w3.org/TR/2008/REC-xml-20081126/#charsets seems to support the contention of the Xerces library FAQ that most control characters are not legal in XML. In particular the only characters allowed below 0x20 are 0x9, 0xA and 0xD.
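As an illustration of the rule cited above, here is a minimal sketch of the XML 1.0 `Char` production as a predicate on Unicode code points. The function name `IsLegalXmlChar` is hypothetical and is not part of GDAL or Xerces.

```cpp
#include <cstdint>

// Hypothetical helper: returns true if the code point matches the
// XML 1.0 "Char" production:
//   #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
bool IsLegalXmlChar(std::uint32_t cp)
{
    return cp == 0x9 || cp == 0xA || cp == 0xD ||
           (cp >= 0x20 && cp <= 0xD7FF) ||
           (cp >= 0xE000 && cp <= 0xFFFD) ||
           (cp >= 0x10000 && cp <= 0x10FFFF);
}
```

With this predicate, 0xB is rejected while tab (0x9), line feed (0xA) and carriage return (0xD) are accepted.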

Change History (3)

comment:1 by warmerdam, 13 years ago

Resolution: fixed
Status: new → closed

Digging around, I have chosen CPLEscapeString() with scheme CPLES_XML as the place to discard illegal low control characters. I'm not sure this is entirely appropriate, or whether it will interfere with multi-byte UTF-8 sequences or have other unexpected side effects.
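The approach described here can be sketched roughly as follows. This is not the code from r22526 or GDAL's CPLEscapeString(); `EscapeXmlDroppingControls` is a hypothetical stand-in that works on `std::string`, escapes the usual XML metacharacters, and silently drops bytes below 0x20 other than 0x9, 0xA and 0xD.

```cpp
#include <string>

// Hypothetical helper, NOT GDAL's CPLEscapeString(): escapes XML
// metacharacters and discards control bytes that XML 1.0 forbids.
std::string EscapeXmlDroppingControls(const std::string &input)
{
    std::string out;
    out.reserve(input.size());
    for (char ch : input)
    {
        // Compare as unsigned so bytes >= 0x80 are never seen as negative.
        const unsigned char c = static_cast<unsigned char>(ch);
        if (c < 0x20 && c != 0x9 && c != 0xA && c != 0xD)
            continue;  // illegal low control character: drop it

        switch (c)
        {
            case '<': out += "&lt;"; break;
            case '>': out += "&gt;"; break;
            case '&': out += "&amp;"; break;
            case '"': out += "&quot;"; break;
            default:  out += ch; break;
        }
    }
    return out;
}
```

Dropping the byte, rather than replacing it with a placeholder, matches the "discard" wording above; a caller that needs to preserve the information would have to pick a different policy.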

Note: my research has not suggested that these control characters are illegal in Unicode, though they might be.

Applied in trunk (r22526).

If folks are pretty confident in the change (I'm not yet), it could be backported to the 1.8 branch.

comment:2 by Even Rouault, 13 years ago

Your assumption is correct. In UTF-8, bytes below 128 are always ASCII; every byte of a non-ASCII character's multi-byte sequence is 128 or above (MSB set).
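A small sketch that illustrates this property of UTF-8: it encodes a code point by hand and checks whether any resulting byte falls in the ASCII range. Both function names are hypothetical, and the encoder assumes a valid code point (no surrogate or range checking).

```cpp
#include <cstdint>
#include <string>

// Hypothetical minimal UTF-8 encoder; assumes cp is a valid code point.
std::string EncodeUtf8(std::uint32_t cp)
{
    std::string s;
    if (cp < 0x80)
    {
        s += static_cast<char>(cp);
    }
    else if (cp < 0x800)
    {
        s += static_cast<char>(0xC0 | (cp >> 6));
        s += static_cast<char>(0x80 | (cp & 0x3F));
    }
    else if (cp < 0x10000)
    {
        s += static_cast<char>(0xE0 | (cp >> 12));
        s += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        s += static_cast<char>(0x80 | (cp & 0x3F));
    }
    else
    {
        s += static_cast<char>(0xF0 | (cp >> 18));
        s += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        s += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        s += static_cast<char>(0x80 | (cp & 0x3F));
    }
    return s;
}

// Returns true if any byte of s is below 0x80, i.e. in the ASCII range.
bool HasAsciiByte(const std::string &s)
{
    for (char ch : s)
        if (static_cast<unsigned char>(ch) < 0x80)
            return true;
    return false;
}
```

Because lead bytes start with the bits 11 and continuation bytes with 10, a byte-wise filter that only touches values below 0x20 cannot hit the middle of a multi-byte sequence.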

comment:3 by Even Rouault, 12 years ago

r23254 /trunk/gdal/port/cpl_string.cpp: CPLEscapeString(, CPLES_XML): avoid dropping bytes >= 128 (fix regression of #4117 - trunk only - raised in #4299)
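The ticket does not say what caused bytes >= 128 to be dropped. One plausible mechanism, shown here purely as an assumption, is comparing a signed `char` against 0x20: on platforms where `char` is signed, bytes with the MSB set are negative and so pass a `< 0x20` test. The two function names below are hypothetical.

```cpp
// Assumed buggy form: a byte such as 0xC3 held in a signed char is
// negative on common platforms, so it is wrongly treated as a low
// control character.
bool IsLowControlSigned(signed char c)
{
    return c < 0x20 && c != 0x9 && c != 0xA && c != 0xD;
}

// Safer form: compare as unsigned so only 0x00..0x1F can match.
bool IsLowControlUnsigned(unsigned char c)
{
    return c < 0x20 && c != 0x9 && c != 0xA && c != 0xD;
}
```

With the unsigned comparison, 0xB is still flagged while the bytes of a UTF-8 sequence such as 0xC3 0xA9 are left alone.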
