Ticket #4117 (closed defect: fixed)

Opened 2 years ago

Last modified 20 months ago

GML Driver writes illegal control characters but can't read them.

Reported by: warmerdam
Owned by: warmerdam
Priority: normal
Milestone:
Component: OGR_SF
Version: unspecified
Severity: normal
Keywords: gml xml
Cc: chaitanya, rouault

Description

The GML driver will write control characters like 0xB (vertical tab) but cannot read them back, at least with Xerces, which fails with an error like:

ERROR 1: XML Parsing Error: invalid character 0xB

A review of the XML specification http://www.w3.org/TR/2008/REC-xml-20081126/#charsets seems to support the contention of the Xerces library FAQ that most control characters are not legal in XML. In particular, the only characters allowed below 0x20 are 0x9 (tab), 0xA (line feed) and 0xD (carriage return).
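
As a rough illustration of that rule, the check can be written as a small byte-level predicate. This is a sketch only: the names IsLegalXmlByte and FindIllegalXmlControl are hypothetical and do not exist in GDAL or Xerces, and bytes at or above 0x20 are simply passed through rather than validated.

```cpp
#include <string>

// Hypothetical helper (not a GDAL or Xerces function): reports whether a
// single byte is acceptable with respect to the XML 1.0 rule for code
// points below 0x20. Bytes >= 0x20 are not judged here.
inline bool IsLegalXmlByte(unsigned char ch)
{
    if (ch >= 0x20)
        return true;
    // Only tab, line feed and carriage return are allowed below 0x20.
    return ch == 0x9 || ch == 0xA || ch == 0xD;
}

// Hypothetical helper: offset of the first illegal control byte in s,
// or std::string::npos if there is none.
inline std::string::size_type FindIllegalXmlControl(const std::string &s)
{
    for (std::string::size_type i = 0; i < s.size(); ++i)
    {
        if (!IsLegalXmlByte(static_cast<unsigned char>(s[i])))
            return i;
    }
    return std::string::npos;
}
```

A string containing 0xB, such as "ab\vcd", would be flagged at offset 2, which matches the character Xerces complains about.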

Change History

Changed 2 years ago by warmerdam

  • status changed from new to closed
  • resolution set to fixed

Digging around, I have chosen CPLEscapeString() with scheme CPLES_XML as the place to discard illegal low control characters. I'm not sure if this is entirely appropriate, or whether it will interfere with multi-byte UTF-8 sequences or have other unexpected side effects.

Note, my research has not suggested that these control characters are illegal in Unicode, though they might be.

Applied in trunk (r22526).

If folks are pretty confident in the change (I'm not yet) it could be backported to the 1.8 branch.
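
For readers who want to see the idea in code, the approach described in this comment might look roughly like the following. This is a minimal sketch and not the code from r22526: EscapeXmlDroppingControls is a hypothetical stand-in for the CPLES_XML path of CPLEscapeString(), and the set of escaped characters is an assumption made for the example.

```cpp
#include <string>

// Illustrative sketch only, not the GDAL source: XML-escape a string and
// silently drop control bytes that XML 1.0 does not allow.
inline std::string EscapeXmlDroppingControls(const std::string &in)
{
    std::string out;
    out.reserve(in.size());
    for (char c : in)
    {
        // Compare as unsigned so bytes >= 0x80 are never mistaken for
        // low control characters.
        const unsigned char ch = static_cast<unsigned char>(c);
        if (ch < 0x20 && ch != 0x9 && ch != 0xA && ch != 0xD)
            continue; // illegal in XML 1.0: discard

        switch (c)
        {
            case '<': out += "&lt;"; break;
            case '>': out += "&gt;"; break;
            case '&': out += "&amp;"; break;
            case '"': out += "&quot;"; break;
            default:  out += c; break;
        }
    }
    return out;
}
```

With this sketch, "a\vb<c" becomes "ab&lt;c", while the two-byte UTF-8 sequence for "é" (0xC3 0xA9) passes through unchanged.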

Changed 2 years ago by rouault

Your assumption is correct. UTF-8 is such that bytes below 128 are ASCII. Non-ASCII characters are encoded entirely with bytes of 128 or above (MSB set), so discarding bytes below 0x20 cannot touch a multi-byte sequence.
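
The property relied on here can be checked byte by byte. The classifier below is a sketch with hypothetical names (Utf8ByteKind, ClassifyUtf8Byte); it only looks at the top bits and does not validate that a lead byte is actually a legal UTF-8 value.

```cpp
// Hypothetical classification of a single byte of UTF-8 text.
enum class Utf8ByteKind { Ascii, Continuation, Lead };

inline Utf8ByteKind ClassifyUtf8Byte(unsigned char ch)
{
    if (ch < 0x80)
        return Utf8ByteKind::Ascii;        // 0xxxxxxx: a complete ASCII character
    if ((ch & 0xC0) == 0x80)
        return Utf8ByteKind::Continuation; // 10xxxxxx: inside a multi-byte sequence
    return Utf8ByteKind::Lead;             // 11xxxxxx: starts a multi-byte sequence
}
```

Every control byte below 0x20 falls in the Ascii case, so it can never be the lead or continuation byte of a longer sequence.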

Changed 20 months ago by rouault

r23254 /trunk/gdal/port/cpl_string.cpp: CPLEscapeString(, CPLES_XML): avoid dropping bytes >= 128 (fix regression of #4117 - trunk only - raised in #4299)
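
The commit message does not say why bytes >= 128 were being dropped. One common way this kind of regression can arise, offered here only as an assumption and not as a description of the actual change, is comparing a signed char against 0x20: where plain char is signed, bytes of 128 or above are negative and so test as "below 0x20". The helper names below are hypothetical.

```cpp
// Deliberately flawed check: on typical platforms a signed char holding a
// byte value >= 0x80 is negative, so it compares as less than 0x20.
inline bool IsLowControlSigned(signed char c)
{
    return c < 0x20;
}

// Safer check: convert to unsigned char first so only the real
// 0x00..0x1F range matches.
inline bool IsLowControlUnsigned(char c)
{
    return static_cast<unsigned char>(c) < 0x20;
}
```

With the first form, the UTF-8 lead byte 0xC3 would be treated as a control character and discarded. The second form leaves it alone.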
