Opened 14 years ago

Closed 7 years ago

#2738 closed defect (fixed)

OGR XML-based drivers vulnerable to malicious content

Reported by: Even Rouault Owned by: Even Rouault
Priority: normal Milestone:
Component: OGR_SF Version: unspecified
Severity: normal Keywords: ogr xml vulnerability
Cc: warmerdam, pka, condit


The GML, Interlis 2, KML, GPX and GeoRSS drivers are all vulnerable to several malicous XML content, in particular to the "million laugh" attack that, which in the case of Expat and Xerces, causes the process to hang forever. This attack is particularly nasty because the size of the files that trigger it can be very short, so a criteria based on file size will not prevent it. A radical workaround is to filter out any XML content that uses the <!ENTITY> feature. This is not the approach taken here.

The commit to follow will fix the million laugh attack :

  • in the Expat based drivers by detecting that the character callback is called more often than the size of the buffer being processed.
  • in the Xerces based drivers by adding a startEntity() method that counts the number of times it is called consecutively without the startElement/endElements callbacks to be called (limit authorized : 1000 calls).

This will let the drivers still work with reasonnable usage of <!ENTITY>.

The GPX and GeoRSS drivers will also be fixed against most attacks based on very long files :

  • many elements names (<a1></a1><a2></a2><a3></a3>...) : causes Expat to allocate lots of memory (Expat has a hash set with all the distinc element names found during parsing), and causes the driver to create a lot of field definitions, which has a quadratic complexity. Fix : use Expat's ability of providing custom functions for memory allocation and return NULL if Expat tries to allocate more than 100 000 bytes at once. This will cause the parsing to be interrupted. Also, interrupt the parsing if more than 100 fields have been created.
  • many attributes (<a attr1="" attr2="" attr3=""... >) : the startElement() callback is not called while the attribute list has not finished. Fix : if after 10 buffers being processed, none of the beginElement(), endElement() and characherData() callback has been called, interrupt the parsing.
  • many characters in an element (<a>content.......very long</a>) : ends up in a out-of-memory situation. Fix : test the result of VSIRealloc against NULL and interrupt the parsing if the total size of the current buffer exceeds 100 000 bytes.

Due to lack of familiarity with the other drivers and/or Xerces, only partial fixes will be applied for :

  • The KML driver : against the many element names and many attributes cases.
  • The GML driver : against the out-of-memory situation.

Change History (4)

comment:1 by Even Rouault, 14 years ago

Above mentionned fixes done in trunk in r15947

comment:2 by Even Rouault, 14 years ago

In r16673, restore compatibility with expat >= 1.95.0 and expat < 1.95.8

comment:3 by Jukka Rahkonen, 8 years ago

Was the work done for r15947 and r16673 good enough so that this ticket could be closed?

comment:4 by Jukka Rahkonen, 7 years ago

Resolution: fixed
Status: newclosed

So many new XML vulnerabilities have appeared since 2008 that it this was not fixed then the ticket is probably worthless anyhow.

Note: See TracTickets for help on using tickets.