Ticket #755 (closed defect: fixed)

Opened 7 years ago

Last modified 5 years ago

minixml - can't read complex DOCTYPE elements

Reported by: warmerdam
Owned by: mloskot
Priority: normal
Milestone: 1.4.2
Component: default
Version: unspecified
Severity: normal
Keywords:
Cc:

Description (last modified by mloskot)

The parser in cpl_minixml.cpp cannot read the attached document, which has a complex DOCTYPE declaration with an internal subset, like this:

<?xml version="1.0" encoding="ISO-8859-1"?>
<!DOCTYPE WMT_MS_Capabilities SYSTEM
 "http://schemas.cubewerx.com/schemas/wms/1.1.2/WMT_MS_Capabilities.dtd"
 [
 <!-- vendor-specific elements defined here -->
 <!ELEMENT VendorSpecificCapabilities (CubeSERV?)>
 <!ELEMENT CubeSERV (Extract?, MultibandLayers?)>
 <!ATTLIST CubeSERV version CDATA #REQUIRED>
 <!ELEMENT Extract (ExtractableLayers, ArchiveFormats, DCPType+)>
 <!ELEMENT ExtractableLayers (ExtractableLayer*)>
 <!ELEMENT ExtractableLayer (ExtractFormat+)>
 <!ATTLIST ExtractableLayer name CDATA #REQUIRED>
 <!ELEMENT ExtractFormat EMPTY>
 <!ATTLIST ExtractFormat name CDATA #REQUIRED>
 <!ELEMENT ArchiveFormats (ArchiveFormat+)>
 <!ELEMENT ArchiveFormat EMPTY>
 <!ATTLIST ArchiveFormat name CDATA #REQUIRED>
 <!ELEMENT MultibandLayers (MultibandLayer*)>
 <!ELEMENT MultibandLayer EMPTY>
 <!ATTLIST MultibandLayer name CDATA #REQUIRED numOfChannels CDATA #REQUIRED>
 ]>
...

Attachments

cubeserv.cgi (18.9 KB) - added by warmerdam 7 years ago.
Problem XML document (Cubeserv capabilities)
test_example.tar.gz (4.3 KB) - added by mloskot 5 years ago.
Test program that can be used to see the problem before and after it is fixed.
xmlreformat_out.xml (22.8 KB) - added by mloskot 5 years ago.
Output from the port/xmlreformat utility run on the sample XML attached to the ticket report.

Change History

Changed 7 years ago by warmerdam

Attachment cubeserv.cgi added: Problem XML document (Cubeserv capabilities)

Changed 5 years ago by warmerdam

  • owner changed from warmerdam to mloskot
  • priority changed from high to normal
  • description modified
  • milestone set to 1.4.2

Mateusz,

I'd appreciate your reviewing this to see if it is still a problem. I think it has already been fixed.

Changed 5 years ago by mloskot

  • description modified

Changed 5 years ago by mloskot

  • status changed from new to assigned

Changed 5 years ago by mloskot

Attachment test_example.tar.gz added: Test program that can be used to see the problem before and after it is fixed.

Changed 5 years ago by mloskot

  • status changed from assigned to closed
  • resolution set to fixed

I fixed it by ignoring the whole block between [] brackets:

<!DOCTYPE RootElement [ ...declarations... ]>

So, reading and parsing markup declarations is still not supported (see comment in the code).

Fixed in r11276.
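To illustrate the idea of skipping the bracketed block, here is a minimal standalone sketch. It is not the code committed in r11276: the function name SkipInternalSubset and the use of std::string are assumptions made for illustration, and it assumes that "]>" never appears inside a comment or quoted literal within the block.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper, not the GDAL code: given the offset of the '[' that
// opens the bracketed block of a DOCTYPE declaration, return the offset of
// the '>' that closes the declaration, or std::string::npos if the input
// ends before "]>" is found.
// Simplification (assumption): "]>" does not occur inside comments or
// quoted literals within the block.
std::size_t SkipInternalSubset(const std::string &input, std::size_t openBracket)
{
    assert(openBracket < input.size() && input[openBracket] == '[');

    // Everything between '[' and "]>" is ignored, not parsed.
    const std::size_t close = input.find("]>", openBracket + 1);
    if (close == std::string::npos)
        return std::string::npos;  // unterminated declaration

    return close + 1;  // offset of the closing '>'
}
```

Unlike a character-by-character loop, the search here reports an unterminated block explicitly instead of running past the end of the input.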

Changed 5 years ago by mloskot

I added a new test case, minixml_3, to autotest/gcore/minixml.py. It reads the XML document data/doctype.xml, which has a complex DOCTYPE element (r11277).

Changed 5 years ago by mloskot

I ported the fixes and tests to the stable branch (r11279).

Changed 5 years ago by warmerdam

  • status changed from closed to reopened
  • resolution fixed deleted

Mateusz,

I believe we want to capture the whole DOCTYPE declaration into the token. We don't need to interpret the stuff between [] but we do need to suck it all up. So this loop:

            if( chNext == '[' )
            {
                do
                {
                    chNext = ReadChar( psContext );
                }
                while( chNext != ']'
                    && !EQUALN(psContext->pszInput+psContext->nInputOffset,"]>", 2) );

                // Skip "]" character to point to the closing ">"
                chNext = ReadChar( psContext );
                chNext = ReadChar( psContext );
            }

is pretty good, but needs to be extended to append all the chars to the token, including the [ and ] brackets. Afterwards, you should verify with the "xmlreformat" program in gdal/port that the output document preserves the whole DOCTYPE declaration.
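The suggestion above could look roughly like the following standalone sketch. It is not the code from cpl_minixml.cpp: ReadDoctypeToken, its signature, and the std::string token are hypothetical names chosen for illustration. It also assumes that '[', ']' and '>' do not appear inside comments or quoted literals in the declaration.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical sketch, not the GDAL implementation: starting at the
// "<!DOCTYPE" found at offset `start`, copy the entire declaration into
// `token`, including the '[' ... ']' block and both brackets.
// Returns the offset just past the closing '>', or std::string::npos
// (with an empty token) if the declaration is unterminated.
// Simplification (assumption): '[', ']' and '>' never occur inside
// comments or quoted literals.
std::size_t ReadDoctypeToken(const std::string &input, std::size_t start,
                             std::string &token)
{
    assert(input.compare(start, 9, "<!DOCTYPE") == 0);

    token.clear();
    bool inSubset = false;  // true while between '[' and ']'
    for (std::size_t i = start; i < input.size(); ++i)
    {
        const char ch = input[i];
        token += ch;  // every character is kept, nothing is interpreted

        if (ch == '[')
            inSubset = true;
        else if (ch == ']')
            inSubset = false;
        else if (ch == '>' && !inSubset)
            return i + 1;  // '>' outside the block ends the declaration
    }

    token.clear();
    return std::string::npos;
}
```

Because the token holds the declaration verbatim, writing it back out unchanged is enough for a reformatting tool to preserve the whole DOCTYPE.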

Changed 5 years ago by mloskot

Attachment xmlreformat_out.xml added: Output from the port/xmlreformat utility run on the sample XML attached to the ticket report.

Changed 5 years ago by mloskot

  • status changed from reopened to closed
  • resolution set to fixed

The fix has been improved according to Frank's suggestions (r11319).

The attached file xmlreformat_out.xml contains the output of the port/xmlreformat program and serves as evidence that the fix works.
