Opened 15 years ago

Closed 13 years ago

#755 closed defect (fixed)

minixml - can't read complex DOCTYPE elements

Reported by: warmerdam
Owned by: Mateusz Łoskot
Priority: normal
Milestone: 1.4.2
Component: default
Version: unspecified
Severity: normal
Keywords:
Cc:

Description (last modified by Mateusz Łoskot)

cpl_minixml.cpp is unable to parse the attached document, which has a complex DOCTYPE declaration that looks like this:

<?xml version="1.0" encoding="ISO-8859-1"?>
<!DOCTYPE WMT_MS_Capabilities SYSTEM
 "http://schemas.cubewerx.com/schemas/wms/1.1.2/WMT_MS_Capabilities.dtd"
 [
 <!-- vendor-specific elements defined here -->
 <!ELEMENT VendorSpecificCapabilities (CubeSERV?)>
 <!ELEMENT CubeSERV (Extract?, MultibandLayers?)>
 <!ATTLIST CubeSERV version CDATA #REQUIRED>
 <!ELEMENT Extract (ExtractableLayers, ArchiveFormats, DCPType+)>
 <!ELEMENT ExtractableLayers (ExtractableLayer*)>
 <!ELEMENT ExtractableLayer (ExtractFormat+)>
 <!ATTLIST ExtractableLayer name CDATA #REQUIRED>
 <!ELEMENT ExtractFormat EMPTY>
 <!ATTLIST ExtractFormat name CDATA #REQUIRED>
 <!ELEMENT ArchiveFormats (ArchiveFormat+)>
 <!ELEMENT ArchiveFormat EMPTY>
 <!ATTLIST ArchiveFormat name CDATA #REQUIRED>
 <!ELEMENT MultibandLayers (MultibandLayer*)>
 <!ELEMENT MultibandLayer EMPTY>
 <!ATTLIST MultibandLayer name CDATA #REQUIRED numOfChannels CDATA #REQUIRED>
 ]>
...

Attachments (3)

cubeserv.cgi (18.9 KB) - added by warmerdam 15 years ago.
Problem XML document (Cubeserv capabilities)
test_example.tar.gz (4.3 KB) - added by Mateusz Łoskot 13 years ago.
Test program that can be used to see the problem before and after it's fixed.
xmlreformat_out.xml (22.8 KB) - added by Mateusz Łoskot 13 years ago.
Output from port/xmlreformat utility executed on the sample XML attached to the ticket report.

Change History (11)

Changed 15 years ago by warmerdam

Attachment: cubeserv.cgi added

Problem XML document (Cubeserv capabilities)

comment:4 Changed 13 years ago by warmerdam

Description: modified (diff)
Milestone: 1.4.2
Owner: changed from warmerdam to Mateusz Łoskot
Priority: high → normal

Mateusz,

I'd appreciate your reviewing this to see if it is still a problem. I think it has already been fixed.

comment:5 Changed 13 years ago by Mateusz Łoskot

Description: modified (diff)

comment:6 Changed 13 years ago by Mateusz Łoskot

Status: new → assigned

Changed 13 years ago by Mateusz Łoskot

Attachment: test_example.tar.gz added

Test program that can be used to see the problem before and after it's fixed.

comment:7 Changed 13 years ago by Mateusz Łoskot

Resolution: fixed
Status: assigned → closed

I fixed it by ignoring the whole block between [] brackets:

<!DOCTYPE RootElement [ ...declarations... ]>

So, reading and parsing markup declarations is still not supported (see comment in the code).

Fixed in r11276.
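
To illustrate the skip-the-subset approach described in this comment, here is a minimal standalone sketch. It is not the code from r11276: SkipInternalSubset is a hypothetical helper, it works on a std::string rather than the parser's context structure, and it assumes the declaration ends with a literal "]>" that does not also occur inside the subset.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper, not the GDAL implementation.
// Given the offset of the '[' that opens a DOCTYPE internal subset, return
// the offset of the '>' that closes the declaration, or std::string::npos
// if no "]>" terminator is found.
// Assumption: "]>" does not appear inside the subset itself (for example
// inside a comment or a quoted literal).
std::size_t SkipInternalSubset(const std::string &input,
                               std::size_t openBracket)
{
    const std::size_t close = input.find("]>", openBracket + 1);
    if (close == std::string::npos)
        return std::string::npos;  // unterminated subset
    return close + 1;              // index of the closing '>'
}
```

Everything between the brackets is discarded here, so the declarations in the subset do not reach the caller.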

comment:8 Changed 13 years ago by Mateusz Łoskot

I added a new test case, minixml_3, to autotest/gcore/minixml.py. It reads the XML document data/doctype.xml, which has a complex DOCTYPE element (r11277).

comment:9 Changed 13 years ago by Mateusz Łoskot

I ported fixes and tests to the stable branch (r11279).

comment:10 Changed 13 years ago by warmerdam

Resolution: fixed
Status: closed → reopened

Mateusz,

I believe we want to capture the whole DOCTYPE declaration into the token. We don't need to interpret the stuff between [] but we do need to suck it all up. So this loop:

            if( chNext == '[' )
            {
                do
                {
                    chNext = ReadChar( psContext );
                }
                while( chNext != ']'
                    && !EQUALN(psContext->pszInput+psContext->nInputOffset,"]>", 2) );

                // Skip "]" character to point to the closing ">"
                chNext = ReadChar( psContext );
                chNext = ReadChar( psContext );
            }

is pretty good, but needs to be extended to append all the chars to the token, including the [ and ] brackets. Afterwards, you should verify with the "xmlreformat" program in gdal/port that the output document preserves the whole DOCTYPE declaration.
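
The requested behavior can be sketched in isolation as follows. This is a minimal illustration, not the code committed to cpl_minixml.cpp: ReadDoctypeToken is a hypothetical function operating on a std::string, and it assumes double-quoted literals only and no ']' inside the internal subset other than the one that ends it.

```cpp
#include <cstddef>
#include <string>

// Hypothetical stand-in for the tokenizer step, not the GDAL implementation.
// Starting at the '<' of "<!DOCTYPE", copy the declaration verbatim into a
// token, including the '[' ... ']' internal subset, without interpreting it.
// Returns an empty string if the declaration is never terminated.
std::string ReadDoctypeToken(const std::string &input, std::size_t start)
{
    std::string token;
    bool inQuotes = false;  // inside a "..." literal outside the subset
    bool inSubset = false;  // between '[' and ']'
    for (std::size_t i = start; i < input.size(); ++i)
    {
        const char ch = input[i];
        token += ch;  // every character is kept, brackets included
        if (inSubset)
        {
            if (ch == ']')
                inSubset = false;  // subset ends; '>' may now close
        }
        else if (ch == '"')
            inQuotes = !inQuotes;
        else if (!inQuotes && ch == '[')
            inSubset = true;
        else if (!inQuotes && ch == '>')
            return token;  // closing '>' of the whole declaration
    }
    return std::string();  // unterminated declaration
}
```

Because the '>' characters that end the <!ELEMENT ...> and <!ATTLIST ...> declarations are seen while inSubset is true, they are appended to the token instead of ending it, which is what allows the full declaration to be written back out.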

Changed 13 years ago by Mateusz Łoskot

Attachment: xmlreformat_out.xml added

Output from port/xmlreformat utility executed on the sample XML attached to the ticket report.

comment:11 Changed 13 years ago by Mateusz Łoskot

Resolution: fixed
Status: reopened → closed

The fix has been improved according to Frank's suggestions (r11319).

The attached file xmlreformat_out.xml contains the output of the port/xmlreformat program and serves as evidence that the fix works.
