Ticket #755 (closed defect: fixed)

Opened 3 years ago

Last modified 1 year ago

minixml - can't read complex DOCTYPE elements

Reported by: warmerdam Assigned to: mloskot
Priority: normal Milestone: 1.4.2
Component: default Version: unspecified
Severity: normal Keywords:
Cc:

Description (Last modified by mloskot)

cpl_minixml.cpp is unable to parse the attached document, which has a complex DOCTYPE declaration that looks like this:

<?xml version="1.0" encoding="ISO-8859-1"?>
<!DOCTYPE WMT_MS_Capabilities SYSTEM
 "http://schemas.cubewerx.com/schemas/wms/1.1.2/WMT_MS_Capabilities.dtd"
 [
 <!-- vendor-specific elements defined here -->
 <!ELEMENT VendorSpecificCapabilities (CubeSERV?)>
 <!ELEMENT CubeSERV (Extract?, MultibandLayers?)>
 <!ATTLIST CubeSERV version CDATA #REQUIRED>
 <!ELEMENT Extract (ExtractableLayers, ArchiveFormats, DCPType+)>
 <!ELEMENT ExtractableLayers (ExtractableLayer*)>
 <!ELEMENT ExtractableLayer (ExtractFormat+)>
 <!ATTLIST ExtractableLayer name CDATA #REQUIRED>
 <!ELEMENT ExtractFormat EMPTY>
 <!ATTLIST ExtractFormat name CDATA #REQUIRED>
 <!ELEMENT ArchiveFormats (ArchiveFormat+)>
 <!ELEMENT ArchiveFormat EMPTY>
 <!ATTLIST ArchiveFormat name CDATA #REQUIRED>
 <!ELEMENT MultibandLayers (MultibandLayer*)>
 <!ELEMENT MultibandLayer EMPTY>
 <!ATTLIST MultibandLayer name CDATA #REQUIRED numOfChannels CDATA #REQUIRED>
 ]>
...

Attachments

cubeserv.cgi (18.9 kB) - added by warmerdam on 01/28/05 02:14:56.
Problem XML document (Cubeserv capabilities)
test_example.tar.gz (4.3 kB) - added by mloskot on 04/17/07 19:31:13.
Test program can be used to see the problem before and after it's fixed.
xmlreformat_out.xml (22.8 kB) - added by mloskot on 04/20/07 19:02:22.
Output from port/xmlreformat utility executed on the sample XML attached to the ticket report.

Change History

01/28/05 02:14:56 changed by warmerdam

  • attachment cubeserv.cgi added.

Problem XML document (Cubeserv capabilities)

04/03/07 00:22:26 changed by warmerdam

  • priority changed from high to normal.
  • owner changed from warmerdam to mloskot.
  • description changed.
  • milestone set to 1.4.2.

Mateusz,

I'd appreciate your reviewing this to see if it is still a problem. I think it has already been fixed.

04/08/07 17:06:16 changed by mloskot

  • description changed.

04/17/07 16:42:12 changed by mloskot

  • status changed from new to assigned.

04/17/07 19:31:13 changed by mloskot

  • attachment test_example.tar.gz added.

Test program can be used to see the problem before and after it's fixed.

04/17/07 19:34:18 changed by mloskot

  • status changed from assigned to closed.
  • resolution set to fixed.

I fixed it by ignoring the whole block between [] brackets:

<!DOCTYPE RootElement [ ...declarations... ]>

So, reading and parsing markup declarations is still not supported (see comment in the code).

Fixed in r11276.

04/17/07 20:38:49 changed by mloskot

I added a new test case, minixml_3, to autotest/gcore/minixml.py. It reads the XML document data/doctype.xml, which has a complex DOCTYPE declaration (r11277).

04/17/07 21:29:57 changed by mloskot

I ported the fixes and tests to the stable branch (r11279).

04/18/07 23:48:39 changed by warmerdam

  • status changed from closed to reopened.
  • resolution deleted.

Mateusz,

I believe we want to capture the whole DOCTYPE declaration into the token. We don't need to interpret the stuff between [] but we do need to suck it all up. So this loop:

            if( chNext == '[' )
            {
                do
                {
                    chNext = ReadChar( psContext );
                }
                while( chNext != ']'
                    && !EQUALN(psContext->pszInput+psContext->nInputOffset,"]>", 2) );

                // Skip "]" character to point to the closing ">"
                chNext = ReadChar( psContext );
                chNext = ReadChar( psContext );
            }

is pretty good, but needs to be extended to append all the chars to the token, including the [ and ] brackets. Afterwards, you should verify with the "xmlreformat" program in gdal/port that the output document preserves the whole DOCTYPE declaration.
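
As a rough illustration of the suggestion above, here is a minimal, self-contained sketch of capturing the whole declaration into one token. It is not the code from cpl_minixml.cpp: the function name ReadDoctypeToken and its std::string interface are hypothetical, chosen only so the example compiles without GDAL. It also assumes that no ']' or '[' appears inside comments or quoted literals within the brackets.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper, not the GDAL implementation.
// Starting at the '<' of "<!DOCTYPE", copy every character up to and
// including the closing '>' into a single token. The text between '[' and
// ']' is copied verbatim, brackets included, and is never interpreted.
// Returns an empty string (and leaves pos unchanged) if no DOCTYPE
// declaration starts at pos or if the declaration is not terminated.
std::string ReadDoctypeToken(const std::string &input, std::size_t &pos)
{
    static const std::string prefix = "<!DOCTYPE";
    if (pos > input.size() || input.compare(pos, prefix.size(), prefix) != 0)
        return std::string();

    std::string token;
    bool inSubset = false; // true while between '[' and ']'
    for (std::size_t i = pos; i < input.size(); ++i)
    {
        const char ch = input[i];
        token += ch; // append everything, including '[' and ']'

        if (ch == '[')
            inSubset = true;
        else if (ch == ']')
            inSubset = false;
        else if (ch == '>' && !inSubset)
        {
            // A '>' outside the brackets closes the declaration.
            pos = i + 1;
            return token;
        }
    }
    return std::string(); // ran out of input before the closing '>'
}
```

Unlike the loop quoted above, this sketch stops at the end of the input instead of reading on, so an unterminated declaration yields an empty token rather than a hang.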

04/20/07 19:02:22 changed by mloskot

  • attachment xmlreformat_out.xml added.

Output from port/xmlreformat utility executed on the sample XML attached to the ticket report.

04/20/07 19:04:37 changed by mloskot

  • status changed from reopened to closed.
  • resolution set to fixed.

The fix has been improved according to Frank's suggestions (r11319).

The attached file xmlreformat_out.xml contains the output of the port/xmlreformat program and serves as a kind of proof of the fix.