Opened 19 years ago
Closed 17 years ago
#755 closed defect (fixed)
minixml - can't read complex DOCTYPE elements
Reported by: | warmerdam | Owned by: | Mateusz Łoskot |
---|---|---|---|
Priority: | normal | Milestone: | 1.4.2 |
Component: | default | Version: | unspecified |
Severity: | normal | Keywords: | |
Cc: |
Description (last modified by )
The cpl_minixml.cpp is unable to consume the attached document with a complex DOCTYPE declaration that looks like this:
<?xml version="1.0" encoding="ISO-8859-1"?>
<!DOCTYPE WMT_MS_Capabilities SYSTEM "http://schemas.cubewerx.com/schemas/wms/1.1.2/WMT_MS_Capabilities.dtd"
 [
 <!-- vendor-specific elements defined here -->
 <!ELEMENT VendorSpecificCapabilities (CubeSERV?)>
 <!ELEMENT CubeSERV (Extract?, MultibandLayers?)>
 <!ATTLIST CubeSERV version CDATA #REQUIRED>
 <!ELEMENT Extract (ExtractableLayers, ArchiveFormats, DCPType+)>
 <!ELEMENT ExtractableLayers (ExtractableLayer*)>
 <!ELEMENT ExtractableLayer (ExtractFormat+)>
 <!ATTLIST ExtractableLayer name CDATA #REQUIRED>
 <!ELEMENT ExtractFormat EMPTY>
 <!ATTLIST ExtractFormat name CDATA #REQUIRED>
 <!ELEMENT ArchiveFormats (ArchiveFormat+)>
 <!ELEMENT ArchiveFormat EMPTY>
 <!ATTLIST ArchiveFormat name CDATA #REQUIRED>
 <!ELEMENT MultibandLayers (MultibandLayer*)>
 <!ELEMENT MultibandLayer EMPTY>
 <!ATTLIST MultibandLayer name CDATA #REQUIRED numOfChannels CDATA #REQUIRED>
 ]>
...
Attachments (3)
Change History (11)
by , 19 years ago
Attachment: | cubeserv.cgi added |
---|
comment:4 by , 17 years ago
Description: | modified (diff) |
---|---|
Milestone: | → 1.4.2 |
Owner: | changed from | to
Priority: | high → normal |
Mateusz,
I'd appreciate your reviewing this to see if it is still a problem. I think it has already been fixed.
comment:5 by , 17 years ago
Description: | modified (diff) |
---|
comment:6 by , 17 years ago
Status: | new → assigned |
---|
by , 17 years ago
Attachment: | test_example.tar.gz added |
---|
The test program can be used to observe the problem before the fix and to confirm the behavior after it.
comment:7 by , 17 years ago
Resolution: | → fixed |
---|---|
Status: | assigned → closed |
I fixed it by ignoring the whole block between [] brackets:
<!DOCTYPE RootElement [ ...declarations... ]>
So, reading and parsing markup declarations is still not supported (see comment in the code).
Fixed in r11276.
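The approach described in this comment can be illustrated with a minimal, standalone sketch. It is not the code from r11276: the function name `SkipInternalSubset` and its string-offset interface are hypothetical, and it assumes that the first "]>" after the opening bracket ends the internal subset (a "]>" inside a quoted literal or a comment would confuse it).

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper, not part of cpl_minixml.cpp.
// Given the offset of the '[' that opens a DOCTYPE internal subset, return
// the offset of the '>' that closes the DOCTYPE declaration. Everything
// between the brackets is skipped without being interpreted.
// Returns std::string::npos if no "]>" follows the opening bracket.
std::size_t SkipInternalSubset(const std::string &input, std::size_t bracketPos)
{
    // Precondition: the caller is positioned on the opening bracket.
    assert(bracketPos < input.size() && input[bracketPos] == '[');

    // Assumption: the first "]>" after '[' terminates the subset.
    const std::size_t closePos = input.find("]>", bracketPos + 1);
    if (closePos == std::string::npos)
        return std::string::npos;

    return closePos + 1;  // offset of the closing '>'
}
```

With this sketch, parsing would resume at the character following the returned offset, so the markup declarations themselves are never examined.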
comment:8 by , 17 years ago
I added a new test case, minixml_3, to autotest/gcore/minixml.py. It reads the XML document data/doctype.xml, which contains a complex DOCTYPE element (r11277).
comment:10 by , 17 years ago
Resolution: | fixed |
---|---|
Status: | closed → reopened |
Mateusz,
I believe we want to capture the whole DOCTYPE declaration into the token. We don't need to interpret the stuff between [] but we do need to suck it all up. So this loop:
if( chNext == '[' )
{
    do
    {
        chNext = ReadChar( psContext );
    }
    while( chNext != ']'
           && !EQUALN(psContext->pszInput+psContext->nInputOffset,"]>", 2) );

    // Skip "]" character to point to the closing ">"
    chNext = ReadChar( psContext );
    chNext = ReadChar( psContext );
}
is pretty good, but needs to be extended to append all the chars to the token, including the [ and ] brackets. Afterwards, you should verify with the "xmlreformat" program in gdal/port that the output document preserves the whole DOCTYPE declaration.
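The suggestion above, keeping every character of the declaration in the token instead of discarding the bracketed block, might look roughly like the following standalone sketch. It is not the code from the repository: `CaptureDoctype` and its string-based interface are hypothetical stand-ins for the parser's token buffer, and the sketch assumes that '[' and ']' do not occur inside quoted literals or comments.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper, not part of cpl_minixml.cpp.
// Starting at the '<' of "<!DOCTYPE", copy the whole declaration into a
// token, including the '[' ... ']' internal subset, without interpreting it.
// Returns an empty string if the declaration is never closed.
std::string CaptureDoctype(const std::string &input, std::size_t start)
{
    assert(start <= input.size());

    std::string token;
    bool inSubset = false;  // true while between '[' and ']'

    for (std::size_t i = start; i < input.size(); ++i)
    {
        const char ch = input[i];
        token += ch;  // every character is kept, brackets included

        if (ch == '[')
            inSubset = true;
        else if (ch == ']')
            inSubset = false;
        else if (ch == '>' && !inSubset)
            return token;  // the '>' outside the subset ends the DOCTYPE
    }

    return std::string();  // unterminated declaration
}
```

Because the token holds the declaration verbatim, a serializer can write it back unchanged, which is the property the xmlreformat check is meant to confirm.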
by , 17 years ago
Attachment: | xmlreformat_out.xml added |
---|
Output from the port/xmlreformat utility, run on the sample XML attached to the ticket report.
comment:11 by , 17 years ago
Resolution: | → fixed |
---|---|
Status: | reopened → closed |
The fix has been improved according to Frank's suggestions (r11319).
The attached file xmlreformat_out.xml contains the output of the port/xmlreformat program and serves as a kind of proof of the fix.
Problem XML document (Cubeserv capabilities)