Opened 12 years ago

Closed 12 years ago

Last modified 12 years ago

#1844 closed defect (fixed)

GML reader truncates field values

Reported by: nf
Owned by: Mateusz Łoskot
Priority: normal
Milestone: 1.4.4
Component: OGR_SF
Version: 1.4.0
Severity: critical
Keywords: gml
Cc: warmerdam, maphew

Description

When I attempt to convert a GML file using ogr2ogr, the generated .gfs file stores an incorrect field width for at least some string elements. It appears to use only a single feature (i.e. record) from the GML file to calculate the widths. While this is reasonable behaviour for detecting the datatype and the field name, it is very bad for field width.

The resultant shapefile contains truncated values, which cause serious problems when the shapefiles are displayed. (In this case, I am creating tropical cyclone advisories, and the result during testing was that some of the coastal warning areas were not highlighted, because the identifiers in the shapefile were corrupt.)

A workaround is to build the .gfs manually myself. While this is tolerable for a couple of products, it is unacceptable for large-scale work.

IMHO this is a major bug, mainly because many users will not detect the data corruption for quite a while, by which time they may have deleted the original GML files.

Attachments (3)

areas.gml (89.9 KB) - added by nf 12 years ago.
Sample GML file - areaType is 'Warning Area' (12 chars)
areas.gfs (1.3 KB) - added by nf 12 years ago.
Auto-generated GFS which shows 10 character areaType
gml2gfs.xsl (9.4 KB) - added by nf 12 years ago.
XSL to generate a GFS from a GML, working around this bug


Change History (18)

Changed 12 years ago by nf

Attachment: areas.gml added

Sample GML file - areaType is 'Warning Area' (12 chars)

Changed 12 years ago by nf

Attachment: areas.gfs added

Auto-generated GFS which shows 10 character areaType

comment:1 Changed 12 years ago by warmerdam

Cc: warmerdam added
Keywords: gml added
Milestone: 1.4.3
Owner: changed from warmerdam to Mateusz Łoskot
Priority: changed from high to normal

Mateusz,

Could you try to reproduce this? I believe the intention was that the GML reader sets the field width for string values from the longest string encountered in a scan of the whole file.
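The difference between the behaviour the reporter suspected (widths taken from one feature) and the intended behaviour (widths taken from a scan of every feature) can be sketched as follows. This is a minimal Python illustration, not GDAL's GML reader code; the feature dictionaries, the "Watch Area" value and both function names are hypothetical. Only the 12-character 'Warning Area' value comes from this ticket.

```python
from typing import Dict, Iterable, List

Feature = Dict[str, str]  # hypothetical: field name -> string value


def width_from_first_feature(features: List[Feature]) -> Dict[str, int]:
    """Suspected buggy strategy: widths come from the first feature only."""
    if not features:
        return {}
    return {name: len(value) for name, value in features[0].items()}


def width_from_full_scan(features: Iterable[Feature]) -> Dict[str, int]:
    """Intended strategy: each field gets the longest value seen in any feature."""
    widths: Dict[str, int] = {}
    for feature in features:
        for name, value in feature.items():
            widths[name] = max(widths.get(name, 0), len(value))
    return widths


features = [
    {"areaType": "Watch Area"},    # hypothetical, 10 characters
    {"areaType": "Warning Area"},  # 12 characters, as in areas.gml
]

print(width_from_first_feature(features))              # {'areaType': 10}
print(width_from_full_scan(features))                  # {'areaType': 12}
# Feature order must not matter for the full scan.
print(width_from_full_scan(list(reversed(features))))  # {'areaType': 12}
```

The full scan is order independent, which is the property a fix has to guarantee whether the longest value comes first or last.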

Changed 12 years ago by nf

Attachment: gml2gfs.xsl added

XSL to generate a GFS from a GML, working around this bug

comment:2 Changed 12 years ago by nf

A clarification: an incorrect .gfs only appears to cause data loss if the resultant data format is fixed-width (e.g. ESRI shapefile).

i.e. ogr2ogr -f GML does not cause data loss, but ogr2ogr -f "ESRI Shapefile" does.
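Why only fixed-width output loses data can be shown with a small sketch: a fixed-width character field keeps at most its declared number of characters, so a width of 10 taken from the .gfs cuts 'Warning Area' down to 'Warning Ar', and reading the field back cannot restore what was cut. The helpers below are hypothetical and only mimic that effect; they are not the Shapefile driver's code.

```python
def write_fixed_width(value: str, width: int) -> str:
    """Mimic a fixed-width character field: cut long values, space-pad short ones."""
    return value[:width].ljust(width)


def read_fixed_width(cell: str) -> str:
    """Reading back strips the padding but cannot recover lost characters."""
    return cell.rstrip()


stored = write_fixed_width("Warning Area", 10)  # width from the bad .gfs
print(repr(read_fixed_width(stored)))           # 'Warning Ar'

stored = write_fixed_width("Warning Area", 12)  # width from a full scan
print(repr(read_fixed_width(stored)))           # 'Warning Area'
```

A format that stores text without a declared width, such as GML, never goes through the cutting step, which matches the observation that ogr2ogr -f GML keeps the values intact.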

For anyone who is experiencing this problem, I have attached a workaround which correctly calculates the maximum string length of each field in the GML file.
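The attached XSL is not reproduced here, but the core of such a workaround (finding the longest value of each simple property across the whole file) can be sketched in Python with the standard library's ElementTree. This is a minimal sketch under stated assumptions: each feature sits directly inside a `gml:featureMember` element, simple attributes are leaf child elements of the feature, and elements with children (such as geometry) are skipped. The sample document, its `ex` namespace, element names and coordinates are hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import Dict

GML_NS = "http://www.opengis.net/gml"


def max_string_widths(gml_text: str) -> Dict[str, int]:
    """Return the longest text length seen for each simple property element.

    Assumes a flat layout: one feature per <gml:featureMember>, with simple
    attributes as leaf children of that feature.
    """
    root = ET.fromstring(gml_text)
    widths: Dict[str, int] = {}
    for member in root.iter(f"{{{GML_NS}}}featureMember"):
        for feature in member:
            for prop in feature:
                if len(prop) > 0:  # skip geometry and other complex properties
                    continue
                name = prop.tag.split("}")[-1]      # drop the namespace URI
                text = (prop.text or "").strip()    # ignore surrounding whitespace
                widths[name] = max(widths.get(name, 0), len(text))
    return widths


# Hypothetical sample; only the 'Warning Area' value comes from this ticket.
sample = """<ex:Collection xmlns:ex="urn:example:hypothetical"
    xmlns:gml="http://www.opengis.net/gml">
  <gml:featureMember>
    <ex:area>
      <ex:areaType>Watch Area</ex:areaType>
      <ex:geometry><gml:Point><gml:coordinates>-80.1,25.7</gml:coordinates></gml:Point></ex:geometry>
    </ex:area>
  </gml:featureMember>
  <gml:featureMember>
    <ex:area>
      <ex:areaType>Warning Area</ex:areaType>
    </ex:area>
  </gml:featureMember>
</ex:Collection>"""

print(max_string_widths(sample))  # {'areaType': 12}
```

The resulting widths would then be written into the .gfs by hand or by a template; GML 3 documents that use `gml:featureMembers` instead are not handled by this sketch.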

comment:3 Changed 12 years ago by Mateusz Łoskot

Status: changed from new to assigned

comment:4 Changed 12 years ago by Mateusz Łoskot

Resolution: fixed
Status: changed from assigned to closed

I applied a small patch for this issue, so it's supposed to be fixed now (r12279).

I tested with various files, with the longer attribute values first and with the shorter ones first.

comment:5 Changed 12 years ago by Mateusz Łoskot

The fix has been ported to branches/1.4 (r12511).

comment:6 Changed 12 years ago by maphew

Resolution: fixed deleted
Status: changed from closed to reopened

I just got bit by this bug HARD. Four full days' worth of work down the tube. Below is the output of the test case (minus the wget and 7-Zip messages); note that the length of ID does not match. I think a bug of this nature is critical, not major.

GDAL 1.5dev, FWTools 2.0.0, released 2007/11/12
  ID (String) = 11CF43A8EFEAE5F4E0409C8467120387
  ID (String) = 11CF43A8EFEAE5F4E0409C84671203

comment:7 Changed 12 years ago by maphew

@echo off
REM
REM gml-truncate-test-case.bat
REM
REM Fetch and unpack one CanVec 50k GML tile
wget --continue ftp://ftp2.cits.rncan.gc.ca/pub/canvec/50k_gml/105/a/canvec_105a01_gml.zip
7z x canvec_105a01_gml.zip
REM or 'unzip canvec_105a01_gml.zip'

gdalinfo --version
REM ID values as read directly from the GML layer
ogrinfo 105a01_1_0.gml LI_1210009_2 | grep "ID (String)"
REM Convert the layer to a shapefile, then list the ID values again
ogr2ogr -f "esri shapefile" outshp 105a01_1_0.gml LI_1210009_2
ogrinfo outshp LI_1210009_2 | grep "ID (String)"
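Checking the two grep outputs by eye is error-prone for long identifiers. A small helper can pair the values feature by feature and report the ones that lost trailing characters. This is a minimal Python sketch, assuming both ogrinfo runs list the features in the same order and print string values as `  NAME (String) = value` (the format seen in the output above); the function names are hypothetical.

```python
import re
from typing import List, Tuple

# Matches ogrinfo feature lines of the form "  ID (String) = value".
LINE_RE = re.compile(r"^\s*(\w+) \(String\) = (.*)$")


def string_values(ogrinfo_lines: List[str], field: str) -> List[str]:
    """Collect the values of one string field from ogrinfo's per-feature output."""
    values = []
    for line in ogrinfo_lines:
        match = LINE_RE.match(line.rstrip("\r\n"))  # tolerate Windows line endings
        if match and match.group(1) == field:
            values.append(match.group(2))
    return values


def truncated_pairs(source: List[str], converted: List[str]) -> List[Tuple[str, str]]:
    """Pair values feature by feature and keep those that lost trailing characters."""
    return [
        (before, after)
        for before, after in zip(source, converted)
        if after != before and before.startswith(after)
    ]


# The two lines reported in this ticket (GML input, then shapefile output).
gml_out = ["  ID (String) = 11CF43A8EFEAE5F4E0409C8467120387"]
shp_out = ["  ID (String) = 11CF43A8EFEAE5F4E0409C84671203"]

for before, after in truncated_pairs(string_values(gml_out, "ID"),
                                     string_values(shp_out, "ID")):
    print(f"{len(before)} -> {len(after)} chars: {before} -> {after}")
# prints "32 -> 30 chars: 11CF43A8EFEAE5F4E0409C8467120387 -> 11CF43A8EFEAE5F4E0409C84671203"
```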

comment:8 Changed 12 years ago by maphew

Severity: changed from major to critical

comment:9 Changed 12 years ago by Mateusz Łoskot

Maphew, I'm really sorry.

I thought this issue had been fixed. At least, I tested the original files and applied patches that fixed this issue for the files attached to the report.

I'm working on it today.

comment:10 Changed 12 years ago by Mateusz Łoskot

Milestone: changed from 1.4.3 to 1.4.4

comment:11 Changed 12 years ago by Mateusz Łoskot

Resolution: fixed
Status: changed from reopened to closed

I've fixed this bug in trunk (r12781) and branches/1.4 (r12782)

comment:12 Changed 12 years ago by Mateusz Łoskot

Cc: maphew added

Please check whether the problem has been fixed for you, and reopen the ticket if it hasn't.

comment:13 Changed 12 years ago by maphew

As of FWTools 2.0.1 (Win32), it appears the conversion from GML no longer truncates fields. Thank you!

comment:14 Changed 12 years ago by c911469

Hi, I have a very similar problem, and I'm not sure whether it is related to this one. I am also using tropical cyclone data: I have GML files that I view in QGIS, and QGIS truncates part of the data in the "date" attribute/element (e.g. the month and day of month). I'm using FWTools 2.0.3 (Win32) as well, and it loses part of the date in the same way; ogrinfo and ogr2ogr give the same result. The date field is specified as "xs:string" in the .xsd file, but in the resulting .gfs file it comes out as "integer". Dates are originally specified as "2007-01-23" but come out as "2007". Should I log this as a new fault, or is it part of this original fault? (I have sample data files if required.)
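Going by the symptom described here (a "2007-01-23" value typed as integer in the .gfs and read back as 2007), one plausible mechanism is a type decision or conversion that only looks at a numeric prefix, much like C's atoi() stops at the first non-digit. The sketch below contrasts such a lenient prefix parse with a strict check that chooses Integer or Real only when every value matches in full. Both functions are hypothetical illustrations; this ticket does not establish the actual cause.

```python
import re
from typing import List

INTEGER_RE = re.compile(r"[+-]?\d+")
REAL_RE = re.compile(r"[+-]?(\d+\.?\d*|\.\d+)([eE][+-]?\d+)?")


def lenient_int(value: str) -> int:
    """atoi-style conversion: read leading digits, silently ignore the rest."""
    match = re.match(r"\s*[+-]?\d+", value)
    return int(match.group()) if match else 0


def infer_type(values: List[str]) -> str:
    """Pick Integer or Real only if every value matches in full; else String."""
    if not values:
        return "String"
    if all(INTEGER_RE.fullmatch(v) for v in values):
        return "Integer"
    if all(REAL_RE.fullmatch(v) for v in values):
        return "Real"
    return "String"


print(lenient_int("2007-01-23"))    # 2007, the truncated value reported above
print(infer_type(["2007-01-23"]))   # String
print(infer_type(["2007", "42"]))   # Integer
print(infer_type(["3.5", "7"]))     # Real
```

Requiring a full match means a single date-like value is enough to keep the whole field as a string, so no characters are dropped on conversion.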

comment:15 Changed 12 years ago by Mateusz Łoskot

c911469,

Please submit your sample GML files along with the exact ogrinfo and ogr2ogr commands you used.

I'd suggest opening a new ticket whose title indicates that it's about truncated date values, and attaching your files to it. That way your attachments won't be mixed up with the files that belong to this ticket. IMHO it will be less confusing.
