Opened 17 years ago

Closed 16 years ago

Last modified 16 years ago

#1844 closed defect (fixed)

GML reader truncates field values

Reported by: nf Owned by: Mateusz Łoskot
Priority: normal Milestone: 1.4.4
Component: OGR_SF Version: 1.4.0
Severity: critical Keywords: gml
Cc: warmerdam, maphew

Description

When I attempt to convert a GML file using ogr2ogr, the generated .gfs file stores an incorrect field width for at least some string elements. (It appears to use only a single feature, i.e. record, from the GML file to calculate the widths. While that is reasonable for detecting a field's name and datatype, it is very bad for field width.)

The resultant shapefile contains truncated values, which cause serious problems when the shapefiles are displayed. (In this case, I am creating tropical cyclone advisories, and the result during testing is that some of the coastal warning areas were not highlighted, as the identifiers in the shapefile were corrupt.)

A workaround is to build the .gfs manually myself. While this is tolerable for a couple of products, it is unacceptable for large-scale work.

IMHO this is a major bug, mainly because many users will not detect the data corruption for quite a while - by which time, they may have deleted the original GML files.

Attachments (3)

areas.gml (89.9 KB ) - added by nf 17 years ago.
Sample GML file - areaType is 'Warning Area' (12 chars)
areas.gfs (1.3 KB ) - added by nf 17 years ago.
Auto-generated GFS which shows 10 character areaType
gml2gfs.xsl (9.4 KB ) - added by nf 17 years ago.
XSL to generate a GFS from a GML, working around this bug


Change History (18)

by nf, 17 years ago

Attachment: areas.gml added

Sample GML file - areaType is 'Warning Area' (12 chars)

by nf, 17 years ago

Attachment: areas.gfs added

Auto-generated GFS which shows 10 character areaType

comment:1 by warmerdam, 17 years ago

Cc: warmerdam added
Keywords: gml added
Milestone: 1.4.3
Owner: changed from warmerdam to Mateusz Łoskot
Priority: high → normal

Mateusz,

Could you try and reproduce this? The intention I believe was that the gml reader produces the field widths for string values based on the longest string encountered in a scan of the whole file.
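To illustrate the difference between the reported behaviour and the intended one, here is a minimal sketch. It is not the OGR code: the records, the "Watch Area" value, and all function names are hypothetical, and the fixed-width writer only mimics what a format such as a shapefile does with over-long values.

```python
# Hypothetical records standing in for GML features; not taken from areas.gml.
features = [
    {"areaType": "Watch Area"},    # 10 characters
    {"areaType": "Warning Area"},  # 12 characters
]


def width_from_first_feature(features, field):
    """Width taken from one feature only (the behaviour described in the report)."""
    return len(features[0][field])


def width_from_all_features(features, field):
    """Width taken from the longest value found in a scan of every feature."""
    return max(len(f[field]) for f in features)


def write_fixed_width(value, width):
    """Mimic a fixed-width output format, which silently cuts longer values."""
    return value[:width]


narrow = width_from_first_feature(features, "areaType")
wide = width_from_all_features(features, "areaType")
truncated = [write_fixed_width(f["areaType"], narrow) for f in features]
preserved = [write_fixed_width(f["areaType"], wide) for f in features]
```

With the narrow width, the second value loses its last two characters; with the width from the full scan, both values survive unchanged.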

by nf, 17 years ago

Attachment: gml2gfs.xsl added

XSL to generate a GFS from a GML, working around this bug

comment:2 by nf, 17 years ago

A clarification: an incorrect .gfs only appears to cause data loss if the resultant data format is fixed-width (e.g. ESRI shapefile).

i.e., ogr2ogr -f GML does not cause data loss, but ogr2ogr -f "ESRI Shapefile" does.

For anyone who is experiencing this problem, I have attached a workaround which correctly calculates the maximum string length of the GML file.
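For readers who prefer a script to XSL, the same idea (scan the whole file and record the longest text per element) might be sketched as follows. This is not the attached gml2gfs.xsl and it does not write a .gfs file; the sample document, its element names, and the function name are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

# Tiny GML-like sample invented for illustration; element names are hypothetical.
SAMPLE = """<FeatureCollection>
  <featureMember><Area><areaType>Watch Area</areaType><areaId>A1</areaId></Area></featureMember>
  <featureMember><Area><areaType>Warning Area</areaType><areaId>A20</areaId></Area></featureMember>
</FeatureCollection>"""


def max_text_widths(xml_text):
    """Return {element name: longest text length} over every leaf element."""
    widths = {}
    for elem in ET.fromstring(xml_text).iter():
        if len(elem) == 0 and elem.text and elem.text.strip():  # leaf with text
            name = elem.tag.rsplit("}", 1)[-1]  # drop any "{namespace}" prefix
            widths[name] = max(widths.get(name, 0), len(elem.text.strip()))
    return widths


widths = max_text_widths(SAMPLE)
```

Note that len() counts characters; for a byte-oriented output format, the encoded length of each value would be the safer measure.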

comment:3 by Mateusz Łoskot, 16 years ago

Status: new → assigned

comment:4 by Mateusz Łoskot, 16 years ago

Resolution: fixed
Status: assigned → closed

I applied a small patch for this issue, so it should be fixed now (r12279).

I tested with various files in which the attribute values appear in longer-first or shorter-first order.

comment:5 by Mateusz Łoskot, 16 years ago

The fix has been ported to branches/1.4 (r12511).

comment:6 by maphew, 16 years ago

Resolution: fixed
Status: closed → reopened

I just got bit by this bug HARD. 4 full days' worth of work down the tube. Below is the output of the attached test case (minus wget and 7zip messages); note that the length of ID does not match. I think a bug of this nature is critical, not major.

GDAL 1.5dev, FWTools 2.0.0, released 2007/11/12
  ID (String) = 11CF43A8EFEAE5F4E0409C8467120387
  ID (String) = 11CF43A8EFEAE5F4E0409C84671203

comment:7 by maphew, 16 years ago

@echo off
REM
REM gml-truncate-test-case.bat
REM
wget --continue ftp://ftp2.cits.rncan.gc.ca/pub/canvec/50k_gml/105/a/canvec_105a01_gml.zip
7z x canvec_105a01_gml.zip
REM or 'unzip canvec_105a01_gml.zip'

gdalinfo --version
ogrinfo 105a01_1_0.gml LI_1210009_2 | grep "ID (String)"
ogr2ogr -f "esri shapefile" outshp 105a01_1_0.gml LI_1210009_2
ogrinfo outshp LI_1210009_2 | grep "ID (String)"
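The comparison this test case does by eye can also be automated. The sketch below is a hypothetical helper, not part of GDAL/OGR: it flags any converted value that is a shortened prefix of its source value. It uses the two ID strings from comment:6 and assumes the first is the GML value and the second the shapefile value, following the order of the commands above.

```python
def find_truncated(source_values, converted_values):
    """Return (source, converted) pairs where the converted value is a
    shortened prefix of the source value."""
    return [
        (src, out)
        for src, out in zip(source_values, converted_values)
        if out != src and src.startswith(out)
    ]


# The two ID values reported in comment:6 (assumed order: GML, then shapefile).
source_ids = ["11CF43A8EFEAE5F4E0409C8467120387"]
converted_ids = ["11CF43A8EFEAE5F4E0409C84671203"]

for src, out in find_truncated(source_ids, converted_ids):
    print(f"truncated: {len(src)} -> {len(out)} characters")
```

In practice the two lists would be filled from the ogrinfo output of the source and the converted layer; how to collect them is left out of this sketch.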

comment:8 by maphew, 16 years ago

Severity: major → critical

comment:9 by Mateusz Łoskot, 16 years ago

Maphew, I'm really sorry.

I thought this issue had been fixed. At least, I tested the original files and applied patches that fixed this issue for the files attached to the report.

I'm working on it today.

comment:10 by Mateusz Łoskot, 16 years ago

Milestone: 1.4.3 → 1.4.4

comment:11 by Mateusz Łoskot, 16 years ago

Resolution: fixed
Status: reopened → closed

I've fixed this bug in trunk (r12781) and branches/1.4 (r12782)

comment:12 by Mateusz Łoskot, 16 years ago

Cc: maphew added

Please check whether the problem has been fixed for you, and reopen the ticket if it hasn't.

comment:13 by maphew, 16 years ago

As of fwtools 2.0.1 (win32) it appears the conversion from gml no longer truncates fields. Thank you!

comment:14 by c911469, 16 years ago

Hi, I have a very similar problem, and I'm not sure whether it is related to this one. I am also using tropical cyclone data: I have GML files that I am viewing in QGIS, which truncates part of the data in the "date" attribute/element (e.g. month and day-of-month). I am also using FWTools 2.0.3 (Win32), and it loses part of the date in the same way; ogrinfo and ogr2ogr give the same result. The date field is specified as "xs:string" in the ".xsd" file, but in the resulting ".gfs" file it comes out as "integer". Dates are originally specified as "2007-01-23" but come out as "2007". Should I log this as a new fault, or is it part of the original fault? (I have sample data files if required.)
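As an illustration only, here is one way a value such as "2007-01-23" can end up as 2007: a parser that reads just the leading digits, much like C's atoi(). Whether the GML reader actually does this is not established in this ticket; the function names below are hypothetical, and the stricter check is simply one possible safeguard.

```python
import re

_INTEGER = re.compile(r"[+-]?\d+")


def leading_integer(text):
    """Parse only the leading digits and ignore the rest, much like C's atoi()."""
    match = _INTEGER.match(text.strip())
    return int(match.group()) if match else 0


def infer_type(values):
    """Call a field 'integer' only if every value is an integer from start to end."""
    if values and all(_INTEGER.fullmatch(v.strip()) for v in values):
        return "integer"
    return "string"


as_number = leading_integer("2007-01-23")   # keeps the year, drops "-01-23"
date_type = infer_type(["2007-01-23"])      # the hyphens rule out "integer"
```

Under the stricter rule, a field holding dates in this form would stay a string and keep its month and day.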

comment:15 by Mateusz Łoskot, 16 years ago

c911469,

Please submit your sample GML files with the exact ogrinfo and ogr2ogr commands you've used.

I'd suggest opening a new ticket, indicating in the title that it's about truncated date values, and attaching your files to it. This way your attachments won't be mixed with files that belong to this ticket. IMHO it will be less confusing.
