Ticket #1844 (closed defect: fixed)

Opened 10 months ago

Last modified 6 months ago

GML reader truncates field values

Reported by: nf Assigned to: mloskot
Priority: normal Milestone: 1.4.4
Component: OGR_SF Version: 1.4.0
Severity: critical Keywords: gml
Cc: warmerdam, maphew

Description

When I attempt to convert a GML file using ogr2ogr, the generated .gfs file stores the incorrect field width of at least some string elements. (It appears to be selecting only a single feature (aka: record) from the GML file to calculate the widths. While this is reasonably valid behaviour for detecting datatype and the field's name, it is very bad for field width.)

The resultant shapefile contains truncated values, which cause serious problems when the shapefiles are displayed. (In this case, I am creating tropical cyclone advisories, and the result during testing is that some of the coastal warning areas were not highlighted, as the identifiers in the shapefile were corrupt.)

A workaround is to build the .gfs manually. While this is tolerable for a couple of products, it is unacceptable for large-scale work.

IMHO this is a major bug, mainly because many users will not detect the data corruption for quite a while - by which time, they may have deleted the original GML files.

Attachments

areas.gml (89.9 kB) - added by nf on 09/17/07 04:25:16.
Sample GML file - areaType is 'Warning Area' (12 chars)
areas.gfs (1.3 kB) - added by nf on 09/17/07 04:26:53.
Auto-generated GFS which shows 10 character areaType
gml2gfs.xsl (9.4 kB) - added by nf on 09/19/07 03:06:19.
XSL to generate a GFS from a GML, working around this bug

Change History

09/17/07 04:25:16 changed by nf

  • attachment areas.gml added.

Sample GML file - areaType is 'Warning Area' (12 chars)

09/17/07 04:26:53 changed by nf

  • attachment areas.gfs added.

Auto-generated GFS which shows 10 character areaType

09/17/07 09:33:45 changed by warmerdam

  • priority changed from high to normal.
  • keywords set to gml.
  • cc set to warmerdam.
  • owner changed from warmerdam to mloskot.
  • milestone set to 1.4.3.

Mateusz,

Could you try to reproduce this? The intention, I believe, was that the GML reader derives the field widths for string values from the longest string encountered in a scan of the whole file.
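
To make the difference concrete, here is a minimal Python sketch contrasting the behaviour the reporter suspects (widths taken from a single feature) with the whole-file scan described above. It is an illustration only, not the OGR GML reader code: the function names and the sample records are hypothetical.

```python
def widths_from_first_feature(features):
    """Suspected faulty approach: take string widths from the first feature only."""
    first = features[0]
    return {name: len(value) for name, value in first.items()}


def widths_from_full_scan(features):
    """Intended approach: widest value seen for each field across all features."""
    widths = {}
    for feature in features:
        for name, value in feature.items():
            widths[name] = max(widths.get(name, 0), len(value))
    return widths


# Hypothetical sample records; 'Warning Area' is 12 characters long.
features = [
    {"areaType": "Watch Area"},
    {"areaType": "Warning Area"},
]

print(widths_from_first_feature(features))  # {'areaType': 10}
print(widths_from_full_scan(features))      # {'areaType': 12}
```

With the first approach, a fixed-width output field sized to 10 characters would cut 'Warning Area' short, which matches the symptom reported in this ticket.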

09/19/07 03:06:19 changed by nf

  • attachment gml2gfs.xsl added.

XSL to generate a GFS from a GML, working around this bug

09/19/07 03:09:25 changed by nf

A clarification: an incorrect .gfs only appears to cause data loss if the resultant data format is fixed-width (e.g. ESRI shapefile).

i.e. ogr2ogr -f GML does not cause data loss, but ogr2ogr -f "ESRI Shapefile" does.

For anyone who is experiencing this problem, I have attached a workaround which correctly calculates the maximum string length of the GML file.
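
For readers who would rather check a file without XSLT, the following is a rough Python sketch of the same idea: scan the whole GML and record the longest text value per leaf element. It is not the attached gml2gfs.xsl and does not write a .gfs; the function name `max_text_lengths` and the inline sample document are assumptions made for illustration.

```python
import io
import xml.etree.ElementTree as ET


def max_text_lengths(source):
    """Return {local element name: longest text length} for leaf elements."""
    widths = {}
    for _event, elem in ET.iterparse(source, events=("end",)):
        # Only elements without children carry attribute values.
        if len(elem) == 0 and elem.text and elem.text.strip():
            local_name = elem.tag.rsplit("}", 1)[-1]  # drop the '{namespace}' prefix
            length = len(elem.text.strip())
            widths[local_name] = max(widths.get(local_name, 0), length)
    return widths


# Hypothetical two-feature GML document used only for this example.
sample = b"""<?xml version="1.0"?>
<ogr:FeatureCollection xmlns:ogr="http://ogr.maptools.org/" xmlns:gml="http://www.opengis.net/gml">
  <gml:featureMember><ogr:areas><ogr:areaType>Watch Area</ogr:areaType></ogr:areas></gml:featureMember>
  <gml:featureMember><ogr:areas><ogr:areaType>Warning Area</ogr:areaType></ogr:areas></gml:featureMember>
</ogr:FeatureCollection>"""

print(max_text_lengths(io.BytesIO(sample)))  # {'areaType': 12}
```

To run it against a real file, pass the path instead of the in-memory sample, for example `max_text_lengths("areas.gml")`, and compare the result with the widths recorded in the generated .gfs.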

09/28/07 14:19:57 changed by mloskot

  • status changed from new to assigned.

09/28/07 19:44:04 changed by mloskot

  • status changed from assigned to closed.
  • resolution set to fixed.

I applied a small patch for this issue, so it should be fixed now (r12279).

I tested with various files, some with the longer attribute values first and some with the shorter values first.

10/23/07 09:04:15 changed by mloskot

The fix has been ported to branches/1.4 (r12511).

11/14/07 19:13:02 changed by maphew

  • status changed from closed to reopened.
  • resolution deleted.

I just got bitten HARD by this bug: four full days' worth of work down the tubes. Below is the output of the attached test case (minus the wget and 7zip messages); note that the length of ID does not match. I think a bug of this nature is critical, not major.

GDAL 1.5dev, FWTools 2.0.0, released 2007/11/12
  ID (String) = 11CF43A8EFEAE5F4E0409C8467120387
  ID (String) = 11CF43A8EFEAE5F4E0409C84671203

11/14/07 19:14:05 changed by maphew

@echo off
REM
REM gml-truncate-test-case.bat
REM
wget --continue ftp://ftp2.cits.rncan.gc.ca/pub/canvec/50k_gml/105/a/canvec_105a01_gml.zip
7z x canvec_105a01_gml.zip
REM or 'unzip canvec_105a01_gml.zip'

gdalinfo --version
ogrinfo 105a01_1_0.gml LI_1210009_2 | grep "ID (String)"
ogr2ogr -f "esri shapefile" outshp 105a01_1_0.gml LI_1210009_2
ogrinfo outshp LI_1210009_2 | grep "ID (String)"

11/14/07 19:22:28 changed by maphew

  • severity changed from major to critical.

11/15/07 05:59:37 changed by mloskot

Maphew, I'm really sorry.

I thought this issue had been fixed. At least, I tested the original files and applied patches that fixed the issue for the files attached to the report.

I'm working on it today.

11/15/07 10:22:57 changed by mloskot

  • milestone changed from 1.4.3 to 1.4.4.

11/15/07 10:24:53 changed by mloskot

  • status changed from reopened to closed.
  • resolution set to fixed.

I've fixed this bug in trunk (r12781) and branches/1.4 (r12782).

11/15/07 10:33:39 changed by mloskot

  • cc changed from warmerdam to warmerdam, maphew.

Please check whether the problem has been fixed for you, and reopen the ticket if it hasn't.

11/29/07 17:24:43 changed by maphew

As of fwtools 2.0.1 (win32) it appears the conversion from gml no longer truncates fields. Thank you!

01/08/08 01:21:10 changed by c911469

Hi, I have a very similar problem, and I'm not sure whether it is related to this one. I am also using tropical cyclone data: I have GML files that I am viewing in QGIS, which truncates part of the data in the "date" attribute/element, e.g. the month and day of month. I am also using FWTools 2.0.3 (Win32), and it too loses part of the date in the same way; ogrinfo and ogr2ogr give the same result. The date field is specified as "xs:string" in the ".xsd" file, but in the resulting ".gfs" file it comes out as "integer". Dates are originally specified as "2007-01-23", but come out as "2007". Should I log this as a new fault, or is it part of this original fault? (I have sample data files if required.)

01/08/08 01:29:10 changed by mloskot

c911469,

Please submit your sample GML files along with the exact ogrinfo and ogr2ogr commands you used.

I'd suggest opening a new ticket, indicating in the title that it is about truncated date values, and attaching your files to it. That way your attachments won't be mixed with the files that belong to this ticket. IMHO it will be less confusing.