Ticket #1844 (closed defect: fixed)

Opened 4 years ago

Last modified 4 years ago

GML reader truncates field values

Reported by: nf
Owned by: mloskot
Priority: normal
Milestone: 1.4.4
Component: OGR_SF
Version: 1.4.0
Severity: critical
Keywords: gml
Cc: warmerdam, maphew

Description

When I attempt to convert a GML file using ogr2ogr, the generated .gfs file stores an incorrect field width for at least some string elements. It appears to use only a single feature (i.e. record) from the GML file to calculate the widths. While that is reasonably valid behaviour for detecting a field's name and datatype, it is very bad for field width.

The resultant shapefile contains truncated values, which cause serious problems when the shapefiles are displayed. (In this case, I am creating tropical cyclone advisories, and the result during testing is that some of the coastal warning areas were not highlighted, as the identifiers in the shapefile were corrupt.)

A workaround is to build the .gfs manually myself. While this is tolerable for a couple of products, it is unacceptable for large-scale work.

IMHO this is a major bug, mainly because many users will not detect the data corruption for quite a while - by which time, they may have deleted the original GML files.

Attachments

areas.gml Download (89.9 KB) - added by nf 4 years ago.
Sample GML file - areaType is 'Warning Area' (12 chars)
areas.gfs Download (1.3 KB) - added by nf 4 years ago.
Auto-generated GFS which shows 10 character areaType
gml2gfs.xsl Download (9.4 KB) - added by nf 4 years ago.
XSL to generate a GFS from a GML, working around this bug

Change History

Changed 4 years ago by nf

Sample GML file - areaType is 'Warning Area' (12 chars)

Changed 4 years ago by nf

Auto-generated GFS which shows 10 character areaType

Changed 4 years ago by warmerdam

  • cc warmerdam added
  • keywords gml added
  • priority changed from high to normal
  • owner changed from warmerdam to mloskot
  • milestone set to 1.4.3

Mateusz,

Could you try and reproduce this? The intention I believe was that the gml reader produces the field widths for string values based on the longest string encountered in a scan of the whole file.
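
The intended behaviour described above (width taken from the longest string in a scan of the whole file) can be sketched in a few lines of Python. This is only an illustration, not the OGR GML reader's code: the function names are hypothetical, the sample document is made up (the "Watch Area" value is invented to stand in for a 10-character areaType), and treating every leaf element with text as a field is an assumption.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict


def local_name(tag):
    """Strip a '{namespace}' prefix from an ElementTree tag, if present."""
    return tag.rsplit("}", 1)[-1]


def max_string_widths(gml_text):
    """Return {element name: longest text length} over the whole document."""
    widths = defaultdict(int)
    root = ET.fromstring(gml_text)
    for elem in root.iter():
        # Assumption: only leaf elements that carry text count as fields.
        if len(elem) == 0 and elem.text and elem.text.strip():
            name = local_name(elem.tag)
            widths[name] = max(widths[name], len(elem.text.strip()))
    return dict(widths)


# Made-up sample: the first feature is shorter than the second, so a width
# taken from the first feature alone (10) would truncate 'Warning Area' (12).
SAMPLE = """<FeatureCollection xmlns:gml="http://www.opengis.net/gml">
  <gml:featureMember><area><areaType>Watch Area</areaType></area></gml:featureMember>
  <gml:featureMember><area><areaType>Warning Area</areaType></area></gml:featureMember>
</FeatureCollection>"""

print(max_string_widths(SAMPLE))
```

Scanning every feature costs a full pass over the file, but it is the only way to guarantee the width is large enough for a fixed-width output format.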

Changed 4 years ago by nf

XSL to generate a GFS from a GML, working around this bug

Changed 4 years ago by nf

A clarification: an incorrect .gfs only appears to cause data loss if the resultant data format is fixed-width (e.g. ESRI shapefile).

i.e. ogr2ogr -f GML does not cause data loss, but ogr2ogr -f "ESRI Shapefile" does.

For anyone who is experiencing this problem, I have attached a workaround which correctly calculates the maximum string length of the GML file.

Changed 4 years ago by mloskot

  • status changed from new to assigned

Changed 4 years ago by mloskot

  • status changed from assigned to closed
  • resolution set to fixed

I applied a small patch for this issue, so it's supposed to be fixed now (r12279).

I tested with various files in which the attribute values appear in longer-first or shorter-first order.

Changed 4 years ago by mloskot

The fix has been ported to the branches/1.4 (r12511)

Changed 4 years ago by maphew

  • status changed from closed to reopened
  • resolution fixed deleted

I just got bit by this bug HARD. 4 full days worth of work down the tube. Output of attached test-case (minus wget and 7zip messages), note the length of ID does not match. I think a bug of this nature is critical, not major.

GDAL 1.5dev, FWTools 2.0.0, released 2007/11/12
  ID (String) = 11CF43A8EFEAE5F4E0409C8467120387
  ID (String) = 11CF43A8EFEAE5F4E0409C84671203
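
Since this kind of corruption is easy to miss, a small check can flag it after conversion. The sketch below is a hypothetical helper, not part of GDAL/OGR: it assumes the source and converted values have already been collected as pairs (for example from the ogrinfo output), and it reports any converted value that is a strictly shorter prefix of its original. The sample pair is the ID shown above.

```python
def find_truncated(pairs):
    """Return (original, converted) pairs where converted is a shortened prefix."""
    return [
        (orig, conv)
        for orig, conv in pairs
        if len(conv) < len(orig) and orig.startswith(conv)
    ]


# The ID values reported in this comment: GML source first, shapefile second.
pairs = [
    ("11CF43A8EFEAE5F4E0409C8467120387", "11CF43A8EFEAE5F4E0409C84671203"),
]

for orig, conv in find_truncated(pairs):
    print(f"truncated from {len(orig)} to {len(conv)} chars: {orig} -> {conv}")
```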

Changed 4 years ago by maphew

@echo off
REM
REM gml-truncate-test-case.bat
REM
wget --continue ftp://ftp2.cits.rncan.gc.ca/pub/canvec/50k_gml/105/a/canvec_105a01_gml.zip
7z x canvec_105a01_gml.zip
REM or 'unzip canvec_105a01_gml.zip'

gdalinfo --version
ogrinfo 105a01_1_0.gml LI_1210009_2 | grep "ID (String)"
ogr2ogr -f "esri shapefile" outshp 105a01_1_0.gml LI_1210009_2
ogrinfo outshp LI_1210009_2 | grep "ID (String)"

Changed 4 years ago by maphew

  • severity changed from major to critical

Changed 4 years ago by mloskot

Maphew, I'm really sorry.

I thought this issue had been fixed. At least, I tested the original files and applied patches that fixed this issue for the files attached to the report.

I'm working on it today.

Changed 4 years ago by mloskot

  • milestone changed from 1.4.3 to 1.4.4

Changed 4 years ago by mloskot

  • status changed from reopened to closed
  • resolution set to fixed

I've fixed this bug in trunk (r12781) and branches/1.4 (r12782)

Changed 4 years ago by mloskot

  • cc maphew added

Please, check if the problem has been fixed for you and reopen the ticket if it hasn't.

Changed 4 years ago by maphew

As of fwtools 2.0.1 (win32) it appears the conversion from gml no longer truncates fields. Thank you!

Changed 4 years ago by c911469

Hi, I have a very similar problem which I'm not sure is related to this problem or unrelated. I am also using tropical cyclone data - I have gml files that I am viewing in QGIS which truncates part of the data in the "date" attribute/element eg month & day-of-month. I am also using FWTools 2.0.3 (Win32), and it too loses part of the date in the same way - using ogrinfo or ogr2ogr has the same result. The Date field is specified as "xs:string" in the ".xsd" file, but in the resulting ".gfs" file it comes out as "integer". Dates are specified originally as "2007-01-23", but result as "2007". Should I log this as a new fault, or is this part of this original fault? (I have sample data files if required.)
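
The cause of this date symptom is not identified in this ticket. Purely as an illustration of how "2007-01-23" could end up as "2007", the sketch below contrasts a lenient parse that stops at the first non-digit with a strict type check that only calls a field Integer when every value is entirely an integer literal. Both functions are hypothetical and are not taken from the OGR GML reader.

```python
import re

_INT = re.compile(r"[+-]?\d+")


def infer_field_type(values):
    """Return 'Integer' only if every value is entirely an integer literal."""
    if values and all(_INT.fullmatch(v.strip()) for v in values):
        return "Integer"
    return "String"


def leading_int(text):
    """Mimic a lenient parse that reads leading digits and ignores the rest."""
    m = re.match(r"\s*[+-]?\d+", text)
    return int(m.group()) if m else 0


# A lenient parse silently drops the month and day ...
print(leading_int("2007-01-23"))
# ... while a strict check keeps the field as a string.
print(infer_field_type(["2007-01-23"]))
```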

Changed 4 years ago by mloskot

c911469,

Please submit your sample GML files with exact ogrinfo and ogr2ogr commands you've used.

I'd suggest opening a new ticket, indicating in the title that it's about truncated date values, and attaching your files to it. This way your attachments won't be mixed with files that belong to this ticket. IMHO it will be less confusing.
