Ticket #2286 (closed defect: duplicate)

Opened 5 years ago

Last modified 3 years ago

Possible data truncation and loss of information during Shapefile import

Reported by: Kosta Owned by: warmerdam
Priority: normal Milestone:
Component: OGR_SF Version: 1.5.0
Severity: critical Keywords: Shapefile, truncation
Cc:

Description

I would like to apply the attached patch to the Shapefile Driver in order to avoid loss or truncation of data during shapefile import.

The problem is: if a Shapefile specifies a column of type "N" (number) with a width larger than 10 digits, the data type is promoted to "double" (since "int32" can only represent 10 digits); but "double" values can only represent up to 15 or 16 digits; so this "double" promotion will result in loss of information if the original data has a width larger than 15 digits... [note: e.g., TeleAtlas use 30-digit numbers as unique IDs for some of their features!]
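The precision limit described above can be checked with a short, standalone sketch. It does not use GDAL/OGR; the helper name `survives_double` and the sample values are illustrative assumptions, and Python's `float` stands in for a C `double` (both are IEEE 754 binary64 on common platforms).

```python
def survives_double(text):
    """Return True if an integer-valued digit string round-trips through a double.

    `text` plays the role of the raw characters stored in a DBF "N" column.
    """
    as_double = float(text)            # what a "double" promotion would keep
    return format(as_double, ".0f") == text


# Illustrative values only (not taken from any real dataset).
fifteen_digits = "123456789012345"
seventeen_digits = "12345678901234567"
thirty_digits = "123456789012345678901234567890"

for value in (fifteen_digits, seventeen_digits, thirty_digits):
    print(len(value), "digits, preserved:", survives_double(value))
```

Only the 15-digit value comes back unchanged; the longer ones are rounded to the nearest representable double.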

The attached patch will promote "numbers":

  • with at least 1 decimal to "doubles",
  • with fewer than 11 digits to "ints", and
  • everything else to "strings",

avoiding loss or truncation of information.
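As a rough illustration of the rule listed above (not the attached patch itself, which is not reproduced here), the decision could be sketched as follows. The function name `promote_numeric_field` and the returned type labels are hypothetical; `width` and `decimals` stand for the field width and decimal count declared in the DBF header.

```python
def promote_numeric_field(width, decimals):
    """Pick a target type for a DBF "N" field, following the rule in the ticket.

    Hypothetical helper: the names and string labels are assumptions made
    for illustration, not identifiers from Shapelib or OGR.
    """
    if decimals >= 1:
        return "double"   # at least one decimal place
    if width < 11:
        return "int"      # fewer than 11 digits
    return "string"       # everything else keeps all digits as text


print(promote_numeric_field(10, 0))   # int
print(promote_numeric_field(30, 0))   # string
print(promote_numeric_field(12, 2))   # double
```

One caveat on the rule as stated: a 10-digit value can still exceed the int32 maximum of 2147483647, so a stricter variant might treat only fields narrower than 10 digits as safe for 32-bit integers.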

Attachments

gdal-shapefile-datatype.patch (1.5 KB) - added by Kosta 5 years ago.

Change History

Changed 5 years ago by Kosta

Changed 5 years ago by rouault

This is the most often reported bug... See #2152, #1627, #1112, #933, #809

Changed 5 years ago by Kosta

  • keywords Shapefile, truncation added; Shapefile removed

OK, then this issue should be fixed somehow, asap.

The current implementation leads to data truncation and therefore to a loss of information. My proposed patch keeps all data around while reporting a wrong data type (btw, this is also already happening for integers with more than 10 digits, since these are promoted to another data type, i.e. "double", too).

I would prefer the second approach since the original data can be reconstructed by simply changing the data type...

Changed 5 years ago by warmerdam

  • milestone 1.5.2 deleted

I am not in favor of the proposed change because of the disruption it would cause to applications expecting the values to be numeric (and for numeric query operations to work against them).

The correct fix is introducing 64bit integer column types in Shapelib and OGR. It is not clear when this will happen.

Changed 5 years ago by Kosta

But introducing 64-bit integers would disrupt existing applications even more, since they are not yet aware of 64-bit integers and cannot handle them.

And using 64-bit integers would only solve the problem for numeric values up to 20(?) digits; for even larger numerics the problem would not be fixed at all.
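The digit limit in question can be worked out directly. The following is a small arithmetic sketch, independent of OGR; `fits_int64` and the sample identifier are illustrative assumptions.

```python
INT64_MAX = 2**63 - 1     # largest signed 64-bit integer
UINT64_MAX = 2**64 - 1    # largest unsigned 64-bit integer


def fits_int64(text):
    """Return True if the digit string fits in a signed 64-bit integer."""
    return -2**63 <= int(text) <= INT64_MAX


print(len(str(INT64_MAX)))    # number of digits in the signed maximum
print(len(str(UINT64_MAX)))   # number of digits in the unsigned maximum

# A hypothetical 30-digit identifier does not fit either way.
print(fits_int64("123456789012345678901234567890"))
```

The signed maximum has 19 digits and the unsigned maximum has 20, so every value of up to 18 digits fits in a signed 64-bit integer, while a 30-digit identifier does not fit in either form.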

Would it be possible/acceptable to return the full original data containing all digits if requested as "string" instead of "integer" or "double" (via OGRFeature::GetFieldAsString())?

Changed 3 years ago by warmerdam

  • status changed from new to closed
  • resolution set to duplicate

I'm making #3615 the official ticket to address wide integer fields in dbf.
