Opened 12 years ago

Closed 10 years ago

#2286 closed defect (duplicate)

Possible data truncation and loss of information during Shapefile import

Reported by: Kosta
Owned by: warmerdam
Priority: normal
Milestone:
Component: OGR_SF
Version: 1.5.0
Severity: critical
Keywords: Shapefile, truncation
Cc:

Description

I would like to apply the attached patch to the Shapefile Driver in order to avoid loss or truncation of data during shapefile import.

The problem is: if a Shapefile specifies a column of type "N" (number) with a width larger than 10 digits, the data type is promoted to "double", since an "int32" can hold at most 10 digits. But a "double" can only represent 15 to 16 significant decimal digits exactly, so this promotion loses information whenever the original data is wider than that. [Note: TeleAtlas, for example, uses 30-digit numbers as unique IDs for some of their features!]
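The loss can be reproduced with the C++ standard library alone. The sketch below is only an illustration, not code from the driver: `survivesDoubleRoundTrip` is a hypothetical helper, and it assumes the input is a plain digit string with no sign, no leading zeros, and no padding.

```cpp
#include <cstdio>
#include <cstdlib>
#include <string>

// Hypothetical helper (not part of OGR or Shapelib): returns true when a
// plain decimal digit string is unchanged after being parsed into a double
// and printed back without a fractional part.
// Assumes no sign, no leading zeros, no padding, and at most 255 digits.
bool survivesDoubleRoundTrip(const std::string& digits) {
    const double value = std::strtod(digits.c_str(), nullptr);
    char buffer[512];  // large enough for any value of up to 255 digits
    std::snprintf(buffer, sizeof(buffer), "%.0f", value);
    return digits == buffer;
}
```

A 10-digit value such as 1234567890 comes back unchanged, while a 30-digit ID of the kind mentioned above does not.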

The attached patch will promote "numbers":

  • with at least 1 decimal to "doubles",
  • with fewer than 11 digits to "ints", and
  • everything else to "strings",

avoiding loss or truncation of information.
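The three rules above can be sketched as a small decision function. This is not the content of the attached patch; `FieldType` and `chooseFieldType` are hypothetical names, and `width` and `decimals` stand for the width and decimal count declared for the "N" column.

```cpp
// Hypothetical stand-ins for the field types discussed in this ticket.
enum class FieldType { Integer, Double, String };

// Applies the three rules in the order they are listed:
// any decimals -> double, fewer than 11 digits -> int, otherwise string.
// Note: a 10-digit value above 2147483647 still does not fit an int32,
// so the integer rule is only safe for part of the 10-digit range.
FieldType chooseFieldType(int width, int decimals) {
    if (decimals >= 1) {
        return FieldType::Double;
    }
    if (width < 11) {
        return FieldType::Integer;
    }
    return FieldType::String;
}
```

Under these rules a 30-digit ID column with no decimals would be reported as a string, which is what keeps all of its digits.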

Attachments (1)

gdal-shapefile-datatype.patch (1.5 KB) - added by Kosta 12 years ago.

Change History (6)

Changed 12 years ago by Kosta

comment:1 Changed 12 years ago by Even Rouault

This is the most often reported bug... See #2152, #1627, #1112, #933, #809

comment:2 Changed 12 years ago by Kosta

Keywords: truncation added

OK, then this issue should be fixed somehow, asap.

The current implementation leads to data truncation and therefore to a loss of information. My proposed patch keeps all data around while reporting a wrong data type (btw, this is also already happening for integers with more than 10 digits, since these are promoted to another data type, i.e. "double", too).

I would prefer the second approach since the original data can be reconstructed by simply changing the data type...

comment:3 Changed 12 years ago by warmerdam

Milestone: 1.5.2

I am not in favor of the proposed change because of the disruption it would cause to applications expecting the values to be numeric (and for numeric query operations to work against them).

The correct fix is introducing 64bit integer column types in Shapelib and OGR. It is not clear when this will happen.

comment:4 Changed 12 years ago by Kosta

But introducing 64-bit integers would disrupt existing applications even more, since they are not yet aware of 64-bit integers and cannot handle them.

And using 64-bit integers only solves the problem for numeric values up to 20(?) digits; for even larger numerics the problem would not be fixed at all.
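For reference, the largest signed 64-bit value is 9223372036854775807, which has 19 digits, so every 18-digit number fits but only some 19-digit numbers do. A minimal check, assuming `long long` is 64 bits wide (as it is on mainstream platforms); `fitsInInt64` is a hypothetical helper, not an OGR function:

```cpp
#include <cerrno>
#include <cstdlib>
#include <string>

// Hypothetical helper: returns true when the whole digit string parses
// into a signed 64-bit integer without overflow.
bool fitsInInt64(const std::string& digits) {
    errno = 0;
    char* end = nullptr;
    std::strtoll(digits.c_str(), &end, 10);
    // Reject overflow, empty input, and trailing non-digit characters.
    return errno != ERANGE && end != digits.c_str() && *end == '\0';
}
```

A 30-digit ID fails this check, so a 64-bit column type alone would not cover that case.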

Would it be possible/acceptable to return the full original data containing all digits if requested as "string" instead of "integer" or "double" (via OGRFeature::GetFieldAsString())?
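To make the idea concrete, here is a toy model of a field that keeps the text stored in the .dbf record and converts only on request. It is a sketch of the proposal, not the OGRFeature API or its implementation; `RawNumericField`, `asString` and `asDouble` are hypothetical names.

```cpp
#include <cstdlib>
#include <string>

// Hypothetical model of a numeric DBF field (not the OGRFeature API).
struct RawNumericField {
    std::string rawText;  // field text as stored, possibly space-padded

    // Lossless view: the stored text with surrounding spaces removed.
    std::string asString() const {
        const auto first = rawText.find_first_not_of(' ');
        if (first == std::string::npos) {
            return "";
        }
        const auto last = rawText.find_last_not_of(' ');
        return rawText.substr(first, last - first + 1);
    }

    // Numeric view: convenient for queries, but may round wide values.
    double asDouble() const {
        return std::strtod(rawText.c_str(), nullptr);
    }
};
```

With this split, numeric queries keep working through the numeric view, while callers that need every digit of an ID read the string view.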

comment:5 Changed 10 years ago by warmerdam

Resolution: duplicate
Status: new → closed

I'm making #3615 the official ticket to address wide integer fields in dbf.
