Opened 16 years ago
Closed 14 years ago
#2286 closed defect (duplicate)
Possible data truncation and loss of information during Shapefile import
Reported by: | Kosta | Owned by: | warmerdam |
---|---|---|---|
Priority: | normal | Milestone: | |
Component: | OGR_SF | Version: | 1.5.0 |
Severity: | critical | Keywords: | Shapefile, truncation |
Cc: |
Description
I would like to apply the attached patch to the Shapefile Driver in order to avoid loss or truncation of data during shapefile import.
The problem is this: if a Shapefile specifies a column of type "N" (number) with a width larger than 10 digits, the data type is promoted to "double", since "int32" can represent at most 10 digits. But a "double" can only represent 15 to 16 significant digits, so this promotion loses information whenever the original data is wider than that. (Note: TeleAtlas, for example, uses 30-digit numbers as unique IDs for some of its features!)
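The precision limit is easy to reproduce outside OGR. The following Python sketch round-trips a wide value through a double; the helper name `survives_double` and the 30-digit ID are made up for illustration and are not taken from TeleAtlas data or from the driver.

```python
# Round-trip a wide DBF "N" value through a double and check what survives.
# The 30-digit ID below is a made-up example value.

def survives_double(digits: str) -> bool:
    """Return True if the integer written as `digits` is unchanged after
    being stored in an IEEE 754 double (Python's float)."""
    original = int(digits)
    return int(float(digits)) == original

short_id = "123456789012345"                  # 15 digits
wide_id = "123456789012345678901234567890"    # 30 digits

print(survives_double(short_id))  # True
print(survives_double(wide_id))   # False
```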
The attached patch will promote "numbers":
- with at least 1 decimal to "doubles",
- with fewer than 11 digits to "ints", and
- everything else to "strings",
avoiding loss or truncation of information.
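The rule listed above can be sketched as a small Python function. This is only an illustration of the stated rule: `ogr_type_for_numeric` and the returned labels are hypothetical names, not the identifiers used in the attached patch or in OGR.

```python
# Sketch of the field-type rule proposed in this ticket.
# `ogr_type_for_numeric` and the returned labels are hypothetical names,
# not the identifiers used in the attached patch or in OGR.

def ogr_type_for_numeric(width: int, decimals: int) -> str:
    """Map a DBF "N" column definition (width, decimals) to a field type."""
    if decimals >= 1:
        return "double"   # has a fractional part
    if width < 11:
        return "int"      # at most 10 digits
    return "string"       # too wide for an int; keep every digit as text

print(ogr_type_for_numeric(10, 0))  # int
print(ogr_type_for_numeric(30, 0))  # string
print(ogr_type_for_numeric(12, 3))  # double
```

Note that a 10-digit value can still exceed the int32 maximum of 2147483647; the sketch follows the rule exactly as stated in the description.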
Attachments (1)
Change History (6)
by , 16 years ago
Attachment: | gdal-shapefile-datatype.patch added |
---|
comment:1 by , 16 years ago
comment:2 by , 16 years ago
Keywords: | truncation added |
---|
OK, then this issue should be fixed somehow, asap.
The current implementation leads to data truncation and therefore to a loss of information. My proposed patch keeps all data around while reporting a wrong data type (btw, this is also already happening for integers with more than 10 digits, since these are promoted to another data type, i.e. "double", too).
I would prefer the second approach since the original data can be reconstructed by simply changing the data type...
comment:3 by , 16 years ago
Milestone: | 1.5.2 |
---|
I am not in favor of the proposed change because of the disruption it would cause to applications expecting the values to be numeric (and for numeric query operations to work against them).
The correct fix is introducing 64-bit integer column types in Shapelib and OGR. It is not clear when this will happen.
comment:4 by , 16 years ago
But the introduction of 64-bit integers would disrupt existing applications even more, since they are not aware of 64-bit integers yet and cannot handle them.
And using 64-bit integers only solves the problem for numeric values up to 18 digits (19 digits for values no larger than 9223372036854775807); for even larger numerics the problem would not be fixed at all.
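For reference, the exact reach of a signed 64-bit integer can be checked with a short Python sketch. `fits_int64` is a hypothetical helper written for this illustration; it is not part of Shapelib or OGR.

```python
# How many decimal digits does a signed 64-bit integer cover?
# `fits_int64` is a hypothetical helper, not a Shapelib or OGR function.

INT64_MIN = -2**63
INT64_MAX = 2**63 - 1  # 9223372036854775807

def fits_int64(digits: str) -> bool:
    """Return True if the integer written as `digits` fits in a signed int64."""
    return INT64_MIN <= int(digits) <= INT64_MAX

print(len(str(INT64_MAX)))          # 19
print(fits_int64("9" * 18))         # True: every 18-digit value fits
print(fits_int64("9" * 19))         # False: not every 19-digit value fits
print(fits_int64("1" + "0" * 29))   # False: a 30-digit value
```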
Would it be possible/acceptable to return the full original data containing all digits if requested as "string" instead of "integer" or "double" (via OGRFeature::GetFieldAsString())?
comment:5 by , 14 years ago
Resolution: | → duplicate |
---|---|
Status: | new → closed |
I'm making #3615 the official ticket to address wide integer fields in dbf.
This is the most often reported bug... See #2152, #1627, #1112, #933, #809