Opened 16 years ago
Closed 13 years ago
#2152 closed defect (fixed)
[PATCH] ogr2ogr field types wrong, creating .SHP from .CSV/.CSVT
Reported by: | halmueller | Owned by: | warmerdam |
---|---|---|---|
Priority: | normal | Milestone: | 1.7.3 |
Component: | OGR_SF | Version: | 1.5.0 |
Severity: | normal | Keywords: | shapefile |
Cc: | eadam@… |
Description
I'm getting weird results when trying to use CSVT files to specify field types, importing a CSV file and creating a shapefile.
I have it down to a 2 line CSV, 1 line CSVT, and an OGRVT.
Note that, upon conversion, all of the fields requested as Integer come up as Real, except for the final one, which comes up as String.
This is with GDAL 1.5.0 running on RHEL4, compiled from source. I also saw the problem on 1.4.2.
ships3.csv:
"FID","LAT", "LON", "CALL", "NAME", "TIMESTAMP", "TIMESTRING", "AGEHOURS", "TALLSHIP", "CRUISE", "RESEARCH", "YOTREP", "BUOY"
0,58.500000,-140.200000,WDD6494,"Polar Viking",1193414400,"2007-Oct-26 0900",1854,0,0,0,0,0

ships3.csvt:
"Integer","Real","Real","String","String","Integer","String","Integer","Integer","Integer","Integer","Integer","Integer"

ships3OGRVRT.xml:
<OGRVRTDataSource>
  <OGRVRTLayer name="ships2">
    <SrcDataSource>/home/ldm/csv-test//ships3.csv</SrcDataSource>
    <SrcLayer>ships3</SrcLayer>
    <GeometryType>wkbPoint</GeometryType>
    <LayerSRS>WGS84</LayerSRS>
    <GeometryField encoding="PointFromColumns" x="LON" y="LAT"/>
  </OGRVRTLayer>
</OGRVRTDataSource>

command:
ogr2ogr shipsOut.shp ships3OGRVRT.xml
ogrinfo -al shipsOut.shp

INFO: Open of `shipsOut.shp'
      using driver `ESRI Shapefile' successful.

Layer name: shipsOut
Geometry: Point
Feature Count: 1
Extent: (-140.200000, 58.500000) - (-140.200000, 58.500000)
Layer SRS WKT:
GEOGCS["GCS_WGS_1984",
    DATUM["WGS_1984",
        SPHEROID["WGS_1984",6378137,298.257223563]],
    PRIMEM["Greenwich",0],
    UNIT["Degree",0.017453292519943295]]
FID: Real (11.0)
LAT: Real (24.15)
LON: Real (24.15)
CALL: String (80.0)
NAME: String (80.0)
TIMESTAMP: Real (11.0)
TIMESTRING: String (80.0)
AGEHOURS: Real (11.0)
TALLSHIP: Real (11.0)
CRUISE: Real (11.0)
RESEARCH: Real (11.0)
YOTREP: Real (11.0)
BUOY: String (80.0)
OGRFeature(shipsOut):0
  FID (Real) = 0
  LAT (Real) = 58.500000000000000
  LON (Real) = -140.199999999999989
  CALL (String) = WDD6494
  NAME (String) = Polar Viking
  TIMESTAMP (Real) = 1193414400
  TIMESTRING (String) = 2007-Oct-26 0900
  AGEHOURS (Real) = 1854
  TALLSHIP (Real) = 0
  CRUISE (Real) = 0
  RESEARCH (Real) = 0
  YOTREP (Real) = 0
  BUOY (String) = 0
  POINT (-140.199999999999989 58.5)
Attachments (1)
Change History (8)
comment:1 by , 16 years ago
Keywords: | shapefile added |
---|---|
Milestone: | → 1.5.1 |
Status: | new → assigned |
comment:2 by , 16 years ago
That makes sense.
I also noticed, though, that the result depends on the order of the columns. That is,
"Integer","Integer","String"
gives different results than
"Integer","String","Integer"
(in my case the workaround is just to notice it, and adapt appropriately in my use of the Shapefiles)
comment:3 by , 16 years ago
I propose a patch that adds a new option to ogr2ogr: "-adjust_width". This option prescans all the features to compute the minimum width of Integer or String columns whose width is set to 0.
On the above example, 'ogr2ogr -adjust_width shipsOut.shp ships3OGRVRT.xml' gives:
INFO: Open of `shipsOut.shp'
      using driver `ESRI Shapefile' successful.

Layer name: shipsOut
Geometry: Point
Feature Count: 1
Extent: (-140.200000, 58.500000) - (-140.200000, 58.500000)
Layer SRS WKT:
GEOGCS["GCS_WGS_1984",
    DATUM["WGS_1984",
        SPHEROID["WGS_1984",6378137,298.257223563]],
    PRIMEM["Greenwich",0],
    UNIT["Degree",0.017453292519943295]]
FID: Integer (1.0)
LAT: Real (24.15)
LON: Real (24.15)
CALL: String (7.0)
NAME: String (12.0)
TIMESTAMP: Integer (10.0)
TIMESTRING: String (16.0)
AGEHOURS: Integer (4.0)
TALLSHIP: Integer (1.0)
CRUISE: Integer (1.0)
RESEARCH: Integer (1.0)
YOTREP: Integer (1.0)
BUOY: Integer (1.0)
OGRFeature(shipsOut):0
  FID (Integer) = 0
  LAT (Real) = 58.500000000000000
  LON (Real) = -140.199999999999989
  CALL (String) = WDD6494
  NAME (String) = Polar Viking
  TIMESTAMP (Integer) = 1193414400
  TIMESTRING (String) = 2007-Oct-26 0900
  AGEHOURS (Integer) = 1854
  TALLSHIP (Integer) = 0
  CRUISE (Integer) = 0
  RESEARCH (Integer) = 0
  YOTREP (Integer) = 0
  BUOY (Integer) = 0
  POINT (-140.199999999999989 58.5)
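For readers who want to see what such a prescan amounts to, here is a minimal Python sketch. It is not the code from the attached patch (which modifies ogr2ogr itself); the function name `minimal_widths` is made up for illustration, and it simply measures the longest raw text value per column of the ships3.csv sample from the description.

```python
import csv
import io


def minimal_widths(csv_text):
    """Return {column name: length of the widest value} for CSV text.

    Hypothetical helper: only illustrates the idea of a prescan pass.
    """
    # skipinitialspace lets the reader cope with '"LAT", "LON"' style headers.
    reader = csv.reader(io.StringIO(csv_text), skipinitialspace=True)
    header = next(reader)
    widths = {name: 0 for name in header}
    for row in reader:
        for name, value in zip(header, row):
            widths[name] = max(widths[name], len(value))
    return widths


# The two-line sample file from the ticket description.
SHIPS3 = (
    '"FID","LAT", "LON", "CALL", "NAME", "TIMESTAMP", "TIMESTRING", '
    '"AGEHOURS", "TALLSHIP", "CRUISE", "RESEARCH", "YOTREP", "BUOY"\n'
    '0,58.500000,-140.200000,WDD6494,"Polar Viking",1193414400,'
    '"2007-Oct-26 0900",1854,0,0,0,0,0\n'
)

widths = minimal_widths(SHIPS3)
for name in ("CALL", "NAME", "TIMESTAMP", "TIMESTRING", "AGEHOURS"):
    print(name, widths[name])
```

The widths computed for the Integer and String columns line up with the ones reported in the ogrinfo output above (for example 7 for CALL and 10 for TIMESTAMP).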
by , 16 years ago
Attachment: | gdal_svn_trunk_ogr2ogr_adjust_width.patch added |
---|
comment:4 by , 16 years ago
Summary: | ogr2ogr field types wrong, creating .SHP from .CSV/.CSVT → [PATCH] ogr2ogr field types wrong, creating .SHP from .CSV/.CSVT |
---|
comment:5 by , 16 years ago
After a bit of digging into Trac, I've found that the "DBF width 11 integer" problem keeps coming up again and again. Bugs #809, #933, #1112 and #1627 relate to that issue. I've not come up with a perfect solution to fix it, however.
As suggested in one of those bug reports, we could introduce an int64 in the OGRSF model that would be used if the integer field width is >= 11. However, from a previous discussion on IRC, it seems difficult to define a portable int64 type. Another idea would be to add an OFTIntegerAsString type, meaning "a string containing an integer value". Or, maybe less disturbing, just keep the current OFTString and add a new property, let's call it a 'hint', to the field definition. With that, we could support arbitrarily large integers as the DBF format allows, both in write and read mode. Or we could scan the whole DBF file to see if we really need more than 4 bytes to store those 11 chars... Stupid idea...
This apparently silly and simple problem is really challenging !
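The last idea in the comment above, scanning the stored values to see whether they really need more than 32 bits, could be sketched as follows. This is only an illustration in Python, not anything from GDAL; `fits_int32` is a hypothetical name, and the input is assumed to be the raw, space-padded text values of one DBF numeric column.

```python
INT32_MIN, INT32_MAX = -2**31, 2**31 - 1


def fits_int32(raw_values):
    """Return True if every non-blank value is an integer within int32 range.

    Hypothetical helper: raw_values are the padded text cells of one column.
    """
    for raw in raw_values:
        text = raw.strip()
        if not text:
            continue  # blank cell, treated as a null value in this sketch
        try:
            value = int(text)
        except ValueError:
            return False  # not an integer at all (e.g. has a decimal point)
        if not INT32_MIN <= value <= INT32_MAX:
            return False
    return True


print(fits_int32(["          0", " 1193414400"]))
print(fits_int32(["99999999999"]))
```

The cost is the one the comment hints at: the type of a field is only known after a full pass over the file.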
comment:6 by , 14 years ago
Cc: | added |
---|
comment:7 by , 13 years ago
Milestone: | 1.5.4 → 1.7.3 |
---|---|
Resolution: | → fixed |
Status: | assigned → closed |
I have changed OGRShapeLayer to create integer fields with no known width as 10 wide instead of 11 in the dbf file. This prevents them from being immediately treated as real fields. Change applied in trunk (r20805) and 1.7 (r20806). As Even notes, this isn't the greatest fix, but the current situation is painful.
In the longer term RFC 31 should address this issue with the use of 64bit integer types in OGR.
Hal,
I'm getting the same effect. I believe the problem is that the integer fields have no width set, and the shapefile driver is improperly writing them as "width 11" integers so they can hold the largest possible 32-bit integer. These fields are then treated as floating point when read back, since they could hold integers larger than can be represented in a 32-bit integer!
Note - this is entirely a problem with the shapefile driver, not the .csv driver.
No workaround comes to mind.
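The width arithmetic behind this can be checked with a short Python sketch. The `read_back_type` function is a hypothetical stand-in for the reader's decision, assuming a plain width threshold as the comments in this ticket imply (width 11 reads back as Real, width 10 as Integer); it is not the driver's actual code.

```python
INT32_MIN = -2**31      # -2147483648
INT32_MAX = 2**31 - 1   #  2147483647


def chars_needed(value):
    """Number of characters needed to write an integer, sign included."""
    return len(str(value))


def read_back_type(width, decimals=0, max_int_width=10):
    """Hypothetical reader rule: a numeric DBF field is an Integer only if
    it has no decimals and is at most max_int_width characters wide."""
    if decimals > 0 or width > max_int_width:
        return "Real"
    return "Integer"


print(chars_needed(INT32_MAX))   # digits only
print(chars_needed(INT32_MIN))   # digits plus the minus sign
print(read_back_type(11), read_back_type(10))
```

Eleven characters are needed only for the most negative 32-bit values, but an 11-wide field can also hold 99999999999, which no longer fits in 32 bits. A 10-wide field can still hold values above 2147483647, which is presumably why the width-10 change is described in this ticket as not the greatest fix.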