Opened 16 years ago

Closed 13 years ago

#2152 closed defect (fixed)

[PATCH] ogr2ogr field types wrong, creating .SHP from .CSV/.CSVT

Reported by: halmueller
Owned by: warmerdam
Priority: normal
Milestone: 1.7.3
Component: OGR_SF
Version: 1.5.0
Severity: normal
Keywords: shapefile
Cc: eadam@…

Description

I'm getting weird results when trying to use CSVT files to specify field types, importing a CSV file and creating a shapefile.

I have it down to a 2-line CSV, a 1-line CSVT, and an OGR VRT file.

Note that, upon conversion, all of the fields requested as Integer come up as Real, except for the final one, which comes up as String.

This is with GDAL 1.5.0 running on RHEL4, compiled from source. I also saw the problem on 1.4.2.

ships3.csv:
"FID","LAT", "LON", "CALL", "NAME", "TIMESTAMP", "TIMESTRING", "AGEHOURS", "TALLSHIP", "CRUISE", "RESEARCH", "YOTREP", "BUOY"
0,58.500000,-140.200000,WDD6494,"Polar Viking",1193414400,"2007-Oct-26 0900",1854,0,0,0,0,0


ships3.csvt:
"Integer","Real","Real","String","String","Integer","String","Integer","Integer","Integer","Integer","Integer","Integer"

ships3OGRVRT.xml:
<OGRVRTDataSource>

    <OGRVRTLayer name="ships2">
        <SrcDataSource>/home/ldm/csv-test//ships3.csv</SrcDataSource>
        <SrcLayer>ships3</SrcLayer>
        <GeometryType>wkbPoint</GeometryType>
        <LayerSRS>WGS84</LayerSRS>
        <GeometryField encoding="PointFromColumns" x="LON" y="LAT"/>
    </OGRVRTLayer>

</OGRVRTDataSource>

command:  ogr2ogr shipsOut.shp ships3OGRVRT.xml

ogrinfo -al shipsOut.shp
INFO: Open of `shipsOut.shp'
      using driver `ESRI Shapefile' successful.

Layer name: shipsOut
Geometry: Point
Feature Count: 1
Extent: (-140.200000, 58.500000) - (-140.200000, 58.500000)
Layer SRS WKT:
GEOGCS["GCS_WGS_1984",
    DATUM["WGS_1984",
        SPHEROID["WGS_1984",6378137,298.257223563]],
    PRIMEM["Greenwich",0],
    UNIT["Degree",0.017453292519943295]]
FID: Real (11.0)
LAT: Real (24.15)
LON: Real (24.15)
CALL: String (80.0)
NAME: String (80.0)
TIMESTAMP: Real (11.0)
TIMESTRING: String (80.0)
AGEHOURS: Real (11.0)
TALLSHIP: Real (11.0)
CRUISE: Real (11.0)
RESEARCH: Real (11.0)
YOTREP: Real (11.0)
BUOY: String (80.0)
OGRFeature(shipsOut):0
  FID (Real) =           0
  LAT (Real) =       58.500000000000000
  LON (Real) =     -140.199999999999989
  CALL (String) = WDD6494
  NAME (String) = Polar Viking
  TIMESTAMP (Real) =  1193414400
  TIMESTRING (String) = 2007-Oct-26 0900
  AGEHOURS (Real) =        1854
  TALLSHIP (Real) =           0
  CRUISE (Real) =           0
  RESEARCH (Real) =           0
  YOTREP (Real) =           0
  BUOY (String) = 0
  POINT (-140.199999999999989 58.5)

Attachments (1)

gdal_svn_trunk_ogr2ogr_adjust_width.patch (9.9 KB) - added by Even Rouault 16 years ago.

Change History (8)

comment:1 by warmerdam, 16 years ago

Keywords: shapefile added
Milestone: 1.5.1
Status: new → assigned

Hal,

I'm getting the same effect. I believe the problem is that the integer fields have no width set, and the shapefile driver is improperly writing them as "width 11" integers so they can hold the largest possible 32-bit integer. These fields are then treated as floating point when read back, since an 11-character field could hold integers larger than can be represented in a 32-bit integer!

Note - this is entirely a problem with the shapefile driver, not the .csv driver.

No workaround comes to mind.
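
The read-back behaviour described in this comment can be modelled roughly as follows. This is only an illustrative Python sketch, not the shapefile driver's code: the function name `ogr_type_for_dbf_numeric` is hypothetical, and the "width 11 or more means Real" threshold is an assumption taken from the description in this ticket.

```python
# Largest and smallest values a signed 32-bit integer can hold.
INT32_MAX = 2**31 - 1   # 2147483647  (10 characters)
INT32_MIN = -2**31      # -2147483648 (11 characters, including the sign)


def ogr_type_for_dbf_numeric(width: int, decimals: int) -> str:
    """Simplified, hypothetical model of how a numeric DBF column is typed on read.

    Assumption (from the ticket discussion): a numeric column with no decimals
    is reported as Integer only if it is narrower than 11 characters, because
    an 11-character column could hold a value that overflows 32 bits.
    """
    if decimals > 0:
        return "Real"
    return "Integer" if width < 11 else "Real"


# Width 11 is exactly what is needed to print the most negative 32-bit value,
# which is why a "safe" integer column ends up 11 wide on write.
widest_int32_text = len(str(INT32_MIN))
```

Under this model, an integer column written 11 wide comes back as Real (as in the `FID: Real (11.0)` line of the report), while the same column written 10 wide would come back as Integer.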

comment:2 by halmueller, 16 years ago

That makes sense.

I also noticed, though, that the result depends on the order of the columns. That is,

"Integer","Integer","String"

gives different results than

"Integer","String","Integer"

(in my case the workaround is just to notice it, and adapt appropriately in my use of the Shapefiles)

comment:3 by Even Rouault, 16 years ago

I propose a patch that adds a new option to ogr2ogr: "-adjust_width". This option prescans all the features to compute the minimum width needed for each column of type Integer or String whose declared width is 0.

On the above example, 'ogr2ogr -adjust_width shipsOut.shp ships3OGRVRT.xml' gives:

INFO: Open of `shipsOut.shp'
      using driver `ESRI Shapefile' successful.

Layer name: shipsOut
Geometry: Point
Feature Count: 1
Extent: (-140.200000, 58.500000) - (-140.200000, 58.500000)
Layer SRS WKT:
GEOGCS["GCS_WGS_1984",
    DATUM["WGS_1984",
        SPHEROID["WGS_1984",6378137,298.257223563]],
    PRIMEM["Greenwich",0],
    UNIT["Degree",0.017453292519943295]]
FID: Integer (1.0)
LAT: Real (24.15)
LON: Real (24.15)
CALL: String (7.0)
NAME: String (12.0)
TIMESTAMP: Integer (10.0)
TIMESTRING: String (16.0)
AGEHOURS: Integer (4.0)
TALLSHIP: Integer (1.0)
CRUISE: Integer (1.0)
RESEARCH: Integer (1.0)
YOTREP: Integer (1.0)
BUOY: Integer (1.0)
OGRFeature(shipsOut):0
  FID (Integer) = 0
  LAT (Real) =       58.500000000000000
  LON (Real) =     -140.199999999999989
  CALL (String) = WDD6494
  NAME (String) = Polar Viking
  TIMESTAMP (Integer) = 1193414400
  TIMESTRING (String) = 2007-Oct-26 0900
  AGEHOURS (Integer) = 1854
  TALLSHIP (Integer) = 0
  CRUISE (Integer) = 0
  RESEARCH (Integer) = 0
  YOTREP (Integer) = 0
  BUOY (Integer) = 0
  POINT (-140.199999999999989 58.5)
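
The prescan idea behind this option can be illustrated with a short Python sketch. It is not the attached patch: the function name `minimum_widths` is hypothetical, and the sketch simply assumes that the minimum width of an Integer or String column is the length of its longest value as text. The sample data is the CSV and CSVT from the ticket description.

```python
import csv
import io

SAMPLE_CSV = '''"FID","LAT", "LON", "CALL", "NAME", "TIMESTAMP", "TIMESTRING", "AGEHOURS", "TALLSHIP", "CRUISE", "RESEARCH", "YOTREP", "BUOY"
0,58.500000,-140.200000,WDD6494,"Polar Viking",1193414400,"2007-Oct-26 0900",1854,0,0,0,0,0
'''

SAMPLE_CSVT = '"Integer","Real","Real","String","String","Integer","String","Integer","Integer","Integer","Integer","Integer","Integer"'


def minimum_widths(csv_text, csvt_text):
    """Return the longest value length for each Integer or String column."""
    # Column types come from the one-line .csvt file.
    types = next(csv.reader(io.StringIO(csvt_text)))
    # skipinitialspace handles the spaces after the commas in the header row.
    reader = csv.reader(io.StringIO(csv_text), skipinitialspace=True)
    header = next(reader)
    # Only Integer and String columns are adjusted; Real columns are left alone.
    widths = {
        name: 0
        for name, field_type in zip(header, types)
        if field_type in ("Integer", "String")
    }
    # Prescan every feature and keep the longest text seen per column.
    for row in reader:
        for name, value in zip(header, row):
            if name in widths:
                widths[name] = max(widths[name], len(value))
    return widths


widths = minimum_widths(SAMPLE_CSV, SAMPLE_CSVT)
```

For the sample row, the computed widths for the Integer and String columns match those in the ogrinfo output above (for example 10 for TIMESTAMP and 12 for NAME).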

comment:4 by Even Rouault, 16 years ago

Summary: ogr2ogr field types wrong, creating .SHP from .CSV/.CSVT → [PATCH] ogr2ogr field types wrong, creating .SHP from .CSV/.CSVT

comment:5 by Even Rouault, 16 years ago

After a bit of digging into Trac, I've found that the "DBF width 11 integer" problem keeps coming up again and again. Bugs #809, #933, #1112 and #1627 relate to that issue. I haven't come up with a perfect solution to fix it, however.

As suggested in one of those bug reports, we could introduce an int64 type in the OGRSF model that would be used if the integer field width is >= 11. However, from a previous discussion on IRC, it seems difficult to define a portable int64 type. Another idea would be to add an OFTIntegerAsString type, meaning "a string containing an integer value". Or, maybe less disturbing, just keep the current OFTString and add a new property, let's call it a 'hint', to the field definition. With that, we could support arbitrarily large integers, as the DBF format allows, in both write and read mode. Or we could scan the whole DBF file to see if we really need more than 4 bytes to store those 11 chars... Stupid idea...

This apparently silly and simple problem is really challenging !
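
For illustration only, the "scan the whole file" idea mentioned in this comment might look like the Python sketch below. The function name `column_fits_int32` is hypothetical, the input is assumed to be the raw text of each cell in one numeric DBF column, and blank cells are assumed to mean "no value".

```python
INT32_MIN = -2**31
INT32_MAX = 2**31 - 1


def column_fits_int32(raw_values):
    """Return True if every non-blank cell parses to a value within 32-bit range."""
    for raw in raw_values:
        text = raw.strip()
        if not text:
            continue  # assumption: a blank cell is a null and imposes no constraint
        if not INT32_MIN <= int(text) <= INT32_MAX:
            return False
    return True
```

The cost is a full pass over the data before the field type is known, which is the drawback the comment alludes to.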

comment:6 by EliL, 14 years ago

Cc: eadam@… added

comment:7 by warmerdam, 13 years ago

Milestone: 1.5.4 → 1.7.3
Resolution: fixed
Status: assigned → closed

I have changed OGRShapeLayer to create integer fields with no known width as 10 wide instead of 11 in the dbf file. This prevents them from being immediately treated as real fields. Change applied in trunk (r20805) and 1.7 (r20806). As Even notes, this isn't the greatest fix, but the current situation is painful.

In the longer term RFC 31 should address this issue with the use of 64bit integer types in OGR.
