Opened 7 years ago

Last modified 6 years ago

#3407 new defect

v.in.ogr fails to import ESRI Shapefile with empty attribute table cells

Reported by: mikatt Owned by: grass-dev@…
Priority: normal Milestone: 7.2.4
Component: Default Version: 7.2.1
Keywords: Cc:
CPU: x86-64 Platform: Linux

Description

GRASS 7.2.1 (on Linux Mint 18) produces an error when trying to read in ESRI shapefiles that have empty cells. This used to work fine until recently, and I have no problems with GRASS 7.2.0 on Windows 7.

Here is a minimal example including messages:

Create a point vector from csv (note the empty 'val' cell in the last line):

v.in.ascii --overwrite input=points.csv output=points_imp_g separator=comma skip=1 columns=num integer,X double precision,Y double precision,val double precision x=2 y=3
                                                    
v.db.select map=points_imp_g                                 
cat|num|X|Y|val
1|1|2487491.643|5112118.33|1.5
2|2|2481985.459|5109162.78|2.3
3|3|2478284.289|5105331.04|

Export to ESRI shape (works fine):

v.out.ogr input=points_imp_g output=points_exp_g format=ESRI_Shapefile output_layer=points_exp_g

Exporting 3 features...
v.out.ogr complete. 3 features (Point type) written to <points_exp_g> (ESRI_Shapefile format).

Import the created shape fails:

v.import input=points_exp_g/points_exp_g.shp layer=points_exp_g output=points_imp_g2

WARNING: All available OGR layers will be imported into vector map <points_exp_g>
Check if OGR layer <points_exp_g> contains polygons...
Column name <cat> renamed to <cat_>
Importing 3 features (OGR layer <points_exp_g>)...
DBMI-SQLite driver error:
Error in sqlite3_prepare():
near ")": syntax error
DBMI-SQLite driver error:
Error in sqlite3_prepare():
near ")": syntax error
ERROR: Cannot insert new row: insert into points_imp_g2 values ( 3, 3, 3, 2478284.288999999873340, 5105331.040000000037253,  )
ERROR: Unable to import <points_exp_g/points_exp_g.shp>

Have there been substantial changes to v.in.ogr or could this be a RGDAL/OGR issue?

Change History (10)

in reply to:  description ; comment:1 by mmetz, 7 years ago

Replying to mikatt:

GRASS 7.2.1 (on Linux Mint 18) produces an error when trying to read in ESRI shapefiles that have empty cells. This used to work fine until recently, and I have no problems with GRASS 7.2.0 on Windows 7.

What are the GDAL/OGR versions?

Here is a minimal example including messages:

[...]

Thanks for the example, I could reproduce the error.

Have there been substantial changes to v.in.ogr or could this be a RGDAL/OGR issue?

This is a GDAL/OGR issue. As of GDAL 2.2, OGR_F_IsFieldSet() apparently returns true even if the field is empty, and OGR_F_GetFieldAsString() returns a zero-length string (not NULL) for an empty field.

v.in.ogr has been fixed in r71431,2 (trunk, relbr72) to account for the new behaviour of GDAL 2.2, supporting older versions of GDAL.

in reply to:  1 ; comment:2 by rouault, 7 years ago

This is a GDAL/OGR issue.

Actually it is a feature / per design. See https://svn.osgeo.org/gdal/branches/2.2/gdal/MIGRATION_GUIDE.TXT about RFC 67 : https://trac.osgeo.org/gdal/wiki/rfc67_nullfieldvalues

v.in.ogr has been fixed in r71431,2 (trunk, relbr72) to account for the new behaviour of GDAL 2.2, supporting older versions of GDAL.

The fix is not ideal as it considers an empty string as NULL. I'd suggest instead using OGR_F_IsFieldSetAndNotNull() when GDAL_VERSION_NUM >= 2020000 instead of OGR_F_IsFieldSet()

Last edited 7 years ago by rouault (previous) (diff)

in reply to:  2 comment:3 by mmetz, 7 years ago

Replying to rouault:

This is a GDAL/OGR issue.

Actually it is a feature / per design. See https://svn.osgeo.org/gdal/branches/2.2/gdal/MIGRATION_GUIDE.TXT about RFC 67 : https://trac.osgeo.org/gdal/wiki/rfc67_nullfieldvalues

v.in.ogr has been fixed in r71431,2 (trunk, relbr72) to account for the new behaviour of GDAL 2.2, supporting older versions of GDAL.

The fix is not ideal as it considers an empty string as NULL.

v.in.ogr translates attributes from OGR to the GRASS default database driver, and (at least for sqlite), empty strings are not allowed, but need to be replaced with something else. Therefore I decided to set empty strings to "NULL". Testing the imported vector with v.db.select yields the expected result, i.e. an empty field as in the original csv file.

Do you have another suggestion about how to replace an empty string?

comment:4 by rouault, 7 years ago

I'm just speaking about the OGR side of things. With GDAL 2.2, you have 3 potential states for a field:

  • unset -> OGR_F_IsFieldSet() == FALSE. OGR_F_GetFieldAsString() will return an empty string (it is not allowed to return NULL)
  • set, but NULL -> OGR_F_IsFieldSet() == TRUE and OGR_F_IsFieldNull() == TRUE. OGR_F_GetFieldAsString() will also eturn an empty string
  • set, and not NULL --> OGR_F_IsFieldSet() == TRUE and OGR_F_IsFieldNull() == FALSE. An empty string is considered as a not-null set field, so a potential return of OGR_F_GetFieldAsString()

Depending on drivers, they will decide whether to dispatch a field as unset, set or set with empty string. The shapefile driver as of GDAL 2.2 will set all fields, either to NULL, or to a not-NULL set value. Note: for strings, there's no way in .dbf to distinguish NULL from empty string, so an empty string is mapped to a NULL field (in previous GDAL versions, this was mapped to a unset field) The CSV driver has traditionnally mapped an empty field as a an empty string, and this hasn't changed in GDAL 2.2

Hope that clarifies things. Ah nullness and emptyness....

So perhaps given GRASS constraints and concepts, testing the return of OGR_F_GetFieldAsString() against the empty string is good enough if you don't need to distinguish between the 3 above states.

in reply to:  4 ; comment:5 by mmetz, 7 years ago

Replying to rouault:

I'm just speaking about the OGR side of things. With GDAL 2.2, you have 3 potential states for a field:

  • unset -> OGR_F_IsFieldSet() == FALSE. OGR_F_GetFieldAsString() will return an empty string (it is not allowed to return NULL)
  • set, but NULL -> OGR_F_IsFieldSet() == TRUE and OGR_F_IsFieldNull() == TRUE. OGR_F_GetFieldAsString() will also eturn an empty string
  • set, and not NULL --> OGR_F_IsFieldSet() == TRUE and OGR_F_IsFieldNull() == FALSE. An empty string is considered as a not-null set field, so a potential return of OGR_F_GetFieldAsString()

Depending on drivers, they will decide whether to dispatch a field as unset, set or set with empty string. The shapefile driver as of GDAL 2.2 will set all fields, either to NULL, or to a not-NULL set value. Note: for strings, there's no way in .dbf to distinguish NULL from empty string, so an empty string is mapped to a NULL field (in previous GDAL versions, this was mapped to a unset field) The CSV driver has traditionnally mapped an empty field as a an empty string, and this hasn't changed in GDAL 2.2

Hope that clarifies things. Ah nullness and emptyness....

So perhaps given GRASS constraints and concepts,

Constraints and concepts of dbf/sqlite/postgresql/mysql, to be precise.

As you explained, the OGR dbf driver, which is also used internally in GRASS for dbf databases, does not distinguish between unset and NULL, thus import with dbf as GRASS database should work as before.

Regarding import into GRASS, there was no need to distinguish between the 3 possible states, at least there are no known problems.

testing the return of OGR_F_GetFieldAsString() against the empty string is good enough if you don't need to distinguish between the 3 above states.

Regarding export from GRASS to OGR, what happens if a field is not explicitly set or unset? Is it now unset or set, but NULL? Do we need to adjust v.out.ogr using OGR_F_SetFieldNull() and/or OGR_F_UnsetField() instead of not touching the field at all?

in reply to:  5 ; comment:6 by rouault, 7 years ago

Regarding export from GRASS to OGR, what happens if a field is not explicitly set or unset? Is it now unset or set, but NULL? Do we need to adjust v.out.ogr using OGR_F_SetFieldNull() and/or OGR_F_UnsetField() instead of not touching the field at all?

By default and by definition, a field that is not set ... is unset :-) This is no different in GDAL 2.2 than in previous versions. So depending on drivers, they might react differently on writing if a field is unset or if it set to an empty string. For CSV and Shapefile, this will result in the same file. For GeoJSON, a unset field doesn't appear at all in the result, whereas an empty string is written as '"field" : "" '

With GDAL 2.2, you can also set a field to NULL. The behaviour will depend on drivers. For CSV and Shapefile, NULL will be dealt as unset or empty string. For GeoJSON, it will be written as ' "field": null '.

in reply to:  6 comment:7 by mmetz, 7 years ago

Replying to rouault:

Regarding export from GRASS to OGR, what happens if a field is not explicitly set or unset? Is it now unset or set, but NULL? Do we need to adjust v.out.ogr using OGR_F_SetFieldNull() and/or OGR_F_UnsetField() instead of not touching the field at all?

By default and by definition, a field that is not set ... is unset :-) This is no different in GDAL 2.2 than in previous versions. So depending on drivers, they might react differently on writing if a field is unset or if it set to an empty string. For CSV and Shapefile, this will result in the same file. For GeoJSON, a unset field doesn't appear at all in the result, whereas an empty string is written as '"field" : "" '

Ok, a field that is not set is unset or set, but NULL, depending on the driver ;-)

With GDAL 2.2, you can also set a field to NULL. The behaviour will depend on drivers. For CSV and Shapefile, NULL will be dealt as unset or empty string. For GeoJSON, it will be written as ' "field": null '.

From https://svn.osgeo.org/gdal/branches/2.2/gdal/MIGRATION_GUIDE.TXT:

"before, a unset field was written as field_name: null. Starting with GDAL 2.2, only fields explicitly set as null with OGR_F_SetFieldNull() will be written with a null value."

In order to preserve backwards compatibility for other software packages, GRASS will need to use OGR_F_SetFieldNull() starting with GDAL 2.2 if data export should not change.

comment:8 by rouault, 7 years ago

In order to preserve backwards compatibility for other software packages, GRASS will need to use OGR_F_SetFieldNull() starting with GDAL 2.2 if data export should not change.

Yes that only affects GeoJSON based drivers and GML, that are the only ones that can translate differently unset and NULL in the format. Other drivers will translate in the same way a unset field or a null field.

in reply to:  8 comment:9 by mmetz, 7 years ago

Replying to rouault:

In order to preserve backwards compatibility for other software packages, GRASS will need to use OGR_F_SetFieldNull() starting with GDAL 2.2 if data export should not change.

Yes that only affects GeoJSON based drivers and GML, that are the only ones that can translate differently unset and NULL in the format. Other drivers will translate in the same way a unset field or a null field.

OK, export from GRASS to OGR (v.out.ogr) has been adjusted in r71433,4 (trunk, relbr72) such that exported data are the same independent of the GDAL version, important for compatibility with other software not (yet) using GDAL 2.2.

comment:10 by martinl, 6 years ago

Milestone: 7.2.4
Note: See TracTickets for help on using tickets.