
Version 1 (modified by Even Rouault, 7 years ago)

Preliminary version of RFC 67

RFC 67 : Null values in OGR

Author: Even Rouault

Contact: even.rouault at spatialys.com

Status: Development

Target version: 2.2

Summary

This RFC implements the concept of a null value for a field of a feature, in addition to the existing unset status.

Rationale

Currently, OGR supports a single concept to indicate that a field value is missing: the unset field.

So, given a JSON feature collection with 2 features whose properties are { "foo": "bar" } and { "foo": "bar", "other_field": null }, OGR currently reports other_field as unset in both cases.

What is proposed here is that in the first case, where the "other_field" key is totally absent, we keep using the current unset field concept. For the second case, where the key is present with a null value, we add a new concept of null field.
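As a rough illustration of the two cases, the following Python sketch uses only the standard json module, not OGR itself. The UNSET sentinel and the read_property helper are hypothetical names introduced here for illustration; None plays the role of the proposed null state.

```python
import json

# Hypothetical sentinel standing in for the "unset" state.
# Python's None stands in for the proposed "null" state.
UNSET = object()


def read_property(properties, name):
    """Return UNSET when the key is absent, None for an explicit JSON null,
    and the decoded value otherwise."""
    return properties.get(name, UNSET)


COLLECTION = json.loads("""
{
  "type": "FeatureCollection",
  "features": [
    {"type": "Feature", "geometry": null,
     "properties": {"foo": "bar"}},
    {"type": "Feature", "geometry": null,
     "properties": {"foo": "bar", "other_field": null}}
  ]
}
""")

# One state per feature: the first is UNSET (key absent),
# the second is None (key present with an explicit null).
states = [read_property(f["properties"], "other_field")
          for f in COLLECTION["features"]]
```

The point of the sentinel is that a plain dictionary lookup with a None default would collapse both cases into one, which is the behaviour the RFC sets out to change.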

This distinction between the two concepts applies to all GeoJSON-based formats and protocols: GeoJSON, ElasticSearch, MongoDB, CouchDB and Cloudant.

This also applies to GML, where a missing element would be mapped to an unset field and an element with an xsi:nil="true" attribute would be mapped to a null field.
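The GML mapping can be sketched in the same spirit with Python's standard xml.etree.ElementTree module. The document below is a deliberately simplified XML fragment, not real GML, and UNSET and read_element are hypothetical names that do not reflect how the GML driver is implemented.

```python
import xml.etree.ElementTree as ET

# Qualified name of the xsi:nil attribute as ElementTree exposes it.
XSI_NIL = "{http://www.w3.org/2001/XMLSchema-instance}nil"

# Hypothetical sentinel for the "unset" state; None stands in for "null".
UNSET = object()


def read_element(feature_elem, name):
    """Map a missing child to UNSET, an xsi:nil="true" child to None,
    and any other child to its text content."""
    child = feature_elem.find(name)
    if child is None:
        return UNSET
    if child.get(XSI_NIL) == "true":
        return None
    return child.text


DOC = ET.fromstring(
    '<features xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">'
    '<feature><foo>bar</foo></feature>'
    '<feature><foo>bar</foo><other_field xsi:nil="true"/></feature>'
    '</features>'
)

gml_states = [read_element(f, "other_field") for f in DOC.findall("feature")]
```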

Changes

OGRField

The Set structure in the "raw field" union is modified to add a third marker:

struct {
    int nMarker1;
    int nMarker2;
    int nMarker3;
} Set;

This is not strictly related to this work, but the third marker decreases the likelihood of a genuine value being misinterpreted as unset / null. This does not increase the size of the structure, which is already at least 12 bytes large.

The current special value OGRUnsetMarker = -21121 will be set in all 3 markers for an unset field (it is currently set in the first 2 markers).

Similarly, for the new null state, the new value OGRNullMarker = -21122 will be set in the 3 markers.
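To make the marker scheme concrete, here is a small model of it written with Python's standard ctypes module. It is only a sketch: RawField is a simplified stand-in for the real OGRField union (just two value members plus Set), the helper names are hypothetical, and the two-marker check is an assumption about what a test over the current layout would look like. Only the marker values and member names come from the text above.

```python
import ctypes

# Marker values as given in the text above.
OGR_UNSET_MARKER = -21121
OGR_NULL_MARKER = -21122


class MarkerSet(ctypes.Structure):
    _fields_ = [("nMarker1", ctypes.c_int),
                ("nMarker2", ctypes.c_int),
                ("nMarker3", ctypes.c_int)]


class RawField(ctypes.Union):
    # Simplified stand-in for the raw field union (assumption: the real
    # union has more members than the two value members shown here).
    _fields_ = [("Integer", ctypes.c_int),
                ("Real", ctypes.c_double),
                ("Set", MarkerSet)]


def _set_markers(field, marker):
    field.Set.nMarker1 = marker
    field.Set.nMarker2 = marker
    field.Set.nMarker3 = marker


def _has_markers(field, marker):
    s = field.Set
    return s.nMarker1 == marker and s.nMarker2 == marker and s.nMarker3 == marker


def set_unset(field):
    _set_markers(field, OGR_UNSET_MARKER)


def set_null(field):
    _set_markers(field, OGR_NULL_MARKER)


def is_unset(field):
    return _has_markers(field, OGR_UNSET_MARKER)


def is_null(field):
    return _has_markers(field, OGR_NULL_MARKER)


def is_unset_two_markers(field):
    # Assumed shape of a check that only looks at the first two markers.
    s = field.Set
    return s.nMarker1 == OGR_UNSET_MARKER and s.nMarker2 == OGR_UNSET_MARKER
```

Because the members of a union share storage, a genuine value whose bytes happen to match the first two markers would pass the two-marker check; requiring the third marker as well makes such a collision less likely.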

OGRFeature

The methods int IsFieldNull( int nFieldIdx ) and void SetNullField ( int nFieldIdx ) are added.

The GetFieldXXXX() accessors are modified to handle the null case in the same way as when they are called on an unset field: they return 0 for numeric field types, an empty string for string fields, FALSE for date time fields and NULL for list-based types.
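The fallback behaviour of the accessors can be modelled as follows. This is a plain Python sketch, not the OGR bindings: State, SketchField and the field-kind strings are hypothetical, and Python's False and None stand in for the C-level FALSE and NULL.

```python
from enum import Enum


class State(Enum):
    UNSET = "unset"
    NULL = "null"
    SET = "set"


# Fallback per field family, following the description above.
FALLBACKS = {
    "integer": 0,
    "real": 0.0,
    "string": "",
    "datetime": False,   # stands in for the FALSE return value
    "integerlist": None,  # stands in for a NULL list pointer
}


class SketchField:
    """Hypothetical model of one field of a feature."""

    def __init__(self, kind, state=State.UNSET, value=None):
        self.kind = kind
        self.state = state
        self.value = value

    def is_unset(self):
        return self.state is State.UNSET

    def is_null(self):
        return self.state is State.NULL

    def get(self):
        # Unset and null fields both yield the same type-specific fallback.
        if self.state is State.SET:
            return self.value
        return FALLBACKS[self.kind]


zero = SketchField("integer", State.SET, 0)
null = SketchField("integer", State.NULL)
unset = SketchField("integer")
# All three accessors return 0, so only the state tests tell them apart.
values = [zero.get(), null.get(), unset.get()]
```

The sketch highlights why the accessors alone cannot distinguish a genuine 0 from a null or unset field, and why the dedicated state tests are needed.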

C API

The following functions will be added:

int    CPL_DLL OGR_F_IsFieldNull( OGRFeatureH, int );
void   CPL_DLL OGR_F_SetNullField( OGRFeatureH, int );

Lower-level functions will be added to manipulate the raw field union directly (mostly for use in core and a few drivers), instead of testing / setting the markers by hand:

int    CPL_DLL OGR_RawField_IsUnset( OGRField* );
int    CPL_DLL OGR_RawField_IsNull( OGRField* );
void   CPL_DLL OGR_RawField_SetUnset( OGRField* );
void   CPL_DLL OGR_RawField_SetNull( OGRField* );

SWIG bindings (Python / Java / C# / Perl) changes

The new methods will be mapped to SWIG.

Drivers

The following drivers will be modified to treat the unset and null states as distinct: GeoJSON, ElasticSearch, MongoDB, CouchDB, Cloudant, GML, GMLAS, WFS.

All drivers that, in their writing part, test whether the source feature has an unset field will also test whether the field is null.

For SQL-based drivers (PG, PGDump, Carto, MySQL, OCI, SQLite, GPKG), on reading, a SQL NULL value will be mapped to the new null state. On writing, an unset field will not be mentioned in the corresponding INSERT or UPDATE statement, whereas a null field will be mentioned and set to NULL. On insertion, there will generally be no difference in behaviour, unless a default value is defined on the field, in which case the database engine will use it to set the value in the unset case. On update, an unset field will not have its content updated by the database, whereas a field set to NULL will be updated to NULL.
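The effect on INSERT and UPDATE statements can be tried out with Python's standard sqlite3 module. The sketch below is not taken from any OGR driver: UNSET, build_insert, build_update, the table layout and the 'fallback' default are all hypothetical, and identifiers are quoted naively for brevity.

```python
import sqlite3

# Hypothetical sentinel for the "unset" state; None stands in for "null".
UNSET = object()


def build_insert(table, fields):
    """Build an INSERT that mentions only the fields that are not unset."""
    cols = [name for name, value in fields.items() if value is not UNSET]
    if not cols:
        return f'INSERT INTO "{table}" DEFAULT VALUES', []
    col_list = ", ".join(f'"{c}"' for c in cols)
    marks = ", ".join("?" for _ in cols)
    sql = f'INSERT INTO "{table}" ({col_list}) VALUES ({marks})'
    return sql, [fields[c] for c in cols]


def build_update(table, fid, fields):
    """Build an UPDATE that leaves unset fields untouched."""
    cols = [name for name, value in fields.items() if value is not UNSET]
    if not cols:
        raise ValueError("no field to update")
    assignments = ", ".join(f'"{c}" = ?' for c in cols)
    sql = f'UPDATE "{table}" SET {assignments} WHERE fid = ?'
    return sql, [fields[c] for c in cols] + [fid]


def demo():
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE t (fid INTEGER PRIMARY KEY, foo TEXT, "
        "other_field TEXT DEFAULT 'fallback')"
    )
    # Insertion: unset lets the column default apply, null stores NULL.
    conn.execute(*build_insert("t", {"foo": "bar", "other_field": UNSET}))
    conn.execute(*build_insert("t", {"foo": "bar", "other_field": None}))
    inserted = conn.execute(
        "SELECT other_field FROM t ORDER BY fid").fetchall()

    # Update with other_field unset: its stored content is left as is.
    conn.execute(*build_update("t", 1, {"foo": "baz", "other_field": UNSET}))
    kept = conn.execute(
        "SELECT foo, other_field FROM t WHERE fid = 1").fetchone()

    # Update with other_field null: the column is overwritten with NULL.
    conn.execute(*build_update("t", 1, {"foo": UNSET, "other_field": None}))
    nulled = conn.execute(
        "SELECT foo, other_field FROM t WHERE fid = 1").fetchone()
    conn.close()
    return inserted, kept, nulled
```

Calling demo() returns the rows observed after each step, which makes it easy to compare the unset and null cases side by side.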

Utilities

No change planned at this stage.

Documentation

All new methods/functions are documented.

Test Suite

Core changes and updated drivers will be tested.

Compatibility Issues

All code, in the GDAL source tree and in external calling code, that currently uses OGRFeature::IsFieldSet() / OGR_F_IsFieldSet() should be updated to also use IsFieldNull() / OGR_F_IsFieldNull(), either to act exactly as in the unset case or to add a new appropriate behaviour. Otherwise, existing code will see, for a null field, 0 for numeric field types, an empty string for string fields, FALSE for date time fields and NULL for list-based types.
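The kind of update this implies for calling code can be sketched as follows. StubFeature and its snake_case methods are hypothetical stand-ins written for this example, not the OGR API; the stub assumes that a null field is not reported as unset, since the two states are distinct.

```python
class StubFeature:
    """Hypothetical stand-in for a feature with integer fields."""

    def __init__(self, values, nulls=()):
        self._values = dict(values)
        self._nulls = set(nulls)

    def is_set(self, name):
        # Assumption: null is a distinct state, so a null field is not unset.
        return name in self._values or name in self._nulls

    def is_null(self, name):
        return name in self._nulls

    def get_as_integer(self, name):
        # Null and unset fields both fall back to 0.
        return int(self._values.get(name, 0))


def export_legacy(feature, names):
    """Pattern that only checks the unset state: null fields leak out as 0."""
    return {n: feature.get_as_integer(n) for n in names if feature.is_set(n)}


def export_updated(feature, names):
    """Pattern that also checks the null state and reports it explicitly."""
    out = {}
    for n in names:
        if not feature.is_set(n):
            continue  # unset: omit the key entirely
        out[n] = None if feature.is_null(n) else feature.get_as_integer(n)
    return out


feature = StubFeature({"a": 5}, nulls=["b"])
legacy = export_legacy(feature, ["a", "b", "c"])
updated = export_updated(feature, ["a", "b", "c"])
```

Here the updated exporter adds a new behaviour for null fields. Code that prefers to treat null exactly like unset would instead skip the field when either test matches.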

Related ticket

None

Implementation

The implementation will be done by Even Rouault (Spatialys) and be sponsored by Safe Software.

Voting history

TBD
