RFC 31: OGR 64bit Integer Fields and FIDs

Authors: Frank Warmerdam, Even Rouault
Contact: warmerdam@…, even dot rouault at spatialys.com
Status: Adopted, implemented in GDAL 2.0

Summary

This RFC addresses steps to upgrade OGR to support 64bit integer fields and feature ids. Many feature data formats support wide integers, and the inability to transform these through OGR causes increasing numbers of problems.

64bit FID, feature index and feature count

Feature ids will be handled as type "GIntBig" instead of "long" internally. This includes the nFID field of OGRFeature. The existing GetFID() and SetFID() methods of OGRFeature use type long; they are changed to return and accept GIntBig, respectively. The change of return type for GetFID() requires application code to be adapted carefully to avoid potential issues (for example, if the result of GetFID() is passed to a printf-like function). The SetFID() change should be mostly transparent. The changes in the OGRFeature class are:

  GIntBig  GetFID();
  OGRErr   SetFID(GIntBig nFID );

At the C API level:

  GIntBig CPL_DLL OGR_F_GetFID( OGRFeatureH );
  OGRErr CPL_DLL OGR_F_SetFID( OGRFeatureH, GIntBig );

Note that the old interfaces using "long" are already 64bit on 64bit operating systems (excluding Windows target compilers where long is 32bit even on 64bit builds), so there is little harm to applications continuing to use these interfaces on 64bit operating systems.
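
The printf hazard mentioned above comes from passing a 64bit value where the format string expects a narrower type, which is undefined behaviour on platforms where long is 32 bits. The following standalone sketch formats a FID with an explicit long long cast and "%lld"; within GDAL, the CPL_FRMT_GIB macro referred to under Compatibility Issues serves this purpose. The GIntBig typedef here is only a stand-in for the GDAL type, and FormatFID is a hypothetical helper name, not part of the OGR API.

```cpp
#include <cstdio>
#include <string>

// Stand-in for GDAL's GIntBig (assumed to be a 64bit signed integer).
typedef long long GIntBig;

// Hypothetical helper: format a 64bit FID portably.
std::string FormatFID(GIntBig nFID)
{
    char szBuf[32];
    // "%lld" matches long long on every conforming platform, so the
    // argument is cast explicitly to that type.
    std::snprintf(szBuf, sizeof(szBuf), "%lld", static_cast<long long>(nFID));
    return std::string(szBuf);
}
```

For instance, FormatFID(5000000000LL) yields the string "5000000000", a value that does not fit in a 32bit long.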

A layer that can cheaply discover that it holds features with 64bit FIDs should set the OLMD_FID64 metadata item to "YES", so that ogr2ogr can pass the FID64 creation option to drivers that support it.

The OGRLayer class allows several operations based on the FID. The signature of these will be *altered* to accept GIntBig instead of long. In theory this should not require any changes to application code, since long can be converted to GIntBig losslessly. However, all existing OGR drivers require changes, including private drivers. This will also result in a backwards incompatible change in the C ABI. While we are at it, we want GetFeatureCount() to be able to return more than 2 billion records (it currently returns a 32 bit integer), and thus it will return GIntBig. As with GetFID(), this change of return type will require caution in application code.

So at the OGRLayer C++ class level:

    virtual OGRFeature *GetFeature( GIntBig nFID );
    virtual OGRErr      DeleteFeature( GIntBig nFID );
    virtual OGRErr      SetNextByIndex( GIntBig nIndex );
    virtual GIntBig     GetFeatureCount( int bForce = TRUE );

At the C API level :

  OGRFeatureH CPL_DLL OGR_L_GetFeature( OGRLayerH, GIntBig );
  OGRErr CPL_DLL OGR_L_DeleteFeature( OGRLayerH, GIntBig );
  OGRErr CPL_DLL OGR_L_SetNextByIndex( OGRLayerH, GIntBig );
  GIntBig CPL_DLL OGR_L_GetFeatureCount( OGRLayerH, int );

64bit Fields

New field types will be introduced for 64bit integers:

   OFTInteger64 = 12
   OFTInteger64List = 13

The OGRField union will be extended to include:

    GIntBig     Integer64;
    struct {
        int nCount;
        GIntBig *paList;
    } Integer64List;

The OGRFeature class will be extended with these new methods:

    GIntBig             GetFieldAsInteger64( int i );
    GIntBig             GetFieldAsInteger64( const char *pszFName );
    const GIntBig      *GetFieldAsInteger64List( const char *pszFName,
                                                 int *pnCount );
    const GIntBig      *GetFieldAsInteger64List( int i, int *pnCount );

    void                SetField( int i, GIntBig nValue );
    void                SetField( int i, int nCount, const GIntBig * panValues );
    void                SetField( const char *pszFName, GIntBig nValue );
    void                SetField( const char *pszFName, int nCount,
                                  const GIntBig * panValues );

At the C level, the following functions are added :

    GIntBig CPL_DLL OGR_F_GetFieldAsInteger64( OGRFeatureH, int );
    const GIntBig CPL_DLL *OGR_F_GetFieldAsInteger64List( OGRFeatureH, int, int * );
    void   CPL_DLL OGR_F_SetFieldInteger64( OGRFeatureH, int, GIntBig );
    void   CPL_DLL OGR_F_SetFieldInteger64List( OGRFeatureH, int, int, const GIntBig * );

Furthermore, the new interfaces will internally support setting/getting integer fields, and the integer field methods will support getting/setting 64bit integer fields, so that a single set of calls can be used for both field types where convenient (except GetFieldAsInteger64List(), which can only operate on Integer64List fields).
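
The cross-type access described above can be illustrated with a reduced model. This is a sketch only, not the OGRFeature implementation: ToyField, ToyGetAsInteger and ToyGetAsInteger64 are hypothetical names, GIntBig is a stand-in typedef, and saturating at the int limits when a 64bit value is read through the 32bit getter is an assumed policy that this RFC does not spell out.

```cpp
#include <climits>

// Stand-in for GDAL's GIntBig (assumed to be a 64bit signed integer).
typedef long long GIntBig;

// Hypothetical, reduced model of a field value: only the two scalar
// integer types are represented.
enum ToyFieldType { TOY_INTEGER, TOY_INTEGER64 };

struct ToyField
{
    ToyFieldType eType;
    union
    {
        int     Integer;
        GIntBig Integer64;
    } u;
};

// 64bit getter: a 32bit field is widened losslessly.
GIntBig ToyGetAsInteger64(const ToyField &f)
{
    return f.eType == TOY_INTEGER ? static_cast<GIntBig>(f.u.Integer)
                                  : f.u.Integer64;
}

// 32bit getter: a 64bit field is narrowed. Assumed policy: saturate at
// INT_MIN / INT_MAX rather than wrap around.
int ToyGetAsInteger(const ToyField &f)
{
    if (f.eType == TOY_INTEGER)
        return f.u.Integer;
    if (f.u.Integer64 > INT_MAX)
        return INT_MAX;
    if (f.u.Integer64 < INT_MIN)
        return INT_MIN;
    return static_cast<int>(f.u.Integer64);
}
```

Widening is always safe, while narrowing needs some policy for out-of-range values; callers that may encounter wide values are better served by the 64bit getter.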

A GDAL_DMD_CREATIONFIELDDATATYPES = "DMD_CREATIONFIELDDATATYPES" driver metadata item is added so that drivers can declare the field types they support on creation. For example: "Integer Integer64 Real String Date DateTime Time IntegerList Integer64List RealList StringList Binary". Commonly used drivers will be updated to declare it.
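
A consumer of this metadata item has to test for a whole token, because "Integer" is a prefix of both "Integer64" and "IntegerList". The following sketch shows one way to do so on the raw string value; DriverDeclaresFieldType is a hypothetical helper name, and fetching the metadata item from the driver is left out.

```cpp
#include <sstream>
#include <string>

// Hypothetical helper: returns true when osType appears as a whole,
// space-separated token in the declared list of creation field types.
bool DriverDeclaresFieldType(const std::string &osList,
                             const std::string &osType)
{
    std::istringstream iss(osList);
    std::string osToken;
    while (iss >> osToken)
    {
        if (osToken == osType)
            return true;
    }
    return false;
}
```

With the list "Integer Real String IntegerList", a query for "Integer64" returns false even though the substring "Integer" is present.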

OGR SQL

A SWQ_INTEGER64 internal type is added so as to be able to map to/from OFTInteger64 fields. The int_value member of the swq_expr_node class is extended from int to GIntBig (so both SWQ_INTEGER and SWQ_INTEGER64 refer to that member).

Python / Java / C# / perl Changes

The following changes have been done :

  • GetFID(), GetFeatureCount() have been changed to return a 64 bit integer
  • SetFID(), GetFeature(), DeleteFeature(), SetNextByIndex() have been changed to accept a 64 bit integer as argument
  • GetFieldAsInteger64() and SetFieldInteger64() have been added
  • In Python, GetField(), SetField() can accept/return 64 bit values
  • GetFieldAsInteger64List() and SetFieldInteger64List() have been added (Python only, due to lack of relevant typemaps for other languages, but could potentially be done)

The change in return type of GetFID() and GetFeatureCount() might cause warnings at compilation time in some languages (Java YES, Python not relevant, Perl/C# ?). All changes to existing methods are an ABI change for Java bytecode.

Utilities

ogr2ogr and ogrinfo are updated to support the new 64bit interfaces.

A new option, -mapFieldType, is added to ogr2ogr. For example, -mapFieldType Integer64=Integer,Date=String means that Integer64 fields in the source layer should be created as Integer, and Date fields as String. ogr2ogr will also warn when attempting to create a field in an output driver whose GDAL_DMD_CREATIONFIELDDATATYPES metadata item does not mention the required field type. For Integer64 fields, if the type is not listed in GDAL_DMD_CREATIONFIELDDATATYPES, or if that metadata item is missing, conversion to Real is done by default with a warning. ogr2ogr will also query the source layer to check whether the OLMD_FID64 metadata item is declared and whether the output driver has the FID64 layer creation option; if both hold, it sets that option.
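
The field type resolution described above can be sketched in standalone form. This is not the ogr2ogr source: ParseMapFieldType and ResolveTargetType are hypothetical names, field types are handled as plain strings, and giving an explicit -mapFieldType entry precedence over the Integer64 to Real fallback is an assumption.

```cpp
#include <map>
#include <sstream>
#include <string>

// Hypothetical parser for a "Src=Dst,Src=Dst" argument such as
// "Integer64=Integer,Date=String". Malformed entries are skipped.
std::map<std::string, std::string> ParseMapFieldType(const std::string &osArg)
{
    std::map<std::string, std::string> oMap;
    std::istringstream iss(osArg);
    std::string osPair;
    while (std::getline(iss, osPair, ','))
    {
        const std::string::size_type nEq = osPair.find('=');
        if (nEq == std::string::npos || nEq == 0 || nEq + 1 == osPair.size())
            continue;
        oMap[osPair.substr(0, nEq)] = osPair.substr(nEq + 1);
    }
    return oMap;
}

// Resolve the type to create in the output layer. Assumed precedence:
// an explicit mapping wins; otherwise Integer64 falls back to Real when
// the output driver does not declare Integer64 support.
std::string ResolveTargetType(const std::string &osSrcType,
                              const std::map<std::string, std::string> &oMap,
                              bool bDriverDeclaresInteger64)
{
    std::map<std::string, std::string>::const_iterator it = oMap.find(osSrcType);
    if (it != oMap.end())
        return it->second;
    if (osSrcType == "Integer64" && !bDriverDeclaresInteger64)
        return "Real";
    return osSrcType;
}
```

A real implementation would also emit the warnings mentioned above; they are omitted here to keep the sketch short.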

Documentation

New and modified APIs are documented. Driver updates with new options or behaviours are documented. MIGRATION_GUIDE.TXT is extended with a section related to this RFC. OGR API updated.

File Formats

As appropriate, existing OGR drivers have been updated to support the new/updated interfaces. In particular an effort has been made to update a few database drivers to support 64bit integer columns for use as feature id, though they don't always create FID columns as 64bit by default when creating new layers as this may cause problems for other applications.

Apart from the mechanical changes due to interface changes, the detailed list of changes is :

  • Shapefile: OFTInteger fields are created by default with a width of 9 characters, so that they are unambiguously read back as OFTInteger (if integers requiring 10 or 11 characters are written, the field is dynamically extended, as has been the case for a few versions). OFTInteger64 fields are created by default with a width of 18 digits, so that they are unambiguously read back as OFTInteger64, and extended to 19 or 20 if needed. Integer fields of width between 10 and 18 will be read as OFTInteger64. Wider fields will be treated as OFTReal. In previous GDAL versions, Integer fields were created with a default width of 10, and will thus now be read as OFTInteger64. An open option, DETECT_TYPE=YES, can be specified so that OGR does a full scan of the DBF file to see whether integer fields of size 10 or 11 hold 32 bit or 64 bit values and adjusts the type accordingly (and likewise for integer fields of size 19 or 20; in case of overflow of a 64 bit integer, OFTReal is chosen).
  • PG: updated to read and create OFTInteger64 as INT8 and OFTInteger64List as bigint[]. 64 bit FIDs are supported. By default, on layer creation, the FID field is created as a SERIAL (32 bit integer) to avoid compatibility issues. The FID64=YES creation option can be passed to create it as a BIGSERIAL instead. If needed, the driver will dynamically alter the schema to extend a 32 bit integer FID field to 64 bit. GetFeatureCount() modified to return 64 bit values. OLMD_FID64 = "YES" advertized as soon as the FID column is 64 bit.
  • PGDump: Integer64, Integer64List and 64 bit FID supported in read/write. FID64=YES creation option available.
  • GeoJSON: Integer64, Integer64List and 64 bit FID supported in read/write. The 64 bit variants are reported only if needed, otherwise OFTInteger/OFTIntegerList is used. OLMD_FID64 = "YES" advertized if needed
  • CSV: Integer64 supported in read/write, including the autodetection feature of field types.
  • GPKG: Integer64 and 64 bit FID supported in read/write. Conforming with the GeoPackage spec, "INT" or "INTEGER" columns are considered 64 bits, whereas "MEDIUMINT" is considered 32 bit. OLMD_FID64 = "YES" advertized as soon as MAX(fid_column) is 64 bit. GetFeatureCount() modified to return 64 bit values.
  • SQLite: Integer64 and 64 bit FID supported in read/write. On write, Integer64 fields are created as "BIGINT", and on read BIGINT or INT8 columns are considered as Integer64. However, databases produced by other tools may be created with "INTEGER" columns that hold 64 bit values, in which case OGR will not be able to detect it. The OGR_PROMOTE_TO_INTEGER64=YES configuration option can then be passed to work around that issue. OLMD_FID64 = "YES" advertized as soon as MAX(fid_column) is 64 bit. GetFeatureCount() modified to return 64 bit values.
  • MySQL: Integer64 and 64 bit FID supported in read/write. Similarly to PG, FID column is created as 32 bit by default, unless FID64=YES creation option is specified. OLMD_FID64 = "YES" advertized as soon as the FID column is 64 bit. GetFeatureCount() modified to return 64 bit values.
  • OCI: Integer64 and 64 bit FID supported in read/write. Detecting Integer/Integer64 on read is tricky since there's only a NUMBER SQL type with a field width. It is assumed that if the width is <= 9 or if it is the unspecified value (38), then it is an Integer. On creation, OGR will set a width of 20 for OFTInteger64, so a NUMBER without decimal part and with a width of 20 will be considered as an Integer64.
  • MEM: Integer64 and 64 bit FID supported in read/write. GetFeatureCount() modified to return 64 bit values.
  • VRT: Integer64, Integer64List and 64 bit FID supported in read/write. GetFeatureCount() modified to return 64 bit values.
  • JML: Integer64 supported on creation (created as "OBJECT"). On read, returned as String
  • GML: Integer64, Integer64List and 64 bit FID supported in read/write. GetFeatureCount() modified to return 64 bit values.
  • WFS: Integer64, Integer64List and 64 bit FID supported in read/write. GetFeatureCount() modified to return 64 bit values.
  • CartoDB: Integer64 supported on creation. On read returned as Real (CartoDB only advertizes a 'Number' type). GetFeatureCount() modified to return 64 bit values.
  • XLSX: Integer64 supported in read/write.
  • ODS: Integer64 supported in read/write.
  • MSSQLSpatial: GetFeatureCount() modified to return 64 bit values. No Integer64 support implemented although could likely be done.
  • OSM: FID is now always set even when sizeof(long) != 8
  • LIBKML: KML 'uint' advertized as Integer64.
  • MITAB: Change the way FID of Seamless tables are generated to make it more robust and accept arbitrary number of index tables made of an arbitrary number of features, by using full 64bit width of IDs
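
The default Shapefile width rule from the list above (without DETECT_TYPE=YES) can be written as a small classifier. The thresholds follow from the ranges involved: the largest 32 bit signed value, 2147483647, has 10 digits, so every number of up to 9 digits fits in 32 bits; the largest 64 bit signed value, 9223372036854775807, has 19 digits, so every number of up to 18 digits fits in 64 bits. ClassifyDbfIntegerWidth is a hypothetical name and the returned strings are only labels for the OGR field types.

```cpp
#include <string>

// Hypothetical helper: classify a DBF integer column (no decimals) by
// its declared width, following the default rule described in the text.
std::string ClassifyDbfIntegerWidth(int nWidth)
{
    if (nWidth < 10)
        return "Integer";    // always fits in 32 bits
    if (nWidth <= 18)
        return "Integer64";  // always fits in 64 bits
    return "Real";           // may overflow a 64 bit integer
}
```

A column of width 10 written by an earlier GDAL version therefore comes back as Integer64, which is the behavioural change noted under Compatibility Issues.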

Test Suite

The test suite is extended to test the new capabilities:

  • core SetField/GetField methods
  • updated drivers: Shapefile, PG, GeoJSON, CSV, GPKG, SQLite, MySQL, VRT, GML, XLSX, ODS, MITAB
  • OGR SQL
  • option -mapFieldType of ogr2ogr

Compatibility Issues

Driver Code Changes

  • Drivers supporting CreateField() likely ought to be extended to support OFTInteger64 as an integer/real/string field if nothing else is available (and if bApproxOK is TRUE). ogr2ogr will convert Integer64 to Real if Integer64 support is not advertized
  • Drivers reporting FIDs via Debug statements or printf calls, or using sprintf-like statements to format them for output, have been updated to use CPL_FRMT_GIB to format the FID. Failure to make these changes may result in code crashing. Thanks to the GCC annotations that mark printf()-like formatting syntax in CPL functions, we are reasonably confident that the required changes have been made in in-tree drivers (except in some proprietary drivers, like SDE, IDB, INGRES, ArcObjects, where this couldn't be checked at compile time). The same holds true for GetFeatureCount().
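
The fallback of storing Integer64 values in a Real field, mentioned in the first item above, is not lossless for the whole 64 bit range: assuming an IEEE 754 double with a 53 bit significand, integers beyond 2^53 in magnitude are not all representable. The sketch below checks whether a given value survives the round trip; SurvivesRealRoundTrip is a hypothetical name and GIntBig is a stand-in typedef.

```cpp
// Stand-in for GDAL's GIntBig (assumed to be a 64bit signed integer).
typedef long long GIntBig;

// Hypothetical helper: true when nValue is unchanged after being
// converted to double and back.
bool SurvivesRealRoundTrip(GIntBig nValue)
{
    const double dfValue = static_cast<double>(nValue);
    // Converting a double outside the 64bit range back to an integer is
    // undefined, so such values are rejected first (2^63 as a double).
    if (dfValue >= 9223372036854775808.0 || dfValue < -9223372036854775808.0)
        return false;
    return static_cast<GIntBig>(dfValue) == nValue;
}
```

Under that assumption, 9007199254740992 (2^53) survives while 9007199254740993 does not, which is why the conversion to Real is accompanied by a warning.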

Application Code

  • Application code may need to be updated to use GIntBig for FIDs and feature count in order to avoid warnings about downcasting.
  • Application code formatting FIDs or feature count using printf like facilities may also need to be changed to downcast explicitly or to use CPL_FRMT_GIB.
  • Application code may need to add Integer64 handling in order to utilize wide fields.

Behavioral Changes

  • Wide integer fields that were previously treated as "real" or Integer by the shapefile driver will now be treated as Integer64 which will likely not work with some applications, and translation to other formats may fail.

Related tickets

Related topics out of scope of this RFC

The possibility of having a Numeric type that corresponds to the matching SQL type, i.e. a decimal number with an arbitrary number of significant figures, has been considered. In OGR, this could be implemented as a full type like Integer, Integer64 etc., or possibly as a subtype of String (see RFC 50: OGR field subtypes). The latter approach would be easier to implement and mostly useful for lossless conversion between database drivers (and shapefile). The former approach would require more work, and would ideally involve OGR SQL support, which would require supporting arithmetic of arbitrary length. The use cases for such a numeric type have been considered marginal enough to set it aside for now.

Implementation

Implementation will be done by Even Rouault (Spatialys), and sponsored by LINZ (Land Information New Zealand).

The proposed implementation lies in the "rfc31_64bit" branch of the https://github.com/rouault/gdal2/tree/rfc31_64bit repository.

The list of changes : https://github.com/rouault/gdal2/compare/rfc31_64bit

Voting history

+1 from JukkaR, DanielM, TamasS, HowardB and EvenR

Last modified on Jan 30, 2015, 7:19:13 AM