Changes between Version 4 and Version 5 of rfc31_ogr_64


Timestamp:
Jan 22, 2015, 2:23:40 PM
Author:
Even Rouault
Comment:

Update RFC 31

  • rfc31_ogr_64

    v4 v5  
    11= RFC 31: OGR 64bit Integer Fields and FIDs =
    22
    3 Authors: Frank Warmerdam [[BR]]
    4 Contact: warmerdam@pobox.com[[BR]]
     3Authors: Frank Warmerdam, Even Rouault [[BR]]
     4Contact: warmerdam@pobox.com, even dot rouault at spatialys.com [[BR]]
    55Status: Development
    66
     
    99This RFC addresses steps to upgrade OGR to support 64bit integer fields and feature ids.  Many feature data formats support wide integers, and the inability to transform these through OGR causes increasing numbers of problems.   
    1010
    11 == 64bit FID ==
     11== 64bit FID, feature index and feature count ==
    1212
    13 It is planned that feature id's will be handled as type "GIntBig" instead of "long" internally.
    14 This will include the nFID field of the OGRFeature.  The existing GetFID() and SetFID() methods on the OGRFeature use type long.  It is difficult to change this without significant disruption to existing application code, so it is intended to introduce new methods to the OGRFeature class:
     13Feature id's will be handled as type "GIntBig" instead of "long" internally.
     14This will include the nFID field of the OGRFeature.  The existing GetFID() and SetFID() methods on the OGRFeature use type long.  It is difficult to change GetFID() without significant disruption to existing application code, but SetFID() can be changed to accept GIntBig instead of long. The existing GetFID() will be deprecated in favor of GetFID64(), and will emit a warning if the FID does not fit in a long. So the changes in the OGRFeature class are:
    1515
    1616{{{
    17   GIntBig  OGRFeature::GetFID64();
    18   OGRErr   OGRFeature::SetFID64(GIntBig nFID );
     17  long     GetFID() CPL_WARN_DEPRECATED("Use GetFID64()");
     18  GIntBig  GetFID64();
     19  OGRErr   SetFID(GIntBig nFID );
    1920}}}
    2021
    21 The old methods will be deprecated in favor of the new interfaces in documentation, etc.   However they will continue to exist, and will just cast as needed.  Note that the old interfaces using "long" are already 64bit on 64bit operating systems so there is little harm to applications continuing to use these interfaces on 64bit operating systems.
     22At the C API level:
    2223
    23 The OGRLayer class allows several operations based on the FID.  The signature of these will be *altered* to accept GIntBig instead of long.  In theory this should not require any changes to application code since long can be converted to GIntBig losslessly.  However, all existing OGR drivers will require changes, including private drivers.  This will also result in a backwards incompatible change in the C ABI.
     24{{{
     25  long CPL_DLL OGR_F_GetFID( OGRFeatureH ) CPL_WARN_DEPRECATED("Use OGR_F_GetFID64() instead");
     26  GIntBig CPL_DLL OGR_F_GetFID64( OGRFeatureH );
     27  OGRErr CPL_DLL OGR_F_SetFID( OGRFeatureH, GIntBig );
     28}}}
    2429
     30Note that the old interfaces using "long" are already 64bit on 64bit operating systems (excluding Windows target compilers where long is 32bit even on 64bit builds), so there is little harm to applications continuing to use these interfaces on 64bit operating systems.
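
The deprecated GetFID() only loses information when long is narrower than 64 bits. A minimal sketch of the range check such a wrapper would need is shown here; FitsInLegacyType is a hypothetical helper, not an OGR function, and std::int64_t stands in for GIntBig.

```cpp
#include <cstdint>
#include <limits>

// Hypothetical helper (not part of OGR): reports whether a 64-bit FID can be
// returned through a legacy interface of integer type T without truncation.
// std::int64_t is used here as a stand-in for GIntBig.
template <typename T>
bool FitsInLegacyType(std::int64_t nFID)
{
    return nFID >= static_cast<std::int64_t>(std::numeric_limits<T>::min()) &&
           nFID <= static_cast<std::int64_t>(std::numeric_limits<T>::max());
}
```

With T = long the check always succeeds where long is 64 bits wide, and can fail on targets where long is 32 bits, which is the case where a warning is useful.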
     31
     32A layer that can cheaply discover that it holds features with 64bit FIDs should set the OLMD_FID64 metadata item to "YES", so that ogr2ogr can pass the FID64 creation option to drivers that support it.
     33
     34The OGRLayer class allows several operations based on the FID.  The signature of these will be *altered* to accept GIntBig instead of long.  In theory this should not require any changes to application code since long can be converted to GIntBig losslessly.  However, all existing OGR drivers require changes, including private drivers.  This will also result in a backwards incompatible change in the C ABI. While we are at it, we want GetFeatureCount() to be able to return more than 2 billion records (it currently returns a 32 bit integer). Since modifying the return type of GetFeatureCount() is as risky as modifying that of GetFID(), we introduce GetFeatureCount64() and deprecate GetFeatureCount(), which is no longer virtual.
     35
     36So at the OGRLayer C++ class level:
    2537{{{
    2638    virtual OGRFeature *GetFeature( GIntBig nFID );
    2739    virtual OGRErr      DeleteFeature( GIntBig nFID );
     40    virtual OGRErr      SetNextByIndex( GIntBig nIndex );
     41    int                 GetFeatureCount( int bForce = TRUE ) CPL_WARN_DEPRECATED("Use GetFeatureCount64() instead");
     42    virtual GIntBig     GetFeatureCount64( int bForce = TRUE );
     43}}}
     44
     45At the C API level :
     46{{{
     47  OGRFeatureH CPL_DLL OGR_L_GetFeature( OGRLayerH, GIntBig );
     48  OGRErr CPL_DLL OGR_L_DeleteFeature( OGRLayerH, GIntBig );
     49  OGRErr CPL_DLL OGR_L_SetNextByIndex( OGRLayerH, GIntBig );
     50  int    CPL_DLL OGR_L_GetFeatureCount( OGRLayerH, int ) CPL_WARN_DEPRECATED("Use OGR_L_GetFeatureCount64() instead");
     51  GIntBig CPL_DLL OGR_L_GetFeatureCount64( OGRLayerH, int );
    2852}}}
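
The split between a non-virtual deprecated GetFeatureCount() and a virtual GetFeatureCount64() can be sketched as follows. SketchLayer and BigLayer are hypothetical classes, not OGR ones, and clamping an oversized count to the largest int is an assumption made for illustration; the RFC does not say what the deprecated method returns in that case.

```cpp
#include <cstdint>
#include <limits>

// Hypothetical stand-in for a layer class; not the real OGRLayer.
class SketchLayer
{
public:
    virtual ~SketchLayer() = default;

    // Drivers override only the 64-bit method.
    virtual std::int64_t GetFeatureCount64() { return 0; }

    // Legacy entry point: non-virtual, forwards to the 64-bit method.
    // Assumption: counts that do not fit in an int are clamped.
    int GetFeatureCount()
    {
        const std::int64_t nCount = GetFeatureCount64();
        if (nCount > std::numeric_limits<int>::max())
            return std::numeric_limits<int>::max();
        return static_cast<int>(nCount);
    }
};

// Hypothetical driver layer holding more than 2 billion features.
class BigLayer : public SketchLayer
{
public:
    std::int64_t GetFeatureCount64() override { return 5000000000LL; }
};
```

Because the legacy method is not virtual, a driver cannot accidentally keep overriding it; all driver logic lives behind the 64-bit method.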
    2953
     
    6387}}}
    6488
    65 Furthermore, the new interfaces will internally support setting/getting integer fields, and the integer field methods will support getting/setting 64bit integer fields so that one case can be used for both field types where convenient.
     89At the C level, the following functions are added :
     90{{{
     91    GIntBig CPL_DLL OGR_F_GetFieldAsInteger64( OGRFeatureH, int );
     92    const GIntBig CPL_DLL *OGR_F_GetFieldAsInteger64List( OGRFeatureH, int, int * );
     93    void   CPL_DLL OGR_F_SetFieldInteger64( OGRFeatureH, int, GIntBig );
     94    void   CPL_DLL OGR_F_SetFieldInteger64List( OGRFeatureH, int, int, const GIntBig * );
     95}}}
     96
     97Furthermore, the new interfaces will internally support setting/getting integer fields, and the integer field methods will support getting/setting 64bit integer fields, so that one case can be used for both field types where convenient (except GetFieldAsInteger64List(), which can only operate on Integer64List fields).
     98
     99A GDAL_DMD_CREATIONFIELDDATATYPES = "DMD_CREATIONFIELDDATATYPES" driver metadata item is added so that drivers can declare the field types they support on creation. For example "Integer Integer64 Real String Date DateTime Time IntegerList Integer64List RealList StringList Binary". Commonly used drivers will be updated to declare it.
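
Since the metadata value is a space-separated list, a caller has to match whole tokens: "Integer64" must not be considered present just because "Integer64List" is. A minimal sketch of such a lookup, where ListMentionsFieldType is a hypothetical helper and not a GDAL function:

```cpp
#include <sstream>
#include <string>

// Hypothetical helper (not part of GDAL): returns true if the space-separated
// list osTypes contains osWanted as a whole token.
bool ListMentionsFieldType(const std::string& osTypes,
                           const std::string& osWanted)
{
    std::istringstream oStream(osTypes);
    std::string osToken;
    while (oStream >> osToken)
    {
        if (osToken == osWanted)
            return true;
    }
    return false;
}
```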
     100
     101== OGR SQL ==
     102
     103A SWQ_INTEGER64 internal type is added so as to be able to map to/from OFTInteger64 fields. The int_value member of the swq_expr_node class is extended from int to GIntBig (so both SWQ_INTEGER and SWQ_INTEGER64 refer to that member).
    66104
    67105== Python / Java / C# / perl Changes ==
    68106
    69 No thoughts yet on the impact to the various SWIG derived interfaces.
     107Typing issues are less critical in those languages, so :
     108  * GetFID(), GetFeatureCount() have been changed to return a 64 bit integer
     109  * SetFID(), GetFeature(), DeleteFeature(), SetNextByIndex() have been changed to accept a 64 bit integer as argument
     110  * GetFieldAsInteger64() and SetFieldInteger64() have been added
     111  * In Python, GetField(), SetField() can accept/return 64 bit values
     112  * GetFieldAsInteger64List() and SetFieldInteger64List() have been added (Python only, due to lack of relevant typemaps for other languages)
    70113
    71114== Utilities ==
    72115
    73 ogr2ogr, ogrinfo and other utilities will be updated to support the new 64bit interfaces.
     116ogr2ogr and ogrinfo are updated to support the new 64bit interfaces.
     117
     118A new option, -mapFieldType, is added to ogr2ogr. For example, -mapFieldType Integer64=Integer,Date=String means that Integer64 fields in the source layer should be created as Integer, and Date fields as String. ogr2ogr will also warn when attempting to create a field in an output driver that advertizes a GDAL_DMD_CREATIONFIELDDATATYPES metadata item that does not mention the required field type.
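
The argument syntax of -mapFieldType is a comma-separated list of source=target pairs. A minimal parsing sketch follows; ParseMapFieldType is a hypothetical function, not the ogr2ogr implementation, and silently skipping malformed entries is an assumption made to keep the example short.

```cpp
#include <map>
#include <sstream>
#include <string>

// Hypothetical parser (not the ogr2ogr code): turns
// "Integer64=Integer,Date=String" into a source-type -> target-type map.
std::map<std::string, std::string> ParseMapFieldType(const std::string& osArg)
{
    std::map<std::string, std::string> oMap;
    std::istringstream oStream(osArg);
    std::string osPair;
    while (std::getline(oStream, osPair, ','))
    {
        const std::string::size_type nEq = osPair.find('=');
        // Assumption: entries without both a source and a target are skipped.
        if (nEq == std::string::npos || nEq == 0 || nEq + 1 == osPair.size())
            continue;
        oMap[osPair.substr(0, nEq)] = osPair.substr(nEq + 1);
    }
    return oMap;
}
```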
     119
     120== Documentation ==
     121
     122New/modified API are documented. Updates in drivers with new options/behaviours are documented.
     123MIGRATION_GUIDE.TXT extended with a section related to this RFC.
     124OGR API updated.
    74125
    75126== File Formats ==
    76127
    77 As appropriate, existing OGR drivers will be updated to support the new interfaces.  In particular an effort will be made to update the database driver interfaces to support 64bit integer columns for use as feature id, though I am not convinced we should create FID columns as 64bit by default when creating new layers as this may cause problems for other applications.
     128As appropriate, existing OGR drivers have been updated to support the new/updated interfaces.  In particular an effort has been made to update a few database drivers to support 64bit integer columns for use as feature id, though they don't always create FID columns as 64bit by default when creating new layers as this may cause problems for other applications.
    78129
    79 For prototyping purposes the Shapefile, and PostGIS drivers have been updated to properly support 64bit integer fields. 
    80 
    81 Also, all drivers need to be updated to use GIntBig for the FID in the GetFeature() and DeleteFeature() interfaces.
     130Apart from the mechanical changes due to interface changes, the detailed list of changes is :
     131  * Shapefile: OFTInteger fields are created by default with a width of 9 characters, so as to be unambiguously read as OFTInteger (if a value requires 10 or 11 characters, the field is dynamically extended, as has been the case for a few versions). OFTInteger64 fields are created by default with a width of 18 digits, so as to be unambiguously read as OFTInteger64, and extended to 19 or 20 if needed. Integer fields of width between 10 and 18 will be read as OFTInteger64. Above that, they will be treated as OFTReal. In previous GDAL versions, Integer fields were created with a default width of 10, and thus will now be read as OFTInteger64. An open option, DETECT_TYPE=YES, can be specified so that OGR does a full scan of the DBF file to see whether integer fields of size 10 or 11 hold 32 bit or 64 bit values and adjusts the type accordingly (and likewise for integer fields of size 19 or 20: in case of overflow of a 64 bit integer, OFTReal is chosen).
     132  * PG: updated to read and create OFTInteger64 as INT8 and OFTInteger64List as bigint[]. 64 bit FIDs are supported. By default, on layer creation, the FID field is created as a SERIAL (32 bit integer) to avoid compatibility issues. The FID64=YES creation option can be passed to create it as a BIGSERIAL instead. If needed, the driver will dynamically alter the schema to extend a 32 bit integer FID field to 64 bit. GetFeatureCount64() modified to return 64 bit values. OLMD_FID64 = "YES" advertized as soon as the FID column is 64 bit.
     133  * PGDump: Integer64, Integer64List and 64 bit FID supported in read/write. FID64=YES creation option available.
     134  * GeoJSON: Integer64, Integer64List and 64 bit FID supported in read/write. The 64 bit variants are reported only if needed, otherwise OFTInteger/OFTIntegerList is used. OLMD_FID64 = "YES" advertized if needed
     135  * CSV: Integer64 supported in read/write, including the autodetection feature of field types.
     136  * GPKG: Integer64 and 64 bit FID supported in read/write. Conforming with the GeoPackage spec, "INT" or "INTEGER" columns are considered 64 bits, whereas "MEDIUMINT" is considered 32 bit. OLMD_FID64 = "YES" advertized as soon as MAX(fid_column) is 64 bit. GetFeatureCount64() modified to return 64 bit values.
     137  * SQLite: Integer64 and 64 bit FID supported in read/write. On write, Integer64 fields are created as "BIGINT", and on read BIGINT or INT8 are considered as Integer64. However, databases produced by other tools may declare columns as "INTEGER" and still hold 64 bit values, in which case OGR will not be able to detect it. The OGR_PROMOTE_TO_INTEGER64=YES configuration option can then be passed to work around that issue. OLMD_FID64 = "YES" advertized as soon as MAX(fid_column) is 64 bit. GetFeatureCount64() modified to return 64 bit values.
     138  * MySQL: Integer64 and 64 bit FID supported in read/write. Similarly to PG, FID column is created as 32 bit by default, unless FID64=YES creation option is specified. OLMD_FID64 = "YES" advertized as soon as the FID column is 64 bit. GetFeatureCount64() modified to return 64 bit values.
     139  * OCI: Integer64 and 64 bit FID supported in read/write. Detecting Integer/Integer64 on read is tricky since there's only a NUMBER SQL type with a field width. It is assumed that if the width is <= 9 or if it is the unspecified value (38), then it is an Integer. On creation, OGR will set a width of 20 for OFTInteger64, so a NUMBER without decimal part and with a width of 20 will be considered as an Integer64.
     140  * MEM: Integer64 and 64 bit FID supported in read/write. GetFeatureCount64() modified to return 64 bit values.
     141  * VRT: Integer64, Integer64List and 64 bit FID supported in read/write. GetFeatureCount64() modified to return 64 bit values.
     142  * JML: Integer64 supported on creation (created as "OBJECT"). On read, returned as String
     143  * GML: Integer64, Integer64List and 64 bit FID supported in read/write. GetFeatureCount64() modified to return 64 bit values.
     144  * WFS: Integer64, Integer64List and 64 bit FID supported in read/write. GetFeatureCount64() modified to return 64 bit values.
     145  * CartoDB: Integer64 supported on creation. On read returned as Real (CartoDB only advertizes a 'Number' type). GetFeatureCount64() modified to return 64 bit values.
     146  * XLSX: Integer64 supported in read/write.
     147  * ODS: Integer64 supported in read/write.
     148  * MSSQLSpatial: GetFeatureCount64() modified to return 64 bit values. No Integer64 support implemented although could likely be done.
     149  * OSM: FID is now always set even when sizeof(long) != 8
     150  * LIBKML: KML 'uint' advertized as Integer64.
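
The default Shapefile rule described above (without DETECT_TYPE=YES) depends only on the declared width of the integer field. It can be sketched as below; DbfIntType and TypeFromDbfWidth are hypothetical names used for illustration, not identifiers from the Shapefile driver.

```cpp
// Hypothetical names, for illustration only.
enum class DbfIntType { Integer, Integer64, Real };

// Maps the declared width of a DBF integer field (no decimals) to the
// field type reported when no full scan of the file is requested.
DbfIntType TypeFromDbfWidth(int nWidth)
{
    if (nWidth <= 9)
        return DbfIntType::Integer;    // any 9-digit value fits in 32 bits
    if (nWidth <= 18)
        return DbfIntType::Integer64;  // any 18-digit value fits in 64 bits
    return DbfIntType::Real;           // may overflow a 64 bit integer
}
```

The thresholds follow from the limits of the types: the largest 32 bit signed value has 10 digits and the largest 64 bit signed value has 19, so only widths of 9 and 18 guarantee that every representable value fits.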
    82151
    83152== Test Suite ==
    84153
    85 The test suite will be moderately extended to test the new capabilities.
     154The test suite is extended to test the new capabilities:
     155 * core SetField/GetField methods
     156 * updated drivers: Shapefile, PG, GeoJSON, CSV, GPKG, SQLite, MySQL, VRT, GML, XLSX, ODS
     157 * OGR SQL
    86158
    87159== Compatibility Issues ==
     
    89161=== Driver Code Changes ===
    90162
    91  * All drivers implementing DeleteFeature() or GetFeature() will need modest changes.
     163 * All drivers implementing SetNextByIndex(), DeleteFeature() or GetFeature() will need modest changes.
    92164
    93  * Most drivers supporting CreateField() likely ought to be extended to support OFTInteger64 as an integer field if nothing else is available (and if bApproxOK is TRUE.
     165 * All drivers implementing GetFeatureCount() should be modified to implement GetFeatureCount64() instead
    94166
    95  * Drivers reporting FIDs via Debug statements, printf's or using sprintfs like statements to format them for output will need updates to either cast the FID to long, or to use CPL_FRMT_GIB to format the FID.  Failure to make these changes may result in code crashing.
     167 * Most drivers supporting CreateField() likely ought to be extended to support OFTInteger64 as an integer field if nothing else is available (and if bApproxOK is TRUE).
     168
     169 * Drivers reporting FIDs via Debug statements, printf's or sprintf-like statements to format them for output have been updated to use CPL_FRMT_GIB to format the FID.  Failure to make these changes may result in code crashing. Thanks to the GCC annotations that advertize printf()-like formatting syntax in CPL functions, we are reasonably confident that the required changes have been made in in-tree drivers (except in some proprietary drivers, like SDE, IDB, INGRES, ArcObjects, which could not be compile-checked).
    96170
    97171=== Application Code ===
    98172
    99  * Application code may need to be updated to use GIntBig for FIDs in order to avoid warnings about downcasting.
     173 * Application code may need to be updated to use GIntBig for FIDs and feature count in order to avoid warnings about downcasting.
    100174
    101  * Application code formatting FIDs using printf like facilities may also need to be changed to downcast explicitly or to use CPL_FRMT_GIB.
     175 * Application code formatting FIDs or feature count using printf like facilities may also need to be changed to downcast explicitly or to use CPL_FRMT_GIB.
    102176
    103177 * Application code may need to add Integer64 handling in order to utilize wide fields.
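
Passing a 64 bit value to a printf-like function with a format such as "%d" or "%ld" is undefined where the types differ in size, which is why a dedicated format macro is needed. The sketch below uses the standard PRId64 macro as a stand-in for CPL_FRMT_GIB so that it compiles without GDAL; FormatFID is a hypothetical helper.

```cpp
#include <cinttypes>
#include <cstdint>
#include <cstdio>
#include <string>

// Hypothetical helper: formats a 64-bit FID portably. In GDAL code the format
// string would use CPL_FRMT_GIB with a GIntBig argument; PRId64 with
// std::int64_t plays the same role here.
std::string FormatFID(std::int64_t nFID)
{
    char szBuf[32];  // large enough for any 64-bit decimal value plus sign
    std::snprintf(szBuf, sizeof(szBuf), "%" PRId64, nFID);
    return szBuf;
}
```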
     
    105179=== Behavioral Changes ===
    106180
    107  * Wide integer fields that were previously treated as "real" by the shapefile driver will now be treated as Integer64 which will likely not work with some applications, and translation to other formats will often fail.
     181 * Wide integer fields that were previously treated as "real" or Integer by the shapefile driver will now be treated as Integer64 which will likely not work with some applications, and translation to other formats may fail.
    108182
     183=== Implementation ===
    109184
     185Implementation will be done by Even Rouault ([http://spatialys.com Spatialys]), and sponsored by [http://www.linz.govt.nz/ LINZ (Land Information New Zealand)].
     186
     187The proposed implementation lies in the "rfc31_64bit" branch of the  https://github.com/rouault/gdal2/tree/rfc31_64bit repository.
     188
     189The list of changes :  https://github.com/rouault/gdal2/compare/rfc31_64bit
     190
     191=== Voting history ===
     192
     193TBD