GDAL/OGR 2.0 Changes
From time to time it has been suggested that we should eventually have a GDAL/OGR 2.0 release, at which point we would relax the normally quite strict requirement for backward compatibility in the GDAL/OGR API. Unification of the GDAL and OGR components of the library is often listed as the main objective of such an overhaul (and it has now been implemented per RFC 46). However, this page is primarily intended to track various smaller changes that are desirable from a cleanup point of view but that have been avoided over the last decade to preserve backward compatibility.
Please list thoughts on stuff we should revisit at a GDAL/OGR 2.0 release here for future consideration.
Note that it is still hoped that existing applications using GDAL can be recompiled against GDAL 2.0 with a minimum of adjustments, so radical restructuring - particularly on the GDAL side - is not necessarily on the table. But we can certainly afford C/C++ ABI incompatibility.
Get rid of CPL_STDCALL
CPL_STDCALL is a Windows-only macro marking parts of the GDAL C API as using the standard call (stdcall) rather than the cdecl calling convention on Windows. This was done to make these functions easier to call from traditional VB. It is not clear that this offers useful value, and it might be prudent to just strip all references to CPL_STDCALL from the GDAL/OGR tree.
Stricter typedefs for C types
Currently GDALDatasetH and GDALRasterBandH are both "typedef void *" (see http://trac.osgeo.org/gdal/browser/trunk/gdal/gcore/gdal.h#L158), so one can be passed where the other is expected without any compiler diagnostic. Define them as distinct types so that such programming errors are caught at compile time. Caution: it is not yet clear what to do with GDALMajorObjectH, which is the base class for GDALDatasetH and GDALRasterBandH.
For OGR, work has been done to sanitize that (see http://trac.osgeo.org/gdal/browser/trunk/gdal/ogr/ogr_api.h#L50), but it is currently only enforced in debug mode. For GDAL 2.0, use the stricter typing in release mode too.
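A minimal sketch of the idea, using hypothetical handle names (HypDatasetH, HypRasterBandH and friends) rather than the real GDAL typedefs: when each handle is a pointer to its own incomplete struct, the handles become distinct types and mixing them up no longer compiles.

```cpp
#include <type_traits>

// Hypothetical handles, for illustration only (not the real GDAL typedefs).
// Loose style: both handles are the same type, so they can be swapped freely.
typedef void *HypLooseDatasetH;
typedef void *HypLooseRasterBandH;

// Strict style: each handle points to its own incomplete struct type.
typedef struct HypDatasetHS *HypDatasetH;
typedef struct HypRasterBandHS *HypRasterBandH;

// A function that only accepts the strict band handle.
inline bool HypBandIsNull(HypRasterBandH hBand) { return hBand == nullptr; }

static_assert(std::is_same<HypLooseDatasetH, HypLooseRasterBandH>::value,
              "void* handles are interchangeable");
static_assert(!std::is_same<HypDatasetH, HypRasterBandH>::value,
              "opaque struct handles are distinct types");
static_assert(!std::is_convertible<HypDatasetH, HypRasterBandH>::value,
              "a dataset handle cannot be passed as a band handle");
```

With the strict style, a call such as HypBandIsNull(hDataset) is rejected by the compiler. How a common base handle (the GDALMajorObjectH question above) would fit into this scheme is left open here.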
Do not alias VSILFILE* to FILE* in release mode
Since GDAL 1.8 (?), when DEBUG is defined, VSILFILE is defined as "typedef struct _VSILFILE VSILFILE;", whereas in release mode it is defined as "typedef FILE VSILFILE;". This was done as a soft way of transitioning old code to use VSILFILE* where appropriate instead of improperly using FILE*. But this can be confusing in some situations (see http://lists.osgeo.org/pipermail/gdal-dev/2012-November/034610.html), so it is better to enforce strict typing in all situations. User code that still improperly uses FILE* in a VSILFILE* context should be adapted. The GDAL code base itself is ready for that.
This transition was anticipated in http://trac.osgeo.org/gdal/wiki/rfc7_vsilapi .
Fix const correctness
Try to fix const correctness through as much of the API as possible - particularly for strings and string lists. I do not necessarily propose that "major objects" like GDALDataset or GDALRasterBand should have const operations on them.
Fix OGRSpatialReference::importFromESRI to match other importFrom* methods
Use 64 bit integers in GDAL RasterIO() methods in order to be able to request huge areas
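The most limiting factor is the 32-bit size of the nBandSpace argument. For example, a 50000x50000x3 dataset cannot be RasterIO()'ed in one call with band interleaving: even with 8-bit data, the offset between two consecutive bands is 2,500,000,000 bytes, which exceeds the range of a signed 32-bit integer.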
==> Implemented in GDAL 2.0dev per RFC 51
64bit dimensions might also be necessary for WMS global datasets with very high precision, where the number of pixels might exceed 2 billion in the base overview level.
Consider standard memsize-types: size_t and ptrdiff_t as portable and safe equivalents.
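The arithmetic behind the 50000x50000x3 example can be sketched as follows. The helper names are hypothetical and this is not GDAL code; it only shows that the band offset must be computed and stored in a 64-bit type.

```cpp
#include <cstdint>
#include <limits>

// Byte offset between the starts of two consecutive bands in a
// band-interleaved buffer, computed in 64-bit arithmetic.
// (Hypothetical helper for illustration; not a GDAL function.)
inline std::int64_t HypBandSpacing(std::int64_t nXSize, std::int64_t nYSize,
                                   std::int64_t nBytesPerPixel)
{
    return nXSize * nYSize * nBytesPerPixel;
}

// True when the value can be represented as a signed 32-bit integer.
inline bool HypFitsInInt32(std::int64_t nValue)
{
    return nValue >= std::numeric_limits<std::int32_t>::min() &&
           nValue <= std::numeric_limits<std::int32_t>::max();
}
```

For a 50000 x 50000 band of 8-bit pixels, HypBandSpacing(50000, 50000, 1) is 2,500,000,000, which is larger than the 32-bit maximum of 2,147,483,647, so HypFitsInInt32() returns false for it.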
Use 64 bit integers for statistics array
GDALRasterBand::GetHistogram() and GDALRasterBand::GetDefaultHistogram() use an int* panHistogram array to store the value counts. For big rasters, this might not be sufficient and integer overflow occurs.
Related ticket #5159
Add a papszOptions argument (of type const char* const*) to some API functions so that they can later be extended easily
GDALOpen(), GDALRasterIO(), ...
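As a rough sketch of why a trailing options argument helps, assuming a NULL-terminated list of "KEY=VALUE" strings: new behaviour can be selected through new keys without touching the function signature. HypFetchOption() and HypOpenIsReadOnly() are hypothetical names made up for this example, as is the MODE key.

```cpp
#include <cstring>

// Look up pszKey in a NULL-terminated list of "KEY=VALUE" strings.
// Returns a pointer to the value, or pszDefault when the key is absent.
// (Hypothetical helper; the key comparison here is case-sensitive.)
inline const char *HypFetchOption(const char *const *papszOptions,
                                  const char *pszKey,
                                  const char *pszDefault)
{
    if (papszOptions == nullptr)
        return pszDefault;
    const std::size_t nKeyLen = std::strlen(pszKey);
    for (const char *const *ppszIter = papszOptions; *ppszIter != nullptr;
         ++ppszIter)
    {
        if (std::strncmp(*ppszIter, pszKey, nKeyLen) == 0 &&
            (*ppszIter)[nKeyLen] == '=')
            return *ppszIter + nKeyLen + 1;
    }
    return pszDefault;
}

// Hypothetical entry point with a trailing options argument. Callers that
// pass no options get the default behaviour.
inline bool HypOpenIsReadOnly(const char *pszFilename,
                              const char *const *papszOptions)
{
    (void)pszFilename;  // unused in this sketch
    return std::strcmp(HypFetchOption(papszOptions, "MODE", "READONLY"),
                       "READONLY") == 0;
}
```

Declaring the list as const char* const* lets callers pass string literals and constant arrays without casts, which ties in with the const correctness item above.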
More generally, a code review for stricter typing would be beneficial where possible.
Increase size of OGRField structure
This might be necessary to have sub-second precision for OFTDateTime. It has been postponed because of its impact on the ABI. Related ticket: #2680
Add new types of OGR geometries (curve geometries)
Adding new types of OGR geometries (wkbCircularString, and the other curve geometries that are mentioned in recent versions of the Simple Features specification) can potentially break code (including drivers) that is not ready to deal with them. We could perhaps imagine ways (*) of adding them without this being considered a breaking change, however, so this is not necessarily 2.0 material.
(*) For example, an OGREnableCurveGeometries() method that user code could call to declare that it is aware that the new types of geometries exist, in which case OGR will feel free to return them. If this method is not called, those curve geometries could be stroked as regular linestrings.
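The opt-in idea could look roughly like the sketch below. All names are hypothetical stand-ins; this is not how RFC 49 ended up implementing curve geometries, only an illustration of the suggestion in the footnote above.

```cpp
// Hypothetical geometry type codes, for illustration only.
enum class HypGeomType { LineString, CircularString };

// Process-wide opt-in flag; off by default so that existing callers
// never see the new geometry types.
static bool gHypCurvesEnabled = false;

// The hypothetical opt-in call: user code declares it can handle curves.
static void HypEnableCurveGeometries() { gHypCurvesEnabled = true; }

// Type actually reported to the caller for a geometry stored as eStored.
// Without the opt-in, curves are reported as (stroked) linestrings.
static HypGeomType HypReportedType(HypGeomType eStored)
{
    if (eStored == HypGeomType::CircularString && !gHypCurvesEnabled)
        return HypGeomType::LineString;
    return eStored;
}
```

Code that never calls HypEnableCurveGeometries() keeps seeing only the geometry types it already knows about, which is what would make such a change non-breaking.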
==> Implemented in GDAL 2.0dev per RFC 49.
Support XYZM coordinates
This has been raised a few times in the email list: http://lists.osgeo.org/pipermail/gdal-dev/2012-May/032869.html
OGR 64bit Integer Fields and FID
See RFC 31: OGR 64bit Integer Fields and FID. For the same reasons as given above for new geometry types (existing code may not be ready for the new types), this could be considered a breaking change.
Also OGR_L_GetFeatureCount() currently returns an int, which might not be enough for huge datasets.
OGR_L_SetNextByIndex() currently takes a long, which might not be appropriate on 32bit platforms or 64bit Windows, where long is only 32 bits wide.
==> Implemented in GDAL 2.0dev per RFC 31.
Fix object lifecycle management in Python bindings
Not strictly an API/ABI issue, but fixing some of the issues might require changes in user scripts.
Stricter OGR SQL syntax to distinguish literals from identifiers
The current OGR SQL syntax treats single-quoting and double-quoting as equivalent. However there are situations where careful use of single-quoting or double-quoting is the only way to understand the intent of the user. The SQL standard (at least as implemented in sqlite and postgresql) is: 'a_literal' and "a_identifier" (an identifier being a column or table name). See http://www.sqlite.org/lang_keywords.html
Example: SELECT CONCAT('x',PRFEDEA) FROM poly is OK. But now imagine that the PRFEDEA field is in fact called DESC, which is a reserved SQL keyword and thus needs quoting. Here we realize that we can't distinguish between a literal and a field name (should we concat 'x' with 'DESC'-the-string or 'x' with 'DESC'-the-field?) if we consider single and double quoting as equivalent...
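The distinction can be illustrated with a small token classifier. This is a hedged sketch with hypothetical names, not the OGR SQL parser; it assumes the token has already been isolated and it ignores escaped quotes.

```cpp
#include <string>

enum class HypTokenKind { StringLiteral, Identifier, Bare };

// Classify one already-isolated SQL token by its quoting style:
// 'text' is a string literal, "text" is an identifier, anything else is bare.
// (Illustrative helper only; escape sequences are not handled.)
inline HypTokenKind HypClassifyToken(const std::string &osToken)
{
    if (osToken.size() >= 2 && osToken.front() == osToken.back())
    {
        if (osToken.front() == '\'')
            return HypTokenKind::StringLiteral;
        if (osToken.front() == '"')
            return HypTokenKind::Identifier;
    }
    return HypTokenKind::Bare;
}
```

Under this rule, CONCAT('x', "DESC") unambiguously concatenates the string 'x' with the field named DESC, while CONCAT('x', 'DESC') concatenates two string literals.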
Not strictly an API/ABI issue, but fixing this would break user scripts/code/practice that do not follow the SQL standard and use single and double quotes interchangeably.
Related ticket : #4280
==> Implemented in GDAL 2.0dev per RFC 52.
OGR: provide a base implementation of CreateFeature(), SetFeature(), etc. that is called before driver-specific code to implement generic checks
Currently the virtual methods in OGR layer are overridden in each driver to provide the appropriate implementation. But there might be generic checks that could be done in a more centralized way as a preliminary step. For example, checking that the datasource is opened in update mode.
A solution would be to make CreateFeature() a non-virtual method that calls a virtual ICreateFeature() method implemented in each driver.
Related ticket : #4620.
==> Partially implemented in GDAL 2.0dev per RFC 49: CreateFeature() and SetFeature() now call the virtual methods ICreateFeature() and ISetFeature(). The check for the update flag hasn't been done.
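The pattern described in this section (a non-virtual public method wrapping a virtual one) can be sketched as below. The class names, the error codes and the update-mode check are hypothetical and are only meant to show the shape of the proposal; they are not the actual OGRLayer code.

```cpp
// Result codes for the sketch (hypothetical, not the real OGRErr values).
enum class HypErr { None, NotUpdatable };

class HypLayer
{
  public:
    explicit HypLayer(bool bUpdate) : m_bUpdate(bUpdate) {}
    virtual ~HypLayer() = default;

    // Non-virtual entry point: generic checks run here, once, for all drivers.
    HypErr CreateFeature(int nValue)
    {
        if (!m_bUpdate)
            return HypErr::NotUpdatable;
        return ICreateFeature(nValue);
    }

  protected:
    // Driver-specific work goes in the virtual method.
    virtual HypErr ICreateFeature(int nValue) = 0;

  private:
    bool m_bUpdate;
};

// A toy "driver" that just counts the features it was asked to create.
class HypMemLayer : public HypLayer
{
  public:
    using HypLayer::HypLayer;
    int m_nCreated = 0;

  protected:
    HypErr ICreateFeature(int) override
    {
        ++m_nCreated;
        return HypErr::None;
    }
};
```

Because callers can only reach ICreateFeature() through CreateFeature(), a driver cannot accidentally skip the generic check, and the check does not have to be duplicated in every driver.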
Provide improved handling for Large Raster Attribute Tables
The current GetDefaultRAT/SetDefaultRAT implementation is not suitable for large RATs - the whole RAT must be read and held in memory. A solution would be to create a new method to return a derived class that handles the actual reading and writing, much like GetRasterBand(). This derived class could have methods for reading/writing parts of specified columns from/to application-managed memory and for handling data type conversion, much like RasterIO.
==> Implemented per RFC 40 in GDAL 1.11
Revise how OGR handles string encodings ?
RFC 23 has left some users unhappy, particularly with how encoding management is done with shapefiles. There are quite a few tickets on that issue, and in #4808 a patch is proposed so that OGR drivers report their source encoding (OGRLayer::GetEncoding()). RFC 23 would need to be amended if this approach were accepted.
Expose shapefiles of type Polygon as MultiPolygon (and LineString as MultiLineString)?
The OGR Shapefile driver exposes a layer geometry type Polygon, even if the shapefile contains multipolygons. This causes issues when converting such shapefiles with ogr2ogr to formats that have strict geometry type checking (PostGIS, Spatialite, etc...). A possibility would be to expose a layer geometry type MultiPolygon, and automatically promote single-part polygons to multipolygons.
The issue is discussed in #4939.
- install headers into PREFIX/include/gdal
- separate between private build-time headers (not to be installed) and public (for example cpl_config.h should not be installed)
- more consistent naming of files: sometimes it is gdalgrid.h and sometimes it is gdal_alg.h
- review the directory structure and make improvements for clarity
User changes (command line tools)
Some things can be hard when starting out with GDAL and come up frequently on the mailing list. Here is a starter list, in case those types of changes are being considered.
- Standardize command names. Some are gdal_command, others are gdalcommand.
- Running gdalinfo --formats should list the formats in some meaningful order. On Linux I can | grep 'format name' or sort or do various other things to make it useful. On Windows, I copy the result into notepad and then search (although more recently I installed gnuwin32). It is sort of nice that the list of supported formats is so long as to be a problem. (Note: they are returned in a meaningful order - the order in which they are tried - which is sometimes important to know.)
- Coordinate order of [-projwin ulx uly lrx lry] makes sense if you think of it as a picture with 0,0 starting in the upper left. Many geospatial people have been trained to think of the 'proper' coordinate order differently. There is unlikely to be much agreement on this item so it may be best left alone.
- gdalinfo displays the palette, which is boring (and generally unneeded). I would prefer that gdalinfo doesn't display it by default, and that a new -ct option replaces the -noct option.
- ogr2ogr arguments are destination file then source file, while gdal_translate is source file then destination file.
  - These aren't that hard to learn
  - Changing this will inconvenience long-time users, who will habitually do it the original way on their first try
  - Changing this will break a lot of written and running scripts, which will need to be changed
Tickets related to other similar issues raised in the past:
- #289 - OGRPoint::get*() members should be const
- #1629 - Replace all occurrences of std::string with CPLString
- #1743 - Const correctness in cpl_http module
- #1752 - Make new method in OGREnvelope following const-correctness
- #2680 - OGR OFTDateTime needs more precision
- #3127 - Use char const instead of char for input strings with pointer reset
- #3153 - Define OGRErr as enumerated type
- #3818 - GDAL Command line tool naming consistency
- #3592 - Python bindings: allow chaining method calls while preserving C++ objects alive
- #4280 - [PATCH] SQL parser : differentiate quoting of string literals from identifiers
- #4620 - OGRPGTableLayer does not check for bUpdateAccess in CreateField
- #4808 - [PATCH] Shapefile: interpreting LDID/87 not as ISO-8859-1 but as no codepage specified
- #4939 - shapefile to postgresql : ERROR: Geometry type (MultiPolygon) does not match column type (Polygon)
- #5159 - gdalinfo -hist reports negative counts