Changes between Initial Version and Version 1 of rfc66_randomlayerreadwrite


Timestamp:
Sep 28, 2016, 4:56:01 AM
Author:
Even Rouault

     1= RFC 66 : OGR random layer read/write capabilities =
     2
     3Author: Even Rouault[[BR]]
     4
     5Contact: even.rouault at spatialys.com[[BR]]
     6
     7Status: Development[[BR]]
     8
     9Target version: 2.2
     10
     11== Summary ==
     12
     13This RFC introduces a new API to be able to iterate over vector features at
     14dataset level, in addition to the existing capability of doing it at the
     15layer level.
     16The existing capability of writing features in layers in random order, that is
     17supported by most drivers with output capabilities, is formalized with a new
     18dataset capability flag.
     19
     20== Rationale ==
     21
     22Some vector formats mix features that belong to different layers in an
     23interleaved way, which makes the current per-layer feature iteration rather
     24inefficient (each layer requires reading the whole file).
     25One example of such drivers is the OSM driver. For this driver, a hack had
     26been developed in the past to be able to use the OGRLayer::GetNextFeature()
     27method, but with very particular semantics. See the "Interleaved reading"
     28paragraph of http://gdal.org/drv_osm.html for more details. A similar need
     29arises with the development of a new driver, GMLAS (for GML Application Schemas),
     30that reads GML files with arbitrary element nesting, and thus can return features
     31in an apparently random order, because it works in a streaming way.
     32For example, let's consider the following simplified XML content :
     33{{{
     34<A>
     35    ...
     36    <B>
     37        ...
     38    </B>
     39    ...
     40</A>
     41}}}
     42The driver can only finish building feature A after it has completed
     43feature B, so B is emitted first. When reading sequences of this pattern, the
     44driver will emit features in the order B,A,B,A,...
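The emission order described above can be illustrated with a small standalone sketch. It is not code from the GMLAS driver: `XmlEvent` and `EmissionOrder` are hypothetical names introduced here, and the sketch only assumes that a streaming reader emits a feature once the closing tag of its element has been seen.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical event type for this sketch: the start or end tag of an element.
struct XmlEvent {
    bool isStart;
    std::string name;
};

// Returns element names in the order a streaming reader could emit them,
// assuming a feature is only complete once its closing tag has been read.
std::vector<std::string> EmissionOrder(const std::vector<XmlEvent>& events)
{
    std::vector<std::string> open;     // elements whose closing tag is pending
    std::vector<std::string> emitted;  // completed features, in emission order
    for (const XmlEvent& ev : events) {
        if (ev.isStart) {
            open.push_back(ev.name);
        } else {
            // The sketch assumes well-formed input: the tag being closed is
            // the innermost element still open.
            assert(!open.empty() && open.back() == ev.name);
            emitted.push_back(ev.name);
            open.pop_back();
        }
    }
    return emitted;
}
```

Feeding it the events for two consecutive `<A><B></B></A>` sequences yields the nested element before its parent each time, which is the B,A,B,A pattern.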
     45
     46== Changes ==
     47
     48=== C++ API ===
     49
     50Two new methods are added at the GDALDataset level :
     51
     52GetNextFeature():
     53
     54{{{
     55/**
     56 \brief Fetch the next available feature from this dataset.
     57
     58 The returned feature becomes the responsibility of the caller to
     59 delete with OGRFeature::DestroyFeature().
     60
     61 Depending on the driver, this method may return features from layers in a
     62 non-sequential way. This is what may happen when the
     63 ODsCRandomLayerRead capability is declared (for example for the
     64 OSM and GMLAS drivers). When datasets declare this capability, it is strongly
     65 advised to use GDALDataset::GetNextFeature() instead of
     66 OGRLayer::GetNextFeature(), as the latter might have a slow, incomplete or stub
     67 implementation.
     68 
     69 The default implementation, used by most drivers, will
     70 however iterate over each layer, and then over each feature within this
     71 layer.
     72
     73 This method takes into account spatial and attribute filters set on layers that
     74 will be iterated upon.
     75
     76 The ResetReading() method can be used to start at the beginning again.
     77
     78 Depending on drivers, this may also have the side effect of calling
     79 OGRLayer::GetNextFeature() on the layers of this dataset.
     80
     81 This method is the same as the C function GDALDatasetGetNextFeature().
     82
     83 @param ppoBelongingLayer a pointer to an OGRLayer* variable to receive the
     84                          layer to which the object belongs, or NULL.
     85                          It is possible for the output *ppoBelongingLayer
     86                          to be NULL despite the feature not being NULL.
     87 @param pdfProgressPct    a pointer to a double variable to receive the
     88                          percentage progress (in [0,1] range), or NULL.
     89                          On return, the pointed value might be negative if
     90                          determining the progress is not possible.
     91 @param pfnProgress       a progress callback to report progress (for
     92                          GetNextFeature() calls that might have a long duration)
     93                          and offer cancellation possibility, or NULL
     94 @param pProgressData     user data provided to pfnProgress, or NULL
     95 @return a feature, or NULL if no more features are available.
     96 @since GDAL 2.2
     97*/
     98
     99OGRFeature* GDALDataset::GetNextFeature( OGRLayer** ppoBelongingLayer,
     100                                         double* pdfProgressPct,
     101                                         GDALProgressFunc pfnProgress,
     102                                         void* pProgressData );
     103}}}
     104
     105and ResetReading():
     106
     107{{{
     108/**
     109 \brief Reset feature reading to start on the first feature.
     110
     111 This affects GetNextFeature().
     112
     113 Depending on drivers, this may also have the side effect of calling
     114 OGRLayer::ResetReading() on the layers of this dataset.
     115
     116 This method is the same as the C function GDALDatasetResetReading().
     117 
     118 @since GDAL 2.2
     119*/
     120void        GDALDataset::ResetReading();
     121}}}
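The default behaviour described in the GetNextFeature() comment above (iterate over each layer, then over each feature within that layer) can be sketched with toy types. This is an illustration under stated assumptions, not the GDAL implementation: `ToyLayer` and `ToyDataset` are hypothetical stand-ins for OGRLayer and GDALDataset, features are reduced to integer ids, and filters and progress reporting are left out.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Toy stand-in for a layer (NOT OGRLayer): a name plus its feature ids.
struct ToyLayer {
    std::string name;
    std::vector<int> featureIds;
};

// Toy stand-in for a dataset (NOT GDALDataset).
class ToyDataset {
public:
    explicit ToyDataset(std::vector<ToyLayer> layers)
        : layers_(std::move(layers)) {}

    // Returns false when no more features are available. Otherwise fills in
    // the feature id and the layer it belongs to.
    bool GetNextFeature(int& featureId, const ToyLayer*& belongingLayer)
    {
        while (layerIdx_ < layers_.size()) {
            const ToyLayer& layer = layers_[layerIdx_];
            assert(featureIdx_ <= layer.featureIds.size());
            if (featureIdx_ < layer.featureIds.size()) {
                featureId = layer.featureIds[featureIdx_++];
                belongingLayer = &layer;
                return true;
            }
            // Current layer exhausted: move on to the next one.
            ++layerIdx_;
            featureIdx_ = 0;
        }
        belongingLayer = nullptr;
        return false;
    }

    // Restart iteration from the first feature of the first layer.
    void ResetReading()
    {
        layerIdx_ = 0;
        featureIdx_ = 0;
    }

private:
    std::vector<ToyLayer> layers_;
    std::size_t layerIdx_ = 0;    // layer currently being read
    std::size_t featureIdx_ = 0;  // next feature to return within that layer
};
```

A caller simply loops on `GetNextFeature()` until it returns false, and can call `ResetReading()` to start over. A driver that declares ODsCRandomLayerRead would instead be free to return features from its layers in any order.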
     122
     123=== New capabilities ===
     124
     125The following 2 new dataset capabilities are added :
     126{{{
     127#define ODsCRandomLayerRead     "RandomLayerRead"   /**< Dataset capability for GetNextFeature() returning features from random layers */
     128#define ODsCRandomLayerWrite    "RandomLayerWrite " /**< Dataset capability for supporting CreateFeature on layer in random order */
     129}}}
     130
     131=== C API ===
     132
     133The above 2 new methods are available in the C API with :
     134{{{
     135OGRFeatureH CPL_DLL GDALDatasetGetNextFeature( GDALDatasetH hDS,
     136                                               OGRLayerH* phBelongingLayer,
     137                                               double* pdfProgressPct,
     138                                               GDALProgressFunc pfnProgress,
     139                                               void* pProgressData );
     140
     141void CPL_DLL GDALDatasetResetReading( GDALDatasetH hDS );
     142}}}
     143
     144== Discussion about a few design choices of the new API ==
     145
     146Compared to OGRLayer::GetNextFeature(), GDALDataset::GetNextFeature() has a
     147few differences :
     148- it returns the layer which the feature belongs to. Indeed, there's no easy way
     149  from a feature to know which layer it belongs to (since in the data model,
     150  features can exist outside of any layer). One possibility would be to
     151  correlate the OGRFeatureDefn* object of the feature with the one of the layer,
     152  but that is a bit inconvenient to do (and theoretically, one could imagine
     153  several layers sharing the same feature definition object, although this
     154  probably never happens in any in-tree driver).
     155- even if the feature returned is not NULL, the returned layer might be NULL.
     156  This is just a provision for now, since that cannot currently happen. This
     157  could be interesting to address schema-less datasources where basically each
     158  feature could have a different schema (GeoJSON for example) without really
     159  belonging to a clearly identified layer.
     160- it returns a progress percentage. When using the OGRLayer API, one has to compare
     161  the number of features returned so far with the total number returned by GetFeatureCount().
     162  For the use cases we want to address, knowing quickly the total number of features
     163  of the dataset is not doable. But knowing the position of the file pointer
     164  relative to the total size of the file is easy. Hence the decision to make
     165  GetNextFeature() return the progress percentage. Regarding the choice of the
     166  range [0,1], this is to be consistent with the range accepted by GDAL progress
     167  functions.
     168- it accepts a progress and cancellation callback. One could wonder why this is
     169  needed given that GetNextFeature() is an "elementary" method and that it
     170  can already return the progress percentage. However, in some circumstances,
     171  it might take a rather long time to complete a GetNextFeature() call. For
     172  example in the case of the OSM driver, as an optimization you can ask the
     173  driver to return features of a subset of layers, for example all layers except
     174  nodes. But generally the nodes are at the beginning of the file, so before you
     175  get the first feature, you typically have to process 70% of the whole file. In
     176  the GMLAS driver, the first GetNextFeature() call is also the opportunity to
     177  do a preliminary quick scan of the file to determine the SRS of geometry columns,
     178  hence having progress feedback is welcome.
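The file-position approach to progress mentioned in the list above can be sketched as follows. `ProgressFromFilePosition` is a hypothetical helper written for this illustration, not a GDAL function. It assumes the driver knows its current byte offset and the total file size, and it follows the documented convention of a value in the [0,1] range, or a negative value when progress cannot be determined.

```cpp
#include <cassert>

// Hypothetical helper: converts a file offset into a progress ratio.
// Returns a value in [0,1], or -1.0 when progress cannot be determined
// (unknown or empty file size, or an invalid position).
double ProgressFromFilePosition(long long pos, long long totalSize)
{
    if (totalSize <= 0 || pos < 0)
        return -1.0;
    if (pos > totalSize)
        pos = totalSize;  // clamp in case the reader overshoots the end
    const double pct =
        static_cast<double>(pos) / static_cast<double>(totalSize);
    assert(pct >= 0.0 && pct <= 1.0);
    return pct;
}
```

For instance, a reader positioned at byte 50 of a 200-byte file would report 0.25, without ever needing the total feature count.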
     179
     180The progress percentage output is redundant with the progress callback mechanism,
     181and the latter could be used to get the former, however it may be a bit convoluted.
     182It would require doing things like:
     183
     184{{{
     185int MyProgress(double pct, const char* msg, void* user_data)
     186{
     187    *(double*)user_data = pct;
     188    return TRUE;
     189}
     190
     191myDS->GetNextFeature(&poLayer, NULL, MyProgress, &pct);
     192}}}
     193
     194
     195== SWIG bindings (Python / Java / C# / Perl) changes ==
     196
     197GDALDatasetGetNextFeature is mapped as gdal::Dataset::GetNextFeature() and
     198GDALDatasetResetReading as gdal::Dataset::ResetReading().
     199
     200Regarding gdal::Dataset::GetNextFeature(), currently only Python has been modified
     201to return both the feature and its belonging layer. Other bindings just return
     202the feature for now (returning the layer as well would need specialized typemaps).
     203
     204== Drivers ==
     205
     206The OSM and GMLAS drivers are updated to implement the new API.
     207
     208Existing drivers that support ODsCRandomLayerWrite are updated to advertise it
     209(that is most drivers that have layer creation capabilities, with the exceptions
     210of KML, JML and GeoJSON).
     211
     212== Utilities ==
     213
     214ogr2ogr / GDALVectorTranslate() is changed internally to use the new API when
     215ODsCRandomLayerRead is advertised, replacing the hack that was used for the OSM
     216driver. It checks if the output driver advertises ODsCRandomLayerWrite, and
     217if it does not, emits a warning, but still proceeds with the conversion
     218using random layer reading/writing.
     219
     220ogrinfo is extended to accept a -rl (for random layer) flag that instructs it
     221to use the GDALDataset::GetNextFeature() API. It was considered to use it
     222automatically when ODsCRandomLayerRead is advertised, but the output can be
     223quite... random and thus not very practical for the user.
     224
     225== Documentation ==
     226
     227All new methods/functions are documented.
     228
     229== Test Suite ==
     230
     231The specialized GetNextFeature() implementations of the OSM and GMLAS drivers
     232are tested in their respective tests. The default implementation of
     233GDALDataset::GetNextFeature() is tested in the MEM driver tests.
     234
     235== Compatibility Issues ==
     236
     237None for existing users of the C/C++ API.
     238
     239Since there is a default implementation, the new functions/methods can be safely
     240used on drivers that don't have a specialized implementation.
     241
     242The addition of the new virtual methods GDALDataset::ResetReading() and
     243GDALDataset::GetNextFeature() may cause issues for out-of-tree drivers that
     244would already use internally such method names, but with different semantics,
     245or signatures. We have encountered such issues with a few in-tree drivers, and
     246fixed them.
     247
     248== Implementation ==
     249
     250The implementation will be done by Even Rouault, and is mostly triggered by
     251the needs of the new GMLAS driver (initial development funded by the European Earth
     252observation programme Copernicus).
     253
     254The proposed implementation is in https://github.com/rouault/gdal2/tree/gmlas_randomreadwrite
     255(commit: https://github.com/rouault/gdal2/commit/8447606d68b9fac571aa4d381181ecfffed6d72c)
     256
     257== Voting history ==
     258
     259TBD