Changes between Version 7 and Version 8 of rfc62_raster_algebra


Ignore:
Timestamp:
Sep 24, 2016, 12:37:32 AM (8 years ago)
Author:
Ari Jolma
Comment:

Rewrite. Add related work and requirements.

Legend:

Unmodified
Added
Removed
Modified
  • rfc62_raster_algebra

    v7 v8  
    1717Raster algebra is a well known branch of geospatial science and technology and an often needed tool. Currently GDAL does not have comprehensive support for raster algebra in core.
    1818
     19== Related work ==
     20
     21Python bindings: raster bands or parts of raster bands can be read into / written from numpy array objects. Huge rasters can be iterated block by block. numpy methods allow efficient implementation of many raster algebra methods using python. There is some support for parallel processing using numpy.
     22
     23Perl bindings: raster bands or parts of raster bands can be read into / written from PDL objects. Huge rasters can be iterated block by block. PDL methods allow efficient implementation of many raster algebra methods using perl. There is some support for parallel processing using PDL.
     24
     25QGIS raster calculator: raster calculation string is parsed into an expression tree and output band is computed row by row from the inputs. All computations are done with double precision floating point numbers. The calculation does not support parallel processing.
     26
     27PostGIS raster: raster algebra is supported by callback functions.
     28
     29There is existing research, which may be exploited:
     30
     31Parallel Priority-Flood depression filling for trillion cell digital elevation models on desktops or clusters http://www.sciencedirect.com/science/article/pii/S0098300416301704
     32
     33Parallel Non-divergent Flow Accumulation For Trillion Cell Digital Elevation Models On Desktops Or Clusters https://arxiv.org/abs/1608.04431
     34
     35== Requirements (Goals) ==
     36
     37The implementation should be data type aware. This may mean code written with templates.
     38
     39The implementation should be parallel processing friendly.
     40
     41The implementation should allow a relatively easy to use C++ / C API. This may mean interface, which does not use templates.
     42
     43The implementation should allow arbitrary functions on cell values. I.e., be extensible by the user.
     44
     45The implementation should allow focal methods. I.e., methods, where the value of a cell depends on its neighborhood.
     46
    1947== Approach ==
    2048
    21 GDAL presents a few problems to raster algebra implementation: 1) the access to data is based on blocks, 2) GDAL supports several datatypes, even complex values, and 3) there is no immediate support for the not-simple data structures needed by some methods (by "method" I refer to functions of raster algebra in this text). It is proposed to solve these problems as follows.
     49The implementation does not need to be tightly integrated with the core. This means an "add-on" type solution is ok.
    2250
    23 1) The code is organized into two levels: the first one, which loops through all blocks in band(s) and maintains a cache of needed blocks, and the second one, which loops through all cells in a block. The first level is generic one and the method and its arguments just pass through it. The method is passed as a callback and arguments as pointers to objects of base class. The second one is method specific, it is where the method is really implemented. The method callbacks must have similar prototype and their return value is used to, e.g., determine whether the block needs to be written to disk etc.
    24 
    25 2) C++ templates will be used to generically support multiple cell datatypes. However, an interface class will be written, which is not templated, so the user is not distracted with the datatype too much.
    26 
    27 3) An interface class structure is used for arguments and return values (non-band). The interface classes are not templated, they shadow actual classes, which are templated and use STL as much as possible.
    28 
    29 == The API ==
    30 
    31 A proposal is to have only a C++ interface, and also base the bindings on this C++ interface.
    32 
    33 {{{
    34 // the API:
    35 
    36 // The C++ API (these are all interface classes, so everything is virtual as the actual code is in hidden template classes
    37 
    38 class gma_object_t {
    39 public:
    40     virtual ~gma_object_t();
    41     virtual gma_class_t get_class();
    42 };
    43 
    44 class gma_number_t : public gma_object_t {
    45 public:
    46     virtual GDALDataType datatype();
    47     virtual void set_value(double value);
    48     virtual void set_value(int value);
    49     virtual int value_as_int();
    50     virtual double value_as_double();
    51     virtual bool is_defined();
    52     virtual void set_inf(int inf); // -1 to minus inf, 0 to not inf, 1 to plus inf
    53     virtual bool is_inf();
    54     virtual bool is_integer();
    55     virtual bool is_float();
    56 };
    57 
    58 class gma_pair_t : public gma_object_t {
    59 public:
    60     virtual void set_first(gma_object_t *first);
    61     virtual void set_second(gma_object_t *second);   
    62     virtual gma_object_t *first();
    63     virtual gma_object_t *second();
    64 };
    65 
    66 class gma_bins_t : public gma_object_t {
    67 public:
    68     virtual GDALDataType datatype();
    69     virtual unsigned int size();
    70     virtual void push(int value);
    71     virtual void push(double value);
    72 };
    73 
    74 class gma_histogram_t : public gma_object_t {
    75 public:
    76     virtual GDALDataType datatype();
    77     virtual unsigned int size();
    78     virtual gma_object_t *at(unsigned int i);
    79     virtual void print();
    80 };
    81 
    82 class gma_classifier_t : public gma_object_t {
    83 public:
    84     virtual GDALDataType datatype();
    85     virtual unsigned int size();
    86     virtual void add_class(gma_number_t *interval_max, gma_number_t *value);
    87     virtual void add_value(gma_number_t *old_value, gma_number_t *new_value);
    88     virtual void add_default(gma_number_t *default_value);
    89 };
    90 
    91 class gma_cell_t  : public gma_object_t {
    92 public:
    93     virtual GDALDataType datatype();
    94     virtual int& x();
    95     virtual int& y();
    96     virtual void set_value(double value);
    97     virtual void set_value(int value);
    98     virtual int value_as_int();
    99     virtual double value_as_double();
    100 };
    101 
    102 /*
    103   Return value 0 interrupts, 1 denotes ok, and 2 denotes ok and a need
    104   to save the cell value back to band.
    105 */
    106 typedef int (*gma_cell_callback_f)(gma_cell_t*, gma_object_t*);
    107 
    108 class gma_cell_callback_t : public gma_object_t {
    109 public:
    110     virtual void set_callback(gma_cell_callback_f callback);
    111     virtual void set_user_data(gma_object_t*);
    112 };
    113 
    114 class gma_logical_operation_t  : public gma_object_t {
    115 public:
    116     virtual GDALDataType datatype();
    117     virtual void set_operation(gma_operator_t);
    118     virtual void set_value(int value);
    119     virtual void set_value(double value);
    120 };
    121 
    122 class gma_hash_t : public gma_object_t {
    123 public:
    124     virtual GDALDataType datatype();
    125     virtual int size();
    126     virtual std::vector<gma_number_t*> *keys_sorted();
    127     virtual gma_object_t *get(gma_number_t *key);
    128 };
    129 
    130 class gma_band_t : public gma_object_t {
    131 public:
    132     virtual void update();
    133     virtual GDALRasterBand *band();
    134     virtual GDALDataset *dataset();
    135     virtual GDALDriver *driver();
    136     virtual GDALDataType datatype();
    137     virtual bool datatype_is_integer();
    138     virtual bool datatype_is_float();
    139     virtual int w();
    140     virtual int h();
    141 
    142     virtual void set_progress_fct(GDALProgressFunc progress, void * progress_arg);
    143 
    144     virtual gma_band_t *new_band(const char *name, GDALDataType datatype);
    145     virtual gma_number_t *new_number();
    146     virtual gma_number_t *new_int(int value);
    147     virtual gma_pair_t *new_pair();
    148     virtual gma_pair_t *new_range();
    149     virtual gma_bins_t *new_bins();
    150     virtual gma_cell_t *new_cell();
    151     virtual gma_classifier_t *new_classifier();
    152     virtual gma_cell_callback_t *new_cell_callback();
    153     virtual gma_logical_operation_t *new_logical_operation();
    154 
    155     virtual void print();
    156     virtual void rand();
    157     virtual void abs();
    158     virtual void exp();
    159     virtual void log();
    160     virtual void log10();
    161     virtual void sqrt();
    162     virtual void sin();
    163     virtual void cos();
    164     virtual void tan();
    165     virtual void ceil();
    166     virtual void floor();
    167 
    168     virtual void assign(int value);
    169     virtual void assign_all(int value);
    170     virtual void add(int summand);
    171     virtual void subtract(int);
    172     virtual void multiply(int);
    173     virtual void divide(int);
    174     virtual void modulus(int divisor);
    175    
    176     virtual void assign(double value);
    177     virtual void assign_all(double value);
    178     virtual void add(double summand);
    179     virtual void subtract(double);
    180     virtual void multiply(double);
    181     virtual void divide(double);
    182 
    183     virtual void classify(gma_classifier_t*);
    184     virtual void cell_callback(gma_cell_callback_t*);
    185 
    186     // arg = NULL, pair:(n,pair:(min,max)), or bins; returns histogram
    187     virtual gma_histogram_t *histogram(gma_object_t *arg = NULL);
    188     // returns hash of a hashes, keys are zone numbers
    189     virtual gma_hash_t *zonal_neighbors();
    190     virtual gma_number_t *get_min();
    191     virtual gma_number_t *get_max();
    192     // returns a pair of numbers
    193     virtual gma_pair_t *get_range();
    194     virtual std::vector<gma_cell_t*> *cells();
    195 
    196     // op can be used to make the operation conditional,
    197     // the test is made against the value of the parameter band
    198     virtual void assign(gma_band_t *, gma_logical_operation_t *op = NULL);
    199     virtual void add(gma_band_t *, gma_logical_operation_t *op = NULL);
    200     virtual void subtract(gma_band_t *, gma_logical_operation_t *op = NULL);
    201     virtual void multiply(gma_band_t *, gma_logical_operation_t *op = NULL);
    202     virtual void divide(gma_band_t *, gma_logical_operation_t *op = NULL);
    203     virtual void modulus(gma_band_t *, gma_logical_operation_t *op = NULL);
    204 
    205     // this = value where decision is true
    206     // the decision band must be of type uint8_t
    207     virtual void decision(gma_band_t *value, gma_band_t *decision);
    208 
    209     virtual gma_hash_t *zonal_min(gma_band_t *zones);
    210     virtual gma_hash_t *zonal_max(gma_band_t *zones);
    211 
    212     virtual void rim_by8(gma_band_t *areas);
    213 
    214     virtual void fill_depressions(gma_band_t *dem);
    215     virtual void D8(gma_band_t *dem);
    216     virtual void route_flats(gma_band_t *dem);
    217     virtual void upstream_area(gma_band_t *);
    218     virtual void catchment(gma_band_t *, gma_cell_t *);
    219 
    220 };
    221 
    222 // a band object factory
    223 gma_band_t *gma_new_band(GDALRasterBand *b);
    224 
    225 }}}
     51GDAL design sets some constraints/requirements to raster algebra implementation: 1) the access to data is based on blocks, 2) GDAL supports several datatypes, even complex values, 3) there is no immediate support for the not-simple data structures needed by some methods (by "method" I refer to functions of raster algebra in this text), 4) data can be read from a band in parallel but writing needs to be exclusive.
    22652
    22753== Changes ==
     
    23763== Utilities ==
    23864
    239 A new utility will be written to take advantage of the functionality.
     65Existing utilities are not affected but new utilities may be written taking advantage of the new functionality.
    24066
    24167== Documentation ==
    24268
    243 Will be written.
     69Must be written.
    24470
    24571== Test Suite ==
    24672
    247 Will be written.
     73Must be written.
    24874
    24975== Compatibility Issues ==
     
    25379== Implementation ==
    25480
    255 The implementation will be done by Ari Jolma.
     81A proposed implementation is developed at https://github.com/ajolma/raster_algebra
    25682
    257 The proposed implementation is developed on https://github.com/ajolma/raster_algebra
     83This code attempts to solve the problem as follows. (The source is in transition from an old approach, which was based on operators as methods, while the new approach is based on operator classes)
     84
     85Classes 'operand' and 'operator' are defined. An operand is an object, which holds data and an operator is an object, which computes a result (essentially an operand) from operands.
     86
     87Raster algebra computation is a tree of operand and operator objects, which is executed in a recursive fashion.
     88
     89There are interface classes and templated concrete classes. The concrete classes inherit from the interface classes.
     90
     91Two operand classes are defined: a number and a band. There is a need for other types of operands. For example a classifier would map integer values or real number ranges into numbers. Code for such exists in the source but it is not organized to reflect the new approach.
     92
     93A central method is 'compute' in band class, which is basically the effective block loop code presented in the documentation for GDALRasterBand::ReadBlock.
     94
     95Multiple data types are supported by template concrete class for bands and by overloaded get_value method, which returns the value in required data type.
    25896
    25997== Voting history