wiki:DevWikiSpatialCollectionTutorial

Version 1 (modified by bnordgren, 10 years ago)


Introduction

This page quickly demonstrates the major features of the spatial collection concept, which is designed to erase the distinction between raster and vector data types. Using the spatial collection concept, developers can write code which performs useful spatial functions without regard for the nature of the data on which it operates.

The fundamental premise of the spatial collection design is that all spatial data, no matter the form it takes, can provide answers to these two questions:

  1. Is point P included?
  2. What is the value at point P?

In its current form, the implementation of spatial collection always provides an answer to question 2, although strictly speaking that answer is optional: a purely geometric object may have no value information to provide. Currently, the wrapper for purely geometric objects returns one of two user-specified values: an "inside" value if point P is included, and an "outside" value if it is not. The wrapper for a raster object actually looks up the value at the provided point.
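The inside/outside behavior of a geometry wrapper can be illustrated with a minimal sketch. Everything named here (point2d, circle_collection, circle_includes, circle_value) is hypothetical and is not part of the spatial collection implementation; a circle stands in for an arbitrary geometry, and a single number stands in for the value.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical types for illustration only; not the real API. */
typedef struct { double x, y; } point2d;

/* A purely geometric object: a circle with no values of its own. */
typedef struct {
    point2d center;
    double  radius;
    double  inside_value;   /* user-specified value reported inside */
    double  outside_value;  /* user-specified value reported outside */
} circle_collection;

/* Question 1: is point P included? Points on the boundary count as
 * included in this sketch. */
static bool circle_includes(const circle_collection *c, point2d p)
{
    double dx, dy;

    assert(c != NULL);
    dx = p.x - c->center.x;
    dy = p.y - c->center.y;
    return dx * dx + dy * dy <= c->radius * c->radius;
}

/* Question 2: what is the value at point P? The geometry has no data,
 * so the answer is one of the two user-specified values. */
static double circle_value(const circle_collection *c, point2d p)
{
    return circle_includes(c, p) ? c->inside_value : c->outside_value;
}
```

A caller would fill in a circle_collection once, then ask either question for any point without knowing that the underlying object is a geometry.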

The value returned by a spatial collection is restricted to numeric types. An array of numbers (i.e., a vector) is returned every time a point is queried. A given spatial collection will return a vector of the same length no matter what point is queried, and the ordinal position in the vector is significant (e.g., the item at index zero always represents the same thing). When wrapping a raster type, the vector could represent values from one or more selected bands at point P. While an individual spatial collection will always return a vector of the same length, a different spatial collection may return a vector of a different length, where the elements represent different quantities. The salient point is that all locations within a single spatial collection are consistently reported, but a different collection may report them differently.
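To make the fixed-length value vector concrete, here is a small sketch of a raster-like collection. The grid_collection type, its tiny hard-coded dimensions, and the unit-pixel coordinate system are all assumptions made for illustration; the real raster wrapper is set up differently.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define GRID_W     2
#define GRID_H     2
#define GRID_BANDS 3

/* Hypothetical in-memory raster: unit pixels, origin at (0, 0). */
typedef struct {
    double cells[GRID_BANDS][GRID_H][GRID_W]; /* indexed [band][row][col] */
    int    selected[GRID_BANDS];              /* band indices to report */
    size_t num_selected;                      /* length of every result */
} grid_collection;

/* Question 1: does (x, y) fall on the grid? */
static bool grid_includes(const grid_collection *g, double x, double y)
{
    (void) g; /* the extent is fixed by the macros in this sketch */
    return x >= 0.0 && x < GRID_W && y >= 0.0 && y < GRID_H;
}

/* Question 2: write num_selected values into out. Slot i always holds
 * the value of band selected[i], whatever point is queried.
 * Returns false (and leaves out untouched) when the point is outside. */
static bool grid_value(const grid_collection *g, double x, double y,
                       double *out)
{
    size_t i;

    assert(g != NULL && out != NULL);
    if (!grid_includes(g, x, y))
        return false;
    for (i = 0; i < g->num_selected; i++)
        out[i] = g->cells[g->selected[i]][(int) y][(int) x];
    return true;
}
```

Two collections built this way may select different bands and therefore return vectors of different lengths, but each one is consistent with itself.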

The general strategy for using the spatial collection framework is to wrap the actual data item with a spatial collection early, then write the majority of the working code against the spatial collection interface.

Wrapping rasters

Here we demonstrate a typical workflow where a function accepts two rasters. The first order of business is to retrieve the rasters and some "helper" data from PostgreSQL, and then to wrap them. We begin with retrieval.

        /* r1 is null, return null */
        if (PG_ARGISNULL(0)) PG_RETURN_NULL();
        r1_pg = (rt_pgraster *) PG_DETOAST_DATUM_SLICE(PG_GETARG_DATUM(0), 0, sizeof(struct rt_raster_serialized_t));

        /* r2 is null, return null */
        if (PG_ARGISNULL(1)) PG_RETURN_NULL();
        r2_pg = (rt_pgraster *) PG_DETOAST_DATUM_SLICE(PG_GETARG_DATUM(1), 0, sizeof(struct rt_raster_serialized_t));

....

In the interim (elided above), we load a list of desired bands from each raster, as well as a (possibly null) index to the band we should use as a nodata value. We then wrap the rasters as follows:

        /* wrap r1 in a spatial collection */
        if (r1_hasnodata) {
                r1_sc = sc_create_pgraster_wrapper_nodata(r1_pg,
                                r1_bands, r1_bandnum, r1_nodata) ;
        } else {
                r1_sc = sc_create_pgraster_wrapper(r1_pg, r1_bands, r1_bandnum) ;
        }

        /* wrap r2 in a spatial collection */
        if (r2_hasnodata) {
                r2_sc = sc_create_pgraster_wrapper_nodata(r2_pg,
                                r2_bands, r2_bandnum, r2_nodata) ;
        } else {
                r2_sc = sc_create_pgraster_wrapper(r2_pg, r2_bands, r2_bandnum) ;
        }

Note that this version of the raster wrapper accepts the serialized raster (the rt_pgraster retrieved above). The wrapper takes care of deserializing it and making the spatial collection object ready to answer the two questions listed above. There is also a version of the raster wrapper which takes a deserialized rt_raster, if you have one.

Note also that there are two versions of the raster wrapper: one which supports a "nodata" lookup and the other which assumes that all pixels contain data. The latter version is a little simpler to set up, and may offer a performance advantage. Both versions offer the opportunity to specify a list of bands which should be returned when a point is queried. If the list of bands (r1_bands or r2_bands, above) is not provided, then all of the raster's bands are returned.
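The "no band list means all bands" rule can be sketched as follows. The helper name resolve_bands and the zero-based band numbering are assumptions for illustration; this is not how the wrapper is actually written.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical helper: returns a newly allocated list of band indices.
 * If the caller supplied a list, it is copied; if bands is NULL, every
 * band of the raster is selected in order (zero-based in this sketch).
 * The caller frees the result. Returns NULL on allocation failure. */
static int *resolve_bands(const int *bands, size_t num_bands,
                          size_t raster_bands, size_t *out_count)
{
    size_t i, n;
    int *result;

    assert(out_count != NULL);
    n = (bands != NULL) ? num_bands : raster_bands;

    /* allocate at least one element so an empty list is not mistaken
     * for an allocation failure */
    result = malloc((n > 0 ? n : 1) * sizeof *result);
    if (result == NULL) {
        *out_count = 0;
        return NULL;
    }
    for (i = 0; i < n; i++)
        result[i] = (bands != NULL) ? bands[i] : (int) i;

    *out_count = n;
    return result;
}
```

The resolved count is also the length of the value vector the collection would report for every point.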

Spatial Relationships as Spatial Collections

At this point, we have two spatial collections, not two rasters. As long as we only write code which uses the spatial collection interface, we do not have to change the code to handle different data types. Observe how the set of spatial relationships (intersection, union, difference and symdifference) is implemented against the spatial collection interface:

        /* create the evaluator */
        eval = sc_create_first_value_evaluator(r1_sc, r2_sc) ;
        if (eval != NULL) {
                /* create the "relation" operator */
                relation_op = sc_create_sync_relation_op(SPATIAL_PLUS_VALUE,
                                r1_sc, r2_sc, relation, eval) ;

Our intent here is to create a single spatial collection (relation_op), which answers the two questions above based on a spatial operation applied to two input collections. The answer to "Is point P included?" is straightforward. However, we are left with a choice as to how to generate the value associated with point P. In this case, we instantiate a "first value" evaluator, which will return the value from r1_sc if possible, and otherwise will return the value of r2_sc.
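The "first value" rule itself is simple enough to sketch in isolation. The sc_answer type and the first_value function below are hypothetical stand-ins, not the evaluator interface used by sc_create_first_value_evaluator; the sketch also assumes that the caller knows the expected vector length in advance.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for what one input collection reports at P. */
typedef struct {
    bool          included; /* answer to "is point P included?" */
    const double *values;   /* value vector, meaningful when included */
    size_t        length;   /* number of entries in values */
} sc_answer;

/* Copy the value of the first input that includes the point into out.
 * Returns false if neither input includes the point, or if the chosen
 * input's vector length does not match out_len. */
static bool first_value(const sc_answer *a, const sc_answer *b,
                        double *out, size_t out_len)
{
    const sc_answer *src;
    size_t i;

    assert(a != NULL && b != NULL && out != NULL);
    if (a->included)
        src = a;        /* prefer the first input when possible */
    else if (b->included)
        src = b;        /* otherwise fall back to the second */
    else
        return false;

    if (src->length != out_len)
        return false;
    for (i = 0; i < out_len; i++)
        out[i] = src->values[i];
    return true;
}
```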

In the above, sc_create_first_value_evaluator instantiates an evaluator (one of the two interfaces related to a spatial collection) to provide answers to the value question. Although the choice of evaluator is hardcoded in this instance, it does not have to be. This evaluator is then used when sc_create_sync_relation_op instantiates relation_op, and it determines how relation_op generates and returns values when points are queried.

The salient point here is that value logic is separated from spatial logic. Different methods of determining values may be implemented and passed to the relation operator. This allows the relation operator code to be reused in other contexts (perhaps as a different function exposed to the user, or as a utility object performing some supporting computation). Likewise, the first-value evaluator may be reused as a component of any spatial collection where that behavior is appropriate.
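One way to picture this separation is a relation operator which decides inclusion on its own and delegates the value to a function pointer. All names below (relation_type, evaluator_fn, relation_query, and so on) are invented for this sketch, and values are reduced to a single number to keep it short; the real operator works with the two spatial collection interfaces instead.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef enum {
    REL_INTERSECTION, REL_UNION, REL_DIFFERENCE, REL_SYMDIFFERENCE
} relation_type;

/* Spatial logic: combine the two inclusion answers. */
static bool relation_includes(relation_type rel, bool in_a, bool in_b)
{
    switch (rel) {
    case REL_INTERSECTION:  return in_a && in_b;
    case REL_UNION:         return in_a || in_b;
    case REL_DIFFERENCE:    return in_a && !in_b;
    case REL_SYMDIFFERENCE: return in_a != in_b;
    }
    return false;
}

/* Value logic: any function with this signature can be plugged in. */
typedef double (*evaluator_fn)(bool in_a, double va, bool in_b, double vb);

/* Take the first input's value when available, else the second's. */
static double eval_first(bool in_a, double va, bool in_b, double vb)
{
    (void) in_b;
    return in_a ? va : vb;
}

/* Add up the values of whichever inputs include the point. */
static double eval_sum(bool in_a, double va, bool in_b, double vb)
{
    return (in_a ? va : 0.0) + (in_b ? vb : 0.0);
}

/* The operator knows nothing about how values are produced. */
static bool relation_query(relation_type rel, evaluator_fn eval,
                           bool in_a, double va, bool in_b, double vb,
                           double *out)
{
    assert(eval != NULL && out != NULL);
    if (!relation_includes(rel, in_a, in_b))
        return false;
    *out = eval(in_a, va, in_b, vb);
    return true;
}
```

Swapping eval_first for eval_sum changes the reported values without touching relation_includes, and either evaluator could be reused by a different operator.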

Each of these spatial collections carries with it a GBOX named extent. When relation_op was instantiated, an approximate (or expected) extent was calculated based on the specified relation (intersection, union...) and the extents of the two inputs. It is important to point out that, at this point, we have set up an operator but have not performed any actual computations. Because the extent of relation_op is derived only from the input extents, it may be larger than the area the result actually covers, so treat it as an estimate rather than an exact footprint.
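As an illustration of why such an extent is only approximate, here is one plausible way to estimate it from two bounding boxes. This is an assumption, not a description of the actual calculation: box2d is a simplified stand-in for GBOX, and the rule chosen for each relation is a guess at a reasonable estimate.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Simplified two-dimensional stand-in for a GBOX. */
typedef struct { double xmin, ymin, xmax, ymax; } box2d;

typedef enum {
    EXT_INTERSECTION, EXT_UNION, EXT_DIFFERENCE, EXT_SYMDIFFERENCE
} extent_relation;

static double dmin(double a, double b) { return a < b ? a : b; }
static double dmax(double a, double b) { return a > b ? a : b; }

/* Estimate the extent of "a <rel> b" from the input extents alone.
 * Returns false when the estimate is empty (disjoint intersection). */
static bool approx_extent(extent_relation rel, const box2d *a,
                          const box2d *b, box2d *out)
{
    assert(a != NULL && b != NULL && out != NULL);
    switch (rel) {
    case EXT_INTERSECTION:
        /* overlap of the two boxes */
        out->xmin = dmax(a->xmin, b->xmin);
        out->ymin = dmax(a->ymin, b->ymin);
        out->xmax = dmin(a->xmax, b->xmax);
        out->ymax = dmin(a->ymax, b->ymax);
        return out->xmin <= out->xmax && out->ymin <= out->ymax;
    case EXT_DIFFERENCE:
        /* nothing outside a can survive, so a's box is a safe estimate */
        *out = *a;
        return true;
    case EXT_UNION:
    case EXT_SYMDIFFERENCE:
        /* box enclosing both inputs */
        out->xmin = dmin(a->xmin, b->xmin);
        out->ymin = dmin(a->ymin, b->ymin);
        out->xmax = dmax(a->xmax, b->xmax);
        out->ymax = dmax(a->ymax, b->ymax);
        return true;
    }
    return false;
}
```

Under these rules the estimate can clearly be too big: the difference of two identical inputs contains no points at all, yet the estimated extent is still the whole of the first input's box.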