Version 5 (modified by qliu, 10 months ago)

--

Distance Analysis Tools for PostGIS Raster (PostGIS Raster GSoC 2012 project)

Creating an Euclidean Distance Raster in PostGIS

Objectives

We want to develop a raster/vector integrated approach to generate a distance raster coverage (optionnaly a tiled coverage) from a point coverage (could also be lines or polygons coverage) with some raster specifications, representing the euclidean distance to those points (lines or polygons).

This approach should be reusable and opening the way to further implementation of interpolation raster, since the distance raster is a specific and simpler case of an interpolation raster, in terms of storing the value associated with the nearest point(s) instead of distance to it.

Constraints

1. The source table of geometries (points, line or polygons) can contain one geometry or many (eventually millions). We want the method to scale well whatever the number of source geometry. 2. The desired raster is specified with parameters or by referencing an existing raster. 3. Sometimes the resolution of the desired raster is so high that the whole raster coverage cannot be stored in one PostgreSQL row. The approach must be able to produce a tiled raster coverage stored into many rows. 4. Source geometries might be outside the extent of the desired raster. 5. The approach should work well in a number of situations: a. small number of sources vs low resolution raster, b. small number of sources vs high resolution raster c. large number of sources vs low resolution d. large number of sources vs high resolution Large raster coverage 6. The user can specify a maximum distance to the source. When the source is too far from the geometry it gets assigned a nodata value. 7. We want the implementation to be generic enough to be reused to implement more general interpolation methods like nearest neighbor, IDW, spline or kriging. Otherwise we want it to be generic enough to be reused to implement more general cost distance methods.

Different Envisioned Approaches

1. The raster source approach

The first approach is similar to what is found in most GIS package. The source points (or geometries) are converted to a raster of sources and this raster is passed to the function. The function iterates over the pixels (or over the source points???) assigning the distance to the nearest point (or assigning minimum values to pixels within a circle progressively growing around each point???).

Converting one or a set of source geometries to one raster can be performed like this in PostGIS: raster ST_Union(raster ST_AsRaster(source_geom, list of parameters for raster specifications))

The Euclidean distance from each pixel to its nearest source pixel can then be computed using ST_MapAlgebra.

CREATE FUNCTION euclidean_fct(pos1 integer[], pos2 integer[])

RETURNS float AS \$\$ BEGIN SELECT ST_Distance(ST_Point(pos1), ST_Point(pos2)) as euclidean_dist;

RETURN euclidean_dist;

END; \$\$ LANGUAGE 'plpgsql';

CREATE TABLE distcoverage AS SELECT rid, ST_MapAlgebraFct(rast, 'euclidean_fct(position of computed pixel, position of source point pixel)'::regprocedure) rast FROM myexistingcoveragetable Remarks: ○ In case source table containing only one point (constraint 1), converting source point to raster could utilize ST_AsRaster() to create a raster with the point geometry and raster specifications. ○ In case source table containing more than one point but the number of source points is small (constraint 5), computing distance could rescan all the source points and assign the shortest distance to the pixel being computed.

Pros ○ Simplest approach already implemented in other GIS like GRASS. ○ Cons: ○ Producing an intermediate raster is costly if the requested raster resolution is very high (constraint 3, 5b & 5d). ○ ST_Union could be very inefficient at producing the required raster from a large set of geometries and there is no efficient method to produce such a raster right now in PostGIS (constraints 1, 5c & 5d). ○ It could be very inefficient to rescan all the source points to find the one nearest to the current pixel (constraint 6, 7, 8 & 12). We still have to see how GRASS does this efficiently. ○ Only rasters which extent contains all the source points could be produced. ○ This approach does not answer well to the requirement of developing a generic reusable solution for more interpolation needs (constraint 7).

2. The TIN approach

The second approach involve creating a TIN (triangulated irregular network) from the table of source geometries first and then to use this TIN to determine the nearest neighbors for each pixel.

The TIN is stored temporarily. A function iterating over each pixel determines in which part of the TIN the current pixel falls and determine the triangle corner nearest to the centroid of the pixel.

s

Remarks: • Nearest point to pixel could be determined by interesting the pixel centroid with the TIN in order to determine the triangle including the centroid. The nearest point is then determined from the three corner of the triangle. • It could also be determined by producing the TIN with the set of point including all the pixel centroid. Would this fasten the search for the nearest neighbor? This is still to determine. • Would this work with a table of line or geometries? How do we produce a TIN from a line or polygon coverage? • PostGIS might not be ready to produce a TIN from a point coverage.

Pros:

○ This would work well with a large number of source points (constraints 1, 5c & 5d) since we would not have to rescan all the source points for each pixel calculation. ○ This approach fulfills constraint 7 in that it would allow easy implementation of other interpolation methods. Cons: ○ We have no idea for now about the algorithm and performance of building TIN from source geometries in PostGIS if the number of source point is very large (Constraint 7 & 8) ○ Might be very inefficient and a waste of computing TIN in case there is only one point in the source table (Constraint 1) or there are very small number of source points and the requested raster is relatively small too (Constraint 5).

3. The KNN index approach

The third approach is similar to the second but use the new KNN indexing facilities of PostGIS to determine the nearest neighbor points of each pixel.

CREATE FUNCTION euclidean_fct(raster rast, text dbschema, text sourcetable, text sourcegeomcolumn)

RETURNS float AS \$\$ BEGIN

SELECT ST_Distance (ST_GeomFromText('POINT(columnx rowy)',ST_SRID(rast)), ST_Transform(sourcegeom, Find_SRID(dbschema,sourcetable,sourcegeomcolumn), ST_SRID(rast)) as eucliean_dist FROM sourcetable ORDER BY \$1 <-> \$2 LIMIT 1; RETURN euclidean_dist

END; \$\$ LANGUAGE 'plpgsql';

CREATE TABLE distcoverage AS SELECT rid, ST_MapalgebraFct(rast, 'euclidean_fct(rast, dbschema, sourcetable, sourcegeomcolumn)'::regprocedure) rast FROM myexistingcoveragetable

Pros:

○ Source points do not have to be in the raster extent (constraint 4) ○ Should work well with large number of source points (constraint 1, 5c & 5d) since we will be using KNN index to find the nearest point. ○ Make it possible to to createa tiled raster coverage aligned to an existing one (constraint 3) ○ Opening to be able to reused for further computing interpolation since the method to find neighbors is quite generic.

Cons:

○ Might not be as light as Approach 1 in case there is only one point in the source table (Constraint 1) or there are very small number of source points and the requested raster is relatively small too (Constraint 5) ○ Have no idea how to deal with Constraint 9.

Preferred Approach

Approach 3 is the preferred approach for the following reasons: ● it exploits and benefits from a very unique and performant new feature of PostGIS: KNN indexing, ● it is compliant with PostGIS Raster powerful raster/vector interactions ● in terms of scalability, it provides the possibility to generate a distance raster from any coverage of point (however numerous they are) since the time to find the nearest neighbor for each pixel is relatively constant (thanks to the KNN index), ● it makes it possible to work with huge tiled coverage (constraint 3) since the extent of the raster is independent of the source extent, ● it provides a reusable approach for other types of distances (Constraint 13) and interpolation.

Tentative functions signatures:

Generate a raster having the same alignment as a reference raster:

ST_EuclideanDistance(raster ref, text sourceschema, text sourcetable, text sourcegeomcolumn, text pixeltype, double precision value=1, double precision nodataval=0)

Generate a raster based on raster specifications:

● Set the dimensions of the raster by providing the parameters of a pixel size (scalex & scaley and skewx & skewy). The width & height of the resulting raster will be adjusted to fit the extent of the geometry

ST_EuclideanDistance(text dbschema, text sourcetable, text sourcegeomcolumn, double precision scalex, double precision scaley, double precision gridx, double precision gridy, text pixeltype, double precision value=1, double precision nodataval=0, double precision skewx=0, double precision skewy=0)

● Fix the dimensions of the raster by providing the dimensions of the raster (width & height). The parameters of the pixel size (scalex & scaley and skewx & skewy) of the resulting raster will be adjusted to fit the extent of the source geometry

ST_EuclideanDistance(text dbschema, text sourcetable, text sourcegeomcolumn, geometry geom, integer width, integer height, double precision gridx, double precision gridy, text pixeltype, double precision value=1, double precision nodataval=0, double precision skewx=0, double precision skewy=0)

For Constraint 6: (just write down the function that takes a reference raster for example) ST_EuclideanDistance(text dbschema, text sourcetable, text sourcegeomcolumn, raster ref, text pixeltype, double precision value=1, double precision nodataval=0, double precision maxdistance)