|Version 28 (modified by qliu, 10 months ago)|
Distance Analysis Tools for PostGIS Raster (PostGIS Raster GSoC 2012 project)
Creating an Euclidean Distance Raster in PostGIS
We want to develop a raster/vector integrated approach to generate a distance raster coverage (optionally a tiled coverage) from a point coverage (could also be lines or polygons coverage) with some raster specifications, representing the Euclidean distance to those points (lines or polygons).
This approach should be reusable and opening the way to further implementation of interpolation raster, since the distance raster is a specific and simpler case of an interpolation raster, in terms of storing the value associated with the nearest point(s) instead of distance to it.
- The source table of geometries (points, line or polygons) can contain one geometry or many (eventually millions). We want the method to scale well whatever the number of source geometry.
- The desired raster is specified with parameters or by referencing an existing raster.
- Sometimes the resolution of the desired raster is so high that the whole raster coverage cannot be stored in one PostgreSQL row. The approach must be able to produce a tiled raster coverage stored into many rows.
- Source geometries might be outside the extent of the desired raster.
- The approach should work well in a number of situations:
- small number of sources vs low resolution raster
- small number of sources vs high resolution raster
- large number of sources vs low resolution
- large number of sources vs high resolution Large raster coverage
- The user can specify a maximum distance to the source. When the source is too far from the geometry it gets assigned a nodata value.
- We want the implementation to be generic enough to be reused to implement more general interpolation methods like nearest neighbor, IDW, spline or kriging. Otherwise we want it to be generic enough to be reused to implement more general cost distance methods.
Different Envisioned Approaches
1. The raster source approach
The first approach is similar to what is found in most GIS package. The source points (or geometries) are converted to a raster of sources and this raster is passed to the function. The function iterates over the pixels (or over the source points???) assigning the distance to the nearest point (or assigning minimum values to pixels within a circle progressively growing around each point???).
Converting one or a set of source geometries to one raster can be performed like this in PostGIS:
raster ST_Union(raster ST_AsRaster(source_geom, list of parameters of raster specifications))
The Euclidean distance from each pixel to its nearest source pixel can then be computed using ST_MapAlgebra.
CREATE FUNCTION euclidean_fct(pos1 integer, pos2 integer) RETURNS float AS $$ BEGIN SELECT ST_Distance(ST_Point(pos1), ST_Point(pos2)) as euclidean_dist; RETURN euclidean_dist; END; $$ LANGUAGE 'plpgsql'; CREATE TABLE distcoverage AS SELECT rid, ST_MapAlgebraFct(rast, 'euclidean_fct(position of computed pixel, position of source point pixel)'::regprocedure) rast FROM myexistingcoveragetable
- In case source table containing only one point (constraint 1), converting source point to raster could utilize ST_AsRaster() to create a raster with the point geometry and raster specifications.
- In case source table containing more than one point but the number of source points is small (constraint 5), computing distance could rescan all the source points and assign the shortest distance to the pixel being computed.
- Simplest approach already implemented in other GIS like GRASS.
- Producing an intermediate raster is costly if the requested raster resolution is very high (constraint 3, 5b & 5d).
- ST_Union could be very inefficient at producing the required raster from a large set of geometries and there is no efficient method to produce such a raster right now in PostGIS (constraints 1, 5c & 5d).
- It could be very inefficient to rescan all the source points to find the one nearest to the current pixel (constraint 6, 7, 8 & 12). We still have to see how GRASS does this efficiently.
- Only rasters which extent contains all the source points could be produced.
- This approach does not answer well to the requirement of developing a generic reusable solution for more interpolation needs (constraint 7).
2. The TIN approach
The second approach involve creating a TIN (triangulated irregular network) from the table of source geometries first and then to use this TIN to determine the nearest neighbors for each pixel.
The TIN is stored temporarily. A function iterating over each pixel determines in which part of the TIN the current pixel falls and determine the triangle corner nearest to the centroid of the pixel.
- Nearest point to pixel could be determined by interesting the pixel centroid with the TIN in order to determine the triangle including the centroid. The nearest point is then determined from the three corner of the triangle.
- It could also be determined by producing the TIN with the set of point including all the pixel centroid. Would this fasten the search for the nearest neighbor? This is still to determine.
- Would this work with a table of line or geometries? How do we produce a TIN from a line or polygon coverage?
- PostGIS might not be ready to produce a TIN from a point coverage.
- This would work well with a large number of source points (constraints 1, 5c & 5d) since we would not have to rescan all the source points for each pixel calculation.
- This approach fulfills constraint 7 in that it would allow easy implementation of other interpolation methods.
Comments from Mentor: Not really. If the raster HAS to be in the extent of the point coverage, as stated in the cons, it means it could not correspond to the arbitrary alignment/extent of an existing coverage. This is probably the main reason the third approach is preferable to the TIN one.
- We have no idea for now about the algorithm and performance of building TIN from source geometries in PostGIS if the number of source point is very large (Constraint 7 & 8)
Comments from Mentor: Right, ST_Within(raster, geometry) does not exist. But you would work with the centroid of the pixel, which is a point and the triangle of the TIN which are polygons and there is no problem doing that with ST_Within. If you would include the pixel centroid in the TIN as stated above, you would nort even have to do that as the centroid and the points would all be in the same TIN structure allowing fast find of neighbors.
- Might be very inefficient and a waste of computing TIN in case there is only one point in the source table (Constraint 1) or there are very small number of source points and the requested raster is relatively small too (Constraint 5).
3. The KNN index approach
The third approach is similar to the second but use the new KNN indexing facilities of PostGIS to determine the nearest neighbor points of each pixel.
CREATE FUNCTION euclidean_fct(raster rast, text dbschema, text sourcetable, text sourcegeomcolumn) RETURNS float AS $$ BEGIN SELECT ST_Distance (ST_GeomFromText('POINT(columnx rowy)',ST_SRID(rast)), ST_Transform(sourcegeom, Find_SRID(dbschema,sourcetable,sourcegeomcolumn), ST_SRID(rast)) as eucliean_dist FROM sourcetable ORDER BY $1 <-> $2 LIMIT 1; RETURN euclidean_dist END; $$ LANGUAGE 'plpgsql'; CREATE TABLE distcoverage AS SELECT rid, ST_MapalgebraFct(rast, 'euclidean_fct(rast, dbschema, sourcetable, sourcegeomcolumn)'::regprocedure) rast FROM myexistingcoveragetable
- Source points do not have to be in the raster extent (constraint 4)
- Should work well with large number of source points (constraint 1, 5c & 5d) since we will be using KNN index to find the nearest point.
- Make it possible to to createa tiled raster coverage aligned to an existing one (constraint 3)
- Opening to be able to reused for further computing interpolation since the method to find neighbors is quite generic.
- Might not be as light as Approach 1 in case there is only one point in the source table (Constraint 1) or there are very small number of source points and the requested raster is relatively small too (Constraint 5)
Comments from Mentor: In that case, should be fast anyway. We could optimize if there is only one source point.
Approach 3 is the preferred approach for the following reasons:
- It exploits and benefits from a very unique and performant new feature of PostGIS: KNN indexing,
- It is compliant with PostGIS Raster powerful raster/vector interactions
- In terms of scalability, it provides the possibility to generate a distance raster from any coverage of point (however numerous they are) since the time to find the nearest neighbor for each pixel is relatively constant (thanks to the KNN index),
- It makes it possible to work with huge tiled coverage (constraint 3) since the extent of the raster is independent of the source extent,
- It provides a reusable approach for other types of distances (Constraint 13) and interpolation.
Tentative functions signatures:
Generate a raster having the same alignment as a reference raster:
ST_EuclideanDistance(raster ref, text sourceschema, text sourcetable, text sourcegeomcolumn, text pixeltype, double precision value=1, double precision nodataval=0)
Generate a raster based on raster specifications:
- Set the dimensions of the raster by providing the parameters of a pixel size (scalex & scaley and skewx & skewy). The width & height of the resulting raster will be adjusted to fit the extent of the geometry
ST_EuclideanDistance(text dbschema, text sourcetable, text sourcegeomcolumn, double precision scalex, double precision scaley, double precision gridx, double precision gridy, text pixeltype, double precision value=1, double precision nodataval=0, double precision skewx=0, double precision skewy=0)
- Fix the dimensions of the raster by providing the dimensions of the raster (width & height). The parameters of the pixel size (scalex & scaley and skewx & skewy) of the resulting raster will be adjusted to fit the extent of the source geometry
ST_EuclideanDistance(text dbschema, text sourcetable, text sourcegeomcolumn, geometry geom, integer width, integer height, double precision gridx, double precision gridy, text pixeltype, double precision value=1, double precision nodataval=0, double precision skewx=0, double precision skewy=0)
- For Constraint 6: (just write down the function that takes a reference raster for example)
ST_EuclideanDistance(text dbschema, text sourcetable, text sourcegeomcolumn, raster ref, text pixeltype, double precision value=1, double precision nodataval=0, double precision maxdistance)
Comments from Mentor: You should integrate this parameter in the more generic signature instead. The idea is to create a generic function with the most complete signature and then to create variants assuming default value for some of the parameters. Only the most complete has to be implemented in C.