#931 closed task (fixed)
[raster] ST_Mean
Reported by: | dustymugs | Owned by: | dustymugs |
---|---|---|---|
Priority: | medium | Milestone: | PostGIS 2.0.0 |
Component: | raster | Version: | master |
Keywords: | history | Cc: |
Description
A function to get the arithmetic mean of a raster's band.
- ST_Mean(rast raster, nband int, ignore_nodata boolean) → double
returns the mean
nband: index of band to process on
ignore_nodata: if TRUE, any pixel whose value is nodata is ignored.
ST_Mean(rast, 2, TRUE)
- ST_Mean(rast raster, nband int) → double
assumes ignore_nodata = TRUE
ST_Mean(rast, 2)
- ST_Mean(rast raster, ignore_nodata boolean) → double
assumes band index = 1
ST_Mean(rast, FALSE)
- ST_Mean(rast raster) → double
assumes band index = 1 and ignore_nodata = TRUE
ST_Mean(rast)
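The effect of ignore_nodata can be illustrated outside the database. The following is only a minimal Python sketch, not the PostGIS implementation: `band_mean`, the flat list of pixel values and the nodata value -9999 are all assumptions made for illustration.

```python
from typing import Optional, Sequence


def band_mean(pixels: Sequence[float],
              nodata: Optional[float],
              ignore_nodata: bool = True) -> Optional[float]:
    """Arithmetic mean of a band's pixel values (hypothetical helper)."""
    if ignore_nodata and nodata is not None:
        # Drop every pixel whose value equals the band's nodata value.
        values = [p for p in pixels if p != nodata]
    else:
        # Treat nodata pixels like any other value.
        values = list(pixels)
    if not values:
        return None  # nothing left to average
    return sum(values) / len(values)


band = [1.0, 2.0, -9999.0, 3.0]  # made-up pixel values
mean_ignoring = band_mean(band, nodata=-9999.0)
mean_including = band_mean(band, nodata=-9999.0, ignore_nodata=False)
print(mean_ignoring, mean_including)
```

With ignore_nodata = FALSE the nodata value takes part in the average and can dominate the result, which is why the shorter signatures default to TRUE.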
Four approximation functions are also proposed. They sacrifice some accuracy for speed, especially on large rasters (e.g. 10000 x 10000).
- ST_ApproxMean(rast raster, nband int, ignore_nodata boolean, sample_percent double precision) → double
sample_percent: a value between 0 and 1 indicating the fraction of the raster band's pixels to consider when computing the mean.
ST_ApproxMean(rast, 3, FALSE, 0.1)
ST_ApproxMean(rast, 1, TRUE, 0.5)
- ST_ApproxMean(rast raster, ignore_nodata boolean, sample_percent double precision) → double
assumes that nband = 1
ST_ApproxMean(rast, FALSE, 0.01)
ST_ApproxMean(rast, TRUE, 0.025)
- ST_ApproxMean(rast raster, sample_percent double precision) → double
assumes that nband = 1 and ignore_nodata = TRUE
ST_ApproxMean(rast, 0.25)
- ST_ApproxMean(rast raster) → double
assumes that nband = 1, ignore_nodata = TRUE and sample_percent = 0.1
ST_ApproxMean(rast)
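The idea behind sample_percent can be sketched as averaging a random subset of the pixels. This is a hedged illustration only: the ticket does not say how pixels are sampled, so the uniform random sampling, the `approx_mean` name and the `seed` parameter below are assumptions.

```python
import random
from typing import Optional, Sequence


def approx_mean(pixels: Sequence[float],
                sample_percent: float = 0.1,
                seed: Optional[int] = None) -> Optional[float]:
    """Mean of a uniform random sample of the pixels (hypothetical helper)."""
    if not 0 < sample_percent <= 1:
        raise ValueError("sample_percent must be in (0, 1]")
    values = list(pixels)
    if not values:
        return None
    # Number of pixels to inspect; always look at at least one.
    k = max(1, round(len(values) * sample_percent))
    rng = random.Random(seed)
    sample = rng.sample(values, k)  # sampling without replacement
    return sum(sample) / k


pixels = [float(i) for i in range(1000)]  # made-up band values
exact = approx_mean(pixels, 1.0)          # sample_percent = 1 uses every pixel
rough = approx_mean(pixels, 0.1, seed=1)  # inspects only 100 pixels
print(exact, rough)
```

A smaller sample_percent reads fewer pixels and so runs faster, at the cost of a result that varies from sample to sample.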
Attachments (1)
Change History (10)
follow-up: 2 comment:1 by , 14 years ago
comment:2 by , 14 years ago
Replying to pracine:
How will this work on a tiled raster coverage?
Would there be any issues? If the tiles were brought together using ST_Union or ST_Accum, it would be no different from passing a single tile. I may not be understanding something…
follow-up: 4 comment:3 by , 14 years ago
Using ST_Union on a 1 TB coverage would mean creating a one-row, 1 TB object… I don't think we want to do that. ST_Union should be used to generate a bigger raster from a limited number of tiles, or in a set-returning function that itself returns only tiles.
I guess the preferred general technique for making a global raster coverage function is to create an ST_Mean function that accepts a table name and a raster column name and builds an EXECUTE query over all the rasters of that table. Something like:
SELECT ST_Mean('myrastercoveragetable', 'myrastercolumn')
The code of this function would look very much like the one I prototyped at the end of raster\scripts\plpgsql\st_histogram.sql. For ST_Mean, see example 7.
comment:4 by , 14 years ago
Good point on the absolutely obscene coverage. I think your ST_Mean prototype would need to be expanded to encompass all the other summary stat functions: ST_SummaryStats, ST_MinMax and ST_StdDev.
SELECT ST_Mean('myrastercoveragetable', 'myrastercolumn')
One thing that does concern me about the prototype above is the lack of flexibility in filtering the coverage table on the fly. Maybe something like
CREATE TYPE coveragestats AS (
    rast raster,
    nband integer,
    hasnodata boolean,
    sample_percent double precision
);

CREATE OR REPLACE FUNCTION st_mean(coveragestats[])
RETURNS double precision AS $$
DECLARE
BEGIN
    -- some code
END;
$$ LANGUAGE 'plpgsql';

SELECT st_mean((
    SELECT array_agg(stats)
    FROM (
        SELECT ROW(rast, 1, FALSE, 1)::coveragestats AS stats
        FROM tmax_2011
        WHERE observation_date = '2011-05-01'
    ) AS t
));
The above would give the user the most flexibility and control over what is evaluated in the function.
I took a look at example 7 of st_histogram.sql and have only one comment/note…
In the implementation I'm thinking of, a weighted mean may work better, plus it goes hand-in-hand with a weighted standard deviation.
http://en.wikipedia.org/wiki/Weighted_mean
As I've been looking at the ST_MinMax function and refactoring it for the summary stats, I envision that the summary stats will be the base stats function call. The histogram and quantile functions will depend upon the base stats as the histogram needs the min/max values to set the bounds for the number of bins and the quantile function needs the ordered series of values to determine a percentile's value. The reason I believe this is the right way to go is that the summary stats can be computed in one pass. Histograms and quantiles usually require a second pass so I'd rather keep them as separate functions that are called if needed.
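The one-pass versus two-pass distinction can be sketched in Python. This is only an illustration under stated assumptions, not the planned C implementation: `summary_stats` and `histogram` are hypothetical names, Welford's online update is one possible single-pass technique (the ticket does not name one), and the standard deviation shown is the population form.

```python
import math
from typing import Dict, Iterable, List, Optional, Sequence


def summary_stats(values: Iterable[float]) -> Optional[Dict[str, float]]:
    """Count, sum, mean, stddev, min and max in a single pass (Welford update)."""
    count, total, mean, m2 = 0, 0.0, 0.0, 0.0
    lo, hi = math.inf, -math.inf
    for v in values:
        count += 1
        total += v
        delta = v - mean
        mean += delta / count
        m2 += delta * (v - mean)  # running sum of squared deviations
        lo, hi = min(lo, v), max(hi, v)
    if count == 0:
        return None
    return {"count": count, "sum": total, "mean": mean,
            "stddev": math.sqrt(m2 / count),  # population form (assumption)
            "min": lo, "max": hi}


def histogram(values: Sequence[float], stats: Dict[str, float], bins: int) -> List[int]:
    """Second pass: bin the values using min/max from the first pass."""
    width = (stats["max"] - stats["min"]) / bins
    counts = [0] * bins
    for v in values:
        # Clamp so that the maximum value lands in the last bin.
        i = 0 if width == 0 else min(int((v - stats["min"]) / width), bins - 1)
        counts[i] += 1
    return counts


data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]  # made-up pixel values
stats = summary_stats(data)
hist = histogram(data, stats, bins=2)
print(stats, hist)
```

The histogram cannot place a value in a bin until min and max are known, which is why it needs the base stats first.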
comment:5 by , 14 years ago
Status: | new → assigned |
---|
I've decided to go with your suggestion Pierre.
ST_Mean(rastertable text, rastercolumn text)
I realized as I was playing around with various possibilities that your way was the only way in which tiles would be loaded one at a time and not overly tax memory.
So, a set of variations of ST_Mean and ST_ApproxMean for handling large sets of raster tiles:
- ST_Mean(rastertable text, rastercolumn text, nband int, hasnodata boolean) → double precision
ST_Mean('tmax_2010', 'rast', 1, FALSE)
ST_Mean('precip_2011', 'rast', 1, TRUE)
- ST_Mean(rastertable text, rastercolumn text, nband int) → double precision
hasnodata is set to FALSE
ST_Mean('tmax_2010', 'rast', 1)
- ST_Mean(rastertable text, rastercolumn text, hasnodata boolean) → double precision
nband is set to 1
ST_Mean('precip_2011', 'rast', TRUE)
- ST_Mean(rastertable text, rastercolumn text) → double precision
nband is set to 1 and hasnodata is set to FALSE
ST_Mean('tmin_2009', 'rast')
Variations for ST_ApproxMean are:
- ST_ApproxMean(rastertable text, rastercolumn text, nband int, hasnodata boolean, sample_percent double precision) → double precision
ST_ApproxMean('tmax_2010', 'rast', 1, FALSE, 0.5)
ST_ApproxMean('precip_2011', 'rast', 1, TRUE, 0.2)
- ST_ApproxMean(rastertable text, rastercolumn text, nband int, sample_percent double precision) → double precision
hasnodata is set to FALSE
ST_ApproxMean('tmax_2010', 'rast', 1, 0.5)
ST_ApproxMean('precip_2011', 'rast', 1, 0.2)
- ST_ApproxMean(rastertable text, rastercolumn text, hasnodata boolean, sample_percent double precision) → double precision
nband is set to 1
ST_ApproxMean('tmax_2010', 'rast', FALSE, 0.5)
ST_ApproxMean('precip_2011', 'rast', TRUE, 0.2)
- ST_ApproxMean(rastertable text, rastercolumn text, sample_percent double precision) → double precision
nband is set to 1 and hasnodata is set to FALSE
ST_ApproxMean('tmax_2010', 'rast', 0.5)
ST_ApproxMean('precip_2011', 'rast', 0.2)
- ST_ApproxMean(rastertable text, rastercolumn text) → double precision
nband is set to 1, hasnodata is set to FALSE and sample_percent is set to 0.1
ST_ApproxMean('tmax_2010', 'rast')
ST_ApproxMean('precip_2011', 'rast')
Similar variations will be provided for ST_SummaryStats, ST_StdDev and ST_MinMax and corresponding approximation functions.
comment:6 by , 14 years ago
The mean returned by the coverage functions is a weighted mean of the means of the individual raster tiles.
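A weighted mean of per-tile means can be sketched as follows. The comment does not state the weights, so this example assumes each tile is weighted by the number of pixels that contributed to its mean. `coverage_mean` and the (count, mean) pairs are hypothetical, not taken from the patch.

```python
from typing import Iterable, Optional, Tuple


def coverage_mean(tile_stats: Iterable[Tuple[int, Optional[float]]]) -> Optional[float]:
    """Combine per-tile (pixel_count, mean) pairs into one coverage-wide mean."""
    total_count = 0
    weighted_sum = 0.0
    # Tiles are consumed one at a time, so only one needs to be held in memory.
    for count, mean in tile_stats:
        if count == 0 or mean is None:
            continue  # a tile with no counted pixels contributes nothing
        total_count += count
        weighted_sum += count * mean
    if total_count == 0:
        return None
    return weighted_sum / total_count


tiles = [(100, 10.0), (300, 20.0)]  # made-up (pixel count, tile mean) pairs
result = coverage_mean(tiles)
print(result)
```

Averaging the two tile means directly would give 15.0. Weighting by pixel count gives the same answer as averaging all 400 pixels at once.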
by , 14 years ago
Attachment: | st_mean.patch added |
---|
Incremental patch for ST_Mean. ST_SummaryStats patch is required for this patch.
comment:7 by , 14 years ago
Adds ST_Mean function, which builds upon ST_SummaryStats. Merges cleanly against r7145.
The following patches must be merged first for this patch:
- ST_Band
- ST_SummaryStats
comment:8 by , 14 years ago
Keywords: | history added |
---|---|
Resolution: | → fixed |
Status: | assigned → closed |
Added in r7149
comment:9 by , 13 years ago
Milestone: | PostGIS Raster Future → PostGIS 2.0.0 |
---|