Opened 11 years ago

Closed 11 years ago

# [raster] ST_Mean

Reported by: Owned by: Bborie Park Bborie Park medium PostGIS 2.0.0 raster master history

### Description

A function to get the arithmetic mean of a raster's band.

1. ST_Mean(rast raster, nband int, ignore_nodata boolean) -> double

returns the mean

nband: index of band to process on

ignore_nodata: if TRUE, any pixel who's value is nodata is ignored.

```ST_Mean(rast, 2, TRUE)
```
1. ST_Mean(rast raster, nband int) -> double

assumes ignore_nodata = TRUE

```ST_Mean(rast, 2)
```
1. ST_Mean(rast raster, ignore_nodata boolean) -> double

assumes band index = 1

```ST_Mean(rast, FALSE)
```
1. ST_Mean(rast raster) -> double

assumes band index = 1 and ignore_nodata = TRUE

```ST_Mean(rast)
```

Four approximation functions are also proposed sacrificing some accuracy for speed, especially on large rasters (10000 x 10000).

1. ST_ApproxMean(rast raster, nband int, ignore_nodata boolean, sample_percent double precision) -> double

sample_percent: a value between 0 and 1 indicating the percentage of the raster band's pixels to consider when determining the min/max pair.

```ST_ApproxMean(rast, 3, FALSE, 0.1)

ST_ApproxMean(rast, 1, TRUE, 0.5)
```
1. ST_ApproxMean(rast raster, ignore_nodata boolean, sample_percent double precision) -> double

assumes that nband = 1

```ST_ApproxMean(rast, FALSE, 0.01)

ST_ApproxMean(rast, TRUE, 0.025)
```
1. ST_ApproxMean(rast raster, sample_percent double precision) -> double

assumes that nband = 1 and ignore_nodata = TRUE

```ST_ApproxMean(rast, 0.25)
```
1. ST_ApproxMean(rast raster) -> double

assumes that nband = 1, ignore_nodata = TRUE and sample_percent = 0.1

```ST_ApproxMean(rast)
```

### comment:1 follow-up:  2 Changed 11 years ago by pracine

How will this work on a tiled raster coverage?

### comment:2 in reply to:  1 Changed 11 years ago by Bborie Park

How will this work on a tiled raster coverage?

Would there be any issues? If the tiles were brought together using ST_Union or ST_Accum, it would be no different as passing a single tile. I may not understanding something...

### comment:3 follow-up:  4 Changed 11 years ago by pracine

Using ST_Union on a 1 TB coverage would mean creating a one row, 1 TB objects... I don't think we want to do that. ST_Union should be used to generate bigger raster from a mimited number of tiles. Or in a SET function returning itself only tiles.

I guess the preferred general technique to make a global raster coverage function is to create a ST_Mean function accepting a table name and a raster column name and have this function to build a EXECUTE query over all the raster of that table. Something like:

SELECT ST_Mean('myrastercoveragetable', 'myrastercolumn')

The code of this function would look very much like the one I prototyped at the end of raster\scripts\plpgsql\st_histogram.sql. In the case of ST_Mean, example 7.

### comment:4 in reply to:  3 Changed 11 years ago by Bborie Park

Good point on the absolutely obscene coverage. I think your ST_Mean prototype would need to be expanded out to encompass all the other summary stat functions, ST_SummaryStats, ST_MinMax and ST_StdDev.

SELECT ST_Mean('myrastercoveragetable', 'myrastercolumn')

One thing that does concern me about the prototype above is the lack of flexibility in filtering the coverage table on the fly. Maybe something like

```CREATE TYPE summarystatsarg AS (
rast raster,
nband integer,
hasnodata boolean,
sample_percent double precision
);

CREATE OR REPLACE FUNCTION st_mean(coveragestats[])
RETURNS int
AS \$\$
DECLARE
BEGIN
-- some code
END;
\$\$ LANGUAGE 'plpgsql';

SELECT st_mean((
SELECT
array_agg(stats)
FROM (
SELECT
ROW(rast, 1, FALSE, 1)::coveragestats AS stats
FROM tmax_2011
WHERE observation_date = '2011-05-01'
) AS t
));
```

The above would give the user the most amount of flexibility and control over what is evaluated in the function.

I took a look at example 7 of st_histogram.sql and have only one comment/note...

In the implementation I'm thinking of, a weighed mean may work better plus it goes hand-in-hand with a weighed standard deviation.

As I've been looking at the ST_MinMax function and refactoring it for the summary stats, I envision that the summary stats will be the base stats function call. The histogram and quantile functions will depend upon the base stats as the histogram needs the min/max values to set the bounds for the number of bins and the quantile function needs the ordered series of values to determine a percentile's value. The reason I believe this is the right way to go is that the summary stats can be computed in one pass. Histograms and quantiles usually require a second pass so I'd rather keep them as separate functions that are called if needed.

### comment:5 Changed 11 years ago by Bborie Park

Status: new → assigned

I've decided to go with your suggestion Pierre.

```ST_Mean(rastertable text, rastercolumn text)
```

I realized as I was playing around with various possibilities that your way was the only way that a tile would be loaded one at a time and not overtly tax memory.

So, a set of variations of ST_Mean and ST_ApproxMean for handling large sets of raster tiles:

1. ST_Mean(rastertable text, rastercolumn text, nband int, hasnodata boolean) -> double precision
```ST_Mean('tmax_2010', 'rast', 1, FALSE)

ST_Mean('precip_2011', 'rast', 1, TRUE)
```
1. ST_Mean(rastertable text, rastercolumn text, nband int) -> double precision

hasnodata is set to FALSE

```ST_Mean('tmax_2010', 'rast', 1)
```
1. ST_Mean(rastertable text, rastercolumn text, hasnodata boolean) -> double precision

nband is set to 1

```ST_Mean('precip_2011', 'rast', TRUE)
```
1. ST_Mean(rastertable text, rastercolumn text) -> double precision

nband is set to 1 and hasnodata is set to FALSE

```ST_Mean('tmin_2009', 'rast')
```

Variations for ST_ApproxMean are:

1. ST_ApproxMean(rastertable text, rastercolumn text, nband int, hasnodata boolean, sample_percent double precision) -> double precision
```ST_ApproxMean('tmax_2010', 'rast', 1, FALSE, 0.5)

ST_ApproxMean('precip_2011', 'rast', 1, TRUE, 0.2)
```
1. ST_ApproxMean(rastertable text, rastercolumn text, nband int, sample_percent double precision) -> double precision

hasnodata is set to FALSE

```ST_ApproxMean('tmax_2010', 'rast', 1, 0.5)

ST_ApproxMean('precip_2011', 'rast', 1, 0.2)
```
1. ST_ApproxMean(rastertable text, rastercolumn text, hasnodata boolean, sample_percent double precision) -> double precision

nband is set to 1

```ST_ApproxMean('tmax_2010', 'rast', FALSE, 0.5)

ST_ApproxMean('precip_2011', 'rast', TRUE, 0.2)
```
1. ST_ApproxMean(rastertable text, rastercolumn text, sample_percent double precision) -> double precision

nband is set to 1 and hasnodata is set to FALSE

```ST_ApproxMean('tmax_2010', 'rast', 0.5)

ST_ApproxMean('precip_2011', 'rast', 0.2)
```
1. ST_ApproxMean(rastertable text, rastercolumn text) -> double precision

nband is set to 1, hasnodata is set to FALSE and sample_percent is set to 0.1

```ST_ApproxMean('tmax_2010', 'rast')

ST_ApproxMean('precip_2011', 'rast')
```

Similar variations will be provided for ST_SummaryStats, ST_StdDev and ST_MinMax and corresponding approximation functions.

### comment:6 Changed 11 years ago by Bborie Park

The mean returned in the coverage functions is a weighted mean of the means from the raster tile.

### Changed 11 years ago by Bborie Park

Incremental patch for ST_Mean. ST_SummaryStats patch is required for this patch.

### comment:7 Changed 11 years ago by Bborie Park

Adds ST_Mean function, which builds upon ST_SummaryStats. Merges cleanly against r7145.

The following patches must be merged first for this patch:

1. ST_Band
1. ST_SummaryStats

### comment:8 Changed 11 years ago by Bborie Park

Keywords: history added → fixed assigned → closed