Opened 6 years ago
Last modified 4 years ago
#1048 assigned enhancement
[raster] Coverage version of stats functions should be aggregates
Reported by: | pracine | Owned by: | dustymugs |
---|---|---|---|
Priority: | medium | Milestone: | PostGIS Future |
Component: | raster | Version: | trunk |
Keywords: | Cc: |
Description
Thinking about it twice I was wondering if it would not be better to implement the coverage versions of ST_Histogram, ST_SummaryStat, ST_count, ST_ValueCount, ST_ValuePercent (BTW should not there be a coverage version of ST_Quantile?) as aggregates. We would be able to limit the stats to tiles selected with a where clause.
I guess you would just have to refactor the existing code into a complex state object with a state function accumulating stats and a final function.
Change History (16)
comment:1 Changed 6 years ago by
comment:2 Changed 6 years ago by
As for a coverage version of ST_Quantile, one has yet to been written due to potential memory issues. The problem is that all quantile functions require all values loaded and sorted. So, I need to see if I can only load subsets of values around the quantiles to be returned.
comment:3 Changed 6 years ago by
Milestone: | PostGIS 2.0.0 → PostGIS Raster Future |
---|
comment:4 Changed 6 years ago by
Look at a plpgsql version of an aggregate version of ST_SummaryStat() in http://trac.osgeo.org/postgis/browser/trunk/raster/scripts/plpgsql/st_summarystatsagg.sql
Only stddev did not make it.
comment:5 Changed 5 years ago by
Milestone: | PostGIS Raster Future → PostGIS Future |
---|
comment:6 Changed 5 years ago by
Milestone: | PostGIS Future → PostGIS 2.1.0 |
---|
Just some notes copied for raster/TODO...
Instead of stats functions using tablename, columnname for a coverage, it should be using aggregate functions. This change isn't too significant for most of the stats functions. BUT, the histogram and quantile functions will need to be rewritten to use 1-pass algorithms that do not require prior knowledge of the size of the incoming stream. The current coverage quantile function is a 1-pass algorithm but requires prior knowledge of the size of the stream.
comment:8 follow-up: 12 Changed 4 years ago by
It looks like aggregate functions cannot return sets of values/records.
As such, aggregate versions of ST_Histogram, ST_Quantile and ST_ValueCount() cannot result in a format as presently returned for the standard forms. It looks like there needs to be a singular form of output, such as an array...
SELECT ST_ValueCountAgg(rast, 1, ARRAY[1,2,3]::double precision[]) AS pvc FROM foo pvc ---------------------------------------------- {{1,12,0.33333},{2,12,0.33333},{3,12,0.33333}}
The aggregate version of ST_SummaryStats() will return the same output of the normal function as it only returns one record.
comment:9 Changed 4 years ago by
while we are talking about annoyances of the coverage ones. They don't accept table schema. I agree they would be much more useful as aggs though.
comment:10 Changed 4 years ago by
I guess the person who answered your question in the mailing list was speaking about set of rows when you mean set of columns. It is true that aggregates can't return set of rows. I don't know if they can return a set of column. Maybe by defining the final function properly.
My prototype of ST_SummaryStatAgg() in the Add-ons DO return a set of column by the mean of a type. But I guess we don't want to add new type like we used to.
comment:11 Changed 4 years ago by
No, Pavel understood my question correctly. I was wondering about sets of rows returning from an aggregate function. A single row can comprise any number of columns though.
comment:12 Changed 4 years ago by
Replying to dustymugs:
It looks like aggregate functions cannot return sets of values/records.
As such, aggregate versions of ST_Histogram, ST_Quantile and ST_ValueCount() cannot result in a format as presently returned for the standard forms. It looks like there needs to be a singular form of output, such as an array...
SELECT ST_ValueCountAgg(rast, 1, ARRAY[1,2,3]::double precision[]) AS pvc FROM foo pvc ---------------------------------------------- {{1,12,0.33333},{2,12,0.33333},{3,12,0.33333}}
So I guess you would like to make a small set returning function taking this output and splitting it into multiple rows/columns...
comment:13 Changed 4 years ago by
I don't know about that yet. There's nothing really clean/elegant...
comment:14 Changed 4 years ago by
Owner: | changed from pracine to dustymugs |
---|---|
Status: | new → assigned |
Another item worth noting. It appears as though the following code and query fails...
CREATE OR REPLACE FUNCTION _st_summarystats_finalfn( internal, OUT count bigint, OUT sum double precision, OUT mean double precision, OUT stddev double precision, OUT min double precision, OUT max double precision ) AS 'MODULE_PATHNAME', 'RASTER_summaryStats_finalfn' LANGUAGE 'c' IMMUTABLE; CREATE OR REPLACE FUNCTION _st_summarystats_transfn( internal, raster, integer, boolean, double precision ) RETURNS internal AS 'MODULE_PATHNAME', 'RASTER_summaryStats_transfn' LANGUAGE 'c' IMMUTABLE; CREATE AGGREGATE st_summarystatsagg(raster, integer, boolean, double precision) ( SFUNC = _st_summarystats_transfn, STYPE = internal, FINALFUNC = _st_summarystats_finalfn );
SELECT (stats).count, round((stats).sum::numeric, 3), round((stats).mean::numeric, 3), round((stats).stddev::numeric, 3), round((stats).min::numeric, 3), round((stats).max::numeric, 3) FROM ( SELECT ST_SummaryStatsAgg(rast, 1, TRUE) AS stats FROM test_summarystats ) foo; ERROR: record type has not been registered
The only workaround I've found is to create a type.
CREATE TYPE summarystats AS ( count bigint, sum double precision, mean double precision, stddev double precision, min double precision, max double precision ); CREATE OR REPLACE FUNCTION _st_summarystats_finalfn(internal) RETURNS summarystats AS 'MODULE_PATHNAME', 'RASTER_summaryStats_finalfn' LANGUAGE 'c' IMMUTABLE;
With the type...
SELECT (stats).count, round((stats).sum::numeric, 3), round((stats).mean::numeric, 3), round((stats).stddev::numeric, 3), round((stats).min::numeric, 3), round((stats).max::numeric, 3) FROM ( SELECT ST_SummaryStatsAgg(rast, 1, TRUE) AS stats FROM test_summarystats ) foo; count | round | round | round | round | round -------+---------+--------+-------+---------+------- 20 | -68.584 | -3.429 | 6.571 | -10.000 | 3.142 (1 row)
I don't think aggregate functions can return a multi-column result. By the looks of PostgreSQL's aggregate functions, they all return simple values except array_agg which is an array of values.
In addition, how would an aggregate function work for something like ST_ValueCount where there are two columns, the first being the value and the second being the count?