Opened 6 years ago

Last modified 2 months ago

#1048 assigned enhancement

[raster] Coverage version of stats functions should be aggregates

Reported by: pracine Owned by: dustymugs
Priority: medium Milestone: PostGIS Fund Me
Component: raster Version: trunk
Keywords: Cc:

Description

Thinking about it twice I was wondering if it would not be better to implement the coverage versions of ST_Histogram, ST_SummaryStat, ST_count, ST_ValueCount, ST_ValuePercent (BTW should not there be a coverage version of ST_Quantile?) as aggregates. We would be able to limit the stats to tiles selected with a where clause.

I guess you would just have to refactor the existing code into a complex state object with a state function accumulating stats and a final function.

Change History (17)

comment:1 Changed 6 years ago by dustymugs

I don't think aggregate functions can return a multi-column result. By the looks of PostgreSQL's aggregate functions, they all return simple values except array_agg which is an array of values.

In addition, how would an aggregate function work for something like ST_ValueCount where there are two columns, the first being the value and the second being the count?

comment:2 Changed 6 years ago by dustymugs

As for a coverage version of ST_Quantile, one has yet to been written due to potential memory issues. The problem is that all quantile functions require all values loaded and sorted. So, I need to see if I can only load subsets of values around the quantiles to be returned.

comment:3 Changed 6 years ago by dustymugs

Milestone: PostGIS 2.0.0PostGIS Raster Future

comment:4 Changed 6 years ago by pracine

Look at a plpgsql version of an aggregate version of ST_SummaryStat() in http://trac.osgeo.org/postgis/browser/trunk/raster/scripts/plpgsql/st_summarystatsagg.sql

Only stddev did not make it.

comment:5 Changed 6 years ago by pracine

Milestone: PostGIS Raster FuturePostGIS Future

comment:6 Changed 5 years ago by dustymugs

Milestone: PostGIS FuturePostGIS 2.1.0

Just some notes copied for raster/TODO...

Instead of stats functions using tablename, columnname for a coverage, it should be using aggregate functions. This change isn't too significant for most of the stats functions. BUT, the histogram and quantile functions will need to be rewritten to use 1-pass algorithms that do not require prior knowledge of the size of the incoming stream. The current coverage quantile function is a 1-pass algorithm but requires prior knowledge of the size of the stream.

comment:7 Changed 5 years ago by dustymugs

Milestone: PostGIS 2.1.0PostGIS Future

Probably for PostGIS 2.2

comment:8 Changed 4 years ago by dustymugs

It looks like aggregate functions cannot return sets of values/records.

http://www.postgresql.org/message-id/CAFj8pRBX0z-qU6oor_R06vTq3a10vJ+k1zVkk0d69vctVpTe1A@mail.gmail.com

As such, aggregate versions of ST_Histogram, ST_Quantile and ST_ValueCount() cannot result in a format as presently returned for the standard forms. It looks like there needs to be a singular form of output, such as an array...

SELECT
  ST_ValueCountAgg(rast, 1, ARRAY[1,2,3]::double precision[]) AS pvc
FROM foo

pvc
----------------------------------------------
{{1,12,0.33333},{2,12,0.33333},{3,12,0.33333}}

The aggregate version of ST_SummaryStats() will return the same output of the normal function as it only returns one record.

comment:9 Changed 4 years ago by robe

while we are talking about annoyances of the coverage ones. They don't accept table schema. I agree they would be much more useful as aggs though.

comment:10 Changed 4 years ago by pracine

I guess the person who answered your question in the mailing list was speaking about set of rows when you mean set of columns. It is true that aggregates can't return set of rows. I don't know if they can return a set of column. Maybe by defining the final function properly.

My prototype of ST_SummaryStatAgg() in the Add-ons DO return a set of column by the mean of a type. But I guess we don't want to add new type like we used to.

comment:11 Changed 4 years ago by dustymugs

No, Pavel understood my question correctly. I was wondering about sets of rows returning from an aggregate function. A single row can comprise any number of columns though.

comment:12 in reply to:  8 Changed 4 years ago by pracine

Replying to dustymugs:

It looks like aggregate functions cannot return sets of values/records.

http://www.postgresql.org/message-id/CAFj8pRBX0z-qU6oor_R06vTq3a10vJ+k1zVkk0d69vctVpTe1A@mail.gmail.com

As such, aggregate versions of ST_Histogram, ST_Quantile and ST_ValueCount() cannot result in a format as presently returned for the standard forms. It looks like there needs to be a singular form of output, such as an array...

SELECT
  ST_ValueCountAgg(rast, 1, ARRAY[1,2,3]::double precision[]) AS pvc
FROM foo

pvc
----------------------------------------------
{{1,12,0.33333},{2,12,0.33333},{3,12,0.33333}}

So I guess you would like to make a small set returning function taking this output and splitting it into multiple rows/columns...

comment:13 Changed 4 years ago by dustymugs

I don't know about that yet. There's nothing really clean/elegant...

comment:14 Changed 4 years ago by dustymugs

Owner: changed from pracine to dustymugs
Status: newassigned

Another item worth noting. It appears as though the following code and query fails...

CREATE OR REPLACE FUNCTION _st_summarystats_finalfn(
	internal,
	OUT count bigint,
	OUT sum double precision,
	OUT mean double precision,
	OUT stddev double precision,
	OUT min double precision,
	OUT max double precision
)
	AS 'MODULE_PATHNAME', 'RASTER_summaryStats_finalfn'
	LANGUAGE 'c' IMMUTABLE;

CREATE OR REPLACE FUNCTION _st_summarystats_transfn(
	internal,
	raster, integer,
	boolean, double precision
)
	RETURNS internal
	AS 'MODULE_PATHNAME', 'RASTER_summaryStats_transfn'
	LANGUAGE 'c' IMMUTABLE;

CREATE AGGREGATE st_summarystatsagg(raster, integer, boolean, double precision) (
	SFUNC = _st_summarystats_transfn,
	STYPE = internal,
	FINALFUNC = _st_summarystats_finalfn
);
SELECT
	(stats).count,
	round((stats).sum::numeric, 3),
	round((stats).mean::numeric, 3),
	round((stats).stddev::numeric, 3),
	round((stats).min::numeric, 3),
	round((stats).max::numeric, 3)
FROM (
SELECT
		ST_SummaryStatsAgg(rast, 1, TRUE) AS stats
FROM test_summarystats
) foo;

ERROR:  record type has not been registered

The only workaround I've found is to create a type.

CREATE TYPE summarystats AS (
	count bigint,
	sum double precision,
	mean double precision,
	stddev double precision,
	min double precision,
	max double precision
);

CREATE OR REPLACE FUNCTION _st_summarystats_finalfn(internal)
	RETURNS summarystats
	AS 'MODULE_PATHNAME', 'RASTER_summaryStats_finalfn'
	LANGUAGE 'c' IMMUTABLE;

With the type...

SELECT
	(stats).count,
	round((stats).sum::numeric, 3),
	round((stats).mean::numeric, 3),
	round((stats).stddev::numeric, 3),
	round((stats).min::numeric, 3),
	round((stats).max::numeric, 3)
FROM (
SELECT
		ST_SummaryStatsAgg(rast, 1, TRUE) AS stats
FROM test_summarystats
) foo;


 count |  round  | round  | round |  round  | round 
-------+---------+--------+-------+---------+-------
    20 | -68.584 | -3.429 | 6.571 | -10.000 | 3.142
(1 row)

comment:15 Changed 4 years ago by dustymugs

Breakdown this ticket into component tickets:

ST_SummaryStatsAgg(): Ticket #2565

ST_ValueCountAgg(): Ticket #2566

Others will be added as time goes on.

comment:16 Changed 4 years ago by dustymugs

ST_CountAgg(): Ticket #2567

comment:17 Changed 2 months ago by robe

Milestone: PostGIS FuturePostGIS Fund Me

Milestone renamed

Note: See TracTickets for help on using tickets.