Opened 8 years ago
Closed 7 years ago
#3751 closed enhancement (fixed)
Parallel support for aggregates with transitions for raster
Reported by: | robe | Owned by: | pramsey |
---|---|---|---|
Priority: | high | Milestone: | PostGIS 2.5.0 |
Component: | postgis | Version: | 2.3.x |
Keywords: | Cc: | pramsey |
Description
I didn't think aggregate functions that use transitions (take internal as input) are parallelizable.
Turns out that even in PostgreSQL 9.6 it is possible to do so as long as you define a serialize and deserialize function for them. These are new args to CREATE AGGREGATE in 9.6
https://www.postgresql.org/docs/9.6/static/sql-createaggregate.html
CREATE AGGREGATE name ( [ argmode ] [ argname ] arg_data_type [ , ... ] ) ( SFUNC = sfunc, STYPE = state_data_type [ , SSPACE = state_data_size ] [ , FINALFUNC = ffunc ] [ , FINALFUNC_EXTRA ] [ , COMBINEFUNC = combinefunc ] [ , SERIALFUNC = serialfunc ] [ , DESERIALFUNC = deserialfunc ] [ , INITCOND = initial_condition ] [ , MSFUNC = msfunc ] [ , MINVFUNC = minvfunc ] [ , MSTYPE = mstate_data_type ] [ , MSSPACE = mstate_data_size ] [ , MFINALFUNC = mffunc ] [ , MFINALFUNC_EXTRA ] [ , MINITCOND = minitial_condition ] [ , SORTOP = sort_operator ] [ , PARALLEL = { SAFE | RESTRICTED | UNSAFE } ] )
From the docs
serialfunc
An aggregate function whose state_data_type is internal can participate in parallel aggregation only if it has a serialfunc function, which must serialize the aggregate state into a bytea value for transmission to another process. This function must take a single argument of type internal and return type bytea. A corresponding deserialfunc is also required.
deserialfunc
Deserialize a previously serialized aggregate state back into state_data_type. This function must take two arguments of types bytea and internal, and produce a result of type internal. (Note: the second, internal argument is unused, but is required for type safety reasons.)
It would be really nice if we can do this because then we have the possiblity of making really costly aggregates like ST_Union and the various raster aggregates parallelizable.
Change History (7)
comment:1 by , 8 years ago
comment:2 by , 8 years ago
Cc: | added |
---|
I was hoping this was an already tested bit, but it seems I was mistaken, array_agg and string_agg do not use the new serialfunc, deserialfunc. Most cases I found in postgresql source code are a bit trivial-er. I asked on mailing list why this is so, and seems they might not have gotten to it.
Last comment from Andres Freund:
https://www.postgresql.org/message-id/20170606192609.lgwzs2pcr4ng7et5%40alap3.anarazel.de
> > It seems CREATE AGGREGATE was expanded in 9.6 to support parallelization of > > aggregate functions using transitions, with the addition of serialfunc and > > deserialfunc to the aggregate definitions. > > > > https://www.postgresql.org/docs/10/static/sql-createaggregate.html > > > > I was looking at the PostgreSQL 10 source code for some example usages of > > this and was hoping that array_agg and string_agg would support the feature. > > At a cursory glance, it seems they do not use this. > > Examples I see that do support it are the average and standard deviation > > functions. > > > > Is there a reason for this or it just wasn't gotten to? I'd suggest trying to write a parallel version of them ;). Shouldn't be too hard.
@pramsey have you had a chance to look at this yet?
comment:3 by , 7 years ago
I've marked all raster functions as parallel safe at r15536. I'll finish off with geometry and then if we get to adding the new logic to REALLY make it allow it to be parallelized that can come later.
comment:5 by , 7 years ago
Summary: | Parallel support for aggregates with transitions → Parallel support for aggregates with transitions for raster |
---|
ST_CountAgg ST_SummaryStatsAgg
comment:6 by , 7 years ago
Milestone: | PostGIS 2.4.0 → PostGIS 2.5.0 |
---|
I'm willing to forgo this one. Push back to 2.4 if you can actually get to it.
comment:7 by , 7 years ago
Resolution: | → fixed |
---|---|
Status: | new → closed |
I'm closing this ticket in favour of the more action-oriented #3836, which is targeted to 2.5. Your marking of everything as parallel safe has accomplished the most important aspect of the work for this release.
This ticket reminds me of #3650 - there was a partially committed patch from https://github.com/postgis/postgis/pull/117
parallel=safe there doesn't mean it has to be parallel inside the aggregate - say, a GROUP BY can in theory calculate different bucket's extent in different backend in non-parallel fashion. that option doesn't exist now, since the aggregates are marked parallel unusafe.