#5371 closed defect (fixed)

Parallel ST_Union segfaults

Reported by: ewie
Owned by: pramsey
Priority: high
Milestone: PostGIS 3.3.3
Component: postgis
Version: 3.3.x
Keywords:
Cc:

Description

Parallel ST_Union segfaults with PostGIS 3.3.2 on Postgres 11.19 and 12.14 but not on 13.10.

The query to trigger the segfault (see below for setup):

BEGIN;

SET LOCAL max_parallel_workers_per_gather = 2;

--EXPLAIN
SELECT r.name, st_union(a.geom)
FROM area a
JOIN region r ON r.id = a.region
GROUP BY r.name;

ROLLBACK;

Produces this query plan on 11.19, 12.14, and 13.10:

                                              QUERY PLAN
------------------------------------------------------------------------------------------------------
 Finalize GroupAggregate  (cost=5198.72..5255.39 rows=200 width=35)
   Group Key: r.name
   ->  Gather Merge  (cost=5198.72..5245.39 rows=400 width=35)
         Workers Planned: 2
         ->  Sort  (cost=4198.70..4199.20 rows=200 width=35)
               Sort Key: r.name
               ->  Partial HashAggregate  (cost=4188.55..4191.05 rows=200 width=35)
                     Group Key: r.name
                     ->  Hash Join  (cost=6.50..3874.92 rows=62727 width=123)
                           Hash Cond: (a.region = r.id)
                           ->  Parallel Seq Scan on area a  (cost=0.00..3700.27 rows=62727 width=128)
                           ->  Hash  (cost=4.00..4.00 rows=200 width=11)
                                 ->  Seq Scan on region r  (cost=0.00..4.00 rows=200 width=11)
(13 rows)

The query does not segfault when parallel queries are disabled with max_parallel_workers_per_gather = 0.
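
As a stopgap on affected versions, parallelism can be switched off for just the transaction that runs the aggregate. This is a minimal sketch using the area and region tables from this report. It relies only on the observation that the crash does not occur with zero workers, and it is a workaround, not a fix:

```sql
BEGIN;

-- Disable parallel workers for this transaction only;
-- the setting reverts automatically at COMMIT or ROLLBACK.
SET LOCAL max_parallel_workers_per_gather = 0;

SELECT r.name, st_union(a.geom)
FROM area a
JOIN region r ON r.id = a.region
GROUP BY r.name;

COMMIT;
```

Prefixing the SELECT with EXPLAIN should show a plan without a Gather or Gather Merge node, which confirms that the parallel combine function is not being called.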

Setup of schema and data (increase num_areas and num_regions if the query does not generate a parallel query plan):

BEGIN;

-- May need more rows to trigger a parallel query. Increase as necessary.
\set num_areas 150000
\set num_regions 200

CREATE EXTENSION IF NOT EXISTS postgis;

DROP TABLE IF EXISTS area;
DROP TABLE IF EXISTS region;

CREATE TABLE region (
  id bigint GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
  name text NOT NULL
);

CREATE TABLE area (
  id bigint GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
  geom geometry NOT NULL,
  region bigint NOT NULL,
  FOREIGN KEY (region) REFERENCES region (id)
);

INSERT INTO region (name)
SELECT
  s::text
FROM
  generate_series(1, :num_regions) s;

INSERT INTO area (geom, region)
SELECT
  grid.geom,
  ((grid.i * grid.j) % :num_regions) + 1
FROM
  st_squaregrid(
    1 / sqrt(:num_areas),
    st_geomfromtext('polygon ((0 0, 0 1, 1 1, 1 0, 0 0))')
  ) grid;

ANALYZE area, region;

COMMIT;

Backtrace on 11.19:

Program received signal SIGSEGV, Segmentation fault.
pfree (pointer=0x560e05e96408) at utils/mmgr/mcxt.c:1035
1035		context->methods->free_p(context, pointer);
#0  pfree (pointer=0x560e05e96408) at utils/mmgr/mcxt.c:1035
#1  list_free_private (deep=false, list=0x560e05f53dd8) at nodes/list.c:1117
#2  list_free (list=0x560e05f53dd8) at nodes/list.c:1135
#3  0x00007f23d20e4045 in state_combine (state2=0x560e04d56c58, state1=0x560e03675f00) at /usr/src/debug/postgis33_11-3.3.2-1.f37.x86_64/postgis/lwgeom_union.c:254
#4  pgis_geometry_union_parallel_combinefn (fcinfo=<optimized out>) at /usr/src/debug/postgis33_11-3.3.2-1.f37.x86_64/postgis/lwgeom_union.c:107
#5  0x0000560e027703a6 in ExecInterpExpr (state=0x560e05e4e040, econtext=0x560e03740768, isnull=<optimized out>) at executor/execExprInterp.c:1667
#6  0x0000560e0278d93d in ExecEvalExprSwitchContext (isNull=0x7ffd4969af07, econtext=<optimized out>, state=<optimized out>) at executor/../../../src/include/executor/executor.h:320
#7  advance_aggregates (aggstate=0x560e03740540) at executor/nodeAgg.c:670
#8  agg_retrieve_direct (aggstate=0x560e03740540) at executor/nodeAgg.c:1838
#9  ExecAgg (pstate=0x560e03740540) at executor/nodeAgg.c:1561
#10 0x0000560e0277937a in ExecProcNode (node=0x560e03740540) at executor/../../../src/include/executor/executor.h:248
#11 ExecutePlan (execute_once=<optimized out>, dest=0x560e0461ec50, direction=<optimized out>, numberTuples=0, sendTuples=true, operation=CMD_SELECT, use_parallel_mode=<optimized out>, planstate=0x560e03740540, estate=0x560e03740300) at executor/execMain.c:1712
#12 standard_ExecutorRun (queryDesc=0x560e048feaf0, direction=<optimized out>, count=0, execute_once=<optimized out>) at executor/execMain.c:353
#13 0x0000560e028ecb15 in ExecutorRun (execute_once=<optimized out>, count=0, direction=ForwardScanDirection, queryDesc=0x560e048feaf0) at executor/execMain.c:296
#14 PortalRunSelect (portal=portal@entry=0x560e036f4f00, forward=forward@entry=true, count=0, count@entry=9223372036854775807, dest=dest@entry=0x560e0461ec50) at tcop/pquery.c:941
#15 0x0000560e028ed41e in PortalRun (portal=portal@entry=0x560e036f4f00, count=count@entry=9223372036854775807, isTopLevel=isTopLevel@entry=true, run_once=run_once@entry=true, dest=dest@entry=0x560e0461ec50, altdest=altdest@entry=0x560e0461ec50, completionTag=0x7ffd4969b170 "") at tcop/pquery.c:782
#16 0x0000560e028ed912 in exec_simple_query (query_string=0x560e0364ef90 "SELECT\n r.name,\n st_union(a.geom) AS geom\nFROM area a\nJOIN region r ON r.id = a.region\nGROUP BY r.name;") at tcop/postgres.c:1144
#17 0x0000560e028ef441 in PostgresMain (argc=<optimized out>, argv=<optimized out>, dbname=<optimized out>, username=<optimized out>) at tcop/postgres.c:4229
#18 0x0000560e02873cd9 in BackendRun (port=0x560e036b7040) at postmaster/postmaster.c:4429
#19 BackendStartup (port=0x560e036b7040) at postmaster/postmaster.c:4093
#20 ServerLoop () at postmaster/postmaster.c:1728
#21 0x0000560e02874c3c in PostmasterMain (argc=<optimized out>, argv=0x560e036498c0) at postmaster/postmaster.c:1401
#22 0x0000560e025c1595 in main (argc=3, argv=0x560e036498c0) at main/main.c:228

Backtrace on 12.14:

Program received signal SIGSEGV, Segmentation fault.
pfree (pointer=0x7f4c5faed7f8) at utils/mmgr/mcxt.c:1035
1035		context->methods->free_p(context, pointer);
#0  pfree (pointer=0x7f4c5faed7f8) at utils/mmgr/mcxt.c:1035
#1  list_free_private (deep=false, list=0x55db196c55c8) at nodes/list.c:1120
#2  list_free (list=0x55db196c55c8) at nodes/list.c:1138
#3  0x00007f4c60a31fa5 in state_combine (state2=0x55db196e2438, state1=0x55db195c2b90) at /usr/src/debug/postgis33_12-3.3.2-1.f37.x86_64/postgis/lwgeom_union.c:254
#4  pgis_geometry_union_parallel_combinefn (fcinfo=<optimized out>) at /usr/src/debug/postgis33_12-3.3.2-1.f37.x86_64/postgis/lwgeom_union.c:107
#5  0x000055db1865a4dc in ExecInterpExpr (state=0x55db19614b80, econtext=0x55db195a1898, isnull=<optimized out>) at executor/execExprInterp.c:1650
#6  0x000055db186732dd in ExecEvalExprSwitchContext (isNull=0x7ffca8d295b7, econtext=<optimized out>, state=<optimized out>) at executor/../../../src/include/executor/executor.h:316
#7  advance_aggregates (aggstate=0x55db195a1670) at executor/nodeAgg.c:669
#8  agg_retrieve_direct (aggstate=0x55db195a1670) at executor/nodeAgg.c:1838
#9  ExecAgg (pstate=0x55db195a1670) at executor/nodeAgg.c:1563
#10 0x000055db1865d972 in ExecProcNode (node=0x55db195a1670) at executor/../../../src/include/executor/executor.h:242
#11 ExecutePlan (execute_once=<optimized out>, dest=0x55db195a7438, direction=<optimized out>, numberTuples=0, sendTuples=true, operation=CMD_SELECT, use_parallel_mode=<optimized out>, planstate=0x55db195a1670, estate=0x55db195a13d0) at executor/execMain.c:1632
#12 standard_ExecutorRun (queryDesc=0x55db194f6c20, direction=<optimized out>, count=0, execute_once=<optimized out>) at executor/execMain.c:350
#13 0x000055db187e00d5 in ExecutorRun (execute_once=<optimized out>, count=0, direction=ForwardScanDirection, queryDesc=0x55db194f6c20) at executor/execMain.c:294
#14 PortalRunSelect (portal=portal@entry=0x55db1953cc60, forward=forward@entry=true, count=0, count@entry=9223372036854775807, dest=dest@entry=0x55db195a7438) at tcop/pquery.c:938
#15 0x000055db187e1b9e in PortalRun (portal=portal@entry=0x55db1953cc60, count=count@entry=9223372036854775807, isTopLevel=isTopLevel@entry=true, run_once=run_once@entry=true, dest=dest@entry=0x55db195a7438, altdest=altdest@entry=0x55db195a7438, completionTag=0x7ffca8d29820 "") at tcop/pquery.c:779
#16 0x000055db187e209a in exec_simple_query (query_string=0x55db194cea90 "SELECT\n r.name,\n st_union(a.geom) AS geom\nFROM area a\nJOIN region r ON r.id = a.region\nGROUP BY r.name;") at tcop/postgres.c:1214
#17 0x000055db187e412d in PostgresMain (argc=<optimized out>, argv=<optimized out>, dbname=<optimized out>, username=<optimized out>) at tcop/postgres.c:4293
#18 0x000055db18768587 in BackendRun (port=0x55db194f3fb0) at postmaster/postmaster.c:4517
#19 BackendStartup (port=0x55db194f3fb0) at postmaster/postmaster.c:4200
#20 ServerLoop () at postmaster/postmaster.c:1725
#21 0x000055db187694b1 in PostmasterMain (argc=<optimized out>, argv=0x55db194898c0) at postmaster/postmaster.c:1398
#22 0x000055db1849cf79 in main (argc=3, argv=0x55db194898c0) at main/main.c:228

System:

Linux fedora 6.2.10-200.fc37.x86_64 #1 SMP PREEMPT_DYNAMIC Thu Apr  6 23:30:41 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux

Postgres 11:

-[ RECORD 1 ]--------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------
version              | PostgreSQL 11.19 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 12.2.1 20221121 (Red Hat 12.2.1-4), 64-bit
postgis_full_version | POSTGIS="3.3.2 4975da8" [EXTENSION] PGSQL="110" GEOS="3.11.2-CAPI-1.17.2" PROJ="9.0.1" LIBXML="2.10.3" LIBJSON="0.16" LIBPROTOBUF="1.4.1" WAGYU="0.5.0 (Internal)"

Postgres 12:

-[ RECORD 1 ]--------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------
version              | PostgreSQL 12.14 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 12.2.1 20221121 (Red Hat 12.2.1-4), 64-bit
postgis_full_version | POSTGIS="3.3.2 4975da8" [EXTENSION] PGSQL="120" GEOS="3.11.2-CAPI-1.17.2" PROJ="9.0.1" LIBXML="2.10.3" LIBJSON="0.16" LIBPROTOBUF="1.4.1" WAGYU="0.5.0 (Internal)"

Postgres 13:

-[ RECORD 1 ]--------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------
version              | PostgreSQL 13.10 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 12.2.1 20221121 (Red Hat 12.2.1-4), 64-bit
postgis_full_version | POSTGIS="3.3.2 4975da8" [EXTENSION] PGSQL="130" GEOS="3.11.2-CAPI-1.17.2" PROJ="9.0.1" LIBXML="2.10.3" LIBJSON="0.16" LIBPROTOBUF="1.4.1" WAGYU="0.5.0 (Internal)"
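
The records above look like expanded psql output. A query along the following lines would produce them, though the exact command used is an assumption because the report does not show it:

```sql
-- Switch psql to expanded display so each column prints on its own line.
\x on

-- Report the server version and the full PostGIS build information.
SELECT version(), postgis_full_version();
```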

Change History (6)

comment:1 by robe, 12 months ago

Thanks for the nice succinct example. I'm still testing this out on my systems. Will take me a bit to load up PG12 and PG11 but next on my list to try.

So far I haven't been able to replicate it with PG16 and PG15, which is expected given you said you only have the issue with 11 and 12.

comment:2 by robe, 12 months ago

Just an additional note in case pramsey is looking.

The crash is happening in the state_combine function, at the call:

list_free(list2)

comment:3 by robe, 12 months ago

Priority: medium → high

I was able to replicate the crash on my PG 11 install, with the versions below.

postgis_full_version
POSTGIS="3.3.2 3.3.2" [EXTENSION] PGSQL="110" GEOS="3.11.1-CAPI-1.17.1" PROJ="7.2.1" LIBXML="2.9.9" LIBJSON="0.12" LIBPROTOBUF="1.2.1" WAGYU="0.5.0 (Internal)"
PostgreSQL 11.17, compiled by Visual C++ build 1914, 64-bit

PG 13, 15, and 16 appeared to be fine on the same Windows server with the same version of PostGIS.

So this appears not to be OS-specific, but possibly linked to the PostgreSQL major version. It may be that certain exceptions are handled better in newer PostgreSQL versions.

comment:4 by robe, 12 months ago

It seems simply removing the line

list_free(list2);

fixes the issue in PG11. I haven't retested in higher versions yet.

comment:5 by Regina Obe <lr@…>, 12 months ago

In eee4aeb/git:

Fix crash on ST_Union
References #5371 for PostGIS 3.3.3

comment:6 by Regina Obe <lr@…>, 12 months ago

Resolution: fixed
Status: new → closed

In 1be17381/git:

Fix crash on ST_Union
Closes #5371 for PostGIS 3.4.0
