Opened 20 months ago
Closed 19 months ago
#5371 closed defect (fixed)
Parallel ST_Union segfaults
Reported by: | ewie | Owned by: | pramsey |
---|---|---|---|
Priority: | high | Milestone: | PostGIS 3.3.3 |
Component: | postgis | Version: | 3.3.x |
Keywords: | Cc: |
Description
Parallel ST_Union segfaults with PostGIS 3.3.2 on Postgres 11.19 and 12.14 but not on 13.10.
The query to trigger the segfault (see below for setup):
BEGIN;
SET LOCAL max_parallel_workers_per_gather = 2;
--EXPLAIN
SELECT
  r.name,
  st_union(a.geom)
FROM area a
JOIN region r ON r.id = a.region
GROUP BY r.name;
ROLLBACK;
Produces this query plan on 11.19, 12.14, and 13.10:
                                              QUERY PLAN
------------------------------------------------------------------------------------------------------
 Finalize GroupAggregate  (cost=5198.72..5255.39 rows=200 width=35)
   Group Key: r.name
   ->  Gather Merge  (cost=5198.72..5245.39 rows=400 width=35)
         Workers Planned: 2
         ->  Sort  (cost=4198.70..4199.20 rows=200 width=35)
               Sort Key: r.name
               ->  Partial HashAggregate  (cost=4188.55..4191.05 rows=200 width=35)
                     Group Key: r.name
                     ->  Hash Join  (cost=6.50..3874.92 rows=62727 width=123)
                           Hash Cond: (a.region = r.id)
                           ->  Parallel Seq Scan on area a  (cost=0.00..3700.27 rows=62727 width=128)
                           ->  Hash  (cost=4.00..4.00 rows=200 width=11)
                                 ->  Seq Scan on region r  (cost=0.00..4.00 rows=200 width=11)
(13 rows)
The query does not segfault when parallel queries are disabled with max_parallel_workers_per_gather = 0.
Setup of schema and data (increase num_areas and num_regions if the query does not generate a parallel query plan):
BEGIN;

-- May need more rows to trigger a parallel query. Increase as necessary.
\set num_areas 150000
\set num_regions 200

CREATE EXTENSION IF NOT EXISTS postgis;

DROP TABLE IF EXISTS area;
DROP TABLE IF EXISTS region;

CREATE TABLE region (
  id bigint GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
  name text NOT NULL
);

CREATE TABLE area (
  id bigint GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
  geom geometry NOT NULL,
  region bigint NOT NULL,
  FOREIGN KEY (region) REFERENCES region (id)
);

INSERT INTO region (name)
SELECT s::text
FROM generate_series(1, :num_regions) s;

INSERT INTO area (geom, region)
SELECT
  grid.geom,
  ((grid.i * grid.j) % :num_regions) + 1
FROM st_squaregrid(
  1 / sqrt(:num_areas),
  st_geomfromtext('polygon ((0 0, 0 1, 1 1, 1 0, 0 0))')
) grid;

ANALYZE area, region;

COMMIT;
Backtrace on 11.19:
Program received signal SIGSEGV, Segmentation fault.
pfree (pointer=0x560e05e96408) at utils/mmgr/mcxt.c:1035
1035        context->methods->free_p(context, pointer);
#0  pfree (pointer=0x560e05e96408) at utils/mmgr/mcxt.c:1035
#1  list_free_private (deep=false, list=0x560e05f53dd8) at nodes/list.c:1117
#2  list_free (list=0x560e05f53dd8) at nodes/list.c:1135
#3  0x00007f23d20e4045 in state_combine (state2=0x560e04d56c58, state1=0x560e03675f00) at /usr/src/debug/postgis33_11-3.3.2-1.f37.x86_64/postgis/lwgeom_union.c:254
#4  pgis_geometry_union_parallel_combinefn (fcinfo=<optimized out>) at /usr/src/debug/postgis33_11-3.3.2-1.f37.x86_64/postgis/lwgeom_union.c:107
#5  0x0000560e027703a6 in ExecInterpExpr (state=0x560e05e4e040, econtext=0x560e03740768, isnull=<optimized out>) at executor/execExprInterp.c:1667
#6  0x0000560e0278d93d in ExecEvalExprSwitchContext (isNull=0x7ffd4969af07, econtext=<optimized out>, state=<optimized out>) at executor/../../../src/include/executor/executor.h:320
#7  advance_aggregates (aggstate=0x560e03740540) at executor/nodeAgg.c:670
#8  agg_retrieve_direct (aggstate=0x560e03740540) at executor/nodeAgg.c:1838
#9  ExecAgg (pstate=0x560e03740540) at executor/nodeAgg.c:1561
#10 0x0000560e0277937a in ExecProcNode (node=0x560e03740540) at executor/../../../src/include/executor/executor.h:248
#11 ExecutePlan (execute_once=<optimized out>, dest=0x560e0461ec50, direction=<optimized out>, numberTuples=0, sendTuples=true, operation=CMD_SELECT, use_parallel_mode=<optimized out>, planstate=0x560e03740540, estate=0x560e03740300) at executor/execMain.c:1712
#12 standard_ExecutorRun (queryDesc=0x560e048feaf0, direction=<optimized out>, count=0, execute_once=<optimized out>) at executor/execMain.c:353
#13 0x0000560e028ecb15 in ExecutorRun (execute_once=<optimized out>, count=0, direction=ForwardScanDirection, queryDesc=0x560e048feaf0) at executor/execMain.c:296
#14 PortalRunSelect (portal=portal@entry=0x560e036f4f00, forward=forward@entry=true, count=0, count@entry=9223372036854775807, dest=dest@entry=0x560e0461ec50) at tcop/pquery.c:941
#15 0x0000560e028ed41e in PortalRun (portal=portal@entry=0x560e036f4f00, count=count@entry=9223372036854775807, isTopLevel=isTopLevel@entry=true, run_once=run_once@entry=true, dest=dest@entry=0x560e0461ec50, altdest=altdest@entry=0x560e0461ec50, completionTag=0x7ffd4969b170 "") at tcop/pquery.c:782
#16 0x0000560e028ed912 in exec_simple_query (query_string=0x560e0364ef90 "SELECT\n r.name,\n st_union(a.geom) AS geom\nFROM area a\nJOIN region r ON r.id = a.region\nGROUP BY r.name;") at tcop/postgres.c:1144
#17 0x0000560e028ef441 in PostgresMain (argc=<optimized out>, argv=<optimized out>, dbname=<optimized out>, username=<optimized out>) at tcop/postgres.c:4229
#18 0x0000560e02873cd9 in BackendRun (port=0x560e036b7040) at postmaster/postmaster.c:4429
#19 BackendStartup (port=0x560e036b7040) at postmaster/postmaster.c:4093
#20 ServerLoop () at postmaster/postmaster.c:1728
#21 0x0000560e02874c3c in PostmasterMain (argc=<optimized out>, argv=0x560e036498c0) at postmaster/postmaster.c:1401
#22 0x0000560e025c1595 in main (argc=3, argv=0x560e036498c0) at main/main.c:228
Backtrace on 12.14:
Program received signal SIGSEGV, Segmentation fault.
pfree (pointer=0x7f4c5faed7f8) at utils/mmgr/mcxt.c:1035
1035        context->methods->free_p(context, pointer);
#0  pfree (pointer=0x7f4c5faed7f8) at utils/mmgr/mcxt.c:1035
#1  list_free_private (deep=false, list=0x55db196c55c8) at nodes/list.c:1120
#2  list_free (list=0x55db196c55c8) at nodes/list.c:1138
#3  0x00007f4c60a31fa5 in state_combine (state2=0x55db196e2438, state1=0x55db195c2b90) at /usr/src/debug/postgis33_12-3.3.2-1.f37.x86_64/postgis/lwgeom_union.c:254
#4  pgis_geometry_union_parallel_combinefn (fcinfo=<optimized out>) at /usr/src/debug/postgis33_12-3.3.2-1.f37.x86_64/postgis/lwgeom_union.c:107
#5  0x000055db1865a4dc in ExecInterpExpr (state=0x55db19614b80, econtext=0x55db195a1898, isnull=<optimized out>) at executor/execExprInterp.c:1650
#6  0x000055db186732dd in ExecEvalExprSwitchContext (isNull=0x7ffca8d295b7, econtext=<optimized out>, state=<optimized out>) at executor/../../../src/include/executor/executor.h:316
#7  advance_aggregates (aggstate=0x55db195a1670) at executor/nodeAgg.c:669
#8  agg_retrieve_direct (aggstate=0x55db195a1670) at executor/nodeAgg.c:1838
#9  ExecAgg (pstate=0x55db195a1670) at executor/nodeAgg.c:1563
#10 0x000055db1865d972 in ExecProcNode (node=0x55db195a1670) at executor/../../../src/include/executor/executor.h:242
#11 ExecutePlan (execute_once=<optimized out>, dest=0x55db195a7438, direction=<optimized out>, numberTuples=0, sendTuples=true, operation=CMD_SELECT, use_parallel_mode=<optimized out>, planstate=0x55db195a1670, estate=0x55db195a13d0) at executor/execMain.c:1632
#12 standard_ExecutorRun (queryDesc=0x55db194f6c20, direction=<optimized out>, count=0, execute_once=<optimized out>) at executor/execMain.c:350
#13 0x000055db187e00d5 in ExecutorRun (execute_once=<optimized out>, count=0, direction=ForwardScanDirection, queryDesc=0x55db194f6c20) at executor/execMain.c:294
#14 PortalRunSelect (portal=portal@entry=0x55db1953cc60, forward=forward@entry=true, count=0, count@entry=9223372036854775807, dest=dest@entry=0x55db195a7438) at tcop/pquery.c:938
#15 0x000055db187e1b9e in PortalRun (portal=portal@entry=0x55db1953cc60, count=count@entry=9223372036854775807, isTopLevel=isTopLevel@entry=true, run_once=run_once@entry=true, dest=dest@entry=0x55db195a7438, altdest=altdest@entry=0x55db195a7438, completionTag=0x7ffca8d29820 "") at tcop/pquery.c:779
#16 0x000055db187e209a in exec_simple_query (query_string=0x55db194cea90 "SELECT\n r.name,\n st_union(a.geom) AS geom\nFROM area a\nJOIN region r ON r.id = a.region\nGROUP BY r.name;") at tcop/postgres.c:1214
#17 0x000055db187e412d in PostgresMain (argc=<optimized out>, argv=<optimized out>, dbname=<optimized out>, username=<optimized out>) at tcop/postgres.c:4293
#18 0x000055db18768587 in BackendRun (port=0x55db194f3fb0) at postmaster/postmaster.c:4517
#19 BackendStartup (port=0x55db194f3fb0) at postmaster/postmaster.c:4200
#20 ServerLoop () at postmaster/postmaster.c:1725
#21 0x000055db187694b1 in PostmasterMain (argc=<optimized out>, argv=0x55db194898c0) at postmaster/postmaster.c:1398
#22 0x000055db1849cf79 in main (argc=3, argv=0x55db194898c0) at main/main.c:228
System:
Linux fedora 6.2.10-200.fc37.x86_64 #1 SMP PREEMPT_DYNAMIC Thu Apr 6 23:30:41 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux
Postgres 11:
-[ RECORD 1 ]--------+----------------------------------------
version              | PostgreSQL 11.19 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 12.2.1 20221121 (Red Hat 12.2.1-4), 64-bit
postgis_full_version | POSTGIS="3.3.2 4975da8" [EXTENSION] PGSQL="110" GEOS="3.11.2-CAPI-1.17.2" PROJ="9.0.1" LIBXML="2.10.3" LIBJSON="0.16" LIBPROTOBUF="1.4.1" WAGYU="0.5.0 (Internal)"
Postgres 12:
-[ RECORD 1 ]--------+----------------------------------------
version              | PostgreSQL 12.14 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 12.2.1 20221121 (Red Hat 12.2.1-4), 64-bit
postgis_full_version | POSTGIS="3.3.2 4975da8" [EXTENSION] PGSQL="120" GEOS="3.11.2-CAPI-1.17.2" PROJ="9.0.1" LIBXML="2.10.3" LIBJSON="0.16" LIBPROTOBUF="1.4.1" WAGYU="0.5.0 (Internal)"
Postgres 13:
-[ RECORD 1 ]--------+----------------------------------------
version              | PostgreSQL 13.10 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 12.2.1 20221121 (Red Hat 12.2.1-4), 64-bit
postgis_full_version | POSTGIS="3.3.2 4975da8" [EXTENSION] PGSQL="130" GEOS="3.11.2-CAPI-1.17.2" PROJ="9.0.1" LIBXML="2.10.3" LIBJSON="0.16" LIBPROTOBUF="1.4.1" WAGYU="0.5.0 (Internal)"
Change History (6)
comment:1 by , 19 months ago
comment:2 by , 19 months ago
Just an additional note in case pramsey is looking.
The crash is happening in the state_combine function, at the list_free(list2) call.
comment:3 by , 19 months ago
Priority: medium → high
I was able to replicate the crash on my PG 11 install, with the versions below.
postgis_full_version POSTGIS="3.3.2 3.3.2" [EXTENSION] PGSQL="110" GEOS="3.11.1-CAPI-1.17.1" PROJ="7.2.1" LIBXML="2.9.9" LIBJSON="0.12" LIBPROTOBUF="1.2.1" WAGYU="0.5.0 (Internal)" | PostgreSQL 11.17, compiled by Visual C++ build 1914, 64-bit
PG13, 15, and 16 appeared to be fine on the same Windows server with the same version of PostGIS.
So this does not appear to be OS-specific, but is possibly linked to the PostgreSQL major version. It may be that newer PostgreSQL versions handle certain exceptions better.
comment:4 by , 19 months ago
It seems that simply removing the line
list_free(list2);
fixes the issue in PG11. I haven't retested in higher versions yet.
Thanks for the nice succinct example. I'm still testing this out on my systems. It will take me a bit to load up PG12 and PG11, but that is next on my list to try.
So far I haven't been able to replicate this with PG16 and PG15, which is expected given you said you only have the issue with 11 and 12.
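One possible explanation for the version dependence, offered as a hypothesis rather than something confirmed in this ticket: in PostgreSQL 12 and earlier, a List is a chain of separately allocated cells, and list_concat links the cells of its second argument onto the first without copying them, so calling list_free on the second list also invalidates part of the first. PostgreSQL 13 reimplemented List as an array and list_concat no longer modifies its second argument. The sketch below models the older behavior with a toy list. All names (ToyList, toy_concat, toy_demo, and so on) are hypothetical and this is not PostgreSQL or PostGIS code.

```c
#include <stdlib.h>

/* Toy cons-cell list, loosely modeled on the pre-13 PostgreSQL List.
 * Hypothetical illustration only; allocation failures are not handled. */
typedef struct Cell { int value; struct Cell *next; } Cell;
typedef struct { Cell *head; Cell *tail; int length; } ToyList;

/* Append a value, creating the list header when list is NULL. */
static ToyList *toy_append(ToyList *list, int value)
{
    Cell *cell = malloc(sizeof(Cell));
    cell->value = value;
    cell->next = NULL;
    if (list == NULL) {
        list = malloc(sizeof(ToyList));
        list->head = list->tail = cell;
        list->length = 1;
        return list;
    }
    list->tail->next = cell;
    list->tail = cell;
    list->length++;
    return list;
}

/* Destructive concat: list2's cells are linked into list1, not copied. */
static ToyList *toy_concat(ToyList *list1, ToyList *list2)
{
    if (list1 == NULL) return list2;
    if (list2 == NULL) return list1;
    list1->tail->next = list2->head;
    list1->tail = list2->tail;
    list1->length += list2->length;
    return list1;
}

/* Free the header and every cell reachable from it. */
static void toy_free(ToyList *list)
{
    Cell *c = list ? list->head : NULL;
    while (c != NULL) {
        Cell *next = c->next;
        free(c);
        c = next;
    }
    free(list);
}

/* Return 1 if cell is reachable from list, 0 otherwise. */
static int toy_owns(const ToyList *list, const Cell *cell)
{
    for (const Cell *c = list->head; c != NULL; c = c->next)
        if (c == cell) return 1;
    return 0;
}

/* Return 1 if, after the concat, list a holds all four cells and
 * b's first cell is one of them (that is, storage is shared). */
int toy_demo(void)
{
    ToyList *a = toy_append(toy_append(NULL, 1), 2);
    ToyList *b = toy_append(toy_append(NULL, 3), 4);
    a = toy_concat(a, b);

    int shared = toy_owns(a, b->head);
    int length = a->length;

    /* Calling toy_free(b) here would free cells still linked into a,
     * leaving a with dangling pointers. Release only b's header. */
    free(b);
    toy_free(a);
    return shared && length == 4;
}
```

Calling toy_demo() from a main function exercises the sketch. In this model the safe choices after a destructive concat are to free only the second list's header or to leave it to be reclaimed with its owner, which is in line with the observation in comment:4 that dropping list_free(list2) avoids the crash.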