Opened 5 months ago
Last modified 4 months ago
#5807 new defect
Joined table ST_Transform runs on each filter evaluation instead of just once per input row
Reported by: | lnicola | Owned by: | pramsey |
---|---|---|---|
Priority: | medium | Milestone: | |
Component: | postgis | Version: | 3.5.x |
Keywords: | Cc: | lnicola |
Description (last modified by ) ¶
During the following query, ST_Transform
runs 1817760 times:
with tiles as ( select tile_id, ST_Transform(geom, 2157) as geom from shape_tiles_s2 where tile_id in ('29UMA', '29UMT', '29UMU', '29UMV', '29UNA', '29UNT', '29UNU', '29UNV', '29UPA', '29UPT', '29UPU', '29UPV') ) select count(*) from polygons, tiles where ST_Intersects(polygons.wkb_geometry, tiles.geom);
If you switch to a materialized CTE, it only runs 12 times:
with tiles as materialized ( select tile_id, ST_Transform(geom, 2157) as geom from shape_tiles_s2 where tile_id in ('29UMA', '29UMT', '29UMU', '29UMV', '29UNA', '29UNT', '29UNU', '29UNV', '29UPA', '29UPT', '29UPU', '29UPV') ) select count(*) from polygons, tiles where ST_Intersects(polygons.wkb_geometry, tiles.geom);
Don't read too much into the CTE. I checked with robe, and pretty much every other way to write it (with a join, with a subquery, with a lateral join) is even slower than the two CTE version.
To test, use:
set track_functions = 'all'; select * from pg_stat_user_functions; select pg_stat_reset();
You can download the polygons
dataset from https://data.europa.eu/data/datasets/d18f6f95-1d96-4b86-a4ba-291a416e45bb?locale=en, it's the one called "Anonymous LPIS 2023 Parcels", import using e.g.:
ogr2ogr -nln polygons -nlt geometry "PG:host=127.0.0.1 user=postgres dbname=postgres" GEOSERVICESHELP-213-PARCELS-ANON.shp
The shape_tiles_s2
table is attached. It's probably derived from S2A_OPER_GIP_TILPAR_MPC__20151209T095117_V20150622T000000_21000101T000000_B00.zip
, which you can find on https://sentiwiki.copernicus.eu/web/s2-products.
Tested on
POSTGIS="3.5.0 d2c3ca4" [EXTENSION] PGSQL="170" GEOS="3.13.0-CAPI-1.19.0" PROJ="9.5.0 NETWORK_ENABLED=OFF URL_ENDPOINT=https://cdn.proj.org USER_WRITABLE_DIRECTORY=/var/lib/postgresql/.local/share/proj DATABASE_PATH=/usr/local/share/proj/proj.db" (compiled against PROJ 9.13.0) LIBXML="2.9.14" LIBJSON="0.16" LIBPROTOBUF="1.4.1" WAGYU="0.5.0 (Internal)" TOPOLOGY
and
POSTGIS="3.4.2 c19ce56" [EXTENSION] PGSQL="160" GEOS="3.12.2-CAPI-1.18.2" PROJ="9.4.1 NETWORK_ENABLED=OFF URL_ENDPOINT=https://cdn.proj.org USER_WRITABLE_DIRECTORY=/var/lib/pgsql/.local/share/proj DATABASE_PATH=/usr/proj94/share/proj/proj.db" LIBXML="2.9.13" LIBJSON="0.14" LIBPROTOBUF="1.3.3" WAGYU="0.5.0 (Internal)"
Change History (9)
by , 5 months ago
Attachment: | shape_tiles_s2.zip added |
---|
comment:1 by , 5 months ago
Description: | modified (diff) |
---|
comment:2 by , 5 months ago
Description: | modified (diff) |
---|
comment:3 by , 5 months ago
Description: | modified (diff) |
---|
comment:4 by , 5 months ago
comment:5 by , 5 months ago
Summary: | Uncorrelated ST_Transform runs on each filter evaluation → Joined table ST_Transform runs on each filter evaluation instead of just once per input row |
---|
comment:6 by , 4 months ago
I forgot to add that if you use && it does it with only 12 transform calls, no need for cte materialize or offset
set track_functions = 'all'; -- doing this to prevent parallelism from kicking in set max_parallel_workers_per_gather=0; select pg_stat_reset(); explain (analyze, buffers, summary) select count(*) from polygons, ( select tile_id, ST_Transform(geom, 2157) as geom from shape_tiles_s2 where tile_id in ('29UMA', '29UMT', '29UMU', '29UMV', '29UNA', '29UNT', '29UNU', '29UNV', '29UPA', '29UPT', '29UPU', '29UPV') ) AS tiles where polygons.wkb_geometry && tiles.geom;
explain (analyze, buffers, summary) select count(*) from polygons, ( select tile_id, ST_Transform(geom, 2157) as geom from shape_tiles_s2 where tile_id in ('29UMA', '29UMT', '29 UMU', '29UMV', '29UNA', '29UNT', '29UNU', '29UNV', '29UPA', '29UPT', '29UPU', '29UPV')) AS tiles where polygons.wkb_geometry && tiles.geom; QUERY PLAN ------------------------------------------------------------------------------------------------------------------------------------------------------------------- Finalize Aggregate (cost=7354.84..7354.85 rows=1 width=8) (actual time=9804.104..9804.151 rows=1 loops=1) Buffers: shared hit=571244 read=748046 -> Gather (cost=7354.73..7354.84 rows=1 width=8) (actual time=9804.092..9804.145 rows=2 loops=1) Workers Planned: 1 Workers Launched: 1 Buffers: shared hit=571244 read=748046 -> Partial Aggregate (cost=6354.73..6354.74 rows=1 width=8) (actual time=4907.346..4907.347 rows=1 loops=2) Buffers: shared hit=571244 read=748046 -> Nested Loop (cost=125.32..6330.59 rows=9658 width=0) (actual time=87.055..4798.034 rows=793815 loops=2) Buffers: shared hit=571244 read=748046 -> Parallel Seq Scan on shape_tiles_s2 (cost=0.03..1591.20 rows=7 width=120) (actual time=11.077..11.233 rows=6 loops=2) Filter: (tile_id = ANY ('{29UMA,29UMT,29UMU,29UMV,29UNA,29UNT,29UNU,29UNV,29UPA,29UPT,29UPU,29UPV}'::bpchar[])) Rows Removed by Filter: 28337 Buffers: shared hit=1 read=1090 -> Index Scan using ix_polygons_geom on polygons (cost=125.29..675.69 rows=137 width=498) (actual time=0.381..761.839 rows=132302 loops=12) Index Cond: (wkb_geometry && st_transform(shape_tiles_s2.geom, 2157)) Buffers: shared hit=571114 read=746924 Planning: Buffers: shared hit=307 read=41 Planning Time: 4.872 ms Execution Time: 9804.564 ms (21 rows)
select funcname, calls, total_time from pg_stat_user_functions WHERE funcname IN('st_intersects', 'st_transform');
funcname | calls | total_time ---------------+-------+------------ st_transform | 12 | 150.355 st_intersects | 0 | 0 (2 rows)
by , 4 months ago
Attachment: | ticket_transform_issue.sql added |
---|
comment:7 by , 4 months ago
Just to confirm it's not something special about this dataset, I created a dummy example attached:
\i ticket_transform_issue.sql
outputs
funcname | calls | total_time ---------------+-------+------------ st_intersects | 5583 | 8.401 st_transform | 5586 | 3.268 (2 rows)
I concur I get same results on:
I tried increasing cost of ST_Transform to 50000 and did not change the plan.
It's nice to see though that the offset 0 hack of olden times still works.
e.g. the offset 0 ones do the right thing only executing ST_Transform: 12 times, ST_Intersects: 1587629
though the plans look pretty much the same to me so hard to tell you are loosing 4 seconds with the redundant ST_Transform
Similar success with:
vs. bad result
What annoys me is the plans all look pretty much the same and yet the materialized and offset 0 ones are clearly winners because they shave off 4 seconds not running ST_Transform all those extra times.