Opened 5 months ago

Last modified 4 months ago

#5807 new defect

Joined table ST_Transform runs on each filter evaluation instead of just once per input row

Reported by: lnicola Owned by: pramsey
Priority: medium Milestone:
Component: postgis Version: 3.5.x
Keywords: Cc: lnicola

Description (last modified by lnicola)

During the following query, ST_Transform runs 1817760 times:

with tiles as (                                              
    select tile_id, ST_Transform(geom, 2157) as geom
    from shape_tiles_s2
    where tile_id in ('29UMA', '29UMT', '29UMU', '29UMV', '29UNA', '29UNT', '29UNU', '29UNV', '29UPA', '29UPT', '29UPU', '29UPV')
)
select count(*)
from polygons, tiles
where ST_Intersects(polygons.wkb_geometry, tiles.geom);

If you switch to a materialized CTE, it only runs 12 times:

with tiles as materialized (                                 
    select tile_id, ST_Transform(geom, 2157) as geom
    from shape_tiles_s2
    where tile_id in ('29UMA', '29UMT', '29UMU', '29UMV', '29UNA', '29UNT', '29UNU', '29UNV', '29UPA', '29UPT', '29UPU', '29UPV')
)
select count(*)
from polygons, tiles
where ST_Intersects(polygons.wkb_geometry, tiles.geom);

Don't read too much into the CTE. I checked with robe, and pretty much every other way to write it (with a join, with a subquery, with a lateral join) is even slower than the two CTE version.

To test, use:

set track_functions = 'all';
select * from pg_stat_user_functions; select pg_stat_reset();

You can download the polygons dataset from https://data.europa.eu/data/datasets/d18f6f95-1d96-4b86-a4ba-291a416e45bb?locale=en, it's the one called "Anonymous LPIS 2023 Parcels", import using e.g.:

ogr2ogr -nln polygons -nlt geometry "PG:host=127.0.0.1 user=postgres dbname=postgres" GEOSERVICESHELP-213-PARCELS-ANON.shp

The shape_tiles_s2 table is attached. It's probably derived from S2A_OPER_GIP_TILPAR_MPC__20151209T095117_V20150622T000000_21000101T000000_B00.zip, which you can find on https://sentiwiki.copernicus.eu/web/s2-products.

Tested on

POSTGIS="3.5.0 d2c3ca4" [EXTENSION] PGSQL="170" GEOS="3.13.0-CAPI-1.19.0" PROJ="9.5.0 NETWORK_ENABLED=OFF URL_ENDPOINT=https://cdn.proj.org USER_WRITABLE_DIRECTORY=/var/lib/postgresql/.local/share/proj DATABASE_PATH=/usr/local/share/proj/proj.db" (compiled against PROJ 9.13.0) LIBXML="2.9.14" LIBJSON="0.16" LIBPROTOBUF="1.4.1" WAGYU="0.5.0 (Internal)" TOPOLOGY

and

POSTGIS="3.4.2 c19ce56" [EXTENSION] PGSQL="160" GEOS="3.12.2-CAPI-1.18.2" PROJ="9.4.1 NETWORK_ENABLED=OFF URL_ENDPOINT=https://cdn.proj.org USER_WRITABLE_DIRECTORY=/var/lib/pgsql/.local/share/proj DATABASE_PATH=/usr/proj94/share/proj/proj.db" LIBXML="2.9.13" LIBJSON="0.14" LIBPROTOBUF="1.3.3" WAGYU="0.5.0 (Internal)"

Change History (9)

by lnicola, 5 months ago

Attachment: shape_tiles_s2.zip added

comment:1 by lnicola, 5 months ago

Description: modified (diff)

comment:2 by lnicola, 5 months ago

Description: modified (diff)

comment:3 by lnicola, 5 months ago

Description: modified (diff)

comment:4 by robe, 5 months ago

I concur I get same results on:

POSTGIS="3.5.0 3.5.0" [EXTENSION] PGSQL="170" GEOS="3.13.0-CAPI-1.19.0" SFCGAL="SFCGAL 1.5.2, CGAL 5.6.1, BOOST 1.84.0" PROJ="8.2.1 NETWORK_ENABLED=OFF URL_ENDPOINT=https://cdn.proj.org USER_WRITABLE_DIRECTORY=C:\Users\Administrator\AppData\Local/proj DATABASE_PATH=C:\Program Files\PostgreSQL\16\share\contrib\postgis-3.5\proj\proj.db" (compiled against PROJ 8.13.0) GDAL="GDAL 3.9.2, released 2024/08/13" LIBXML="2.12.5" LIBJSON="0.12" LIBPROTOBUF="1.2.1" WAGYU="0.5.0 (Internal)" TOPOLOGY RASTERPostgreSQL 17.0 on x86_64-windows, compiled by msvc-19.41.34120, 64-bit

I tried increasing cost of ST_Transform to 50000 and did not change the plan.

It's nice to see though that the offset 0 hack of olden times still works.

e.g. the offset 0 ones do the right thing only executing ST_Transform: 12 times, ST_Intersects: 1587629

though the plans look pretty much the same to me so hard to tell you are loosing 4 seconds with the redundant ST_Transform

with tiles as (                                              
    select tile_id, ST_Transform(geom, 2157) as geom
    from shape_tiles_s2
    where tile_id in ('29UMA', '29UMT', '29UMU', '29UMV', '29UNA', '29UNT', '29UNU', '29UNV', '29UPA', '29UPT', '29UPU', '29UPV')
offset 0
)
select count(*)
from polygons, tiles
where ST_Intersects(polygons.wkb_geometry, tiles.geom);



explain (analyze, buffers, summary, settings) with tiles as (                                                  select tile_id, ST_Transform(geom, 2157) as geom    from shape_tiles_s2      where tile_id in ('29UMA', '29UMT', '29UMU', '29UMV', '29UNA', '29UNT', '29UNU', '29UNV', '29UPA', '29UPT',
'29UPU', '29UPV') offset 0) select count(*)  from polygons, tiles  where ST_Intersects(polygons.wkb_geometry, tiles.geom);
                                                                        QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=28771.51..28771.52 rows=1 width=8) (actual time=15574.398..15574.399 rows=1 loops=1)
   Buffers: shared hit=571995 read=748300
   ->  Nested Loop  (cost=0.58..28730.47 rows=16418 width=0) (actual time=151.646..15301.244 rows=1546576 loops=1)
         Buffers: shared hit=571995 read=748300
         ->  Index Scan using shape_tiles_s2_pkey on shape_tiles_s2  (cost=0.29..1555.69 rows=12 width=56) (actual time=149.562..149.855 rows=12 loops=1)
               Index Cond: (tile_id = ANY ('{29UMA,29UMT,29UMU,29UMV,29UNA,29UNT,29UNU,29UNV,29UPA,29UPT,29UPU,29UPV}'::bpchar[]))
               Buffers: shared hit=70 read=26
         ->  Index Scan using ix_polygons_geom on polygons  (cost=0.29..2263.18 rows=137 width=498) (actual time=0.712..1239.095 rows=128881 loops=12)
               Index Cond: (wkb_geometry && (st_transform(shape_tiles_s2.geom, 2157)))
               Filter: st_intersects(wkb_geometry, (st_transform(shape_tiles_s2.geom, 2157)))
               Rows Removed by Filter: 3421
               Buffers: shared hit=571925 read=748274
 Planning:
   Buffers: shared hit=212 read=87
 Planning Time: 7.071 ms
 Execution Time: 15575.050 ms
(17 rows)

Similar success with:

explain (analyze, buffers, summary)  select count(*)
from polygons, (    select tile_id, ST_Transform(geom, 2157) as geom
    from shape_tiles_s2
    where tile_id in ('29UMA', '29UMT', '29UMU', '29UMV', '29UNA', '29UNT', '29UNU', '29UNV', '29UPA', '29UPT', '29UPU', '29UPV')
offset 0) AS tiles
where ST_Intersects(polygons.wkb_geometry, tiles.geom);

                                                                     QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=28771.51..28771.52 rows=1 width=8) (actual time=15084.127..15084.129 rows=1 loops=1)
   Buffers: shared hit=571964 read=748192
   ->  Nested Loop  (cost=0.58..28730.47 rows=16418 width=0) (actual time=0.954..14765.809 rows=1546576 loops=1)
         Buffers: shared hit=571964 read=748192
         ->  Index Scan using shape_tiles_s2_pkey on shape_tiles_s2  (cost=0.29..1555.69 rows=12 width=56) (actual time=0.158..0.433 rows=12 loops=1)
               Index Cond: (tile_id = ANY ('{29UMA,29UMT,29UMU,29UMV,29UNA,29UNT,29UNU,29UNV,29UPA,29UPT,29UPU,29UPV}'::bpchar[]))
               Buffers: shared read=4
         ->  Index Scan using ix_polygons_geom on polygons  (cost=0.29..2263.18 rows=137 width=498) (actual time=0.600..1206.879 rows=128881 loops=12)
               Index Cond: (wkb_geometry && (st_transform(shape_tiles_s2.geom, 2157)))
               Filter: st_intersects(wkb_geometry, (st_transform(shape_tiles_s2.geom, 2157)))
               Rows Removed by Filter: 3421
               Buffers: shared hit=571964 read=748188
 Planning Time: 0.478 ms
 Execution Time: 15084.266 ms
(14 rows)


 select funcname, calls, total_time  from pg_stat_user_functions WHERE funcname IN('st_intersects', 'st_transform');

  funcname    |  calls  | total_time
---------------+---------+------------
 st_intersects | 1587629 |   4775.398
 st_transform  |      12 |      0.254
(2 rows)



vs. bad result

SELECT pg_stat_reset();
explain (analyze, buffers, summary) select count(*)
from polygons, (    select tile_id, ST_Transform(geom, 2157) as geom
    from shape_tiles_s2
    where tile_id in ('29UMA', '29UMT', '29UMU', '29UMV', '29UNA', '29UNT', '29UNU', '29UNV', '29UPA', '29UPT', '29UPU', '29UPV')
) AS tiles
where ST_Intersects(polygons.wkb_geometry, tiles.geom);
                                                                        QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=234271.40..234271.41 rows=1 width=8) (actual time=27241.602..27241.603 rows=1 loops=1)
   Buffers: shared hit=571259 read=748897
   ->  Nested Loop  (cost=125.58..234230.35 rows=16418 width=0) (actual time=0.957..26859.793 rows=1546576 loops=1)
         Buffers: shared hit=571259 read=748897
         ->  Index Scan using shape_tiles_s2_pkey on shape_tiles_s2  (cost=0.29..55.69 rows=12 width=120) (actual time=0.138..0.166 rows=12 loops=1)
               Index Cond: (tile_id = ANY ('{29UMA,29UMT,29UMU,29UMV,29UNA,29UNT,29UNU,29UNV,29UPA,29UPT,29UPU,29UPV}'::bpchar[]))
               Buffers: shared read=4
         ->  Index Scan using ix_polygons_geom on polygons  (cost=125.29..19513.18 rows=137 width=498) (actual time=0.656..2213.554 rows=128881 loops=12)
               Index Cond: (wkb_geometry && st_transform(shape_tiles_s2.geom, 2157))
               Filter: st_intersects(st_transform(shape_tiles_s2.geom, 2157), wkb_geometry)
               Rows Removed by Filter: 3421
               Buffers: shared hit=571259 read=748893
 Planning Time: 0.266 ms
 Execution Time: 27241.830 ms


 select funcname, calls, total_time  from pg_stat_user_functions WHERE funcname IN('st_intersects', 'st_transform');
   funcname    |  calls  | total_time
---------------+---------+------------
 st_intersects | 1587629 |   4800.759
 st_transform  | 1587641 |  10899.837
(2 rows)
explain (analyze, buffers, summary) with tiles as materialized (                                     select tile_id, ST_Transform(geom, 21
57) as geom     from shape_tiles_s2     where tile_id in ('29UMA', '29UMT', '29UMU', '29UMV', '29UNA', '29UNT', '29UNU', '29UNV', '29UPA', '29UPT', '29UPU', '
29UPV'))select count(*) from polygons, tiles where ST_Intersects(polygons.wkb_geometry, tiles.geom);
                                                                      QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=28771.63..28771.64 rows=1 width=8) (actual time=15057.196..15057.198 rows=1 loops=1)
   Buffers: shared hit=571903 read=748253
   CTE tiles
     ->  Index Scan using shape_tiles_s2_pkey on shape_tiles_s2  (cost=0.29..1555.69 rows=12 width=38) (actual time=0.159..0.421 rows=12 loops=1)
           Index Cond: (tile_id = ANY ('{29UMA,29UMT,29UMU,29UMV,29UNA,29UNT,29UNU,29UNV,29UPA,29UPT,29UPU,29UPV}'::bpchar[]))
           Buffers: shared read=4
   ->  Nested Loop  (cost=0.29..27174.90 rows=16418 width=0) (actual time=0.975..14775.543 rows=1546576 loops=1)
         Buffers: shared hit=571903 read=748253
         ->  CTE Scan on tiles  (cost=0.00..0.24 rows=12 width=32) (actual time=0.165..0.475 rows=12 loops=1)
               Buffers: shared read=4
         ->  Index Scan using ix_polygons_geom on polygons  (cost=0.29..2263.18 rows=137 width=498) (actual time=0.555..1208.768 rows=128881 loops=12)
               Index Cond: (wkb_geometry && tiles.geom)
               Filter: st_intersects(wkb_geometry, tiles.geom)
               Rows Removed by Filter: 3421
               Buffers: shared hit=571903 read=748249
 Planning Time: 0.463 ms
 Execution Time: 15057.368 ms
(17 rows)

 select funcname, calls, total_time  from pg_stat_user_functions WHERE funcname IN('st_intersects', 'st_transform');

  funcname    |  calls  | total_time
---------------+---------+------------
 st_intersects | 1587629 |   4775.398
 st_transform  |      12 |      0.254
(2 rows)

What annoys me is the plans all look pretty much the same and yet the materialized and offset 0 ones are clearly winners because they shave off 4 seconds not running ST_Transform all those extra times.

comment:5 by lnicola, 5 months ago

Summary: Uncorrelated ST_Transform runs on each filter evaluationJoined table ST_Transform runs on each filter evaluation instead of just once per input row

comment:6 by robe, 4 months ago

I forgot to add that if you use && it does it with only 12 transform calls, no need for cte materialize or offset

set track_functions = 'all';
-- doing this to prevent parallelism from kicking in
set max_parallel_workers_per_gather=0; 
select pg_stat_reset();
explain (analyze, buffers, summary)  select count(*)
from polygons, (    select tile_id, ST_Transform(geom, 2157) as geom
    from shape_tiles_s2
    where tile_id in ('29UMA', '29UMT', '29UMU', '29UMV', '29UNA', '29UNT', '29UNU', '29UNV', '29UPA', '29UPT', '29UPU', '29UPV')
) AS tiles
where polygons.wkb_geometry && tiles.geom;
 explain (analyze, buffers, summary)  select count(*) from polygons, (    select tile_id, ST_Transform(geom, 2157) as geom    from shape_tiles_s2    where tile_id in ('29UMA', '29UMT', '29
UMU', '29UMV', '29UNA', '29UNT', '29UNU', '29UNV', '29UPA', '29UPT', '29UPU', '29UPV')) AS tiles where polygons.wkb_geometry && tiles.geom;
                                                                            QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Finalize Aggregate  (cost=7354.84..7354.85 rows=1 width=8) (actual time=9804.104..9804.151 rows=1 loops=1)
   Buffers: shared hit=571244 read=748046
   ->  Gather  (cost=7354.73..7354.84 rows=1 width=8) (actual time=9804.092..9804.145 rows=2 loops=1)
         Workers Planned: 1
         Workers Launched: 1
         Buffers: shared hit=571244 read=748046
         ->  Partial Aggregate  (cost=6354.73..6354.74 rows=1 width=8) (actual time=4907.346..4907.347 rows=1 loops=2)
               Buffers: shared hit=571244 read=748046
               ->  Nested Loop  (cost=125.32..6330.59 rows=9658 width=0) (actual time=87.055..4798.034 rows=793815 loops=2)
                     Buffers: shared hit=571244 read=748046
                     ->  Parallel Seq Scan on shape_tiles_s2  (cost=0.03..1591.20 rows=7 width=120) (actual time=11.077..11.233 rows=6 loops=2)
                           Filter: (tile_id = ANY ('{29UMA,29UMT,29UMU,29UMV,29UNA,29UNT,29UNU,29UNV,29UPA,29UPT,29UPU,29UPV}'::bpchar[]))
                           Rows Removed by Filter: 28337
                           Buffers: shared hit=1 read=1090
                     ->  Index Scan using ix_polygons_geom on polygons  (cost=125.29..675.69 rows=137 width=498) (actual time=0.381..761.839 rows=132302 loops=12)
                           Index Cond: (wkb_geometry && st_transform(shape_tiles_s2.geom, 2157))
                           Buffers: shared hit=571114 read=746924
 Planning:
   Buffers: shared hit=307 read=41
 Planning Time: 4.872 ms
 Execution Time: 9804.564 ms
(21 rows)

 select funcname, calls, total_time  from pg_stat_user_functions WHERE funcname IN('st_intersects', 'st_transform'); 

  funcname    | calls | total_time
---------------+-------+------------
 st_transform  |    12 |    150.355
 st_intersects |     0 |          0
(2 rows)
Version 0, edited 4 months ago by robe (next)

by robe, 4 months ago

Attachment: ticket_transform_issue.sql added

comment:7 by robe, 4 months ago

Just to confirm it's not something special about this dataset, I created a dummy example attached:

\i ticket_transform_issue.sql

outputs


   funcname    | calls | total_time
---------------+-------+------------
 st_intersects |  5583 |      8.401
 st_transform  |  5586 |      3.268
(2 rows)

Last edited 4 months ago by robe (previous) (diff)
Note: See TracTickets for help on using tickets.