Opened 5 weeks ago

Last modified 5 weeks ago

#997 new defect

ST_Union performance degradation from GEOS 3.6.x -> 3.7.x

Reported by: greenlaw Owned by: geos-devel@…
Priority: major Milestone:
Component: Default Version: 3.7.0
Severity: Significant Keywords: union
Cc:

Description

I posted about this earlier this week on the postgis-users mailing list.

I've encountered a severe performance issue with GEOS >=3.7.0 when calling ST_Union(ST_Buffer(...)) on some large polygons (EPSG:3857) with lots of vertices. I've been testing primarily on PostgreSQL 9.5 but have reproduced the issue with multiple versions of PostGIS (2.2 to 2.5).

This appears to be a separate issue from #867. That ticket references a recent JTS performance enhancement which was ported to GEOS and merged to master, but I tested with the latest version (which contains that fix), 3.8.0rc3, and it didn't make a difference (actually performance was worse).

With PostgreSQL 9.5, PostGIS 2.3.7, and GEOS 3.6.4, my query takes ~100 sec to run, but after upgrading GEOS to 3.7.0, the query duration balloons to ~70 minutes. With 3.8.0rc3, duration again increased (to ~147 minutes). Exact timings for different version combinations is pasted below.

I will attach a pgdump of the table (21 rows, 23mb uncompressed).

Steps to reproduce:

  1. Have PostgreSQL 9.5 instance with PostGIS 2.2 - 2.5 and GEOS >= 3.7.0
  1. Import the pgdump
psql -h hostname -p port -U postgres -d dbname -f pgdump_nws_haz_hang_201910.sql
  1. Execute the query
EXPLAIN ANALYZE SELECT ST_Union(ST_Buffer(wkb_geometry, 100.0))::geometry(Geometry, 3857),category,vteccode,prod_type,validtime,starttime,endtime FROM public.nws_haz_hang_201910 GROUP BY category,vteccode,prod_type,validtime,starttime,endtime;

The query will likely take between 1 - 3 hours to complete on most systems.

Timings for specific version combinations are pasted below.

3.6.1 = ~85 sec:

POSTGIS="2.3.7 r16523" PGSQL="95" GEOS="3.6.1-CAPI-1.10.1 r0" PROJ="Rel. 4.9.3, 15 August 2016" GDAL="GDAL 1.11.4, released 2016/01/25" LIBXML="2.9.1" LIBJSON="0.11" RASTER

EXPLAIN ANALYZE SELECT ST_Union(ST_Buffer(wkb_geometry, 100.0))::geometry(Geometry, 3857),category,vteccode,prod_type,validtime,starttime,endtime FROM public.nws_haz_hang_201910 GROUP BY  category,vteccode,prod_type,validtime,starttime,endtime;
                                                       QUERY PLAN                                                       
------------------------------------------------------------------------------------------------------------------------
 HashAggregate  (cost=10.30..10.45 rows=10 width=706) (actual time=35922.055..85310.959 rows=5 loops=1)
   Group Key: category, vteccode, prod_type, validtime, starttime, endtime
   ->  Seq Scan on nws_haz_hang_201910  (cost=0.00..10.10 rows=10 width=706) (actual time=0.014..0.053 rows=21 loops=1)
 Planning time: 0.460 ms
 Execution time: 85314.516 ms

3.6.2 = ~90 sec:

POSTGIS="2.3.7 r16523" PGSQL="95" GEOS="3.6.2-CAPI-1.10.2 4d2925d6" PROJ="Rel. 4.9.3, 15 August 2016" GDAL="GDAL 1.11.4, released 2016/01/25" LIBXML="2.9.1" LIBJSON="0.11" RASTER

EXPLAIN ANALYZE SELECT ST_Union(ST_Buffer(wkb_geometry, 100.0))::geometry(Geometry, 3857),category,vteccode,prod_type,validtime,starttime,endtime FROM public.nws_haz_hang_201910 GROUP BY  category,vteccode,prod_type,validtime,starttime,endtime;
                                                       QUERY PLAN                                                       
------------------------------------------------------------------------------------------------------------------------
 HashAggregate  (cost=10.30..10.45 rows=10 width=706) (actual time=34092.861..90393.216 rows=5 loops=1)
   Group Key: category, vteccode, prod_type, validtime, starttime, endtime
   ->  Seq Scan on nws_haz_hang_201910  (cost=0.00..10.10 rows=10 width=706) (actual time=0.012..0.047 rows=21 loops=1)
 Planning time: 0.315 ms
 Execution time: 90393.369 ms

3.6.3 = ~89 sec:

POSTGIS="2.3.7 r16523" PGSQL="95" GEOS="3.6.3-CAPI-1.10.3 80c13047" PROJ="Rel. 4.9.3, 15 August 2016" GDAL="GDAL 1.11.4, released 2016/01/25" LIBXML="2.9.1" LIBJSON="0.11" RASTER

EXPLAIN ANALYZE SELECT ST_Union(ST_Buffer(wkb_geometry, 100.0))::geometry(Geometry, 3857),category,vteccode,prod_type,validtime,starttime,endtime FROM public.nws_haz_hang_201910 GROUP BY  category,vteccode,prod_type,validtime,starttime,endtime;
                                                       QUERY PLAN                                                       
------------------------------------------------------------------------------------------------------------------------
 HashAggregate  (cost=10.30..10.45 rows=10 width=706) (actual time=33833.006..88916.065 rows=5 loops=1)
   Group Key: category, vteccode, prod_type, validtime, starttime, endtime
   ->  Seq Scan on nws_haz_hang_201910  (cost=0.00..10.10 rows=10 width=706) (actual time=0.011..0.047 rows=21 loops=1)
 Planning time: 0.306 ms
 Execution time: 88916.182 ms

3.6.4 = ~100 sec:

POSTGIS="2.3.7 r16523" PGSQL="95" GEOS="3.6.4-CAPI-1.10.4 ba90ca5" PROJ="Rel. 4.9.3, 15 August 2016" GDAL="GDAL 1.11.4, released 2016/01/25" LIBXML="2.9.1" LIBJSON="0.11" RASTER

EXPLAIN ANALYZE SELECT ST_Union(ST_Buffer(wkb_geometry, 100.0))::geometry(Geometry, 3857),category,vteccode,prod_type,validtime,starttime,endtime FROM public.nws_haz_hang_201910 GROUP BY  category,vteccode,prod_type,validtime,starttime,endtime;
                                                       QUERY PLAN                                                       
------------------------------------------------------------------------------------------------------------------------
 HashAggregate  (cost=10.30..10.45 rows=10 width=706) (actual time=35376.572..99728.566 rows=5 loops=1)
   Group Key: category, vteccode, prod_type, validtime, starttime, endtime
   ->  Seq Scan on nws_haz_hang_201910  (cost=0.00..10.10 rows=10 width=706) (actual time=0.024..0.052 rows=21 loops=1)
 Planning time: 0.408 ms
 Execution time: 99728.803 ms

3.7.0 = ~70 min:

POSTGIS="2.3.7 r16523" PGSQL="95" GEOS="3.7.0-CAPI-1.11.0 673b9939" PROJ="Rel. 4.9.3, 15 August 2016" GDAL="GDAL 1.11.4, released 2016/01/25" LIBXML="2.9.1" LIBJSON="0.11" RASTER

EXPLAIN ANALYZE SELECT ST_Union(ST_Buffer(wkb_geometry, 100.0))::geometry(Geometry, 3857),category,vteccode,prod_type,validtime,starttime,endtime FROM public.nws_haz_hang_201910 GROUP BY  category,vteccode,prod_type,validtime,starttime,endtime;
                                                        QUERY PLAN                                                       
------------------------------------------------------------------------------------------------------------------------
 HashAggregate  (cost=10.30..10.45 rows=10 width=706) (actual time=30353.481..4204487.862 rows=5 loops=1)
   Group Key: category, vteccode, prod_type, validtime, starttime, endtime
   ->  Seq Scan on nws_haz_hang_201910  (cost=0.00..10.10 rows=10 width=706) (actual time=0.017..0.053 rows=21 loops=1)
 Planning time: 0.362 ms
 Execution time: 4204488.069 ms

3.7.1 = ~72 min:

POSTGIS="2.3.7 r16523" PGSQL="95" GEOS="3.7.1-CAPI-1.11.1 27a5e771" PROJ="Rel. 4.9.3, 15 August 2016" GDAL="GDAL 1.11.4, released 2016/01/25" LIBXML="2.9.1" LIBJSON="0.11" RASTER

EXPLAIN ANALYZE SELECT ST_Union(ST_Buffer(wkb_geometry, 100.0))::geometry(Geometry, 3857),category,vteccode,prod_type,validtime,starttime,endtime FROM public.nws_haz_hang_201910 GROUP BY  category,vteccode,prod_type,validtime,starttime,endtime;
                                                       QUERY PLAN                                                       
------------------------------------------------------------------------------------------------------------------------
 HashAggregate  (cost=10.30..10.45 rows=10 width=706) (actual time=31342.188..4323753.205 rows=5 loops=1)
   Group Key: category, vteccode, prod_type, validtime, starttime, endtime
   ->  Seq Scan on nws_haz_hang_201910  (cost=0.00..10.10 rows=10 width=706) (actual time=0.021..0.055 rows=21 loops=1)
 Planning time: 0.446 ms
 Execution time: 4323755.076 ms

3.7.2 = ~70 min:

POSTGIS="2.3.7 r16523" PGSQL="95" GEOS="3.7.2-CAPI-1.11.2 b55d2125" PROJ="Rel. 4.9.3, 15 August 2016" GDAL="GDAL 1.11.4, released 2016/01/25" LIBXML="2.9.1" LIBJSON="0.11" RASTER

EXPLAIN ANALYZE SELECT ST_Union(ST_Buffer(wkb_geometry, 100.0))::geometry(Geometry, 3857),category,vteccode,prod_type,validtime,starttime,endtime FROM public.nws_haz_hang_201910 GROUP BY  category,vteccode,prod_type,validtime,starttime,endtime;
                                                       QUERY PLAN                                                       
------------------------------------------------------------------------------------------------------------------------
 HashAggregate  (cost=10.30..10.45 rows=10 width=706) (actual time=33643.593..4195304.854 rows=5 loops=1)
   Group Key: category, vteccode, prod_type, validtime, starttime, endtime
   ->  Seq Scan on nws_haz_hang_201910  (cost=0.00..10.10 rows=10 width=706) (actual time=0.019..0.052 rows=21 loops=1)
 Planning time: 0.374 ms
 Execution time: 4195305.021 ms

3.8.0rc3 = ~147 min:

POSTGIS="2.5.3 r17699" [EXTENSION] PGSQL="95" GEOS="3.8.0rc3-CAPI-1.13.1 " PROJ="Rel. 6.2.0, September 1st, 2019" GDAL="GDAL 3.0.1, released 2019/06/28" LIBXML="2.9.1" LIBJSON="0.11" LIBPROTOBUF="1.0.2" RASTER

EXPLAIN ANALYZE SELECT ST_Union(ST_Buffer(wkb_geometry, 100.0))::geometry(Geometry, 3857),category,vteccode,prod_type,validtime,starttime,endtime FROM public.nws_haz_hang_201910 GROUP BY  category,vteccode,prod_type,validtime,starttime,endtime;
                                                       QUERY PLAN                                                       
------------------------------------------------------------------------------------------------------------------------
 HashAggregate  (cost=10.30..10.45 rows=10 width=706) (actual time=42941.634..8834235.340 rows=5 loops=1)
   Group Key: category, vteccode, prod_type, validtime, starttime, endtime
   ->  Seq Scan on nws_haz_hang_201910  (cost=0.00..10.10 rows=10 width=706) (actual time=0.016..0.048 rows=21 loops=1)
 Planning time: 0.198 ms
 Execution time: 8834235.446 ms

Change History (12)

comment:1 Changed 5 weeks ago by greenlaw

Looks like my attachment is over the file size limit, so pgdump can be found here:

https://drive.google.com/file/d/1dmZigm0sJwE_QviUzXrWfkJBP9L3vypN

comment:2 Changed 5 weeks ago by pramsey

I'm not reproducing this locally. Using your data and my set up (3.8rc3, pg11, postgis3) I'm seeing times similar to your good runs on old GEOS on my wee laptop. You might want to ensure that your GEOS is built with -O3 flags and not in full debug (-O0 -g) mode. That can cause 5x performance changes. I don't know *what* could cause your 40x regression.

postgis30=# select st_npoints(st_union(st_buffer(wkb_geometry,100))) from nws_haz_hang_201910 ;
 st_npoints 
------------
    1112530
(1 row)

Time: 43946.265 ms (00:43.946)

comment:3 Changed 5 weeks ago by pramsey

Wait, I wasn't running your actual SQL. Your actual SQL takes a lot of time for me. Please test *my* SQL on your setup, just to confirm that a "normal" union runs in OK time. Your SQL does a bunch of grouping, and I need to examine it a little to see what it is actually doing, in terms of which groups are taking all the time, and what they look like.

Last edited 5 weeks ago by pramsey (previous) (diff)

comment:4 Changed 5 weeks ago by greenlaw

Sorry, was not clear about the query grouping. It's been a few years since I wrote this code so it's not exactly fresh in my mind, but I believe what I was trying to accomplish is merging adjacent polygons that have matching attributes. I'm applying the buffer first in order to ensure close-by polygons are union'd even if they don't touch.

Looks like I was compiling with -O2 -Os, so I've changed that to -O3 and ran your query (using 3.7.3 for this test), which ran in ~168 sec (guess my docker instance inside a VM on my laptop is not quite as fast as your machine :). So yes looks like a normal Union works in reasonable time.

explain analyze select st_npoints(st_union(st_buffer(wkb_geometry,100))) from nws_haz_hang_201910 ;
                                                      QUERY PLAN                                                       
-----------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=10.15..10.19 rows=1 width=32) (actual time=168404.120..168404.121 rows=1 loops=1)
   ->  Seq Scan on nws_haz_hang_201910  (cost=0.00..10.10 rows=10 width=32) (actual time=0.006..0.041 rows=21 loops=1)
 Planning time: 0.055 ms
 Execution time: 168407.313 ms

comment:5 Changed 5 weeks ago by pramsey

I have reduced the problem to the minimum number of geometries. With the uniontest table from https://www.dropbox.com/s/la1g0xbvbdc849l/uniontest.sql.gz?dl=1 you can run

SELECT ST_Union(geom) FROM uniontest

and see the effect. It's quite particular to these geometries, or this particular arrangement of geometries. On that basis I am not going to delay the 3.8.0 release, and hope we can find the particular cause of the regression to patch up 3.8.1 and the 3.7 branch.

Last edited 5 weeks ago by pramsey (previous) (diff)

comment:6 Changed 5 weeks ago by greenlaw

Sounds good. Really appreciate you looking into this.

comment:7 Changed 5 weeks ago by mdavis

I think the problem here is caused by a union operation encountering a noding robustness error (TopologyException?), on some pair of input geometries. GEOS attempts to handle the error by invoking the heuristic line snapping code. This is very slow for geoms with large numbers of points.

The line snapping code has not changed for several versions. But there have been some changes to low-level code for computing intersections. This is likely producing a slightly different result, perhaps in the buffer operation (ironically, the buffers may actually be more accurate now, but because of that the linework of two geoms is more nearly coincident - which is the classic cause of noding errors).

More thought is required to come up with an internal GEOS solution for this. Some options are:

  • Instead of the snapping heuristic use the buffer(0) hack (which in this case also invokes some rounding to handle the noding issue, so does round all output vertices somewhat)
  • It might be possible to a priori decide if snapping will take "too long" and only then use buffer(0)
  • The new overlay code in the works does not use snapping (however, it also requires some rounding)

Given that buffers are only an approximation, another workaround might be to limit the precision of the computed buffers. This can be done now in PostGIS by inserting a ST_SnapToGrid call after the buffer (although there is a risk of introducing invalidity here - it would be better if ST_Buffer allowed specifying a precision parameter, since internally it can guarantee valid output).

JTS has the same problem (actually more obvious, since the slow snapping occurs when unioning the two lower-right geometries from the set that Paul posted)

comment:8 Changed 5 weeks ago by mdavis

I guess another option is to make line snapping faster! Although it's not an ideal heuristic anyway, since it is done externally to the overlay code. It would be better to figure out how to move it inside the overlay, where it could be done more effectively.

comment:9 Changed 5 weeks ago by mdavis

@greenlaw - are you able to try the workaround of running ST_SnapToGrid on the buffered geometry?

comment:11 Changed 5 weeks ago by greenlaw

@mdavis Yes, the ST_SnapToGrid workaround seems like it should work perfectly for me as long as I use a small enough value (0.1 produced ERROR: GEOSUnaryUnion: TopologyException: Input geom 1 is invalid: Self-intersection at or near point -15008907.701449277 7619503.4876811597 at -15008907.701449277 7619503.4876811597).

Not terribly concerned about precision here, but 0.001 seems to work just fine, and the command completes in 79 sec. The resulting geometry looks good to me.

I will put this change into production - don't expect any issues but will be sure to report back if I run into anything.

Will be happy to test any potential fixes you guys come up with down the road.

Thanks so much for your help.

EXPLAIN ANALYZE SELECT ST_Union(ST_SnapToGrid(ST_Buffer(wkb_geometry, 100.0), 0.001))::geometry(Geometry, 3857),category,vteccode,prod_type,validtime,starttime,endtime FROM public.nws_haz_hang_201910 GROUP BY  category,vteccode,prod_type,validtime,starttime,endtime;
                                                       QUERY PLAN                                                       
------------------------------------------------------------------------------------------------------------------------
 HashAggregate  (cost=10.32..10.47 rows=10 width=706) (actual time=33352.916..79931.923 rows=5 loops=1)
   Group Key: category, vteccode, prod_type, validtime, starttime, endtime
   ->  Seq Scan on nws_haz_hang_201910  (cost=0.00..10.10 rows=10 width=706) (actual time=0.006..0.041 rows=21 loops=1)
 Planning time: 0.133 ms
 Execution time: 79931.984 ms

comment:12 Changed 5 weeks ago by mdavis

@greenlaw - excellent news, thanks for testing that. Hope it continues to work for you.

If you do encounter issues then try decreasing the grid size precision value a bit more.

Note: See TracTickets for help on using tickets.