Opened 3 months ago

Closed 4 weeks ago

Last modified 4 weeks ago

#4890 closed defect (fixed)

ST_Transform large slow down from upgrade to 3.1.1 from 2.4

Reported by: SAbernethy Owned by: pramsey
Priority: medium Milestone: PostGIS 3.1.2
Component: postgis Version: 3.1.x
Keywords: Cc:

Description

The issue is that the ST_Transform function inside postgis when upgraded from 2.4 to either 3.0.3 or 3.1.1 runs about 20 times slower. This is regardless of the version of postgres, We are currently running PG11 and are upgrading to PG13 and are having to upgrade postgis version due to 2.4 not being supported in PG13.

Attached is the timings for the diffrent postgis

How to reproduce setup postgres either 11 or 13, install postgis version 2.4 / 3.03 / 3.11 create empty database install posgis and run the following query "select ST_Transform('0103000020C90800000100000005000000000000008 CAE264100000000BCA34641000000008CAE264100000000CDA3464100 000000C2AE264100000000CDA3464100000040C3AE264100000000BCA 34641000000008CAE264100000000BCA34641'::geometry,4326);"

Attachments (4)

PG11 Postgis2.4.txt (3.0 KB) - added by SAbernethy 3 months ago.
PG11 Postgis3.0.3.txt (2.2 KB) - added by SAbernethy 3 months ago.
PG11 Postgis3.1.1.txt (2.2 KB) - added by SAbernethy 3 months ago.
PG13 Postgis3.1.1.txt (2.2 KB) - added by SAbernethy 3 months ago.

Download all attachments as: .zip

Change History (24)

Changed 3 months ago by SAbernethy

Attachment: PG11 Postgis2.4.txt added

Changed 3 months ago by SAbernethy

Attachment: PG11 Postgis3.0.3.txt added

Changed 3 months ago by SAbernethy

Attachment: PG11 Postgis3.1.1.txt added

Changed 3 months ago by SAbernethy

Attachment: PG13 Postgis3.1.1.txt added

comment:1 Changed 3 months ago by SAbernethy

A note on this that the sql test I have put there is a more simple form of the first example given on here https://postgis.net/docs/ST_Transform.html

comment:2 Changed 2 months ago by SAbernethy

Priority: mediumblocker

comment:3 Changed 2 months ago by SAbernethy

Done some further testing (on the same machine with same version of postgres v11 and same postgres conf) and the results are as follows.

On a table that I'm inserting rows into I do the following sql via trigger to produce the shape geometry. The data i'm pulling for the inputs is not changing between runs either.

CREATE OR REPLACE FUNCTION ot2.create_shape_for_marker_radius(_lat double precision, _lon double precision, _radius integer)
 RETURNS geometry
 LANGUAGE plpgsql
AS $function$
DECLARE
BEGIN
	RETURN postgis.st_transform(
		postgis.st_buffer(
			postgis.st_transform(postgis.st_setsrid(postgis.st_makepoint(_lon, _lat),4269), 26986),
			GREATEST(1, _radius * 0.3048)::integer),
		4326);
END;
$function$

On 2.4.8 I get following results Doing 500000 inserts takes 4 minutes and 55 secs and query times below

      avg time       |       max time      |       min time
---------------------+---------------------+-----------------
 00:00:00.000526     | 00:00:00.033375     | 00:00:00.000489

On 2.5.5 I get following results Doing 500000 inserts takes 4 minutes and 19 secs and query times below

      avg time       |       max time      |       min time
---------------------+---------------------+-----------------
 00:00:00.000516     | 00:00:00.032929     | 00:00:00.000481

On 3.0.3 I get following results Doing 500000 inserts takes 1 hour and 4 secs and query times below

      avg time       |       max time      |       min time
---------------------+---------------------+-----------------
 00:00:00.007205     | 00:00:00.054506     | 00:00:00.007068

On 3.1.1 I get the following Doing 500000 inserts takes 1 hour and 57 secs and query times below

       avg time      |       max time        |       min time
---------------------+-----------------------+-----------------
 00:00:00.007306     | 00:00:00.054673       | 00:00:00.007076

As you can see this timing difference is a large and this is a blocker for us upgrading as we can be calling this sql at least 2-3 mil times per day. A large slow down like this makes this unreasonable to move forward with upgrading either version 3.0.3 or 3.1.1 of postgis.

comment:4 Changed 2 months ago by pramsey

The results of postgis_full_version() would be interesting to me.

When we moved into the proj 6 and higher series we got the new "transformation chain" calculations which are much more expensive to set up when turning an EPSG->EPSG transformation into a transformation object ready to process coordinates.

PostGIS caches the transformation object per-statement when it gets it, so over a 1M vertex object, the new set-up cost is not really visible. However, over 1M separate statements that set up cost will not be amortized at all.

A workaround is to try and change your data flow so that you can do your transforms in one query, instead of millions of separate queries. (You also might look at st_buffer(geography) as an alternative to your current, or really just using cartographic markers instead of building out concrete circles, but none of that has anything to do with projection overhead.)

First thing though, your full versions so we can see what the proj version difference is.

comment:5 Changed 2 months ago by SAbernethy

Here is the postgis_full_version() on 2.4

POSTGIS="2.4.9" PGSQL="110" GEOS="3.9.1-CAPI-1.14.2" PROJ="Rel. 7.2.1, January 1st, 2021" GDAL="GDAL 3.2.2, released 2021/03/05" LIBXML="2.9.1" LIBJSON="0.11" LIBPROTOBUF="1.0.2" (core procs from "2.4.8 r17696" need upgrade) RASTER (raster procs from "2.4.8 r17696" need upgrade)

and here is 3.1

POSTGIS="3.1.1 aaf4c79" [EXTENSION] PGSQL="110" GEOS="3.9.1-CAPI-1.14.2" PROJ="7.2.1" GDAL="GDAL 3.2.2, released 2021/03/05" LIBXML="2.9.1" LIBJSON="0.11" RASTER

comment:6 Changed 2 months ago by mboeringa

Paul, does this also affect casts to geography of WGS1984, SRID 4326 stored data?

I am wondering, because I also seem to be hit by a considerable slow down, in a similar - in my case multi-threaded - processing workflow doing millions of separate updates. It is PostGIS 3.1.1 as well.

The code includes, as just one of many things, a calculation of geodesic area:

ST_Area(<GEOMETRY_COLUMN_IN_4236>::geography,True)

Would this possibly also be affected by the same issue, even though not explicitely calling ST_Transform?

That is interesting by the way, I now see that in SAbernethy's case, there appears to be no difference in the Proj version, both 7.2.1 (mine is 6.3.1), so a potential problem is elsewhere?

Last edited 2 months ago by mboeringa (previous) (diff)

comment:7 Changed 2 months ago by mboeringa

Although I now also notice that in SAbernethy's case, the 2.4 version shows:

"core procs from "2.4.8 r17696" need upgrade"

So I don't know how that affects the actual code run on 2.4, and if it is actually using Proj 7.2.1?

Last edited 2 months ago by mboeringa (previous) (diff)

comment:8 Changed 2 months ago by mboeringa

Likely related? ticket on the PROJ GitHub repository:

https://github.com/OSGeo/PROJ/issues/1367

comment:9 Changed 2 months ago by komzpa

Hello SAbernethy

will you be able to rewrite your simple function from plpgsql into sql language function? replace "return" with "select", drop the begin/end and change language. It should become much faster.

comment:10 Changed 8 weeks ago by mboeringa

Paul, I know little about PostgreSQL and PostGIS's internal working and even less about the development of both, but, if a full fix by somehow caching the transformation object at the PostgreSQL application level is to difficult to implement in a short period of time, would it at least potentially be possible to cache the transform object at the transaction level?

Many systems that do mass scale updates/inserts at least batch individual statements in relatively large sets in one transaction.

It would not be a solution for all problems and workflows, but would help quite a number of people I think.

comment:11 Changed 8 weeks ago by komzpa

Priority: blockermedium

comment:12 Changed 7 weeks ago by pramsey

With respect to ST_Area(geometry::geography) I have confirmed that the cast portion of the query hits the same object cache as the ST_Transform() function, so it is also subject to initialization overhead. (Because in order to case to geography you have to confirm that the SRID is in fact a geographic CRS, so you need to look it up, and that requires initializing the proj database, which incurs the cost.)

comment:13 Changed 4 weeks ago by Paul Ramsey <pramsey@…>

In 933e23cd/git:

Lengthen proj cache lifetime to the life of the backend, and move memory context appropriately to match. Simplify the retrieval of global constants somewhat to not require FunctionCallInfo? anymore. References #4890

comment:14 Changed 4 weeks ago by robe

Test from David Boone that exercises the issue: I’ve been fighting a performance issue for a while with some installs of Postgres/PostGIS, and finally managed to create a small reproducible test case. It shows a simple query’s execution time going from 10 to 400 ms when upgrading PostGIS from 2.5 -> 3.1. This may be the same issue or related to: https://trac.osgeo.org/postgis/ticket/4890


DROP EXTENSION IF EXISTS postgis CASCADE;
DROP TABLE IF EXISTS points;

CREATE EXTENSION postgis;

CREATE TABLE points AS SELECT * FROM (VALUES ('SRID=4326;POINT(-124.9921 49.6851)'::geometry), ('SRID=4326;POINT(-119.4032 50.0305)'::geometry), ('SRID=4326;POINT(-122.799 49.1671)'::geometry), ('SRID=4326;POINT(-122.3379 49.0597)'::geometry), ('SRID=4326;POINT(-123.1264 49.2671)'::geometry), ('SRID=4326;POINT(-122.7132 49.0519)'::geometry), ('SRID=4326;POINT(-124.3475 49.3042)'::geometry), ('SRID=4326;POINT(-119.389 49.8891)'::geometry), ('SRID=4326;POINT(-123.126 49.281)'::geometry), ('SRID=4326;POINT(-122.6606 49.1134)'::geometry), ('SRID=4326;POINT(-124.3233 49.312)'::geometry), ('SRID=4326;POINT(-124.0478 49.2397)'::geometry), ('SRID=4326;POINT(-119.2683 50.266)'::geometry), ('SRID=4326;POINT(-121.9705 49.081)'::geometry), ('SRID=4326;POINT(-123.8854 49.482)'::geometry), ('SRID=4326;POINT(-123.1528 49.77)'::geometry), ('SRID=4326;POINT(-120.8051 50.488)'::geometry), ('SRID=4326;POINT(-122.6403 49.1652)'::geometry), ('SRID=4326;POINT(-122.7717 49.2433)'::geometry), ('SRID=4326;POINT(-121.9587 49.1661)'::geometry))
p(g);

CREATE INDEX ON points USING gist(g);

CREATE OR REPLACE FUNCTION get_closest(p geometry(POINT)) RETURNS geometry(POINT) AS $$
    SELECT g FROM points ORDER BY st_transform(p, 3005) <-> st_transform(points.g, 3005) LIMIT 1
$$
STABLE
LANGUAGE SQL;

SELECT postgis_full_version();

EXPLAIN (ANALYZE) SELECT *, get_closest(g) FROM points ;


I hope this is useful, please feel free to add it to the issue tracker or any other place / person it can be of use. Please let me know if I can help further or contribute in any way.

comment:15 Changed 4 weeks ago by Regina Obe <lr@…>

In b07270c/git:

Break proj tests into two and add an additional to test improvements references #4890

comment:16 Changed 4 weeks ago by Regina Obe <lr@…>

In 7a1eb0e6/git:

Break proj tests into two and add an additional to test improvements references #4890

comment:17 Changed 4 weeks ago by robe

I've redone the test trying to break out better the tests that can take advantage of cache verses not.

Here are the benmarks I have from our various bots. General summary - Proj >= 6 startup time is just much worse than lower versions. PostGIS 3.1 has not been patched yet. PostGIS 3.2 is after pramsey's patch. regress_proj_4890 - It's clear from below that the patch improves significantly the speed of Daniel Boone's test listed above because of the improved proj caching for proj >= 6 and no noticeable impact on proj < 6.

regress_proj_basic - This is a test of custom proj added and standard proj and reuse of these projs. Again improvement about 2 fold (it's a bit better because about 72ms of that time is proj startup costs when you flip projections). Again only impact for proj >= 6 and no impact for lower proj.

regress_proj_adhoc - has input string proj4 and retransforms. Again a bit better with the patch, no impact for lower proj

regress_proj_cache_overflow - is cycling thru 12 or so projections -- so the slow upstart of newer proj api really shows here. The patch improves the case slightly but not by much. The only concern I have is that (and I checked a couple of runs), for the proj 6.1 case, the patch version does show a slight degradation. Given 6.1.1 isn't even the latest proj micro, I'm not that concerned even if there is degradation. All other proj > 6 seem to have improved slightly.

PostgreSQL 13.1 on x86_64-pc-linux-gnu, compiled by gcc (Debian 10.2.0-16) 10.2.0, 64-bit
  Postgis 3.1.2dev - (e1856ea) - 2021-05-19 00:44:06
  scripts 3.1.2dev e1856ea
  raster scripts 3.1.2dev e1856ea
  GEOS: 3.8.2dev-CAPI-1.13.3
  PROJ: 7.1.1
  SFCGAL: 1.3.9
  GDAL: GDAL 3.1.4, released 2020/10/20 
 
 ./regress/core/regress_proj_basic .. ok in 100 ms
 ./regress/core/regress_proj_adhoc .. ok in 402 ms
 ./regress/core/regress_proj_cache_overflow .. ok in 1353 ms
 ./regress/core/regress_proj_4890 .. ok in 506 ms
 
 PostgreSQL 13.2 on x86_64-pc-linux-gnu, compiled by gcc (Debian 10.2.1-6) 10.2.1 20210110, 64-bit
  Postgis 3.2.0dev - (fa69ef0) - 2021-05-19 00:32:19
  scripts 3.2.0dev fa69ef0
  raster scripts 3.2.0dev fa69ef0
  GEOS: 3.9.2dev-CAPI-1.14.2
  PROJ: 7.1.1
  SFCGAL: 1.3.9
  GDAL: GDAL 3.1.4, released 2020/10/20
 ./regress/core/regress_proj_basic .. ok in 44 ms
 ./regress/core/regress_proj_adhoc .. ok in 355 ms
 ./regress/core/regress_proj_cache_overflow .. ok in 1162 ms
 ./regress/core/regress_proj_4890 .. ok in 52 ms
 
 PostgreSQL 11.10 on x86_64-pc-linux-gnu, compiled by gcc (Debian 10.2.1-3) 10.2.1 20201224, 64-bit
  Postgis 3.1.2dev - (e1856ea) - 2021-05-19 00:43:59
  scripts 3.1.2dev e1856ea
  raster scripts 3.1.2dev e1856ea
  GEOS: 3.7.4dev-CAPI-1.11.4 2926110
  PROJ: Rel. 5.2.0, September 15th, 2018
  SFCGAL: 1.3.9
  GDAL: GDAL 2.4.4, released 2020/01/08

 ./regress/core/regress_proj_basic .. ok in 17 ms
 ./regress/core/regress_proj_adhoc .. ok in 19 ms
 ./regress/core/regress_proj_cache_overflow .. ok in 39 ms
 ./regress/core/regress_proj_4890 .. ok in 28 ms
 
 PostgreSQL 10.15 on x86_64-pc-linux-gnu, compiled by gcc (Debian 10.2.1-3) 10.2.1 20201224, 64-bit
  Postgis 3.1.2dev - (e1856ea) - 2021-05-19 00:43:56
  scripts 3.1.2dev e1856ea
  raster scripts 3.1.2dev e1856ea
  GEOS: 3.6.6dev-CAPI-1.10.6 4fe5ee7
  PROJ: Rel. 4.9.0, 13 September 2014
  SFCGAL: 1.3.9
  GDAL: GDAL 2.3.3, released 2018/12/14
  
 ./regress/core/regress_proj_basic .. ok in 16 ms
 ./regress/core/regress_proj_adhoc .. ok in 14 ms
 ./regress/core/regress_proj_cache_overflow .. ok in 39 ms
 ./regress/core/regress_proj_4890 .. ok in 24 ms
 
 PostgreSQL 11.10 on x86_64-pc-linux-gnu, compiled by gcc (Debian 10.2.1-3) 10.2.1 20201224, 64-bit
  Postgis 3.2.0dev - (fa69ef0) - 2021-05-19 00:32:13
  scripts 3.2.0dev fa69ef0
  raster scripts 3.2.0dev fa69ef0
  GEOS: 3.7.4dev-CAPI-1.11.4 2926110
  PROJ: Rel. 5.2.0, September 15th, 2018
  SFCGAL: 1.3.9
  GDAL: GDAL 2.4.4, released 2020/01/08

  ./regress/core/regress_proj_basic .. ok in 14 ms
 ./regress/core/regress_proj_adhoc .. ok in 14 ms
 ./regress/core/regress_proj_cache_overflow .. ok in 28 ms
 ./regress/core/regress_proj_4890 .. ok in 17 ms

  PostgreSQL 12.5 on x86_64-pc-linux-gnu, compiled by gcc (Debian 10.2.0-16) 10.2.0, 64-bit
  Postgis 3.1.2dev - (e1856ea) - 2021-05-19 00:43:59
  scripts 3.1.2dev e1856ea
  raster scripts 3.1.2dev e1856ea
  GEOS: 3.7.3dev-CAPI-1.11.3 4ae53b7
  PROJ: 6.1.1
  SFCGAL: 1.3.9
  GDAL: GDAL 3.0.4, released 2020/01/28
  
 ./regress/core/regress_proj_basic .. ok in 79 ms
 ./regress/core/regress_proj_adhoc .. ok in 157 ms
 ./regress/core/regress_proj_cache_overflow .. ok in 2641 ms
 ./regress/core/regress_proj_4890 .. ok in 362 ms
 
 PostgreSQL 12.5 on x86_64-pc-linux-gnu, compiled by gcc (Debian 10.2.0-16) 10.2.0, 64-bit
  Postgis 3.2.0dev - (fa69ef0) - 2021-05-19 00:32:30
  scripts 3.2.0dev fa69ef0
  raster scripts 3.2.0dev fa69ef0
  GEOS: 3.7.3dev-CAPI-1.11.3 4ae53b7
  PROJ: 6.1.1
  SFCGAL: 1.3.9
  GDAL: GDAL 3.0.4, released 2020/01/28
  
 ./regress/core/regress_proj_basic .. ok in 35 ms
 ./regress/core/regress_proj_adhoc .. ok in 153 ms
 ./regress/core/regress_proj_cache_overflow .. ok in 3055 ms
 ./regress/core/regress_proj_4890 .. ok in 45 ms
Last edited 4 weeks ago by robe (previous) (diff)

comment:18 Changed 4 weeks ago by Paul Ramsey <pramsey@…>

Resolution: fixed
Status: newclosed

In 695dc142/git:

Lengthen proj cache lifetime to the life of the backend, and move memory context appropriately to match. Simplify the retrieval of global constants somewhat to not require FunctionCallInfo? anymore. Closes #4890

comment:19 Changed 4 weeks ago by icehong

It seems like proj cache doesn't take full-effect , is it possible to got the same performance in PostgreSQL 11 and PROJ 5.2 ? Thanks for the great work, BTW.

Lengthen proj cache lifetime to the life of the backend, and move mem… CI #193:

PostgreSQL 13.1 on x86_64-pc-linux-gnu, compiled by gcc (Debian 10.2.0-16) 10.2.0, 64-bit
  Postgis 3.1.2dev - (695dc14) - 2021-05-19 17:21:57
  scripts 3.1.2dev 695dc14
  raster scripts 3.1.2dev 695dc14
  GEOS: 3.8.2dev-CAPI-1.13.3
  PROJ: 7.1.1
  SFCGAL: 1.3.9
  GDAL: GDAL 3.1.4, released 2020/10/20
  ./regress/core/regress_proj_basic .. ok in 49 ms
  ./regress/core/regress_proj_adhoc .. ok in 414 ms
  ./regress/core/regress_proj_cache_overflow .. ok in 1352 ms
  ./regress/core/regress_proj_4890 .. ok in 56 ms

PostgreSQL 12.5 on x86_64-pc-linux-gnu, compiled by gcc (Debian 10.2.0-16) 10.2.0, 64-bit
  Postgis 3.1.2dev - (695dc14) - 2021-05-19 17:21:55
  scripts 3.1.2dev 695dc14
  raster scripts 3.1.2dev 695dc14
  GEOS: 3.7.3dev-CAPI-1.11.3 4ae53b7
  PROJ: 6.1.1
  SFCGAL: 1.3.9
  GDAL: GDAL 3.0.4, released 2020/01/28
  ./regress/core/regress_proj_basic .. ok in 46 ms
  ./regress/core/regress_proj_adhoc .. ok in 160 ms
  ./regress/core/regress_proj_cache_overflow .. ok in 3307 ms
  ./regress/core/regress_proj_4890 .. ok in 48 ms

PostgreSQL 11.10 on x86_64-pc-linux-gnu, compiled by gcc (Debian 10.2.1-3) 10.2.1 20201224, 64-bit
  Postgis 3.1.2dev - (695dc14) - 2021-05-19 17:21:40
  scripts 3.1.2dev 695dc14
  raster scripts 3.1.2dev 695dc14
  GEOS: 3.7.4dev-CAPI-1.11.4 2926110
  PROJ: Rel. 5.2.0, September 15th, 2018
  SFCGAL: 1.3.9
  GDAL: GDAL 2.4.4, released 2020/01/08
  ./regress/core/regress_proj_basic .. ok in 14 ms
  ./regress/core/regress_proj_adhoc .. ok in 14 ms
  ./regress/core/regress_proj_cache_overflow .. ok in 37 ms
  ./regress/core/regress_proj_4890 .. ok in 16 ms

Last edited 4 weeks ago by icehong (previous) (diff)

comment:20 Changed 4 weeks ago by robe

@icehong why do you think that?

The issue is that the Proj new API takes much longer to cache. Like a 10 to 100 times longer than Proj 5.2. There is nothing we can do about that. The tests might be worded a little badly.

What the regress_proj_cache_overflow is testing is rotating thru like 20 SRIDs to make sure we don't crash the cache. So it's more of a crasher test than a speed test. So essentially to cost in new Proj of caching the 20 or so spatial refs overshadows it's use. It is what it is.

As far as the proj_basic - it's more or less the same story. I was thinking I should add more proj calls of the SRIDs loaded so that the caching does not overshadow the actual use. It's like only 10 calls in there with 3 different projs right now.

Question to @pramsey -- I assume we aren't caching proj in shared mem right (or are we) and it's only cached at the session level now? That's the only thing I can think would help here, but maybe we are doing that already and it's just cause the tests are using different projs that we don't see the effect.

Note: See TracTickets for help on using tickets.