Opened 5 years ago

Closed 5 years ago

#4372 closed defect (fixed)

PROJ 6: 4x slowdown over points

Reported by: Algunenano Owned by: Algunenano
Priority: high Milestone: PostGIS 3.0.0
Component: postgis Version: master
Keywords: Cc:

Description

# Setup

# Select version();
                                                 version                                                  
----------------------------------------------------------------------------------------------------------
 PostgreSQL 11.2 on x86_64-pc-linux-gnu, compiled by clang version 8.0.0 (tags/RELEASE_800/final), 64-bit

A table with an index over st_transform.

The table was created as:

CREATE TABLE analysis_2f13a3dbd7_9de00c87a82f0929cfc345d2d9f1d01e05811bc4 AS
 SELECT * FROM (SELECT
 ST_Buffer(the_geom::geography, 50000)::geometry the_geom,scalerank,natscale,labelrank,featurecla,name,namepar,namealt,diffascii,nameascii,adm0cap,capalt,capin,worldcity,megacity,sov0name,sov_a3,adm0name,adm0_a3,adm1name,iso_a2,note,latitude,longitude,changed,namediff,diffnote,pop_max,pop_min,pop_other,geonameid,meganame,ls_name,ls_match,checkme,cartodb_id,created_at,updated_at
 FROM (select * from populated_places_simple_reduced) _camshaft_buffer) _analysis_create_table_query
 LIMIT 0

And the index as:

CREATE INDEX ON analysis_2f13a3dbd7_9de00c87a82f0929cfc345d2d9f1d01e05811bc4 USING GIST (ST_Transform(the_geom, 3857))

The table looks like this:

# \d+ analysis_2f13a3dbd7_9de00c87a82f0929cfc345d2d9f1d01e05811bc4
                  Table "public.analysis_2f13a3dbd7_9de00c87a82f0929cfc345d2d9f1d01e05811bc4"
   Column   |           Type           | Collation | Nullable | Default | Storage  | Stats target | Description 
------------+--------------------------+-----------+----------+---------+----------+--------------+-------------
 the_geom   | geometry                 |           |          |         | main     |              | 
 scalerank  | integer                  |           |          |         | plain    |              | 
 natscale   | integer                  |           |          |         | plain    |              | 
 labelrank  | integer                  |           |          |         | plain    |              | 
 featurecla | text                     |           |          |         | extended |              | 
 name       | text                     |           |          |         | extended |              | 
 namepar    | text                     |           |          |         | extended |              | 
 namealt    | text                     |           |          |         | extended |              | 
 diffascii  | integer                  |           |          |         | plain    |              | 
 nameascii  | text                     |           |          |         | extended |              | 
 adm0cap    | double precision         |           |          |         | plain    |              | 
 capalt     | double precision         |           |          |         | plain    |              | 
 capin      | text                     |           |          |         | extended |              | 
 worldcity  | double precision         |           |          |         | plain    |              | 
 megacity   | integer                  |           |          |         | plain    |              | 
 sov0name   | text                     |           |          |         | extended |              | 
 sov_a3     | text                     |           |          |         | extended |              | 
 adm0name   | text                     |           |          |         | extended |              | 
 adm0_a3    | text                     |           |          |         | extended |              | 
 adm1name   | text                     |           |          |         | extended |              | 
 iso_a2     | text                     |           |          |         | extended |              | 
 note       | text                     |           |          |         | extended |              | 
 latitude   | double precision         |           |          |         | plain    |              | 
 longitude  | double precision         |           |          |         | plain    |              | 
 changed    | double precision         |           |          |         | plain    |              | 
 namediff   | integer                  |           |          |         | plain    |              | 
 diffnote   | text                     |           |          |         | extended |              | 
 pop_max    | integer                  |           |          |         | plain    |              | 
 pop_min    | integer                  |           |          |         | plain    |              | 
 pop_other  | integer                  |           |          |         | plain    |              | 
 geonameid  | double precision         |           |          |         | plain    |              | 
 meganame   | text                     |           |          |         | extended |              | 
 ls_name    | text                     |           |          |         | extended |              | 
 ls_match   | integer                  |           |          |         | plain    |              | 
 checkme    | integer                  |           |          |         | plain    |              | 
 cartodb_id | integer                  |           |          |         | plain    |              | 
 created_at | timestamp with time zone |           |          |         | plain    |              | 
 updated_at | timestamp with time zone |           |          |         | plain    |              | 
Indexes:
    "analysis_2f13a3dbd7_9de00c87a82f0929cfc345d2d9_st_transform_idx" gist (st_transform(the_geom, 3857))

# Testing with PROJ 5.2 and postgis trunk:

# INSERT INTO analysis_2f13a3dbd7_9de00c87a82f0929cfc345d2d9f1d01e05811bc4 SELECT
 ST_Buffer(the_geom::geography, 50000)::geometry the_geom,scalerank,natscale,labelrank,featurecla,name,namepar,namealt,diffascii,nameascii,adm0cap,capalt,capin,worldcity,megacity,sov0name,sov_a3,adm0name,adm0_a3,adm1name,iso_a2,note,latitude,longitude,changed,namediff,diffnote,pop_max,pop_min,pop_other,geonameid,meganame,ls_name,ls_match,checkme,cartodb_id,created_at,updated_at
 FROM (select * from populated_places_simple_reduced) _camshaft_buffer;
INSERT 0 7313
Time: 1860.093 ms (00:01.860)

I've run the same test several times and it always takes ~1.8s.

perf divided by .so:

Perf by .so:
+   82.25%     0.00%  [unknown]
+   81.89%    18.38%  libc-2.28.so
+   67.54%    41.42%  postgres
+   58.06%     6.69%  postgis-3.so
+   15.06%    13.08%  libproj.so.13.1.1
+   11.65%     0.56%  libgeos_c.so.1.11.0
+   11.38%     8.73%  libgeos-3.8.0.so
+    6.17%     6.06%  libm-2.28.so
+    3.25%     2.99%  [kernel.kallsyms]
+    1.90%     0.05%  libpthread-2.28.so
+    1.84%     1.71%  libstdc++.so.6.0.25
     0.34%     0.34%  ld-2.28.so
     0.00%     0.00%  [vdso]

# Same with PROJ 6 (just built from master, latest ref is fb125618fd18f112ed6f37662b021d07a602ff90), and also postgis trunk (rebuilt with the dependency):

test_windshaft_cartodb_user_1_db=# INSERT INTO analysis_2f13a3dbd7_9de00c87a82f0929cfc345d2d9f1d01e05811bc4 SELECT
 ST_Buffer(the_geom::geography, 50000)::geometry the_geom,scalerank,natscale,labelrank,featurecla,name,namepar,namealt,diffascii,nameascii,adm0cap,capalt,capin,worldcity,megacity,sov0name,sov_a3,adm0name,adm0_a3,adm1name,iso_a2,note,latitude,longitude,changed,namediff,diffnote,pop_max,pop_min,pop_other,geonameid,meganame,ls_name,ls_match,checkme,cartodb_id,created_at,updated_at
 FROM (select * from populated_places_simple_reduced) _camshaft_buffer;
INSERT 0 7313
Time: 7184.529 ms (00:07.185)

Running it several times shows it always takes ~7.2 secs, that's as slow as the build using PROJ 5.2.

Perf by .so:

Perf by .so:
+   87.70%    33.62%  libc-2.28.so
+   79.96%     0.00%  [unknown]
+   61.70%    14.32%  postgres
+   59.86%     1.77%  postgis-3.so
+   45.09%    25.42%  libproj.so.15.0.0
+   10.26%     8.80%  libstdc++.so.6.0.25
+    7.02%     6.98%  [kernel.kallsyms]
+    3.72%     0.48%  libicuuc.so.64.1
+    3.48%     3.42%  libgcc_s.so.1
+    2.94%     0.19%  libgeos_c.so.1.11.0
+    2.83%     2.29%  libgeos-3.8.0.so
+    1.95%     1.90%  libm-2.28.so
     1.02%     0.38%  libpthread-2.28.so
     0.37%     0.37%  libsqlite3.so.0.8.6
     0.06%     0.05%  ld-2.28.so
     0.00%     0.00%  libk5crypto.so.3.1
     0.00%     0.00%  libxml2.so.2.9.9
     0.00%     0.00%  [vdso]

If I run only ST_Transform over the table, the times are more similar; proj6 is still slower but not 4x slower:

Running only ST_Transform (Select ST_Transform(the_geom, 3857) from analysis_2f13a3dbd7_9de00c87a82f0929cfc345d2d9f1d01e05811bc4;)

PROJ 5 build: Time: 1558.226 ms (00:01.558) Time: 1570.106 ms (00:01.570) Time: 1565.649 ms (00:01.566)

PROJ 6 build: Time: 1764.034 ms (00:01.764) Time: 1710.738 ms (00:01.711) Time: 1732.082 ms (00:01.732)

Have we lost some caching in the proj6 update?

Change History (18)

comment:1 by pramsey, 5 years ago

I tried this test case and didn't see a substantial difference in performance between inserting into a table with a functional index on ST_Transform and inserting into a table with an ordinary index

drop table if exists t1;
drop table if exists t2;
drop table if exists t3;

create table t1 (g geometry);
create table t2 (g geometry);
create table t3 (g geometry);

-- no index on t1
create index t2_g_x on t2 using gist (g);
create index t3_g_x on t3 using gist (st_transform(g, 4326));

\timing
-- just transform
select st_transform(st_setsrid(st_makepoint(360.0*random()-180, 180*random()-90), 4326), 3857) as g from generate_series(1,10000);
-- transform + insert, no index
insert into t1 select st_transform(st_setsrid(st_makepoint(360.0*random()-180, 180*random()-90), 4326), 3857) as g from generate_series(1,10000);
-- transform + insert + index
insert into t2 select st_transform(st_setsrid(st_makepoint(360.0*random()-180, 180*random()-90), 4326), 3857) as g from generate_series(1,10000);
-- transform + insert + functional index
insert into t3 select st_transform(st_setsrid(st_makepoint(360.0*random()-180, 180*random()-90), 4326), 3857) as g from generate_series(1,10000);

comment:2 by Algunenano, 5 years ago

Yup, I can't reproduce the 4x slowdown with your test case neither.

Here is a step by step method to reproduce my test case:

  • Download the fixture (https://raw.githubusercontent.com/CartoDB/Windshaft-cartodb/master/test/support/sql/ported/populated_places_simple_reduced.sql)
  • Create a clean database, install postgis and import the table (ignore the errors).
    psql -d proj_test -f /tmp/populated_places_simple_reduced.sql 
    
  • Create the table:
    CREATE TABLE analysis_2f13a3dbd7_9de00c87a82f0929cfc345d2d9f1d01e05811bc4 AS
     SELECT * FROM (SELECT
     ST_Buffer(the_geom::geography, 50000)::geometry the_geom,scalerank,natscale,labelrank,featurecla,name,namepar,namealt,diffascii,nameascii,adm0cap,capalt,capin,worldcity,megacity,sov0name,sov_a3,adm0name,adm0_a3,adm1name,iso_a2,note,latitude,longitude,changed,namediff,diffnote,pop_max,pop_min,pop_other,geonameid,meganame,ls_name,ls_match,checkme,cartodb_id,created_at,updated_at
     FROM (select * from populated_places_simple_reduced) _camshaft_buffer) _analysis_create_table_query
     LIMIT 0
    
  • Create the index:
    CREATE INDEX ON analysis_2f13a3dbd7_9de00c87a82f0929cfc345d2d9f1d01e05811bc4 USING GIST (ST_Transform(the_geom, 3857))
    
  • Insert (this is using PROJ 5):
    # INSERT INTO analysis_2f13a3dbd7_9de00c87a82f0929cfc345d2d9f1d01e05811bc4 SELECT
     ST_Buffer(the_geom::geography, 50000)::geometry the_geom,scalerank,natscale,labelrank,featurecla,name,namepar,namealt,diffascii,nameascii,adm0cap,capalt,capin,worldcity,megacity,sov0name,sov_a3,adm0name,adm0_a3,adm1name,iso_a2,note,latitude,longitude,changed,namediff,diffnote,pop_max,pop_min,pop_other,geonameid,meganame,ls_name,ls_match,checkme,cartodb_id,created_at,updated_at
     FROM (select * from populated_places_simple_reduced) _camshaft_buffer;
    
    INSERT 0 7313
    Time: 1801.362 ms (00:01.801)
    
  • Change to PROJ 6 and truncate the table (I restarted postgresql to make sure it's clean from the old .so's):
    # truncate analysis_2f13a3dbd7_9de00c87a82f0929cfc345d2d9f1d01e05811bc4;
    FATAL:  terminating connection due to administrator command
    server closed the connection unexpectedly
            This probably means the server terminated abnormally
            before or while processing the request.
    The connection to the server was lost. Attempting reset: Succeeded.
    Time: 4.003 ms
    proj_test=# truncate analysis_2f13a3dbd7_9de00c87a82f0929cfc345d2d9f1d01e05811bc4;
    TRUNCATE TABLE
    Time: 60.029 ms
    
  • Insert using the same query, now using PROJ 6:
    # INSERT INTO analysis_2f13a3dbd7_9de00c87a82f0929cfc345d2d9f1d01e05811bc4 SELECT
     ST_Buffer(the_geom::geography, 50000)::geometry the_geom,scalerank,natscale,labelrank,featurecla,name,namepar,namealt,diffascii,nameascii,adm0cap,capalt,capin,worldcity,megacity,sov0name,sov_a3,adm0name,adm0_a3,adm1name,iso_a2,note,latitude,longitude,changed,namediff,diffnote,pop_max,pop_min,pop_other,geonameid,meganame,ls_name,ls_match,checkme,cartodb_id,created_at,updated_at
     FROM (select * from populated_places_simple_reduced) _camshaft_buffer;
    INSERT 0 7313
    Time: 7515.885 ms (00:07.516)
    

Technically it's doing the same as yours, inserting points, but without doing transformations in the query and instead calling ST_Buffer, but I've tried removing ST_Transform in your queries and I still didn't see the slowdown.

comment:3 by strk, 5 years ago

Can you try to invert the order of your excercise, just in case we're missing some other state ? (like needed vacuum after truncate..)

comment:4 by Algunenano, 5 years ago

Inverting the order I get the same result (proj6 is 4x slower):

# create database proj_test2;
CREATE DATABASE
postgres=# \c proj_test2;
You are now connected to database "proj_test2" as user "postgres".
proj_test2=# \timing on
Timing is on.
proj_test2=# create extension postgis;
CREATE EXTENSION
Time: 566.929 ms
proj_test2=# \q
 psql -d proj_test2 -f /tmp/populated_places_simple_reduced.sql 
CREATE TABLE
CREATE SEQUENCE
ALTER SEQUENCE
ALTER TABLE
[...]
$ psql -U postgres proj_test2 
psql (11.2)
Type "help" for help.

proj_test2=# \timing on
Timing is on.
proj_test2=# CREATE TABLE analysis_2f13a3dbd7_9de00c87a82f0929cfc345d2d9f1d01e05811bc4 AS
proj_test2-#  SELECT * FROM (SELECT
proj_test2(#  ST_Buffer(the_geom::geography, 50000)::geometry the_geom,scalerank,natscale,labelrank,featurecla,name,namepar,namealt,diffascii,nameascii,adm0cap,capalt,capin,worldcity,megacity,sov0name,sov_a3,adm0name,adm0_a3,adm1name,iso_a2,note,latitude,longitude,changed,namediff,diffnote,pop_max,pop_min,pop_other,geonameid,meganame,ls_name,ls_match,checkme,cartodb_id,created_at,updated_at
proj_test2(#  FROM (select * from populated_places_simple_reduced) _camshaft_buffer) _analysis_create_table_query
proj_test2-#  LIMIT 0;
SELECT 0
Time: 37.848 ms
proj_test2=# CREATE INDEX ON analysis_2f13a3dbd7_9de00c87a82f0929cfc345d2d9f1d01e05811bc4 USING GIST (ST_Transform(the_geom, 3857))
proj_test2-# ;
CREATE INDEX
Time: 5.513 ms
proj_test2=# INSERT INTO analysis_2f13a3dbd7_9de00c87a82f0929cfc345d2d9f1d01e05811bc4 SELECT
proj_test2-#  ST_Buffer(the_geom::geography, 50000)::geometry the_geom,scalerank,natscale,labelrank,featurecla,name,namepar,namealt,diffascii,nameascii,adm0cap,capalt,capin,worldcity,megacity,sov0name,sov_a3,adm0name,adm0_a3,adm1name,iso_a2,note,latitude,longitude,changed,namediff,diffnote,pop_max,pop_min,pop_other,geonameid,meganame,ls_name,ls_match,checkme,cartodb_id,created_at,updated_at
proj_test2-#  FROM (select * from populated_places_simple_reduced) _camshaft_buffer;
INSERT 0 7313
Time: 1813.043 ms (00:01.813)
proj_test2=# \q

Install proj6 + postgis compiled against it:

$ sudo pacman -U proj-git/proj-git-5.2.0.r710.gfb125618f-1-x86_64.pkg.tar postgis-git/postgis-git-3.0_svn_trunk_LTO.12686.cb0e3bc69-1-x86_64.pkg.tar.proj6
[...]
$ psql -U postgres proj_test2 
psql (11.2)
Type "help" for help.

proj_test2=# truncate analysis_2f13a3dbd7_9de00c87a82f0929cfc345d2d9f1d01e05811bc4 ;
TRUNCATE TABLE
proj_test2=# \timing on
Timing is on.
proj_test2=# INSERT INTO analysis_2f13a3dbd7_9de00c87a82f0929cfc345d2d9f1d01e05811bc4 SELECT
 ST_Buffer(the_geom::geography, 50000)::geometry the_geom,scalerank,natscale,labelrank,featurecla,name,namepar,namealt,diffascii,nameascii,adm0cap,capalt,capin,worldcity,megacity,sov0name,sov_a3,adm0name,adm0_a3,adm1name,iso_a2,note,latitude,longitude,changed,namediff,diffnote,pop_max,pop_min,pop_other,geonameid,meganame,ls_name,ls_match,checkme,cartodb_id,created_at,updated_at
 FROM (select * from populated_places_simple_reduced) _camshaft_buffer;
INSERT 0 7313
Time: 7516.018 ms (00:07.516)
proj_test2=#

comment:5 by Algunenano, 5 years ago

I've just noticed (trying proj6.1 again) that I get a lot of warning in postgres logs:

jun 14 12:31:13 Mordor postgres[27104]: proj_create_operation_factory_context: Cannot find proj.db
jun 14 12:31:13 Mordor postgres[27104]: pj_obj_create: Cannot find proj.db
jun 14 12:31:13 Mordor postgres[27104]: proj_create: Cannot find proj.db
jun 14 12:31:13 Mordor postgres[27104]: proj_create: no database context specified
jun 14 12:31:13 Mordor postgres[27104]: proj_create: Cannot find proj.db
jun 14 12:31:13 Mordor postgres[27104]: proj_create_operation_factory_context: Cannot find proj.db
jun 14 12:31:13 Mordor postgres[27104]: pj_obj_create: Cannot find proj.db
jun 14 12:31:13 Mordor postgres[27104]: proj_create: Cannot find proj.db
jun 14 12:31:13 Mordor postgres[27104]: proj_create: no database context specified
jun 14 12:31:13 Mordor postgres[27104]: proj_create: Cannot find proj.db
jun 14 12:31:13 Mordor postgres[27104]: proj_create_operation_factory_context: Cannot find proj.db
jun 14 12:31:13 Mordor postgres[27104]: pj_obj_create: Cannot find proj.db

My proj installation seems to be ok according to proj.pc:

cat /usr/lib/pkgconfig/proj.pc
prefix=/usr
exec_prefix=${prefix}
libdir=${exec_prefix}/lib
includedir=${prefix}/include
datadir=${prefix}/share/proj

Name: proj
Description: Cartographic Projections Library.
Requires:
Version: 6.1.0
Libs: -L${libdir} -lproj
Cflags: -I${includedir}

proj.db is in /usr/share/proj/proj.db, which matches ${prefix}/share/proj/proj.db.

I'm trying to understand if I need to setup an env variable, but I was expecting this to be automatic.

comment:6 by Algunenano, 5 years ago

If I remove the call to SetPROJLibPath to avoid the calls to proj_context_set_search_paths the error messages disappear, but what used to be 1.8s with PROJ 5.2 or 7.6s with PROJ6 before this change now becomes 20x slower: 116.2secs.

comment:7 by Algunenano, 5 years ago

Doing tests using svn-trunk as it is right now:

# Select count(*) from populated_places_simple_reduced;
 count 
-------
  7313
(1 row)
projtest=# \d populated_places_simple_reduced
                                                Table "public.populated_places_simple_reduced"
        Column        |           Type           | Collation | Nullable |                               Default                            
   
----------------------+--------------------------+-----------+----------+------------------------------------------------------------------
---
 the_geom             | geometry(Geometry,4326)  |           |          | 
 scalerank            | integer                  |           |          | 
 natscale             | integer                  |           |          | 
 labelrank            | integer                  |           |          | 
 featurecla           | text                     |           |          | 
 name                 | text                     |           |          | 
 namepar              | text                     |           |          | 
 namealt              | text                     |           |          | 
 diffascii            | integer                  |           |          | 
 nameascii            | text                     |           |          | 
 adm0cap              | double precision         |           |          | 
 capalt               | double precision         |           |          | 
 capin                | text                     |           |          | 
 worldcity            | double precision         |           |          | 
 megacity             | integer                  |           |          | 
 sov0name             | text                     |           |          | 
 sov_a3               | text                     |           |          | 
 adm0name             | text                     |           |          | 
 adm0_a3              | text                     |           |          | 
 adm1name             | text                     |           |          | 
 iso_a2               | text                     |           |          | 
 note                 | text                     |           |          | 
 latitude             | double precision         |           |          | 
 longitude            | double precision         |           |          | 
 changed              | double precision         |           |          | 
 namediff             | integer                  |           |          | 
 diffnote             | text                     |           |          | 
 pop_max              | integer                  |           |          | 
 pop_min              | integer                  |           |          | 
 pop_other            | integer                  |           |          | 
 geonameid            | double precision         |           |          | 
 meganame             | text                     |           |          | 
 ls_name              | text                     |           |          | 
 ls_match             | integer                  |           |          | 
 checkme              | integer                  |           |          | 
 cartodb_id           | integer                  |           | not null | nextval('populated_places_simple_reduced_cartodb_id_seq'::regclas
s)
 created_at           | timestamp with time zone |           | not null | now()
 updated_at           | timestamp with time zone |           | not null | now()
 the_geom_webmercator | geometry(Geometry,3857)  |           |          | 
Indexes:
    "populated_places_simple_reduced_pkey" PRIMARY KEY, btree (cartodb_id)
    "populated_places_simple_reduced_cartodb_id_key" UNIQUE CONSTRAINT, btree (cartodb_id)
    "populated_places_simple_reduced_the_geom_webmercator_idx" gist (the_geom_webmercator)
    "the_geom_ffebe516_1e32_11e4_ae44_6476ba93d4f6" gist (the_geom)

# Select count(1), st_geometrytype(the_geom_webmercator) as t from populated_places_simple_reduced group by t ;
 count |    t     
-------+----------
     1 | 
  7312 | ST_Point
(2 rows)

PROJ 5.2:

Select st_transform(the_geom_webmercator, 4234) from populated_places_simple_reduced;
Time: 22.917 ms (first time)
Time: 14.330 ms (second time)

PROJ 6.1:

Time: 69.540 ms (first time)
Time: 51.091 ms (second time)

This slowdown also appears when doing 3857 → 4326.

comment:8 by Algunenano, 5 years ago

I have a WIP in https://github.com/postgis/postgis/pull/422 to address some of the issues found here.

The main fix to gain the performance back is to also cache the proj_crs_is_swapped calls to avoid making them in every ST_Transform call.

I'm still investigating further to try to understand why removing the invalid set_searchpath makes some calls 10x slower with PROJ6. Also I need to review that I haven't broken anything in proj5

comment:9 by Algunenano, 5 years ago

Summary: PROJ 6: 4x slowdown on indexes over ST_TransformPROJ 6: 4x slowdown over points

Now testing without removing the invalid set_searchpath and caching proj_crs_is_swapped, ST_Transform has a similar performance between 5.2 and 6.

Once I've investigated further into the index issue, I've noticed that the slowdown isn't related to indexing / caching, but to ST_Buffer(geography):

SQL:

explain analyze SELECT
 ST_Buffer(the_geom::geography, 50000) from analysis_2f13a3dbd7_9de00c87a82f0929cfc345d2d9f1d01e05811bc4;

PROJ 5.2: 640.939 ms PROJ 6.1: 2112.235 ms

That seems to be related to the multiple projections created by spheroid_init_from_srid, which is called for every row. I'll dig deeper.

Still pending: Check why removing the invalid set_searchpath makes some calls 10x slower with PROJ6.

comment:10 by Algunenano, 5 years ago

Digging deeper: these are the functions affected by the slowdown (apart from transform, which is fixed by one of my patches):

  • TWKBFromLWGEOM.
  • TWKBFromLWGEOMArray.
  • geography_in / geography_from_text / geography_from_geometry / geography_from_geometry / geography_recv / geography_typmod_in as they call srid_check_latlong.
  • geography_centroid / geography_distance_knn / geography_distance_uncached / geography_distance / geography_dwithin / geography_distance_tree / geography_dwithin_uncached / geography_area / geography_perimeter / geography_length / geography_project / geography_azimuth as they call spheroid_init_from_srid.

An example with the same query using ST_AsTWKB: 404ms in PROJ5 vs 1230ms in PROJ6.

comment:11 by Algunenano, 5 years ago

I've replaced the PR with https://github.com/postgis/postgis/pull/423

What this does:

  • Cleans up configure.ac when running under PROJ6 (avoids error message that its ignored).
  • PROJ 6+ (and only 6+):
    • Caches source_swapped / target_swapped for all PJs. This brings ST_Transform to, more or less, almost the same performance as 5.1 (except the first row, which creates the PJ).
    • Caches source_is_latlong / source_semi_major_metre / source_semi_minor_metre when required (source srid == target srid). This brings a bunch of functions (ST_AsTWKB, geography functions) to, more or less, the same performance as before (except the first row, which creates the PJ).
    • Uses proj_trans instead of proj_trans_generic for single points (faster and makes it on par of 5.2 performance for that case).
  • PROJ (all releases):
    • Allocate auxiliar memory (LWPROJ / PJ) in the parent memory context instead of manually using malloc/free.
    • Avoid double lookup (one to check if the item is in the cache, one for retrieval).
    • Increases PROJ_CACHE_ITEMS from 16 to 128. PROJ_BACKEND_HASH_SIZE goes from 32 to 256. This is to minimize the huge impact of creating projections between 5.2 and 6.1, specially for ST_Buffer(geography) which autogenerates srids based on SRID_RESERVE_OFFSET. There are over 120+ used SRIDs in that range, so deleting and recreating PJs is a huge performance loss (it isn't as bad for 5.1 since PJ creation is much faster).
    • Removes the call to proj_context_set_search_paths. AFAIK, we don't distribute proj files anymore so this is legacy. It's also preventing the default proj installation of finding proj.db and using the fallback method (faster but more imprecise). If it breaks something, I'll consider reading it just for Windows (as it's mentioned in the comments of the function).

Any comments are welcome. I'll try to break this down into several commits for clarity.

comment:12 by Raul Marin, 5 years ago

In 17541:

PROJ6: Speed improvements

Caches more elements to avoid recalculation per row

References #4372

comment:13 by Raul Marin, 5 years ago

In 17542:

Increase size of internal PROJ cache

Creating PROJ projections is specially slow for PROJ6, so
we want to avoid recalculating them as much as possible

This is specially important for ST_Buffer(geography) since
it decides the SRID, and thus the projection, dinamically
based on the geometry position around the globe.

References #4372

comment:14 by Raul Marin, 5 years ago

In 17543:

PROJ: Avoid double lookups

References #4372

comment:15 by Raul Marin, 5 years ago

In 17544:

PROJ: Use the cache memory context

Before it was using malloc/free manually, so with
this change we depend fully on Postgres allocator

References #4372

comment:16 by Raul Marin, 5 years ago

In 17545:

PROJ6: Use proj_trans for single points

It's slighly faster to use it for single points,
and improves it to match PROJ5 performance

References #4372

comment:17 by Raul Marin, 5 years ago

In 17547:

PROJ: Remove the extra MemoryContext for PG9.6+

In PG9.6+ we can allocate callbacks in the parent node
and associate them directly without creating new contextes

References #4372

comment:18 by Raul Marin, 5 years ago

Resolution: fixed
Status: assignedclosed

In 17548:

PROJ cache: Improve the eviction process

Instead of selecting a projection that doesn't match
the input source srid and silently drop all cache elements
after it (potentially dropping the whole cache), we keep
a count to know how many item each element has been used
and only drop the element with less uses.

Closes #4372

Note: See TracTickets for help on using tickets.