Opened 6 years ago
Closed 6 years ago
#4372 closed defect (fixed)
PROJ 6: 4x slowdown over points
Reported by: | Algunenano | Owned by: | Algunenano |
---|---|---|---|
Priority: | high | Milestone: | PostGIS 3.0.0 |
Component: | postgis | Version: | master |
Keywords: | Cc: |
Description ¶
# Setup
# Select version(); version ---------------------------------------------------------------------------------------------------------- PostgreSQL 11.2 on x86_64-pc-linux-gnu, compiled by clang version 8.0.0 (tags/RELEASE_800/final), 64-bit
A table with an index over st_transform.
The table was created as:
CREATE TABLE analysis_2f13a3dbd7_9de00c87a82f0929cfc345d2d9f1d01e05811bc4 AS SELECT * FROM (SELECT ST_Buffer(the_geom::geography, 50000)::geometry the_geom,scalerank,natscale,labelrank,featurecla,name,namepar,namealt,diffascii,nameascii,adm0cap,capalt,capin,worldcity,megacity,sov0name,sov_a3,adm0name,adm0_a3,adm1name,iso_a2,note,latitude,longitude,changed,namediff,diffnote,pop_max,pop_min,pop_other,geonameid,meganame,ls_name,ls_match,checkme,cartodb_id,created_at,updated_at FROM (select * from populated_places_simple_reduced) _camshaft_buffer) _analysis_create_table_query LIMIT 0
And the index as:
CREATE INDEX ON analysis_2f13a3dbd7_9de00c87a82f0929cfc345d2d9f1d01e05811bc4 USING GIST (ST_Transform(the_geom, 3857))
The table looks like this:
# \d+ analysis_2f13a3dbd7_9de00c87a82f0929cfc345d2d9f1d01e05811bc4 Table "public.analysis_2f13a3dbd7_9de00c87a82f0929cfc345d2d9f1d01e05811bc4" Column | Type | Collation | Nullable | Default | Storage | Stats target | Description ------------+--------------------------+-----------+----------+---------+----------+--------------+------------- the_geom | geometry | | | | main | | scalerank | integer | | | | plain | | natscale | integer | | | | plain | | labelrank | integer | | | | plain | | featurecla | text | | | | extended | | name | text | | | | extended | | namepar | text | | | | extended | | namealt | text | | | | extended | | diffascii | integer | | | | plain | | nameascii | text | | | | extended | | adm0cap | double precision | | | | plain | | capalt | double precision | | | | plain | | capin | text | | | | extended | | worldcity | double precision | | | | plain | | megacity | integer | | | | plain | | sov0name | text | | | | extended | | sov_a3 | text | | | | extended | | adm0name | text | | | | extended | | adm0_a3 | text | | | | extended | | adm1name | text | | | | extended | | iso_a2 | text | | | | extended | | note | text | | | | extended | | latitude | double precision | | | | plain | | longitude | double precision | | | | plain | | changed | double precision | | | | plain | | namediff | integer | | | | plain | | diffnote | text | | | | extended | | pop_max | integer | | | | plain | | pop_min | integer | | | | plain | | pop_other | integer | | | | plain | | geonameid | double precision | | | | plain | | meganame | text | | | | extended | | ls_name | text | | | | extended | | ls_match | integer | | | | plain | | checkme | integer | | | | plain | | cartodb_id | integer | | | | plain | | created_at | timestamp with time zone | | | | plain | | updated_at | timestamp with time zone | | | | plain | | Indexes: "analysis_2f13a3dbd7_9de00c87a82f0929cfc345d2d9_st_transform_idx" gist (st_transform(the_geom, 3857))
# Testing with PROJ 5.2 and postgis trunk:
# INSERT INTO analysis_2f13a3dbd7_9de00c87a82f0929cfc345d2d9f1d01e05811bc4 SELECT ST_Buffer(the_geom::geography, 50000)::geometry the_geom,scalerank,natscale,labelrank,featurecla,name,namepar,namealt,diffascii,nameascii,adm0cap,capalt,capin,worldcity,megacity,sov0name,sov_a3,adm0name,adm0_a3,adm1name,iso_a2,note,latitude,longitude,changed,namediff,diffnote,pop_max,pop_min,pop_other,geonameid,meganame,ls_name,ls_match,checkme,cartodb_id,created_at,updated_at FROM (select * from populated_places_simple_reduced) _camshaft_buffer; INSERT 0 7313 Time: 1860.093 ms (00:01.860)
I've run the same test several times and it always takes ~1.8s.
perf divided by .so:
Perf by .so: + 82.25% 0.00% [unknown] + 81.89% 18.38% libc-2.28.so + 67.54% 41.42% postgres + 58.06% 6.69% postgis-3.so + 15.06% 13.08% libproj.so.13.1.1 + 11.65% 0.56% libgeos_c.so.1.11.0 + 11.38% 8.73% libgeos-3.8.0.so + 6.17% 6.06% libm-2.28.so + 3.25% 2.99% [kernel.kallsyms] + 1.90% 0.05% libpthread-2.28.so + 1.84% 1.71% libstdc++.so.6.0.25 0.34% 0.34% ld-2.28.so 0.00% 0.00% [vdso]
# Same with PROJ 6 (just built from master, latest ref is fb125618fd18f112ed6f37662b021d07a602ff90), and also postgis trunk (rebuilt with the dependency):
test_windshaft_cartodb_user_1_db=# INSERT INTO analysis_2f13a3dbd7_9de00c87a82f0929cfc345d2d9f1d01e05811bc4 SELECT ST_Buffer(the_geom::geography, 50000)::geometry the_geom,scalerank,natscale,labelrank,featurecla,name,namepar,namealt,diffascii,nameascii,adm0cap,capalt,capin,worldcity,megacity,sov0name,sov_a3,adm0name,adm0_a3,adm1name,iso_a2,note,latitude,longitude,changed,namediff,diffnote,pop_max,pop_min,pop_other,geonameid,meganame,ls_name,ls_match,checkme,cartodb_id,created_at,updated_at FROM (select * from populated_places_simple_reduced) _camshaft_buffer; INSERT 0 7313 Time: 7184.529 ms (00:07.185)
Running it several times shows it always takes ~7.2 secs, that's as slow as the build using PROJ 5.2.
Perf by .so:
Perf by .so: + 87.70% 33.62% libc-2.28.so + 79.96% 0.00% [unknown] + 61.70% 14.32% postgres + 59.86% 1.77% postgis-3.so + 45.09% 25.42% libproj.so.15.0.0 + 10.26% 8.80% libstdc++.so.6.0.25 + 7.02% 6.98% [kernel.kallsyms] + 3.72% 0.48% libicuuc.so.64.1 + 3.48% 3.42% libgcc_s.so.1 + 2.94% 0.19% libgeos_c.so.1.11.0 + 2.83% 2.29% libgeos-3.8.0.so + 1.95% 1.90% libm-2.28.so 1.02% 0.38% libpthread-2.28.so 0.37% 0.37% libsqlite3.so.0.8.6 0.06% 0.05% ld-2.28.so 0.00% 0.00% libk5crypto.so.3.1 0.00% 0.00% libxml2.so.2.9.9 0.00% 0.00% [vdso]
If I run only ST_Transform over the table, the times are more similar; proj6 is still slower but not 4x slower:
Running only ST_Transform (Select ST_Transform(the_geom, 3857) from analysis_2f13a3dbd7_9de00c87a82f0929cfc345d2d9f1d01e05811bc4;
)
PROJ 5 build: Time: 1558.226 ms (00:01.558) Time: 1570.106 ms (00:01.570) Time: 1565.649 ms (00:01.566)
PROJ 6 build: Time: 1764.034 ms (00:01.764) Time: 1710.738 ms (00:01.711) Time: 1732.082 ms (00:01.732)
Have we lost some caching in the proj6 update?
Change History (18)
comment:1 by , 6 years ago
comment:2 by , 6 years ago
Yup, I can't reproduce the 4x slowdown with your test case neither.
Here is a step by step method to reproduce my test case:
- Download the fixture (https://raw.githubusercontent.com/CartoDB/Windshaft-cartodb/master/test/support/sql/ported/populated_places_simple_reduced.sql)
- Create a clean database, install postgis and import the table (ignore the errors).
psql -d proj_test -f /tmp/populated_places_simple_reduced.sql
- Create the table:
CREATE TABLE analysis_2f13a3dbd7_9de00c87a82f0929cfc345d2d9f1d01e05811bc4 AS SELECT * FROM (SELECT ST_Buffer(the_geom::geography, 50000)::geometry the_geom,scalerank,natscale,labelrank,featurecla,name,namepar,namealt,diffascii,nameascii,adm0cap,capalt,capin,worldcity,megacity,sov0name,sov_a3,adm0name,adm0_a3,adm1name,iso_a2,note,latitude,longitude,changed,namediff,diffnote,pop_max,pop_min,pop_other,geonameid,meganame,ls_name,ls_match,checkme,cartodb_id,created_at,updated_at FROM (select * from populated_places_simple_reduced) _camshaft_buffer) _analysis_create_table_query LIMIT 0
- Create the index:
CREATE INDEX ON analysis_2f13a3dbd7_9de00c87a82f0929cfc345d2d9f1d01e05811bc4 USING GIST (ST_Transform(the_geom, 3857))
- Insert (this is using PROJ 5):
# INSERT INTO analysis_2f13a3dbd7_9de00c87a82f0929cfc345d2d9f1d01e05811bc4 SELECT ST_Buffer(the_geom::geography, 50000)::geometry the_geom,scalerank,natscale,labelrank,featurecla,name,namepar,namealt,diffascii,nameascii,adm0cap,capalt,capin,worldcity,megacity,sov0name,sov_a3,adm0name,adm0_a3,adm1name,iso_a2,note,latitude,longitude,changed,namediff,diffnote,pop_max,pop_min,pop_other,geonameid,meganame,ls_name,ls_match,checkme,cartodb_id,created_at,updated_at FROM (select * from populated_places_simple_reduced) _camshaft_buffer; INSERT 0 7313 Time: 1801.362 ms (00:01.801)
- Change to PROJ 6 and truncate the table (I restarted postgresql to make sure it's clean from the old .so's):
# truncate analysis_2f13a3dbd7_9de00c87a82f0929cfc345d2d9f1d01e05811bc4; FATAL: terminating connection due to administrator command server closed the connection unexpectedly This probably means the server terminated abnormally before or while processing the request. The connection to the server was lost. Attempting reset: Succeeded. Time: 4.003 ms proj_test=# truncate analysis_2f13a3dbd7_9de00c87a82f0929cfc345d2d9f1d01e05811bc4; TRUNCATE TABLE Time: 60.029 ms
- Insert using the same query, now using PROJ 6:
# INSERT INTO analysis_2f13a3dbd7_9de00c87a82f0929cfc345d2d9f1d01e05811bc4 SELECT ST_Buffer(the_geom::geography, 50000)::geometry the_geom,scalerank,natscale,labelrank,featurecla,name,namepar,namealt,diffascii,nameascii,adm0cap,capalt,capin,worldcity,megacity,sov0name,sov_a3,adm0name,adm0_a3,adm1name,iso_a2,note,latitude,longitude,changed,namediff,diffnote,pop_max,pop_min,pop_other,geonameid,meganame,ls_name,ls_match,checkme,cartodb_id,created_at,updated_at FROM (select * from populated_places_simple_reduced) _camshaft_buffer; INSERT 0 7313 Time: 7515.885 ms (00:07.516)
Technically it's doing the same as yours, inserting points, but without doing transformations in the query and instead calling ST_Buffer, but I've tried removing ST_Transform in your queries and I still didn't see the slowdown.
comment:3 by , 6 years ago
Can you try to invert the order of your excercise, just in case we're missing some other state ? (like needed vacuum after truncate..)
comment:4 by , 6 years ago
Inverting the order I get the same result (proj6 is 4x slower):
# create database proj_test2; CREATE DATABASE postgres=# \c proj_test2; You are now connected to database "proj_test2" as user "postgres". proj_test2=# \timing on Timing is on. proj_test2=# create extension postgis; CREATE EXTENSION Time: 566.929 ms proj_test2=# \q
psql -d proj_test2 -f /tmp/populated_places_simple_reduced.sql CREATE TABLE CREATE SEQUENCE ALTER SEQUENCE ALTER TABLE [...]
$ psql -U postgres proj_test2 psql (11.2) Type "help" for help. proj_test2=# \timing on Timing is on. proj_test2=# CREATE TABLE analysis_2f13a3dbd7_9de00c87a82f0929cfc345d2d9f1d01e05811bc4 AS proj_test2-# SELECT * FROM (SELECT proj_test2(# ST_Buffer(the_geom::geography, 50000)::geometry the_geom,scalerank,natscale,labelrank,featurecla,name,namepar,namealt,diffascii,nameascii,adm0cap,capalt,capin,worldcity,megacity,sov0name,sov_a3,adm0name,adm0_a3,adm1name,iso_a2,note,latitude,longitude,changed,namediff,diffnote,pop_max,pop_min,pop_other,geonameid,meganame,ls_name,ls_match,checkme,cartodb_id,created_at,updated_at proj_test2(# FROM (select * from populated_places_simple_reduced) _camshaft_buffer) _analysis_create_table_query proj_test2-# LIMIT 0; SELECT 0 Time: 37.848 ms proj_test2=# CREATE INDEX ON analysis_2f13a3dbd7_9de00c87a82f0929cfc345d2d9f1d01e05811bc4 USING GIST (ST_Transform(the_geom, 3857)) proj_test2-# ; CREATE INDEX Time: 5.513 ms proj_test2=# INSERT INTO analysis_2f13a3dbd7_9de00c87a82f0929cfc345d2d9f1d01e05811bc4 SELECT proj_test2-# ST_Buffer(the_geom::geography, 50000)::geometry the_geom,scalerank,natscale,labelrank,featurecla,name,namepar,namealt,diffascii,nameascii,adm0cap,capalt,capin,worldcity,megacity,sov0name,sov_a3,adm0name,adm0_a3,adm1name,iso_a2,note,latitude,longitude,changed,namediff,diffnote,pop_max,pop_min,pop_other,geonameid,meganame,ls_name,ls_match,checkme,cartodb_id,created_at,updated_at proj_test2-# FROM (select * from populated_places_simple_reduced) _camshaft_buffer; INSERT 0 7313 Time: 1813.043 ms (00:01.813) proj_test2=# \q
Install proj6 + postgis compiled against it:
$ sudo pacman -U proj-git/proj-git-5.2.0.r710.gfb125618f-1-x86_64.pkg.tar postgis-git/postgis-git-3.0_svn_trunk_LTO.12686.cb0e3bc69-1-x86_64.pkg.tar.proj6 [...]
$ psql -U postgres proj_test2 psql (11.2) Type "help" for help. proj_test2=# truncate analysis_2f13a3dbd7_9de00c87a82f0929cfc345d2d9f1d01e05811bc4 ; TRUNCATE TABLE proj_test2=# \timing on Timing is on. proj_test2=# INSERT INTO analysis_2f13a3dbd7_9de00c87a82f0929cfc345d2d9f1d01e05811bc4 SELECT ST_Buffer(the_geom::geography, 50000)::geometry the_geom,scalerank,natscale,labelrank,featurecla,name,namepar,namealt,diffascii,nameascii,adm0cap,capalt,capin,worldcity,megacity,sov0name,sov_a3,adm0name,adm0_a3,adm1name,iso_a2,note,latitude,longitude,changed,namediff,diffnote,pop_max,pop_min,pop_other,geonameid,meganame,ls_name,ls_match,checkme,cartodb_id,created_at,updated_at FROM (select * from populated_places_simple_reduced) _camshaft_buffer; INSERT 0 7313 Time: 7516.018 ms (00:07.516) proj_test2=#
comment:5 by , 6 years ago
I've just noticed (trying proj6.1 again) that I get a lot of warning in postgres logs:
jun 14 12:31:13 Mordor postgres[27104]: proj_create_operation_factory_context: Cannot find proj.db jun 14 12:31:13 Mordor postgres[27104]: pj_obj_create: Cannot find proj.db jun 14 12:31:13 Mordor postgres[27104]: proj_create: Cannot find proj.db jun 14 12:31:13 Mordor postgres[27104]: proj_create: no database context specified jun 14 12:31:13 Mordor postgres[27104]: proj_create: Cannot find proj.db jun 14 12:31:13 Mordor postgres[27104]: proj_create_operation_factory_context: Cannot find proj.db jun 14 12:31:13 Mordor postgres[27104]: pj_obj_create: Cannot find proj.db jun 14 12:31:13 Mordor postgres[27104]: proj_create: Cannot find proj.db jun 14 12:31:13 Mordor postgres[27104]: proj_create: no database context specified jun 14 12:31:13 Mordor postgres[27104]: proj_create: Cannot find proj.db jun 14 12:31:13 Mordor postgres[27104]: proj_create_operation_factory_context: Cannot find proj.db jun 14 12:31:13 Mordor postgres[27104]: pj_obj_create: Cannot find proj.db
My proj installation seems to be ok according to proj.pc:
cat /usr/lib/pkgconfig/proj.pc prefix=/usr exec_prefix=${prefix} libdir=${exec_prefix}/lib includedir=${prefix}/include datadir=${prefix}/share/proj Name: proj Description: Cartographic Projections Library. Requires: Version: 6.1.0 Libs: -L${libdir} -lproj Cflags: -I${includedir}
proj.db is in /usr/share/proj/proj.db
, which matches ${prefix}/share/proj/proj.db
.
I'm trying to understand if I need to setup an env variable, but I was expecting this to be automatic.
comment:6 by , 6 years ago
If I remove the call to SetPROJLibPath
to avoid the calls to proj_context_set_search_paths
the error messages disappear, but what used to be 1.8s with PROJ 5.2 or 7.6s with PROJ6 before this change now becomes 20x slower: 116.2secs.
comment:7 by , 6 years ago
Doing tests using svn-trunk as it is right now:
# Select count(*) from populated_places_simple_reduced; count ------- 7313 (1 row) projtest=# \d populated_places_simple_reduced Table "public.populated_places_simple_reduced" Column | Type | Collation | Nullable | Default ----------------------+--------------------------+-----------+----------+------------------------------------------------------------------ --- the_geom | geometry(Geometry,4326) | | | scalerank | integer | | | natscale | integer | | | labelrank | integer | | | featurecla | text | | | name | text | | | namepar | text | | | namealt | text | | | diffascii | integer | | | nameascii | text | | | adm0cap | double precision | | | capalt | double precision | | | capin | text | | | worldcity | double precision | | | megacity | integer | | | sov0name | text | | | sov_a3 | text | | | adm0name | text | | | adm0_a3 | text | | | adm1name | text | | | iso_a2 | text | | | note | text | | | latitude | double precision | | | longitude | double precision | | | changed | double precision | | | namediff | integer | | | diffnote | text | | | pop_max | integer | | | pop_min | integer | | | pop_other | integer | | | geonameid | double precision | | | meganame | text | | | ls_name | text | | | ls_match | integer | | | checkme | integer | | | cartodb_id | integer | | not null | nextval('populated_places_simple_reduced_cartodb_id_seq'::regclas s) created_at | timestamp with time zone | | not null | now() updated_at | timestamp with time zone | | not null | now() the_geom_webmercator | geometry(Geometry,3857) | | | Indexes: "populated_places_simple_reduced_pkey" PRIMARY KEY, btree (cartodb_id) "populated_places_simple_reduced_cartodb_id_key" UNIQUE CONSTRAINT, btree (cartodb_id) "populated_places_simple_reduced_the_geom_webmercator_idx" gist (the_geom_webmercator) "the_geom_ffebe516_1e32_11e4_ae44_6476ba93d4f6" gist (the_geom) # Select count(1), st_geometrytype(the_geom_webmercator) as t from populated_places_simple_reduced group by t ; count | t -------+---------- 1 | 7312 | ST_Point (2 rows)
PROJ 5.2:
Select st_transform(the_geom_webmercator, 4234) from populated_places_simple_reduced; Time: 22.917 ms (first time) Time: 14.330 ms (second time)
PROJ 6.1:
Time: 69.540 ms (first time) Time: 51.091 ms (second time)
This slowdown also appears when doing 3857 → 4326.
comment:8 by , 6 years ago
I have a WIP in https://github.com/postgis/postgis/pull/422 to address some of the issues found here.
The main fix to gain the performance back is to also cache the proj_crs_is_swapped calls to avoid making them in every ST_Transform call.
I'm still investigating further to try to understand why removing the invalid set_searchpath makes some calls 10x slower with PROJ6. Also I need to review that I haven't broken anything in proj5
comment:9 by , 6 years ago
Summary: | PROJ 6: 4x slowdown on indexes over ST_Transform → PROJ 6: 4x slowdown over points |
---|
Now testing without removing the invalid set_searchpath and caching proj_crs_is_swapped, ST_Transform has a similar performance between 5.2 and 6.
Once I've investigated further into the index issue, I've noticed that the slowdown isn't related to indexing / caching, but to ST_Buffer(geography):
SQL:
explain analyze SELECT ST_Buffer(the_geom::geography, 50000) from analysis_2f13a3dbd7_9de00c87a82f0929cfc345d2d9f1d01e05811bc4;
PROJ 5.2: 640.939 ms PROJ 6.1: 2112.235 ms
That seems to be related to the multiple projections created by spheroid_init_from_srid
, which is called for every row. I'll dig deeper.
Still pending: Check why removing the invalid set_searchpath makes some calls 10x slower with PROJ6.
comment:10 by , 6 years ago
Digging deeper: these are the functions affected by the slowdown (apart from transform, which is fixed by one of my patches):
TWKBFromLWGEOM
.
TWKBFromLWGEOMArray
.
geography_in
/geography_from_text
/geography_from_geometry
/geography_from_geometry
/geography_recv
/geography_typmod_in
as they callsrid_check_latlong
.
geography_centroid
/geography_distance_knn
/geography_distance_uncached
/geography_distance
/geography_dwithin
/geography_distance_tree
/geography_dwithin_uncached
/geography_area
/geography_perimeter
/geography_length
/geography_project
/geography_azimuth
as they callspheroid_init_from_srid
.
An example with the same query using ST_AsTWKB: 404ms in PROJ5 vs 1230ms in PROJ6.
comment:11 by , 6 years ago
I've replaced the PR with https://github.com/postgis/postgis/pull/423
What this does:
- Cleans up configure.ac when running under PROJ6 (avoids error message that its ignored).
- PROJ 6+ (and only 6+):
- Caches source_swapped / target_swapped for all PJs. This brings ST_Transform to, more or less, almost the same performance as 5.1 (except the first row, which creates the PJ).
- Caches source_is_latlong / source_semi_major_metre / source_semi_minor_metre when required (source srid == target srid). This brings a bunch of functions (ST_AsTWKB, geography functions) to, more or less, the same performance as before (except the first row, which creates the PJ).
- Uses
proj_trans
instead ofproj_trans_generic
for single points (faster and makes it on par of 5.2 performance for that case).
- PROJ (all releases):
- Allocate auxiliar memory (LWPROJ / PJ) in the parent memory context instead of manually using malloc/free.
- Avoid double lookup (one to check if the item is in the cache, one for retrieval).
- Increases PROJ_CACHE_ITEMS from 16 to 128. PROJ_BACKEND_HASH_SIZE goes from 32 to 256. This is to minimize the huge impact of creating projections between 5.2 and 6.1, specially for ST_Buffer(geography) which autogenerates srids based on
SRID_RESERVE_OFFSET
. There are over 120+ used SRIDs in that range, so deleting and recreating PJs is a huge performance loss (it isn't as bad for 5.1 since PJ creation is much faster). - Removes the call to
proj_context_set_search_paths
. AFAIK, we don't distribute proj files anymore so this is legacy. It's also preventing the default proj installation of finding proj.db and using the fallback method (faster but more imprecise). If it breaks something, I'll consider reading it just for Windows (as it's mentioned in the comments of the function).
Any comments are welcome. I'll try to break this down into several commits for clarity.
I tried this test case and didn't see a substantial difference in performance between inserting into a table with a functional index on ST_Transform and inserting into a table with an ordinary index