#3092 closed enhancement (fixed)
Slow performance of geometry_columns in PostGIS >= 2
Reported by: | rouault | Owned by: | robe |
---|---|---|---|
Priority: | high | Milestone: | PostGIS 2.1.8 |
Component: | postgis | Version: | 2.0.x |
Keywords: | history | Cc: | strk |
Description
Original discussion in this thread http://lists.osgeo.org/pipermail/gdal-dev/2015-March/041299.html
Querying geometry_columns exhibits poor performance (perhaps quadratic in number of spatial tables) due to geometry_columns potentially calling postgis_constraint_dims(), postgis_constraint_srid() and postgis_constraint_type(), because each causes a SELECT to be executed for each row if some typmod constraints are missing.
The following request gives similar information but much faster:
select c.relname, a.attname, t.typname, postgis_typmod_dims(a.atttypmod) dim, postgis_typmod_srid(a.atttypmod) srid, postgis_typmod_type(a.atttypmod)::text typ, array_agg(s.consrc)::text FROM pg_class c join pg_attribute a on a.attrelid=c.oid join pg_namespace n on c.relnamespace = n.oid AND (c.relkind = 'r'::"char" OR c.relkind = 'v'::"char" OR c.relkind = 'm'::"char" OR c.relkind = 'f'::"char") AND NOT ( n.nspname = 'public' AND c.relname = 'raster_columns' ) join pg_type t on a.atttypid = t.oid and (t.typname = 'geometry'::name OR t.typname = 'geography'::name) left join pg_constraint s on s.connamespace = n.oid AND s.conrelid = c.oid AND a.attnum = ANY (s.conkey) AND (s.consrc LIKE '%geometrytype(% = %' OR s.consrc LIKE '%ndims(% = %' OR s.consrc LIKE '%srid(% = %') GROUP BY c.relname, a.attname, t.typname, dim, srid, typ, c.oid ORDER BY c.oid
Some analysis of the aggregated constraints similar to what is done in postgis_constraint_XXXX() should be done to merge that info with the one of postgis_typmod_XXXX()
Change History (25)
comment:1 by , 10 years ago
Milestone: | PostGIS 2.1.7 → PostGIS 2.1.8 |
---|---|
Priority: | medium → high |
Type: | defect → patch |
comment:2 by , 10 years ago
comment:4 by , 10 years ago
robe do you want to try this with your amazing sql skillz? Basically start from the ARRAYs that Even is spitting out on the constraints and derive the SRID/TYPE/etc, avoids running a query for each column value as currently done.
comment:5 by , 10 years ago
How can I resist if you put it that way? Okay will work on it this weekend.
comment:7 by , 10 years ago
Try out r13443, which replaces the many extra queries with simple loops on arrays built up in a prior query modelled on Even's.
comment:8 by , 10 years ago
Hmm I had a different thought that doesn't use arrays at all, but I'll compare yours with what I was going to try.
I had created this script to test out the speed. The 10 I tried to bump the 10 to 1000 but ground to a halt creating those tables on the machine I was testing on. Might have been cause I was testing on an old 9.3 beta.
DROP SCHEMA IF EXISTS test CASCADE; CREATE SCHEMA test; DO language plpgsql $$ DECLARE var_sql text; BEGIN SELECT string_agg('CREATE TABLE test.constraint_test_' || lower(f.geom) || '_' || i::text || '(gid serial); SELECT addgeometrycolumn(''test'', ' || quote_literal('constraint_test_' || lower(f.geom) || '_' || i::text) || ' , ''geom'', ' || f.srid::text || ',' || quote_literal(f.geom) || ', 2, false);', '') INTO var_sql FROM (VALUES ('POINT'::text, 4326), ('MULTIPOLYGON'::text, 4269), ('LINESTRING'::text, 4326)) f(geom,srid) , generate_series(1,10) As i; EXECUTE var_sql; END; $$
comment:9 by , 10 years ago
pramsey at a glance I see a problem. You don't want to use a WITH clause here because it will force having to query all rows before you can filter. I suspect that won't scale with large numbers of tables.
comment:10 by , 10 years ago
Okay here is my proposed. No dependencies on any functions and for my simple tests the speed seems much better than existing and slower than yours for full table set but generally faster for filtered set of tables.
CREATE OR REPLACE VIEW geometry_columns AS SELECT current_database()::varchar(256) AS f_table_catalog, n.nspname::varchar(256) AS f_table_schema, c.relname::varchar(256) AS f_table_name, a.attname::varchar(256) AS f_geometry_column, COALESCE(postgis_typmod_dims(atttypmod), s.dims, 2) AS coord_dimension, COALESCE(NULLIF(postgis_typmod_srid(atttypmod),0), s.srid, 0) AS srid, -- force to be uppercase with no ZM so is backwards compatible -- with old geometry_columns replace( replace( COALESCE( NULLIF(upper(postgis_typmod_type(atttypmod)::text), 'GEOMETRY'), s.type, 'GEOMETRY' ), 'ZM', '' ), 'Z', '' )::varchar(30) AS type FROM pg_class c JOIN pg_attribute a ON a.attrelid = c.oid AND NOT a.attisdropped JOIN pg_namespace n ON c.relnamespace = n.oid JOIN pg_type t ON a.atttypid = t.oid LEFT JOIN (SELECT connamespace, conrelid, conkey , MAX( CASE WHEN consrc ILIKE '%geometrytype(% = %' THEN replace(split_part(consrc, '''', 2), ')', '') ELSE NULL END ) As type , MAX( CASE WHEN consrc ILIKE '%srid(% = %' THEN replace(replace(split_part(consrc, ' = ', 2), ')', ''), '(', '')::integer ELSE NULL END) As srid , MAX( CASE WHEN consrc ILIKE '%ndims(% = %' THEN replace(split_part(consrc, ' = ', 2), ')', '')::integer ELSE NULL END) As dims FROM pg_constraint WHERE consrc ILIKE '%geometrytype(% = %' OR consrc ILIKE '%ndims(% = %' OR consrc ILIKE '%srid(% = %' GROUP BY connamespace, conrelid, conkey ) AS s ON s.connamespace = n.oid AND s.conrelid = c.oid AND a.attnum = ANY (s.conkey) WHERE c.relkind IN ('r'::"char", 'v'::"char", 'm'::"char", 'f'::"char") AND NOT ( c.relname = 'raster_columns' ) AND t.typname = 'geometry'::name AND NOT pg_is_other_temp_schema(c.relnamespace) AND has_table_privilege( c.oid, 'SELECT'::text );
For full table scan your version beats by a wide margin. For a 1500 table set yours comes out
-- Paul's 140 ms, Regina's 440ms, original 1860ms SELECT * from geometry_columns;
— For filtered the story is a little different
-- Paul's 80ms, Regina's 40ms, original 640ms -- 500 rows returned SELECT * FROM geometry_columns WHERE f_table_name LIKE 'constraint%point%';
I still need to test it with multi geom column tables and typmod to make sure no consequences I missed and also a bigger set of tables. I only tested with 1500 tables all of constraint type and no typmod so not a sufficiently complete test.
comment:11 by , 10 years ago
Hmm and yet another version — lateral beats all but would require putting in conditional for > 9.2
CREATE OR REPLACE VIEW geometry_columns_lateral AS SELECT current_database()::varchar(256) AS f_table_catalog, n.nspname::varchar(256) AS f_table_schema, c.relname::varchar(256) AS f_table_name, a.attname::varchar(256) AS f_geometry_column, COALESCE(postgis_typmod_dims(atttypmod), s.dims, 2) AS coord_dimension, COALESCE(NULLIF(postgis_typmod_srid(atttypmod),0), s.srid, 0) AS srid, -- force to be uppercase with no ZM so is backwards compatible -- with old geometry_columns replace( replace( COALESCE( NULLIF(upper(postgis_typmod_type(atttypmod)::text), 'GEOMETRY'), s.type, 'GEOMETRY' ), 'ZM', '' ), 'Z', '' )::varchar(30) AS type FROM pg_class c JOIN pg_attribute a ON a.attrelid = c.oid AND NOT a.attisdropped JOIN pg_namespace n ON c.relnamespace = n.oid JOIN pg_type t ON a.atttypid = t.oid LEFT JOIN LATERAL (SELECT connamespace, conrelid, conkey , MAX( CASE WHEN consrc ILIKE '%geometrytype(% = %' THEN replace(split_part(consrc, '''', 2), ')', '') ELSE NULL END ) As type , MAX( CASE WHEN consrc ILIKE '%srid(% = %' THEN replace(replace(split_part(consrc, ' = ', 2), ')', ''), '(', '')::integer ELSE NULL END) As srid , MAX( CASE WHEN consrc ILIKE '%ndims(% = %' THEN replace(split_part(consrc, ' = ', 2), ')', '')::integer ELSE NULL END) As dims FROM pg_constraint As s WHERE (consrc LIKE '%geometrytype(% = %' OR consrc LIKE '%ndims(% = %' OR consrc LIKE '%srid(% = %' ) AND s.connamespace = n.oid AND s.conrelid = c.oid AND a.attnum = ANY (s.conkey) GROUP BY connamespace, conrelid, conkey ) AS s ON true WHERE c.relkind IN ('r'::"char", 'v'::"char", 'm'::"char", 'f'::"char") AND NOT ( c.relname = 'raster_columns' ) AND t.typname = 'geometry'::name AND NOT pg_is_other_temp_schema(c.relnamespace) AND has_table_privilege( c.oid, 'SELECT'::text );
-- 120 ms SELECT * FROM geometry_columns_lateral; -- 40 ms SELECT * FROM geometry_columns_lateral WHERE f_table_name LIKE 'constraint%point%';
comment:12 by , 10 years ago
I'm a bit torn -
I did more testing and posted my results on dev mailing list.
http://lists.osgeo.org/pipermail/postgis-devel/2015-April/024838.html
comment:13 by , 10 years ago
errata — leaning toward solution 2 and 3 (with a conditional to use 3 for 9.3+)
http://lists.osgeo.org/pipermail/postgis-devel/2015-April/024839.html
comment:14 by , 10 years ago
This one seems to be a winner. I'm going to commit today unless there are objections. Everything is done in the view (no extra functions), no use of new syntax like LATERAL so works for any version of PostgreSQL I can think of and it beats or matches all contenders on all tests. It uses 3 joins instead of 1 to constraint table.
-- STATS on my 30,000 geometry column set -- -- -- 2080-2090 ms (30,000 records - same speed as lateral, a little faster than committed array solution which was -- 2520ms - 2830 ms for same test ) SELECT * FROM geometry_columns; -- 380 ms - 5,000 rows (constraint columns beats all solutions including lateral one, current committed array is -- 1260 ms -- 5,000 rows) SELECT * FROM geometry_columns WHERE type = 'POINT'; -- 1410ms 20,000 rows (compared to array which is 1840ms, lateral 1400 ms) SELECT * from geometry_columns where type = 'LINESTRING'; -- Exact table match test 2 rows 60 ms, compared to 510 ms array approach and 220 ms lateral. -- The time goes down to 20 ms if I remove the varchar(256) casting SELECT * from geometry_columns WHERE f_table_name = 'constraint_test_linestring_1382';
— view is below —
CREATE OR REPLACE VIEW geometry_columns AS SELECT current_database()::character varying(256) AS f_table_catalog, -- TODO consider getting rid of CASTING it slows things down by 2 to 6 fold for table/column exact matches. -- Geography doesn't have casting by the way n.nspname::varchar(256) AS f_table_schema, c.relname::varchar(256) AS f_table_name, a.attname::varchar(256) AS f_geometry_column, COALESCE(postgis_typmod_dims(a.atttypmod), sn.ndims, 2) AS coord_dimension, COALESCE(NULLIF(postgis_typmod_srid(a.atttypmod), 0), sr.srid, 0) AS srid, replace(replace(COALESCE(NULLIF(upper(postgis_typmod_type(a.atttypmod)), 'GEOMETRY'::text), st.type, 'GEOMETRY'::text), 'ZM'::text, ''::text), 'Z'::text, ''::text)::character varying(30) AS type FROM pg_class c JOIN pg_attribute a ON a.attrelid = c.oid AND NOT a.attisdropped JOIN pg_namespace n ON c.relnamespace = n.oid JOIN pg_type t ON a.atttypid = t.oid LEFT JOIN ( SELECT s.connamespace, s.conrelid, s.conkey, replace(split_part(s.consrc, ''''::text, 2), ')'::text, ''::text) As type FROM pg_constraint AS s WHERE s.consrc ~~* '%geometrytype(% = %'::text ) st ON st.connamespace = n.oid AND st.conrelid = c.oid AND (a.attnum = ANY (st.conkey)) LEFT JOIN ( SELECT s.connamespace, s.conrelid, s.conkey, replace(split_part(s.consrc, ' = '::text, 2), ')'::text, ''::text)::integer As ndims FROM pg_constraint AS s WHERE s.consrc ~~* '%ndims(% = %'::text ) sn ON sn.connamespace = n.oid AND sn.conrelid = c.oid AND (a.attnum = ANY (sn.conkey)) LEFT JOIN ( SELECT s.connamespace, s.conrelid, s.conkey, replace(replace(split_part(s.consrc, ' = '::text, 2), ')'::text, ''::text), '('::text, ''::text)::integer As srid FROM pg_constraint AS s WHERE s.consrc ~~* '%srid(% = %'::text ) sr ON sr.connamespace = n.oid AND sr.conrelid = c.oid AND (a.attnum = ANY (sr.conkey)) WHERE (c.relkind = ANY (ARRAY['r'::"char", 'v'::"char", 'm'::"char", 'f'::"char"])) AND NOT c.relname = 'raster_columns'::name AND t.typname = 'geometry'::name AND NOT pg_is_other_temp_schema(c.relnamespace) AND has_table_privilege(c.oid, 'SELECT'::text);
comment:15 by , 10 years ago
Even better, with this new solution I can inline Paul's better regex logic. Since I'm no longer using an aggregate, using a set returning function is okay.
comment:16 by , 10 years ago
Once I can compile again after #3102 is fixed, I'll commit my changes.
pramsey, BTW I tried inlining the regex_match line of your functions in the new query and while I can, it has a performance penalty.
The timing for all pull then becomes the same as the geom array solution. So I lose 450ms of time and some time on the others as well (that was just trying with replacing the srid one). I'm debating if that is worthwhile especially since the existing solution (used since PostGIS 2.0 doesn't use regexp).
comment:17 by , 10 years ago
committed change at r13446 to trunk.
I'm going to do some tests on lower versions of PostgreSQL to make sure performance is just as good. Only did testing on 9.4 w 64-bit windows.
If performance is good, I'll backport this to 2.1 and add a regression test.
strk — you know that interrupatability voodoo you got going that does timing — is it possible to reuse the timing logic to incorporate a timing test for this?
comment:18 by , 10 years ago
Owner: | changed from | to
---|
comment:19 by , 10 years ago
Cc: | added |
---|
comment:20 by , 10 years ago
Backported to 2.1 branch at r13446. I did a couple of quick tests thru 9.2. I'll let the bots check 9.1 and below and we probably should put some timings in.
There was a slight performance hit with SELECT count(*) in all platforms I tested. But that performance hit is considerably low compared to the large performance gains in other areas. So here is what I have:
-- generate 30,000 geometry columns -- ALTER SYSTEM SET max_locks_per_transaction = '100000'; -- pre 9.4 need to edit postgres.conf, after change need to restart service. DROP SCHEMA IF EXISTS test CASCADE; CREATE SCHEMA test; --250620 ms need to increase max -- need to increase max_locks_per_transaction; DO language plpgsql $$ DECLARE var_sql text; BEGIN SELECT string_agg('CREATE TABLE test.constraint_test_' || lower(f.geom) || '_' || i::text || '(gid serial, geom2 geometry(LINESTRING,2249) ); SELECT addgeometrycolumn(''test'', ' || quote_literal('constraint_test_' || lower(f.geom) || '_' || i::text) || ' , ''geom'', ' || f.srid::text || ',' || quote_literal(f.geom) || ', 2, false);', '') INTO var_sql FROM (VALUES ('POINT'::text, 4326), ('MULTIPOLYGON'::text, 4269), ('LINESTRING'::text, 4326)) f(geom,srid) , generate_series(1,5000) As i; EXECUTE var_sql; END; $$
The test results on 30,000 geometry columns database, old compared to new 3 join approach. I also marked those helper functions used in the old view as deprecated in PostGIS 2.2 (for future removal). Leaving in just in case some 3rd party tools are using them.
-- 30000, 9.5 old 340 ms, new 740ms -- 9.3 old 270 ms, new 480ms -- 9.2 old 330 ms, new 450 ms SELECT count(*) from geometry_columns; -- 30000 records, 9.5alpha old 222,260 ms, new 2030 ms -- 9.3 old 349,659ms, new 2080 ms -- 9.2.10 old > 730,000 ms (stopped the process), new 2320 ms SELECT * FROM geometry_columns; -- 5000 records, 9.5 old 147,280 ms, new 560 ms -- 9.3 old 127,900 ms, new 380 ms -- 9.2 old - didn't bother, new 570 ms SELECT * FROM geometry_columns WHERE type = 'POINT'; -- 20000 records, old 9.5 148,270 ms, new 1370 ms -- 9.3 old 258,801 ms, new 1410 ms -- 9.2.10 old didn't bother, new 1630 ms SELECT * FROM geometry_columns WHERE type = 'LINESTRING'; -- 2 records, 9.5 old 210ms, new 9.5 70-160 ms -- 9.3 old 220ms, new 170-180 ms -- 9.2.10 new 200-210ms SELECT * FROM geometry_columns WHERE f_table_name = 'constraint_test_linestring_1382';
comment:21 by , 10 years ago
Keywords: | history added |
---|
comment:23 by , 10 years ago
Resolution: | → fixed |
---|---|
Status: | new → closed |
closing this out but will add another for regress performance test
comment:24 by , 10 years ago
Robe the interrupt hack creates a table writing in the value of now() and then checks the value again against now() after performing its thing. It's not perfect. Ideally it should also consider some kind of "standard time" it takes to do things so to use "relative time".
comment:25 by , 10 years ago
Hmm why aren't you using clock_timestamp(). I think that is less prone to vagaries of transactions.
BTW did a quick test with if you have just typmod tables (30,000 columns) on 9.3, and happy to say for a pure typmod system, the timings of old and new are the same (aside from the count(*) which still suffers a penalty for some reason).
(Regarding the "patch" qualification: the above SQL isn't a replacement for the current view definition, so that would require some extra work)