Opened 9 years ago

Closed 9 years ago

Last modified 9 years ago

#3092 closed enhancement (fixed)

Slow performance of geometry_columns in PostGIS >= 2

Reported by: rouault Owned by: robe
Priority: high Milestone: PostGIS 2.1.8
Component: postgis Version: 2.0.x
Keywords: history Cc: strk

Description

Original discussion in this thread http://lists.osgeo.org/pipermail/gdal-dev/2015-March/041299.html

Querying geometry_columns exhibits poor performance (perhaps quadratic in number of spatial tables) due to geometry_columns potentially calling postgis_constraint_dims(), postgis_constraint_srid() and postgis_constraint_type(), because each causes a SELECT to be executed for each row if some typmod constraints are missing.

The following request gives similar information but much faster:

select c.relname, 
       a.attname,
       t.typname,
       postgis_typmod_dims(a.atttypmod) dim,
       postgis_typmod_srid(a.atttypmod) srid,
       postgis_typmod_type(a.atttypmod)::text typ,
       array_agg(s.consrc)::text
FROM pg_class c join pg_attribute a on a.attrelid=c.oid
                join pg_namespace n on c.relnamespace = n.oid
                AND (c.relkind = 'r'::"char" OR c.relkind = 'v'::"char" OR c.relkind = 'm'::"char" OR c.relkind = 'f'::"char")
                AND NOT ( n.nspname = 'public' AND c.relname = 
'raster_columns' )
                join pg_type t on a.atttypid = t.oid and (t.typname = 
'geometry'::name OR t.typname = 'geography'::name)
                left join pg_constraint s on s.connamespace = n.oid AND 
s.conrelid = c.oid
                          AND a.attnum = ANY (s.conkey)
                          AND (s.consrc LIKE '%geometrytype(% = %' OR s.consrc LIKE '%ndims(% = %' OR s.consrc LIKE '%srid(% = %')
GROUP BY c.relname, a.attname, t.typname, dim, srid, typ, c.oid
ORDER BY c.oid

Some analysis of the aggregated constraints similar to what is done in postgis_constraint_XXXX() should be done to merge that info with the one of postgis_typmod_XXXX()

Change History (25)

comment:1 by robe, 9 years ago

Milestone: PostGIS 2.1.7PostGIS 2.1.8
Priority: mediumhigh
Type: defectpatch

comment:2 by rouault, 9 years ago

(Regarding the "patch" qualification: the above SQL isn't a replacement for the current view definition, so that would require some extra work)

comment:3 by robe, 9 years ago

Type: patchenhancement

oh :)

comment:4 by pramsey, 9 years ago

robe do you want to try this with your amazing sql skillz? Basically start from the ARRAYs that Even is spitting out on the constraints and derive the SRID/TYPE/etc, avoids running a query for each column value as currently done.

comment:5 by robe, 9 years ago

How can I resist if you put it that way? Okay will work on it this weekend.

comment:6 by robe, 9 years ago

See also #2976 and #2365

comment:7 by pramsey, 9 years ago

Try out r13443, which replaces the many extra queries with simple loops on arrays built up in a prior query modelled on Even's.

comment:8 by robe, 9 years ago

Hmm I had a different thought that doesn't use arrays at all, but I'll compare yours with what I was going to try.

I had created this script to test out the speed. The 10 I tried to bump the 10 to 1000 but ground to a halt creating those tables on the machine I was testing on. Might have been cause I was testing on an old 9.3 beta.

DROP SCHEMA IF EXISTS test CASCADE;
CREATE SCHEMA test;

DO language plpgsql
$$
DECLARE var_sql text;
BEGIN
  SELECT string_agg('CREATE TABLE test.constraint_test_' || lower(f.geom) || '_' || i::text || '(gid serial);
  SELECT addgeometrycolumn(''test'', ' 
  || quote_literal('constraint_test_' || lower(f.geom) || '_' || i::text) || ' , ''geom'', 
	' || f.srid::text || ',' || quote_literal(f.geom) || ', 2, false);', '') INTO var_sql
  FROM (VALUES ('POINT'::text, 4326), ('MULTIPOLYGON'::text, 4269), ('LINESTRING'::text, 4326)) f(geom,srid) 
		, generate_series(1,10) As i;
  EXECUTE var_sql;
END;
$$

comment:9 by robe, 9 years ago

pramsey at a glance I see a problem. You don't want to use a WITH clause here because it will force having to query all rows before you can filter. I suspect that won't scale with large numbers of tables.

comment:10 by robe, 9 years ago

Okay here is my proposed. No dependencies on any functions and for my simple tests the speed seems much better than existing and slower than yours for full table set but generally faster for filtered set of tables.

CREATE OR REPLACE VIEW geometry_columns AS 
SELECT  current_database()::varchar(256) AS f_table_catalog,
  n.nspname::varchar(256) AS f_table_schema, 
  c.relname::varchar(256) AS f_table_name,  
  a.attname::varchar(256) AS f_geometry_column,
  COALESCE(postgis_typmod_dims(atttypmod),
           s.dims,
           2) AS coord_dimension, 
 COALESCE(NULLIF(postgis_typmod_srid(atttypmod),0),
           s.srid,
           0) AS srid, 
  -- force to be uppercase with no ZM so is backwards compatible
  -- with old geometry_columns
  replace(
    replace(
      COALESCE(
        NULLIF(upper(postgis_typmod_type(atttypmod)::text), 'GEOMETRY'),
       s.type,
        'GEOMETRY'
      ), 'ZM', ''
    ), 'Z', ''
  )::varchar(30) AS type
FROM pg_class c 
JOIN pg_attribute a ON a.attrelid = c.oid AND NOT a.attisdropped
JOIN pg_namespace n ON c.relnamespace = n.oid
JOIN pg_type t ON a.atttypid = t.oid 
LEFT JOIN 
(SELECT connamespace, conrelid, conkey
    , MAX( CASE WHEN consrc ILIKE '%geometrytype(% = %'  THEN replace(split_part(consrc, '''', 2), ')', '') ELSE NULL END ) As type
, MAX( CASE WHEN consrc ILIKE '%srid(% = %' THEN  replace(replace(split_part(consrc, ' = ', 2), ')', ''), '(', '')::integer  ELSE NULL END) As srid
, MAX( CASE WHEN consrc ILIKE '%ndims(% = %' THEN replace(split_part(consrc, ' = ', 2), ')', '')::integer ELSE NULL END) As dims
  FROM pg_constraint 
WHERE consrc ILIKE '%geometrytype(% = %' OR consrc ILIKE '%ndims(% = %' OR consrc ILIKE '%srid(% = %' 
 GROUP BY connamespace, conrelid, conkey
 ) AS s ON s.connamespace = n.oid 
  AND s.conrelid = c.oid 
  AND a.attnum = ANY (s.conkey) 
WHERE c.relkind IN ('r'::"char", 'v'::"char", 'm'::"char", 'f'::"char") 
AND NOT ( c.relname = 'raster_columns' )
AND t.typname = 'geometry'::name
AND NOT pg_is_other_temp_schema(c.relnamespace)
AND has_table_privilege( c.oid, 'SELECT'::text );

For full table scan your version beats by a wide margin. For a 1500 table set yours comes out

-- Paul's 140 ms, Regina's 440ms, original 1860ms
SELECT * from geometry_columns;

— For filtered the story is a little different

-- Paul's 80ms, Regina's 40ms, original  640ms
-- 500 rows returned
SELECT * FROM geometry_columns
WHERE f_table_name LIKE 'constraint%point%';

I still need to test it with multi geom column tables and typmod to make sure no consequences I missed and also a bigger set of tables. I only tested with 1500 tables all of constraint type and no typmod so not a sufficiently complete test.

comment:11 by robe, 9 years ago

Hmm and yet another version — lateral beats all but would require putting in conditional for > 9.2

CREATE OR REPLACE VIEW geometry_columns_lateral AS 
SELECT  current_database()::varchar(256) AS f_table_catalog,
  n.nspname::varchar(256) AS f_table_schema, 
  c.relname::varchar(256) AS f_table_name,  
  a.attname::varchar(256) AS f_geometry_column,
  COALESCE(postgis_typmod_dims(atttypmod),
           s.dims,
           2) AS coord_dimension, 
 COALESCE(NULLIF(postgis_typmod_srid(atttypmod),0),
           s.srid,
           0) AS srid, 
  -- force to be uppercase with no ZM so is backwards compatible
  -- with old geometry_columns
  replace(
    replace(
      COALESCE(
        NULLIF(upper(postgis_typmod_type(atttypmod)::text), 'GEOMETRY'),
       s.type,
        'GEOMETRY'
      ), 'ZM', ''
    ), 'Z', ''
  )::varchar(30) AS type
FROM pg_class c 
JOIN pg_attribute a ON a.attrelid = c.oid AND NOT a.attisdropped
JOIN pg_namespace n ON c.relnamespace = n.oid
JOIN pg_type t ON a.atttypid = t.oid 
LEFT JOIN LATERAL 
(SELECT connamespace, conrelid, conkey
    , MAX( CASE WHEN consrc ILIKE '%geometrytype(% = %'  THEN replace(split_part(consrc, '''', 2), ')', '') ELSE NULL END ) As type
, MAX( CASE WHEN consrc ILIKE '%srid(% = %' THEN  replace(replace(split_part(consrc, ' = ', 2), ')', ''), '(', '')::integer  ELSE NULL END) As srid
, MAX( CASE WHEN consrc ILIKE '%ndims(% = %' THEN replace(split_part(consrc, ' = ', 2), ')', '')::integer ELSE NULL END) As dims
  FROM pg_constraint As s
WHERE (consrc LIKE '%geometrytype(% = %' OR consrc LIKE '%ndims(% = %' OR consrc LIKE '%srid(% = %' )
AND s.connamespace = n.oid 
  AND s.conrelid = c.oid 
  AND a.attnum = ANY (s.conkey) 
 GROUP BY connamespace, conrelid, conkey
 ) AS s ON true
WHERE c.relkind IN ('r'::"char", 'v'::"char", 'm'::"char", 'f'::"char") 
AND NOT ( c.relname = 'raster_columns' )
AND t.typname = 'geometry'::name
AND NOT pg_is_other_temp_schema(c.relnamespace)
AND has_table_privilege( c.oid, 'SELECT'::text );
-- 120 ms
SELECT * FROM geometry_columns_lateral; 

-- 40 ms
SELECT * FROM geometry_columns_lateral
WHERE f_table_name LIKE 'constraint%point%';

comment:12 by robe, 9 years ago

I'm a bit torn -

I did more testing and posted my results on dev mailing list.

http://lists.osgeo.org/pipermail/postgis-devel/2015-April/024838.html

comment:13 by robe, 9 years ago

errata — leaning toward solution 2 and 3 (with a conditional to use 3 for 9.3+)

http://lists.osgeo.org/pipermail/postgis-devel/2015-April/024839.html

comment:14 by robe, 9 years ago

This one seems to be a winner. I'm going to commit today unless there are objections. Everything is done in the view (no extra functions), no use of new syntax like LATERAL so works for any version of PostgreSQL I can think of and it beats or matches all contenders on all tests. It uses 3 joins instead of 1 to constraint table.

-- STATS on my 30,000 geometry column set --

-- -- 2080-2090 ms (30,000 records - same speed as lateral, a little faster than committed array solution which was -- 2520ms - 2830 ms for same test )
SELECT *  FROM  geometry_columns;

-- 380 ms - 5,000 rows (constraint columns beats all solutions including lateral one,
current committed array is -- 1260 ms -- 5,000 rows) 
SELECT * FROM  geometry_columns WHERE type = 'POINT';

-- 1410ms 20,000 rows (compared to array which is 1840ms, lateral 1400 ms)
SELECT * from geometry_columns_robe4   
where type = 'LINESTRING'; 


-- Exact table match test 2 rows 60 ms, compared to 510 ms array approach and 220 ms lateral.  
-- The time goes down to 20 ms if I remove the varchar(256) casting
SELECT * from geometry_columns
WHERE  f_table_name = 'constraint_test_linestring_1382'; 



-- 1490ms- 1500 ms   - 20,000 rows (mix of constraint and  columns) 
SELECT * FROM  geometry_columns WHERE type = 'LINESTRING';

-- 220 - 240 ms
SELECT * from geometry_columns
WHERE  f_table_name = 'constraint_test_linestring_1382';

— view is below —

CREATE OR REPLACE VIEW geometry_columns AS 
 SELECT current_database()::character varying(256) AS f_table_catalog,
-- TODO consider getting rid of CASTING it slows things down by 2 to 6 fold for table/column exact matches.  
--   Geography doesn't have casting by the way
    n.nspname::varchar(256) AS f_table_schema,
    c.relname::varchar(256) AS f_table_name,
    a.attname::varchar(256) AS f_geometry_column,
    COALESCE(postgis_typmod_dims(a.atttypmod), sn.ndims, 2) AS coord_dimension,
    COALESCE(NULLIF(postgis_typmod_srid(a.atttypmod), 0), sr.srid, 0) AS srid,
    replace(replace(COALESCE(NULLIF(upper(postgis_typmod_type(a.atttypmod)), 'GEOMETRY'::text), st.type, 'GEOMETRY'::text), 'ZM'::text, ''::text), 'Z'::text, ''::text)::character varying(30) AS type
   FROM pg_class c
     JOIN pg_attribute a ON a.attrelid = c.oid AND NOT a.attisdropped
     JOIN pg_namespace n ON c.relnamespace = n.oid
     JOIN pg_type t ON a.atttypid = t.oid
     LEFT JOIN ( SELECT s.connamespace,
            s.conrelid,
            s.conkey, replace(split_part(s.consrc, ''''::text, 2), ')'::text, ''::text) As type
           FROM pg_constraint AS s
          WHERE s.consrc ~~* '%geometrytype(% = %'::text 
          
) st ON st.connamespace = n.oid AND st.conrelid = c.oid AND (a.attnum = ANY (st.conkey))
     LEFT JOIN ( SELECT s.connamespace,
            s.conrelid,
            s.conkey, replace(split_part(s.consrc, ' = '::text, 2), ')'::text, ''::text)::integer As ndims
           FROM pg_constraint AS s
          WHERE s.consrc ~~* '%ndims(% = %'::text
          
) sn ON sn.connamespace = n.oid AND sn.conrelid = c.oid AND (a.attnum = ANY (sn.conkey))
     LEFT JOIN ( SELECT s.connamespace,
            s.conrelid,
            s.conkey, replace(replace(split_part(s.consrc, ' = '::text, 2), ')'::text, ''::text), '('::text, ''::text)::integer As srid
           FROM pg_constraint AS s
          WHERE s.consrc ~~* '%srid(% = %'::text
          
) sr ON sr.connamespace = n.oid AND sr.conrelid = c.oid AND (a.attnum = ANY (sr.conkey))
  WHERE (c.relkind = ANY (ARRAY['r'::"char", 'v'::"char", 'm'::"char", 'f'::"char"])) AND NOT c.relname = 'raster_columns'::name AND t.typname = 'geometry'::name AND NOT pg_is_other_temp_schema(c.relnamespace) AND has_table_privilege(c.oid, 'SELECT'::text);

Version 0, edited 9 years ago by robe (next)

comment:15 by robe, 9 years ago

Even better, with this new solution I can inline Paul's better regex logic. Since I'm no longer using an aggregate, using a set returning function is okay.

comment:16 by robe, 9 years ago

Once I can compile again after #3102 is fixed, I'll commit my changes.

pramsey, BTW I tried inlining the regex_match line of your functions in the new query and while I can, it has a performance penalty.

The timing for all pull then becomes the same as the geom array solution. So I lose 450ms of time and some time on the others as well (that was just trying with replacing the srid one). I'm debating if that is worthwhile especially since the existing solution (used since PostGIS 2.0 doesn't use regexp).

comment:17 by robe, 9 years ago

committed change at r13446 to trunk.

I'm going to do some tests on lower versions of PostgreSQL to make sure performance is just as good. Only did testing on 9.4 w 64-bit windows.

If performance is good, I'll backport this to 2.1 and add a regression test.

strk — you know that interrupatability voodoo you got going that does timing — is it possible to reuse the timing logic to incorporate a timing test for this?

comment:18 by robe, 9 years ago

Owner: changed from pramsey to robe

comment:19 by robe, 9 years ago

Cc: strk added

comment:20 by robe, 9 years ago

Backported to 2.1 branch at r13446. I did a couple of quick tests thru 9.2. I'll let the bots check 9.1 and below and we probably should put some timings in.

There was a slight performance hit with SELECT count(*) in all platforms I tested. But that performance hit is considerably low compared to the large performance gains in other areas. So here is what I have:

-- generate 30,000 geometry columns
-- ALTER SYSTEM SET max_locks_per_transaction = '100000'; -- pre 9.4 need to edit postgres.conf, after change need to restart service.
DROP SCHEMA IF EXISTS test CASCADE;
CREATE SCHEMA test; --250620 ms need to increase max
-- need to increase max_locks_per_transaction;

DO language plpgsql
$$
DECLARE var_sql text;
BEGIN
  SELECT string_agg('CREATE TABLE test.constraint_test_' || lower(f.geom) || '_' || i::text || '(gid serial, geom2 geometry(LINESTRING,2249) );
  SELECT addgeometrycolumn(''test'', ' 
  || quote_literal('constraint_test_' || lower(f.geom) || '_' || i::text) || ' , ''geom'', 
	' || f.srid::text || ',' || quote_literal(f.geom) || ', 2, false);', '') INTO var_sql
  FROM (VALUES ('POINT'::text, 4326), ('MULTIPOLYGON'::text, 4269), ('LINESTRING'::text, 4326)) f(geom,srid) 
		, generate_series(1,5000) As i;
  EXECUTE var_sql;
END;
$$

The test results on 30,000 geometry columns database, old compared to new 3 join approach. I also marked those helper functions used in the old view as deprecated in PostGIS 2.2 (for future removal). Leaving in just in case some 3rd party tools are using them.

-- 30000, 9.5 old 340 ms, new 740ms
-- 9.3 old 270 ms, new 480ms
-- 9.2 old 330 ms, new 450 ms
SELECT count(*) from geometry_columns;

-- 30000 records,  9.5alpha  old 222,260 ms, new 2030 ms  
-- 9.3 old 349,659ms, new 2080 ms
-- 9.2.10 old > 730,000 ms (stopped the process), new 2320 ms
SELECT * FROM geometry_columns;

-- 5000 records, 9.5 old  147,280 ms, new 560 ms
-- 9.3 old 127,900 ms, new 380 ms
-- 9.2 old - didn't bother, new 570 ms
SELECT * FROM geometry_columns WHERE type = 'POINT';

-- 20000 records, old 9.5  148,270 ms, new 1370 ms
-- 9.3 old 258,801 ms, new 1410 ms
-- 9.2.10 old didn't bother, new 1630 ms
SELECT * FROM geometry_columns WHERE type = 'LINESTRING';

-- 2 records, 9.5 old   210ms, new 9.5 70-160 ms
-- 9.3 old 220ms, new 170-180 ms
-- 9.2.10 new 200-210ms
SELECT * FROM geometry_columns  
WHERE f_table_name = 'constraint_test_linestring_1382';

comment:21 by robe, 9 years ago

Keywords: history added

comment:22 by robe, 9 years ago

bah I meant r13448 is 2.1 commit.

comment:23 by robe, 9 years ago

Resolution: fixed
Status: newclosed

closing this out but will add another for regress performance test

comment:24 by strk, 9 years ago

Robe the interrupt hack creates a table writing in the value of now() and then checks the value again against now() after performing its thing. It's not perfect. Ideally it should also consider some kind of "standard time" it takes to do things so to use "relative time".

comment:25 by robe, 9 years ago

Hmm why aren't you using clock_timestamp(). I think that is less prone to vagaries of transactions.

BTW did a quick test with if you have just typmod tables (30,000 columns) on 9.3, and happy to say for a pure typmod system, the timings of old and new are the same (aside from the count(*) which still suffers a penalty for some reason).

Note: See TracTickets for help on using tickets.