Opened 7 years ago

Last modified 5 years ago

#3103 new task

geometry_columns, find_srid performance regress tests

Reported by: robe Owned by: robe
Priority: medium Milestone: PostGIS Fund Me
Component: postgis Version: 2.1.x
Keywords: Cc:

Description

We need to add a test for both checking correctness and performance of querying geometry_columns and using find_srid. Per issues uncovered in #3092 and #2365.

The way the planner behaves has changed over time too so simply assuming because performance is good in pg version x for a particular view it will be good in pg version whatever is fraught with problems.

Change History (11)

comment:1 by robe, 7 years ago

Owner: changed from pramsey to robe

comment:2 by robe, 7 years ago

Committed at r13453 change to no longer cast to varchar. I didn't see a difference with find_srid (totally illogical). find_srid even in my old view took 20 ms (even though direct query on view was taking 200 ms with old view). so more tests required.

comment:3 by robe, 7 years ago

On closer inspection the find_srid that whole splitting exercise to account for if people throw the schema name as part of the table name is a real performance killer.

If I change the function to this:

CREATE OR REPLACE FUNCTION find_srid(
   character varying, character varying, character varying)
  RETURNS integer AS
$$
BEGIN
	RETURN srid FROM geometry_columns 
	    WHERE (f_table_schema = $1 OR $1 = '') 
            AND f_table_name = $2  AND f_geometry_column = $3 LIMIT 1;
END;
$$
  LANGUAGE plpgsql IMMUTABLE STRICT;

And have a table of 30,000 tests I created with this:

SELECT f_table_schema::Text, f_table_name::text, f_geometry_column::text INTO tmp_geoms from geometry_columns ;

-- finishes in 1861 ms
SELECT find_srid(f_table_schema, f_table_name, f_geometry_column) FROM tmp_geoms;


-- With the old find_srid that has all this error handling gook and split table name
-- into schema table, I lost patience waiting
 

Why it's such a massive overhead on my 9.3 instance I'm not sure, but I also don't think all that splitting is worth the overhead of something people shouldn't be doing anyway. It's pretty clear to me that table name should not include the schema name when we have a slot for schema name.

Last edited 7 years ago by robe (previous) (diff)

comment:4 by strk, 7 years ago

does turning the function into an SQL one makes it inlinable/faster ?

comment:5 by robe, 7 years ago

Nope (removing STRICT Of course so it can inline) and when I tried making it an SQL function, I ran into another problem. Since PostgreSQL can see into an SQL function, it complained that the view didn't exist when I tried to upgrade. I guess because I dropped the view at beginning. Not sure. I have to revisit that. I was thinking this function should really be SQL stable rather than IMMUTABLE.

I'm still really puzzled how that seemingly simple condition could have such major speed consequences. Makes me concerned something else is up.

comment:6 by strk, 7 years ago

I'd avoid requiring a DROP VIEW / CREATE VIEW if it's not for fixing a really really important issue. Upgrades are already an hell as they stand…

comment:7 by robe, 7 years ago

We do have to make our upgrade logic smart about when it needs to drop and when not.

For this case, would never do it in 2.1, but for 2.2 I think its worth it, just because I hate that geography_columns and geometry_columns are different and it does provide quite a bit of performance boost if you have a lot of records in geometry_columns.

comment:8 by strk, 7 years ago

I'd first make a deeper research about implications of varchar vs. text. Internally they should be the same (size header, variable length).

comment:9 by strk, 7 years ago

Possibly interesting numbers:

strk=# explain analyze select sum(x) from ( select i::varchar::float8 x from generate_series(1,5000000) i ) foo;
Aggregate  (cost=22.50..22.51 rows=1 width=4) (actual time=5743.902..5743.902 rows=1 loops=1)
  ->  Function Scan on generate_series i  (cost=0.00..10.00 rows=1000 width=4) (actual time=1101.722..2218.290 rows=5000000 loops=1)
Total runtime: 5823.127 ms
strk=# explain analyze select sum(x) from ( select i::text::float8 x from generate_series(1,5000000) i ) foo;
Aggregate  (cost=22.50..22.51 rows=1 width=4) (actual time=5724.225..5724.225 rows=1 loops=1)
  ->  Function Scan on generate_series i  (cost=0.00..10.00 rows=1000 width=4) (actual time=1106.496..2226.769 rows=5000000 loops=1)
Total runtime: 5803.895 ms
strk=# explain analyze select sum(x) from ( select i::varchar::text::float8 x from generate_series(1,5000000) i ) foo;
Aggregate  (cost=22.50..22.51 rows=1 width=4) (actual time=5757.578..5757.578 rows=1 loops=1)
  ->  Function Scan on generate_series i  (cost=0.00..10.00 rows=1000 width=4) (actual time=1111.779..2228.768 rows=5000000 loops=1)
Total runtime: 5836.428 ms
strk=# explain analyze select sum(x) from ( select i::text::varchar::float8 x from generate_series(1,5000000) i ) foo;
Aggregate  (cost=22.50..22.51 rows=1 width=4) (actual time=5756.883..5756.883 rows=1 loops=1)
  ->  Function Scan on generate_series i  (cost=0.00..10.00 rows=1000 width=4) (actual time=1132.671..2249.771 rows=5000000 loops=1)
Total runtime: 5835.886 ms

comment:10 by robe, 7 years ago

Milestone: PostGIS 2.2.0PostGIS Future

comment:11 by robe, 5 years ago

Milestone: PostGIS FuturePostGIS Fund Me

Milestone renamed

Note: See TracTickets for help on using tickets.