Opened 6 years ago

Closed 4 years ago

Last modified 4 years ago

#2365 closed defect (fixed)

find_srid performance regression with large numbers of tables

Reported by: angrygoat Owned by: pramsey
Priority: medium Milestone: PostGIS 2.1.8
Component: postgis Version: 2.0.x
Keywords: history Cc: keithmoss

Description

The speed of find_srid has dropped significantly in PostGIS >= 2.0 when a large number of tables exist in the database. It looks like this is related to the new definition of geometry_columns as a view.

On my system, for a database containing a hundred tables find_srid() takes ~ 10ms, with five thousand tables ~ 100ms, with fifteen thousand tables ~ 450ms.

I'm not sure if this is a bug or a regression you'd consider worth fixing, however mapserver's postgis vector data source will put a find_srid() call in every query if the user doesn't specify "using srid=<srid>". With PostGIS>=2.0 on a database with lots of tables, not specifying the srid is suddenly a massive (rather than very minor) performance issue.

I've attached a rather ugly shell script that reproduces the issue :-) Thanks!

Attachments (1)

test.sh (677 bytes) - added by angrygoat 6 years ago.

Download all attachments as: .zip

Change History (18)

Changed 6 years ago by angrygoat

Attachment: test.sh added

comment:1 Changed 6 years ago by keithmoss

Cc: keithmoss added

comment:2 Changed 6 years ago by Bborie Park

Milestone: PostGIS 2.1.0PostGIS 2.0.4

I suspect the long-term solution is for the view to become a materialized view (PostgreSQL 9.3+). A temporary solution is to rename the view to something different and then create a table based upon the view.

ALTER VIEW geometry_columns RENAME TO _geometry_columns;

CREATE TABLE geometry_columns AS
  SELECT * FROM _geometry_columns;

You'd have to update the table once in a while though...

comment:3 Changed 6 years ago by robe

Priority: lowmedium

I didn't even know that function existed and looks like it hasn't been touched in years. I was going to suggest marking it as immutable but noticed that is already set. I would suggest bumping up the cost on it to say 500 or 1000 since its not a completely trivial function anymore (and the cost of 100 sounds a bit low). That will encourage it to cache more in shared mem so you'd then probably suffer the penalty only once in a while.

Give that a try and if it works I think we can probably do at least that much since its definitely more costly than some of the other functions we have.

I'm also wondering if the tables you have use constraints or typmod. I would expect typmod ones to be much faster so you may consider converting your tables to typmod if you haven't already.

comment:4 Changed 6 years ago by angrygoat

Hey

I've managed to work around by avoiding the find_srid() call in the frequently executed query (just telling mapserver the SRID) – just logged this because I didn't want other people to fall down the same hole.

You can reproduce this by just creating a bunch of plain non-geometry tables, so I don't know about typmod vs. constraints mattering. The one geometry table my script makes is typmod though.

Thanks for all the help!

comment:5 Changed 6 years ago by pramsey

Resolution: wontfix
Status: newclosed

I think there's enough workarounds (just specify your srid; replace the view with a table) that we're not going to touch this one.

comment:6 Changed 6 years ago by robe

food for thought with 9.3, you can make it a materialized view as we describe here:

http://www.postgresonline.com/journal/archives/314-Materialized-geometry_columns-using-Event-Triggers.html

comment:7 Changed 4 years ago by robe

try to improve speed -- see #3092 for details.

comment:8 Changed 4 years ago by robe

Milestone: PostGIS 2.0.4PostGIS 2.2.0

I'm reopening this one. Looking at the find_srid function, I think this may be somewhat improved by the commit discussed in #3092, but to improve even more, we really got to get rid of the casting I noted in the TODO of the view.

Unfortunately taking out the cast, would require dropping and recreating a view and since the output types would become of type: name instead of varchar, it would be considered an API change, so would have to go in 2.2.

I'm guessing though that doing that would increase performance about 3 fold at least since find_srid just does a table/column lookup like this:

Aha see pramsey - column lookup

select SRID into sr from geometry_columns where f_table_schema like schem and f_table_name = tabl and f_geometry_column = $3;

Anyway find_srid probably needs a bit of cleanup too.

comment:9 Changed 4 years ago by robe

Resolution: wontfix
Status: closedreopened

comment:10 Changed 4 years ago by robe

See #3103 also sorry should have put comment here.

I'd really like to simplify logic to :

CREATE OR REPLACE FUNCTION find_srid(
   character varying, character varying, character varying)
  RETURNS integer AS
$$
BEGIN
	RETURN srid FROM geometry_columns 
	    WHERE (f_table_schema = $1 OR $1 = '') 
            AND f_table_name = $2  AND f_geometry_column = $3 LIMIT 1;
END;
$$
  LANGUAGE plpgsql IMMUTABLE STRICT;

But it's a breaking change since it would mean find_srid would no longer support

constructs like below and I'm getting rid of error handling raise notice warnings.

SELECT find_srid('','myschema.mytable', 'geom');

But that extra logic is massively killing performance on both the 9.3.6 and 9.4.1 instance I have and not sure if any third party tool even relies on that logic.

comment:11 Changed 4 years ago by pramsey

In a schema-qualified situation, mapserver will run the construct you don't like. Ie, if the DATA line is "the_geom from myschema.mytable"

comment:12 Changed 4 years ago by pramsey

Does using a strpos instead of a LIKE make any difference? I'm surprised a one-time test of "does it have a dot' is having a noticeable effect on performance.

	IF ( schem = '' AND strpos(tabl,'.') > 0 ) THEN

I mean, the test gets run once, and only if schema is empty... how is this causing noticeable performance differences?

comment:13 Changed 4 years ago by robe

A tincy better but still pretty bad. this time I did on only 20 records and had to wrap in a subselect otherwise seems to try to apply to whole table

Here are benchmarks I have -- the tmpgeoms is just a raw table bulk insert from my 30,000 geometry_columns table

-- this is existing function
-- 5530 ms 
SELECT find_srid('', f_table_name, f_geometry_column) FROM (SELECT * FROM tmp_geoms  ORDER BY f_table_name, f_geometry_column LIMIT 20) As f;

-- 130 ms
SELECT find_srid(f_table_schema, f_table_name, f_geometry_column) FROM (SELECT * FROM tmp_geoms  ORDER BY f_table_name, f_geometry_column LIMIT 20) As f;

----

-- this is the one using strpos for check instead of LIKE -- 5530 ms
SELECT find_srid_new2('', f_table_name, f_geometry_column) FROM (SELECT * FROM tmp_geoms  ORDER BY f_table_name, f_geometry_column LIMIT 20) As f;

-- 130 ms
SELECT find_srid_new2(f_table_schema, f_table_name, f_geometry_column) FROM (SELECT * FROM tmp_geoms  ORDER BY f_table_name, f_geometry_column LIMIT 20) As f;


----

-- Do not bother doing split check at all and go straight to query
-- 30ms
SELECT find_srid_new('', f_table_name, f_geometry_column) FROM (SELECT * FROM tmp_geoms  ORDER BY f_table_name, f_geometry_column LIMIT 20) As f;

-- 20 ms
SELECT find_srid_new(f_table_schema, f_table_name, f_geometry_column) FROM (SELECT * FROM tmp_geoms  ORDER BY f_table_name, f_geometry_column LIMIT 20) As f;

Yah seems pretty shocking. Makes me think something might be wrong with my install or a bug in postgres. I ran on PostgreSQL 9.4.1, compiled by Visual C++ build 1800, 64-bit. But had similar experience with 9.3. This is using my new view (with casts removed. As I recall though the old view was just as bad.

Last edited 4 years ago by robe (previous) (diff)

comment:14 Changed 4 years ago by robe

Okay I think I got it. I don't need to change as much as I thought I needed to change to make this fast. It seems to be the declaration at the top causing all the mischief

If I leave the function as it was but change this

DECLARE
	schem text;
	tabl text;
	sr int4;

to

DECLARE
	schem varchar;
	tabl varchar;
	sr int4;

Then seems performance is almost as good as the one I had not doing any checking and not having any error handler. More tests and I'll commit.

comment:15 Changed 4 years ago by robe

Commited at trunk 2.2 at r13461. I did put in a note it's 2.1.8 cause I plan to backport to 2.1. I can't see how this new version is any worse than what we had before, but I'll test with the casting view to make sure performance is not bad.

With this version I keep all the silly checking of old and error handling.

and timings don't try my patience anymore.

For my 30,000 row geometry_columns view, mostly of constraint based columns, my stats are now surprisingly respectable.

-- 1912 ms
SELECT find_srid(f_table_schema, f_table_name, f_geometry_column) 
  FROM  tmp_geoms As f;

-- 1911 ms

SELECT find_srid('', f_table_schema || '.' || f_table_name, f_geometry_column) FROM  tmp_geoms As f;

comment:16 Changed 4 years ago by robe

Keywords: history added
Milestone: PostGIS 2.2.0PostGIS 2.1.8
Resolution: fixed
Status: reopenedclosed

Backported to 2.1 at r13462

comment:17 Changed 4 years ago by robe

Although I changed the find_srid to be the same in 2.1, the marked improvement only seems to happen with the combination of removing the cast from geometry_columns view (done in 2.2, but can't in 2.1 because it requires column type change) and also the change in find_srid.

I think find_srid id performance is still better though but not like a super improvement and way too painful to run looking up all 30,000.

Note: See TracTickets for help on using tickets.