Opened 10 years ago
Closed 8 years ago
Last modified 8 years ago
#2365 closed defect (fixed)
find_srid performance regression with large numbers of tables
|Reported by:||angrygoat||Owned by:||pramsey|
The speed of find_srid has dropped significantly in PostGIS ≥ 2.0 when a large number of tables exist in the database. It looks like this is related to the new definition of geometry_columns as a view.
On my system, for a database containing a hundred tables find_srid() takes ~ 10ms, with five thousand tables ~ 100ms, with fifteen thousand tables ~ 450ms.
I'm not sure if this is a bug or a regression you'd consider worth fixing, however mapserver's postgis vector data source will put a find_srid() call in every query if the user doesn't specify "using srid≤srid>". With PostGIS≥2.0 on a database with lots of tables, not specifying the srid is suddenly a massive (rather than very minor) performance issue.
I've attached a rather ugly shell script that reproduces the issue Thanks!
Change History (18)
by , 10 years ago
comment:1 by , 10 years ago
comment:2 by , 10 years ago
|Milestone:||PostGIS 2.1.0 → PostGIS 2.0.4|
comment:3 by , 10 years ago
|Priority:||low → medium|
I didn't even know that function existed and looks like it hasn't been touched in years. I was going to suggest marking it as immutable but noticed that is already set. I would suggest bumping up the cost on it to say 500 or 1000 since its not a completely trivial function anymore (and the cost of 100 sounds a bit low). That will encourage it to cache more in shared mem so you'd then probably suffer the penalty only once in a while.
Give that a try and if it works I think we can probably do at least that much since its definitely more costly than some of the other functions we have.
I'm also wondering if the tables you have use constraints or typmod. I would expect typmod ones to be much faster so you may consider converting your tables to typmod if you haven't already.
comment:4 by , 10 years ago
I've managed to work around by avoiding the find_srid() call in the frequently executed query (just telling mapserver the SRID) – just logged this because I didn't want other people to fall down the same hole.
You can reproduce this by just creating a bunch of plain non-geometry tables, so I don't know about typmod vs. constraints mattering. The one geometry table my script makes is typmod though.
Thanks for all the help!
comment:5 by , 10 years ago
|Status:||new → closed|
I think there's enough workarounds (just specify your srid; replace the view with a table) that we're not going to touch this one.
comment:6 by , 10 years ago
food for thought with 9.3, you can make it a materialized view as we describe here:
comment:7 by , 8 years ago
try to improve speed — see #3092 for details.
comment:8 by , 8 years ago
|Milestone:||PostGIS 2.0.4 → PostGIS 2.2.0|
I'm reopening this one. Looking at the find_srid function, I think this may be somewhat improved by the commit discussed in #3092, but to improve even more, we really got to get rid of the casting I noted in the TODO of the view.
Unfortunately taking out the cast, would require dropping and recreating a view and since the output types would become of type: name instead of varchar, it would be considered an API change, so would have to go in 2.2.
I'm guessing though that doing that would increase performance about 3 fold at least since find_srid just does a table/column lookup like this:
Aha see pramsey - column lookup
select SRID into sr from geometry_columns where f_table_schema like schem and f_table_name = tabl and f_geometry_column = $3;
Anyway find_srid probably needs a bit of cleanup too.
comment:9 by , 8 years ago
|Status:||closed → reopened|
comment:10 by , 8 years ago
See #3103 also sorry should have put comment here.
I'd really like to simplify logic to :
CREATE OR REPLACE FUNCTION find_srid( character varying, character varying, character varying) RETURNS integer AS $$ BEGIN RETURN srid FROM geometry_columns WHERE (f_table_schema = $1 OR $1 = '') AND f_table_name = $2 AND f_geometry_column = $3 LIMIT 1; END; $$ LANGUAGE plpgsql IMMUTABLE STRICT;
But it's a breaking change since it would mean find_srid would no longer support
constructs like below and I'm getting rid of error handling raise notice warnings.
SELECT find_srid('','myschema.mytable', 'geom');
But that extra logic is massively killing performance on both the 9.3.6 and 9.4.1 instance I have and not sure if any third party tool even relies on that logic.
comment:11 by , 8 years ago
In a schema-qualified situation, mapserver will run the construct you don't like. Ie, if the DATA line is "the_geom from myschema.mytable"
comment:12 by , 8 years ago
Does using a strpos instead of a LIKE make any difference? I'm surprised a one-time test of "does it have a dot' is having a noticeable effect on performance.
IF ( schem = '' AND strpos(tabl,'.') > 0 ) THEN
I mean, the test gets run once, and only if schema is empty… how is this causing noticeable performance differences?
comment:13 by , 8 years ago
A tincy better but still pretty bad. this time I did on only 20 records and had to wrap in a subselect otherwise seems to try to apply to whole table
Here are benchmarks I have — the tmpgeoms is just a raw table bulk insert from my 30,000 geometry_columns table
-- this is existing function -- 5530 ms SELECT find_srid('', f_table_name, f_geometry_column) FROM (SELECT * FROM tmp_geoms ORDER BY f_table_name, f_geometry_column LIMIT 20) As f; -- 130 ms SELECT find_srid(f_table_schema, f_table_name, f_geometry_column) FROM (SELECT * FROM tmp_geoms ORDER BY f_table_name, f_geometry_column LIMIT 20) As f; ---- -- this is the one using strpos for check instead of LIKE -- 5530 ms SELECT find_srid_new2('', f_table_name, f_geometry_column) FROM (SELECT * FROM tmp_geoms ORDER BY f_table_name, f_geometry_column LIMIT 20) As f; -- 130 ms SELECT find_srid_new2(f_table_schema, f_table_name, f_geometry_column) FROM (SELECT * FROM tmp_geoms ORDER BY f_table_name, f_geometry_column LIMIT 20) As f; ---- -- Do not bother doing split check at all and go straight to query -- 30ms SELECT find_srid_new('', f_table_name, f_geometry_column) FROM (SELECT * FROM tmp_geoms ORDER BY f_table_name, f_geometry_column LIMIT 20) As f; -- 20 ms SELECT find_srid_new(f_table_schema, f_table_name, f_geometry_column) FROM (SELECT * FROM tmp_geoms ORDER BY f_table_name, f_geometry_column LIMIT 20) As f;
Yah seems pretty shocking. Makes me think something might be wrong with my install or a bug in postgres. I ran on PostgreSQL 9.4.1, compiled by Visual C++ build 1800, 64-bit. But had similar experience with 9.3. This is using my new view (with casts removed. As I recall though the old view was just as bad.
comment:14 by , 8 years ago
Okay I think I got it. I don't need to change as much as I thought I needed to change to make this fast. It seems to be the declaration at the top causing all the mischief
If I leave the function as it was but change this
DECLARE schem text; tabl text; sr int4;
DECLARE schem varchar; tabl varchar; sr int4;
Then seems performance is almost as good as the one I had not doing any checking and not having any error handler. More tests and I'll commit.
comment:15 by , 8 years ago
Commited at trunk 2.2 at r13461. I did put in a note it's 2.1.8 cause I plan to backport to 2.1. I can't see how this new version is any worse than what we had before, but I'll test with the casting view to make sure performance is not bad.
With this version I keep all the silly checking of old and error handling.
and timings don't try my patience anymore.
For my 30,000 row geometry_columns view, mostly of constraint based columns, my stats are now surprisingly respectable.
-- 1912 ms SELECT find_srid(f_table_schema, f_table_name, f_geometry_column) FROM tmp_geoms As f; -- 1911 ms SELECT find_srid('', f_table_schema || '.' || f_table_name, f_geometry_column) FROM tmp_geoms As f;
comment:16 by , 8 years ago
|Milestone:||PostGIS 2.2.0 → PostGIS 2.1.8|
|Status:||reopened → closed|
Backported to 2.1 at r13462
comment:17 by , 8 years ago
Although I changed the find_srid to be the same in 2.1, the marked improvement only seems to happen with the combination of removing the cast from geometry_columns view (done in 2.2, but can't in 2.1 because it requires column type change) and also the change in find_srid.
I think find_srid id performance is still better though but not like a super improvement and way too painful to run looking up all 30,000.
I suspect the long-term solution is for the view to become a materialized view (PostgreSQL 9.3+). A temporary solution is to rename the view to something different and then create a table based upon the view.
You'd have to update the table once in a while though…