geometry_columns performance on large databases
|Reported by:||rnewton||Owned by:||pramsey|
I discovered this performance issue when using pgsql2shp. Our environment uses thousands of tables in a single database where the general interaction is to not perform lookups via the postgres tables (pg_class, pg_namespace, etc), but rather directly by name.
The query that retrieves the projection string in pgsql2shp uses the geometry_columns view to find the proper projection. In our environment, this lookup is very slow because it has to enumerate all the tables in order to find every one that includes a geometry column.
This had the effect of causing our shapefile queries to timeout in some cases. To export the shape for a single city was taking about 15 minutes.
As a work-around, we recreated the geometry_columns view as a materialized view. Even without indexes, this greatly improves the performance, in our environment. The same single-city shapefile will be generated in about 1 second. We can schedule the sync of the materialized view on a nightly basis or resync when adding tables with new geometry columns, which is infrequent for us.
Ours might be a special case due to the huge number of tables in our database, ability to use PostgreSQL 9.3+ and stable number of tables with geometry columns.
A possible, related fix would be to include a flag to pgsql2shp for specifying the projection. This wouldn't replace the geometry_columns check, but rather provide a short cut.
My suggestion would be to use a materialized view for geometry_columns (as we're doing now) and triggering the resync when a new geometry column is added to a table (the same hook that used to exist when geometry_columns was a table).