Opened 9 years ago

Last modified 7 years ago

#2998 new enhancement

Define a strategy to ensure stats are collected during topology building

Reported by: strk Owned by: strk
Priority: medium Milestone: PostGIS Fund Me
Component: topology Version: master
Keywords: performance Cc:

Description

It is very likely that the queries used during topology building are inefficient due to lack of statistics enabling use of indexes.

Statistics are not updated until transaction commit, so this is a problem for queries like:

UPDATE mycoverage SET topogeom = toTopoGeom(geom);

This ticket is to define a strategy to run ANALYZE during these kind of operations, in a somewhat automatic but not invasive way.

Change History (4)

comment:1 by strk, 9 years ago

So one idea is for functions like ST_AddEdgeNewFaces, ST_AddEdgeModFaces, ST_AddEdge and the like to all use a common "edge adder" function which would keep track of how many primitives are added and run ANALYZE <table> after every <X> additions in <table>. Since every primitive table has an id, driven by a sequence, it could be relatively inexpensive to do, using a modulo operation.

comment:2 by strk, 9 years ago

Note that it is not necessarely true that indexes won't be used due to lack of stats, as the estimator makes some guesses on itself, and the invoked type-specific selectivity estimators may also behave in unexpected ways.

In this example, passing 99 geometries to toTopoGeom in a single statement resulted in these operations on the "edge_data" table:

=# select * from pg_stat_all_tables where relname = 'edge_data' and schemaname = 'topo_ulfareale';
relid             | 16228229
schemaname        | topo_ulfareale
relname           | edge_data
seq_scan          | 417
seq_tup_read      | 28976
idx_scan          | 3438
idx_tup_fetch     | 3170
n_tup_ins         | 175
n_tup_upd         | 365
n_tup_del         | 0
n_tup_hot_upd     | 65
n_live_tup        | 175
n_dead_tup        | 365
last_vacuum       | 
last_autovacuum   | 
last_analyze      | 
last_autoanalyze  | 
vacuum_count      | 0
autovacuum_count  | 0
analyze_count     | 0
autoanalyze_count | 0

As you can see there was no analyze ever run, still there have been ~3.5k index scans and "only" 417 sequencial scans. For just 99 inputs it's much over too many scans in general, but shows that index scans still happen.

comment:3 by strk, 9 years ago

Keywords: performance added

comment:4 by robe, 7 years ago

Milestone: PostGIS FuturePostGIS Fund Me

Milestone renamed

Note: See TracTickets for help on using tickets.