PostGis Topology performance improvements: edge_data indexes
|Reported by:||Björn Harrtell||Owned by:||strk|
I've been following the effort to investigate and improve topology performance.
I've used pg_stat_statements (with track all) to get statement statistics when loading a dataset of 100000 polygons into a topology. I use TopoGeo_AddPolygon and separate transactions for each area to avoid any issues with a single long running transaction and as the significant time spent is in TopoGeo_AddPolygon anyway.
In the resulting top list of statements with most time spent at third and fourth place came the following statements which I found suspicious:
UPDATE topo.edge_data SET next_right_edge = ?, abs_next_right_edge = ? WHERE edge_id != ? AND next_right_edge = ?
UPDATE topo.edge_data SET next_left_edge = ?, abs_next_left_edge = ? WHERE edge_id != ? AND next_left_edge = ?
I noted that next_right_edge and next_left_edge are not indexed in edge_data and conclude that these updates must be triggering sequential scans of edge_data. Adding indexes results in a very significant performance increase and from what I can tell a performance curve with diminishing increases in time per object loaded.