Opened 9 years ago

Closed 9 years ago

#3010 closed enhancement (duplicate)

PostGis Topology performance improvements: edge_data indexes

Reported by: Björn Harrtell Owned by: strk
Priority: medium Milestone:
Component: topology Version: 2.1.x
Keywords: Cc:

Description

Hi,

I've been following the effort to investigate and improve topology performance.

I've used pg_stat_statements (with track all) to get statement statistics when loading a dataset of 100000 polygons into a topology. I use TopoGeo_AddPolygon and separate transactions for each area to avoid any issues with a single long running transaction and as the significant time spent is in TopoGeo_AddPolygon anyway.

In the resulting top list of statements with most time spent at third and fourth place came the following statements which I found suspicious:

UPDATE topo.edge_data SET next_right_edge = ?, abs_next_right_edge = ? WHERE edge_id != ? AND next_right_edge = ? UPDATE topo.edge_data SET next_left_edge = ?, abs_next_left_edge = ? WHERE edge_id != ? AND next_left_edge = ?

I noted that next_right_edge and next_left_edge are not indexed in edge_data and conclude that these updates must be triggering sequential scans of edge_data. Adding indexes results in a very significant performance increase and from what I can tell a performance curve with diminishing increases in time per object loaded.

Attachments (3)

6000.png (41.8 KB ) - added by Björn Harrtell 9 years ago.
300.png (146.5 KB ) - added by Björn Harrtell 9 years ago.
50.png (53.4 KB ) - added by Björn Harrtell 9 years ago.

Download all attachments as: .zip

Change History (7)

comment:1 by Björn Harrtell, 9 years ago

by Björn Harrtell, 9 years ago

Attachment: 6000.png added

by Björn Harrtell, 9 years ago

Attachment: 300.png added

by Björn Harrtell, 9 years ago

Attachment: 50.png added

comment:2 by Björn Harrtell, 9 years ago

Added some diagrams plotting time to load, per polygon, one million polygons into a topology. The whole process of loading ~1,3 million polygons took ~24h. This is after adding the indexes as described in this ticket, without the indexes I gave up loading this many polygons.

Diagrams are named after max milliseconds displayed. Some funny patterns emerge when looking at max 6000 or 300 ms which I think is related to my input data. The max 50 ms diagram shows a logarithmic trend line clearly.

comment:3 by strk, 9 years ago

See #3000 for a precedent individuation of this issue.

comment:4 by strk, 9 years ago

Resolution: duplicate
Status: newclosed
Note: See TracTickets for help on using tickets.