#2083 closed enhancement (fixed)
ST_RemEdgeModFace performance review
Reported by: | strk | Owned by: | strk |
---|---|---|---|
Priority: | medium | Milestone: | PostGIS 3.3.0 |
Component: | topology | Version: | 2.0.x |
Keywords: | performance | Cc: |
Description
ST_RemEdgeModFace queries the relation table to see if a face can be destroyed without affecting existing TopoGeometry objects. The query does not find any usable index because the only index we have is on all the columns and that query isn't using a match on all the column.
We could do two things:
- See if it's possible to rewrite the query to make use of the index
- Add an additional index on the table
Change History (27)
comment:1 by , 12 years ago
comment:2 by , 12 years ago
I guess is better to add another index. However if the choos is to rewrite the query this is the opportunity to remove the constraint to remove other face only if this not destroyed other objects but support instead the inherit the change to other objects.
comment:3 by , 12 years ago
Supporting changes to higher TopoGeometry objects would require a more complex set of parameters because there are different things you may want to do. In particular, assuming there are only two non-overlapping TopoGeometry objects separated by the edge you're about to remove you may want to either destroy one of the two TopoGeom and make the other take the whole space (which one, in that case?) or you may want both TopoGeoms to take the whole space , or you may want to destroy both.
This ticket is for improving the performance, if you have ideas on semantic changes it is better to discuss them elsewhere.
I do feel the limit imposed by TopoGeometries, but I consider it a safeguard. What's really needed now, IMHO, are easy TopoGeometry editing function (addElement/dropElement).
comment:4 by , 11 years ago
Milestone: | PostGIS 2.1.0 → PostGIS 2.2.0 |
---|
comment:5 by , 9 years ago
Another function that would benefit from an additional index to the relation table is the ST_ModEdgeSplit one, as reported on the mailing list by a user: http://lists.osgeo.org/pipermail/postgis-users/2015-March/040326.html
comment:6 by , 9 years ago
Hi
After adding the index topo_ar5_sysdata.relation(abs(element_id)) the performance is much better for a single select the time goes from more than 3000 ms to around 20-40 ms. In the layer we are testing the topo_ar5_sysdata.relation which contains 27794698 rows. For the end user the update time goes down from 30 seconds to about 1 second for a user that that is drawing simple polygon to update this layer.
Her is the explain with the functional index added
explain analyze SELECT r.* FROM topo_ar5_sysdata.relation r, topology.layer l WHERE l.topology_id = 21 AND l.level = 0 AND l.layer_id = r.layer_id AND abs(r.element_id) = 30145 AND r.element_type = 2 ;
QUERY PLAN
Hash Join (cost=1450.51..64998.02 rows=100884 width=16) (actual time=0.061..0.066 rows=1 loops=1)
Hash Cond: (r.layer_id = l.layer_id) → Bitmap Heap Scan on relation r (cost=1449.96..63610.31 rows=100884 width=16) (actual time=0.027..0.032 rows=1 loops=1)
Recheck Cond: (abs(element_id) = 30145) Filter: (element_type = 2) Rows Removed by Filter: 1 → Bitmap Index Scan on opo_ar5_sysdata_relation_abs_element_id_idx (cost=0.00..1424.74 rows=138973 width=0) (actual time=0.019..0.019 rows=2 loops=1)
Index Cond: (abs(element_id) = 30145)
→ Hash (cost=0.53..0.53 rows=2 width=4) (actual time=0.023..0.023 rows=2 loops=1)
Buckets: 1024 Batches: 1 Memory Usage: 1kB → Seq Scan on layer l (cost=0.00..0.53 rows=2 width=4) (actual time=0.016..0.019 rows=2 loops=1)
Filter: ((topology_id = 21) AND (level = 0))
Total runtime: 0.126 ms
(13 rows)
To avoid extra index I may also use a or, but that only works a long as most of the element_id > 0. I have not seen any rows with a value below zero yet.
' AND abs(r.element_id) = e.edge_id'; |
—-
' AND ((r.element_id >-1 AND r.element_id = e.edge_id) OR ( r.element_id < 0 AND abs(r.element_id) = e.edge_id))';
I go for the functional index now and that may also cover other cases that I have not seen in the logs yet.
comment:7 by , 9 years ago
If by "extra index" you mean the only one added to the current one, you cannot save that. Queries exist that just check for the existance of any TopoGeometry using a specific element_id, so using OR would not help there.
comment:8 by , 9 years ago
Milestone: | PostGIS 2.2.0 → PostGIS Future |
---|
comment:10 by , 3 years ago
Keywords: | performance added |
---|
comment:11 by , 2 years ago
In a case I'm looking at I confirm that adding an index on abs(relation.elem_id) improves the speed. In my case from 64 seconds to 13 seconds to remove 7207 unused edges.
Related ticket: #4887
comment:12 by , 2 years ago
I'm thinking having a function to remove multiple edges at once could be useful, in general, to reduce the cost of such operation. Such function could use a single query to verify removing the edges would not represent a problem with TopoGeometries and eventually could update all edge linking and side faces at once.
comment:13 by , 2 years ago
I'm digging further in this, executing each query executed internally by ST_RemEdgeModFace. The first query looks like this:
select r.topogeo_id, r.layer_id, l.schema_name, l.table_name, l.feature_column FROM topology.layer l INNER JOIN <TOPONAME>.relation r ON (l.layer_id = r.layer_id) WHERE l.level = 0 AND l.feature_type IN ( 2, 4 ) AND l.topology_id = :topology_id AND r.element_type = 2 AND abs(r.element_id) = :edge_id;
The conditions used on the relation table are: layer_id, element_type, element_id.
The index is created with fields: layer_id, topogeo_id, element_type, element_id — this makes literal (query) values only usable in the order given, so since we do NOT have a 'topogeo_id' literal, the only usable literal is 'layer_id', which in the case I'm looking at matches *all* records, thus the planner decides to use a sequential scan.
If the unique index was created with reordered fields (layer_id, element_type, element_id, topogeo_id) it would then be possibly usable for filtering on element_type, and if we changed the abs(r.element_id) to an r.element_id in (10, -10) we'd have the possibility to also make juse of the element_id in the filter. But reordering might slow down other uses (fetching elements of a known topogeometry) so this is not the best way to proceed.
Rewriting this query is thus NOT a viable solution.
A new index could serve queries searching for TopoGeometry objects using a given element of the topology. Elements of a topology are always identified by the tuple ( layer_id, element_type, element_id ) because the *same* element_id may have different meanings based on element_type and layer_id. Among the three fields, element_id is probably always the most selective.
The next query, only performed if the edge to be removed has left_face != right_face, looks like this:
SELECT t.* FROM ( SELECT r.topogeo_id, r.layer_id, l.schema_name, l.table_name, l.feature_column, array_agg(r.element_id) as elems FROM topology.layer l INNER JOIN "${TOPO}".relation r ON (l.layer_id = r.layer_id) WHERE l.level = 0 and l.feature_type IN (3, 4) AND l.topology_id = id(findTopology('${TOPO}')) AND r.element_type = 3 AND r.element_id = ANY (ARRAY[${LEFT_FACE}, ${RIGHT_FACE}]::int4[]) GROUP by r.topogeo_id, r.layer_id, l.schema_name, l.table_name, l.feature_column ) t WHERE NOT t.elems @> ARRAY[${LEFT_FACE}, ${RIGHT_FACE}]::int4[];
The conditions used on the table are: element_type and element_id, no layer_id in this case as we're looking for faces in any non-hierarchical layer.
comment:15 by , 2 years ago
The next slow part is updating nodes containing in face, which is suffering from lack of index on node.containing face (see #2861)
comment:17 by , 2 years ago
Here is the list of all index I added when running resolve overlap (I will remove code when as upgrade to new Postgis versions)
EXECUTE Format('CREATE INDEX ON %s.relation(layer_id)', (_topology_info).topology_name); EXECUTE Format('CREATE INDEX ON %s.relation(abs(element_id))', (_topology_info).topology_name); EXECUTE Format('CREATE INDEX ON %s.relation(element_id)', (_topology_info).topology_name); EXECUTE Format('CREATE INDEX ON %s.relation(topogeo_id)', (_topology_info).topology_name); EXECUTE Format('CREATE INDEX ON %s.edge_data USING GIST (geom)', (_topology_info).topology_name); EXECUTE Format('CREATE INDEX ON %s.edge_data(abs_next_left_edge)', (_topology_info).topology_name); EXECUTE Format('CREATE INDEX ON %s.edge_data(abs_next_right_edge)', (_topology_info).topology_name); EXECUTE Format('CREATE INDEX ON %s.node(containing_face)', (_topology_info).topology_name);
comment:18 by , 2 years ago
comment:19 by , 2 years ago
Resolution: | fixed |
---|---|
Status: | closed → reopened |
relation(layer_id) should not be needed because you usually don't have those many layers and anyway the btree (layer_id, topogeo_id, element_id, element_type) index would be already used when a query using layer_id is issued.
The abs(element_id) I removed the need for in [854103fa2e087b53485c8cf403a1fcb952146be2/git]
relation(topogeo_id) would also be covered by the existing btree on all members, as topogeo_id is second member after layer_id, and you always search for topogeo_id togheter with layer_id.
The edge_data(geom) is already created.
The node(containing_face) I pushed already with [e4495fb792fe31a87a84e92297d276baf254007a/git]
The only left over are the indices on abs_next_{left,right}_edge which I guess would be useful upon deleting edges, to re-link the edges pointing to the one being removed. I'm reopening the ticket to research upon this one.
comment:20 by , 2 years ago
Confirmed: those triggers are deferred so you don't see the cost of firing them until you COMMIT your transaction or force firing them (set constraint immediate). The cost of those triggers is very very high, especially during mass removal of edges.
Given we have a good ValidateTopology I wonder if we should completely drop those foreign keys (and other constraints as well).
Here are some times (seconds elapsed since start of process). The operation is conducted against a topology having
Without the index on the abs_* attributes:
0 psql:clearonly.sql:50: NOTICE: Removing unused 0.590315 psql:clearonly.sql:50: NOTICE: Removed 178 0.590379 psql:clearonly.sql:50: NOTICE: Removing isolated 0.622091 psql:clearonly.sql:50: NOTICE: Removed 127 0.622126 psql:clearonly.sql:50: NOTICE: Removing unneeded 0.624795 psql:clearonly.sql:50: NOTICE: Removed 0 0.624816 305 0.624918 Firing next_left_edge_exists 249.487 SET CONSTRAINTS 249.487 Firing next_right_edge_exists 508.951 SET CONSTRAINTS
We're basically talking about under one second to remove 178 edges and 127 nodes (which are basically merging of 127 pairs of edges, mostly. And over 8 MINUTES to check the foreign keys.
Adding those indexes give instead these numbers:
0 psql:clearonly.sql:53: NOTICE: Removing unused 0.795837 psql:clearonly.sql:53: NOTICE: Removed 178 0.795877 psql:clearonly.sql:53: NOTICE: Removing isolated 0.827427 psql:clearonly.sql:53: NOTICE: Removed 127 0.827465 psql:clearonly.sql:53: NOTICE: Removing unneeded 0.829889 psql:clearonly.sql:53: NOTICE: Removed 0 0.829908 305 0.830004 Firing next_left_edge_exists 0.843845 SET CONSTRAINTS 0.843973 Firing next_right_edge_exists 0.859349 SET CONSTRAINTS
Which means foreign keys check goes down from 8 minutes to 0.3 seconds. Worth adding those indices I'd say.
comment:21 by , 2 years ago
Very good, but could that indicate a missing index ?
That has been problem in some other cases I have had this problem I think.
comment:22 by , 2 years ago
Yes, as you said, the index on the "abs" version of next left/right edge was missing. Will be added with https://gitlab.com/postgis/postgis/-/merge_requests/78
comment:23 by , 2 years ago
That missing index ONLY affects edge DELETION. Updates are not affected by lack of it (but I guess can be slightly slower as they need to also update the additional index)
comment:24 by , 2 years ago
Not sure but I trust constraints more than function that maybe not be called .
Can it be the way the constraints are defined ?
comment:26 by , 2 years ago
Not sure but I trust constraints more than function that maybe not be called .
Well, we know the current constraints are NOT covering all cases of invalid topologies, so they may be just giving a false sense of security. Making them cover all cases would probably make them prohibitively slow, but it could be an idea to experiment with (better discuss on the mailing list, this ticket is not the right place).
comment:27 by , 2 years ago
Milestone: | PostGIS Fund Me → PostGIS 3.3.0 |
---|
Another thing to do could be moving the check for TopoGeometry sanity before the update calls, as it should be much faster and it's useful to be able to just try to drop an edge and see if it succeeded or not based on presence of a TopoGeometry..