Opened 5 years ago
Closed 2 years ago
#962 closed defect (invalid)
GEOSNode is much slower that GEOSUnaryUnion
Reported by: | komzpa | Owned by: | |
---|---|---|---|
Priority: | major | Milestone: | 3.11.0 |
Component: | Default | Version: | 3.6.2 |
Severity: | Unassigned | Keywords: | |
Cc: |
Description
We need to node all roads in a city to feed result further into triangulation. We found that PostGIS query works ~1500s with ST_Node vs. ~200s with ST_UnaryUnion. Looking at the code, JTS uses some snaprounding noder and GEOS implementation is stub O(N2), even though snaprounding implementation is available for UnaryUnion.
Change History (10)
comment:1 by , 5 years ago
comment:2 by , 5 years ago
Here's what I think is going on. GEOSNode
uses the GEOS GeometryNoder, which in turn uses IteratedNoder. IteratedNoder
uses MCIndexNoder
, which is actually O(NlogN)-ish. But the function of IteratedNoder
is to attempt to fully node lines by repeatedly re-noding until no new nodes are created. There are situations involving almost-parallel lines where repeated noding keeps adding nodes, but also perturbing the resultant line segments so that they still cross. So this will obviously be very slow.
OTOH the GEOS UnaryUnion
uses the overlay snapping heuristic if it detects noding failure. That's probably succeeding here, and so the computation is faster.
However, in the case DB mentions the UnaryUnion is perhaps hitting the recent correctness heuristic that was introduced to prevent some other failures. It seems to be quite slow in some cases. Probably the dataset does not cause too much (perhaps none) repeated noding in GEOSNode, so that is faster.
comment:3 by , 5 years ago
It is the case that JTS provides the GeometryNoder which uses snap-rounding. That would probably be reasonably fast and should not have the noding failures (at the price of having to round all coordinates slightly). The upcoming OverlayNG should also have this same behaviour and performance. So that will be a medium-term fix for this. It will be a task to revisit the GEOS noding and make it utilize the new code.
comment:4 by , 5 years ago
One further note is that it looks like GEOS is using a original Cascaded Union approach for UnaryUnion for both polygons and lines. In fact in theory using a Cascaded approach on lines doesn't provide much performance benefit, because unlike polygons unioning lines doesn't discard many if any vertices.
JTS doesn't do this in its UnaryUnion- it just does a full union of the line with an empty geometry, which effectively forces the noding of the lines (but may throw a TopologyException if the noding can't be computed correctly due to robustness issues - this is demonstrated in #392).
I guess the GEOS approach handles noding issues better, due to the snapping heuristic in the mix. But this is something else that could be revisited when the new overlay hits the codebase.
comment:5 by , 5 years ago
Yes, the cascaded union for lines is very expensive in some cases. I put together a performance benchmark that's like the #982 case (many short lines), and removing the cascaded union improves performance by about 85%. So it might make sense to use the cascaded approach only if the standard approach throws an exception.
comment:6 by , 5 years ago
The data in #982 is absolutely a worst-case scenario for CascadedUnion, since it has very little spatial coherence, and the number of nodes is on the order of O(n2). So it's not surprising it's very slow.
Invoking CascadedUnion only on union exception would be a quick fix for this case. But I have my doubts about using Cascaded Union at all for sets of lines. I think it should be possible to determine how the Cascaded Union code is improving robustness (likely due to snapping) and simply invoke that directly. Although, the snapping heuristic is possibly low-performance itself, and so it might not produce much advantage.
Ultimately, better noding is going to be the solution. Either snap-rounding, or perhaps with further research an approach that preserves more precision where possible.
comment:7 by , 5 years ago
I think the tickets referred to in the UnaryUnion comments suggest that the cascaded union helps robustness by unioning nearby lines together (which supposedly helps because of common bit removal.) I think it's becoming off-topic for this ticket but we can discuss at https://github.com/libgeos/geos/pull/213
comment:8 by , 3 years ago
Milestone: | → 3.10.0 |
---|
comment:10 by , 2 years ago
Resolution: | → invalid |
---|---|
Status: | new → closed |
This issue feels stale and could probably use a new version over on GitHub with tests and datasets against main.
A counterexample is the test data in #982, which on my machine takes 4 minutes with ST_Node, or 3.5 hours with ST_UnaryUnion.