Opened 5 years ago

Closed 2 years ago

#962 closed defect (invalid)

GEOSNode is much slower that GEOSUnaryUnion

Reported by: komzpa Owned by: geos-devel@…
Priority: major Milestone: 3.11.0
Component: Default Version: 3.6.2
Severity: Unassigned Keywords:
Cc:

Description

We need to node all roads in a city to feed result further into triangulation. We found that PostGIS query works ~1500s with ST_Node vs. ~200s with ST_UnaryUnion. Looking at the code, JTS uses some snaprounding noder and GEOS implementation is stub O(N2), even though snaprounding implementation is available for UnaryUnion.

Change History (10)

comment:1 by dbaston, 5 years ago

A counterexample is the test data in #982, which on my machine takes 4 minutes with ST_Node, or 3.5 hours with ST_UnaryUnion.

comment:2 by mdavis, 5 years ago

Here's what I think is going on. GEOSNode uses the GEOS GeometryNoder, which in turn uses IteratedNoder. IteratedNoder uses MCIndexNoder, which is actually O(NlogN)-ish. But the function of IteratedNoder is to attempt to fully node lines by repeatedly re-noding until no new nodes are created. There are situations involving almost-parallel lines where repeated noding keeps adding nodes, but also perturbing the resultant line segments so that they still cross. So this will obviously be very slow.

OTOH the GEOS UnaryUnion uses the overlay snapping heuristic if it detects noding failure. That's probably succeeding here, and so the computation is faster.

However, in the case DB mentions the UnaryUnion is perhaps hitting the recent correctness heuristic that was introduced to prevent some other failures. It seems to be quite slow in some cases. Probably the dataset does not cause too much (perhaps none) repeated noding in GEOSNode, so that is faster.

comment:3 by mdavis, 5 years ago

It is the case that JTS provides the GeometryNoder which uses snap-rounding. That would probably be reasonably fast and should not have the noding failures (at the price of having to round all coordinates slightly). The upcoming OverlayNG should also have this same behaviour and performance. So that will be a medium-term fix for this. It will be a task to revisit the GEOS noding and make it utilize the new code.

comment:4 by mdavis, 5 years ago

One further note is that it looks like GEOS is using a original Cascaded Union approach for UnaryUnion for both polygons and lines. In fact in theory using a Cascaded approach on lines doesn't provide much performance benefit, because unlike polygons unioning lines doesn't discard many if any vertices.

JTS doesn't do this in its UnaryUnion- it just does a full union of the line with an empty geometry, which effectively forces the noding of the lines (but may throw a TopologyException if the noding can't be computed correctly due to robustness issues - this is demonstrated in #392).

I guess the GEOS approach handles noding issues better, due to the snapping heuristic in the mix. But this is something else that could be revisited when the new overlay hits the codebase.

comment:5 by dbaston, 5 years ago

Yes, the cascaded union for lines is very expensive in some cases. I put together a performance benchmark that's like the #982 case (many short lines), and removing the cascaded union improves performance by about 85%. So it might make sense to use the cascaded approach only if the standard approach throws an exception.

comment:6 by mdavis, 5 years ago

The data in #982 is absolutely a worst-case scenario for CascadedUnion, since it has very little spatial coherence, and the number of nodes is on the order of O(n2). So it's not surprising it's very slow.

Invoking CascadedUnion only on union exception would be a quick fix for this case. But I have my doubts about using Cascaded Union at all for sets of lines. I think it should be possible to determine how the Cascaded Union code is improving robustness (likely due to snapping) and simply invoke that directly. Although, the snapping heuristic is possibly low-performance itself, and so it might not produce much advantage.

Ultimately, better noding is going to be the solution. Either snap-rounding, or perhaps with further research an approach that preserves more precision where possible.

comment:7 by dbaston, 5 years ago

I think the tickets referred to in the UnaryUnion comments suggest that the cascaded union helps robustness by unioning nearby lines together (which supposedly helps because of common bit removal.) I think it's becoming off-topic for this ticket but we can discuss at https://github.com/libgeos/geos/pull/213

comment:8 by pramsey, 3 years ago

Milestone: 3.10.0

comment:9 by robe, 3 years ago

Milestone: 3.10.03.11.0

Retargeting in prep for GEOS 3.10.0 release

comment:10 by pramsey, 2 years ago

Resolution: invalid
Status: newclosed

This issue feels stale and could probably use a new version over on GitHub with tests and datasets against main.

Note: See TracTickets for help on using tickets.