Opened 4 years ago
Closed 4 years ago
Last modified 4 years ago
#982 closed defect (worksforme)
Postgis ST_ConcaveHull Runtime GEOS Problem?
|Reported by:||tsw||Owned by:|
A concave hull query with the attached data set under Postgresql 9.4 / PostGIS 2.2 / Geos 3.5.0 runs in under a second.
The same query using the same data on Postgresql 9.6 / PostGIS 2.5.2 / GEOS 3.7.1 is still running after 30 minutes.
I've also attached a perf top screen shot which to me implies this is a GEOS problem.
Change History (24)
by , 4 years ago
|Attachment:||Screenshot from 2019-08-02 11-25-34.png added|
by , 4 years ago
Data set used to generate error
comment:1 by , 4 years ago
This might be related to #867
comment:2 by , 4 years ago
|Status:||new → closed|
I'm more inclined to think this is a PostGIS issue: note the addition of a ST_Union call at https://github.com/postgis/postgis/commit/ef537b71ca46bfab44f481877b98bd59020f6acb
Feel free to reopen if I'm mistaken.
comment:3 by , 4 years ago
|Status:||closed → reopened|
PostGIS ticket: https://trac.osgeo.org/postgis/ticket/4469
Here author says he reverted this change and it made no difference for him: https://firstname.lastname@example.org/msg07593.html
I tried commenting out the line in the ST_ConcaveHull function you suggested but that doesn't make a difference. I suspect there is some underlying library that has changed.
@dbaston would be cool if there's a way to express your intention without closing tickets as "invalid", as this one is a valid one.
comment:4 by , 4 years ago
@Komzpa, would be cool if you could help out by running this query on a single version of PostGIS with two different versions of GEOS to confirm the performance difference.
comment:5 by , 4 years ago
FWIW, if I run ST_ConcaveHull(geom, 0.99) using GEOS master against PostGIS trunk, the query takes forever. If I run it using GEOS master against the ST_ConcaveHull implementation from PostGIS 2.3.10 it completes instantly.
comment:6 by , 4 years ago
I've further tested this today and calls to public.ST_UnaryUnion in ST_Concave hull don't make any difference as they are only applied to polygon geometries, and in this case, lines are the problem.
HOWEVER, the ST_UnaryUnion call made at the end of _st_concavehull added in response to issue 3638 is the source of this problem. If this is commented out the query runs instantly. Not sure if there is another way to address 3638, but this change causes the slowdown I document here.
comment:7 by , 4 years ago
So, slow part is ST_Union / GEOSUnaryUnion, unioning the attached lines and their hull posted below (almost-hull, but that's PostGIS side issue). Brokenlines may seem to require insane amount of noding, but in the actual call almost all of it can be skipped if some kind of shortcut like "drop the line if it's fully covered by polygon" is implemented.
comment:8 by , 4 years ago
So I am again inclined to close this as a PostGIS issue.
comment:9 by , 4 years ago
@dbaston PostGIS is calling GEOSUnion in natural way as part of higher level algorithm.
Single GEOSUnaryUnion call is working unexpectedly slow. This manifests as slow PostGIS. That manifestation is problem of PostGIS. The slowness of GEOSUnaryUnion is GEOS problem, will be there whether ST_ConcaveHull is rewritten or not, and this ticket has everything to reproduce it.
If GEOS is not concerned of performance of overlay ops then this ticket can be closed together with #867. I hope this is not the case.
comment:10 by , 4 years ago
@Komzpa This ticket description claims that there is a performance regression in GEOS. Is there a performance regression, or something unique about these inputs, or are you just saying that GEOS is generally not as fast as you'd like it to be?
comment:11 by , 4 years ago
In other words, should the ticket be reframed as "please optimize the case of unary union of lines with many intersections" ?
comment:12 by , 4 years ago
@dbaston this ticket does not claim there's performance regression in GEOS. It claims there's a performance issue in current PostGIS profiling of which ends up in GEOS.
Best way of handling such issues it to link (or block) them to possible fix implementation tickets and recheck whether the issue still stands after each of those. This one can be blocked on overlays-ng implementation if that's ticketed, and on in-GEOS concave hull if that might become a thing. This way original conversation is not lost, and complexity of solution space is captured.
comment:13 by , 4 years ago
But this has nothing to do with concave hull, in the end? The issue is that you'd like the unary union of many lines to be faster, no? Would that not be better captured in a ticket that leaves PostGIS, concave hull, and all other noise out of it?
comment:14 by , 4 years ago
It may be the concave hull with this data set identified an underlying issue.
As I don't understand the underlying algorithm however I'll point out a few things. If you run the test ST_Concavehull query as it currently stands in PostGIS and GEOS and buffer the lines by 10 meters, it is still never completes. I'm not sure that indicates a GEOS problem or not.
Manipulating the _st_concavehull code to remove UnaryUnion fixes the performance but then not everything is encapsulated by the hull.
I'll leave it to you folks to determine the cause and the fix, but wanted to provide as much information as possible to help diagnose the cause.
comment:15 by , 4 years ago
Thanks for the report, @tsw, and sorry for leaving you caught in the crossfire.
comment:16 by , 4 years ago
Wading in here... to my mind the use of ST_Union to cover up deficiencies in the Concave Hull algorithm seems very fishy. That should not be needed if the Concave Hull algorithm was correct.
So while this is hitting the GEOS issue with slow UnaryUnion (probably #867), it's not a direct example of that problem.
Since the Concave Hull algorithm is in PostGIS this should be ticketed there as a request for an improved algorithm.
comment:17 by , 4 years ago
Now that I've looked at the data in this test case, I can see that this dataset is absolutely a worst-case scenario for union. The reason is that there are so many intersections between the lines, it is going to produce on the order of O(n2) node points. This means the noding process has to run in slow O(n2) time. Also, memory usage is going to be very large.
So this is an example of where using Union within the Concave Hull algorithm is not a good idea.
comment:18 by , 4 years ago
(If I recall, it was something like 2.3 million intersections)
comment:19 by , 4 years ago
From input containing about 3000 line segments. So yep, O(n2).
comment:20 by , 4 years ago
Here's the stats from JTS:
A : GEOMETRYCOLLECTION - 1007 elements, 3485 pts
Result : MULTILINESTRING - 1370751 elements, 2742973 pts
Time: 72.4 s
Because there are a lot of >= 3 point linestrings, the result component count is more than the square of the input component count. It's good to see the close correlation of the length of the input and output datasets.
comment:21 by , 4 years ago
|Status:||reopened → closed|
I'm not sure what success looks like for this ticket, some comments to which seem to be asking for arbitrary operations to complete in no measurable time, but hopefully the overlay performance improvements in 3.8 will be adequate.
comment:22 by , 4 years ago
Would success look like a performance which is much closer to the old time value? Maybe not "under a second", but no more than a few seconds (hopefully).
screenshot of perf top while query is running