Opened 12 years ago

Closed 12 years ago

Last modified 12 years ago

#1806 closed defect (fixed)

Extremely slow and CPU-intensive ST_MakeValid (ST_BuildArea) case

Reported by: strk Owned by: strk
Priority: high Milestone: PostGIS 2.0.1
Component: postgis Version: 2.0.x
Keywords: Cc:

Description (last modified by strk)

The attached shapefile contains a single geometry. A multipolygon composed by a single polygon containing 4213 rings with a total of 82304 vertices.

Passing it as an argument to ST_MakeValid takes forever.

Attachments (1)

slowmakevalid.zip (836.3 KB ) - added by strk 12 years ago.
shapefile with the single offending geometry

Download all attachments as: .zip

Change History (24)

comment:1 by strk, 12 years ago

How does it look in qgis:

No image "offendingline.png" attached to Ticket #1806

comment:2 by strk, 12 years ago

It looks like the procedure is stuck within LWGEOM_GEOS_buildArea

comment:3 by strk, 12 years ago

Summary: Extremely slow and CPU-intensive ST_MakeValid caseExtremely slow and CPU-intensive ST_MakeValid (ST_BuildArea) case

Yep, confirmed you can hook things forever by calling ST_BuildArea(ST_Boundary(geom))

comment:4 by strk, 12 years ago

Owner: changed from pramsey to strk
Status: newassigned

Also memory usage keeps increasing.

Plan: rewrite BuildArea to avoid all those SymDifference calls and simply skip polygons fully contained in other polygons (assuming those ones are holes of the other). Should work pretty good and fast…

comment:5 by strk, 12 years ago

polygonize finds 3536 polygons, so ST_BuildArea is stuck computing the symdifference between each of them and the aggregated result. While it goes on the time it takes to compute the symdifference grows. It starts with roughly 20 items per second. At iteration 1700 computes ~5 symdifference per second… Every iteration has to rebuild the topology which was already built in the previous iteration.

by strk, 12 years ago

Attachment: slowmakevalid.zip added

shapefile with the single offending geometry

comment:6 by strk, 12 years ago

Step back: I was completely wrong about how the geometry was rendered. At least, I can't handle to render it that way again. The information about composition is correct.

comment:7 by strk, 12 years ago

Description: modified (diff)

comment:8 by strk, 12 years ago

Another observation: the output of ST_BuildArea is plain _wrong_ in this case ! I'll work on reducing the case to address correctness first.

comment:9 by strk, 12 years ago

Another false alarm: ST_BuildArea was just wrong because I was passing it non-noded input. ST_MakeValid performs the noding step itself.

comment:10 by strk, 12 years ago

r9729 sorts polygonized output items by number of points, slighly improving performances. A reduction of the offending input, with number of holes cut from 4000+ to only 500 showed a speedup from 12 seconds to 9 seconds. Not such a big improvement.

Another thing to look at is short-circuits in GEOSSymDifference (none seen as of 3.3.3)

comment:12 by strk, 12 years ago

After speeding up GEOS more, this case didn't gain that much speed. It's obviously a killer to perform 4000+ overlay operations… need a different approach!

comment:13 by strk, 12 years ago

Alright I've an implementation speeding a reduced case up by a factor of 100. The reduced case gets "only" 418 polygons from polygonize. I'm testing more and will finally commit.

comment:14 by strk, 12 years ago

Resolution: fixed
Status: assignedclosed

Alright, with r9731 the testcase completes ST_MakeValid in 27 seconds. Before that commit the only BuildArea phase (after noding the boundaries) would have taken 27 _minutes_. I'm pretty satisfied.

comment:15 by hugoledoux, 12 years ago

We have designed and implemented a different algorithm to automatically repair polygons, it avoids the 4000+ overlay operations mentioned for this polygon (behaviour is not quadratic like ST_MakeValid). The code is not in PostGIS though, but can be obtained there: http://tudelft-gist.github.com/prepair/ For this polygon, it takes ~3s on my laptop.

comment:16 by strk, 12 years ago

@hugoledoux: thank you very much for the pointer, looking forward to see that code get into PostGIS!

Meanwhile I shall point out that as of r9731 the complexity of ST_MakeValid is not quadratic anymore, doing a single overlay during the BuildArea phase.

The current hot-spot for performance impromvements in ST_MakeValid are the polygonizer (finding 5561 faces in about ~15 seconds) and a Difference call used to find which edges are missing from the output of BuildArea (finding no boundary missing in ~12 seconds). Plus there are ~3 seconds used to link 5561 polygonized faces to their ancestors (finding which face is an hole of which other face, which can be probably improved).

in reply to:  15 comment:17 by aperi2007, 12 years ago

Replying to hugoledoux:

We have designed and implemented a different algorithm to automatically repair polygons, it avoids the 4000+ overlay operations mentioned for this polygon (behaviour is not quadratic like ST_MakeValid). The code is not in PostGIS though, but can be obtained there: http://tudelft-gist.github.com/prepair/ For this polygon, it takes ~3s on my laptop.

Hi, I see the sample in your link: http://tudelft-gist.github.com/prepair/

I don't understand why the starting polygon is invalid. However starting from a hypotetical invalid polygon. I guess it's solution should don't break it's typology if possible.

So if the starting geometry is a polygon with a hole, the result after the makevalid should still be (if possible) a polygon with a hole it should try to don't be another kind of geometry (a multipolygon) neither it should be an increase on the record number so it should not return two polygons (2 records) to substitute one polygon (one record).

Otherwise the MakeValid could not be usable really because it need to see one by one all the change to do to every geometry before apply it to the starting table.

comment:18 by hugoledoux, 12 years ago

@aperi2007: the polygon is actually invalid acc. to the OGC rules because the inner ring (grey ring) separate the outer ring into 2 separate polygons. The rules of the OGC state, among others, that the inteterior of a polygon must be connected. The only solution is to split the polygon into 2 polygons (which means that the inner ring disappears, indeed).

ST_MakeValid does not return per se a POLYGON, but a GEOMETRYCOLLECTION which can contain a MULTIPOLYGON, to cope with these situations.

comment:19 by aperi2007, 12 years ago

@aperi2007: the polygon is actually invalid acc. to the OGC rules because the inner ring (grey ring) separate the outer ring into 2 separate polygons. The rules of the OGC state, among others, that the inteterior of a polygon must be connected.

Sorry, I forgot the connected rule. The polygon is invalid.

The only solution is to split the polygon into 2 polygons (which means that the >inner ring disappears, indeed).

Of course this is the only valid solution.

ST_MakeValid does not return per se a POLYGON, but a GEOMETRYCOLLECTION which can >contain a MULTIPOLYGON, to cope with these situations.

Yes , the ST_MakeValid must try to maintain the original type when this is possible. In this sample this is not possible , so it will return a more complex type.

Just as puntualize question: the st_makevalid never return a set of records, it return always one only record. If the geometry invalid is more complex, the MakeValid will produce an unique geometrycollection with as many polygon or multipolygons and/or lines and/or points to cover all the vertexs of the original geometry. The important is that is wll return always one only record. Is this the strategy you follow also ?

comment:20 by hugoledoux, 12 years ago

By design we chose not to return "unused" geometries. So if an inner rings collapses to a line we do not return it to the user.

We always return one MULTIPOLYGON; if the input polygon is only a line for instance then we return an empty MULTIPOLYGON.

comment:21 by aperi2007, 12 years ago

One of the main goal of ST_MakeValid is to do not lost any vertex. This is a strong choice of cocurse, but is the only strategy to allow an user to have all the vertex in a tabel and allow a posst processing phase to repair better the geometries. Infact in a geometry like this:

 -----------------
|                 |
|     -------     |
|    |       |    |
|     -------     |
|                 |
 -----------------

For some impredictable situations could became like this:

 -----------------
|                 |
|                 |
|    --------     |
|                 |
|                 |
 -----------------

That is surely an invalid Polygon.

A strategy to transform this in a collection with this two parts:

 -----------------
|                 |
|                 |
|                 |  +    --------
|                 |
|                 |
 -----------------

Will allow to load this two geometri parts in two tables and working because this are both two valid geometries.

After this, a power user could surely analyze the two parts (polygons and lines ) and rearrange the lines restoring the original size (the hole in the polygon).

Instead a strategy where the line is lost returning only the external polygon like this:

 -----------------
|                 |
|                 |
|                 | 
|                 |
|                 |
 -----------------

Will deny any possibility to restoring the original hole.

A strategy like this don't allow to use the DB as a startup for an editing of any kind of geometry.

comment:22 by hugoledoux, 12 years ago

I agree that keeping all the geometries can be useful, but we decided to have an *automatic* method where the user presses one button and gets a valid geometry. Our method is predictable since it can be explained in 3 simple steps, so the user knows what to expect.

I believe that if the user wants to manually edit a polygon (with a GUI for instance), she can always go back to the original geometry.

Our design choice makes our algorithm very fast and scalable to very large polygon, modifying it to return all the geometries would make it use more memory and would slow it down…

comment:23 by aperi2007, 12 years ago

I believe that if the user wants to manually edit a polygon (with a GUI for instance), she can always go back to the original geometry.

This is true only in theory.

If you have an invalid geometry and you would edit it. You need to transform it in a valid geometry otherwise every operation will end with an exception.

So , you need to transform the geometry in a valid geometry to edit it with successfull. But if the ST_MakeValid lost some vertex you cannot never recreate the really geometry but only a more simply not always really true geometry.

Think for example at this sample:

 -----------------
|                 |
|     -------     |
|    |       |    |
|     -------     |
|                 |
 -----------------

Ad a garden with an hous in the middle. :)

You you see instead this:

 -----------------
|                 |
|                 |
|                 | 
|                 |
|                 |
 -----------------

You see a gently lone and isolated garden. who say you that there was originally an house in the middle ? :)

So ok, your algorithm is more rapid, but rapid is not always a giood strategy if it mean lost something that could has a cost to retrieve.

Instead if you see

 -----------------
|                 |
|                 |
|                 |  +    --------
|                 |
|                 |
 -----------------

You can understand that perhaps there was something in the middle.

However . I understand this is a question that has different point of view.

If I like to use the system only for low scales, perhaps lost something is not so important.

Ok, but this mean that postgis is not so important when an user need to do in the bigger details. And so it is a gently system only for rapid and simply internet presentation. Not for big and sophisticated elaboration of data.

Who need load something geometry on postgis before It need to work that geometry with something else product to repair , with some method that give me the real geometry don't a simplification of reality. (Perhaps oracle ?, Perhaps ArcGIS ?, perhaps AV3 ?. I don't know)

But why the user cannot user postgis ?

Response: "because postgis like to be the faster." And never is more fast than "throw away the vertex". :)

Note: See TracTickets for help on using tickets.