Opened 15 years ago

Closed 14 years ago

#399 closed task (fixed)

ST_MakeValid

Reported by: strk Owned by: strk
Priority: medium Milestone: PostGIS 2.0.0
Component: postgis Version:
Keywords: Cc: andrea.peri@…

Description

A function attempting to turn an invalid geometry to a valid one.

Change History (34)

comment:1 by kneufeld, 15 years ago

What about linear features (since validity only applies to areal features)? It's quite common for people to want to turn non-simple lines into noded simple ones.

May I suggest we have two functions to mirror IsValid and IsSimple: ST_MakeValid and ST_MakeSimple.

ST_CleanGeometry could be a wrapper for both, but ST_MakeSimple would likely have to return a geometry set

comment:2 by pramsey, 15 years ago

Similarly for area features, a figure-eight polygon can be made valid by (a) discarding one of the lobes (practical if the lobe is below the cleaning tolerance) or (b) turning it into a multipolygon. I think in general these functions will have to return MULTI* regardless of their inputs.

comment:3 by kneufeld, 15 years ago

Ah, yes, of course. But, return MULTI* or a geometry set? I think it has to a set.

The only way to make this multipolygon valid is to return two separate polygons MULTIPOLYGON(((0 0, 0 1, 1 1, 1 0, 0 0)),

((1 0, 1 1, 1 2, 2 1, 1 0)))

comment:4 by pramsey, 15 years ago

MULTIPOLYGON is allowed to touch at one point:

select isvalid('MULTIPOLYGON(((0 0, 0 1, 1 1, 1 0, 0 0)), ((1 1, 1 2, 2 2, 2 1, 1 1)))');
 isvalid 
---------
 t
(1 row)

Gets more complex as the rings start to share whole segments… in those cases the boundaries should actually be dissolved… which is what the buffer(0) case does.

comment:5 by strk, 15 years ago

Sorry to get in the middle of this debate, just wanted to add a link to a wiki page with ideas about handling the areal case: http://trac.osgeo.org/postgis/wiki/UsersWikiCleanPolygons

comment:6 by strk, 15 years ago

Now to get INTO the debate… why returning MULTI* always ? I'd rather get a SINGLE back if the input can be represented as such. The user still have the option to force to a MULTI later after all…

comment:7 by pramsey, 15 years ago

Why return MULTI? Tyranny of homogeneity. If you don't, you'll end up with a heterogeneous result set. However, a guess an argument can be made that we already return those and people wrap them in ST_Multi(). On the third hand though, if everyone is always wrapping their geometry processing calls in ST_Multi() maybe that means we're doing things wrong to start with.

in reply to:  4 comment:8 by kneufeld, 15 years ago

Replying to pramsey:

MULTIPOLYGON is allowed to touch at one point:

select isvalid('MULTIPOLYGON(((0 0, 0 1, 1 1, 1 0, 0 0)), ((1 1, 1 2, 2 2, 2 1, 1 1)))');
 isvalid 
---------
 t
(1 row)

Gets more complex as the rings start to share whole segments… in those cases the boundaries should actually be dissolved… which is what the buffer(0) case does.

Right, Mulitpolyons can touch at a point, but not on a line. You think it's best to automatically dissolve bordering polygons? Yes, it is what buffer(0) does, but buffer(0) was never intended to clean. What about having a function parameter so the can user specify dissolve or not?

comment:9 by strk, 15 years ago

If you don't want to dissolve you can always ST_dump() and take each component apart, then re-combine the ones that *really* belong to the feature (as the only reason for not wanting a dissolve here would be to split the feature)

in reply to:  6 comment:10 by kneufeld, 15 years ago

Replying to strk:

Now to get INTO the debate… why returning MULTI* always ? I'd rather get a SINGLE back if the input can be represented as such. The user still have the option to force to a MULTI later after all…

I agree, a SINGLE back would be best if the input can be represented as such. I was just thinking of corner cases … multipoints, multipolygons, geometrycollections.

I suppose always returning a MULTI works if bordering polygons are always dissolved in multipolygons, and the same input (or null, or error) is returned for multipoints.

Yeah, ok, you've convinced me. Collections are definitely easier to work with than sets.

comment:11 by strk, 15 years ago

Status: newassigned

Here's an interesting case for you guys to play with:

POLYGON((10 10,10 20,30 20,30 10,20 10,20 30,30 30,30 20,5 20,5 25,10 25, 10 10))

If the goal of a ST_CleanGeometry is to maintain as much information from the input but make it into a valid shape, the above would sound to me as becoming a mix of two polygons and two lines.

comment:12 by kneufeld, 15 years ago

Sure. Even the case of a polygon whose inner ring shares the exterior ring boundary line, or one where a portion of the polygon collapses to a line.

We'd probably want to return GeometryCollections of polygons and lines in these cases, rather than arbitrarily throwing away data.

comment:13 by pramsey, 15 years ago

I disagree. The key piece of information is 'POLYGON'. The expectation is that feature is a polygon. Therefore anything in the lower dimensions (linear parts, point parts) is fair game to be discarded. As you add tolerances, even actual two dimensional parts that are below tolerance are fair game to be discarded.

comment:14 by strk, 15 years ago

about "polygon expectation", this reminds me that adding typed-empty support might be a good idea (or can only return GEOMETRYCOLLECTION currently for a collapsed POLYGON)

comment:15 by strk, 15 years ago

Alright, here is my client suggestion for the ST_GeometryClean contract:

  • Returned geometry will have same dimensionality of input
  • No vertices from original geometry will be discarded
  • The returned geometry will be valid

Failing any of the above should throw an exception. Adding vertices is fine.

comment:16 by kneufeld, 15 years ago

This implies that a polygon where a portion collapses to a line returns a GEOMETRYCOLLECTION of a polygon and line (resultant dimension = 2). (if we observe bullet two)

Also, a polygon where the whole thing collapses to a line returns …? NULL? An exception that breaks the current transaction block? A POLYGON EMPTY?

comment:17 by pramsey, 15 years ago

I think the contract needs to include a tolerance and a willingness to discard things under tolerance. I'm thinking particularly about small loops in rings that can be safely discarded without losing the intent of the ring.

I think NULL or EMPTY is the answer if something collapses below the input dimensionality.

comment:18 by strk, 15 years ago

We agree on NULL as the implementation for "exceptions". We don't want real exceptions to abort transactions.

No tolerance Paul. This is really more an ST_makeValid then a ST_cleanGeometry (we might actually be doing what Kevin suggested in first comment).

Kevin, about GEOMETRYCOLLECTION you're right, I haven't thought about dimensionality of COLLECTIONs being given by max dimensionality in collection. But in this case they would like to avoid collections. If a collection is required to express all input, it'll be considered an impossible case to resolve (thus NULL).

I wonder if I'll end up implementing with just 'return NULL' :P

comment:19 by pramsey, 15 years ago

It is possible to run a tolerance-enabled process with a tolerance of 0 :)

comment:20 by kneufeld, 15 years ago

It does sound like we're talking about requirements and expectations for different functions.

Paul, I do like your idea of a tolerance. I think this is common use case - users wanting to remove jitter from their geometries. But how does one define jitter based on tolerance? Jitter could be removed by running through Douglas-Peuker, except that it may not be desirable to also simplify the geometry. So, having something that removes that small loop or collapsed portion of a polygon is good. This sounds like it could be a wrapper for Sandro's proposed contract that never drops a vertex - a function that compares the input and resultant geometries, dropping all of lower dimension and proportional size.

Sandro, if the functional contract is to never drop a vertex (which I think is a good idea) *and* never returns a collection, I think you're right, you'll be returning null an awful lot. Nevertheless, this too could be a simple sql wrapper function. IF geometrytype(resultant_geom) = 'GEOMETRYCOLLECTION' return null;

Personally, I like the proposed contract. Same dimension, no dropped vertices, is_valid, (and maybe is_simple) - a function similar to buffer(0) that deconstructs all input geometries and rebuilds as best as it can. Non-simple points and lines are made simple, non-valid polygons are made valid, collections are created where necessary. Wrappers (or parameters I suppose) can then be used to filter the resultset. My 2 cents.

comment:21 by strk, 15 years ago

A minor adjustment to contract statement 2:

  • No *distinct* vertices from original geometry will be discarded (consecutive equal vertices removal is expected)

comment:22 by strk, 15 years ago

Alright. Something to see is in r5240.

It is an ST_MakeValid() function, taking an optional boolean parameter saying if dimensionally collapsed parts should also be collected in output. You can take a look at the new regression test for example of cases handled by this one.

Collapsed parts are basically only portion of rings which go up and down the same path and do not fall inside the containing shell. They are also referred to as "cut edges" from polygonize operation. The function currently doesn't collect dimensional collapsed rings which are internal to the shell as that part is considered already included in the topological point set found in output.

In the same revision there's a stub for an ST_CleanGeometry() basically wrapping ST_MakeValid and performing additional cleanups (forceRHR, removeRepeatedPoint) and checks on output (same type, no discarded points). This is just a stub, not completed yet (and no regress test consequently).

Early tests and comments welcome.

comment:23 by strk, 15 years ago

So, after leaks plug and some changes in interface, here's a summary about current ST_MakeValid:

  1. If input is already valid, it is returned untouched
  2. If it is not, a correction attempt will take place
  3. If correction strategy can't be found (unsupported invalidity or type), NULL is returned.
  4. The return is guaranteed to be valid or null
  5. The return is likely to "look-like" what you get by rendering the invalid one

Alright, point 5 has little of scientific here. The requirement was worded as "maintains the input profile". A kind of topological approach might be saying: "represents all interior and boundary of input geometry". Only problem is that being the input invalid what's interior and what exterior is really an interpretation.

Important issue:

  • Point 5 above implies that a dimensionally collapsed inner ring of a polygon will not necessarely need to be returned as a LINESTRING or POINT component of a collection. This is because the whole ring is already represented as being in the interior of the returned geometry. You can see this is different from ensuring "no input vertexes will be discarded".

I'm currently handling the "check input vertices" thing in the ST_GeometryClean wrapper, but maybe it's worth to move it down to ST_MakeValid.

comment:24 by strk, 15 years ago

Just to add an use case for the CleanGeometry (this one yes wants a tolerance). I'm working with statistical regions and building their boundaries by ST_Union'ing the boundaries of the countries those regions are composed by.

The resulting union is full of very small gaps near the original country boundaries. I'm cleaning the result by hand (qgis) but would rather run an ST_CleanGeometry. The tolerance in this case might be a factor of the total area which would drop any internal ring smaller than that factor. Need to think more about consequences though …

comment:25 by strk, 15 years ago

So, as of r5281 the ST_MakeValid function does a good job at retaining all input vertices. It does so at the cost of putting some lines in a collection togheter with polygons. But also, it tries its best to reduce the amount of lines stuffed in by dropping those that have all vertices already represented as nodes on the areal part boundary.

You still get lines which fall *inside* the areal part, if they do have vertices which are not on the boundary.

Now I guess a ST_GeometryClean might try to do something with those lines. For example, if tolerated (how to express?) it could just drop those collapsed portions of rings. Or it could try to make a very small buffer on them.

comment:26 by strk, 15 years ago

Cc: andrea.peri@… added

Behaviour changed again. Now even lines which have all vertices already represented as nodes of the original boundary are kept, as long as they are not *part* of the original boundary. Maximum effort is done at keeping all the boundaries. This means dropping areas as required to keep all lines (think two overlapping polygons with the overlap part becoming an hole of the output).

I haven't checked carefully but I think vertices of lines which do not add significant informations to the line itself (collinear) are not necessarely maintained. It is left to the caller to check if all vertices are still represented.

I guess it's time to work on documentation and testing.

comment:27 by strk, 15 years ago

Summary: ST_CleanGeometryST_MakeValid

Changed the summary as this one really became the ST_MakeValid. Will open a new one for ST_MakeSimple.

comment:28 by strk, 15 years ago

See #455 for ST_MakeSimple

comment:29 by strk, 15 years ago

Now, what would you think about buffering the collapsed parts of the original boundary and rebuild an area taking the new boundaries into account ? Would improve maintaining the original geometry type and still drop no vertices.

comment:30 by strk, 15 years ago

ofc I'm talking about a real single-sided buffer (not the offset-curve thing)

comment:31 by strk, 15 years ago

forget about the single-sided thing, must eventually be put in an ST_CleanGeometry function. I think ST_MakeValid is good as is.

comment:32 by strk, 15 years ago

Ok, time for some testing here. I've documented the current interface and functionality in the manual.

comment:33 by mwtoews, 14 years ago

If you are looking for test-cases for input and expected output, add this: http://trac.osgeo.org/geos/ticket/380

comment:34 by strk, 14 years ago

Resolution: fixed
Status: assignedclosed

This task is completed.

Note: See TracTickets for help on using tickets.