Opened 12 years ago

Closed 11 years ago

#1987 closed defect (fixed)

Results given by ST_Simplify inconsistent between PostGIS 1.5.2 and 2.0.1

Reported by: ChrisInCambo Owned by: pramsey
Priority: blocker Milestone: PostGIS 2.0.3
Component: postgis Version: 2.0.x
Keywords: history Cc:

Description

I've recently upgraded to 2.0.1 and have noticed some inconsistencies between the both version in the behaviour of ST_Simplify.

If you run the statement from the docs:

SELECT ST_Npoints(the_geom) As np_before, ST_NPoints(ST_Simplify(the_geom,0.1)) As np01_notbadcircle, ST_NPoints(ST_Simplify(the_geom,0.5)) As np05_notquitecircle, ST_NPoints(ST_Simplify(the_geom,1)) As np1_octagon, ST_NPoints(ST_Simplify(the_geom,10)) As np10_triangle, (ST_Simplify(the_geom,100) is null) As np100_geometrygoesaway FROM (SELECT ST_Buffer('POINT(1 3)', 10,12) As the_geom) As foo;

Both return the same results for each column except "np100_geometrygoesaway", with 1.5.2 returning true as expected and 2.0.1 returning false.

When I use this method using real world data the differences are massive. For example the feature that I've tested on was 2120 points before simplification, then gets reduced to 7 when using a threshold of 0.1 on version 1.5.2, but for version 2.0.1 it only gets reduced to 2110.

Change History (39)

comment:1 by strk, 12 years ago

This is due to #1698 — I guess it'll take some kind of parameter to make everybody happy…

comment:2 by ChrisInCambo, 12 years ago

Thanks, Strk. Is there any kind of workaround, because for me (and I imagine most others) is to reduce the size of the data down the wire. At the moment, using ST_Simplify isn't shaving off enough to make it viable.

comment:3 by strk, 12 years ago

You may pass the result of ST_Simplify trough ST_MakeValid and filter out non-areal component. It'll be slower than the original behavior, but should give you the same results.

comment:4 by ChrisInCambo, 12 years ago

Thanks, I tried that but it made no difference to the results for me. "np100_geometrygoesaway" still comes back as false and my actual data still reduces 2120 points down to 2110 rather than 7 as it did previously. Thinking I'll probably have to rollback our project from 2.0.1 to 1.5.5

comment:5 by strk, 12 years ago

Are you still checking for NULL or also for EMPTY ? Another thing you could do is turn polygons with area < a given threshold to a null

comment:6 by pramsey, 11 years ago

Coming back to this, should we have a parameter and the option to return to previous behaviour? Or more to the point, make the previous behavior the default and set the "preserving" behavior to be the new optional one?

comment:7 by strk, 11 years ago

Indeed, we should. I know it may open a can of worms, but yes, we should. Ideally the option "policy" would be extensible to specify as many policies as possible. One of these could be "topology_preserving", for example, which could also mean one day getting rid of the other signature.

Are you feeling you'd want to dig into it ?

comment:8 by strk, 11 years ago

Example policies on line collapse to point:

  • Set to NULL
  • Turn to POINT
  • Set to LINE EMPTY
  • Omit from an outer collection (if any)
  • Only omit from outer collection if doesn't touch its convexhull
  • others ?

comment:9 by strk, 11 years ago

Similarly for polygons collapsing to line:

  • Set to NULL or EMPTY
  • Turn to LINE or POINT
  • Only do any of the above IFF not touching an eventual container's convexhull

comment:10 by robe, 11 years ago

Milestone: PostGIS 2.0.2PostGIS 2.1.0

Well all the things you are discussing don't seem like something we can do in a micro release so I'm pushing this to 2.1

comment:11 by strk, 11 years ago

Milestone: PostGIS 2.1.0PostGIS 2.0.2

Yeah, but the things I'm saying would be enhancements, the default should indeed be the old one. Setting back to 2.0.2.

The discussion above is only to think about a new parameter that is flexible enough to allow for more policies to be added later without changing the function signature.

comment:12 by robe, 11 years ago

No new default args can be introduced in a micro.

Reason being simply because someone can start passing arguments to the default and then your 2.0.1 / 2.0.2 allow different valid inputs. Bad.

or are you saying you'll change 2.0 behavior back to 1.5 behavior (and don't allow overriding?) but for 2.1 you would allow this policy to be passed in?

comment:13 by strk, 11 years ago

Doesn't necessarily need to be a DEFAULT arg. It could be two signatures.

comment:14 by robe, 11 years ago

You can't introduce another signature either. That implies a new function. Not allowed in a micro.

comment:15 by robe, 11 years ago

Milestone: PostGIS 2.0.2PostGIS 2.1.0

strk I'm changing this back to 2.1 until you come up with a better argument why you think this can be allowed in a micro :)

Even the — let's change the behavior back to 1.5 behavior is borderline acceptable because stark behavior changes between majors are always allowed (would require a document fix), but a regression back in a micro is technically only allowed if you feel the new behavior is a bug.

comment:16 by pramsey, 11 years ago

It *IS* a bug, the behavior should be reverted in the minor version and in 2.1 optional additional behaviors can be added.

comment:17 by robe, 11 years ago

Milestone: PostGIS 2.1.0PostGIS 2.0.2

Okay — so we should make a separate ticket for 2.1.0 to introduce this new argument. And this ticket should ONLY fix regress back to old 1.5 behavior

comment:18 by robe, 11 years ago

New related ticket added at #2093

comment:19 by strk, 11 years ago

I was looking at reverting the change and also at adding more testcases when I realized that for LINESTRING objects the same "become EMPTY" behavior doesn't apply (a line never collapses to a point, as the DP algo never drops single segments).

Do we really want exact behavior of 1.5 here ? It's kind of inconsistent with self: polygons can disappear but lines can't…

comment:20 by robe, 11 years ago

Priority: highblocker

It would be really nice if we can come up with some sort of sane solution that doesn't piss everyone off before we release a 2.0.2.

comment:21 by pramsey, 11 years ago

I think we're agreed to revert the behaviour (or maybe strk can confirm w/ the reporter what portion of the behaviour reversion is required). And yeah, we should do this on 2.0.2 so it doesn't leak any further into the future.

comment:22 by strk, 11 years ago

ChrisInCambo, could you tell us more about your dataset and requirements ? Do you think a LINESTRING composed by a single segment of length X should also be set to NULL when the given tolerance is >X ?

comment:23 by strk, 11 years ago

Another question: should ST_Simplify(ANY_EMPTY) return NULL ?

comment:24 by strk, 11 years ago

Keywords: history added
Resolution: fixed
Status: newclosed

Oh well, I restored full compatibility with 1.5 as of r10702 in trunk and r10703 in 2.0 branch. Tested the same testcase (augmented) against 1.5 to verify.

It's still inconsistent (between polys and lines) but it the same old behavior now.

comment:25 by strk, 11 years ago

Chris, are you still reading these notices ? Interested in your use case to decide how to move on with further ST_Simplify changes… See postgis-devel mailing list, if subscribed ( http://lists.osgeo.org/pipermail/postgis-devel/2012-December/023152.html ) , or refer to #2093

comment:26 by jatorre, 11 years ago

It is quite a pity there were inconsistencies between 1.5.2 and 2.0.1, but even worst that we then created inconsistencies between 1.5.2, 2.0.1 and now 2.0.2.

I highly vote for a parameterized ST_Simplify in order to recover the behavior of 2.0.1. We were not prepared to handle the change on the semantics from 2.0.1 to 2.0.2 and broke some of our software by oversimplifying in a way we were not ready. I would have understood a break on 1.5.2 to 2.0, but such a break at a minor revision… uff I understand it could be considered a bug, but uff.

comment:27 by robe, 11 years ago

-1. No parameterized arguments allowed in a micro. Has to wait for 2.1. How would anyone know about this extra arg suddenly appearing in 2.0.3. How would software code around this potential extra argument option when they are assuming they only need to check a minor version number. Sorry to be a stickler about this, but I see adding a parameter in a micro causing more problems than solving. Plus it breaks our promise a long time back to PostgreSQL group that we would stop stop changing user facing apis in a micro.

comment:28 by jatorre, 11 years ago

I was thinking on an optional parameter, but I get your point.

Then I would say it is not been a good idea to include this change neither on a minor version. I therefore would like to consider then roll back to the way the function was working on 2.0.1. I consider the change of the output of this function something that should not be done in a minor version. And for users who started using PostGIS 2.0 this can be a real problem, specially since the change has been done without a way to preserve the previous simplification method.

At the end of the day I can understand both ways can be considered a bug, but the question is to who should we prevent more problems? To users already running 2.0 or those upgrading from 1.5? I think a semantic change like this should be done on major versions. Therefore I propose release of 2.0.3 with the same behavior as 2.0.0 and 2.0.1. And in 2.1 include a parameterized version (with 2.0 behavior as default).

comment:29 by robe, 11 years ago

+1 works for me. I'm even fine with if we revert back to the old 2.0.1 but have some legacy function install for the old behavior. With a note in the docs for those who badly need the old 1.5 to hack their system with it. We do that for the gist index naming issue already. It's too bad we don't have a legacy mode like some other systems e.g. SQL Server where you can turn on a bit and say use old 2005/2008 behavior.

comment:30 by chrisincambo, 11 years ago

Hi Strk, yes I'm still listening. I missed the last update though. Sorry, I had trouble following the thread, am I correct in saying that the 1.5.x behaviour has been restored in the 2.0.2? Or has 2.0.2 introduced a parameter?

Thanks for all your inputs guys, it's much appreciated!

comment:31 by strk, 11 years ago

robe: we _might_ have a "legacy mode" (or bigger set of params) as a GUC. {{{SET postgis_simplify_collapses_as_null TRUE}} kind of thing. Or I could add an st_simplify_nullifcollapsed for a legacy, is that what you had in mind ?

Chris: 2.0.2 restored 1.5.x behavior, but I keep thinking that behavior is suboptimal because it destroys information that could be destroyed by a subsequent function (ie: drop areas smaller than X). I think the most help you can give is by providing us the data that was giving you problem, to better analyze what the problem was.

comment:32 by strk, 11 years ago

Resolution: fixed
Status: closedreopened

Another use case calling for retaining at least the boundary of polygons (ie: never turn to NULLs): https://github.com/mapnik/mapnik/issues/1639

comment:33 by strk, 11 years ago

Milestone: PostGIS 2.0.2PostGIS 2.0.3

comment:34 by strk, 11 years ago

Interesting numbers about simplification capability of old (1.5.x aka 2.0.2) behavior compared with that of new (2.0.1) behavior against a real world dataset: ~8k municipalities of italy, rendered in a 256x256 pixel image enclosing the whole world. https://github.com/Vizzuality/Windshaft-cartodb/issues/65#issuecomment-11331573

comment:35 by robe, 11 years ago

I think trying to introduce a GUC in a micro is also a no-no since it changes the user-front facing api.

However as long as we don't as part of what we call "2.0", introduce a new user facing api we call "part of 2.0", there is nothing that prevents us from say providing a supplemental script — introducing that little flag (which maybe you have in the C api already), and exposing it in the legacy function (as the default behavior). I think it may be so legacy that we wouldn't really have it as part of our core legacy script. More like we have the gist index one as a separate file altogether, to distance ourselves from the dirtiness of it. :)

Keep in mind that whenever people upgrade their scripts, it would modify their install to new behavior (unless it was called something different from ST_Simplify like ST_LegacySimplify). So the only issue is whether people would be satisfied with a legacy function with a different name. I think that is legit because all the legacy functions we have none of them are names we consider part of 2.0 except they were in 1.5. But then it's people's business whether they go in and then rename it the old name so they don't need to change their apps. I'll also just look the other way :).

Hmm I'm beginning to sound like a politician, I might need to take a long hot shower. How do we break/bend the rules without technically breaking the rules.

Then in 2.1 we introduce a GUC that allows people to set all sorts of legacy as their preferred behavior and default args that can be overridden for one-off application requirements.

comment:36 by robe, 11 years ago

Part of this fix, we are going to have to update the docs. Right now nothing has been said of it to warn people of the change in behavior.

http://postgis.net/documentation/manual-2.0SVN/ST_Simplify.html

I'm going to update now just to link to this ticket.

comment:37 by robe, 11 years ago

Partial update of docs in r10823 and r10824. I still need to copy those comments to 2.1 docs and probably more needs to be said once we've come up with some sort of consensus.

comment:38 by strk, 11 years ago

pramsey: anything to add to this thread ? Chris: you're still not showing any data… which is frustrating :/

To be honest I keep thinking that turning geometries into NULL is not nice, as complete location information is gone. I understand the need to reduce as much as possible, and I think "as much as possible" is a single point, not NULL. NULL is really dropping information, not generalizing it.

comment:39 by strk, 11 years ago

Resolution: fixed
Status: reopenedclosed

What's left to do here ? It isn't clear by the comments. I'm closing. Feel free to reopen if anyone knows what this is about.

Note: See TracTickets for help on using tickets.