Opened 6 years ago

Closed 4 years ago

#2227 closed enhancement (fixed)

Visvalingam line generalization function

Reported by: strk Owned by: nicklas
Priority: blocker Milestone: PostGIS 2.2.0
Component: postgis Version: trunk
Keywords: Cc:

Description

See http://www2.dcs.hull.ac.uk/CISRG/publications/DPs/DP10/DP10.html

The algorithm is known allowing "tagging" vertices with their importance and thus allowing run-time simplification by only looking at the tags. For this reason I envision a tag-only function and a trim-by-tag one. Both would be useful.

See also http://bost.ocks.org/mike/simplify/

Change History (29)

comment:1 Changed 6 years ago by robe

Milestone: PostGIS 2.1.0PostGIS 2.2.0

comment:2 Changed 5 years ago by strk

For the record, Dr.JTS is moving his first step toward implementation:

http://sourceforge.net/mailarchive/forum.php?thread_name=CAK2ens1nc46axW16LpLvXCKcjh9-8ywFbAZLEH5uDbeKFO-BWw%40mail.gmail.com&forum_name=jts-topo-suite-user

... if we wait enough time this could be delegated to GEOS (although a tag-based approach would be nice, and I'm not sure we could obtain that from JTS/GEOS)

comment:3 Changed 4 years ago by nicklas

I plan to commit code for this soon (tomorrow or next week).

The effective area is stored as M-value

The function definition is : ST_SetEffectiveArea(geometry, float8 default 0)

Without the second parameter all points is returned.

If the second parameter is set it is used for filtering on effective area. Only higher effective area points is building the resulting geoemtry.

The function supports 3D.

Any comments or protests?

comment:4 Changed 4 years ago by robe

Owner: changed from pramsey to nicklas

comment:5 Changed 4 years ago by nicklas

Added in r13174

1 question/issue.

The first and the last point in an array is special. We don't normally want them to be eliminated. So now I give them the value DBL_MAX since that will prevent any elimination. But that looks very ugly when used with ST_Astext. See the doc entry for ST_SetEffectiveArea for illustration. Should we use something like 1e100 instead?

That should be safe too I guess.

Another ide is to give the start point and end point some value that illustrates how big the whole geometry is. Then the whole geoemtry will be removed when that value is under the requested threashold.

Thoughts?

comment:6 Changed 4 years ago by strk

Should the function be called ST_Visvalingam, for clearness ? Also, couldn't 0 be a valid threshold value for a simplification ? To only remove vertices on the segment defined by their adjacent ones.

About first/last point, we have a similar issue with ST_Simplify, and the latest idea was to use a kind of "policy" argument. I'm on a slow connection now but if you search ST_Simplify in trac you should find the previous discussion.

comment:7 Changed 4 years ago by nicklas

About naming: Maybe ST_Visvalingam is better, but it doesn't say anything about what the function is doing without knowledge of Mr Visvalingam's work. It would be like calling ST_Simplify something with DP. But I also see that EffectiveArea? doesn't say anything if you don't now the concept. I have no real opinion here so we can change it. Shall we?

About 0 a threashold: Yes of course that makes sense. But we also want a way to get back all points. We can make a distinction between gicing 0 as threashold and don't give any second parameter at all (a negative default value is maybe the easiest to avoid 2 signatures)). But to get really 0.0000 units effective area will be quite rare (except vertical and horizontal lines), so giving 0.01 (or whatever depending on what unit is used) will often give more of the desired effect.

About first point last point I think you refered to #1698, #1987 and #2093. I put a comment in #2093 about that.

comment:8 Changed 4 years ago by nicklas

No, I changed my mind. The comment belongs more here:

In #2093 it is discussed when and how to collapse a polygon or linestring. This is a quite complex discussion and I don't think that I see all facets of it.

One thing that makes it even more complicated in real world applications is that you might want different behavior on different geometries in the same data set. For example in a data set with polygons describing different areal uses. Probably you want to preserve smaller city-polygons than swamp-polygons in the middle of a forest.

At least in theory and at least with Visvalingam's algorithm we could get in control over that.

The question is what effective area we give to the first and last point of linestrings, and to the four last points in polygons. As long as we give those points the same value they will collapse and disappear at the same time. So, what we can do is to tell the function in one way or another: 1) Shall the geometries be collapsable ("end points"/"4 last points" in a range that we use when we eliminate) 2) Shall elimination be on a fixed "Effective Area" (Avoid giving any point in the geometry any higher than that fixed value) 3) Shall the elimination be size dependent (Set a value like 1.5 times the biggest effective area on the "end points"/"4 last points")

So in 2) and 3) above it is possible to control the behavior with some weight value from another field that differs between a city and a swamp.

I think the implementation can be done without to much overhead, but the problem is to communicate the above to the function in a clean way.

comment:9 Changed 4 years ago by strk

I've been indeed thinking that ST_Simplify could be extended to accept the algorithm name as a parameter. It could work as long as the same parameter type (a single float?) could be used by all signatures.

ST_Visvalingam also has the advantage of allowing both "tag-only" and "tag-and-drop" behaviors not to conflict with the name (ie: why should a "SetEffectiveArea?" function drop vertices?).

Whatever you decide to do, please provide unit testing and documentation too

comment:10 Changed 4 years ago by nicklas

Yes, that could be a good idea to extend ST_Simplify. Why not? Only thing that might confuse people is that the distance vs area parameter gives very different result with the same value. But as long as the old algorithm is the default they will need to read the doc anyway to find out about the options.

You are right that it doesn't make sense to drop vertices in a function named SetEffectiveArea?. One question is if the Visvalingam version should have an option not to include the effective area but just remove vertices like ST_Simplify? Will there be people using it that want to preserver whatever they have in M? It also uses som space to include an extra dimmension. It is written so it is easy not to write any M-value.

About unit-tests and regress-tests that is on my list. Doc is committed but looks ugly as it stands. I will look at it.

comment:11 Changed 4 years ago by nicklas

About decision: You are the one in the Steering Committee :-) As I said I have no opinion here so I am glad to just do if you steer.

comment:12 Changed 4 years ago by strk

About distance vs. area, what about taking the square root of the area as a parameter ? If the aim is hiding distortions within a device pixel the pixel side would be a valid value for both functions (elevated to a power of 2 for the area). But maybe it is better not to complicate the matter too much, and just let people rely on documentation. NOTE: in order to avoid the overhead of parsing the algorithm name for every record in a table it could be wise to turn ST_Simplify into an inlinable SQL function simply relaying the call to differently-named functions (ST_SimplifyDP, ST_SimplifyVisvalingam); at that point you could just provide the specifically-named one for Visvalingam here.

About retaining the M value, that's surely a useful option to have. The only reason to encode the effective area into M (or, again, in Z) is if that's useful to dynamically simplify on the client. It is still not clear to me how that part works. Is it possible for the client to just use the by-vertex value to decide what to drop and what not to drop or does the client still need to compute further "effective areas" on every vertex drop (for the adjacent ones)?

comment:13 Changed 4 years ago by nicklas

I think that I vote for keeping area "as is" and relying on doc because the area concept is quite intuitive. If you know how the algorithm works you can even check the result by just measuring the area in the client (If you see what will be the next point to remove)

About names I guess you are right. If I understand you right you mean that we have separate ST_SimplifyDP, ST_SimplifyVisvalingam and then we have ST_Simplify with _ST_Simplify that picks the right one? Is the functions picked by some function-ID from the planner so there is no parsing for each row?

About z vs m: Will it make sense to put the area in z? When converted to many formats like geoJSON it will be the third coordinate anyway if there is no z-value. But it is no big deal to implement a choice between z and m. It could also be as an array in a separate column, but that will need some more fiddling to get working.

Yes, the client can use the area-values as is. No more calculations. The area registered is the area each point represents when all with lower area is removed. It is not the area it represents in the untouched geometry.

It is possible to get confused about that "If the recalculated area of an adjacent vertex is smaller than the previous the adjacent vertex gets the same higher area" That is to ensure that the points gets removed in the right order.

comment:14 Changed 4 years ago by strk

Thanks about clarification on the effectiveness of the per-vertex indexing. I'm fine with the tagging-only function being named as it is currently (ST_SetEffectiveArea) or renamed to include some reference to Visvalingam and Whyatt.

The ST_Simplify version picking ST_SimplifyDP or ST_SimplifyVW (Visvalingam and Whyatt) would be in SQL language, thus "inlinable". Basically the planner should directly see the body of the function and since the algorithm parameter would be a constant, it should evaluate the CASE portion of the code and lookup the appropriate lower function just once. Speed comparison should be performed on a large table :)

Would it make sense for ST_SimplifyVW to internally call ST_SetEffectiveArea and then the other still missing function that only filters out vertices where the M value is below a given value ?

M vs Z: I didn't know geoJSON would put M value in the Z field, and what about a 4d geometry ? Which would would be dropped ?

comment:15 Changed 4 years ago by nicklas

Hmm, I think I was wrong about geojson and M-values. The reason I thought so is that I recall seeing somewhere in the source-code that the number of dimensions is decided as 2 + (z or m). But I must have been wrong. It doesn't seem to be possible to pass the m-value through geojson. Then I agree much more with you .

comment:16 Changed 4 years ago by strk

Priority: mediumblocker

I'm marking as blocker as I don't feel well with this change not even having an entry in the NEWS file, and not a single test (unit/regress)

comment:17 Changed 4 years ago by nicklas

I get your point.

Is it a blocker for testing the trunk or blocking release?

I absolutely agree that it is a blocker for a release. Do we have a road map when we are aiming at releasing? I would like to do some more work on twkb with zoom-levels (using Visvalingam'm code internally) before I solve this. Time is very limited.

Is that ok? Or do we start preparing for release soon?

comment:18 Changed 4 years ago by strk

release blocker. no expiration date visible yet.

comment:19 Changed 4 years ago by robe

Nicklas,

We are planning to release before or in September. We need to release before PostgreSQL 9.5 comes out to make our window of opportunity.

I've been trying to keep this updated -- http://trac.osgeo.org/postgis/milestone/PostGIS%202.2.0.

I think we should start buttoning things up a bit in June or so.

comment:20 Changed 4 years ago by nicklas

THank you Regina. That is good info. I will try to do this in Mars or April so it should be no problem.

A small summary about what shall be done:

Questions to answer:

  • Naming
  • How shall this function relate to ST_Simplify? see discussion above
  • What area-value to give the endings of a linestring
  • How to handle polygon collapsing

Other tasks to do

  • CUnit tests
  • Regression tests
  • Better documentation

comment:21 Changed 4 years ago by nicklas

Here is more to do.

I have found that I was terribly naive when I wrote the seteffectivearea-code. It gives the right answer, but is very slow when there is bigger point-arrays.

I have to implement something like Min Heap to make things work smooth.

comment:22 Changed 4 years ago by nicklas

First step against a better world with Visvalingam simplify in r13398

Adds a Heap tree to speed up handling of larger geometries.

Also adds functionality for avoiding collapsing polygons. Keeps the 4 most significant vertex points (first last and two more). The smallest result is a triangle since first and last point is the same.

comment:23 Changed 4 years ago by nicklas

ST_SimplifyVW added in r13417 and r13418

and I found several bugs in the minheap implementation. There is now a test in the code that errors if minheap doesn't return points in right order. But now I have tested in quite a lot of data and it seems to behave as it should.

comment:24 Changed 4 years ago by nicklas

cunit and regression tests added in r13421

So, if debbie and winnie is happy with this I think this ticket can be closed.

Any objections?

There is one discussion left about different behavior between ST_Simplify and ST_Simplifyvw when it comes to collapsing outer rings. I will post at devlist about that

comment:25 Changed 4 years ago by robe

Resolution: fixed
Status: newclosed

debbie looks fine with the changes. winnie is a different issue (but nothing to do with this). I'm going to go ahead and close this and after I fix winnie, if she complains about this I will reopen.

comment:26 Changed 4 years ago by robe

Resolution: fixed
Status: closedreopened

comment:27 Changed 4 years ago by robe

Oops I guess winnie is complaining because of this. I was also testing on my windows mingw64 PostgreSQL 9.5 and get the same failure on cunit

Suite: effectivearea
  Test: do_test_lwgeom_effectivearea_lines ...FAILED
    1. cu_effectivearea.c:35  - CU_ASSERT_EQUAL(ea->res_arealist[i],the_areas[i])
  Test: do_test_lwgeom_effectivearea_polys ...passed

comment:28 Changed 4 years ago by nicklas

Ok, the problem was that qsort gives unpredictable result when comparing identical values. Now, force the ordering to be consistent. I cannot see any performance penalty, so it is probably quite small even if compare-function is called many times.

fixed in r13423

Now let's see if both winnie and debbie is happy.

comment:29 Changed 4 years ago by robe

Resolution: fixed
Status: reopenedclosed
Note: See TracTickets for help on using tickets.