Opened 7 years ago

Closed 7 years ago

#3712 closed defect (fixed)

Address MVT issues reported by specification author

Reported by: strk Owned by: Björn Harrtell
Priority: blocker Milestone: PostGIS 2.4.0
Component: postgis Version: master
Keywords: Cc: flippmoke

Description

flippmoke pointed out some criticalities in the MVT code here: https://git.osgeo.org/gogs/postgis/postgis/pulls/5#issuecomment-2966

This ticket is to make sure they are addressed before going final.

Change History (26)

comment:1 by strk, 7 years ago

Owner: changed from pramsey to Björn Harrtell

comment:2 by flippmoke, 7 years ago

As you have most all of the tools to make valid OGC geometries - I would assume it would be non-trival but better if you were able to follow v2 of the specification. However, if you wanted you could drop down to the version 1.0 of the specification.

I would much rather all clients get to version 2.0 as soon as possible however, as the less v1 tiles that exist in the wild the better for everyone as the standard was very loose in that original specification.

comment:3 by Björn Harrtell, 7 years ago

Agreed that the goal should be to follow the v2 spec.

I think there is a pretty non-trivial solution if I would just use GEOS and the tools there to simplify and verify topology/validity. The drawback would be that it would probably make the routine much slower.

So I would rather not have to go though GEOS and use a quantization algorithm similar to the one used by OpenLayers (1) to keep topology intact. To avoid having to calculate validity, which AFAIK is a hard problem, I want to assume input is valid and document that as a requirement. Would this be acceptable?

1) https://boundlessgeo.com/2014/03/openlayers-vector-rendering

comment:4 by Björn Harrtell, 7 years ago

Putting some more thought into this I think I'm already doing quantization and it can cause problems in some cases as described at https://github.com/mapbox/vector-tile-spec/issues/58#issuecomment-173717621.

I wonder if it's possible to quickly categorize input that can be problematic and what should be fine with simple quantization. Polygons with inner rings that are close to their exterior it should be detectable. I think polygons small enough to risk revering winding order when rounding should degenerate to a point or line (already done with check_geometry_size but it's too naive).

Last edited 7 years ago by Björn Harrtell (previous) (diff)

comment:5 by flippmoke, 7 years ago

Just as some background, when we started developing version 2.0 of the specification we were doing so because we were having vector tiles sent to clients more often then being rendered on our servers at Mapbox. Part of the problem we were having besides winding order issues was that polygon data was often messy and had many self intersections and other oddities. This was causing a lot of problems our use of triangulation algorithms, specifically in earcut. https://github.com/mapbox/earcut and https://github.com/mapbox/earcut.hpp

In response we put some guidelines in the polygon ring definition in the v2 specification. It is a little loose in its definition because at the time the specification was made we were not certain how well we could always create OGC valid and simple geometries which was our goal. https://github.com/mapbox/vector-tile-spec/blob/master/2.1/README.md#4344-polygon-geometry-type

We have found numerous issues in the use of Vector Tiles were valid geometries always produce correct results. The primary issue is that Vector Tiles are primarily used on clients and typically speed is a requirement. This means that in most situations clients can not do a lot of heavy processing such as polygon correction prior to using the data.

It is for all these reasons that I ended up having to develop https://github.com/mapbox/wagyu/ as a solution for the problems we were experiencing with validity of geometry. I know specifically that you speak about speed as a concern, this is always a concern for me as well and is another reason that I have spent a lot of time developing wagyu — as it is a way for us to have our own "MakeValid" (by doing a union of all existing data).

It is for all these reasons I think you also need to consider always running a "ST_MakeValid" like routine every time for polygon data. The data going into ST_asMVT might or might not be valid so you will likely need to always execute this. I know that this is likely against the style of PostGIS as each tool is doing one specific task, but when dealing with this special encoding of Vector Tiles things get complex. Therefore, I wouldn't leave it up to the user to ST_MakeValid the geometry prior to running this routine. The reason is that the problems might show up in rare cases downstream far away from PostGIS — and manifest in ways that users can not easily debug.

TLDR;

The winding order problems are one issue alone I have found while creating vector tiles, validity of polygons is also very important.

comment:6 by strk, 7 years ago

What about completely avoiding the resize step inside the function ? That'd give callers the "freedom" to decide how to scale and check geometries *before* they end up in MVT. For example they could be building pyramids in a pre-flight so that speed would not be an issue.

comment:7 by Björn Harrtell, 7 years ago

I suggest the following:

  1. Make sure internal transformation does not change validity. I think this should be possible for valid input.
  2. Check for input validity by default but expose an option to turn it off. With a disclaimer.

I don't think removing transformation will be useful for most users but I'll consider if it would also be something that one could opt out with a disclaimer.

comment:8 by flippmoke, 7 years ago

My concern with that is that honestly users struggle at times to even understand how the geometries in vector tiles are even structured. This isn't something simple like GeoJSON or WKT that is very visually easy to understand as vector tiles have their own coordinate systems. I think power users would love to make slices of the pyramid at once, but I think it would be a hard for the average user.

This also makes me think about the best way to deal with projections of the tile system prior to creating the tiles. In general vector tiles have no relationship to any coordinate system and could be used with any rectangular system. Most users however, use EPSG:4326 or EPSG:3857. Therefore ST_asMVT might need a srs passed to it, so that scaling and offsetting could be auto calculated.

comment:9 by flippmoke, 7 years ago

I would be interested if it is faster for you all to check validity and then correct if needed or if always running the validity correction was faster. I would suggest not turning off validity checks/correction with a flag as users, I would suggest limiting options as it would be difficult to explain for users how to properly use the method.

Also as per my previous conversation about adding a srs, I would assume you would default it to 4326 and only if they passed something different it would change.

comment:10 by Björn Harrtell, 7 years ago

I see ST_AsMVT as a low level function for efficient production so we might not agree on the target audience here. I hope we can agree that defaulting to check/correct validity is good enough to advocate the correct use. I'm not certain MakeValid produces useful output at all times though.

I made the function specifically without relationship to srs and tile grid schemes as I think those are better handled externally. I don't quite follow how/why introducing srs will be beneficial without also introducing the concept of tile grid and tile grid coordinates instead of geographic coordinate bounds which will make it difficult to support anything else than 4326.

comment:11 by robe, 7 years ago

ST_MakeValid does not always produce useful output (or at least useful to the user, for example it could turn a polygon into a linestring if it can't make a valid polygon wihout loosing points) all the time and as a general rule. None of our output functions try to make anything valid. That's the user's responsibility. So I think just checking correctness and throwing an error if you think it's unacceptable to output, or just a warning notice is good enough.

comment:12 by Björn Harrtell, 7 years ago

Agreed @robe. I think it can be an error to prefer a transaction rollback in that case.

Again about SRS, as I wrote I think it's a good thing to leave it out for the general case (supporting any SRS), but I see no reason why there could not be another ST_AsMVT signature that assumes 4326 and a tile coordinate based on its ubiquitous tile grid scheme but at this time it's not on my TODO list.

comment:13 by flippmoke, 7 years ago

@robe We ran into of the same issues when we were dealing with output from wagyu (our version of make valid) and in general we just throw out polygons that collapse into lines. I am actually very interested in how make_valid works internally and have been meaning to compare it to some of our methodology for some time.

I am concerned about making it the users responsibility to make VT compliant with 2.0, but I can totally understand your situation. Maybe this is just a terrible oversight in the specification in general, but I am not certain how else to approach the problem other then always making sure the output is compliant. I suppose this is not any different from other geometry file types, but wanted to explain well why we picked put this in the standard (as per explanation earlier).

In short though, I think throwing an error might be good enough.

Thanks again for all your effort here! If you ever need me to review the code please feel free to email me if I don't respond to a @.

comment:14 by Björn Harrtell, 7 years ago

Thanks @flippmoke. I'm all for making the world more OGC valid. :) It's a good thing for other things than triangulation too.

I have just investigated how to make sure "snap rounding" does not risk invalidating polygons and it turns out it's not as easy as I thought and might require turning to GEOS anyway. This function calculates exactly what the tolerance for snap rounding is:

http://postgis.net/docs/ST_MinimumClearance.html

As the documentation explains it involves calculating a value e so that "No vertex is closer than e to a line segment of which it is not an endpoint" and under the hood this is done using JTS/GEOS index/algorithms which might be non-trivial to implement in C.

I'll spend some more time thinking about it and would appreciate further input on the subject but ST_MakeValid is starting to look pretty good after all.

comment:15 by nicklas, 7 years ago

I had a little wild thought from reading this discussion.

If I understand things right, the problem is in validness that turns up fro simplification and scaling.

We also want tile generation to be as fast as possible for "on the fly" creation.

The idea is about the simplification issue. If we can identify vertex points that have to be preserved to not create an invalid geometry then: If using the effective area from Visvalingam-Whyatt algorithm, and manipulating the effective area (stored as m-coordinate, it is possible to both control geometries getting invalid and at the same time get a very fast "on the fly" simplification.

This require doing some preparation, and would be a "professional alternative" when needing fast creation and valid geometries.

The function ST_SimplifyVW do it like this when avoiding collapsing polygons. It identifies the most valuable vertex-points and raises their effective area to very high value.

The hard part is identifying what vertex_points that need to be preserved I guess. Then the m-value of that vertex-points just have to be raised to something insane.

It would mean that m-values had to be stored which will take some space (only in PostGIS, not in the tile of course).

Link to ST_EffectiveArea: http://postgis.net/docs/manual-2.3/ST_SetEffectiveArea.html

What I am talking about, I realise, is actually ST_EffectiveArea with some validity-guarantee for each removed vertex. But I don't know if it is possible in any easy way.

comment:16 by flippmoke, 7 years ago

@nicklas:

If I understand things right, the problem is in validness that turns up fro simplification and scaling.

Yes, but the original data is often invalid to begin with. This is why I currently do the following steps:

Simplification ⇒ Scaling ⇒ Clipping ⇒ Make Valid

In order to speed up processing, I have found that for more then one tile, Simplification and Scaling can be done once for a shape for an entire zoom level, and you can create multiple tiles from that data. This is not ideal for many situations though where you just want to create one.

We have found douglas-peuker to be best at actually preserving the original shape, but your ideas sound interesting but also adding a lot more complexity, will think about it more.


@Björn Harrtell

I have thought about all of this problem probably more then I should as far as the validity and I have found that snap rounding is very important in some ways to prevent intersections in odd ways. The only way I was able to solve this was through creating https://github.com/mapbox/wagyu/. In short it is Snap Rounding + Vatti + Additional Correction and by testing it repeatedly with a fuzzer, I was able to make it produce OGC valid polygon output 100% of the time — no matter the input. It is reasonably fast (considering all it does). If anyone here ever wants a walk through of the algorithm and code — I would love to do it.

comment:17 by strk, 7 years ago

@flippmoke I'd love to see your algorithm in PostGIS !

comment:18 by Björn Harrtell, 7 years ago

@nicklas:

Sounds like a good subject for an essay. ;)

@flippmoke:

I did have a look at wagyu and while my C++ and geometry math skills are too limited to understand it completely it seems to be very well executed. I'm familiar with snap rounding but not the rest of the concepts. I can see how it works with small scale errors that occur because of scaling but I wonder how it can handle severely invalid input, for instance a large bow tie like POLYGON ((0 0, 10 0, 10 5, 0 -5, 0 0)).

I'd also love to know more about the algorithm and perhaps help bring it to PostGIS. One way to do that would be to include it in GEOS. As you might know the upstream of GEOS is JTS and I think there would be the appropriate place to get it implemented first. JTS/GEOS already has snap rounding. I'll bring up the subject on the JTS mailing list and hopefully get some of their excellent geometry math brainpower involved.

Last edited 7 years ago by Björn Harrtell (previous) (diff)

comment:19 by dbaston, 7 years ago

I'm wondering why this function should check and coerce its inputs into valid geometries, when other PostGIS functions do not do this this. It seems like this approach just buries the problem of invalid geometries, and prevents users from actually correcting their geometries. If a user wants to store invalid geometry and doesn't care to inspect how it's been auto-corrected, it should be easy enough to run ST_AsMVT(ST_MakeValid(geom)), no?

comment:20 by Björn Harrtell, 7 years ago

Agreed in principle @dbaston.

But it's not so easy as an initial make valid, as valid geometry might become invalid after scaling to tile pixel coordinate space. But so maybe the suggestion by @strk to break out the scaling step into a separate function is what we should aim for.

Last edited 7 years ago by Björn Harrtell (previous) (diff)

comment:21 by flippmoke, 7 years ago

@Björn Harrtell:

It deals with extremely bad geometries, it would can definitely deal with the bowtie example. The "fuzzer" I am using basically picks random points for a ring, which basically creates a mass of intersections.

https://cloud.githubusercontent.com/assets/1794907/23220907/0b6e1b5a-f8e9-11e6-9744-ac4f0a81749f.png

Wagyu then will make this completely valid OGC output, including dealing with chains of holes that should resolve into new rings etc.

https://cloud.githubusercontent.com/assets/1794907/23220918/138ef714-f8e9-11e6-9cdd-1025e2aec538.png

The latest released version of Wagyu surpassed ~25 million of these tests with out any failures, before I simply stopped the fuzzer.

@dbaston:

If a user wants to store invalid geometry and doesn't care to inspect how it's been auto-corrected, it should be easy enough to run ST_AsMVT(ST_MakeValid(geom)), no?

Please see: https://github.com/mapbox/vector-tile-spec/issues/58#issuecomment-173717621

In general - clipping, simplification, and scaling can all turn a valid geometry into an invalid one as per the specification.

comment:22 by Björn Harrtell, 7 years ago

I'm now leaning towards both suggestions by @flippmoke and @strk i.e providing a function that generates spec compliant output if not forced otherwise and breaking apart the logic assembling the tile binary vs. the geometry translation. I propose the following revised ST_AsMVT and a new function ST_AsMVTGeom as follows:

-- revised ST_AsMVT function
-- assumes geometry already in MVT tile coordinate space
-- moved bounds, buffer and clip_geom to new function ST_AsMVTGeom
bytea ST_AsMVT(text name, integer extent, text geom_name, anyelement row)

-- translates a geometry to MVT tile coordinate space with optional clipping
-- should make sure output is valid according to spec, perhaps by simply using existing ST_MakeValid logic
geometry ST_AsMVTGeom(geometry geom, box2d bounds, integer extent, integer buffer, boolean clip_geom)

@flippmoke:

Would be nice to put ST_MakeValid though that fuzzing test. Revisiting my previous bow tie example shows that it at least does something similar:

select ST_AsText(ST_MakeValid(ST_GeomFromText('POLYGON ((0 0, 10 0, 10 5, 0 -5, 0 0))')))
-- Output: "MULTIPOLYGON(((0 0,5 0,0 -5,0 0)),((5 0,10 5,10 0,5 0)))"
Last edited 7 years ago by Björn Harrtell (previous) (diff)

comment:23 by Björn Harrtell, 7 years ago

Initial implementation of the proposal above is at https://git.osgeo.org/gogs/postgis/postgis/pulls/14. Seems pretty promising to me.

comment:24 by Björn Harrtell, 7 years ago

I'm pretty sure that the implementation of ST_AsMVT and ST_AsMVTGeom that I've got now is better and the way forward. The limited tests I've got passes mapnik validation. I would like to put it through some more testing but I think it should not delay the replacement of the old ST_AsMVT. See the referenced pull request for more details.

It remains to fixup the documentation, commit and announce changes on the users list as the API/usage has changed. I will try to get that done in the days to follow if there are no objections.

comment:25 by Björn Harrtell, 7 years ago

In 15323:

Reworked ST_AsMVT and new ST_AsMVTGeom implementation
References #3712

comment:26 by Björn Harrtell, 7 years ago

Resolution: fixed
Status: newclosed
Note: See TracTickets for help on using tickets.