#1217 closed defect (fixed)
[PATCH] Polygon with 4 nested rings not broken up properly
Reported by: | Owned by: | warmerdam | |
---|---|---|---|
Priority: | high | Milestone: | 1.5.3 |
Component: | OGR_SF | Version: | unspecified |
Severity: | normal | Keywords: | Shapefile geometry polygon ring |
Cc: | Even Rouault, Mateusz Łoskot |
Description (last modified by )
I have a shapefile which has a set of 4 nested rings.
This polygon displays as a donut inside a donut in ArcMap, but as a disk inside a donut in my app.
The geometry says this feature has 2 parts. The first part is a polygon with 2 inner rings and the 2nd part is a polygon with just an exterior ring.
I would have expected each part to have one inner ring.
Attachments (12)
Change History (35)
by , 18 years ago
Attachment: | stacks.tar added |
---|
comment:1 by , 18 years ago
I have confirmed the incorrect handling of this geometry. I'm not sure when I'll get to fixing the problem though. I didn't write the ring classifying code in the OGR shapefile driver, and it might be quite hard to fix up.
comment:2 by , 18 years ago
Mateusz, Could you look into this when you get a chance. It is the inner/outer ring classification logic in the OGR Shapefile driver that is wonky. This was written by Radim so I am not too sure how it works. It might make sense to factor it out of the shapefile driver in a form that could be used in other places at the same time it is fixed. I would call this issue medium priority.
comment:3 by , 18 years ago
Last days I analysed the problem and consulted with Frank. Unfortunately, the Bug 1217 has not been fixed. Here I'd like to present my (and Frank's) thoughts and notes I gathered about this issue. The general conclusion is the SHPReadOGRObject function in the shape2ogr.cpp file does not run tests required to correctly classify inner/outer rings in multipart polygons. Second issue is that the SHPReadOGRObject function classifies polygon rings according to coordinates orientation (CW - outer or CCW - inner). This assumption is incorrect when dealing with "dirty" shapefiles. Example, the Stacks.shp file attached to this report by Richard *may*be* seen as such dirty file. Rings orientation is as follows: 1 - clockwise (biggest outer) 2 - counter-clockwise (biggest inner) 3 - counter-clockwise (smallest inner) 4 - clockwise (smallest outer) Here, current classification algorithm assigns the ring 3 as an inner ring of the biggest outer ring 1, instead of classifying it as an inner of outer 4. Also, in the Stacks example above, we can not be sure if ring 4 is outer or inner, similarly we can not be sure about ring 3, without checking geometric relation between those rings. Next conclusion is that current algorithm depends on how inner rings are ordered in a shapefile. The solution is to add heavy gemetric checks and also, if needed, sort inner/outer rings accordint the extent. Here is Frank's idea discussed on the #gdal channel: 1. My hope was that outer rings (rings not contained in any other ring) would be seen as outer rings. 2. Anything within an outer ring, but not inside any other ring would be an inner ring. 3. Any ring completely inside an inner ring, but with no other unclassified ring around it would be an outer ring, and so forth. 4. Sort of a winding number approach to inner/outer assignment. But that approach assumes a full polygon-in-polygon test for all polygons against each other. The step 4 requires potentially heavy calculations. The question is if OGR should expose users on low performance tests :-) I also considered sorting rings against area and/or spatial extent. Then the classification should run from smallest to biggest ring in the collection to avoid situation that the outer ring of middle/smaller polygon will be assigned as the inner ring of the biggest polygon (biggest outer ring). This situation may occure when rings have incorrect orientation, in example outer ring is CCW. If anything above needs more clarification, please give me a note and I'll try to explain it better.
comment:4 by , 18 years ago
I think the handling of nested polygons should be geared towards proper files. I actually have 2 shapefiles, one which contains the 4 level polygon and some other polygons for hole testing. The 2nd shapefile was extracted from the first using ogr2ogr and became 'dirty' there. I used shpdump to examine both files. FOr the polygon in question the original file (created in ArcMap) had the order of the 3rd and 4 rings reversed: 1 - clockwise (biggest outer) 2 - counter-clockwise (biggest inner) 3 - clockwise (smallest outer) 4 - counter-clockwise (smallest inner) However the results are the same.
comment:6 by , 17 years ago
Description: | modified (diff) |
---|---|
Keywords: | geometry polygon ring added |
comment:7 by , 17 years ago
I'm attaching a patch that fixes the issue. It's essentially a copy&paste of work I've previously done with the BNA driver. See my comment from 08/09/07 in ticket #1735 for a few explanations. (If OGR is built without GEOS, the existing algorithm is taken.) With the patch, the polygon from Stacks.shp is displayed correctly with QGIS. I could also do ogr2ogr conversion to BNA and back to SHAPE.
As far as performance is concerned, I've tested it on a 23 MB SHP containing 1726 polygonal features, 184 of them being multipolygons and 75 of them being polygons with inner rings (so the percentage of complex polygons is 15%, which must be a quite high percentage for typical use cases). Without the patch a ogrinfo -ro -al >/dev/null takes ~ 9s. With the patch, it takes ~ 9.5s. So I think we can infer it doesn't slow down processing significantly.
A good idea would be to put the MakeMultiPolygon method somewhere upper in the OGR source tree, so that BNA and Shapefile driver can use it, and eventually other OGR drivers. This would avoid the code duplication.
by , 17 years ago
Attachment: | gdal_svn_trunk_ogr_shape_bug1217.patch added |
---|
comment:8 by , 17 years ago
Cc: | added |
---|---|
Milestone: | → 1.5.0 |
comment:9 by , 16 years ago
Summary: | Polygon with 4 nested rings not broken up properly → [PATCH] Polygon with 4 nested rings not broken up properly |
---|
comment:10 by , 16 years ago
Keywords: | Shapefile added |
---|
comment:11 by , 16 years ago
I tested the patch and it works fine. It seems like MakeMultiPolygon could be a more generic function, perhaps in ogr_geometry.h and ogrgeometry.cpp so that other possibly problematic polygons could be checked/fixed.
comment:12 by , 16 years ago
Owner: | changed from | to
---|
comment:13 by , 16 years ago
Cc: | added |
---|
comment:14 by , 16 years ago
Cc: | added; removed |
---|---|
Owner: | changed from | to
Severity: | minor → normal |
comment:15 by , 16 years ago
Resolution: | → fixed |
---|---|
Status: | new → closed |
I have added a test with geos and non-geos cases in r13323 (trunk).
comment:16 by , 16 years ago
Resolution: | fixed |
---|---|
Status: | closed → reopened |
Ahmet Murat Özdemiray reports that shapefile reading performance is painfully impacted in at least some cases by time spent in GEOS due to the new ring relationship computations. I'm reopening this bug report as we clearly need a more efficient approach for 1.5.1.
comment:17 by , 16 years ago
Frank,
A dataset with performance problem is North America's country boundries dataset and can be downloaded from http://www.cipotato.org/diva/data/MoreData.htm. And the feature which takes most time is Canada which has approx. 70 polygons.
Best Regards, Murat Ozdemiray
comment:18 by , 16 years ago
Milestone: | 1.5.0 → 1.5.1 |
---|---|
Resolution: | → fixed |
Status: | reopened → closed |
I've commited a performance fix in trunk in r13551 and in r13552 and in branches/1.5 in r13553 and r13554.
The good news is that we don't need GEOS anymore. The key point of the speed up is the following approximation : "polygon A is inside polygon B" if "a point of A is inside B". Although it's clearly an approximation, this should be correct in normal use cases (islands and lakes) of OGRGeometryFactory::organizePolygons. For the same reason, we don't test if polygons overlaps too.
As the ogr/ogr_shape.py autotest has been modified (results don't depend whether GEOS is available or not), a new autotest archive for GDAL 1.5.1 will be needed.
For debugging purposes, especially in case of broken data files in terms of polygon topology, I've introduced a OGR_DEBUG_ORGANIZE_POLYGONS configuration option that takes the slower path if GEOS is available.
Please note that the execution time of the isPointInRing method that is used now is linear in the number of points of the ring. For huge rings (> 100,000 points probably), there could be still performance problems. I've some ideas to speed up such cases, but they rely on being able to compute external (let call it E) and internal (let call it I) simplified coutours of a polygon, but I don't know implementations of such algorithms (there's a geos::simplify::TopologyPreservingSimplifier class in GEOS, but it doesn't do what would be needed, and it's not exposed in GEOS C API). Let's assume that we can - quickly - compute E and I, the idea is that if a point is outside E, it's also outside the polygon. And if a point is inside I, it is also inside the polygon. In practise, this should let only very few cases (when the point is inside E and outside I) where we must test really if the point is inside the big polygon.
comment:19 by , 16 years ago
Resolution: | fixed |
---|---|
Status: | closed → reopened |
Problem 1217 ("dirty" polygons) can be fixed by changing line 582 to
if ( RingInRing( psShape, outside[ iRing ], outer[ oRing ] ) )
and resolving bug 2174 (please see shape2ogr.cpp attached).
Current solution seriously (>1000 %) hurts the performace for really large figures (eg. shapes from vmap0 for europe and asia esp. for Scandinavia vegetation). The patch proposed reverts various changes related with organizePolygons and better conforms to ESRI Shape specification.
I do not want to merge ogr2shape.cpp to
with current revision (patch is for the official 1.52 release), test extensively, and commit the changes. Nevertheless, please consider reverting changes related with old classification code because it is much faster than current solution.
by , 16 years ago
Attachment: | shape2ogr.cpp added |
---|
comment:20 by , 16 years ago
Marek,
I've had a look at your change and indeed you seem to have fixed what was wrong (well I did only code review. no actual testing).
However I'd prefer and would be willing to improve if possible the behaviour or oganizePolygons() in OGR_ORGANIZE_POLYGONS=ONLY_CCW mode (that is to say we detect outer and inner rings according to CW/CCW). My feeling is that the current behaviour is maybe a bit suboptimal in ONLY_CCW mode (we could separate the CW and CCW) and some code rearrangement could lead to someting equivalent to the old code in shape2ogr, while making it benefit in other contexts.
I've made a few tests with vmap0 eurnasia. I've translated the cropa and treesa layers in shapefiles, and then did ogr2ogr (to shapefiles again or to BNA) on them, but didn't really see that the current organizePolygons() was that slow (indeed what showed in the profiling was the SHPRewind call when translating into shapefiles). Could you indicate the precise operations you made and some timings with the current code and with the old code with your fix, so that I can benchmark future improvements in organizePolygons(). I really need precise facts. Did you play with the OGR_ORGANIZE_POLYGONS environment variable ?
I think that we should set the ONLY_CCW mode by default for shapefile reading (this imply that the shapefiles that are read are not broken and we can rely safely on winding order), so that we go back to previous performances. I'd imagine that to be done by a new call : OGRGeometry* OGRGeometryFactory::organizePolygons( OGRGeometry papoPolygons,
int nPolygonCount, int *pbIsValidGeometry, char papszOptions )
so that we don't have to play with the environment variable from shapefile driver.
I won't be able to look at this next week, but I'd work on that after.
comment:21 by , 16 years ago
Hi,
The results for veg_tundraa_bj110.zip (attached) are below. Please notice, that I cannot set OGR_ORGANIZE_POLYGONS=ONLY_CCW because there is nothing like that in the official 1.5.2 release.
I also propose to not make a mess of gdal user's life with additional deeply hidden in the sources environment variables. I'd rather expect a shape driver to just follow ESRI specification as accurately and quickly as possible. I really appreciate the robustnes of various procedures w.r.t. broken geometries, but the performace and standard compatibility shall rather have a higher priority.
It is also a nice idea to have a more general procedure (organizePolygons) suitable for various context, but once again: specification conformance and performance are a must and cannot be lost because of some design simplifications.
Please feel free to follow the old classification code in organizePolygons to achieve these results. This should be rather straightforward, as organizePolygons also uses extents to preliminary classify the figures.
BTW. I am not sure if th geometries in veg_tundraa_bj110 are valid. This shapefile was obtained using pyGdal 1.2.6 reom fwtools 1.x with odgi driver.
Results:
GDAL debug version (Release 1.5.2 with patch) Old version with proposed 1217 bug fix: bash-2.05b$ time ogr2ogr -f "Memory" a veg_tundraa_BJ110.shp real 0m4.037s user 0m0.015s sys 0m0.015s GDAL debug: organizePolygons (Release 1.5.2) bash-2.05b$ time ogr2ogr -f "Memory" a veg_tundraa_BJ110.shp real 0m55.108s user 0m0.015s sys 0m0.015s
comment:22 by , 15 years ago
Marek,
In fact by looking closely at the revision log, a big performance fix to OrganizePolygons() was done in r14734, after 1.5.2 release. I tried with your data with latest trunk and branches/1.5 svn versions and both run in about 4 seconds. So, the fix will be 1.5.3 and 1.6.0. In the meantime you must live with your own fix or pull one of the svn versions.
I agree with your observations, and I'm still considering to make the ONLY_CCW mode the default for the shapefile driver, and possibly make the more robust methods w.r.t broken shapefiles still available through environmnent variable documented in the document of the shapefile driver
comment:23 by , 15 years ago
Milestone: | 1.5.1 → 1.5.3 |
---|---|
Resolution: | → fixed |
Status: | reopened → closed |
comment:24 by , 15 years ago
Hi,
That's fine, I can live with my patches until the next release, though nothing will bring back 2 days of digging from high level of application down to gdal routines.. But, nevermind.
Thanks
tar of shapefile exhibiting problem