Opened 18 years ago

Closed 15 years ago

Last modified 15 years ago

#1217 closed defect (fixed)

[PATCH] Polygon with 4 nested rings not broken up properly

Reported by: rengelkemeir@… Owned by: warmerdam
Priority: high Milestone: 1.5.3
Component: OGR_SF Version: unspecified
Severity: normal Keywords: Shapefile geometry polygon ring
Cc: Even Rouault, Mateusz Łoskot

Description (last modified by Mateusz Łoskot)

I have a shapefile which has a set of 4 nested rings.

This polygon displays as a donut inside a donut in ArcMap, but as a disk inside a donut in my app.

The geometry says this feature has 2 parts. The first part is a polygon with 2 inner rings and the 2nd part is a polygon with just an exterior ring.

I would have expected each part to have one inner ring.

Attachments (12)

stacks.tar (10.0 KB ) - added by rengelkemeir@… 18 years ago.
tar of shapefile exhibiting problem
bug-1217-tests.tar.gz (6.0 KB ) - added by Mateusz Łoskot 18 years ago.
Bug 1217 tests
gdal_svn_trunk_ogr_shape_bug1217.patch (11.4 KB ) - added by Even Rouault 17 years ago.
shape2ogr.cpp (50.4 KB ) - added by mbrudka 16 years ago.
veg_tundraa_bj110.zip.crc (62 bytes ) - added by mbrudka 16 years ago.
CRC for splitted zip
veg_tundraa_bj110.zip.001 (713.0 KB ) - added by mbrudka 16 years ago.
splitted zip
veg_tundraa_bj110.zip.002 (713.0 KB ) - added by mbrudka 16 years ago.
splitted zip
veg_tundraa_bj110.zip.003 (713.0 KB ) - added by mbrudka 16 years ago.
splitted zip
veg_tundraa_bj110.zip.004 (713.0 KB ) - added by mbrudka 16 years ago.
splitted zip
veg_tundraa_bj110.zip.005 (713.0 KB ) - added by mbrudka 16 years ago.
splitted zip
veg_tundraa_bj110.zip.006 (713.0 KB ) - added by mbrudka 16 years ago.
splitted zip
veg_tundraa_bj110.zip.007 (610.6 KB ) - added by mbrudka 16 years ago.
splitted zip

Change History (35)

by rengelkemeir@…, 18 years ago

Attachment: stacks.tar added

tar of shapefile exhibiting problem

comment:1 by warmerdam, 18 years ago

I have confirmed the incorrect handling of this geometry.  I'm not sure 
when I'll get to fixing the problem though.  I didn't write the ring
classifying code in the OGR shapefile driver, and it might be quite hard
to fix up. 


comment:2 by warmerdam, 18 years ago

Mateusz, 

Could you look into this when you get a chance.  It is the inner/outer ring
classification logic in the OGR Shapefile driver that is wonky.  This was
written by Radim so I am not too sure how it works.  

It might make sense to factor it out of the shapefile driver in a form that
could be used in other places at the same time it is fixed. 

I would call this issue medium priority.

by Mateusz Łoskot, 18 years ago

Attachment: bug-1217-tests.tar.gz added

Bug 1217 tests

comment:3 by Mateusz Łoskot, 18 years ago

Last days I analysed the problem and consulted with Frank.
Unfortunately, the Bug 1217 has not been fixed.
Here I'd like to present my (and Frank's) thoughts and notes I gathered about this issue.

The general conclusion is the SHPReadOGRObject function in the shape2ogr.cpp file does not run tests required to correctly classify inner/outer rings in multipart polygons.

Second issue is that the SHPReadOGRObject function classifies polygon rings according to coordinates orientation (CW - outer or CCW - inner). This assumption is incorrect when dealing with "dirty" shapefiles.
Example, the Stacks.shp file attached to this report by Richard *may*be* seen as  such dirty file.
Rings orientation is as follows:
1 - clockwise (biggest outer)
2 - counter-clockwise (biggest inner)
3 - counter-clockwise (smallest inner)
4 - clockwise (smallest outer)

Here, current classification algorithm assigns the ring 3 as an inner ring of the biggest outer ring 1, instead of classifying it as an inner of outer 4.
Also, in the Stacks example above, we can not be sure if ring 4 is outer or inner, similarly we can not be sure about ring 3, without checking geometric relation between those rings.

Next conclusion is that current algorithm depends on how inner rings are ordered in a shapefile.

The solution is to add heavy gemetric checks and also, if needed, sort inner/outer rings accordint the extent.

Here is Frank's idea discussed on the #gdal channel:

1. My hope was that outer rings (rings not contained in any other ring) would be seen as outer rings. 
2. Anything within an outer ring, but not inside any other ring would be an inner ring.
3. Any ring completely inside an inner ring, but with no other unclassified ring around it would be an outer ring, and so forth.
4. Sort of a winding number approach to inner/outer assignment. But that approach assumes a full polygon-in-polygon test for all polygons against each other.

The step 4 requires potentially heavy calculations.
The question is if OGR should expose users on low performance tests :-)

I also considered sorting rings against area and/or spatial extent.
Then the classification should run from  smallest to biggest ring in the collection to avoid situation that the outer ring of middle/smaller polygon will be assigned as the inner ring of the biggest polygon (biggest outer ring).
This situation may occure when rings have incorrect orientation, in example outer ring is CCW.

If anything above needs more clarification, please give me a note and I'll try to explain it better.

comment:4 by rengelkemeir@…, 18 years ago

I think the handling of nested polygons should be geared towards proper files.
I actually have 2 shapefiles, one which contains the 4 level polygon and some other polygons for hole testing.  The 2nd shapefile was extracted from the first using ogr2ogr and became 'dirty' there.
I used shpdump to examine both files.  FOr the polygon in question the original file (created in ArcMap) had the order of the 3rd and 4 rings reversed:
1 - clockwise (biggest outer)
2 - counter-clockwise (biggest inner)
3 - clockwise (smallest outer)
4 - counter-clockwise (smallest inner)
However the results are the same.

comment:6 by Mateusz Łoskot, 17 years ago

Description: modified (diff)
Keywords: geometry polygon ring added

comment:7 by Even Rouault, 17 years ago

I'm attaching a patch that fixes the issue. It's essentially a copy&paste of work I've previously done with the BNA driver. See my comment from 08/09/07 in ticket #1735 for a few explanations. (If OGR is built without GEOS, the existing algorithm is taken.) With the patch, the polygon from Stacks.shp is displayed correctly with QGIS. I could also do ogr2ogr conversion to BNA and back to SHAPE.

As far as performance is concerned, I've tested it on a 23 MB SHP containing 1726 polygonal features, 184 of them being multipolygons and 75 of them being polygons with inner rings (so the percentage of complex polygons is 15%, which must be a quite high percentage for typical use cases). Without the patch a ogrinfo -ro -al >/dev/null takes ~ 9s. With the patch, it takes ~ 9.5s. So I think we can infer it doesn't slow down processing significantly.

A good idea would be to put the MakeMultiPolygon method somewhere upper in the OGR source tree, so that BNA and Shapefile driver can use it, and eventually other OGR drivers. This would avoid the code duplication.

by Even Rouault, 17 years ago

comment:8 by Even Rouault, 17 years ago

Cc: warmerdam added
Milestone: 1.5.0

comment:9 by Even Rouault, 16 years ago

Summary: Polygon with 4 nested rings not broken up properly[PATCH] Polygon with 4 nested rings not broken up properly

comment:10 by Mateusz Łoskot, 16 years ago

Keywords: Shapefile added

comment:11 by richengel, 16 years ago

I tested the patch and it works fine. It seems like MakeMultiPolygon could be a more generic function, perhaps in ogr_geometry.h and ogrgeometry.cpp so that other possibly problematic polygons could be checked/fixed.

comment:12 by Mateusz Łoskot, 16 years ago

Owner: changed from Mateusz Łoskot to Even Rouault

comment:13 by Mateusz Łoskot, 16 years ago

Cc: Mateusz Łoskot added

comment:14 by warmerdam, 16 years ago

Cc: Even Rouault added; warmerdam removed
Owner: changed from Even Rouault to warmerdam
Severity: minornormal

I have moved the new method to OGRGeometryFactory::orgnizePolygons() (r13316) and modified the shapefile driver to use if if haveGEOS() is TRUE at runtime (r13317).

We still need a test case for this as part of the test suite.

comment:15 by warmerdam, 16 years ago

Resolution: fixed
Status: newclosed

I have added a test with geos and non-geos cases in r13323 (trunk).

comment:16 by warmerdam, 16 years ago

Resolution: fixed
Status: closedreopened

Ahmet Murat Özdemiray reports that shapefile reading performance is painfully impacted in at least some cases by time spent in GEOS due to the new ring relationship computations. I'm reopening this bug report as we clearly need a more efficient approach for 1.5.1.

comment:17 by warmerdam, 16 years ago

Frank,

A dataset with performance problem is North America's country boundries dataset and can be downloaded from http://www.cipotato.org/diva/data/MoreData.htm. And the feature which takes most time is Canada which has approx. 70 polygons.

Best Regards, Murat Ozdemiray

comment:18 by Even Rouault, 16 years ago

Milestone: 1.5.01.5.1
Resolution: fixed
Status: reopenedclosed

I've commited a performance fix in trunk in r13551 and in r13552 and in branches/1.5 in r13553 and r13554.

The good news is that we don't need GEOS anymore. The key point of the speed up is the following approximation : "polygon A is inside polygon B" if "a point of A is inside B". Although it's clearly an approximation, this should be correct in normal use cases (islands and lakes) of OGRGeometryFactory::organizePolygons. For the same reason, we don't test if polygons overlaps too.

As the ogr/ogr_shape.py autotest has been modified (results don't depend whether GEOS is available or not), a new autotest archive for GDAL 1.5.1 will be needed.

For debugging purposes, especially in case of broken data files in terms of polygon topology, I've introduced a OGR_DEBUG_ORGANIZE_POLYGONS configuration option that takes the slower path if GEOS is available.

Please note that the execution time of the isPointInRing method that is used now is linear in the number of points of the ring. For huge rings (> 100,000 points probably), there could be still performance problems. I've some ideas to speed up such cases, but they rely on being able to compute external (let call it E) and internal (let call it I) simplified coutours of a polygon, but I don't know implementations of such algorithms (there's a geos::simplify::TopologyPreservingSimplifier class in GEOS, but it doesn't do what would be needed, and it's not exposed in GEOS C API). Let's assume that we can - quickly - compute E and I, the idea is that if a point is outside E, it's also outside the polygon. And if a point is inside I, it is also inside the polygon. In practise, this should let only very few cases (when the point is inside E and outside I) where we must test really if the point is inside the big polygon.

comment:19 by mbrudka, 16 years ago

Resolution: fixed
Status: closedreopened

Problem 1217 ("dirty" polygons) can be fixed by changing line 582 to

if ( RingInRing( psShape, outside[ iRing ], outer[ oRing ] ) )

and resolving bug 2174 (please see shape2ogr.cpp attached). Current solution seriously (>1000 %) hurts the performace for really large figures (eg. shapes from vmap0 for europe and asia esp. for Scandinavia vegetation). The patch proposed reverts various changes related with organizePolygons and better conforms to ESRI Shape specification. I do not want to merge ogr2shape.cpp to with current revision (patch is for the official 1.52 release), test extensively, and commit the changes. Nevertheless, please consider reverting changes related with old classification code because it is much faster than current solution.

by mbrudka, 16 years ago

Attachment: shape2ogr.cpp added

comment:20 by Even Rouault, 16 years ago

Marek,

I've had a look at your change and indeed you seem to have fixed what was wrong (well I did only code review. no actual testing).

However I'd prefer and would be willing to improve if possible the behaviour or oganizePolygons() in OGR_ORGANIZE_POLYGONS=ONLY_CCW mode (that is to say we detect outer and inner rings according to CW/CCW). My feeling is that the current behaviour is maybe a bit suboptimal in ONLY_CCW mode (we could separate the CW and CCW) and some code rearrangement could lead to someting equivalent to the old code in shape2ogr, while making it benefit in other contexts.

I've made a few tests with vmap0 eurnasia. I've translated the cropa and treesa layers in shapefiles, and then did ogr2ogr (to shapefiles again or to BNA) on them, but didn't really see that the current organizePolygons() was that slow (indeed what showed in the profiling was the SHPRewind call when translating into shapefiles). Could you indicate the precise operations you made and some timings with the current code and with the old code with your fix, so that I can benchmark future improvements in organizePolygons(). I really need precise facts. Did you play with the OGR_ORGANIZE_POLYGONS environment variable ?

I think that we should set the ONLY_CCW mode by default for shapefile reading (this imply that the shapefiles that are read are not broken and we can rely safely on winding order), so that we go back to previous performances. I'd imagine that to be done by a new call : OGRGeometry* OGRGeometryFactory::organizePolygons( OGRGeometry papoPolygons,

int nPolygonCount, int *pbIsValidGeometry, char papszOptions )

so that we don't have to play with the environment variable from shapefile driver.

I won't be able to look at this next week, but I'd work on that after.

comment:21 by mbrudka, 16 years ago

Hi,

The results for veg_tundraa_bj110.zip (attached) are below. Please notice, that I cannot set OGR_ORGANIZE_POLYGONS=ONLY_CCW because there is nothing like that in the official 1.5.2 release.

I also propose to not make a mess of gdal user's life with additional deeply hidden in the sources environment variables. I'd rather expect a shape driver to just follow ESRI specification as accurately and quickly as possible. I really appreciate the robustnes of various procedures w.r.t. broken geometries, but the performace and standard compatibility shall rather have a higher priority.

It is also a nice idea to have a more general procedure (organizePolygons) suitable for various context, but once again: specification conformance and performance are a must and cannot be lost because of some design simplifications.

Please feel free to follow the old classification code in organizePolygons to achieve these results. This should be rather straightforward, as organizePolygons also uses extents to preliminary classify the figures.

BTW. I am not sure if th geometries in veg_tundraa_bj110 are valid. This shapefile was obtained using pyGdal 1.2.6 reom fwtools 1.x with odgi driver.

Results:

GDAL debug version (Release 1.5.2 with patch)
Old version with proposed 1217 bug fix:
bash-2.05b$ time ogr2ogr -f "Memory" a veg_tundraa_BJ110.shp
real    0m4.037s
user    0m0.015s
sys     0m0.015s

GDAL debug: organizePolygons (Release 1.5.2)
bash-2.05b$ time ogr2ogr -f "Memory" a veg_tundraa_BJ110.shp
real    0m55.108s
user    0m0.015s
sys     0m0.015s

by mbrudka, 16 years ago

Attachment: veg_tundraa_bj110.zip.crc added

CRC for splitted zip

by mbrudka, 16 years ago

Attachment: veg_tundraa_bj110.zip.001 added

splitted zip

by mbrudka, 16 years ago

Attachment: veg_tundraa_bj110.zip.002 added

splitted zip

by mbrudka, 16 years ago

Attachment: veg_tundraa_bj110.zip.003 added

splitted zip

by mbrudka, 16 years ago

Attachment: veg_tundraa_bj110.zip.004 added

splitted zip

by mbrudka, 16 years ago

Attachment: veg_tundraa_bj110.zip.005 added

splitted zip

by mbrudka, 16 years ago

Attachment: veg_tundraa_bj110.zip.006 added

splitted zip

by mbrudka, 16 years ago

Attachment: veg_tundraa_bj110.zip.007 added

splitted zip

comment:22 by Even Rouault, 15 years ago

Marek,

In fact by looking closely at the revision log, a big performance fix to OrganizePolygons() was done in r14734, after 1.5.2 release. I tried with your data with latest trunk and branches/1.5 svn versions and both run in about 4 seconds. So, the fix will be 1.5.3 and 1.6.0. In the meantime you must live with your own fix or pull one of the svn versions.

I agree with your observations, and I'm still considering to make the ONLY_CCW mode the default for the shapefile driver, and possibly make the more robust methods w.r.t broken shapefiles still available through environmnent variable documented in the document of the shapefile driver

comment:23 by Even Rouault, 15 years ago

Milestone: 1.5.11.5.3
Resolution: fixed
Status: reopenedclosed

Now the shapefile driver assumes by default that the polygons that are read respect the specification. Done in trunk in r15457 and in branches/1.5 in r15458

comment:24 by mbrudka, 15 years ago

Hi,

That's fine, I can live with my patches until the next release, though nothing will bring back 2 days of digging from high level of application down to gdal routines.. But, nevermind.

Thanks

Note: See TracTickets for help on using tickets.