Ticket #1217 (closed defect: fixed)

Opened 2 years ago

Last modified 4 months ago

[PATCH] Polygon with 4 nested rings not broken up properly

Reported by: rengelkemeir@slb.com Assigned to: warmerdam
Priority: high Milestone: 1.5.1
Component: OGR_SF Version: unspecified
Severity: normal Keywords: Shapefile geometry polygon ring
Cc: rouault, mloskot

Description (Last modified by mloskot)

I have a shapefile which has a set of 4 nested rings.

This polygon displays as a donut inside a donut in ArcMap?, but as a disk inside a donut in my app.

The geometry says this feature has 2 parts. The first part is a polygon with 2 inner rings and the 2nd part is a polygon with just an exterior ring.

I would have expected each part to have one inner ring.

Attachments

stacks.tar (10.0 kB) - added by rengelkemeir@slb.com on 06/23/06 19:36:58.
tar of shapefile exhibiting problem
bug-1217-tests.tar.gz (6.0 kB) - added by mloskot on 07/20/06 09:46:28.
Bug 1217 tests
gdal_svn_trunk_ogr_shape_bug1217.patch (11.4 kB) - added by rouault on 09/22/07 07:59:09.

Change History

06/23/06 19:36:58 changed by rengelkemeir@slb.com

  • attachment stacks.tar added.

tar of shapefile exhibiting problem

06/23/06 20:07:21 changed by warmerdam

I have confirmed the incorrect handling of this geometry.  I'm not sure 
when I'll get to fixing the problem though.  I didn't write the ring
classifying code in the OGR shapefile driver, and it might be quite hard
to fix up. 


06/23/06 20:53:48 changed by warmerdam

Mateusz, 

Could you look into this when you get a chance.  It is the inner/outer ring
classification logic in the OGR Shapefile driver that is wonky.  This was
written by Radim so I am not too sure how it works.  

It might make sense to factor it out of the shapefile driver in a form that
could be used in other places at the same time it is fixed. 

I would call this issue medium priority.

07/20/06 09:46:28 changed by mloskot

  • attachment bug-1217-tests.tar.gz added.

Bug 1217 tests

07/20/06 10:10:58 changed by mloskot

Last days I analysed the problem and consulted with Frank.
Unfortunately, the Bug 1217 has not been fixed.
Here I'd like to present my (and Frank's) thoughts and notes I gathered about this issue.

The general conclusion is the SHPReadOGRObject function in the shape2ogr.cpp file does not run tests required to correctly classify inner/outer rings in multipart polygons.

Second issue is that the SHPReadOGRObject function classifies polygon rings according to coordinates orientation (CW - outer or CCW - inner). This assumption is incorrect when dealing with "dirty" shapefiles.
Example, the Stacks.shp file attached to this report by Richard *may*be* seen as  such dirty file.
Rings orientation is as follows:
1 - clockwise (biggest outer)
2 - counter-clockwise (biggest inner)
3 - counter-clockwise (smallest inner)
4 - clockwise (smallest outer)

Here, current classification algorithm assigns the ring 3 as an inner ring of the biggest outer ring 1, instead of classifying it as an inner of outer 4.
Also, in the Stacks example above, we can not be sure if ring 4 is outer or inner, similarly we can not be sure about ring 3, without checking geometric relation between those rings.

Next conclusion is that current algorithm depends on how inner rings are ordered in a shapefile.

The solution is to add heavy gemetric checks and also, if needed, sort inner/outer rings accordint the extent.

Here is Frank's idea discussed on the #gdal channel:

1. My hope was that outer rings (rings not contained in any other ring) would be seen as outer rings. 
2. Anything within an outer ring, but not inside any other ring would be an inner ring.
3. Any ring completely inside an inner ring, but with no other unclassified ring around it would be an outer ring, and so forth.
4. Sort of a winding number approach to inner/outer assignment. But that approach assumes a full polygon-in-polygon test for all polygons against each other.

The step 4 requires potentially heavy calculations.
The question is if OGR should expose users on low performance tests :-)

I also considered sorting rings against area and/or spatial extent.
Then the classification should run from  smallest to biggest ring in the collection to avoid situation that the outer ring of middle/smaller polygon will be assigned as the inner ring of the biggest polygon (biggest outer ring).
This situation may occure when rings have incorrect orientation, in example outer ring is CCW.

If anything above needs more clarification, please give me a note and I'll try to explain it better.

07/21/06 20:21:02 changed by rengelkemeir@slb.com

I think the handling of nested polygons should be geared towards proper files.
I actually have 2 shapefiles, one which contains the 4 level polygon and some other polygons for hole testing.  The 2nd shapefile was extracted from the first using ogr2ogr and became 'dirty' there.
I used shpdump to examine both files.  FOr the polygon in question the original file (created in ArcMap) had the order of the 3rd and 4 rings reversed:
1 - clockwise (biggest outer)
2 - counter-clockwise (biggest inner)
3 - clockwise (smallest outer)
4 - counter-clockwise (smallest inner)
However the results are the same.

04/07/07 08:34:23 changed by mloskot

  • keywords set to geometry polygon ring.
  • description changed.

09/22/07 07:58:48 changed by rouault

I'm attaching a patch that fixes the issue. It's essentially a copy&paste of work I've previously done with the BNA driver. See my comment from 08/09/07 in ticket #1735 for a few explanations. (If OGR is built without GEOS, the existing algorithm is taken.) With the patch, the polygon from Stacks.shp is displayed correctly with QGIS. I could also do ogr2ogr conversion to BNA and back to SHAPE.

As far as performance is concerned, I've tested it on a 23 MB SHP containing 1726 polygonal features, 184 of them being multipolygons and 75 of them being polygons with inner rings (so the percentage of complex polygons is 15%, which must be a quite high percentage for typical use cases). Without the patch a ogrinfo -ro -al >/dev/null takes ~ 9s. With the patch, it takes ~ 9.5s. So I think we can infer it doesn't slow down processing significantly.

A good idea would be to put the MakeMultiPolygon? method somewhere upper in the OGR source tree, so that BNA and Shapefile driver can use it, and eventually other OGR drivers. This would avoid the code duplication.

09/22/07 07:59:09 changed by rouault

  • attachment gdal_svn_trunk_ogr_shape_bug1217.patch added.

09/22/07 08:06:02 changed by rouault

  • cc set to warmerdam.
  • milestone set to 1.5.0.

10/12/07 13:09:30 changed by rouault

  • summary changed from Polygon with 4 nested rings not broken up properly to [PATCH] Polygon with 4 nested rings not broken up properly.

10/29/07 01:32:36 changed by mloskot

  • keywords changed from geometry polygon ring to Shapefile geometry polygon ring.

11/07/07 15:15:56 changed by richengel

I tested the patch and it works fine. It seems like MakeMultiPolygon? could be a more generic function, perhaps in ogr_geometry.h and ogrgeometry.cpp so that other possibly problematic polygons could be checked/fixed.

11/15/07 11:13:39 changed by mloskot

  • owner changed from mloskot to rouault.

11/15/07 11:14:25 changed by mloskot

  • cc changed from warmerdam to warmerdam, mloskot.

12/11/07 00:05:09 changed by warmerdam

  • owner changed from rouault to warmerdam.
  • cc changed from warmerdam, mloskot to rouault, mloskot.
  • severity changed from minor to normal.

I have moved the new method to OGRGeometryFactory::orgnizePolygons() (r13316) and modified the shapefile driver to use if if haveGEOS() is TRUE at runtime (r13317).

We still need a test case for this as part of the test suite.

12/11/07 11:39:13 changed by warmerdam

  • status changed from new to closed.
  • resolution set to fixed.

I have added a test with geos and non-geos cases in r13323 (trunk).

01/14/08 04:00:54 changed by warmerdam

  • status changed from closed to reopened.
  • resolution deleted.

Ahmet Murat Ă–zdemiray reports that shapefile reading performance is painfully impacted in at least some cases by time spent in GEOS due to the new ring relationship computations. I'm reopening this bug report as we clearly need a more efficient approach for 1.5.1.

01/14/08 12:13:10 changed by warmerdam

Frank,

A dataset with performance problem is North America's country boundries dataset and can be downloaded from http://www.cipotato.org/diva/data/MoreData.htm. And the feature which takes most time is Canada which has approx. 70 polygons.

Best Regards, Murat Ozdemiray

01/20/08 07:09:31 changed by rouault

  • status changed from reopened to closed.
  • resolution set to fixed.
  • milestone changed from 1.5.0 to 1.5.1.

I've commited a performance fix in trunk in r13551 and in r13552 and in branches/1.5 in r13553 and r13554.

The good news is that we don't need GEOS anymore. The key point of the speed up is the following approximation : "polygon A is inside polygon B" if "a point of A is inside B". Although it's clearly an approximation, this should be correct in normal use cases (islands and lakes) of OGRGeometryFactory::organizePolygons. For the same reason, we don't test if polygons overlaps too.

As the ogr/ogr_shape.py autotest has been modified (results don't depend whether GEOS is available or not), a new autotest archive for GDAL 1.5.1 will be needed.

For debugging purposes, especially in case of broken data files in terms of polygon topology, I've introduced a OGR_DEBUG_ORGANIZE_POLYGONS configuration option that takes the slower path if GEOS is available.

Please note that the execution time of the isPointInRing method that is used now is linear in the number of points of the ring. For huge rings (> 100,000 points probably), there could be still performance problems. I've some ideas to speed up such cases, but they rely on being able to compute external (let call it E) and internal (let call it I) simplified coutours of a polygon, but I don't know implementations of such algorithms (there's a geos::simplify::TopologyPreservingSimplifier? class in GEOS, but it doesn't do what would be needed, and it's not exposed in GEOS C API). Let's assume that we can - quickly - compute E and I, the idea is that if a point is outside E, it's also outside the polygon. And if a point is inside I, it is also inside the polygon. In practise, this should let only very few cases (when the point is inside E and outside I) where we must test really if the point is inside the big polygon.