Opened 15 years ago
Closed 6 years ago
#2563 closed defect (fixed)
[PATCH] Fix 'Writing huge polygon can be very time consuming'
Reported by: | richengel | Owned by: | warmerdam |
---|---|---|---|
Priority: | normal | Milestone: | |
Component: | OGR_SF | Version: | unspecified |
Severity: | normal | Keywords: | shape writing |
Cc: |
Description
I tested a version of ogr2ogr which had elapsed time timers at key points in the code. I tested converting the nos80k shapefile (see issue #2428 - multipolygon with 43K parts and 2.5M point). I used the organizePolygons from r14734 with OGR_ORGANIZE_POLYGONS set to ONLY_CCW & get: Overall time: 1376.6 s Read time : 5.0 s (organizePolygons : 1.0 s) coordinate transform time : 1.7 s Write time : 1369.6 s - SHPRewindObject : 1368.5 s If one can assert that the geometry is correct is it necessary to perform SHPRewindObject?
Attachments (1)
Change History (7)
comment:1 by , 15 years ago
Component: | default → OGR_SF |
---|---|
Keywords: | shape writing added |
Summary: | Writing huge polygon can be very time consuming → [PATCH] Fix 'Writing huge polygon can be very time consuming' |
by , 15 years ago
Attachment: | shape2ogr_bug2563.patch added |
---|
comment:2 by , 9 years ago
I can't test how GDAL v. 2.0-dev behaves because the nos80k.zip file has disappeared.
comment:4 by , 9 years ago
If you upload it I can test how it works with new GDAL version. Or then you can test it yourself and report about your findings.
comment:6 by , 6 years ago
Resolution: | → fixed |
---|---|
Status: | new → closed |
I took the the shapefile from https://pubs.usgs.gov/of/2014/1012/data/basemaps/nos80k/nos80k.zip. Despite the name it is not the same data, this had 8543 polygons and 759000 vertices. It is still a big multipolygon that I made from the data.
With GDAL 2.3-dev it takes about ten seconds to run
ogr2ogr -f "ESRI Shapefile" test.shp nos80k_as_multipolygon.shp --config OGR_ORGANIZE_POLYGONS DEFAULT
Not even near to original 1376.6 seconds. The issue seems to be fixed.
I consider that this ticket can be closed.
FrankW, richengel,
yes I can reproduce the long time spent in SHPRewindObject when writing the huge multipolygon of http://coastalmap.marine.usgs.gov/GISdata/basemaps/coastlines/nos80k/nos80k.zip. SHPRewindObject has an algorithmic complexity of O(N x N x V) where N is the number of rings, and V the average number of vertices of a ring.
But it came to my mind that we don't need to call SHPRewindObject when we want to write a OGRPolygon or OGRMultiPolygon. Those polygons follow the OGC SF model, and thus we know that the first ring of a OGRPolygon is the outer ring, and all other following rings are inner rings. Thus the only thing to do is to ensure that the winding order is correct and if not reverting it.
Below, I'm attaching a patch that implements that idea and that I've tested successfully. To sum up, this is very similar to what SHPRewindObject() does, but where determining which is inner/outer comes for free.
I'd like to get FrankW's opinion on this before commiting because the key point of the patch is that all input polygons provided to SHPWriteOGRObject() follow the convention of first ring == outer ring. It might not be the case if some (broken) OGR drivers don't make that effort while reading multipart polygons.