Opened 16 years ago

Closed 6 years ago

#2563 closed defect (fixed)

[PATCH] Fix 'Writing huge polygon can be very time consuming'

Reported by: richengel Owned by: warmerdam
Priority: normal Milestone:
Component: OGR_SF Version: unspecified
Severity: normal Keywords: shape writing
Cc:

Description

I tested a version of ogr2ogr which had elapsed time timers at key points in the code. I tested converting the nos80k shapefile (see issue #2428 - multipolygon with 43K parts and 2.5M point). I used the organizePolygons from r14734 with OGR_ORGANIZE_POLYGONS set to ONLY_CCW & get: Overall time: 1376.6 s Read time : 5.0 s (organizePolygons : 1.0 s) coordinate transform time : 1.7 s Write time : 1369.6 s - SHPRewindObject : 1368.5 s If one can assert that the geometry is correct is it necessary to perform SHPRewindObject?

Attachments (1)

shape2ogr_bug2563.patch (4.6 KB ) - added by Even Rouault 15 years ago.

Download all attachments as: .zip

Change History (7)

comment:1 by Even Rouault, 15 years ago

Component: defaultOGR_SF
Keywords: shape writing added
Summary: Writing huge polygon can be very time consuming[PATCH] Fix 'Writing huge polygon can be very time consuming'

FrankW, richengel,

yes I can reproduce the long time spent in SHPRewindObject when writing the huge multipolygon of http://coastalmap.marine.usgs.gov/GISdata/basemaps/coastlines/nos80k/nos80k.zip. SHPRewindObject has an algorithmic complexity of O(N x N x V) where N is the number of rings, and V the average number of vertices of a ring.

But it came to my mind that we don't need to call SHPRewindObject when we want to write a OGRPolygon or OGRMultiPolygon. Those polygons follow the OGC SF model, and thus we know that the first ring of a OGRPolygon is the outer ring, and all other following rings are inner rings. Thus the only thing to do is to ensure that the winding order is correct and if not reverting it.

Below, I'm attaching a patch that implements that idea and that I've tested successfully. To sum up, this is very similar to what SHPRewindObject() does, but where determining which is inner/outer comes for free.

I'd like to get FrankW's opinion on this before commiting because the key point of the patch is that all input polygons provided to SHPWriteOGRObject() follow the convention of first ring == outer ring. It might not be the case if some (broken) OGR drivers don't make that effort while reading multipart polygons.

by Even Rouault, 15 years ago

Attachment: shape2ogr_bug2563.patch added

comment:2 by Jukka Rahkonen, 9 years ago

I can't test how GDAL v. 2.0-dev behaves because the nos80k.zip file has disappeared.

comment:3 by richengel, 9 years ago

I have a copy of nos80k.zip do you want me to upload it (27 M zip) ?

comment:4 by Jukka Rahkonen, 9 years ago

If you upload it I can test how it works with new GDAL version. Or then you can test it yourself and report about your findings.

comment:5 by Jukka Rahkonen, 8 years ago

If I ever received the nos80k.zip then I have lost it. Can we try again?

comment:6 by Jukka Rahkonen, 6 years ago

Resolution: fixed
Status: newclosed

I took the the shapefile from https://pubs.usgs.gov/of/2014/1012/data/basemaps/nos80k/nos80k.zip. Despite the name it is not the same data, this had 8543 polygons and 759000 vertices. It is still a big multipolygon that I made from the data.

With GDAL 2.3-dev it takes about ten seconds to run

ogr2ogr -f "ESRI Shapefile" test.shp nos80k_as_multipolygon.shp --config OGR_ORGANIZE_POLYGONS DEFAULT

Not even near to original 1376.6 seconds. The issue seems to be fixed.

I consider that this ticket can be closed.

Note: See TracTickets for help on using tickets.