Opened 8 years ago
Closed 7 years ago
#3321 closed defect (fixed)
Performance regression in topology load
|Reported by:||strk||Owned by:||strk|
I've tried loading some of the polygons from the famous 'million_poly_topo1' dataset which was shared some time ago on the mailing list: https://lists.osgeo.org/pipermail/postgis-devel/2014-January/024085.html
What I found is that trunk (2.2.0dev) is slower at loading them than 2.1 branch. I'm still trying to figure out why.
If you want to move this to a different milestone please use 2.2.1 (not available right now)
Change History (16)
comment:1 by , 8 years ago
comment:2 by , 8 years ago
|Milestone:||PostGIS 2.2.0 → PostGIS 2.2.1|
strk sadly I confirmed but I'm pushing this. But my windows box is faster than your Linux box )—
-- POSTGIS="2.1.8 r13780" GEOS="3.5.0-CAPI-1.9.0 r4090" PROJ="Rel. 4.8.0, 6 March 2012" GDAL="GDAL 1.11.1, released 2014/09/24" LIBXML="2.7.8" LIBJSON="UNKNOWN" TOPOLOGY RASTER Timing is: -- 4MB work_mem SELECT 155590/1000/60.0 - 2 min
-- POSTGIS="2.2.0dev r14180" GEOS="3.5.0-CAPI-1.9.0 r4090" PROJ="Rel. 4.9.1, 04 March 2015" GDAL="GDAL 2.0.1, released 2015/09/15" LIBXML="2.7.8" LIBJSON="0.12" TOPOLOGY RASTER (raster lib from "2.2.0dev r14198" need upgrade) SELECT 391802/1000/60.0 6.5min -- work_mem 4MB -- 2.2.0 SELECT 387853/1000/60.0 - 6.45 min -- work_mem 500MB -- 2.2.0
so sadly even upping the work mem didn't seem to help things mcuh.
comment:3 by , 8 years ago
For the record, my times were taken on a machine with an Intel® Core™ i7-4750HQ CPU @ 2.00GHz and using
time psql -f test.sql
comment:4 by , 8 years ago
The regression is in TopoGeo_addLinestring, same timing balance (pretty much same timing in general) with same input (via ST_ExteriorRing).
comment:5 by , 8 years ago
The time difference becomes worst when the topology is pre-populated. Here's the case after the first 20k polygons loaded:
-- limit 10k offset 20k real 8m17.509s [2.2.0dev] addLinestring -- limit 10k offset 20k real 1m53.791s [2.1.9dev] addLinestring
The 2.1.9 version loads 10k polygons in about half the time it loaded 20k before. The 2.2.0 version loads 10k polygons in more time than it loaded 20k before.
It might have to do with index misuse.
comment:6 by , 8 years ago
Performance regression seems to be in ST_AddEdgeModFace. Not easy to reproduce if not by touching the code to skip the call. By skipping the ST_AddEdgeModFace call 2.2.0 is actually faster than the equivalent code from 2.1.9
comment:7 by , 8 years ago
Culprit is the update of edges bounding the old face on face split
comment:8 by , 8 years ago
Ok here's the thing. The query for updating edges bounding the old face in 2.1 is a complex one involving both a geographical filter (edges that are contained [or not] in the new face) and an integer filter (edges that have left or right face matching). Doing so lets the planner pick the most selective of the two.
In 2.2 the current callbacks only allow for simpler queries so we're fetching _all_ edges having the face on a side and then doing the geographical filter "manually".
For this case, all polygons are disjoint and so they split the universe face. The query in 2.2 returns all previously added edges as they all have the universe face on the side. This explains the raising time problem.
comment:9 by , 8 years ago
I see a similar issue exists for the query updating nodes. Also there there's no spatial filter now, whereas there was before. The new callbacks used to fetch edges and nodes are "_getNodeByFace" and "_getEdgeByFace". Adding an optional bbox filter there might greatly help.
comment:10 by , 8 years ago
I've a patch ready. Seems to work fine. Requires adding an argument to one of the liblwgeom-topo callbacks. The only user is ok with that (and we never released 2.2.0 final anyway). Please give me some time to run checks on all bots and more manual ones.
comment:11 by , 8 years ago
With the patch the performance between 2.2.0 and 2.1.9 are back comparable (2.2.0 being actually faster):
-- limit 10k offset 20k real 3m48.034s [2.1.9dev] addLinestring postpatch -- limit 10k offset 20k real 3m3.227s [2.2.0dev] addLinestring postpatch
Don't ask me how I previously handled to see 2.1.9 taking ~2 minutes, maybe I was just tired and did something stupid like running a modified version.
comment:12 by , 8 years ago
r14225 in trunk (2.3.0, I'm afraid). I'd backport to 2.2 branch now, but.. it changes the liblwgeom ABI/API
comment:13 by , 8 years ago
Ok I've backported with r14226 contextually changing the liblwgeom soname to correctly reflect the breakage of ABI.
Note that this isn't different from previous versions where the liblwgeom soname always included up to the patch-level version.
comment:14 by , 7 years ago
strk sounds like you've fixed this. Can we close it?
comment:15 by , 7 years ago
comment:16 by , 7 years ago
|Status:||new → closed|
Now that #3416 is sorted out I believe we can close this.