Opened 9 years ago
Closed 9 years ago
#2933 closed enhancement (fixed)
Speed up ST_GeomFromWKB
|Reported by:||strk||Owned by:||pramsey|
ST_GeomFromWKB takes 10+ secs to parse what ST_AsBinary output in <1 sec
The relation between output and input times seem exagerated to me. I could be wrong, but more than 10 times to parse than to produce sounds unbalanced. For comparison, Mapnik WKB parser takes half a second to parse the same WKB.
Change History (15)
comment:1 by , 9 years ago
comment:2 by , 9 years ago
I am a little curious about this since I think I have seen some strange behaviors too but much smaler scale.
Could you describe what you are doing by showing the queries? In what cases is disc writing involved and in which is the output to the client involved?
comment:3 by , 9 years ago
The queries I'm running are EXPLAIN ANALYZE: [wkt]
From what I understand EXPLAIN ANALYZE does not run the canonical output functions so there's no cost computed for the output of final value.
comment:4 by , 9 years ago
Got an example of a monster WKB to try this out on?
comment:5 by , 9 years ago
Nevermind, I made a synthetic one. But I'm not seeing the same kind of behaviour.
-- make a 10M point linestring create table bigassgeom as select 1 as id, st_makeline(st_setsrid(st_makepoint(random()*10000, random()*10000),26910)) as geom from generate_series(1,10000000); -- how long is a basic access? select st_numpoints(geom) from bigassgeom; -- about 190ms -- how long is an output to wkb? select length(st_asbinary(geom)) from bigassgeom; -- about 380ms -- how long is a roundtrip through wkb? select st_numpoints(st_geomfromwkb(st_asbinary(geom))) from bigassgeom; -- about 850ms
So, parsing takes maybe 3-4 times as long as outputting, which seems fair, since it has to build structure on the way in, which doesn't happen on the way out. Also I'm not seeing anything like a 10s timing on anything, how big *are* these geometries you're producing?
comment:6 by , 9 years ago
It's a multipolygon with ~250,000 components with total number of vertices of ~1.5M. The biggest component polygon has ~1.2K vertices, only 6 have more than 400 vertices, the vast majority is below 10 vertices (it's coming from a polygonization of a raster)
comment:7 by , 9 years ago
Well, given that the linestring is "faster" my guess would be that if you put it in a profiler you'll see a lot of time spend in allocations, since the multipolygon is going to have lots of pointarray and lwgeom objects getting instantiated to hold the bits and pieces of your complex input. The linestring is far more pure, since it just needs to allocate the point storage buffer once and then just a few more reallocations as it doubles it each time it overflows. I should be able to duplicate your result by testing a multilinestring, will check.
comment:8 by , 9 years ago
Found the problem, it's a Schlemeil the Painter algorithm http://www.joelonsoftware.com/articles/fog0000000319.html in the geometry adding code for collections. http://trac.osgeo.org/postgis/browser/trunk/liblwgeom/lwcollection.c?rev=12198#L202
Every new geometry added to the collection requires revisiting all the previously added ones to check for duplicates. We either add a smarter structure to make that search faster or remove the duplicate check.
comment:9 by , 9 years ago
The check is to protect from coding errors as it ensures caller did not pass the same pointer twice (which would result in an ownership conflict). Can be safely disabled by default, maybe based on a PARANOIA_LEVEL macro so that debuggers can raise the panic.
comment:10 by , 9 years ago
NOTE: the check for subtype compatibility is also of this kind (paranoia)
comment:11 by , 9 years ago
|Milestone:||→ PostGIS 2.2.0|
|Status:||new → closed|
Check for duplicated pointer only activated with PARANOIA_LEVEL > 1 as of r12993 — subtype compatibility isn't that expensive to skip…
comment:12 by , 9 years ago
|Milestone:||PostGIS 2.2.0 → PostGIS 2.1.5|
|Version:||trunk → 2.1.x|
Backported to 2.1 branch with r12994
comment:13 by , 9 years ago
NOTE: this fix greately affected ST_Simplify (among who knows what else) so that simplifying a collection of ~250k components with a low tolerance (not much components being removed) now takes 1.3 seconds instead of 13 seconds.
comment:14 by , 9 years ago
|Status:||closed → reopened|
When I tried the example sql query in the doc for ST_GeomFromWKB it takes about 350 ms.
Something must be wrong, or I have a corrupt db in someway.
I have r13013
SELECT ST_AsEWKT( ST_GeomFromWKB(E'\\001\\002\\000\\000\\000\\002\\000\\000\\000\\037\\205\\353Q\\270~\\\\\\300\\323Mb\\020X\\231C@\\020X9\\264\\310~\\\\\\300)\\\\\\217\\302\\365\\230C@') );
Maybe it should be a new ticket, but I thought it might be related to the fix?
comment:15 by , 9 years ago
|Status:||reopened → closed|
I have compiled with debugging
For comparison, the same geometry as WKB takes 1.8 to write, 13.6 secs to parse (ST_AsText, ST_GeomFromText)