Opened 5 years ago

Closed 5 years ago

#2933 closed enhancement (fixed)

Speed up ST_GeomFromWKB

Reported by: strk Owned by: pramsey
Priority: medium Milestone: PostGIS 2.1.5
Component: postgis Version: 2.1.x
Keywords: history Cc:

Description

ST_GeomFromWKB takes 10+ secs to parse what ST_AsBinary output in <1 sec

The relation between output and input times seem exagerated to me. I could be wrong, but more than 10 times to parse than to produce sounds unbalanced. For comparison, Mapnik WKB parser takes half a second to parse the same WKB.

Change History (15)

comment:1 Changed 5 years ago by strk

For comparison, the same geometry as WKB takes 1.8 to write, 13.6 secs to parse (ST_AsText, ST_GeomFromText)

comment:2 Changed 5 years ago by nicklas

I am a little curious about this since I think I have seen some strange behaviors too but much smaler scale.

Could you describe what you are doing by showing the queries? In what cases is disc writing involved and in which is the output to the client involved?

comment:3 Changed 5 years ago by strk

The queries I'm running are EXPLAIN ANALYZE: [wkt]

ST_AsText(g) ST_GeomFromText(ST_AsText(g))

[wkb]

ST_AsBinary(g) ST_GeomFromWKB(ST_AsBinary(g))

From what I understand EXPLAIN ANALYZE does not run the canonical output functions so there's no cost computed for the output of final value.

comment:4 Changed 5 years ago by pramsey

Got an example of a monster WKB to try this out on?

comment:5 Changed 5 years ago by pramsey

Nevermind, I made a synthetic one. But I'm not seeing the same kind of behaviour.

-- make a 10M point linestring
create table bigassgeom as 
  select 
    1 as id,
    st_makeline(st_setsrid(st_makepoint(random()*10000, random()*10000),26910)) as geom 
  from generate_series(1,10000000);

-- how long is a basic access?
select st_numpoints(geom) from bigassgeom;
-- about 190ms

-- how long is an output to wkb?
select length(st_asbinary(geom)) from bigassgeom;
-- about 380ms

-- how long is a roundtrip through wkb?
select st_numpoints(st_geomfromwkb(st_asbinary(geom))) from bigassgeom;
-- about 850ms

So, parsing takes maybe 3-4 times as long as outputting, which seems fair, since it has to build structure on the way in, which doesn't happen on the way out. Also I'm not seeing anything like a 10s timing on anything, how big *are* these geometries you're producing?

comment:6 Changed 5 years ago by strk

It's a multipolygon with ~250,000 components with total number of vertices of ~1.5M. The biggest component polygon has ~1.2K vertices, only 6 have more than 400 vertices, the vast majority is below 10 vertices (it's coming from a polygonization of a raster)

comment:7 Changed 5 years ago by pramsey

Well, given that the linestring is "faster" my guess would be that if you put it in a profiler you'll see a lot of time spend in allocations, since the multipolygon is going to have lots of pointarray and lwgeom objects getting instantiated to hold the bits and pieces of your complex input. The linestring is far more pure, since it just needs to allocate the point storage buffer once and then just a few more reallocations as it doubles it each time it overflows. I should be able to duplicate your result by testing a multilinestring, will check.

comment:8 Changed 5 years ago by pramsey

Found the problem, it's a Schlemeil the Painter algorithm http://www.joelonsoftware.com/articles/fog0000000319.html in the geometry adding code for collections. http://trac.osgeo.org/postgis/browser/trunk/liblwgeom/lwcollection.c?rev=12198#L202

Every new geometry added to the collection requires revisiting all the previously added ones to check for duplicates. We either add a smarter structure to make that search faster or remove the duplicate check.

comment:9 Changed 5 years ago by strk

The check is to protect from coding errors as it ensures caller did not pass the same pointer twice (which would result in an ownership conflict). Can be safely disabled by default, maybe based on a PARANOIA_LEVEL macro so that debuggers can raise the panic.

comment:10 Changed 5 years ago by strk

NOTE: the check for subtype compatibility is also of this kind (paranoia)

comment:11 Changed 5 years ago by strk

Milestone: PostGIS 2.2.0
Resolution: fixed
Status: newclosed

Check for duplicated pointer only activated with PARANOIA_LEVEL > 1 as of r12993 -- subtype compatibility isn't that expensive to skip...

comment:12 Changed 5 years ago by strk

Milestone: PostGIS 2.2.0PostGIS 2.1.5
Version: trunk2.1.x

Backported to 2.1 branch with r12994

comment:13 Changed 5 years ago by strk

Keywords: history added

NOTE: this fix greately affected ST_Simplify (among who knows what else) so that simplifying a collection of ~250k components with a low tolerance (not much components being removed) now takes 1.3 seconds instead of 13 seconds.

comment:14 Changed 5 years ago by nicklas

Resolution: fixed
Status: closedreopened

When I tried the example sql query in the doc for ST_GeomFromWKB it takes about 350 ms.

Something must be wrong, or I have a corrupt db in someway.

I have r13013

SELECT ST_AsEWKT(
ST_GeomFromWKB(E'\\001\\002\\000\\000\\000\\002\\000\\000\\000\\037\\205\\353Q\\270~\\\\\\300\\323Mb\\020X\\231C@\\020X9\\264\\310~\\\\\\300)\\\\\\217\\302\\365\\230C@')
);

Maybe it should be a new ticket, but I thought it might be related to the fix?

comment:15 Changed 5 years ago by nicklas

Resolution: fixed
Status: reopenedclosed

Sorry

Forget it

I have compiled with debugging

Note: See TracTickets for help on using tickets.