Context Navigation

← Previous Ticket
Next Ticket →

#1165 closed defect (fixed)

performance degradation between 1.5.3 and 2.0.0SVN

Reported by:	edoardopanfili	Owned by:	strk
Priority:	blocker	Milestone:	PostGIS 2.0.0
Component:	postgis	Version:	master
Keywords:		Cc:

Description

In am using postgres 8.4.8 and postgis 1.5.3 on Mac OS X 10.7.1

Using this query [q1]

SELECT specienomi.nome, cartellino.cont_nome,
    ST_AsGML(cartellino.the_geom)
FROM cartellino, specienomi, confini_regioni
WHERE confini_regioni.regione='UMBRIA'
    AND specienomi.nome like 'Quercus%'
    AND cartellino.idspecie=specienomi.id
    AND ST_Intersects(cartellino.the_geom,confini_regioni.the_geom4326);

this is the plan:

 Sort  (cost=20.45..20.46 rows=1 width=931) (actual time=4457.775..4457.786 rows=76 loops=1)
   Sort Key: cartellino.id
   Sort Method:  quicksort  Memory: 74kB
   ->  Hash Join  (cost=8.32..20.44 rows=1 width=931) (actual time=243.679..4457.658 rows=76 loops=1)
         Hash Cond: (cartellino.idspecie = principale.id)
         ->  Nested Loop  (cost=0.00..9.81 rows=614 width=886) (actual time=4.094..4439.024 rows=18370 loops=1)
               Join Filter: _st_intersects(cartellino.the_geom, confini_regioni.the_geom4326)
               ->  Seq Scan on confini_regioni  (cost=0.00..1.25 rows=1 width=1473036) (actual time=0.017..0.021 rows=1 loops=1)
                     Filter: ((regione)::text = 'UMBRIA'::text)
               ->  Index Scan using cartellino_punto_geom_4326 on cartellino  (cost=0.00..8.30 rows=1 width=886) (actual time=0.059..94.148 rows=32200 loops=1)
                     Index Cond: (cartellino.the_geom && confini_regioni.the_geom4326)
         ->  Hash  (cost=8.28..8.28 rows=3 width=57) (actual time=0.392..0.392 rows=74 loops=1)
               ->  Index Scan using i_specie_nome_specie_like on specie principale  (cost=0.01..8.28 rows=3 width=57) (actual time=0.034..0.348 rows=74 loops=1)
                     Index Cond: ((esterna_nome(ibrido, proparte, (genere)::text, [...]  (cultivar)::text) ~>=~ 'Quercus'::text) AND (esterna_nome(ibrido, proparte, (genere)::text, [...] (cultivar)::text) ~<~ 'Quercut'::text))
                     Filter: (esterna_nome(ibrido, proparte, (genere)::text, [...] (cultivar)::text) ~~ 'Quercus%'::text)
 Total runtime: 4481.933 ms

My target is to obtain (cartellino join specie) join confini_regioni and not (cartellino join confini_regioni) join specie

and this is possible with [q2]

WITH temp_que AS (
    SELECT specienomi.nome AS nome,
        cartellino.cont_nome AS cont_nome,
        cartellino.id AS id, the_geom
    FROM cartellino, specienomi
    WHERE specienomi.nome like 'Quercus %'
        AND cartellino.idspecie=specienomi.id
)
SELECT temp_que.nome, temp_que.cont_nome, ST_AsGML(temp_que.the_geom)
FROM temp_que, confini_regioni
WHERE confini_regioni.regione='UMBRIA'
AND ST_Intersects(temp_que.the_geom,confini_regioni.the_geom4326);

The time is 45.026 ms

If I try with

set join_collapse_limit=1;
SELECT specienomi.nome, ST_AsGML(cartellino.the_geom)
FROM confini_regioni full JOIN (
    cartellino full JOIN specienomi ON
    (cartellino.idspecie=specienomi.id)) ON
    ST_Intersects(cartellino.the_geom,confini_regioni.the_geom4326)
WHERE confini_regioni.regione='UMBRIA'
AND specienomi.nome like 'Quercus%'

is slow: 5750.499 ms and NOTICE: LWGEOM_gist_joinsel called with incorrect join type

I did a try with 2.0SVN but I think that there are problems with my database conversion because all the queryes are slower (q1=85931.595 ms and q2=589.595 ms)

This is the thread on postgresl mailing list: http://archives.postgresql.org/pgsql-general/2011-08/msg00664.php

Change History (23)

comment:1 by robe, 13 years ago

Edoardo, join selectivity with LEFT JOINs is a known issue. FULL JOINS on a spatial function aren't supported at all. As far as the join selectivity issues, I'll let Paul or Mark comment on those since I'm not sure how all that works. I do know I've had a lot of selectivity issues (particularly with tiger geocoder) which I have worked around by using WITH queries to control the plan as you have. I wasn't really sure where to put the blame since a lot of those were just plain attribute queries.

I am more concerned about your 2.0 issues. Which subversion revision number are you using of PostGIS 2.0? Are the plans the same as on your 1.5? Is your 2.0 installed on the same version of PostgreSQL as your 1.5?

I've loaded a couple of my production databases using PostGIS 2.0 (about 2-3 week old release on Windows), and haven't noticed any major noticeable differences. Some things are a slower but not by much and some things are faster (which seems to be more to do with the higher accuracy of the bounding box in 2.0)

Regina

comment:2 by edoardopanfili, 13 years ago

I redo a fresh postgis 2.0.0SVN 7774 installation. After installation I did an hard-upgrade to update my db then a "vacuum analyze". If it is usefull I can post the single steps.

All the part (potgres 8.4.8, postgis 1.5.3, postgis 2.0.0SVN 7774) are compiled in Mac OS X 10.7.1 using (gcc -version)
i686-apple-darwin11-llvm-gcc-4.2 (GCC) 4.2.1 (Based on Apple Inc. build 5658) (LLVM build 2335.15.00)

the plan for the query in postgis 2.0.0SVN 7774 is:

 Sort  (cost=32.66..32.73 rows=29 width=944) (actual time=89371.383..89371.389 rows=76 loops=1)
   Sort Key: cartellino.id
   Sort Method:  quicksort  Memory: 74kB
   ->  Hash Join  (cost=12.27..31.95 rows=29 width=944) (actual time=1780.217..89371.271 rows=76 loops=1)
         Hash Cond: (cartellino.idspecie = principale.id)
         ->  Nested Loop  (cost=0.00..9.80 rows=605 width=899) (actual time=7.109..89348.115 rows=18370 loops=1)
               Join Filter: _st_intersects(cartellino.the_geom, confini_regioni.the_geom4326)
               ->  Seq Scan on confini_regioni  (cost=0.00..1.25 rows=1 width=1473539) (actual time=0.014..0.018 rows=1 loops=1)
                     Filter: ((regione)::text = 'UMBRIA'::text)
               ->  Index Scan using cartellino_punto_geom_4326 on cartellino  (cost=0.00..8.29 rows=1 width=899) (actual time=0.118..100.269 rows=32200 loops=1)
                     Index Cond: (cartellino.the_geom && confini_regioni.the_geom4326)
         ->  Hash  (cost=8.28..8.28 rows=319 width=57) (actual time=0.595..0.595 rows=74 loops=1)
               ->  Index Scan using i_specie_nome_specie_like on specie principale  (cost=0.01..8.28 rows=319 width=57) (actual time=0.064..0.526 rows=74 loops=1)
                     Index Cond: ((esterna_nome(ibrido, proparte, (genere)::text, [...] (cultivar)::text) ~<~ 'Quercut'::text))
                     Filter: (esterna_nome(ibrido, proparte, (genere)::text, [...] (cultivar)::text) ~~ 'Quercus%'::text)
Total runtime: 89371.646 ms

comment:3 by robe, 13 years ago

The plans look pretty much the same except in the 2.0 the cost estimate for this part is much worse than 1.5. I assume you already did a vacuum analyze of the tables.

               ->  Index Scan using i_specie_nome_specie_like on specie principale  (cost=0.01..8.28 rows=319 width=57) (actual time=0.064..0.526 rows=74 loops=1)

Can you remove the ST_AsGML just to rule out a memory leak in that function and tell me what the timings are without that.

Also would it be possible to provide us the data. If its private, you can send me download info at lr at pcorp.us and I'll test it against my 1.5 and 2.0 to see if I see similar issues or if its specific to your platform.

Thanks, Regina

comment:4 by edoardopanfili, 13 years ago

In the meanwhile I discovered a bug on my application: in some case "the_geom" column contains invalid data: a circle that extends over a radius of 100 degree, now it covers 100 metres. I did a try to simplify the query (remove some parts unusefull for the test case).

This test does not depends on my database structure:
1- create a new table

create table gtest (id int);
SELECT AddGeometryColumn('gtest', 'the_geom', 4326, 'POLYGON', 2);

2- populate it with a simple sequence of pseudo-random values (I used this script):

#!/usr/bin/perl

$radius = 4; #radius is "3", something more -> 4

for($n=0; $n<10000 ; $n++){
	$x = 10 + ((rand()*$radius*2) - $radius);
	$y = 42 + ((rand()*$radius*2) - $radius);
	print "INSERT INTO gtest (id,the_geom) VALUES ($n,ST_Buffer(ST_GeomFromText('POINT($x $y)',4326),0.01,10));\n";
}

3- execute this query:

EXPLAIN ANALYZE
SELECT count(*)
FROM gtest
WHERE ST_Intersects(gtest.the_geom,ST_Buffer(ST_GeomFromText('POINT(10 42)',4326),3,2));

Without any index

POSTGIS="2.0.0SVN" GEOS="3.2.2-CAPI-1.6.2" PROJ="Rel. 4.7.1, 23 September 2009" LIBXML="2.7.3" USE_STATS

 Aggregate  (cost=4448.70..4448.71 rows=1 width=0) (actual time=307.998..307.998 rows=1 loops=1)
   ->  Seq Scan on gtest  (cost=0.00..4444.00 rows=1880 width=0) (actual time=4.435..307.260 rows=4012 loops=1)
         Filter: ((the_geom && '0103000020E610000001000000090000000000000000002A40000000000000454038DB37B31D3E28403309329378F043400300000000002440000000000080434099499099C4831F403209329378F043400000000000001C40FFFFFFFFFFFF44408A499099C4831F40CCF6CD6C870F4640F8FFFFFFFFFF2340000000000080464030DB37B31D3E2840CFF6CD6C870F46400000000000002A400000000000004540'::geometry) AND _st_intersects(the_geom, '0103000020E610000001000000090000000000000000002A40000000000000454038DB37B31D3E28403309329378F043400300000000002440000000000080434099499099C4831F403209329378F043400000000000001C40FFFFFFFFFFFF44408A499099C4831F40CCF6CD6C870F4640F8FFFFFFFFFF2340000000000080464030DB37B31D3E2840CFF6CD6C870F46400000000000002A400000000000004540'::geometry))
 Total runtime: 308.047 ms

POSTGIS="1.5.3" GEOS="3.2.2-CAPI-1.6.2" PROJ="Rel. 4.7.1, 23 September 2009" LIBXML="2.7.3" USE_STATS

 Aggregate  (cost=3539.70..3539.71 rows=1 width=0) (actual time=52.061..52.061 rows=1 loops=1)
   ->  Seq Scan on gtest  (cost=0.00..3535.00 rows=1880 width=0) (actual time=0.188..51.524 rows=4012 loops=1)
         Filter: ((the_geom && '0103000020E610000001000000090000000000000000002A40000000000000454038DB37B31D3E28403309329378F043400300000000002440000000000080434099499099C4831F403209329378F043400000000000001C40FFFFFFFFFFFF44408A499099C4831F40CCF6CD6C870F4640F8FFFFFFFFFF2340000000000080464030DB37B31D3E2840CFF6CD6C870F46400000000000002A400000000000004540'::geometry) AND _st_intersects(the_geom, '0103000020E610000001000000090000000000000000002A40000000000000454038DB37B31D3E28403309329378F043400300000000002440000000000080434099499099C4831F403209329378F043400000000000001C40FFFFFFFFFFFF44408A499099C4831F40CCF6CD6C870F4640F8FFFFFFFFFF2340000000000080464030DB37B31D3E2840CFF6CD6C870F46400000000000002A400000000000004540'::geometry))
 Total runtime: 53.451 ms

My system uname: Darwin shadowfax.local 11.1.0 Darwin Kernel Version 11.1.0: Tue Jul 26 16:07:11 PDT 2011; root:xnu-1699.22.81~1/RELEASE_X86_64 x86_64

Tomorrow I can repeat the test using Linux.

comment:5 by robe, 13 years ago

Milestone:	PostGIS 1.5.4 → PostGIS 2.0.0
Priority:	medium → blocker

Okay I'm seeing a problem too. I tried setting up your test but using pure sql and things seem much worse with my last night build than my 8/3/2011 build.

Here is what I have (note I have postgis 1.5.2, 2.0.0 on same server cluster in same table space with all same memory configurations:

 create table gtest (id int);
 SELECT AddGeometryColumn('gtest', 'the_geom', 4326, 'POLYGON', 2);

INSERT INTO gtest (id,the_geom) 
   SELECT n,ST_Buffer(ST_SetSRID(ST_Point(10 + ((random()*radius*2) - radius),42 + ((random()*radius*2) - radius)),4326),0.01,10)
   FROM generate_series(0, 10000 - 1) As n CROSS JOIN (SELECT 4 As radius) As foo;

vacuum analyze gtest;
 EXPLAIN ANALYZE 
 SELECT count(*)
 FROM gtest
 WHERE ST_Intersects(gtest.the_geom,ST_Buffer(ST_GeomFromText('POINT(10
 42)',4326),3,2));

-- my 1.5.2: POSTGIS="1.5.2" GEOS="3.3.1dev-CAPI-1.7.1" PROJ="Rel. 4.6.1, 21 August 2008" LIBXML="2.7.6" USE_STATS	PostgreSQL 9.0.4, compiled by Visual C++ build 1500, 32-bit
Aggregate  (cost=7179.77..7179.78 rows=1 width=0) (actual time=178.601..178.602 rows=1 loops=1)
  ->  Seq Scan on gtest  (cost=0.00..7175.00 rows=1908 width=0) (actual time=0.717..175.485 rows=4021 loops=1)
        Filter: ((the_geom && '0103000020E610000001000000090000000000000000002A40000000000000454037DB37B31D3E28403309329378F043400300000000002440000000000080434099499099C4831F403209329378F043400000000000001C40FFFFFFFFFFFF44408A499099C4831F40CCF6CD6C870F4640F8FFFFFFFFFF2340000000000080464030DB37B31D3E2840CFF6CD6C870F46400000000000002A400000000000004540'::geometry) AND _st_intersects(the_geom, '0103000020E610000001000000090000000000000000002A40000000000000454037DB37B31D3E28403309329378F043400300000000002440000000000080434099499099C4831F403209329378F043400000000000001C40FFFFFFFFFFFF44408A499099C4831F40CCF6CD6C870F4640F8FFFFFFFFFF2340000000000080464030DB37B31D3E2840CFF6CD6C870F46400000000000002A400000000000004540'::geometry))
Total runtime: 178.740 ms

-- This is my 8/3/2011 build
-- POSTGIS="2.0.0SVN" GEOS="3.3.1dev-CAPI-1.7.1" PROJ="Rel. 4.6.1, 21 August 2008" LIBXML="2.7.8" USE_STATS PostgreSQL 9.0.4, compiled by Visual C++ build 1500, 32-bit
-- before I upgraded to trunk--
Aggregate  (cost=7179.78..7179.79 rows=1 width=0) (actual time=174.178..174.179 rows=1 loops=1)
  ->  Seq Scan on gtest  (cost=0.00..7175.00 rows=1912 width=0) (actual time=0.254..170.930 rows=4049 loops=1)
        Filter: ((the_geom && '0103000020E610000001000000090000000000000000002A40000000000000454037DB37B31D3E28403309329378F043400300000000002440000000000080434099499099C4831F403209329378F043400000000000001C40FFFFFFFFFFFF44408A499099C4831F40CCF6CD6C870F4640F8FFFFFFFFFF2340000000000080464030DB37B31D3E2840CFF6CD6C870F46400000000000002A400000000000004540'::geometry) AND _st_intersects(the_geom, '0103000020E610000001000000090000000000000000002A40000000000000454037DB37B31D3E28403309329378F043400300000000002440000000000080434099499099C4831F403209329378F043400000000000001C40FFFFFFFFFFFF44408A499099C4831F40CCF6CD6C870F4640F8FFFFFFFFFF2340000000000080464030DB37B31D3E2840CFF6CD6C870F46400000000000002A400000000000004540'::geometry))
Total runtime: 174.295 ms

— my 8/22 build — yikes

Aggregate  (cost=7179.75..7179.76 rows=1 width=0) (actual time=760.503..760.504 rows=1 loops=1)
  ->  Seq Scan on gtest  (cost=0.00..7175.00 rows=1899 width=0) (actual time=0.242..756.195 rows=4064 loops=1)
        Filter: ((the_geom && '0103000020E610000001000000090000000000000000002A40000000000000454037DB37B31D3E28403309329378F043400300000000002440000000000080434099499099C4831F403209329378F043400000000000001C40FFFFFFFFFFFF44408A499099C4831F40CCF6CD6C870F4640F8FFFFFFFFFF2340000000000080464030DB37B31D3E2840CFF6CD6C870F46400000000000002A400000000000004540'::geometry) AND _st_intersects(the_geom, '0103000020E610000001000000090000000000000000002A40000000000000454037DB37B31D3E28403309329378F043400300000000002440000000000080434099499099C4831F403209329378F043400000000000001C40FFFFFFFFFFFF44408A499099C4831F40CCF6CD6C870F4640F8FFFFFFFFFF2340000000000080464030DB37B31D3E2840CFF6CD6C870F46400000000000002A400000000000004540'::geometry))
Total runtime: 760.648 ms

comment:6 by robe, 13 years ago

To rule out the random data issue I swapped my dlls and my 8/03 with the random generated 8/22

finishes in: so it seems something has gone horribly wrong between 8/3 and 8/21

-- 8/3/2011 compiled version --
Aggregate  (cost=7179.77..7179.78 rows=1 width=0) (actual time=190.204..190.205 rows=1 loops=1)
  ->  Seq Scan on gtest  (cost=0.00..7175.00 rows=1907 width=0) (actual time=0.523..186.938 rows=4000 loops=1)
        Filter: ((the_geom && '0103000020E610000001000000090000000000000000002A40000000000000454037DB37B31D3E28403309329378F043400300000000002440000000000080434099499099C4831F403209329378F043400000000000001C40FFFFFFFFFFFF44408A499099C4831F40CCF6CD6C870F4640F8FFFFFFFFFF2340000000000080464030DB37B31D3E2840CFF6CD6C870F46400000000000002A400000000000004540'::geometry) AND _st_intersects(the_geom, '0103000020E610000001000000090000000000000000002A40000000000000454037DB37B31D3E28403309329378F043400300000000002440000000000080434099499099C4831F403209329378F043400000000000001C40FFFFFFFFFFFF44408A499099C4831F40CCF6CD6C870F4640F8FFFFFFFFFF2340000000000080464030DB37B31D3E2840CFF6CD6C870F46400000000000002A400000000000004540'::geometry))
Total runtime: 190.296 ms

— 8/21/ compiled version —

Aggregate  (cost=7179.77..7179.78 rows=1 width=0) (actual time=774.536..774.537 rows=1 loops=1)
  ->  Seq Scan on gtest  (cost=0.00..7175.00 rows=1907 width=0) (actual time=0.523..770.348 rows=4000 loops=1)
        Filter: ((the_geom && '0103000020E610000001000000090000000000000000002A40000000000000454037DB37B31D3E28403309329378F043400300000000002440000000000080434099499099C4831F403209329378F043400000000000001C40FFFFFFFFFFFF44408A499099C4831F40CCF6CD6C870F4640F8FFFFFFFFFF2340000000000080464030DB37B31D3E2840CFF6CD6C870F46400000000000002A400000000000004540'::geometry) AND _st_intersects(the_geom, '0103000020E610000001000000090000000000000000002A40000000000000454037DB37B31D3E28403309329378F043400300000000002440000000000080434099499099C4831F403209329378F043400000000000001C40FFFFFFFFFFFF44408A499099C4831F40CCF6CD6C870F4640F8FFFFFFFFFF2340000000000080464030DB37B31D3E2840CFF6CD6C870F46400000000000002A400000000000004540'::geometry))
Total runtime: 774.601 ms

perhaps a memory leak somewhere.

comment:7 by pramsey, 13 years ago

Can you confirm that both your 8/03 and 8/22 builds are running with GSERIALIZED_ON?

comment:8 by robe, 13 years ago

what's the easiest way to tell once its compiled. I'm pretty sure its ON since its the one I've been testing with where I said the size of the geometries was bigger. Wouldn't a bigger geometry suggest GSERIALIZED ON or is there a better way.

I also assume it is because I remember when I swapped binaries with one GSERIALIZE ON or off I get weird errors since the two geometry types are incompatible.

comment:9 by pramsey, 13 years ago

Yes, the mem size of a point should be larger under gserialized than under serialized_form.

comment:10 by robe, 13 years ago

Paul or anyone for that matter, have you compared 1.5 to current 2.0 to determine it's not just a platform issue on my end.

Like I mentioned things were good around 8/3 and when I did my 1.5 / PostGIS 2.0 (gserialized compare), so I don't think gserialized is to blame here.

Things went downhill I would estimate around the time when we decided to break everything into separate folders and people started having trouble compiling.

I would backtrack my steps if I could, but unfortunately I think between 8/3 and 8/17 or so I wasn't able to compile trunk at all.

comment:11 by robe, 13 years ago

Guys,

Can other people PLEASE run the above simple test to confirm we have significantly slowed PostGIS 2.0 down before we do anything else.

I want the speed back we enjoyed after we had switched to gserialized and I did my preliminary gserialized on benchmarks. Like I said — I blame this slowness on all the liblwgeom/libpgcommon shuffling.

I just retested on my local workstation (1.5.3 vs. my newly compiled 2.0 on the same PostGIS 9.0 instances and PostGIS 2.0 is 3 times slower).

I had also tested my more intensive database test after all the shuffling and speed was similarly much slower for my benchmarks. My reference database backup has grown to about 30GB now and I'm not so kin on doing 2 200GB parallel tests until this simple test passes.

So I reiterate

My 1.5.3 install on same PostgreSQL 9.0 instance:
Aggregate  (cost=3539.62..3539.63 rows=1 width=0) (actual time=133.856..133.856 rows=1 loops=1)
  ->  Seq Scan on gtest  (cost=0.00..3535.00 rows=1847 width=0) (actual time=1.645..132.722 rows=3987 loops=1)
        Filter: ((the_geom && '0103000020E610000001000000090000000000000000002A40000000000000454037DB37B31D3E28403309329378F043400300000000002440000000000080434099499099C4831F403209329378F043400000000000001C40FFFFFFFFFFFF44408A499099C4831F40CCF6CD6C870F4640F8FFFFFFFFFF2340000000000080464030DB37B31D3E2840CFF6CD6C870F46400000000000002A400000000000004540'::geometry) AND _st_intersects(the_geom, '0103000020E610000001000000090000000000000000002A40000000000000454037DB37B31D3E28403309329378F043400300000000002440000000000080434099499099C4831F403209329378F043400000000000001C40FFFFFFFFFFFF44408A499099C4831F40CCF6CD6C870F4640F8FFFFFFFFFF2340000000000080464030DB37B31D3E2840CFF6CD6C870F46400000000000002A400000000000004540'::geometry))
Total runtime: 133.911 ms

PostGIS 2.0 r7840

 Aggregate  (cost=3539.72..3539.74 rows=1 width=0) (actual time=389.212..389.212 rows=1 loops=1)
  ->  Seq Scan on gtest  (cost=0.00..3535.00 rows=1889 width=0) (actual time=0.567..387.914 rows=3998 loops=1)
        Filter: ((the_geom && '0103000020E610000001000000090000000000000000002A40000000000000454037DB37B31D3E28403309329378F043400300000000002440000000000080434099499099C4831F403209329378F043400000000000001C40FFFFFFFFFFFF44408A499099C4831F40CCF6CD6C870F4640F8FFFFFFFFFF2340000000000080464030DB37B31D3E2840CFF6CD6C870F46400000000000002A400000000000004540'::geometry) AND _st_intersects(the_geom, '0103000020E610000001000000090000000000000000002A40000000000000454037DB37B31D3E28403309329378F043400300000000002440000000000080434099499099C4831F403209329378F043400000000000001C40FFFFFFFFFFFF44408A499099C4831F40CCF6CD6C870F4640F8FFFFFFFFFF2340000000000080464030DB37B31D3E2840CFF6CD6C870F46400000000000002A400000000000004540'::geometry))
Total runtime: 389.256 ms

comment:12 by strk, 13 years ago

I blame this slowness on all the liblwgeom/libpgcommon shuffling.

This is false advertisement ! Go back to a pre-libpgcommon/liblwgeom snapshot and banchmark it. Then, if still possible, try with and w/out GSERIALIZED.

Even better, bisect the repository (git has a nice support for this).

I'll bet on forced recomputation of bounding box takin up the most time (I belive I've seen a spot where box3d are continuously computed rather than cached and I reported on list IIRC w/out replies)

comment:13 by robe, 13 years ago

Like I said strk. The last successful build I was able to do after gserialized and before this liblgeom/libpgco… was around the 8/3 one I quoted and that was about the same speed as the 1.5 builds. For 2 weeks after I was unable to compile and when I came out the first test I did was this one which proved much slower. If I swap back to my 8/3 build things are fast again. I'll see if I can make a lter build to see if I can get closer in time when this switch in performance happened.

I fear though that it was at a time when I couldn't compile, so I really count on one of you to do this since you all were able to compile during this period of flux.

strk — I ask the question have you tested your 1.5 against current and do you see hte same effect or is it something isolated that this reporter and I see? or do you not ahve a parallel environment to test.

comment:14 by strk, 13 years ago

Owner:	changed from pramsey to strk
Status:	new → assigned
Summary:	Selectivity extimation of && operator → performance degradation between 1.5.3 and 2.0.0SVN

I changed the ticket summary as the estimation in original report seems to be the same, so shouldnt' have _any_ role in the slowdown.

comment:15 by darkblueb, 13 years ago

confirm test results

Linux tone 2.6.32-5-amd64 #1 SMP Tue Jun 14 09:42:28 UTC 2011 x86_64 GNU/Linux
AMD Opteron(tm) Processor 244
Debian 6

=========================
POSTGIS="1.5.3" GEOS="3.4.0dev-CAPI-1.8.0" PROJ="Rel. 4.7.1, 23 September 2009" LIBXML="2.7.8" USE_STATS (procs from 1.5 r5976 need upgrade)
--
 Aggregate  (cost=3539.74..3539.75 rows=1 width=0) (actual time=116.831..116.832 rows=1 loops=1)
   ->  Seq Scan on gtest  (cost=0.00..3535.00 rows=1895 width=0) (actual time=0.306..110.354 rows=4063 loops=1)
         Filter: ((the_geom && '0103000020E610000001000000090000000000000000002A40000000000000454038DB37B31D3E28403309329378F043400300000000002440000000000080434099499099C4831F403209329378F043400000000000001C40FFFFFFFFFFFF44408A499099C4831F40CCF6CD6C870F4640F8FFFFFFFFFF2340000000000080464030DB37B31D3E2840CFF6CD6C870F46400000000000002A400000000000004540'::geometry) AND _st_intersects(the_geom, '0103000020E610000001000000090000000000000000002A40000000000000454038DB37B31D3E28403309329378F043400300000000002440000000000080434099499099C4831F403209329378F043400000000000001C40FFFFFFFFFFFF44408A499099C4831F40CCF6CD6C870F4640F8FFFFFFFFFF2340000000000080464030DB37B31D3E2840CFF6CD6C870F46400000000000002A400000000000004540'::geometry))
 Total runtime: 116.941 ms

============
POSTGIS="2.0.0SVN" GEOS="3.4.0dev-CAPI-1.8.0" PROJ="Rel. 4.7.1, 23 September 2009" LIBXML="2.7.8" USE_STATS
--
 Aggregate  (cost=3539.73..3539.74 rows=1 width=0) (actual time=556.648..556.650 rows=1 loops=1)
   ->  Seq Scan on gtest  (cost=0.00..3535.00 rows=1890 width=0) (actual time=0.267..549.223 rows=4067 loops=1)
         Filter: ((the_geom && '0103000020E610000001000000090000000000000000002A40000000000000454038DB37B31D3E28403309329378F043400300000000002440000000000080434099499099C4831F403209329378F043400000000000001C40FFFFFFFFFFFF44408A499099C4831F40CCF6CD6C870F4640F8FFFFFFFFFF2340000000000080464030DB37B31D3E2840CFF6CD6C870F46400000000000002A400000000000004540'::geometry) AND _st_intersects(the_geom, '0103000020E610000001000000090000000000000000002A40000000000000454038DB37B31D3E28403309329378F043400300000000002440000000000080434099499099C4831F403209329378F043400000000000001C40FFFFFFFFFFFF44408A499099C4831F40CCF6CD6C870F4640F8FFFFFFFFFF2340000000000080464030DB37B31D3E2840CFF6CD6C870F46400000000000002A400000000000004540'::geometry))
 Total runtime: 556.732 ms
 
 ======

comment:16 by strk, 13 years ago

r7719 introduced the slowdown. It was my commit. I'm on it.

comment:17 by strk, 13 years ago

Getting a confirmation from others would help. You should test that r7718 is fast and r7719 is slow.

comment:18 by darkblueb, 13 years ago

testing revs 7718 and 7719

 r7718
---
  Aggregate  (cost=3539.65..3539.66 rows=1 width=0) (actual time=69.815..69.815 rows=1 loops=1)
   ->  Seq Scan on gtest  (cost=0.00..3535.00 rows=1858 width=0) (actual time=0.160..68.687 rows=3986 loops=1)
         Filter: ((the_geom && '0103000020E610000001000000090000000000000000002A40000000000000454038DB37B31D3E28403309329378F043400300000000002440000000000080434099499099C4831F403209329378F043400000000000001C40FFFFFFFFFFFF44408A499099C4831F40CCF6CD6C870F4640F8FFFFFFFFFF2340000000000080464030DB37B31D3E2840CFF6CD6C870F46400000000000002A400000000000004540'::geometry) AND _st_intersects(the_geom, '0103000020E610000001000000090000000000000000002A40000000000000454038DB37B31D3E28403309329378F043400300000000002440000000000080434099499099C4831F403209329378F043400000000000001C40FFFFFFFFFFFF44408A499099C4831F40CCF6CD6C870F4640F8FFFFFFFFFF2340000000000080464030DB37B31D3E2840CFF6CD6C870F46400000000000002A400000000000004540'::geometry))
 Total runtime: 69.857 ms
===
r7719
--
 Aggregate  (cost=3539.68..3539.69 rows=1 width=0) (actual time=332.743..332.744 rows=1 loops=1)
   ->  Seq Scan on gtest  (cost=0.00..3535.00 rows=1870 width=0) (actual time=0.202..330.895 rows=3990 loops=1)
         Filter: ((the_geom && '0103000020E610000001000000090000000000000000002A40000000000000454038DB37B31D3E28403309329378F043400300000000002440000000000080434099499099C4831F403209329378F043400000000000001C40FFFFFFFFFFFF44408A499099C4831F40CCF6CD6C870F4640F8FFFFFFFFFF2340000000000080464030DB37B31D3E2840CFF6CD6C870F46400000000000002A400000000000004540'::geometry) AND _st_intersects(the_geom, '0103000020E610000001000000090000000000000000002A40000000000000454038DB37B31D3E28403309329378F043400300000000002440000000000080434099499099C4831F403209329378F043400000000000001C40FFFFFFFFFFFF44408A499099C4831F40CCF6CD6C870F4640F8FFFFFFFFFF2340000000000080464030DB37B31D3E2840CFF6CD6C870F46400000000000002A400000000000004540'::geometry))
 Total runtime: 332.779 ms

comment:19 by edoardopanfili, 13 years ago

testing revs 7718 and 7719 (Darwin shadowfax.local 11.1.0 Darwin Kernel Version 11.1.0: Tue Jul 26 16:07:11 PDT 2011; root:xnu-1699.22.81~1/RELEASE_X86_64 x86_64)

 r7718
---
 Aggregate  (cost=3539.70..3539.71 rows=1 width=0) (actual time=53.953..53.953 rows=1 loops=1)
   ->  Seq Scan on gtest  (cost=0.00..3535.00 rows=1878 width=0) (actual time=2.045..53.295 rows=4012 loops=1)
         Filter: ((the_geom && '0103000020E610000001000000090000000000000000002A40000000000000454038DB37B31D3E28403309329378F043400300000000002440000000000080434099499099C4831F403209329378F043400000000000001C40FFFFFFFFFFFF44408A499099C4831F40CCF6CD6C870F4640F8FFFFFFFFFF2340000000000080464030DB37B31D3E2840CFF6CD6C870F46400000000000002A400000000000004540'::geometry) AND _st_intersects(the_geom, '0103000020E610000001000000090000000000000000002A40000000000000454038DB37B31D3E28403309329378F043400300000000002440000000000080434099499099C4831F403209329378F043400000000000001C40FFFFFFFFFFFF44408A499099C4831F40CCF6CD6C870F4640F8FFFFFFFFFF2340000000000080464030DB37B31D3E2840CFF6CD6C870F46400000000000002A400000000000004540'::geometry))
 Total runtime: 55.411 ms
===
r7719
---
 Aggregate  (cost=3539.70..3539.71 rows=1 width=0) (actual time=300.967..300.967 rows=1 loops=1)
   ->  Seq Scan on gtest  (cost=0.00..3535.00 rows=1878 width=0) (actual time=0.272..300.245 rows=4012 loops=1)
         Filter: ((the_geom && '0103000020E610000001000000090000000000000000002A40000000000000454038DB37B31D3E28403309329378F043400300000000002440000000000080434099499099C4831F403209329378F043400000000000001C40FFFFFFFFFFFF44408A499099C4831F40CCF6CD6C870F4640F8FFFFFFFFFF2340000000000080464030DB37B31D3E2840CFF6CD6C870F46400000000000002A400000000000004540'::geometry) AND _st_intersects(the_geom, '0103000020E610000001000000090000000000000000002A40000000000000454038DB37B31D3E28403309329378F043400300000000002440000000000080434099499099C4831F403209329378F043400000000000001C40FFFFFFFFFFFF44408A499099C4831F40CCF6CD6C870F4640F8FFFFFFFFFF2340000000000080464030DB37B31D3E2840CFF6CD6C870F46400000000000002A400000000000004540'::geometry))
 Total runtime: 301.024 ms

comment:20 by robe, 13 years ago

strk,

Not sure it's relevant but just noticed between r7718 and r7719 the include lwgeom_geos_prepared.h was removed.

Did that get reshuffled to other files or is just obsolete?

comment:21 by strk, 13 years ago

You may actually have found the culprit. Re-add that include and try it. I'll be off for ~ 7 hours…

If confirmed we should move the PREPARED_GEOM define from that include to a postgis_config.h or something like that (driven by configure) to reduce future cases of this.

Beside, if this is confirmed it means prepared geoms will only be available in postgis, not in liblwgeom.

comment:22 by strk, 13 years ago

I tested it. That was the real cause indeed. I'm doing last tests and will commit.

comment:23 by strk, 13 years ago

Resolution:	→ fixed
Status:	assigned → closed
Version:	1.5.X → trunk

There, r7843 is back to speed. Now I can sleep better.

Note: See TracTickets for help on using tickets.

Download in other formats: