Opened 3 years ago

Closed 3 years ago

Last modified 3 years ago

#4139 closed defect (fixed)

Unstable behavior of gist nd indexes

Reported by: ezimanyi Owned by: pramsey
Priority: medium Milestone: PostGIS 2.4.6
Component: postgis Version: 2.4.x
Keywords: Cc: me@…, pramsey@…

Description

I have created a table containing random geometries of mixed dimensions

select count(*) from tbl_geomcollection_nd
-- 2400

select distinct substring(st_astext(g) for position('(' in st_astext(g)) - 1) from tbl_geomcollection_nd order by 1

"LINESTRING"
"LINESTRING M "
"LINESTRING Z "
"LINESTRING ZM "
"MULTILINESTRING"
"MULTILINESTRING M "
"MULTILINESTRING Z "
"MULTILINESTRING ZM "
"MULTIPOINT"
"MULTIPOINT M "
"MULTIPOINT Z "
"MULTIPOINT ZM "
"MULTIPOLYGON"
"MULTIPOLYGON M "
"MULTIPOLYGON Z "
"MULTIPOLYGON ZM "
"POINT"
"POINT M "
"POINT Z "
"POINT ZM "
"POLYGON"
"POLYGON M "
"POLYGON Z "
"POLYGON ZM "

Then I create an n-dimensional index with the gist_geometry_ops_nd operator class

drop index if exists tbl_geomcollection_nd_gist_nd_idx;
create index tbl_geomcollection_nd_gist_nd_idx on tbl_geomcollection_nd using gist(g gist_geometry_ops_nd);

However, the index shows an unstable behavior obtaining different number of answers each time I drop/create a new index.

set enable_indexscan = off;
set enable_bitmapscan = off;
set enable_seqscan = on;

select count(*) from tbl_geomcollection_nd t1, tbl_geomcollection_nd t2 where t1.g &&& t2.g
-- 4470359

set enable_indexscan = on;
set enable_bitmapscan = off;
set enable_seqscan = off;

select count(*) from tbl_geomcollection_nd t1, tbl_geomcollection_nd t2 where t1.g &&& t2.g
-- 4470258
-- 4470594
-- 4470658
-- 4470303
-- 4469945

The 5 numbers above are five different answers to the same query after droping/creating the index as done in the two lines above.

I can make available the file containing the test data I used if needed.

Attachments (2)

random_nd.sql (25.6 KB) - added by ezimanyi 3 years ago.
SQL scripts for generating random geometries of mixed dimensions (2D, 3Z, 3M, 4D)
create_test_tables_nd.sql (6.3 KB) - added by ezimanyi 3 years ago.
SQL scripts for creating tables with random geometries of mixed dimensions (2D, 3Z, 3M, 4D)

Download all attachments as: .zip

Change History (14)

comment:1 Changed 3 years ago by komzpa

Please share the file. Is the query minimizable? Can you limit the table to only the differing geometries and still catch the issue?

Changed 3 years ago by ezimanyi

Attachment: random_nd.sql added

SQL scripts for generating random geometries of mixed dimensions (2D, 3Z, 3M, 4D)

Changed 3 years ago by ezimanyi

Attachment: create_test_tables_nd.sql added

SQL scripts for creating tables with random geometries of mixed dimensions (2D, 3Z, 3M, 4D)

comment:2 Changed 3 years ago by ezimanyi

I have made available the SQL scripts for generating the tables.

The problem does not appear when creating the table tbl_geomcollection_nd by setting limit to 10 or 20. It started to appear when limit is set to 30

drop table if exists tbl_geomcollection_nd;
create table tbl_geomcollection_nd (
	k serial,
	g geometry
);
insert into tbl_geomcollection_nd (g)
(select g from tbl_geompoint limit 30) union
(select g from tbl_geompointz limit 30) union
(select g from tbl_geompointm limit 30) union
(select g from tbl_geompointzm limit 30) union
(select g from tbl_geomlinestring limit 30) union
(select g from tbl_geomlinestringz limit 30) union
(select g from tbl_geomlinestringm limit 30) union
(select g from tbl_geomlinestringzm limit 30) union
(select g from tbl_geompolygon limit 30) union
(select g from tbl_geompolygonz limit 30) union
(select g from tbl_geompolygonm limit 30) union
(select g from tbl_geompolygonzm limit 30) union
(select g from tbl_geommultipoint limit 30) union
(select g from tbl_geommultipointz limit 30) union
(select g from tbl_geommultipointm limit 30) union
(select g from tbl_geommultipointzm limit 30) union
(select g from tbl_geommultilinestring limit 30) union
(select g from tbl_geommultilinestringz limit 30) union
(select g from tbl_geommultilinestringm limit 30) union
(select g from tbl_geommultilinestringzm limit 30) union
(select g from tbl_geommultipolygon limit 30) union
(select g from tbl_geommultipolygonz limit 30) union
(select g from tbl_geommultipolygonm limit 30) union
(select g from tbl_geommultipolygonzm limit 30) ;

select count(*) from tbl_geomcollection_nd
-- 720 

set enable_indexscan = off;
set enable_bitmapscan = off;
set enable_seqscan = on;

select count(*) from tbl_geomcollection_nd t1, tbl_geomcollection_nd t2 where t1.g &&& t2.g
-- 396140

drop index if exists tbl_geomcollection_nd_gist_nd_idx;
create index tbl_geomcollection_nd_gist_nd_idx on tbl_geomcollection_nd using gist(g gist_geometry_ops_nd);

set enable_indexscan = on;
set enable_bitmapscan = off;
set enable_seqscan = off;

select count(*) from tbl_geomcollection_nd t1, tbl_geomcollection_nd t2 where t1.g &&& t2.g
-- 396111
-- 396140
-- 396119
-- 396072
-- 396129
Last edited 3 years ago by ezimanyi (previous) (diff)

comment:3 Changed 3 years ago by ezimanyi

As an aside comment, shouldn't the function geometrytype report both the Z and M dimensions instead of being forced to write an elaborate query such as the following one for obtaining the type ?

select distinct substring(st_astext(g) for position('(' in st_astext(g)) - 1) from tbl_geomcollection_nd order by 1

"LINESTRING"
"LINESTRING M "
"LINESTRING Z "
"LINESTRING ZM "
"MULTILINESTRING"
"MULTILINESTRING M "
"MULTILINESTRING Z "
"MULTILINESTRING ZM "
"MULTIPOINT"
"MULTIPOINT M "
"MULTIPOINT Z "
"MULTIPOINT ZM "
"MULTIPOLYGON"
"MULTIPOLYGON M "
"MULTIPOLYGON Z "
"MULTIPOLYGON ZM "
"POINT"
"POINT M "
"POINT Z "
"POINT ZM "
"POLYGON"
"POLYGON M "
"POLYGON Z "
"POLYGON ZM "

select distinct geometrytype(g) from tbl_geomcollection_nd order by 1

"LINESTRING"
"LINESTRINGM"
"MULTILINESTRING"
"MULTILINESTRINGM"
"MULTIPOINT"
"MULTIPOINTM"
"MULTIPOLYGON"
"MULTIPOLYGONM"
"POINT"
"POINTM"
"POLYGON"
"POLYGONM"

comment:4 Changed 3 years ago by komzpa

You are expected to use ST_HasM / ST_HasZ to get the type. "LINESRTINGM" is only required as it reinterprets third coordinate as M instead of Z.

LINESTRING has X Y [Z [M]].

LINESTRINGM has X Y [M].

comment:5 Changed 3 years ago by Algunenano

I'm not able to reproduce it with trunk:

index_test=# select count(*) from tbl_geomcollection_nd t1, tbl_geomcollection_nd t2 where t1.g &&& t2.g;
  count  
---------
 4451734
(1 row)

index_test=# select count(*) from tbl_geomcollection_nd t1, tbl_geomcollection_nd t2 where t1.g &&& t2.g;
  count  
---------
 4451734
(1 row)

index_test=# EXPLAIN ANALYZE select count(*) from tbl_geomcollection_nd t1, tbl_geomcollection_nd t2 where t1.g &&& t2.g;
                                                                                   QUERY PLAN                                              
                                     
-------------------------------------------------------------------------------------------------------------------------------------------
-------------------------------------
 Aggregate  (cost=10000001376.15..10000001376.16 rows=1 width=8) (actual time=1889.556..1889.556 rows=1 loops=1)
   ->  Nested Loop  (cost=10000000000.15..10000001364.00 rows=4860 width=0) (actual time=0.085..1705.876 rows=4451734 loops=1)
         ->  Seq Scan on tbl_geomcollection_nd t1  (cost=10000000000.00..10000000186.00 rows=2400 width=596) (actual time=0.014..0.338 rows
=2400 loops=1)
         ->  Index Scan using tbl_geomcollection_nd_gist_nd_idx on tbl_geomcollection_nd t2  (cost=0.14..0.48 rows=1 width=596) (actual tim
e=0.019..0.603 rows=1855 loops=2400)
               Index Cond: (t1.g &&& g)
 Planning Time: 0.424 ms
 Execution Time: 1889.612 ms
(7 rows)

index_test=# 

Can you report what Postgis and Postgresql versions are you using? Also an `EXPLAIN ANALYZE` of the query.

comment:6 Changed 3 years ago by ezimanyi

I downloaded and installed yesterday the last bundle available here http://download.osgeo.org/postgis/windows/pg10/

SELECT version() || ' ' || postgis_full_version();
-- "PostgreSQL 10.1, compiled by Visual C++ build 1800, 64-bit POSTGIS="2.4.4 r16526" PGSQL="100" GEOS="3.6.2-CAPI-1.10.2 4d2925d" PROJ="Rel. 4.9.3, 15 August 2016" GDAL="GDAL 2.2.4, released 2018/03/19" LIBXML="2.7.8" LIBJSON="0.12" LIBPROTOBUF="1.2.1" RASTER"

drop table if exists tbl_geomcollection_nd;
create table tbl_geomcollection_nd (
	k serial,
	g geometry
);
insert into tbl_geomcollection_nd (g)
(select g from tbl_geompoint limit 30) union
(select g from tbl_geompointz limit 30) union
(select g from tbl_geompointm limit 30) union
(select g from tbl_geompointzm limit 30) union
(select g from tbl_geomlinestring limit 30) union
(select g from tbl_geomlinestringz limit 30) union
(select g from tbl_geomlinestringm limit 30) union
(select g from tbl_geomlinestringzm limit 30) union
(select g from tbl_geompolygon limit 30) union
(select g from tbl_geompolygonz limit 30) union
(select g from tbl_geompolygonm limit 30) union
(select g from tbl_geompolygonzm limit 30) union
(select g from tbl_geommultipoint limit 30) union
(select g from tbl_geommultipointz limit 30) union
(select g from tbl_geommultipointm limit 30) union
(select g from tbl_geommultipointzm limit 30) union
(select g from tbl_geommultilinestring limit 30) union
(select g from tbl_geommultilinestringz limit 30) union
(select g from tbl_geommultilinestringm limit 30) union
(select g from tbl_geommultilinestringzm limit 30) union
(select g from tbl_geommultipolygon limit 30) union
(select g from tbl_geommultipolygonz limit 30) union
(select g from tbl_geommultipolygonm limit 30) union
(select g from tbl_geommultipolygonzm limit 30) ;
-- Query returned successfully: 720 rows affected, 662 msec execution time.

drop index if exists tbl_geomcollection_nd_gist_nd_idx;
create index tbl_geomcollection_nd_gist_nd_idx on tbl_geomcollection_nd using gist(g gist_geometry_ops_nd);

set enable_indexscan = off;
set enable_bitmapscan = off;
set enable_seqscan = on;

select count(*) from tbl_geomcollection_nd t1, tbl_geomcollection_nd t2 where t1.g &&& t2.g
-- 396140

explain analyze
select count(*) from tbl_geomcollection_nd t1, tbl_geomcollection_nd t2 where t1.g &&& t2.g

"Aggregate  (cost=7897.56..7897.57 rows=1 width=8) (actual time=1684.293..1684.293 rows=1 loops=1)"
"  ->  Nested Loop  (cost=0.00..7892.20 rows=2145 width=0) (actual time=0.078..1609.226 rows=396140 loops=1)"
"        Join Filter: (t1.g &&& t2.g)"
"        Rows Removed by Join Filter: 122260"
"        ->  Seq Scan on tbl_geomcollection_nd t1  (cost=0.00..57.20 rows=720 width=625) (actual time=0.029..0.697 rows=720 loops=1)"
"        ->  Materialize  (cost=0.00..60.80 rows=720 width=625) (actual time=0.000..0.098 rows=720 loops=720)"
"              ->  Seq Scan on tbl_geomcollection_nd t2  (cost=0.00..57.20 rows=720 width=625) (actual time=0.016..0.354 rows=720 loops=1)"
"Planning time: 0.403 ms"
"Execution time: 1684.731 ms"

set enable_indexscan = on;
set enable_bitmapscan = off;
set enable_seqscan = off;

select count(*) from tbl_geomcollection_nd t1, tbl_geomcollection_nd t2 where t1.g &&& t2.g
-- 396100

explain analyze
select count(*) from tbl_geomcollection_nd t1, tbl_geomcollection_nd t2 where t1.g &&& t2.g

"Aggregate  (cost=10000000420.96..10000000420.97 rows=1 width=8) (actual time=737.204..737.205 rows=1 loops=1)"
"  ->  Nested Loop  (cost=10000000000.14..10000000415.60 rows=2145 width=0) (actual time=0.329..657.469 rows=396100 loops=1)"
"        ->  Seq Scan on tbl_geomcollection_nd t1  (cost=10000000000.00..10000000057.20 rows=720 width=625) (actual time=0.034..0.339 rows=720 loops=1)"
"        ->  Index Scan using tbl_geomcollection_nd_gist_nd_idx on tbl_geomcollection_nd t2  (cost=0.14..0.49 rows=1 width=625) (actual time=0.107..0.796 rows=550 loops=720)"
"              Index Cond: (t1.g &&& g)"
"Planning time: 1.061 ms"
"Execution time: 737.288 ms"

drop index if exists tbl_geomcollection_nd_gist_nd_idx;
create index tbl_geomcollection_nd_gist_nd_idx on tbl_geomcollection_nd using gist(g gist_geometry_ops_nd);

select count(*) from tbl_geomcollection_nd t1, tbl_geomcollection_nd t2 where t1.g &&& t2.g
-- 396140

drop index if exists tbl_geomcollection_nd_gist_nd_idx;
create index tbl_geomcollection_nd_gist_nd_idx on tbl_geomcollection_nd using gist(g gist_geometry_ops_nd);

select count(*) from tbl_geomcollection_nd t1, tbl_geomcollection_nd t2 where t1.g &&& t2.g
-- 396124

drop index if exists tbl_geomcollection_nd_gist_nd_idx;
create index tbl_geomcollection_nd_gist_nd_idx on tbl_geomcollection_nd using gist(g gist_geometry_ops_nd);

select count(*) from tbl_geomcollection_nd t1, tbl_geomcollection_nd t2 where t1.g &&& t2.g
-- 396128

In our development machine I needed to increase the size of the tables to reproduce the problem

SELECT version() || ' '  || postgis_full_version();
"PostgreSQL 11beta1 on x86_64-pc-linux-gnu, compiled by clang version 5.0.0 (tags/RELEASE_500/final), 64-bit POSTGIS="2.5.0beta1dev r16611" [EXTENSION] PGSQL="110" GEOS="3.6.2-CAPI-1.10.2 4d2925d6" PROJ="Rel. 4.9.3, 15 August 2016" GDAL="GDAL 2.2.3, release (...)"

drop table if exists tbl_geomcollection_nd;
create table tbl_geomcollection_nd (
	k serial,
	g geometry
);
insert into tbl_geomcollection_nd (g)
(select g from tbl_geompoint limit 100) union
(select g from tbl_geompointz limit 100) union
(select g from tbl_geompointm limit 100) union
(select g from tbl_geompointzm limit 100) union
(select g from tbl_geomlinestring limit 100) union
(select g from tbl_geomlinestringz limit 100) union
(select g from tbl_geomlinestringm limit 100) union
(select g from tbl_geomlinestringzm limit 100) union
(select g from tbl_geompolygon limit 100) union
(select g from tbl_geompolygonz limit 100) union
(select g from tbl_geompolygonm limit 100) union
(select g from tbl_geompolygonzm limit 100) union
(select g from tbl_geommultipoint limit 100) union
(select g from tbl_geommultipointz limit 100) union
(select g from tbl_geommultipointm limit 100) union
(select g from tbl_geommultipointzm limit 100) union
(select g from tbl_geommultilinestring limit 100) union
(select g from tbl_geommultilinestringz limit 100) union
(select g from tbl_geommultilinestringm limit 100) union
(select g from tbl_geommultilinestringzm limit 100) union
(select g from tbl_geommultipolygon limit 100) union
(select g from tbl_geommultipolygonz limit 100) union
(select g from tbl_geommultipolygonm limit 100) union
(select g from tbl_geommultipolygonzm limit 100) ;
-- Query returned successfully: 2400 rows affected, 94 msec execution time.

drop index if exists tbl_geomcollection_nd_gist_nd_idx;
create index tbl_geomcollection_nd_gist_nd_idx on tbl_geomcollection_nd using gist(g gist_geometry_ops_nd);

set enable_indexscan = off;
set enable_bitmapscan = off;
set enable_seqscan = on;

select count(*) from tbl_geomcollection_nd t1, tbl_geomcollection_nd t2 where t1.g &&& t2.g
-- 4455622

explain analyze
select count(*) from tbl_geomcollection_nd t1, tbl_geomcollection_nd t2 where t1.g &&& t2.g

"Aggregate  (cost=86804.40..86804.41 rows=1 width=8) (actual time=10509.189..10509.189 rows=1 loops=1)"
"  ->  Nested Loop  (cost=0.00..86790.00 rows=5760 width=0) (actual time=0.019..10117.923 rows=4455622 loops=1)"
"        Join Filter: (t1.g &&& t2.g)"
"        Rows Removed by Join Filter: 1304378"
"        ->  Seq Scan on tbl_geomcollection_nd t1  (cost=0.00..192.00 rows=2400 width=32) (actual time=0.009..0.987 rows=2400 loops=1)"
"        ->  Materialize  (cost=0.00..204.00 rows=2400 width=32) (actual time=0.000..0.136 rows=2400 loops=2400)"
"              ->  Seq Scan on tbl_geomcollection_nd t2  (cost=0.00..192.00 rows=2400 width=32) (actual time=0.002..0.742 rows=2400 loops=1)"
"Planning Time: 0.096 ms"
"Execution Time: 10509.285 ms"

set enable_indexscan = on;
set enable_bitmapscan = off;
set enable_seqscan = off;

select count(*) from tbl_geomcollection_nd t1, tbl_geomcollection_nd t2 where t1.g &&& t2.g
-- 4455551

explain analyze
select count(*) from tbl_geomcollection_nd t1, tbl_geomcollection_nd t2 where t1.g &&& t2.g

"Aggregate  (cost=10000001408.47..10000001408.48 rows=1 width=8) (actual time=3889.217..3889.217 rows=1 loops=1)"
"  ->  Nested Loop  (cost=10000000000.15..10000001394.00 rows=5788 width=0) (actual time=0.082..3470.596 rows=4455551 loops=1)"
"        ->  Seq Scan on tbl_geomcollection_nd t1  (cost=10000000000.00..10000000192.00 rows=2400 width=626) (actual time=0.018..0.427 rows=2400 loops=1)"
"        ->  Index Scan using tbl_geomcollection_nd_gist_nd_idx on tbl_geomcollection_nd t2  (cost=0.14..0.49 rows=1 width=626) (actual time=0.045..1.265 rows=1856 loops=2400)"
"              Index Cond: (t1.g &&& g)"
"Planning Time: 0.268 ms"
"Execution Time: 3889.255 ms"

drop index if exists tbl_geomcollection_nd_gist_nd_idx;
create index tbl_geomcollection_nd_gist_nd_idx on tbl_geomcollection_nd using gist(g gist_geometry_ops_nd);

select count(*) from tbl_geomcollection_nd t1, tbl_geomcollection_nd t2 where t1.g &&& t2.g
-- 4455562

drop index if exists tbl_geomcollection_nd_gist_nd_idx;
create index tbl_geomcollection_nd_gist_nd_idx on tbl_geomcollection_nd using gist(g gist_geometry_ops_nd);

select count(*) from tbl_geomcollection_nd t1, tbl_geomcollection_nd t2 where t1.g &&& t2.g
-- 4455504

drop index if exists tbl_geomcollection_nd_gist_nd_idx;
create index tbl_geomcollection_nd_gist_nd_idx on tbl_geomcollection_nd using gist(g gist_geometry_ops_nd);

select count(*) from tbl_geomcollection_nd t1, tbl_geomcollection_nd t2 where t1.g &&& t2.g
-- 4455537
Last edited 3 years ago by ezimanyi (previous) (diff)

comment:7 Changed 3 years ago by pramsey

Milestone: PostGIS 2.4.5PostGIS 2.4.6

comment:8 Changed 3 years ago by nextstopsun

I'm experiencing same unstable behaviour for st_coveredby function on 2D GIST indexed geometry column. Seems like 2D is also affected.

comment:9 Changed 3 years ago by komzpa

In 17070:

Stabilize GiST ND indexes for mixed dimensions

Patch by Darafei Praliaskouski

Thanks to

Arthur Lesuisse for synthesizing test case,
Andrew Gierth for finding runaway memcpy,
Raúl Marín for pointing to memory problem in index construction.

This is not backpatchable to 2.x, that requires separate solution.

References #4139
Closes https://github.com/postgis/postgis/pull/336

comment:10 Changed 3 years ago by komzpa

Resolution: fixed
Status: newclosed

In 17074:

Stabilize GiST ND indexes for mixed dimensions

Patch by Darafei Praliaskouski

Thanks to

Arthur Lesuisse for synthesizing test case,
Andrew Gierth for finding runaway memcpy,
Raúl Marín for pointing to memory problem in index construction.

This is not backpatchable to 2.x, that requires separate solution.

Closes #4139

comment:11 Changed 3 years ago by komzpa

In 17075:

Stabilize GiST ND indexes for mixed dimensions

Patch by Darafei Praliaskouski

Thanks to

Arthur Lesuisse for synthesizing test case,
Andrew Gierth for finding runaway memcpy,
Raúl Marín for pointing to memory problem in index construction.

Closes #4139

comment:12 Changed 3 years ago by komzpa

In 17076:

Stabilize GiST ND indexes for mixed dimensions

Patch by Darafei Praliaskouski

Thanks to

Arthur Lesuisse for synthesizing test case,
Andrew Gierth for finding runaway memcpy,
Raúl Marín for pointing to memory problem in index construction.

Closes #4139

Note: See TracTickets for help on using tickets.