Opened 2 years ago

Closed 20 months ago

Last modified 20 months ago

#5025 closed defect (fixed)

Inconsistent behavior with ST_Within across versions

Reported by: byrman Owned by: pramsey
Priority: medium Milestone: PostGIS 3.2.3
Component: postgis Version: 3.1.x
Keywords: Cc:

Description

I noticed inconsistent behavior with ST_Within across different PostgreSQL-PostGIS versions. To demonstrate the issue, I used postgis/postgis Docker images and a minimal example.

Steps to reproduce the problem

-- Create table "a" having both a 2D and 3D spatial index on its geometry field.
CREATE TABLE a (id serial, geom geometry(PointZ, 4326) NOT NULL);
CREATE INDEX a_2d_idx ON a USING GIST(geom);
CREATE INDEX a_3d_idx ON a USING GIST(geom gist_geometry_ops_nd);

-- Create table "b" having both a 2D and 3D spatial index on its geometry field.
CREATE TABLE b (id serial, geom geometry(PolygonZ, 4326) NOT NULL);
CREATE INDEX b_2d_idx ON b USING GIST (geom);
CREATE INDEX b_3d_idx ON b USING GIST (geom gist_geometry_ops_nd);

-- Insert a polygon (z=0) into table "b".
INSERT INTO b (geom) SELECT ST_Force3D(ST_MakeEnvelope(-1, -1, 1, 1, 4326));

-- Generate 100 pseudo-random points (z=0) which lie within the polygon and insert them into table "a".
INSERT INTO a (geom) SELECT ST_Force3D((ST_Dump(ST_GeneratePoints((SELECT geom FROM b), 100))).geom);

-- Change the z coordinate of half of the points from 0 into 1.
UPDATE a SET geom=ST_SetSRID(ST_MakePoint(ST_x(geom), ST_y(geom), 1), 4326) WHERE MOD(id, 2)=0;

-- Ensure up-to-date statistics.
VACUUM ANALYZE;

-- Finally, run the query below.
SELECT COUNT(*) FROM a, b WHERE ST_Within(a.geom, b.geom);

Results

postgis/postgis:11-2.5-alpine: 100
postgis/postgis:11-3.0-alpine: 100
postgis/postgis:11-3.1-alpine: 100
postgis/postgis:12-2.5-alpine: 100
postgis/postgis:12-3.0-alpine:  50
postgis/postgis:12-3.1-alpine:  50
postgis/postgis:13-3.0-alpine:  50
postgis/postgis:13-3.1-alpine:  50
postgis/postgis:14-3.1-alpine:  50
  1. Clearly, 50 and 100 is the result of a 3D and 2D calculation, respectively. For a user, however, it's unclear what calculation to expect. Documentation on ST_Within does not help.
  1. Even more confusing is that the result also depends on the indexes present. For example, dropping index a_3d_idx in 14-3.1-alpine, changes the result from 50 into 100. I expect the presence of an index to affect the performance of a query, not its result.

Change History (16)

comment:1 by byrman, 2 years ago

PostGIS 3.2 has been released. I have updated my results accordingly.

postgis/postgis:11-2.5-alpine: 100
postgis/postgis:11-3.0-alpine: 100
postgis/postgis:11-3.1-alpine: 100
postgis/postgis:11-3.2-alpine: 100
postgis/postgis:12-2.5-alpine: 100
postgis/postgis:12-3.0-alpine:  50
postgis/postgis:12-3.1-alpine:  50
postgis/postgis:12-3.2-alpine:  50
postgis/postgis:13-3.0-alpine:  50
postgis/postgis:13-3.1-alpine:  50
postgis/postgis:13-3.2-alpine:  50
postgis/postgis:14-3.1-alpine:  50
postgis/postgis:14-3.2-alpine:  50

comment:2 by robe, 2 years ago

Milestone: PostGIS PostgreSQLPostGIS 3.0.6

comment:3 by byrman, 2 years ago

Imre Samu (docker-postgis contributor) recommended me on the postgis-users mailing list to remove the older docker images from my test cases and to add postgres / postgis / alpine versions:

TAG: 11-2.5-alpine
IMAGE ID: 10c7c19a572c
CREATED: 4 days ago
VERSION: PostgreSQL 11.14 on x86_64-pc-linux-musl, compiled by gcc (Alpine 10.3.1_git20211027) 10.3.1 20211027, 64-bit
POSTGIS_FULL_VERSION: POSTGIS="2.5.5" [EXTENSION] PGSQL="110" GEOS="3.8.1-CAPI-1.13.3" PROJ="Rel. 7.2.1, January 1st, 2021" GDAL="GDAL 3.2.3, released 2021/04/27" LIBXML="2.9.12" LIBJSON="0.15" LIBPROTOBUF="1.4.0" TOPOLOGY RASTER
OS VERSION_ID: 3.15.0
COUNT(*) BEFORE DROPPING INDEX a_3d_idx: 100
COUNT(*) AFTER DROPPING INDEX a_3d_idx: 100

TAG: 11-3.2-alpine
IMAGES ID: 0303403af49f
CREATED: 4 days ago
VERSION: PostgreSQL 11.14 on x86_64-pc-linux-musl, compiled by gcc (Alpine 10.3.1_git20211027) 10.3.1 20211027, 64-bit
POSTGIS_FULL_VERSION: POSTGIS="3.2.0 0" [EXTENSION] PGSQL="110" GEOS="3.10.2-CAPI-1.16.0" PROJ="8.2.0" LIBXML="2.9.12" LIBJSON="0.15" LIBPROTOBUF="1.4.0" WAGYU="0.5.0 (Internal)" TOPOLOGY
OS VERSION_ID: 3.15.0
COUNT(*) BEFORE DROPPING INDEX a_3d_idx: 100
COUNT(*) AFTER DROPPING INDEX a_3d_idx: 100

TAG: 12-3.2-alpine
IMAGE ID: 964255f0cf36
CREATED: 4 days ago
VERSION: PostgreSQL 12.9 on x86_64-pc-linux-musl, compiled by gcc (Alpine 10.3.1_git20211027) 10.3.1 20211027, 64-bit
POSTGIS_FULL_VERSION: POSTGIS="3.2.0 0" [EXTENSION] PGSQL="120" GEOS="3.10.2-CAPI-1.16.0" PROJ="8.2.0" LIBXML="2.9.12" LIBJSON="0.15" LIBPROTOBUF="1.4.0" WAGYU="0.5.0 (Internal)" TOPOLOGY
OS VERSION_ID: 3.15.0
COUNT(*) BEFORE DROPPING INDEX a_3d_idx: 50
COUNT(*) AFTER DROPPING INDEX a_3d_idx: 100

TAG: 13-3.2-alpine
IMAGE ID: 5136446fc2a7
CREATED: 4 days ago
VERSION: PostgreSQL 13.5 on x86_64-pc-linux-musl, compiled by gcc (Alpine 10.3.1_git20211027) 10.3.1 20211027, 64-bit
POSTGIS_FULL_VERSION: POSTGIS="3.2.0 0" [EXTENSION] PGSQL="130" GEOS="3.10.2-CAPI-1.16.0" PROJ="8.2.0" LIBXML="2.9.12" LIBJSON="0.15" LIBPROTOBUF="1.4.0" WAGYU="0.5.0 (Internal)" TOPOLOGY
OS VERSION_ID: 3.15.0
COUNT(*) BEFORE DROPPING INDEX a_3d_idx: 50
COUNT(*) AFTER DROPPING INDEX a_3d_idx: 100

TAG: 14-3.2-alpine
IMAGE ID: 171454c0721a
CREATED: 4 days ago
VERSION: PostgreSQL 14.1 on x86_64-pc-linux-musl, compiled by gcc (Alpine 10.3.1_git20211027) 10.3.1 20211027, 64-bit
POSTGIS_FULL_VERSION: POSTGIS="3.2.0 0" [EXTENSION] PGSQL="140" GEOS="3.10.2-CAPI-1.16.0" PROJ="8.2.0" LIBXML="2.9.12" LIBJSON="0.15" LIBPROTOBUF="1.4.0" WAGYU="0.5.0 (Internal)" TOPOLOGY
OS VERSION_ID: 3.15.0
COUNT(*) BEFORE DROPPING INDEX a_3d_idx: 50
COUNT(*) AFTER DROPPING INDEX a_3d_idx: 100

comment:4 by robe, 21 months ago

For POSTGIS="3.2.1 3.2.1" [EXTENSION] PGSQL="140" GEOS="3.10.2-CAPI-1.16.0" PROJ="7.2.1" GDAL="GDAL 3.4.2, released 2022/03/08" LIBXML="2.9.9" LIBJSON="0.12" LIBPROTOBUF="1.2.1" WAGYU="0.5.0 (Internal)" RASTER PostgreSQL 14.1, compiled by Visual C++ build 1914, 64-bit

My answers agree with yours, 50 with an index and 100 if I drop the a_3d_idx. I also get 100 if I keep the a_3d_idx and drop the b_2d_idx.

For

POSTGIS="2.5.7dev r17978" [EXTENSION] PGSQL="120" GEOS="3.8.3-CAPI-1.13.4" PROJ="Rel. 7.2.1, January 1st, 2021" GDAL="GDAL 3.3.3, released 2021/10/25" LIBXML="2.7.8" LIBJSON="0.12" LIBPROTOBUF="1.2.1" RASTER PostgreSQL 12.2, compiled by Visual C++ build 1914, 64-bit

I am getting 100 for both cases.

I suspect this has to do with the SUPPORT functions use for PostgreSQL 12 and above introduced in PostGIS 3.0.

Last edited 21 months ago by robe (previous) (diff)

comment:5 by robe, 21 months ago

Milestone: PostGIS 3.0.6PostGIS 3.0.7

comment:6 by pramsey, 21 months ago

Dispensing with all the index creation above simplifies things a little and just on my latest pg14/postgis3.2 installation we can at least get a workable issue that doesn't involve juggling docker images.

pramsey=# explain SELECT COUNT(*) FROM a, b WHERE ST_Within(a.geom, b.geom);
                           QUERY PLAN                           
----------------------------------------------------------------
 Aggregate  (cost=2505.26..2505.27 rows=1 width=8)
   ->  Nested Loop  (cost=0.00..2505.01 rows=100 width=0)
         Join Filter: st_within(a.geom, b.geom)
         ->  Seq Scan on b  (cost=0.00..1.01 rows=1 width=168)
         ->  Seq Scan on a  (cost=0.00..3.00 rows=100 width=40)
(5 rows)

pramsey=# CREATE INDEX a_3d_idx ON a USING GIST(geom gist_geometry_ops_nd);
CREATE INDEX
pramsey=# explain SELECT COUNT(*) FROM a, b WHERE ST_Within(a.geom, b.geom);
                                   QUERY PLAN                                   
--------------------------------------------------------------------------------
 Aggregate  (cost=34.42..34.43 rows=1 width=8)
   ->  Nested Loop  (cost=0.14..34.17 rows=100 width=0)
         ->  Seq Scan on b  (cost=0.00..1.01 rows=1 width=168)
         ->  Index Scan using a_3d_idx on a  (cost=0.14..33.16 rows=1 width=40)
               Index Cond: (geom @@ b.geom)
               Filter: st_within(geom, b.geom)
(6 rows)

There was some changes to the way 3D indexes worked in the last few revisions, from komzpa, I cannot recall when though, or even why. The question of "is something within in 3d" is a little open, particularly if the two operands are of different dimensionality.

comment:7 by pramsey, 21 months ago

The operator and functions clearly disagree.

WITH g AS ( 
    SELECT 
        'POLYGON Z ((-1 -1 0,-1 1 0,1 1 0,1 -1 0,-1 -1 0))'::geometry AS ply0, 
        'POINT Z (0 0 1)'::geometry AS pt1,
        'POINT Z (0 0 0)'::geometry AS pt0
    ) 
SELECT 
    ST_Contains(ply0, pt0) AS contains_00,
    ST_Within(pt0, ply0)  AS within_00,
    ST_Contains(ply0, pt1) AS contains_01,
    ST_Within(pt1, ply0)  AS within_10,
    pt0 @@ ply0 AS op_within_00,
    pt1 @@ ply0 AS op_within_10,
    ply0 @@ pt1 AS op_within_01
FROM g;

comment:8 by pramsey, 21 months ago

Ran this through the debugger, and it's definitely the support function, and also definitely really hard to fix. Higher up the line, PostgreSQL is (somehow) identifying the index of interest and sending it into the support function, and then the support function chooses an appropriate operator to append to the query based on that. However, when the index is a gist_geometry_ops_nd the operator chosen is the 3d operator (@@) which has strict 3d semantics. Meanwhile the function ST_Within has 2d semantics. And thus we get the "add an index, get fewer things" behaviour.

The support function is trying to be really general, so it doesn't care what kind of index it gets and generally that works, in that it can add index clauses for SPGIST, BRIN and GIST indexes without caring much what kind of opfamily is coming in. Recognizing that ST_Within requires a 2d operator is a little hard…

comment:10 by pramsey, 21 months ago

Yes, no, maybe so?

comment:11 by robe, 21 months ago

just commit

comment:12 by Paul Ramsey <pramsey@…>, 21 months ago

In 49f24d5/git:

Ensure that additional operators are not appended when the function and opfamily disagree about dimensionality. References #5025

comment:13 by Regina Obe <lr@…>, 21 months ago

In ca37a62/git:

Add missing NEWS. References #5025 for PostGIS 3.3.0rc1

comment:14 by pramsey, 20 months ago

Resolution: fixed
Status: newclosed

comment:15 by Paul Ramsey <pramsey@…>, 20 months ago

In 69c924d/git:

Ensure that additional operators are not appended when the function and opfamily disagree about dimensionality. References #5025

comment:16 by robe, 20 months ago

Milestone: PostGIS 3.0.7PostGIS 3.2.3

@pramsey, I'm switching this to 3.2.3 since I assume you are not planning to back port to 3.1 or 3.0

Note: See TracTickets for help on using tickets.