Opened 7 years ago

Closed 7 years ago

Last modified 6 years ago

#3675 closed defect (fixed)

Relationship functions not using an index in some cases

Reported by: robe Owned by: robe
Priority: blocker Milestone: PostGIS 2.3.2
Component: postgis Version: 2.3.x
Keywords: Cc:

Description

As noted in

https://lists.osgeo.org/pipermail/postgis-users/2016-December/041779.html

The spatial index usage doesn't seem to kick in at least in PostgreSQL 9.6 2.3 (I have checked 9.5 with 2.3, but 9.5 2.2 is fine).

The issue only arises when you use a longitude / latitude from a table.

As I noted in email:

https://lists.osgeo.org/pipermail/postgis-users/2016-December/041789.html

Spatial index kicks in for common cases of:

SELECT e.gid
FROM e1 , e2 
WHERE ST_Intersects(e1.geom, e2.geom);



SELECT e.gid
FROM e1 
WHERE ST_Intersects(e1.geom, ST_SetSRID(ST_MakePoint(-10,
30),4326)  );

But doesn't in case where longitude and latitude are coming from a table

SELECT e.gid
FROM e1 , e2 
WHERE ST_Intersects(e1.geom, ST_SetSRID(ST_MakePoint(e2.longitude,
e2.latitude),4326)  );

Change History (19)

comment:1 by robe, 7 years ago

okay on closer inspection, the ST_SetSRID seems to be at fault and I was close before with costing. It's the cost we put on ST_SetSRID of 10 that is screwing us. If I switch it back to 1 all is good.

I tested on both my PostgreSQL 9.6.1 and PostgreSQL 9.5.2 64-bit windows builds with PostGIS 2.3.1 loaded and both have the same issue.

Here is the full test to exercise issue:

CREATE TABLE e AS SELECT row_number() OVER() As gid, ST_SetSRID( ST_Point(x,y) ,4326) As geom, x as longitude, y as latitude
FROM generate_series(-90,90) As y, generate_series(-179,179) As x;

CREATE INDEX idx_e_geom_gist ON e USING gist(geom);

-- good
SELECT gid 
FROM  e
WHERE   ST_DWithin(e.geom ,
ST_SetSRID(ST_Point( 151.27544336, -33.87718472), 4326), 1);

-- bad
SELECT gid 
FROM  e, (SELECT longitude, latitude FROM e WHERE longitude = -1 and latitude = -1) As c
WHERE   ST_DWithin(e.geom ,
ST_SetSRID(ST_Point( c.longitude, c.latitude), 4326), 1);


-- good
SELECT gid 
FROM  e, (SELECT longitude, latitude, geom
    FROM e WHERE longitude = -1 and latitude = -1) As c
WHERE   ST_DWithin(e.geom ,c.geom, 1);

-- I'm really puzzled this one is good


WITH c AS (SELECT ST_SetSRID(ST_Point(-1,-1),4326) As geom)
SELECT gid 
FROM  e, c
WHERE   ST_DWithin(e.geom ,c.geom, 1);

Now if I do this:

CREATE OR REPLACE FUNCTION public.st_setsrid(
    geometry,
    integer)
  RETURNS geometry AS
'$libdir/postgis-2.3', 'LWGEOM_set_srid'
  LANGUAGE c IMMUTABLE STRICT
  COST 1;

The bad now uses an index.

Version 1, edited 7 years ago by robe (previous) (next) (diff)

comment:2 by robe, 7 years ago

In 15267:

Cost on ST_SetSRID is preventing index use with relation functions. Change from 10 to 1.
References #3675 for PostGIS 2.4

comment:3 by robe, 7 years ago

In 15268:

Cost on ST_SetSRID is preventing index use with relation functions. Change from 10 to 1.
References #3675 for PostGIS 2.3

comment:4 by robe, 7 years ago

I think we'll need to do this for other functions as well. ST_Transform is next on my list. So i'll keep this open till I'm finished.

comment:5 by robe, 7 years ago

Owner: changed from pramsey to robe

comment:6 by jdsterritt, 7 years ago

After upgrading our test environment to PostgreSQL 9.6 and PostGIS 2.3. We noticed numerous issues where the indexes are not being utilized. I wanted to provide these additional test cases. We run these statements on large datasets, and the processing time has significantly increased.

CREATE TABLE parent_geom(id serial); 
SELECT AddGeometryColumn('parent_geom','geom','3857','polygon',2);
CREATE INDEX idx_parent_geom ON parent_geom USING gist(geom);

CREATE TABLE child_geom(id serial, parent_id bigint);
SELECT AddGeometryColumn('child_geom','geom','3857','polygon',2);
CREATE INDEX idx_child_geom ON child_geom USING gist(geom);

CREATE TABLE track(id serial);
SELECT AddGeometryColumn('track','geom','3857','linestring',2);
CREATE INDEX idx_track_geom ON track USING gist(geom);

CREATE TABLE stop(id serial);
SELECT AddGeometryColumn('stop','geom','3857','point',2);
CREATE INDEX idx_point_geom ON stop USING gist(geom);


INSERT INTO parent_geom (geom) VALUES (ST_SetSRID(ST_GeomFromText('POLYGON((-9201000 5180000,-9201000 5181000,-9200000 5181000,-9200000 5180000,-9201000 5180000))'),3857));
INSERT INTO child_geom (geom) VALUES (ST_SetSRID(ST_GeomFromText('POLYGON((-9200500 5180400,-9200500 5180500,-9200400 5180500,-9200400 5180400,-9200500 5180400))'),3857));
INSERT INTO track(geom) VALUES (ST_SetSRID(ST_GeomFromText('LINESTRING(-9200744 5180268, -9200117 5181610)'),3857));
INSERT INTO stop(geom) VALUES (ST_SetSRID(ST_GeomFromText('POINT(-9200744 5180268)'),3857));

Using ST_Centroid

EXPLAIN
UPDATE child_geom as c SET parent_id = p.id FROM parent_geom as p WHERE ST_Intersects(ST_Centroid(c.geom), p.geom);

Query Plans
PostgreSQL 9.6.1, compiled by Visual C++ build 1800, 64-bit
PostGIS - 2.3 USE_GEOS=1 USE_PROJ=1 USE_STATS=1
Update on child_geom c (cost=0.00..736731.49 rows=478367 width=56)
→ Nested Loop (cost=0.00..736731.49 rows=478367 width=56)
Join Filter: st_intersects(st_centroid(c.geom), p.geom)
→ Seq Scan on parent_geom p (cost=0.00..22.70 rows=1270 width=42)
→ Materialize (cost=0.00..26.95 rows=1130 width=42)
→ Seq Scan on child_geom c (cost=0.00..21.30 rows=1130 width=42)

PostgreSQL 9.5.5 on x86_64-pc-linux-gnu, (Red Hat 4.8.5-4), 64-bit
PostGIS - 2.2 USE_GEOS=1 USE_PROJ=1 USE_STATS=1

Update on child_geom c (cost=0.15..549.57 rows=478 width=52)
→ Nested Loop (cost=0.15..549.57 rows=478 width=52)
→ Seq Scan on child_geom c (cost=0.00..21.30 rows=1130 width=42)
→ Index Scan using idx_parent_geom on parent_geom p (cost=0.15..0.46 rows=1 width=42)
Index Cond: (st_centroid(c.geom) && geom)
Filter: _st_intersects(st_centroid(c.geom), geom)

Using ST_SnapToGrid

EXPLAIN 
SELECT * FROM track as t, parent_geom as p WHERE ST_Intersects(ST_SnapToGrid(t.geom, 0.1),p.geom);

Query Plans

PostgreSQL 9.6.1, compiled by Visual C++ build 1800, 64-bit
2.3 USE_GEOS=1 USE_PROJ=1 USE_STATS=1
Nested Loop (cost=0.00..524241.08 rows=537633 width=72)
Join Filter: st_intersects(st_snaptogrid(t.geom, '0'::double precision, '0'::double precision, '0.1'::double precision, '0.1'::double precision), p.geom)
→ Seq Scan on track t (cost=0.00..22.70 rows=1270 width=36)
→ Materialize (cost=0.00..29.05 rows=1270 width=36)
→ Seq Scan on parent_geom p (cost=0.00..22.70 rows=1270 width=36)

PostgreSQL 9.5.5 on x86_64-pc-linux-gnu (Red Hat 4.8.5-4), 64-bit
2.2 USE_GEOS=1 USE_PROJ=1 USE_STATS=1
Nested Loop (cost=0.15..609.63 rows=538 width=72)
→ Seq Scan on track t (cost=0.00..22.70 rows=1270 width=36)
→ Index Scan using idx_parent_geom on parent_geom p (cost=0.15..0.45 rows=1 width=36)
Index Cond: (st_snaptogrid(t.geom, '0'::double precision, '0'::double precision, '0.1'::double precision, '0.1'::double precision) && geom)
Filter: _st_intersects(st_snaptogrid(t.geom, '0'::double precision, '0'::double precision, '0.1'::double precision, '0.1'::double precision), geom)

comment:7 by robe, 7 years ago

Thanks for the extra examples.

If you set the cost of ST_Centroid, ST_SnapToGrid back to 1, does the issue go away?

comment:8 by jdsterritt, 7 years ago

Setting the cost on ST_Centroid to 1.0 corrects the index usage. I've have verified this in both the query plan and in our test environment.

Setting the cost on ST_SnapToGrid to 1.0 doesn't help and it appears the indexes are missed. This was verified in both the query plan and test environments.

comment:9 by robe, 7 years ago

There are 4 variants of ST_SnapToGrid for geometry. I suspect you might have only set the sql one which is just a wrapper for one of the others.

Try setting the cost for all the other 3 to one.

Anyway before next release, I'll take off all this costing of functions that output a geometry, so will be all fixed in next micro release.

comment:10 by jdsterritt, 7 years ago

Setting the cost on the other 3 corrected the issue. Thanks for your help, and I'm looking forward to the next release.

comment:11 by robe, 7 years ago

I'm definitely becoming more dyslexic, I labeled the commit # wrong on last commit. Should have gone to this one.

In 15280:

(The changeset message doesn't reference this ticket)

comment:12 by robe, 7 years ago

In 15281:

Relationship functions not using an index in some cases
Set cost back to 1 for functions that return a geometry
A higher cost has the side-effect of making the planner
think when used in conjunction with relate constructs like ST_Intersects
it's cheaper to compute the geometry once rather
than compute twice and break into an index and non-index call.
References #3675 for PostGIS 2.4

comment:13 by robe, 7 years ago

Resolution: fixed
Status: newclosed

comment:14 by pramsey, 6 years ago

OK, I tried again to validate this and I cannot. In my case the "bad" case in your test suite doesn't flip over to an index scan *regardless* of what I do to costing ST_SetSRID.

comment:15 by robe, 6 years ago

I reconfirmed with 2.4.1 and changing ST_SetSRID cost from 1 to 100 makes my bad case not use an index and switching it back to COST 1 makes it use an index.

This is using:

PostgreSQL 9.6.1, compiled by Visual C++ build 1800, 64-bit POSTGIS="2.4.1 r16012" PGSQL="96" GEOS="3.6.1-CAPI-1.10.1 r4317" PROJ="Rel. 4.9.3, 15 August 2016" GDAL="GDAL 2.1.3, released 2017/20/01" LIBXML="2.7.8" LIBJSON="0.12" LIBPROTOBUF="1.2.1" RASTER

Also ran same test on:

PostgreSQL 10.0 on x86_64-pc-mingw64, compiled by gcc.exe (Rev5, Built by MSYS2 project) 4.9.2, 64-bit POSTGIS="2.4.0 r15853" PGSQL="100" GEOS="3.6.2-CAPI-1.10.2 4d2925d6" PROJ="Rel. 4.9.2, 08 September 2015" GDAL="GDAL 2.2.0, released 2017/04/28" LIBXML="2.9.4" LIBJSON="0.12.1" RASTER

and got same behavior

SELECT gid 
FROM  e, (SELECT longitude, latitude FROM e WHERE longitude = -1 and latitude = -1) As c
WHERE   ST_DWithin(e.geom ,
ST_SetSRID(ST_Point( c.longitude, c.latitude), 4326), 1);

Plan with ST_SetSRID COST 1 yields

Nested Loop
  ->  Seq Scan on e e_1
        Filter: ((longitude = '-1'::integer) AND (latitude = '-1'::integer))
  ->  Bitmap Heap Scan on e
        Recheck Cond: (geom && st_expand(st_setsrid(st_point((e_1.longitude)::double precision, (e_1.latitude)::double precision), 4326), '1'::double precision))
        Filter: ((st_setsrid(st_point((e_1.longitude)::double precision, (e_1.latitude)::double precision), 4326) && st_expand(geom, '1'::double precision)) AND _st_dwithin(geom, st_setsrid(st_point((e_1.longitude)::double precision, (e_1.latitude)::double precision), 4326), '1'::double precision))
        ->  Bitmap Index Scan on idx_e_geom_gist
              Index Cond: (geom && st_expand(st_setsrid(st_point((e_1.longitude)::double precision, (e_1.latitude)::double precision), 4326), '1'::double precision))

then changing:

CREATE OR REPLACE FUNCTION st_setsrid(
    geometry,
    integer)
  RETURNS geometry AS
'$libdir/postgis-2.4', 'LWGEOM_set_srid'
  LANGUAGE c IMMUTABLE STRICT PARALLEL SAFE
  COST 100;

Changes the plan to this

Nested Loop
  Join Filter: st_dwithin(e.geom, st_setsrid(st_point((e_1.longitude)::double precision, (e_1.latitude)::double precision), 4326), '1'::double precision)
  ->  Seq Scan on e e_1
        Filter: ((longitude = '-1'::integer) AND (latitude = '-1'::integer))
  ->  Seq Scan on e

comment:16 by pramsey, 6 years ago

Just for fun, here's the test function but with the SQL inlining removed and replaced with the literal contents of ST_DWithin(). Annoyingly, it plans out fine, regardless of the cost of ST_SetSRID.

explain SELECT gid 
FROM  e, (SELECT longitude, latitude FROM e WHERE longitude = -1 and latitude = -1) As c
WHERE  e.geom OPERATOR(&&) ST_Expand(ST_SetSRID(ST_Point( c.longitude, c.latitude), 4326),1) 
AND ST_SetSRID(ST_Point( c.longitude, c.latitude),4326) OPERATOR(&&) ST_Expand(e.geom,1) 
AND _ST_DWithin(ST_SetSRID(ST_Point( c.longitude, c.latitude), 4326), e.geom, 1);

comment:17 by pramsey, 6 years ago

Our demo SQL has gotten a little spread out. Here's a complete set of SQL that demonstrates the condition:

-- Create data table and index. Analyze.
DROP TABLE IF EXISTS e;
CREATE TABLE e AS 
  SELECT row_number() OVER() As gid, 
    ST_SetSRID( ST_Point(x,y) ,4326) As geom, 
    x as longitude, y as latitude
  FROM generate_series(-90,90) As y, generate_series(-179,179) As x;
CREATE INDEX idx_e_geom_gist ON e USING gist(geom);
ANALYZE e;

-- Start with a low cost ST_SetSRID
ALTER FUNCTION ST_SetSRID(geometry, integer) COST 1;

-- [A] Query plan hits index
EXPLAIN SELECT gid 
FROM  e
WHERE   ST_DWithin(e.geom ,
ST_SetSRID(ST_Point( 151.27544336, -33.87718472), 4326), 1);

-- [B] Query plan hits index
EXPLAIN SELECT gid 
FROM  e, (SELECT longitude, latitude FROM e WHERE longitude = -1 and latitude = -1) As c
WHERE   ST_DWithin(e.geom ,
ST_SetSRID(ST_Point( c.longitude, c.latitude), 4326), 1);

-- Set ST_SetSRID cost to a high value
ALTER FUNCTION ST_SetSRID(geometry, integer) COST 100;

-- [B] Query plan now misses index!!
EXPLAIN SELECT gid 
FROM  e, (SELECT longitude, latitude FROM e WHERE longitude = -1 and latitude = -1) As c
WHERE   ST_DWithin(e.geom ,
ST_SetSRID(ST_Point( c.longitude, c.latitude), 4326), 1);

-- Can we fix it by changing the ST_DWithin function into 
-- undecorated (no strict, no immutable) version?
CREATE OR REPLACE FUNCTION ST_DWithin(geom1 geometry, geom2 geometry, float8)
RETURNS boolean
AS 'SELECT $1 OPERATOR(&&) ST_Expand($2,$3) AND $2 OPERATOR(&&) ST_Expand($1,$3) AND _ST_DWithin($1, $2, $3)'
LANGUAGE 'sql';

-- [B] Query plan still misses index!!
EXPLAIN SELECT gid 
FROM  e, (SELECT longitude, latitude FROM e WHERE longitude = -1 and latitude = -1) As c
WHERE   ST_DWithin(e.geom ,
ST_SetSRID(ST_Point( c.longitude, c.latitude), 4326), 1);
Last edited 6 years ago by pramsey (previous) (diff)

comment:18 by pramsey, 6 years ago

OK, our plan fails when we hit an examination of the costs of the parameters to the function being inlined. Our high-cost ST_SetSRID kicks things out, so we don't get in-lined.

https://github.com/postgres/postgres/blob/ae20b23a9e7029f31ee902da08a464d968319f56/src/backend/optimizer/util/clauses.c#L4581-L4584

This seems to be generically true, as an example can be constructed in PgSQL native terms:

-- Create data table and index. Analyze.
DROP TABLE IF EXISTS boxen;
CREATE TABLE boxen AS 
  SELECT row_number() OVER() As gid, 
  box(point(x, y),point(x+1, y+1)) AS b, x, y
  FROM generate_series(-100,100) As y, generate_series(-100,100) As x;
CREATE INDEX idx_b_geom_gist ON boxen USING gist(b);
ANALYZE boxen;

-- inlined function
-- when set 'STRICT' it breaks index access
-- 'IMMUTABLE' doesn't seem to bother it
CREATE OR REPLACE FUNCTION good_box(box, box)
RETURNS boolean
AS 'SELECT $1 OPERATOR(&&) $2 AND length(lseg(point($1),point($2))) < 3'
LANGUAGE 'sql';

-- Start with a low cost circle()
ALTER FUNCTION circle(point, double precision) COST 1;

-- [A] Query plan hits index
EXPLAIN SELECT gid 
FROM boxen
WHERE good_box(
    boxen.b, 
    box(circle(point(20.5, 20.5), 2))
    );

-- [B] Query plan hits index
EXPLAIN SELECT gid 
FROM boxen, 
    (SELECT x, y FROM boxen WHERE x < 0 and y < 0) AS c 
WHERE good_box(
    boxen.b, 
    box(circle(point(c.x, c.y), 2))
    );

-- Increase cost of circle
ALTER FUNCTION circle(point, double precision) COST 100;

-- [B] Query plan does not hit index
EXPLAIN SELECT gid 
FROM boxen, 
    (SELECT x, y FROM boxen WHERE x < 0 and y < 0) AS c 
WHERE good_box(
    boxen.b, 
    box(circle(point(c.x, c.y), 2))
    );

comment:19 by pramsey, 6 years ago

The pgsql inlining cost test is quite old (15 years), probably not ready for the Brave New World of expensive costed functions.

https://github.com/postgres/postgres/blame/ae20b23a9e7029f31ee902da08a464d968319f56/src/backend/optimizer/util/clauses.c#L4581-L4584

Note: See TracTickets for help on using tickets.