Opened 8 years ago

Closed 7 years ago

#3418 closed defect (fixed)

KNN recheck in 9.5 fails with index returned tuples in wrong order

Reported by: robe Owned by: pramsey
Priority: medium Milestone: PostGIS 2.2.5
Component: postgis Version: 2.2.x
Keywords: Cc: pramsey

Description (last modified by robe)

As Stephen Mathers noted on the list, https://lists.osgeo.org/pipermail/postgis-devel/2016-January/025559.html

he was able to break our precious KNN recheck code. I'm hoping this is something we can push upstream. Will try to replicate with build in PostgreSQL geometry types.

— to replicate

DROP TABLE IF EXISTS knn_recheck_geom;
CREATE TABLE knn_recheck_geom(gid serial primary key, geom geometry);
INSERT INTO knn_recheck_geom(gid,geom)
SELECT ROW_NUMBER() OVER(ORDER BY x,y) AS gid, ST_Point(x*0.777,y*0.887) As geom
FROM generate_series(-100,1000, 9) AS x CROSS JOIN generate_series(-300,1000,9) As y;

CREATE OR REPLACE FUNCTION zz_2nn_angle(geometry) RETURNS float AS $$
-- Here are my wonderful points to KNN search:
WITH index_query AS (

	SELECT edge.geom AS geom
	FROM (SELECT * FROM knn_recheck_geom) AS edge
-- This is my query point
	ORDER BY $1
		<->
	edge.geom LIMIT 2
	),
templine AS (
	SELECT ST_MakeLine(geom) AS geom FROM index_query
),
angle1 AS (
	SELECT ST_Azimuth(ST_StartPoint(geom), $1) angle FROM templine
),
angle2 AS (
	SELECT ST_Azimuth(ST_EndPoint(geom), $1) angle FROM templine
)
SELECT a1.angle - a2.angle FROM angle1 a1, angle2 a2
$$ LANGUAGE SQL;

CREATE INDEX idx_knn_recheck_geom on knn_recheck_geom using gist(geom);


SELECT zz_2nn_angle(geom)
FROM knn_recheck_geom;

Returns:

ERROR:  index returned tuples in wrong order
CONTEXT:  SQL function "zz_2nn_angle" statement 1
********** Error **********

Change History (20)

comment:1 by robe, 8 years ago

Description: modified (diff)

comment:2 by robe, 8 years ago

Description: modified (diff)

comment:3 by pramsey, 8 years ago

Milestone: PostGIS 2.2.2PostGIS 2.2.3

comment:4 by robe, 8 years ago

Description: modified (diff)

comment:5 by robe, 8 years ago

still an issue in 2.3. We might need to push this to 2.2.4 unless pramsey has another hammer lying around like he had at r13588.

I created a version for geography and geography doesn't exhibit the same error. I know I have fewer records here, so I retested geometry with my new smaller

generate_series(-100,100, 9) AS x CROSS JOIN generate_series(-175,100,9)

and was still able to trigger the error.

DROP TABLE IF EXISTS knn_recheck_geog ;
CREATE TABLE knn_recheck_geog(gid serial primary key, geom geography);
INSERT INTO knn_recheck_geog(gid,geom)
SELECT ROW_NUMBER() OVER(ORDER BY x,y) AS gid, ST_Point(x*0.777,y*0.887)::geography As geom
FROM generate_series(-100,100, 9) AS x CROSS JOIN generate_series(-175,100,9) As y;

DROP FUNCTION IF EXISTS zz_2nn_angle(geometry);
CREATE OR REPLACE FUNCTION zz_2nn_angle(geography) RETURNS float AS $$
-- Here are my wonderful points to KNN search:
WITH index_query AS (

	SELECT edge.geom AS geom
	FROM (SELECT * FROM knn_recheck_geog) AS edge
-- This is my query point
	ORDER BY $1
		<->
	edge.geom LIMIT 2
	),
templine AS (
	SELECT ST_MakeLine(geom::geometry)::geography AS geom FROM index_query
),
angle1 AS (
	SELECT ST_Azimuth(ST_StartPoint(geom::geometry)::geography, $1) angle FROM templine
),
angle2 AS (
	SELECT ST_Azimuth(ST_EndPoint(geom::geometry)::Geography, $1) angle FROM templine
)
SELECT a1.angle - a2.angle FROM angle1 a1, angle2 a2
$$ LANGUAGE SQL;

CREATE INDEX idx_knn_recheck_geog on knn_recheck_geog using gist(geom);

set enable_seqscan = true; 
set enable_indexscan = true;
SELECT zz_2nn_angle(geom)
FROM knn_recheck_geog;

comment:6 by robe, 8 years ago

Cc: pramsey added
Milestone: PostGIS 2.2.3PostGIS 2.4.0

bah pramsey I guess you won't be checking this anytime soon.

comment:7 by nwilson5, 7 years ago

I am getting this error as well. It exists for postgresql 9.5 and 9.6 for me, but not on 9.4 (even using same postgis version).

easily replicated doing this on a fresh database:

test=# create extension postgis;
test=# create table test (geo geometry);
test=# insert into test values 
  ('0101000020E61000007D91D0967329E4BF6631B1F9B8D64A40'::geometry), 
  ('0101000020E6100000E2AFC91AF510C1BFCDCCCCCCCCAC4A40'::geometry);
test=# create index on test using gist (geo);
test=# vacuum analyze test;
test=# set enable_seqscan = false;
test=# select * from test ORDER BY geo::geometry <->
('0101000020E610000092054CE0D6DDE5BFCDCCCCCCCCAC4A40'::geometry);

ERROR:  index returned tuples in wrong order

I would get this error on:
Postgres 9.5.5 + Postgis 2.2.2
Postgres 9.6.1 + Postgis 2.3.1

I did not get this error on:
Postgres 9.4.6 + Postgis 2.1.8
Postgres 9.4.10 + Postgis 2.3.1

Also reported here with no interest https://www.postgresql.org/message-id/20161212221748.14892.42910%40wrigleys.postgresql.org

comment:8 by robe, 7 years ago

It makes sense you wouldn't get it on 9.4, since the KNN recheck (true distance) feature was introduced in 9.5. So the KNN recheck in 2.3.1 is conditionally left out when compiling against a PostgreSQL 9.4 or lower .

Last edited 7 years ago by robe (previous) (diff)

comment:9 by robe, 7 years ago

nwilson5,

Thanks for the simple example. I'm surprised this triggered the error, but it does for me too. Last time I saw such simple a test trigger this was one I had reported upstream earlier on in KNN development.

I was hoping I could also make the built in PostgreSQL point datatype trigger the problem like I was able to with earlier issue, but unfortunately that works fine which suggests it could be a postgis only bug:

CREATE TABLE test_pt(geo point);

INSERT INTO test_pt values ( point(-0.63006, 53.67752) ),
    ( point(-0.1333, 53.35) );


create index on test_pt using gist (geo);
select * from test_pt ORDER BY geo::point <->
'(-0.68333, 53.35)'::point;

Anyrate this example you provided is simpler than the original so hopefully easier to debug.

Version 0, edited 7 years ago by robe (next)

comment:10 by robe, 7 years ago

As before, this issue only affects geometry. Our geography implementation seems to be okay.

create table test_geog (geo geography);
 insert into test_geog values 
  ('0101000020E61000007D91D0967329E4BF6631B1F9B8D64A40'::geography), 
  ('0101000020E6100000E2AFC91AF510C1BFCDCCCCCCCCAC4A40'::geography);


create index on test_geog using gist (geo);
vacuum analyze test_geog;
set enable_seqscan = false;
select * from test_geog ORDER BY geo <->
('0101000020E610000092054CE0D6DDE5BFCDCCCCCCCCAC4A40'::geography);

comment:11 by robe, 7 years ago

FYI - asked upstream for some helpful suggestions. Will post back if I hear anything note-worthy.

https://www.postgresql.org/message-id/000001d26260%24d4a2d7b0%247de88710%24%40pcorp.us

comment:12 by robe, 7 years ago

Response from Robert Haas in above thread:


On Fri, Dec 30, 2016 at 12:51 AM, Regina Obe <lr(at)pcorp(dot)us> wrote:
> I've been trying to troubleshoot the cause of this PostGIS recheck bug we
> have reported by two people so far.  The last test was a nice simple
> repeatable one that triggered the issue:
>
> https://trac.osgeo.org/postgis/ticket/3418
>
> from what I have seen this only affects cases where we are doing a distance
> check between two points, which we actually don't need to enable recheck for
> anyway, but trying to disable that seems like just shoving the real problem
> under the covers.

Agreed.

> If things are out of order, why isn't just going to was_exact = false good
> enough?
>
> I'm not sure if the mistake is in our PostGIS code or something in
> PostgreSQL recheck logic.
> If I change the elog(ERROR ...) to a elog(NOTICE, the answers  are correct
> and sort order is right.
>
> Under what conditions would cmp return less than 0?  I tried following the
> code in cmp_orderbyvals, but got lost
> and trying to put elog notices in to see what the distance is returning (I
> probably did it wrong), just ended up crashing by backend.

cmp would return 0 if the estimated distance returned by the index AM
were greater than the actual distance.  The estimated distance can be
less than the actual distance, but it isn't allowed to be more.  See
gist_bbox_distance for an example of a "lossy" distance calculation,
and more generally "git show
35fcb1b3d038a501f3f4c87c05630095abaaadab".

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


Just for reference, my full post was this:


I've been trying to troubleshoot the cause of this PostGIS recheck bug we
have reported by two people so far.  The last test was a nice simple
repeatable one that triggered the issue:

https://trac.osgeo.org/postgis/ticket/3418


from what I have seen this only affects cases where we are doing a distance
check between two points, which we actually don't need to enable recheck for
anyway, but trying to disable that seems like just shoving the real problem
under the covers.
Where it errors is this line 272 in src/backend/executor/nodeIndexscan

https://git.postgresql.org/gitweb/?p=postgresql.git;a=blob;f=src/backend/exe
cutor/nodeIndexscan.c;h=3143bd94ec4499fba94b41693538b785c4b32e6c;hb=HEAD#l27
2


  /*
 259              * Was the ORDER BY value returned by the index accurate?
The
 260              * recheck flag means that the index can return inaccurate
values,
 261              * but then again, the value returned for any particular
tuple
 262              * could also be exactly correct.  Compare the value
returned by
 263              * the index with the recalculated value.  (If the value
returned
 264              * by the index happened to be exact right, we can often
avoid
 265              * pushing the tuple to the queue, just to pop it back out
again.)
 266              */
 267             cmp = cmp_orderbyvals(node->iss_OrderByValues,
 268                                   node->iss_OrderByNulls,
 269                                   scandesc->xs_orderbyvals,
 270                                   scandesc->xs_orderbynulls,
 271                                   node);
 272             if (cmp < 0)
 273                 elog(ERROR, "index returned tuples in wrong order");
 274             else if (cmp == 0)
 275                 was_exact = true;
 276             else
 277                 was_exact = false;

If things are out of order, why isn't just going to was_exact = false good
enough?

I'm not sure if the mistake is in our PostGIS code or something in
PostgreSQL recheck logic.
If I change the elog(ERROR ...) to a elog(NOTICE, the answers  are correct
and sort order is right.

Under what conditions would cmp return less than 0?  I tried following the
code in cmp_orderbyvals, but got lost
and trying to put elog notices in to see what the distance is returning (I
probably did it wrong), just ended up crashing by backend.


Thanks for any thoughts,
Regina

The git commit he references we can look for for guidance is this one: https://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=35fcb1b3d038a501f3f4c87c05630095abaaadab

Allow GiST distance function to return merely a lower-bound.

The distance function can now set *recheck = false, like index quals. The
executor will then re-check the ORDER BY expressions, and use a queue to
reorder the results on the fly.

This makes it possible to do kNN-searches on polygons and circles, which
don't store the exact value in the index, but just a bounding box.

Alexander Korotkov and me
Last edited 7 years ago by robe (previous) (diff)

comment:13 by robe, 7 years ago

Milestone: PostGIS 2.4.0PostGIS 2.2.5

comment:14 by pramsey, 7 years ago

Try trunk at r15283

comment:15 by robe, 7 years ago

okay that works for me and Steve's original query doesn't trigger an error anymore, though it returns all NULLS. That is probably expected though.

I wonder if this will fix our PostgreSQL 10 KNN regressions too, probably not but would be nice. I've been afraid to look at why those are failing or if they are still failing.

I'll do one final check with some real data and then we can commit to 2.3 and 2.2.

comment:16 by robe, 7 years ago

Seems fine to me, I'm going to go ahead and add a regress test and then backport.

comment:17 by robe, 7 years ago

In 15284:

Add regress check for ERROR: index returned tuples in wrong order
references #3418

comment:18 by robe, 7 years ago

Summary: KNN recheck in 9.5 fails with index returned tuples in wrong order when used in functionKNN recheck in 9.5 fails with index returned tuples in wrong order

comment:19 by robe, 7 years ago

In 15285:

KNN recheck in 9.5+ fails with index returned tuples in wrong order
references #3418 for PostGIS 2.3

comment:20 by robe, 7 years ago

Resolution: fixed
Status: newclosed

In 15286:

KNN recheck in 9.5+ fails with index returned tuples in wrong order
closes #3418 for PostGIS 2.2
add NEWS note for missing GRANT in upgrade
references #3680

Note: See TracTickets for help on using tickets.