Opened 7 years ago
Closed 6 years ago
#3418 closed defect (fixed)
KNN recheck in 9.5 fails with index returned tuples in wrong order
|Reported by:||robe||Owned by:||pramsey|
Description (last modified by )
As Stephen Mathers noted on the list, https://lists.osgeo.org/pipermail/postgis-devel/2016-January/025559.html
he was able to break our precious KNN recheck code. I'm hoping this is something we can push upstream. Will try to replicate with build in PostgreSQL geometry types.
— to replicate
DROP TABLE IF EXISTS knn_recheck_geom; CREATE TABLE knn_recheck_geom(gid serial primary key, geom geometry); INSERT INTO knn_recheck_geom(gid,geom) SELECT ROW_NUMBER() OVER(ORDER BY x,y) AS gid, ST_Point(x*0.777,y*0.887) As geom FROM generate_series(-100,1000, 9) AS x CROSS JOIN generate_series(-300,1000,9) As y; CREATE OR REPLACE FUNCTION zz_2nn_angle(geometry) RETURNS float AS $$ -- Here are my wonderful points to KNN search: WITH index_query AS ( SELECT edge.geom AS geom FROM (SELECT * FROM knn_recheck_geom) AS edge -- This is my query point ORDER BY $1 <-> edge.geom LIMIT 2 ), templine AS ( SELECT ST_MakeLine(geom) AS geom FROM index_query ), angle1 AS ( SELECT ST_Azimuth(ST_StartPoint(geom), $1) angle FROM templine ), angle2 AS ( SELECT ST_Azimuth(ST_EndPoint(geom), $1) angle FROM templine ) SELECT a1.angle - a2.angle FROM angle1 a1, angle2 a2 $$ LANGUAGE SQL; CREATE INDEX idx_knn_recheck_geom on knn_recheck_geom using gist(geom); SELECT zz_2nn_angle(geom) FROM knn_recheck_geom;
ERROR: index returned tuples in wrong order CONTEXT: SQL function "zz_2nn_angle" statement 1 ********** Error **********
Change History (20)
comment:1 by , 7 years ago
comment:2 by , 7 years ago
comment:3 by , 7 years ago
|Milestone:||PostGIS 2.2.2 → PostGIS 2.2.3|
comment:4 by , 7 years ago
comment:5 by , 7 years ago
comment:6 by , 7 years ago
|Milestone:||PostGIS 2.2.3 → PostGIS 2.4.0|
bah pramsey I guess you won't be checking this anytime soon.
comment:7 by , 6 years ago
I am getting this error as well. It exists for postgresql 9.5 and 9.6 for me, but not on 9.4 (even using same postgis version).
easily replicated doing this on a fresh database:
test=# create extension postgis; test=# create table test (geo geometry); test=# insert into test values ('0101000020E61000007D91D0967329E4BF6631B1F9B8D64A40'::geometry), ('0101000020E6100000E2AFC91AF510C1BFCDCCCCCCCCAC4A40'::geometry); test=# create index on test using gist (geo); test=# vacuum analyze test; test=# set enable_seqscan = false; test=# select * from test ORDER BY geo::geometry <-> ('0101000020E610000092054CE0D6DDE5BFCDCCCCCCCCAC4A40'::geometry); ERROR: index returned tuples in wrong order
I would get this error on:
Postgres 9.5.5 + Postgis 2.2.2
Postgres 9.6.1 + Postgis 2.3.1
I did not get this error on:
Postgres 9.4.6 + Postgis 2.1.8
Postgres 9.4.10 + Postgis 2.3.1
Also reported here with no interest https://www.postgresql.org/message-id/20161212221748.14892.42910%40wrigleys.postgresql.org
comment:8 by , 6 years ago
It makes sense you wouldn't get it on 9.4, since the KNN recheck (true distance) feature was introduced in 9.5. So the KNN recheck in 2.3.1 is conditionally left out when compiling against a PostgreSQL 9.4 or lower .
comment:9 by , 6 years ago
Thanks for the simple example. I'm surprised this triggered the error, but it does for me too. Last time I saw such simple a test trigger this was one I had reported upstream earlier on in KNN development.
I was hoping I could also make the built in PostgreSQL point datatype trigger the problem like I was able to with earlier issue, but unfortunately that works fine. Looking at the code of point to point in postgresql, they have recheck set to false for points since it's an exact match, so that doesn't rule out upstream as culprit.
CREATE TABLE test_pt(geo point); INSERT INTO test_pt values ( point(-0.63006, 53.67752) ), ( point(-0.1333, 53.35) ); create index on test_pt using gist (geo); select * from test_pt ORDER BY geo::point <-> '(-0.68333, 53.35)'::point;
Anyrate this example you provided is simpler than the original so hopefully easier to debug.
comment:10 by , 6 years ago
As before, this issue only affects geometry. Our geography implementation seems to be okay.
create table test_geog (geo geography); insert into test_geog values ('0101000020E61000007D91D0967329E4BF6631B1F9B8D64A40'::geography), ('0101000020E6100000E2AFC91AF510C1BFCDCCCCCCCCAC4A40'::geography); create index on test_geog using gist (geo); vacuum analyze test_geog; set enable_seqscan = false; select * from test_geog ORDER BY geo <-> ('0101000020E610000092054CE0D6DDE5BFCDCCCCCCCCAC4A40'::geography);
comment:11 by , 6 years ago
FYI - asked upstream for some helpful suggestions. Will post back if I hear anything note-worthy.
comment:12 by , 6 years ago
Response from Robert Haas in above thread:
On Fri, Dec 30, 2016 at 12:51 AM, Regina Obe <lr(at)pcorp(dot)us> wrote: > I've been trying to troubleshoot the cause of this PostGIS recheck bug we > have reported by two people so far. The last test was a nice simple > repeatable one that triggered the issue: > > https://trac.osgeo.org/postgis/ticket/3418 > > from what I have seen this only affects cases where we are doing a distance > check between two points, which we actually don't need to enable recheck for > anyway, but trying to disable that seems like just shoving the real problem > under the covers. Agreed. > If things are out of order, why isn't just going to was_exact = false good > enough? > > I'm not sure if the mistake is in our PostGIS code or something in > PostgreSQL recheck logic. > If I change the elog(ERROR ...) to a elog(NOTICE, the answers are correct > and sort order is right. > > Under what conditions would cmp return less than 0? I tried following the > code in cmp_orderbyvals, but got lost > and trying to put elog notices in to see what the distance is returning (I > probably did it wrong), just ended up crashing by backend. cmp would return 0 if the estimated distance returned by the index AM were greater than the actual distance. The estimated distance can be less than the actual distance, but it isn't allowed to be more. See gist_bbox_distance for an example of a "lossy" distance calculation, and more generally "git show 35fcb1b3d038a501f3f4c87c05630095abaaadab". -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Just for reference, my full post was this:
I've been trying to troubleshoot the cause of this PostGIS recheck bug we have reported by two people so far. The last test was a nice simple repeatable one that triggered the issue: https://trac.osgeo.org/postgis/ticket/3418 from what I have seen this only affects cases where we are doing a distance check between two points, which we actually don't need to enable recheck for anyway, but trying to disable that seems like just shoving the real problem under the covers. Where it errors is this line 272 in src/backend/executor/nodeIndexscan https://git.postgresql.org/gitweb/?p=postgresql.git;a=blob;f=src/backend/exe cutor/nodeIndexscan.c;h=3143bd94ec4499fba94b41693538b785c4b32e6c;hb=HEAD#l27 2 /* 259 * Was the ORDER BY value returned by the index accurate? The 260 * recheck flag means that the index can return inaccurate values, 261 * but then again, the value returned for any particular tuple 262 * could also be exactly correct. Compare the value returned by 263 * the index with the recalculated value. (If the value returned 264 * by the index happened to be exact right, we can often avoid 265 * pushing the tuple to the queue, just to pop it back out again.) 266 */ 267 cmp = cmp_orderbyvals(node->iss_OrderByValues, 268 node->iss_OrderByNulls, 269 scandesc->xs_orderbyvals, 270 scandesc->xs_orderbynulls, 271 node); 272 if (cmp < 0) 273 elog(ERROR, "index returned tuples in wrong order"); 274 else if (cmp == 0) 275 was_exact = true; 276 else 277 was_exact = false; If things are out of order, why isn't just going to was_exact = false good enough? I'm not sure if the mistake is in our PostGIS code or something in PostgreSQL recheck logic. If I change the elog(ERROR ...) to a elog(NOTICE, the answers are correct and sort order is right. Under what conditions would cmp return less than 0? I tried following the code in cmp_orderbyvals, but got lost and trying to put elog notices in to see what the distance is returning (I probably did it wrong), just ended up crashing by backend. Thanks for any thoughts, Regina
The git commit he references we can look for for guidance is this one: https://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=35fcb1b3d038a501f3f4c87c05630095abaaadab
Allow GiST distance function to return merely a lower-bound. The distance function can now set *recheck = false, like index quals. The executor will then re-check the ORDER BY expressions, and use a queue to reorder the results on the fly. This makes it possible to do kNN-searches on polygons and circles, which don't store the exact value in the index, but just a bounding box. Alexander Korotkov and me
comment:13 by , 6 years ago
|Milestone:||PostGIS 2.4.0 → PostGIS 2.2.5|
comment:14 by , 6 years ago
Try trunk at r15283
comment:15 by , 6 years ago
okay that works for me and Steve's original query doesn't trigger an error anymore, though it returns all NULLS. That is probably expected though.
I wonder if this will fix our PostgreSQL 10 KNN regressions too, probably not but would be nice. I've been afraid to look at why those are failing or if they are still failing.
I'll do one final check with some real data and then we can commit to 2.3 and 2.2.
comment:16 by , 6 years ago
Seems fine to me, I'm going to go ahead and add a regress test and then backport.
comment:17 by , 6 years ago
comment:18 by , 6 years ago
|Summary:||KNN recheck in 9.5 fails with index returned tuples in wrong order when used in function → KNN recheck in 9.5 fails with index returned tuples in wrong order|
comment:19 by , 6 years ago
comment:20 by , 6 years ago
|Status:||new → closed|
still an issue in 2.3. We might need to push this to 2.2.4 unless pramsey has another hammer lying around like he had at r13588.
I created a version for geography and geography doesn't exhibit the same error. I know I have fewer records here, so I retested geometry with my new smaller
and was still able to trigger the error.