Opened 5 years ago

Closed 4 years ago

#2703 closed enhancement (fixed)

KNN gist with recheck for 9.5?

Reported by: robe Owned by: pramsey
Priority: high Milestone: PostGIS 2.2.0
Component: postgis Version: trunk
Keywords: Cc:

Description

I noticed this feature is on the commitfest 9.4 block for consideration.

https://commitfest.postgresql.org/action/patch_view?id=1367

Would be cool if we can push it along and leverage it in 2.2. As best I understand it, it would mean our KNN would act as a true distance check.

From patch thread


KNN-GiST provides ability to get ordered results from index, but this order is based only on index information. For instance, GiST index contains bounding rectangles for polygons, and we can't get exact distance to polygon from index (similar situation is in PostGIS). In attached patch, GiST distance method can set recheck flag (similar to consistent method). This flag means that distance method returned lower bound of distance and we should recheck it from heap.


Am I missing anything here?

Change History (25)

comment:1 Changed 5 years ago by robe

Summary: KNN gist with recheck for 9.4?KNN gist with recheck for 9.5?

I don't think this feature made it into the 9.4 cut.

comment:2 Changed 4 years ago by pramsey

Still no reviewer for current patch https://commitfest.postgresql.org/4/156/

comment:3 Changed 4 years ago by robe

Priority: mediumhigh

Now that knn gist with recheck is officially going to be in 9.5, can push this to high. pramsey push back if you don't think we can get it in for 2.2

comment:5 Changed 4 years ago by pramsey

Update to <-> committed into trunk at r13530, supports real KNN from PgSQL 9.5+

comment:6 Changed 4 years ago by pramsey

Geography <-> support with recheck committed at r13538. Regina, some testing would be appreciated. I don't know what to do for <<->> and <<#>>, which are already committed by strk, but are harder to move to re-check since we don't have a generic n-d exact distance calculation available. See me -devel email on the topic.

comment:7 Changed 4 years ago by robe

I need to recompile Postgresql 9.5 with latest source to confirm it's not bugged here. My instance is about a week old, but so far not looking good. I pulled my airports database of 14,479 someodd records and this query makes my instance go BOOM.

-- POSTGIS="2.2.0dev r13538" GEOS="3.5.0dev-CAPI-1.9.0 r4034" PROJ="Rel. 4.8.0, 6 March 2012" GDAL="GDAL 1.11.1, released 2014/09/24 GDAL_DATA not found" LIBXML="2.7.8" LIBJSON="0.12" RASTER

SELECT * FROM  airports ORDER BY (SELECT geog FROM airports WHERE airport_id = 'KBOS') <-> geog limit 10;

Logs just show:

LOG:  server process (PID 7856) was terminated by exception 0xC0000005
DETAIL:  Failed process was running: SELECT * from airports ORDER BY (SELECT geog FROM airports WHERE airport_id = 'KBOS') <-> geog limit 10

comment:8 Changed 4 years ago by robe

Hmm nightly snapshot failed too. Perhaps I need to pull right from git.

In meantime can you test this on your dev. This dummy data example crashes for me too:



CREATE TABLE test_geog_knn AS
SELECT ROW_NUMBER()OVER(ORDER BY x,y) AS gid, ST_Point(x,y)::geography As geog
FROM generate_series(-10,70, 10) As x , generate_series(0,90, 10) As y;

SELECT * FROM test_geog_knn ORDER BY ST_GeogFromText('POINT(-71.0518888888889 42.4934722222222)') <-> geog limit 1;


Equivalent example using geometry doesn't crash though

comment:9 Changed 4 years ago by robe

Okay I built with latest commit -- http://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=c5dd8ead403f85bd041590d2e3e79b72830472d4

and still crashes for me.

SELECT * FROM test_geog_knn ORDER BY ST_GeogFromText('POINT(-71.0518888888889 42.4934722222222)') <-> geog limit 1;

My back trace looks like this:

Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 10004.0x4b64]
0x00000000708a3263 in geography_distance (fcinfo=0x4b90610) at geography_measurement.c:139
139                     tolerance = PG_GETARG_FLOAT8(2);
(gdb) bt
#0  0x00000000708a3263 in geography_distance (fcinfo=0x4b90610) at geography_measurement.c:139
#1  0x000000000057bf4b in ExecMakeFunctionResultNoSets (fcache=0x4b905a0, econtext=0x4b90340, isNull=0x4b9170a "", isDone=<optimized out>) at execQual.c:2018
#2  0x0000000000581eb0 in ExecTargetList (isDone=0x273ebec, itemIsDone=0x4b91878, isnull=0x4b91708 "", values=0x4b916d0, econtext=0x4b90340, targetlist=0x4b91840) at execQual.c:5363
#3  ExecProject (projInfo=projInfo@entry=0x4b91728, isDone=isDone@entry=0x273ebec) at execQual.c:5578
#4  0x0000000000582300 in ExecScan (node=node@entry=0x4b90228, accessMtd=accessMtd@entry=0x597590 <SeqNext>, recheckMtd=recheckMtd@entry=0x597580 <SeqRecheck>) at execScan.c:207
#5  0x0000000000597603 in ExecSeqScan (node=node@entry=0x4b90228) at nodeSeqscan.c:114
#6  0x000000000057ada8 in ExecProcNode (node=node@entry=0x4b90228) at execProcnode.c:412
#7  0x0000000000598129 in ExecSort (node=node@entry=0x4b8ff90) at nodeSort.c:103
#8  0x000000000057ae18 in ExecProcNode (node=node@entry=0x4b8ff90) at execProcnode.c:488
#9  0x0000000000590a60 in ExecLimit (node=node@entry=0x4b8fc50) at nodeLimit.c:91
#10 0x000000000057acc8 in ExecProcNode (node=node@entry=0x4b8fc50) at execProcnode.c:520
#11 0x0000000000577a6e in ExecutePlan (dest=0x4cd2b50, direction=<optimized out>, numberTuples=0, sendTuples=1 '\001', operation=CMD_SELECT, planstate=0x4b8fc50, estate=0x4b8fb38) at execMain.c:1550
#12 standard_ExecutorRun (queryDesc=0xde81138, direction=<optimized out>, count=0) at execMain.c:337
#13 0x000000000068ba68 in PortalRunSelect (portal=portal@entry=0x4b85ae8, forward=forward@entry=1 '\001', count=0, count@entry=41153104, dest=dest@entry=0x0) at pquery.c:952
#14 0x000000000068d0c6 in PortalRun (portal=0x273eeb0, portal@entry=0x4b85ae8, count=41153104, count@entry=2147483647, isTopLevel=isTopLevel@entry=0 '\000', dest=0x0, dest@entry=0x4cd2b50, altdest=altdest@entry=0x4cd2b50, completionTag=0x273f210 "", completionTag@entry=0x40000000075 <error: Cannot access memory at address 0x40000000075>) at pquery.c:796
#15 0x000000000068a964 in exec_simple_query (query_string=0x0) at postgres.c:1104
#16 PostgresMain (argc=<optimized out>, argv=argv@entry=0x2e7f90, dbname=0x10000f000e000d <error: Cannot access memory at address 0x10000f000e000d>, username=<optimized out>) at postgres.c:4025
#17 0x000000000062a913 in BackendRun (port=0x273f400) at postmaster.c:4162
#18 SubPostmasterMain (argc=argc@entry=3, argv=argv@entry=0x357fa0) at postmaster.c:4649
#19 0x00000000007c6830 in main (argc=3, argv=0x357fa0) at main.c:198
(gdb)


comment:10 Changed 4 years ago by pramsey

Reverse the order of your KNN arguments (put the literal argument second instead of first).

Tom said that the crasher was fixed in the latest tip, so I'm disturbed you're seeing a crash on the very latest version.

comment:11 Changed 4 years ago by robe

Yah I was too when I saw that note. I specifically pulled that tar ball associated with the commit and used that and wiped out my old before I reinstalled. You have the latest built on your machine?

comment:12 Changed 4 years ago by robe

Still crashes with the literal being second. I sure wish postgresql had a hash embedded in their versions for extra confirmation I'm running the right version.

comment:13 Changed 4 years ago by robe

Your last commit r13542 fixed the crashing and made my simple contrived test work even after I put on a spatial index.

However, my real data query:

SELECT * FROM  airports ORDER BY geog <-> (SELECT geog FROM airports WHERE airport_id = 'KBOS')  limit 10;

SELECT * FROM  airports ORDER BY ST_GeogFromText('POINT(-71.0518888888889 42.4934722222222)') <-> geog limit 10;

Both give error:

ERROR:  index returned tuples in wrong order
********** Error **********

ERROR: index returned tuples in wrong order
SQL state: XX000


Explain plan shows index trying to be used:

Limit  (cost=8.58..12.64 rows=10 width=279)
  InitPlan 1 (returns $0)
    ->  Index Scan using pk_airports on airports airports_1  (cost=0.29..8.30 rows=1 width=32)
          Index Cond: (airport_id = 'KBOS'::bpchar)
  ->  Index Scan using idx_airports_geog on airports  (cost=0.28..5877.41 rows=14479 width=279)
        Order By: (geog <-> $0)

comment:14 Changed 4 years ago by robe

Okay seems an issue if I do limit more than 1 and I have an index on table:

CREATE TABLE test_geog_knn AS
SELECT ROW_NUMBER()OVER(ORDER BY x,y) AS gid, ST_Point(x,y)::geography As geog
FROM generate_series(-10,70, 10) As x , generate_series(0,90, 10) As y;

CREATE INDEX idx_test_geog_knn ON test_geog_knn USING gist(geog);
SELECT * FROM test_geog_knn ORDER BY (SELECT geog FROM test_geog_knn WHERE gid = 1)  <-> geog limit 10;


outputs:

ERROR:  index returned tuples in wrong order
********** Error **********

ERROR: index returned tuples in wrong order
SQL state: XX000

comment:15 Changed 4 years ago by pramsey

Yes, it reproduces, and it's the commit that fixed your crash r13542 that causes it for me. Prior to that, it *worked* for me.

comment:16 Changed 4 years ago by pramsey

I think I have it at r13544

comment:17 Changed 4 years ago by robe

Yap that did it :). I'll add some regress checks for this and then close this ticket out.

Last edited 4 years ago by robe (previous) (diff)

comment:18 Changed 4 years ago by robe

pramsey,

I added recheck tests at r13546 just for geometry2d (still have to add for 3D and geography). The last test currently fails because it replicates the issue nicklas brought up in dev about how with lateral (and an index in place, the answers are wrong).

I'm going to try to replicate the issue with the built in polygon types in postgresql to see if I can make them fail the same way to rule out postgis as the culprit.

comment:19 Changed 4 years ago by robe

I'm having trouble getting the built in geometry types to bulk the same way, but then again theirs is harder to test, because they don't have a distance funciton, just the <-> distance operator (so hard to do a compare). In running the tests though, I realized the problem may not be the lateral, but that the lateral allows for testing multiple records at a time.

For example the one in the regress I put in with the index in place:

SELECT a.gid, b.gid As match, RANK() OVER(PARTITION BY a.gid ORDER BY ST_Distance(a.geom, b.geom) ) As true_rn, b.rn  As knn_rn
FROM knn_recheck_geom As a 
	LEFT JOIN 
		LATERAL ( SELECT  gid, geom, RANK() OVER(ORDER BY a.geom <-> g.geom) As rn 
			FROM knn_recheck_geom As g WHERE a.gid <> g.gid ORDER BY a.geom <-> g.geom LIMIT 5) As b ON true
	WHERE a.gid IN(50000,50001,70000,61000)
ORDER BY a.gid, b.rn;

-- returns
 gid  | match  | true_rn | knn_rn
------+--------+---------+--------
50000 |  48969 |       1 |      1
50000 |  51031 |       2 |      2
50000 |  47938 |       3 |      3
50000 |  52062 |       4 |      4
50000 |  46907 |       5 |      5
50001 |  53093 |       5 |      1
50001 |  48970 |       1 |      2
50001 |  51032 |       2 |      3
50001 |  47939 |       3 |      4
50001 |  52063 |       4 |      5
61000 |  62031 |       1 |      1
61000 |  59969 |       1 |      1
61000 |  63062 |       3 |      3
61000 |  58938 |       4 |      4
61000 |  46908 |       5 |      5
70000 |  64093 |       4 |      1
70000 |  57907 |       5 |      2
70000 | 600002 |       1 |      3
70000 | 600001 |       1 |      3
70000 | 600004 |       1 |      3

Note the problem children are 50001, 70000 (the true distance sort and knn sort should match but they don't. Now if I reduce to just testing one problem child:

SELECT a.gid, b.gid As match, RANK() OVER(PARTITION BY a.gid ORDER BY ST_Distance(a.geom, b.geom) ) As true_rn, b.rn  As knn_rn  -- , ST_Distance(a.geom, b.geom) , ST_GeometryType(a.geom)
FROM knn_recheck_geom As a 
	LEFT JOIN 
		LATERAL ( SELECT  gid, geom, RANK() OVER(ORDER BY a.geom <-> g.geom) As rn 
			FROM knn_recheck_geom As g WHERE a.gid <> g.gid ORDER BY a.geom <-> g.geom LIMIT 5) As b ON true
	WHERE a.gid IN(70000)
ORDER BY a.gid, b.rn;

-- the order agrees --
  gid  | match  | true_rn | knn_rn
-------+--------+---------+--------
 70000 | 600001 |       1 |      1
 70000 | 600002 |       1 |      1
 70000 | 600004 |       1 |      1
 70000 |  71031 |       4 |      4
 70000 |  68969 |       5 |      5
(5 rows)

If I do both problem children, the first one is right the second one is screwed up.

comment:20 Changed 4 years ago by robe

minor correction to output at r13547

comment:21 Changed 4 years ago by robe

hold the presses, I think I got postgresql geometry types to fail.

comment:22 Changed 4 years ago by robe

reported upstream: http://www.postgresql.org/message-id/20150525052538.4705.92464@wrigleys.postgresql.org If they fix, then debbie's 9.5 daily run should pass. She is currently failing this recheck test.

Last edited 4 years ago by robe (previous) (diff)

comment:23 Changed 4 years ago by robe

Confirmed. Fix upstream seems to have fixed issue. Switching debbie to read from git since daily tar is not new enough

comment:24 Changed 4 years ago by robe

added geography tests at r13556

Last edited 4 years ago by robe (previous) (diff)

comment:25 Changed 4 years ago by pramsey

Resolution: fixed
Status: newclosed

Closing this ticket. Continuing work in #3133 and #3132

Note: See TracTickets for help on using tickets.