Opened 4 years ago

Closed 4 years ago

Last modified 4 years ago

#3127 closed defect (fixed)

KNN geog distance doesn't match ST_Distance spheroid

Reported by: robe Owned by: robe
Priority: medium Milestone: PostGIS 2.2.0
Component: documentation Version: master
Keywords: Cc:

Description (last modified by robe)

I'm not sure if this is expected or not, but I was trying to write up regress tests for KNN geog and found inconsistencies between what <-> returns and what ST_Distance geography returns which changes the ranking. It seems the <-> is returning the sphere distance instead of spheroid (e.g. I get same answer if I do

ST_Distance(geog1,geog2, false) )


If that is the case, and by design, we just need to make a note of it in the docs, so people don't falsely report its bugged.

e.g.:

CREATE TABLE knn_recheck_geog(gid serial primary key, geog geography);
INSERT INTO knn_recheck_geog(gid,geog)
SELECT ROW_NUMBER() OVER(ORDER BY x,y) AS gid, ST_Point(x*1.11,y*0.95)::geography As geog
FROM generate_series(-100,100, 1) AS x CROSS JOIN generate_series(-90,90,1) As y;

SELECT a.gid, b.gid As match, RANK() OVER(PARTITION BY a.gid ORDER BY ST_Distance(a.geog, b.geog) ) As true_rn, b.rn  As knn_rn,  ST_Distance(a.geog, b.geog) As dist_spheroid, ST_Distance(a.geog, b.geog, false) As dist_sphere, 
a.geog <-> b.geog As knn_dist
FROM knn_recheck_geog As a 
	LEFT JOIN 
		LATERAL ( SELECT  gid, geog, RANK() OVER(ORDER BY a.geog <-> g.geog) As rn 
			FROM knn_recheck_geog As g WHERE a.gid <> g.gid ORDER BY 'POINT(-105.45 3.8)'::Geography <-> g.geog LIMIT 5) As b ON true
	WHERE a.gid = 1000
ORDER BY a.gid, b.rn;


yields:

 gid  | match | true_rn | knn_rn |  dist_spheroid   |   dist_sphere    |     knn_dist
------+-------+---------+--------+------------------+------------------+------------------
 1000 |  1001 |       2 |      1 | 105051.447591582 |   105635.3257479 |   105635.3257479
 1000 |   999 |       1 |      2 | 105049.134514627 |   105635.3257479 |   105635.3257479
 1000 |   819 |       3 |      3 | 123294.778176043 | 123155.172887459 | 123155.172887459
 1000 |  1181 |       4 |      4 | 123294.778176044 |  123155.17288746 |  123155.17288746
 1000 |   820 |       5 |      5 | 161924.039877904 | 162197.178690157 | 162197.178690157
(5 rows)

Hmm and I should add -- I can't get this to use an index (or a simple query with a constant anymore so guess I may need to reopen that other ticket.

Change History (8)

comment:1 Changed 4 years ago by robe

Component: documentationpostgis
Resolution: fixed
Status: newclosed

pramsey fixed this at r13563 not sure if its related or something I just missed before but spatial index is no longer being used. Will post a separate ticket.

comment:2 Changed 4 years ago by robe

Resolution: fixed
Status: closedreopened
Summary: KNN geog distance doesn't match ST_Distance spheroid but uses sphereKNN geog distance doesn't match ST_Distance spheroid

comment:3 Changed 4 years ago by robe

Description: modified (diff)

Okay how far off are these supposed to be, I'm still getting cases where the distances don't line up with or without an index. Case in point:

CREATE TABLE knn_recheck_geog_small_2 (
    gid integer,
    geog geography
);



INSERT INTO knn_recheck_geog_small_2 VALUES (818, '0101000020E6100000A4703D0AD7A35AC0CDCCCCCCCCCC0640');
INSERT INTO knn_recheck_geog_small_2 VALUES (819, '0101000020E6100000A4703D0AD7A35AC06666666666660E40');
INSERT INTO knn_recheck_geog_small_2 VALUES (820, '0101000020E6100000A4703D0AD7A35AC00000000000001340');
INSERT INTO knn_recheck_geog_small_2 VALUES (998, '0101000020E6100000CDCCCCCCCC5C5AC0666666666666FE3F');
INSERT INTO knn_recheck_geog_small_2 VALUES (999, '0101000020E6100000CDCCCCCCCC5C5AC0CDCCCCCCCCCC0640');
INSERT INTO knn_recheck_geog_small_2 VALUES (1000, '0101000020E6100000CDCCCCCCCC5C5AC06666666666660E40');
INSERT INTO knn_recheck_geog_small_2 VALUES (1001, '0101000020E6100000CDCCCCCCCC5C5AC00000000000001340');
INSERT INTO knn_recheck_geog_small_2 VALUES (1180, '0101000020E6100000F6285C8FC2155AC0CDCCCCCCCCCC0640');
INSERT INTO knn_recheck_geog_small_2 VALUES (1181, '0101000020E6100000F6285C8FC2155AC06666666666660E40');
INSERT INTO knn_recheck_geog_small_2 VALUES (1182, '0101000020E6100000F6285C8FC2155AC00000000000001340');

Now without even creating an index, these distances don't match up:

SELECT a.gid,  ARRAY(SELECT  (gid,a.geog <-> g.geog)
			FROM knn_recheck_geog_small_2 As g WHERE a.gid <> g.gid ORDER BY a.geog <-> g.geog, g.gid LIMIT 2) As knn ,
        ARRAY(SELECT  (gid, ST_Distance(a.geog, g.geog ) )
			FROM knn_recheck_geog_small_2 As g WHERE a.gid <> g.gid ORDER BY ST_Distance(a.geog, g.geog), g.gid LIMIT 2) As dist
FROM knn_recheck_geog_small_2 As a 
WHERE a.gid = 1000
ORDER BY a.gid;

--gives

 gid  |                       knn                        |   dist
------+--------------------------------------------------+-----------------------------
 1000 | {"(1001,105635.3257479)","(999,105635.3257479)"} | {"(999,105049.134514627)",(1001,105051.447591582)"}

That could be expected behavior I guess. I should add, I can't get this or any simple derivative to use the spatial index UNLESS one side is a LITERAL (something like 'POINT(...)'. I think I was able to before (we tried to fix this issue). I may need to go to an old version to verify that assumption. Even enable_seqscan=false doesn't do it.

Last edited 4 years ago by robe (previous) (diff)

comment:4 Changed 4 years ago by robe

Disregard my index comment -- not sure if its an issue of concern or not, but it seems the extra order by clause prevented use of spatial index so getting rid of the addtional order by a.gid seems to allow index to kick in.

SELECT a.gid,  ARRAY(SELECT  (gid,a.geog <-> g.geog)
			FROM knn_recheck_geog_small_2 As g WHERE a.gid <> g.gid ORDER BY a.geog <-> g.geog LIMIT 2) As knn ,
        ARRAY(SELECT  (gid, ST_Distance(a.geog, g.geog ) )
			FROM knn_recheck_geog_small_2 As g WHERE a.gid <> g.gid ORDER BY ST_Distance(a.geog, g.geog) LIMIT 2) As dist
FROM knn_recheck_geog_small_2 As a 
WHERE a.gid = 1000
ORDER BY a.gid;

but answer with index use is still the same with it thinking these two geometries are same distance. Seems like its still using sphere to me at r13589

SELECT a.geog  <-> g.geog As knn_dist, ST_Distance(a.geog,g.geog) As dist, ST_Distance(a.geog,g.geog, false) As dist_sphere
FROM (SELECT 'POINT(-105.45 3.8)'::geography) As a(geog) ,
    (VALUES ('POINT(-105.45 2.85)'::geography ) , ( 'POINT(-105.45 4.75)'::geography) ) g(geog);

    knn_dist    |       dist       |  dist_sphere
----------------+------------------+----------------
 105635.3257479 | 105049.134514627 | 105635.3257479
 105635.3257479 | 105051.447591582 | 105635.3257479

comment:5 Changed 4 years ago by robe

Hey wait a minute -- looks like in r13588 you switched it back to not use spheroid.

comment:6 Changed 4 years ago by robe

Resolution: fixed
Status: reopenedclosed

Okay fixed at r13590 and the knn_recheck regress test I had put in is now passing now without crashing.

comment:7 Changed 4 years ago by robe

I'm beginning to think this is a sunk cause - should we just have it use sphere? I was all happy until I went to cleanup my tests (to get rid of the additional order by which was preventing index from kicking in) and was able to trigger the index out of order problem again. I'll confirm it's the forcing the spheroid that is causing this before we decide if we need to sacrifice this to make #3131 work.

comment:8 Changed 4 years ago by robe

Component: postgisdocumentation

reverted the fix, and amended the documentation to say its on sphere instead of spheroid at r13593

Note: See TracTickets for help on using tickets.