Opened 9 years ago

Last modified 9 years ago

#3127 closed defect

KNN geog distance doesn't match ST_Distance spheroid — at Version 3

Reported by: robe Owned by: robe
Priority: medium Milestone: PostGIS 2.2.0
Component: documentation Version: master
Keywords: Cc:

Description (last modified by robe)

I'm not sure if this is expected or not, but I was trying to write up regress tests for KNN geog and found inconsistencies between what ←> returns and what ST_Distance geography returns which changes the ranking. It seems the ←> is returning the sphere distance instead of spheroid (e.g. I get same answer if I do

ST_Distance(geog1,geog2, false) )


If that is the case, and by design, we just need to make a note of it in the docs, so people don't falsely report its bugged.

e.g.:

CREATE TABLE knn_recheck_geog(gid serial primary key, geog geography);
INSERT INTO knn_recheck_geog(gid,geog)
SELECT ROW_NUMBER() OVER(ORDER BY x,y) AS gid, ST_Point(x*1.11,y*0.95)::geography As geog
FROM generate_series(-100,100, 1) AS x CROSS JOIN generate_series(-90,90,1) As y;

SELECT a.gid, b.gid As match, RANK() OVER(PARTITION BY a.gid ORDER BY ST_Distance(a.geog, b.geog) ) As true_rn, b.rn  As knn_rn,  ST_Distance(a.geog, b.geog) As dist_spheroid, ST_Distance(a.geog, b.geog, false) As dist_sphere, 
a.geog <-> b.geog As knn_dist
FROM knn_recheck_geog As a 
	LEFT JOIN 
		LATERAL ( SELECT  gid, geog, RANK() OVER(ORDER BY a.geog <-> g.geog) As rn 
			FROM knn_recheck_geog As g WHERE a.gid <> g.gid ORDER BY 'POINT(-105.45 3.8)'::Geography <-> g.geog LIMIT 5) As b ON true
	WHERE a.gid = 1000
ORDER BY a.gid, b.rn;


yields:

 gid  | match | true_rn | knn_rn |  dist_spheroid   |   dist_sphere    |     knn_dist
------+-------+---------+--------+------------------+------------------+------------------
 1000 |  1001 |       2 |      1 | 105051.447591582 |   105635.3257479 |   105635.3257479
 1000 |   999 |       1 |      2 | 105049.134514627 |   105635.3257479 |   105635.3257479
 1000 |   819 |       3 |      3 | 123294.778176043 | 123155.172887459 | 123155.172887459
 1000 |  1181 |       4 |      4 | 123294.778176044 |  123155.17288746 |  123155.17288746
 1000 |   820 |       5 |      5 | 161924.039877904 | 162197.178690157 | 162197.178690157
(5 rows)

Hmm and I should add — I can't get this to use an index (or a simple query with a constant anymore so guess I may need to reopen that other ticket.

Change History (3)

comment:1 by robe, 9 years ago

Component: documentationpostgis
Resolution: fixed
Status: newclosed

pramsey fixed this at r13563 not sure if its related or something I just missed before but spatial index is no longer being used. Will post a separate ticket.

comment:2 by robe, 9 years ago

Resolution: fixed
Status: closedreopened
Summary: KNN geog distance doesn't match ST_Distance spheroid but uses sphereKNN geog distance doesn't match ST_Distance spheroid

comment:3 by robe, 9 years ago

Description: modified (diff)

Okay how far off are these supposed to be, I'm still getting cases where the distances don't line up with or without an index. Case in point:

CREATE TABLE knn_recheck_geog_small_2 (
    gid integer,
    geog geography
);



INSERT INTO knn_recheck_geog_small_2 VALUES (818, '0101000020E6100000A4703D0AD7A35AC0CDCCCCCCCCCC0640');
INSERT INTO knn_recheck_geog_small_2 VALUES (819, '0101000020E6100000A4703D0AD7A35AC06666666666660E40');
INSERT INTO knn_recheck_geog_small_2 VALUES (820, '0101000020E6100000A4703D0AD7A35AC00000000000001340');
INSERT INTO knn_recheck_geog_small_2 VALUES (998, '0101000020E6100000CDCCCCCCCC5C5AC0666666666666FE3F');
INSERT INTO knn_recheck_geog_small_2 VALUES (999, '0101000020E6100000CDCCCCCCCC5C5AC0CDCCCCCCCCCC0640');
INSERT INTO knn_recheck_geog_small_2 VALUES (1000, '0101000020E6100000CDCCCCCCCC5C5AC06666666666660E40');
INSERT INTO knn_recheck_geog_small_2 VALUES (1001, '0101000020E6100000CDCCCCCCCC5C5AC00000000000001340');
INSERT INTO knn_recheck_geog_small_2 VALUES (1180, '0101000020E6100000F6285C8FC2155AC0CDCCCCCCCCCC0640');
INSERT INTO knn_recheck_geog_small_2 VALUES (1181, '0101000020E6100000F6285C8FC2155AC06666666666660E40');
INSERT INTO knn_recheck_geog_small_2 VALUES (1182, '0101000020E6100000F6285C8FC2155AC00000000000001340');

Now without even creating an index, these distances don't match up:

SELECT a.gid,  ARRAY(SELECT  (gid,a.geog <-> g.geog)
			FROM knn_recheck_geog_small_2 As g WHERE a.gid <> g.gid ORDER BY a.geog <-> g.geog, g.gid LIMIT 2) As knn ,
        ARRAY(SELECT  (gid, ST_Distance(a.geog, g.geog ) )
			FROM knn_recheck_geog_small_2 As g WHERE a.gid <> g.gid ORDER BY ST_Distance(a.geog, g.geog), g.gid LIMIT 2) As dist
FROM knn_recheck_geog_small_2 As a 
WHERE a.gid = 1000
ORDER BY a.gid;

—gives

 gid  |                       knn                        |   dist
------+--------------------------------------------------+-----------------------------
 1000 | {"(1001,105635.3257479)","(999,105635.3257479)"} | {"(999,105049.134514627)",(1001,105051.447591582)"}

That could be expected behavior I guess. I should add, I can't get this or any simple derivative to use the spatial index UNLESS one side is a LITERAL (something like 'POINT(…)'. I think I was able to before (we tried to fix this issue). I may need to go to an old version to verify that assumption. Even enable_seqscan=false doesn't do it.

Last edited 9 years ago by robe (previous) (diff)
Note: See TracTickets for help on using tickets.