Opened 6 years ago

Closed 4 years ago

Last modified 4 years ago

#2422 closed defect (fixed)

geography regression difference ST_DWithin

Reported by: robe Owned by: pramsey
Priority: high Milestone: PostGIS 2.2.0
Component: postgis Version: 2.1.x
Keywords: Cc:

Description (last modified by robe)

I presume geography in 2.0.4 and goegraphy in 2.1 should return the same answer. In benchmarking client's data I found something slightly troubling:

SELECT COUNT(*) FROM face
WHERE ST_DWithin(geog, st_setsrid(st_makepoint(-71.0636, 42.3584), 4326)::geography, 1609.344 );

when I run this on

POSTGIS="2.0.4SVN r11708" GEOS="3.4.0dev-CAPI-1.8.0 r0" PROJ="Rel. 4.8.0, 6 March 2012" GDAL="GDAL 1.10.0, released 2013/04/24" LIBXML="2.7.8" LIBJSON="UNKNOWN" RASTER PostgreSQL 9.2.4, compiled by Visual C++ build 1600, 64-bit

 count
-------
   397
(1 row)


Time: 157.613 ms

On

 POSTGIS="2.1.0rc2 r11726" GEOS="3.4.0dev-CAPI-1.8.0 r0" PROJ="Rel. 4.8.0, 6 Mar
ch 2012" GDAL="GDAL 1.10.0, released 2013/04/24" LIBXML="2.7.8" LIBJSON="UNKNOWN
" RASTER PostgreSQL 9.2.4, compiled by Visual C++ build 1600, 64-bit

 count
-------
   395
(1 row)


Time: 218.903 ms

whammo I got 2 geometries less and lost a good chunk of speed.

I ran a couple of times on each database with similar timings.

I'll try to get permission from client to release this datasource, but in mean time I'll try to narrow down to the offending geometries.

Attachments (1)

dwithinbug_data.sql (6.8 KB) - added by robe 6 years ago.

Download all attachments as: .zip

Change History (46)

comment:1 Changed 6 years ago by robe

Description: modified (diff)

comment:2 Changed 6 years ago by pramsey

It might be hard to find the exact differences because this is going to be a difference between the tree distance and the edge distance algorithms, and the tree distance gets turned on as caching goes active. Replace testing ST_Dwithin with testing for differences between _ST_DistanceUnCached and _ST_DistanceTree to avoid cache effects.

comment:3 Changed 6 years ago by robe

You mean the distance ones and not the _ST_DWithinUncached ones? I'll try both.

comment:4 Changed 6 years ago by robe

Running this test

SELECT COUNT(*) FROM face
WHERE _ST_DWithinUnCached(geog, st_setsrid(st_makepoint(-71.0636, 42.3584), 4326)::geography, 1609.344 );
 POSTGIS="2.1.0rc2 r11726" GEOS="3.4.0dev-CAPI-1.8.0 r0" PROJ="Rel. 4.8.0, 6 Mar
ch 2012" GDAL="GDAL 1.10.0, released 2013/04/24" LIBXML="2.7.8" LIBJSON="UNKNOWN
" RASTER PostgreSQL 9.2.4, compiled by Visual C++ build 1600, 64-bit

 count
-------
   397
(1 row)


Time: 161.412 ms

Looks closer to what I'd expect.

On the 2.0.4 no such functions exist so can't compare.

No such thing as _ST_DWithinTree, so closest which takes painfully long to run is

SELECT COUNT(*) FROM  face
WHERE _ST_DistanceTree(geog, st_setsrid(st_makepoint(-71.0636, 42.3584), 4326)::geography) <=  1609.344;

I'm still waiting for that one to finish

comment:5 Changed 6 years ago by robe

okay 109953 ms later

SELECT COUNT(*) FROM  face
WHERE _ST_DistanceTree(geog, st_setsrid(st_makepoint(-71.0636, 42.3584), 4326)::geography) <=  1609.344;

yields:

count
-------
   779
(1 row)

comment:6 Changed 6 years ago by robe

hmm even using _ST_DistanceTree combo returns the right answer with ST_Expand:

SELECT COUNT(*) FROM  face
WHERE geog && _ST_Expand(st_setsrid(st_makepoint(-71.0636, 42.3584), 4326)::geography, 1609.344) 
AND _ST_Expand(geog,1609.344) && st_setsrid(st_makepoint(-71.0636, 42.3584), 4326)::geography  
AND _ST_DistanceTree(geog, st_setsrid(st_makepoint(-71.0636, 42.3584), 4326)::geography) <  1609.344;

Returns:

 count
-------
   397
(1 row)


Time: 233.620 ms

So something must be fishy with the way _ST_DWithin is coded I'm trying

SELECT COUNT(*) FROM  face
WHERE ST_DWithin(geog, st_setsrid(st_makepoint(-71.0636, 42.3584), 4326)::geography, 1609.344 );

After 112 seconds of patient waiting and it finally returned with a presumably WRONG answer of 395.

comment:7 Changed 6 years ago by robe

oops I meant the query I ran on PostGIS 2.1 instance was:

ELECT COUNT(*) FROM  face
WHERE _ST_DWithin(geog, st_setsrid(st_makepoint(-71.0636, 42.3584), 4326)::geography, 1609.344 );

which returned:

 count
-------
   395
(1 row)


Time: 110076.840 ms

comment:8 Changed 6 years ago by robe

Let me try that once more -- my memory buffer is giving me problems:

SELECT COUNT(*) FROM face
WHERE _ST_DWithin(geog, st_setsrid(st_makepoint(-71.0636, 42.3584), 4326)::geography, 1609.344,true );

yields this:

count
-------
   395
(1 row)


Time: 110076.840 ms

thus conclusion _ST_DWithin(geography, geography,dist, use_sphere) has got some issues.

I vaguely remember having this issue with geometry _ST_DWithin/ST_Distance you fixed. I'll try to dig up the ticket for that one.

comment:9 Changed 6 years ago by robe

Summary: geography regression differencegeography regression difference ST_DWithin

comment:10 Changed 6 years ago by robe

might be related to this one: #2351

comment:11 Changed 6 years ago by pramsey

No, simpler, just

SELECT 
someId, _ST_DistanceUnCached(), _ST_DistanceTree()
FROM yourtable
WHERE _ST_DistanceUnCached() != _ST_DistanceTree();

See? The problem is the algorithms are returning different answers for a very few example geographies, and this will both isolate the ones that are failing and give a hint what the different answers are.

comment:12 Changed 6 years ago by robe

Well doing this:

SELECT 
id, _ST_DistanceUnCached(geog, st_setsrid(st_makepoint(-71.0636, 42.3584), 4326)::geography)
, _ST_DistanceTree(geog, st_setsrid(st_makepoint(-71.0636, 42.3584), 4326)::geography)
FROM  face
WHERE _ST_Expand(geog,1609.344) && st_setsrid(st_makepoint(-71.0636, 42.3584), 4326)::geography 
  AND _ST_DistanceUnCached(geog, st_setsrid(st_makepoint(-71.0636, 42.3584), 4326)::geography) 
 != _ST_DistanceTree(geog, st_setsrid(st_makepoint(-71.0636, 42.3584), 4326)::geography);

returns 0 records. Doing a check without ST_Expand is going to take a bit longer since I think they are big geometries and 300,000 of them.

comment:13 Changed 6 years ago by robe

Okay check without ST_Expand:

   id   | _st_distanceuncached | _st_distancetree
--------+----------------------+------------------
 330881 |     2811307.65431338 | 2811411.63144931
(1 row)

comment:14 Changed 6 years ago by robe

I ran this test on my isolated dataset that is within the region of the error and this shows the geographies in question I think:

SELECT id FROM face
WHERE _ST_DWithinUnCached(geog, st_setsrid(st_makepoint(-71.0636, 42.3584), 4326)::geography, 1609.344 )
  AND NOT _ST_DWithin(geog, st_setsrid(st_makepoint(-71.0636, 42.3584), 4326)::geography, 1609.344,true );

Yields:

   id
--------
 212164
  12301

comment:15 Changed 6 years ago by robe

Hmm it might not be specific to geometry. If I put these 2 bad geometries in a table called bad_face2 and run this:

SELECT id 
, _ST_DistanceTree(geog, st_setsrid(st_makepoint(-71.0636, 42.3584), 4326)::geography) 
, _ST_DistanceUnCached(geog, st_setsrid(st_makepoint(-71.0636, 42.3584), 4326)::geography)
FROM bad_face2
WHERE 
  NOT _ST_DWithin(geog, st_setsrid(st_makepoint(-71.0636, 42.3584), 4326)::geography, 1609.344,true )
AND _ST_DWithinUnCached(geog, st_setsrid(st_makepoint(-71.0636, 42.3584), 4326)::geography, 1609.344 );

I get this:

 id   | _st_distancetree | _st_distanceuncached
------+------------------+----------------------
12301 | 1398.40705751386 |     1398.40705751386

I get only one back.

comment:16 Changed 6 years ago by robe

I think the ST_Distance cache is fine.

I created this function which uses _ST_Distance instead of _ST_DWithin. I frankly don't see why we even bother creating an _ST_DWithin (geography_dwithin) when the distance function takes a tolerance.

CREATE OR REPLACE FUNCTION st_dwithin2(geography, geography, double precision, boolean)
  RETURNS boolean AS
'SELECT $1 && _ST_Expand($2,$3) AND $2 && _ST_Expand($1,$3) AND _st_distance($1, $2, $3, $4) <= $3'
  LANGUAGE sql IMMUTABLE
  COST 100;

I run this query:

SELECT count(id) FROM face
WHERE ST_DWithin2(geog, st_setsrid(st_makepoint(-71.0636, 42.3584), 4326)::geography, 1609.344,true );
;

And voila -- get my expected answer:

 count
-------
   397

Anyway I would like this issue resolved today so I can release today otherwise I'll have to weigh and maybe release tomorrow. In meantime I'll finish doing garden regress against 2.0.4 and hopefully I won't find any other issues.

comment:17 Changed 6 years ago by pramsey

The answer regarding why have a dwithin code line is because the distance code line doesn't actually use the tolerance when evaluating two trees against one another, while the dwithin code line does. The idea is the dwithin line can short circuit out of the evaluation when it hits something within tolerance, while the distance calculation has to check all possibilities to get the right distance. However, the distance code line does take in a tolerance number, it just mostly ignores it.

What this case is showing is a failure in the tolerance-based code, perhaps, although you also have a case that shows the tree and segment based distance calculations coming up different.

comment:18 Changed 6 years ago by robe

food for thought with 9.3, you can make it a materialized view as we describe here:

http://www.postgresonline.com/journal/archives/314-Materialized-geometry_columns-using-Event-Triggers.html

comment:19 Changed 6 years ago by robe

Oops I wish I could take comments back. I just realized I posted the wrong comment to this ticket. Meant to post to another. Ignore the last one.

comment:20 Changed 6 years ago by robe

Milestone: PostGIS 2.1.0PostGIS 2.1.1

comment:21 Changed 6 years ago by pramsey

The point of my previous notes on finding the geometries that provided difference results for tree and brute-force distance calculations was to actually get my hands on the geometries, so I could fix the differences. You need to provide me data, or nothing is going to happen on this, it's not like there's a big bold error in the code, there's a niggly tiny wart somewhere deeply hidden that only intense study with replicable data is going to uncover.

If you think the problem is a difference between the dwithin distance distance code lines, give me (a) a SQL statement and (b) a table the demonstrates it. Try to keep it simple, keep it small.

comment:22 Changed 6 years ago by robe

I know I know. Don't worry about this one. I'm a subcontractor on this one and haven't had much luck moving up the chain of approval (it's a big company and everyone evidentally has higher priorities). I will retest after all your recent changes and see if it's still an issue and relook at those geometries.

This table I realized its a weird one because it has a mix of multipolygons and geometry collections.

comment:23 Changed 6 years ago by robe

actually never mind the geometry collection bit (its a mix of polygons and geometry collections). If my 2 geometries that failed this

SELECT gid, geog FROM  face
WHERE NOT _ST_DWithin(geog, st_setsrid(st_makepoint(-71.0636, 42.3584), 4326)::geography, 1609.344,true )
AND _ST_DWithinUnCached(geog, st_setsrid(st_makepoint(-71.0636, 42.3584), 4326)::geography, 1609.344 )

and create a smaller only 2 row table with it. Then only one fails this time but putting the check in the from yield2 true so definitely seems like a caching bug of sorts.

SELECT gid, geog, _ST_DWithin(geog, st_setsrid(st_makepoint(-71.0636, 42.3584), 4326)::geography, 1609.344,true ) FROM  face2
WHERE NOT _ST_DWithin(geog, st_setsrid(st_makepoint(-71.0636, 42.3584), 4326)::geography, 1609.344,true )
AND _ST_DWithinUnCached(geog, st_setsrid(st_makepoint(-71.0636, 42.3584), 4326)::geography, 1609.344 )

Yields:

gid | _st_dwithin
----+-------------
  2 | t
SELECT gid, _ST_DWithin(geog, st_setsrid(st_makepoint(-71.0636, 42.3584), 4326)::geography, 1609.344,true )
, ST_Distance(geog, st_setsrid(st_makepoint(-71.0636, 42.3584), 4326)::geography ) 
FROM  face2;

 gid | _st_dwithin |  st_distance
-----+-------------+----------------
   1 | t           | 1398.407057514
   2 | f           |   1001.9245264

Berry berry interesting. What's kinda bizarre to me is both records ARE within distance so even caching wise, how could it ever have a cache that has false in it.

I still need to upgrade this db to test again. This is running with released 2.1.0

comment:24 Changed 6 years ago by robe

I should add that this artifact requires at least 2 records to exhibit the failure. But maybe that's because your caching logic doesn't try to kick in. So adding a gid = 2 it doesn't fail.

comment:25 Changed 6 years ago by robe

Okay I've got another that exhibits the same bug. Try attached -- it also fails on latest PostGIS 2.2. If I change any of the geometries ever so slightly it doesn't fail

See attached file

SELECT gid, _ST_DWithin(geog, tgeog, 1609.344,true ), ST_NPoints(geog::geometry)
, ST_Distance(geog, tgeog) 
FROM  dwithgeogbug CROSS JOIN ST_GeogFromText('POINT(-69.83262 43.43636)') As tgeog;

Changed 6 years ago by robe

Attachment: dwithinbug_data.sql added

comment:26 Changed 6 years ago by robe

Milestone: PostGIS 2.1.1PostGIS 2.2.0

comment:27 Changed 6 years ago by robe

Milestone: PostGIS 2.2.0PostGIS 2.1.2

comment:28 Changed 6 years ago by robe

Hey I think this might be the same issue I am having here: This is a really sweet short example this guy provided to reproduce:

http://lists.osgeo.org/pipermail/postgis-users/2014-January/038619.html

email message added below. I was able to reproduce issue on my windows 7 64-bit running POSTGIS="2.2.0dev r12204" GEOS="3.4.2-CAPI-1.8.2 r3924" PROJ="Rel. 4.8.0, 6 March 2012" GDAL="GDAL 1.10.0, released 2013/04/24" LIBXML="2.7.8" LIBJSON="UNKNOWN" RASTER

Hello,

I am using Ubuntu 12.04 with the official PostgreSQL apt repo (via
https://wiki.postgresql.org/wiki/Apt).  I am running into an easily
reproducible issue, and was hoping for some help to solve this.

When using ST_Intersects() not all rows that intersect are returned.  This
was not the case in previous versions that we have upgraded from.

These are the steps to reproduce on a fresh install of Ubuntu 12.04 with
all packages updated and PostGIS/PostgreSQL 9.3 installed:

test=# CREATE TABLE test (id serial, condition_geo geography);
CREATE TABLE
test=# INSERT INTO test (condition_geo) VALUES
(ST_Buffer(ST_GeogFromWKB(ST_MakePoint(20.0,30.0)),10.0));
INSERT 0 1
test=# SELECT id FROM test WHERE ST_Intersects("condition_geo",
ST_Buffer(ST_GeogFromText('POINT(20.0 30.0)'), 20.0)) IS TRUE;
 id
----
  1
(1 row)

test=# INSERT INTO test (condition_geo) VALUES
(ST_Buffer(ST_GeogFromWKB(ST_MakePoint(20.0,30.0)),10.0));
INSERT 0 1
test=# SELECT id FROM test WHERE ST_Intersects("condition_geo",
ST_Buffer(ST_GeogFromText('POINT(20.0 30.0)'), 20.0)) IS TRUE;
 id
----
  1
(1 row)

test=# SELECT id FROM test WHERE ST_Intersects("condition_geo",
ST_Buffer(ST_GeogFromText('POINT(20.0 30.0)'), 20.0)) IS TRUE AND id = 2;
 id
----
  2
(1 row)

Note that the SELECT should return both rows 1 and 2 in the first SELECT.

Any thoughts?

comment:29 Changed 6 years ago by robe

Summary: geography regression difference ST_DWithingeography regression difference ST_DWithin and ST_Intersects

comment:30 Changed 6 years ago by robe

might be related to #2556

comment:31 Changed 6 years ago by robe

Summary: geography regression difference ST_DWithin and ST_Intersectsgeography regression difference ST_DWithin, ST_Intersects, ST_Distance

Not sure if it helps -- but definitely something screwy with distance cache -- observe using the example above:

SELECT id , ST_Distance(condition_geo,
ST_Buffer(ST_GeogFromText('POINT(20.0 30.0)'), 20.0)) from test;

 id | st_distance
----+-------------
  1 |           0
  2 | 9.954690252
(2 rows)

and only picking one of course which means it can't use the cache :)

SELECT id , ST_Distance(condition_geo,
ST_Buffer(ST_GeogFromText('POINT(20.0 30.0)'), 20.0)) from test
where id = 2;

 id | st_distance
----+-------------
  2 |           0

comment:32 Changed 6 years ago by robe

Summary: geography regression difference ST_DWithin, ST_Intersects, ST_Distancegeography regression difference ST_DWithin

I have concluded this is not the same issue (at least not exactly) as #2556 . This one seems particular to _ST_DWithin and in this case the distance calcs are fine. Still might be related though.

comment:33 Changed 6 years ago by robe

Some sort of floating point abberation with converting to radians:

SELECT gid, _ST_DWithin(geog, tgeog, 1600,true )
, ST_Distance(geog, tgeog) , _ST_DistanceUnCached(geog, tgeog) As uncached_dist
FROM  dwithgeogbug CROSS JOIN ST_GeogFromText('POINT(-69.83262 43.43636)') As tgeog;

 gid | _st_dwithin |  st_distance   |  uncached_dist
-----+-------------+----------------+------------------
   1 | t           | 1400.230346843 | 1400.23034684253
   2 | t           | 1005.627279769 | 1005.62727976866
SELECT gid, _ST_DWithin(geog, tgeog, 1609,true )
, ST_Distance(geog, tgeog) , _ST_DistanceUnCached(geog, tgeog) As uncached_dist
FROM  dwithgeogbug CROSS JOIN ST_GeogFromText('POINT(-69.83262 43.43636)') As tgeog;

 gid | _st_dwithin |  st_distance   |  uncached_dist
-----+-------------+----------------+------------------
   1 | t           | 1400.230346843 | 1400.23034684253
   2 | f           | 1005.627279769 | 1005.62727976866

how can a bigger radius result in false when a smaller radius results in true

comment:34 Changed 6 years ago by robe

although as of

POSTGIS="2.1.2dev r12213" 

#2556 seems fixed, this issue still remains so apparently they are different.

comment:35 Changed 6 years ago by pramsey

Probably "fixed" at r12214, (see comment). If fix confirmed, also needs port to trunk, but precise test case needs to be stripped out to be brought into CUnit and debugged in detail, is likely a subtle failure in the tree-on-tree distance code when used with tolerances.

comment:36 Changed 6 years ago by robe

okay that seems to fix the issue but like you said its just a bandage doesn't solve your underlying no?

I have dwithinbug_data.sql attached above in case you didn't notice but the geometry that was failing in that dataset has 197 points. You think that's too big to stuff in cunit?

comment:37 Changed 6 years ago by robe

pramsey,

I have a potentially stupid question to ask. I see that: FP_LT macro is defined as

 FP_LT	(A,
 	B	 )			   (((A) + FP_TOLERANCE) < (B))

which is used here: http://postgis.net/docs/doxygen/2.1/de/dc0/lwgeodetic__tree_8c_a337b8cc452a495c52080254b042536c0.html#a337b8cc452a495c52080254b042536c0

Wouldn't that mean that when I define a threshold of X, it is really treating it as X + FP_TOLERANCE ?

So If I happen to hit a node that is between X and (X + FP_TOLERANCE) it would kick out success.

But my X < threshold would fail?

comment:38 Changed 6 years ago by pramsey

Differences less than FP_TOLERANCE will be ignored, yes.

comment:39 in reply to:  36 Changed 6 years ago by pramsey

Replying to robe:

I have dwithinbug_data.sql attached above in case you didn't notice but the geometry that was failing in that dataset has 197 points. You think that's too big to stuff in cunit?

Well, it's not ideal, but barring having another option it'll have to do. I'll visually examine them to see if there's any clues about the issue there, but it's hard to know without some serious spelunking. The tolerance short circuit in the tree/tree code is quite complex, living in the midst of a recursive function and all. I'll see if I can get it replicated in C, and we can go from there.

comment:40 Changed 6 years ago by robe

Milestone: PostGIS 2.1.2PostGIS 2.2.0

okay you've bandaged this enough for 2.1.2 I think. Lets do the real fix in 2.1.3 or 2.2.0

comment:41 Changed 6 years ago by pramsey

I'd like to keep on this one, having the knowledge in my head, can you provide the SQL to run against the attached data to demonstrate the problem?

comment:42 Changed 6 years ago by robe

I think it was this one:

SELECT gid, _ST_DWithin(geog, tgeog, 1609,true )
, ST_Distance(geog, tgeog) , _ST_DistanceUnCached(geog, tgeog) As uncached_dist
FROM  dwithgeogbug CROSS JOIN ST_GeogFromText('POINT(-69.83262 43.43636)') As tgeog;

 gid | _st_dwithin |  st_distance   |  uncached_dist
-----+-------------+----------------+------------------
   1 | t           | 1400.230346843 | 1400.23034684253
   2 | f           | 1005.627279769 | 1005.62727976866

my comment being -- how can a bigger radius result in false when a smaller radius results in true

I think it works now because of your prior bandage

comment:43 Changed 4 years ago by pramsey

Resolution: fixed
Status: newclosed

This should now be correctly fixed at r13954

comment:44 Changed 4 years ago by robe

regression failure on winnie:

 geography .. failed (diff expected obtained: /projects/postgis/tmp/2.2.0dev_pg9.4_geos3.5.0_gdal2.0w64/test_38_diff)
-----------------------------------------------------------------------------
--- geography_expected	2015-08-20 14:19:09 -0400
+++ /projects/postgis/tmp/2.2.0dev_pg9.4_geos3.5.0_gdal2.0w64/test_38_out	2015-08-20 14:22:07 -0400
@@ -21,9 +21,9 @@
 geog_precision_pazafir|0|0
 geog_precision_pazafir|0|0
 1|MULTILINESTRING((0 0,1 1))
-#2422|2|1609|t|t|1005.62727977|1006.24818105|1005.62727976958|1005.62727976958
-#2422|2|1600|t|t|1005.62727977|1006.24818105|1005.62727976958|1005.62727976958
-#2422|2|1068|t|t|1005.62727977|1006.24818105|1005.62727976958|1005.62727976958
+#2422|2|1609|t|t|1005.62727977|1006.24818105|1005.62727976922|1005.62727976922
+#2422|2|1600|t|t|1005.62727977|1006.24818105|1005.62727976922|1005.62727976922
+#2422|2|1068|t|t|1005.62727977|1006.24818105|1005.62727976922|1005.62727976922
 #2422|1|1609|t|t|1400.23034685|1396.81609491|1400.23034684753|1400.23034684753
 #2422|1|1600|t|t|1400.23034685|1396.81609491|1400.23034684753|1400.23034684753
 #2422|1|1068|f|f|1400.23034685|1396.81609491|1400.23034684753|1400.23034684753
-----------------------------------------------------------------------------

comment:45 Changed 4 years ago by pramsey

Regression failures fixed at r13955

Last edited 4 years ago by pramsey (previous) (diff)
Note: See TracTickets for help on using tickets.