Opened 6 years ago
Closed 6 years ago
#4269 closed defect (invalid)
Inconsistent result for ST_Intersects depending on the plan
Reported by: | Algunenano | Owned by: | pramsey |
---|---|---|---|
Priority: | critical | Milestone: | PostGIS 2.4.7 |
Component: | postgis | Version: | 2.5.x -- EOL |
Keywords: | Cc: |
Description
Reported by Tom van Tilburg in the maillist: https://lists.osgeo.org/pipermail/postgis-devel/2018-December/027568.html
Same query outputs different values for st_intersects depending on the plan.
crap=# Select ST_Intersects(ag, bg), acid, bcid FROM ( WITH data AS ( SELECT 1 as cid,720::bigint as blockid, ST_GeomFromText('POLYGON Z ((122427.759 489209.498 14.5066901703871,122395.957 489222.943 14.3997094723791,122399.649 489231.676 4.98556790825181,122431.452 489218.231 5.09216481886026,122427.759 489209.498 14.5066901703871))') geom UNION ALL SELECT 2, 720, ST_GeomFromText('POLYGON Z ((122427.027 489216.053 5.97863674861823,122419.873 489219.077 6.03923989893278,122421.6 489223.163 1.83027232084998,122428.754 489220.138 1.77054008712955,122427.027 489216.053 5.97863674861823))') UNION ALL SELECT 3, 720, ST_GeomFromText('POLYGON Z ((122422.267 489220.741 0,122424.396 489219.847 0,122424.396 489219.847 30,122422.267 489220.741 30,122422.267 489220.741 0))') ) SELECT a.cid as acid, b.cid as bcid, a.geom as ag, b.geom as bg FROM data a, data b WHERE a.cid != b.cid AND a.blockid = b.blockid AND a.blockid::int = 720 ) _A; st_intersects | acid | bcid ---------------+------+------ f | 3 | 1 t | 2 | 1 f | 3 | 2 t | 1 | 2 f | 2 | 3 t | 1 | 3 (6 rows) crap=# set enable_hashjoin=off; SET crap=# Select ST_Intersects(ag, bg), acid, bcid FROM ( WITH data AS ( SELECT 1 as cid,720::bigint as blockid, ST_GeomFromText('POLYGON Z ((122427.759 489209.498 14.5066901703871,122395.957 489222.943 14.3997094723791,122399.649 489231.676 4.98556790825181,122431.452 489218.231 5.09216481886026,122427.759 489209.498 14.5066901703871))') geom UNION ALL SELECT 2, 720, ST_GeomFromText('POLYGON Z ((122427.027 489216.053 5.97863674861823,122419.873 489219.077 6.03923989893278,122421.6 489223.163 1.83027232084998,122428.754 489220.138 1.77054008712955,122427.027 489216.053 5.97863674861823))') UNION ALL SELECT 3, 720, ST_GeomFromText('POLYGON Z ((122422.267 489220.741 0,122424.396 489219.847 0,122424.396 489219.847 30,122422.267 489220.741 30,122422.267 489220.741 0))') ) SELECT a.cid as acid, b.cid as bcid, a.geom as ag, b.geom as bg FROM data a, data b WHERE a.cid != b.cid AND a.blockid = b.blockid AND a.blockid::int = 720 ) _A; st_intersects | acid | bcid ---------------+------+------ t | 1 | 2 t | 1 | 3 t | 2 | 1 t | 2 | 3 f | 3 | 1 t | 3 | 2 (6 rows)
I've tested it with PG11 - Postgis trunk and with PG10 - Postgis 2.4.
Change History (9)
comment:1 by , 6 years ago
comment:2 by , 6 years ago
suspected: geometry cache can cache prepared geometry for invalid wrongly and differ on cached and non-cached case.
around here: https://codecov.io/gh/postgis/postgis/src/df7a938e3de65cdfc2ab91aa3db9909600f22566/postgis/lwgeom_geos.c#L2200
comment:3 by , 6 years ago
Yes, order dependent issues are almost always rooted in the cache code line returning different results from the uncached code line. See if using _ST_Intersects() instead (uncached) results in predictable results (probable does). Confirm by using just the && operator as another test case (probably predictable, but worth trying… if there's an issue in the backend it's more likely to manifest with just the pure operator test). Finally, is one or more of the inputs invalid, as komzpa is alluding?
comment:4 by , 6 years ago
original mail says there is invalid:
Plane with cid 10890 is on purpose vertical and therefore seen as 'invalid' by GEOS
however ST_IsValid tells "true" on all the records.
comment:5 by , 6 years ago
Using _ST_Intersects() instead of ST_Intersects does not change the issue with me.
It is correct that one of the geometries is (on purpose) invalid since it is a vertical plane. It's the third one in the example.
comment:6 by , 6 years ago
root=# select ST_IsValid('POLYGON Z ((122422.267 489220.741 root'# 0,122424.396 489219.847 0,122424.396 489219.847 30,122422.267 489220.741 root'# 30,122422.267 489220.741 0))'); NOTICE: Too few points in geometry component at or near point 122422.26700000001 489220.74099999998 0 st_isvalid ------------ f (1 row)
comment:7 by , 6 years ago
I thought when ST_IsValid is false the intersection is really unknown? Anyway it does seem like the caching is at fault here. ST_DWithin returns all true as expected, but the ST_Intersects flip answers depending on the order the data is traversed
WITH data AS ( SELECT 134 as cid,720::bigint as blockid, ST_GeomFromText('POLYGON Z ((122427.759 489209.498 14.5066901703871,122395.957 489222.943 14.3997094723791,122399.649 489231.676 4.98556790825181,122431.452 489218.231 5.09216481886026,122427.759 489209.498 14.5066901703871))') geom UNION ALL SELECT 316, 720, ST_GeomFromText('POLYGON Z ((122427.027 489216.053 5.97863674861823,122419.873 489219.077 6.03923989893278,122421.6 489223.163 1.83027232084998,122428.754 489220.138 1.77054008712955,122427.027 489216.053 5.97863674861823))') UNION ALL SELECT 10890, 720, ST_GeomFromText('POLYGON Z ((122422.267 489220.741 0,122424.396 489219.847 0,122424.396 489219.847 30,122422.267 489220.741 30,122422.267 489220.741 0))') ) SELECT ST_DWithin(a.geom, b.geom,0), ST_Intersects(a.geom,b.geom), ST_Distance(a.geom, b.geom), ST_ISValid(a.geom) AS isvalid_a, ST_IsValid(b.geom) AS isvalid_b, a.cid AS acid, b.cid As bcid, a.blockid,b.blockid FROM data a, data b WHERE a.cid != b.cid AND a.blockid = b.blockid AND a.blockid::int = 720 AND ST_DWithin(a.geom, b.geom,0);
Gives me true for all 6 for ST_DWithin but for ST_Intersects the 3 that have invalid geoms would be filtered out
st_dwithin | st_intersects | st_distance | isvalid_a | isvalid_b | acid | bcid | blockid | blockid ------------+---------------+-------------+-----------+-----------+-------+-------+---------+--------- t | f | 0 | f | t | 10890 | 134 | 720 | 720 t | t | 0 | t | t | 316 | 134 | 720 | 720 t | f | 0 | f | t | 10890 | 316 | 720 | 720 t | t | 0 | t | t | 134 | 316 | 720 | 720 t | f | 0 | t | f | 316 | 10890 | 720 | 720 t | t | 0 | t | f | 134 | 10890 | 720 | 720 (6 rows)
WITH data AS ( SELECT 134 as cid,720::bigint as blockid, ST_GeomFromText('POLYGON Z ((122427.759 489209.498 14.5066901703871,122395.957 489222.943 14.3997094723791,122399.649 489231.676 4.98556790825181,122431.452 489218.231 5.09216481886026,122427.759 489209.498 14.5066901703871))') geom UNION ALL SELECT 316, 720, ST_GeomFromText('POLYGON Z ((122427.027 489216.053 5.97863674861823,122419.873 489219.077 6.03923989893278,122421.6 489223.163 1.83027232084998,122428.754 489220.138 1.77054008712955,122427.027 489216.053 5.97863674861823))') UNION ALL SELECT 10890, 720, ST_GeomFromText('POLYGON Z ((122422.267 489220.741 0,122424.396 489219.847 0,122424.396 489219.847 30,122422.267 489220.741 30,122422.267 489220.741 0))') ) SELECT ST_DWithin(a.geom, b.geom,0), ST_Intersects(a.geom,b.geom), ST_Distance(a.geom, b.geom), ST_IsValid(a.geom) AS isvalid_a, ST_IsValid(b.geom) AS isvalid_b, a.cid AS acid, b.cid AS bcid, a.blockid,b.blockid FROM data a, data b WHERE a.cid != b.cid AND a.blockid = b.blockid AND a.blockid = 720 AND ST_DWithin(a.geom, b.geom,0);
Gives me true for all 6 for ST_DWithin and filters out 1 with invalid
st_dwithin | st_intersects | st_distance | isvalid_a | isvalid_b | acid | bcid | blockid | blockid ------------+---------------+-------------+-----------+-----------+-------+-------+---------+--------- t | t | 0 | t | t | 134 | 316 | 720 | 720 t | t | 0 | t | f | 134 | 10890 | 720 | 720 t | t | 0 | t | t | 316 | 134 | 720 | 720 t | t | 0 | t | f | 316 | 10890 | 720 | 720 t | f | 0 | f | t | 10890 | 134 | 720 | 720 t | t | 0 | f | t | 10890 | 316 | 720 | 720 (6 rows)
I thought maybe the issue was because prepared geometry is assuming ST_Intersects is a commutative operation. In the case of invalid geometries maybe it's not.
though doing a standalone and flipping the order of the geometries, I get false in both cases So this:
SELECT ST_Intersects('POLYGON Z ((122422.267 489220.741 0,122424.396 489219.847 0,122424.396 489219.847 30,122422.267 489220.741 30,122422.267 489220.741 0))'::geometry, 'POLYGON Z ((122427.027 489216.053 5.97863674861823,122419.873 489219.077 6.03923989893278,122421.6 489223.163 1.83027232084998,122428.754 489220.138 1.77054008712955,122427.027 489216.053 5.97863674861823))'::geometry ); returns: false
SELECT ST_Intersects('POLYGON Z ((122427.027 489216.053 5.97863674861823,122419.873 489219.077 6.03923989893278,122421.6 489223.163 1.83027232084998,122428.754 489220.138 1.77054008712955,122427.027 489216.053 5.97863674861823))'::geometry , 'POLYGON Z ((122422.267 489220.741 0,122424.396 489219.847 0,122424.396 489219.847 30,122422.267 489220.741 30,122422.267 489220.741 0))'::geometry ); returns: false
So not sure how it's getting a true for:
134 | 10890
and when I explicitly ask for the ids, I get the expected false answer for both regardless if I cast or don't cast the integer
WITH data AS ( SELECT 134 as cid,720::bigint as blockid, ST_GeomFromText('POLYGON Z ((122427.759 489209.498 14.5066901703871,122395.957 489222.943 14.3997094723791,122399.649 489231.676 4.98556790825181,122431.452 489218.231 5.09216481886026,122427.759 489209.498 14.5066901703871))') geom UNION ALL SELECT 316, 720, ST_GeomFromText('POLYGON Z ((122427.027 489216.053 5.97863674861823,122419.873 489219.077 6.03923989893278,122421.6 489223.163 1.83027232084998,122428.754 489220.138 1.77054008712955,122427.027 489216.053 5.97863674861823))') UNION ALL SELECT 10890, 720, ST_GeomFromText('POLYGON Z ((122422.267 489220.741 0,122424.396 489219.847 0,122424.396 489219.847 30,122422.267 489220.741 30,122422.267 489220.741 0))') ) SELECT ST_DWithin(a.geom, b.geom,0), ST_Intersects(a.geom,b.geom), ST_Distance(a.geom, b.geom), ST_ISValid(a.geom) AS isvalid_a, ST_IsValid(b.geom) AS isvalid_b, a.cid AS acid, b.cid As bcid, a.blockid,b.blockid FROM data a, data b WHERE a.cid != b.cid AND a.blockid = b.blockid AND a.blockid::int = 720 AND ( (a.cid = 134 AND b.cid =10890) OR (a.cid = 10890 and b.cid = 134) ); st_dwithin | st_intersects | st_distance | isvalid_a | isvalid_b | acid | bcid | blockid | blockid ------------+---------------+-------------+-----------+-----------+-------+-------+---------+--------- t | f | 0 | t | f | 134 | 10890 | 720 | 720 t | f | 0 | f | t | 10890 | 134 | 720 | 720 (2 rows)
WITH data AS ( SELECT 134 as cid,720::bigint as blockid, ST_GeomFromText('POLYGON Z ((122427.759 489209.498 14.5066901703871,122395.957 489222.943 14.3997094723791,122399.649 489231.676 4.98556790825181,122431.452 489218.231 5.09216481886026,122427.759 489209.498 14.5066901703871))') geom UNION ALL SELECT 316, 720, ST_GeomFromText('POLYGON Z ((122427.027 489216.053 5.97863674861823,122419.873 489219.077 6.03923989893278,122421.6 489223.163 1.83027232084998,122428.754 489220.138 1.77054008712955,122427.027 489216.053 5.97863674861823))') UNION ALL SELECT 10890, 720, ST_GeomFromText('POLYGON Z ((122422.267 489220.741 0,122424.396 489219.847 0,122424.396 489219.847 30,122422.267 489220.741 30,122422.267 489220.741 0))') ) SELECT ST_DWithin(a.geom, b.geom,0), ST_Intersects(a.geom,b.geom), ST_Distance(a.geom, b.geom), ST_ISValid(a.geom) AS isvalid_a, ST_IsValid(b.geom) AS isvalid_b, a.cid AS acid, b.cid As bcid, a.blockid,b.blockid FROM data a, data b WHERE a.cid != b.cid AND a.blockid = b.blockid AND a.blockid = 720 AND ( (a.cid = 134 AND b.cid =10890) OR (a.cid = 10890 and b.cid = 134) ); st_dwithin | st_intersects | st_distance | isvalid_a | isvalid_b | acid | bcid | blockid | blockid ------------+---------------+-------------+-----------+-----------+-------+-------+---------+--------- t | f | 0 | t | f | 134 | 10890 | 720 | 720 t | f | 0 | f | t | 10890 | 134 | 720 | 720 (2 rows)
comment:8 by , 6 years ago
Just skipping the whole join thing, which I feel is a red herring, you can see fun stuff happening in this example:
WITH data AS ( SELECT 1 as cid, 'POLYGON Z ((122427.759 489209.498 14.5066901703871,122395.957 489222.943 14.3997094723791,122399.649 489231.676 4.98556790825181,122431.452 489218.231 5.09216481886026,122427.759 489209.498 14.5066901703871))'::geometry geom UNION ALL SELECT 2, 'POLYGON Z ((122427.027 489216.053 5.97863674861823,122419.873 489219.077 6.03923989893278,122421.6 489223.163 1.83027232084998,122428.754 489220.138 1.77054008712955,122427.027 489216.053 5.97863674861823))'::geometry UNION ALL SELECT 3, 'POLYGON Z ((122422.267 489220.741 0,122424.396 489219.847 0,122424.396 489219.847 30,122422.267 489220.741 30,122422.267 489220.741 0))'::geometry ORDER BY cid ) SELECT a.cid, b.cid, ST_Relate(a.geom, b.geom) AS relate, _ST_Intersects(a.geom, b.geom) FROM data a, data b ORDER BY a.cid, b.cid; cid | cid | relate | _st_intersects -----+-----+-----------+---------------- 1 | 1 | 2FFF1FFF2 | t 1 | 2 | 212101212 | t 1 | 3 | FF2FF1FF2 | t 2 | 1 | 212101212 | t 2 | 2 | 2FFF1FFF2 | t 2 | 3 | FF2FF1FF2 | t 3 | 1 | FFFFFF212 | f 3 | 2 | FFFFFF212 | t 3 | 3 | FFFFFFFF2 | t (9 rows)
Note how the relate matrix of polygon 3 vs itself is not the standard pattern of the other two. The invalid third polygon is problematic. I don't see how this isn't just a known issue… invalid inputs lead to odd results, like non-commutative tests on intersects.
comment:9 by , 6 years ago
Resolution: | → invalid |
---|---|
Status: | new → closed |
And to address the join thing, we're almost certainly seeing an effect of the cache, where one join form pushes the data through in a different order from the other, so you get a full intersects once and a prepared geometry call the next time, and they give different answers when handed that invalid.
Plan with
hashjoin
on:Plan with hashjoin off: