#5828 closed defect (fixed)
Performace issue in ST_DFullyWithin
Reported by: | nbvfgh | Owned by: | pramsey |
---|---|---|---|
Priority: | medium | Milestone: | PostGIS 3.6.0 |
Component: | postgis | Version: | 3.5.x |
Keywords: | Cc: |
Description
The premise of this issue is that ST_DFullyWithin(A, B, R) and ST_MaxDistance(ST_ClosestPoint(A, B), B) ⇐ R are equivalent.
EXPLAIN ANALYZE SELECT ST_Contains(ST_Buffer(a, 2), a_translate) FROM (SELECT ST_GeomFromText('POLYGON ((0 0, 0 1, 1 1, 1 0, 0 0))') AS a, ST_GeomFromText('POLYGON ((1 1, 1 2, 2 2, 2 1, 1 1))') AS a_translate); -- Execution Time: 0.017 ms EXPLAIN ANALYZE SELECT ST_MaxDistance(ST_ClosestPoint(a, a_translate), a_translate) <= 2 FROM (SELECT ST_GeomFromText('POLYGON ((0 0, 0 1, 1 1, 1 0, 0 0))') AS a, ST_GeomFromText('POLYGON ((1 1, 1 2, 2 2, 2 1, 1 1))') AS a_translate); -- Execution Time: 0.005 ms
We found that the running time of ST_DFullyWithin is several times that of the equivalent query. To eliminate the impact of randomness, we conducted the following experiments:
DO $$ DECLARE start_time TIMESTAMP; end_time TIMESTAMP; i INTEGER; BEGIN start_time := clock_timestamp(); FOR i IN 1..10000 LOOP PERFORM ST_Contains(ST_Buffer(a, i), a_translate) FROM (SELECT ST_GeomFromText('POLYGON ((0 0, 0 1, 1 1, 1 0, 0 0))') AS a, ST_GeomFromText('POLYGON ((1 1, 1 2, 2 2, 2 1, 1 1))') AS a_translate); END LOOP; end_time := clock_timestamp(); RAISE NOTICE 'Total execution time for 10000 runs: %', end_time - start_time; END $$; -- Total execution time for 10000 runs: 00:00:00.585733 DO $$ DECLARE start_time TIMESTAMP; end_time TIMESTAMP; i INTEGER; BEGIN start_time := clock_timestamp(); FOR i IN 1..10000 LOOP PERFORM ST_MaxDistance(ST_ClosestPoint(a, a_translate), a_translate) <= i FROM (SELECT ST_GeomFromText('POLYGON ((0 0, 0 1, 1 1, 1 0, 0 0))') AS a, ST_GeomFromText('POLYGON ((1 1, 1 2, 2 2, 2 1, 1 1))') AS a_translate); END LOOP; end_time := clock_timestamp(); RAISE NOTICE 'Total execution time for 10000 runs: %', end_time - start_time; END $$; -- Total execution time for 10000 runs: 00:00:00.016847
ST_MaxDistance(ST_ClosestPoint(A, B), B) ⇐ R indeed runs more than almost 4 times faster than ST_DFullyWithin(A, B, R) in this case. I know that the current implementation of ST_DFullyWithin(A, B, R) is ST_Contains (ST_Buffer (A, R), B), if the equivalence relationship I proposed holds, I believe there will be a significant improvement in the performance of this function.
Change History (4)
comment:2 by , 4 weeks ago
Priority: | blocker → medium |
---|---|
Resolution: | → fixed |
Status: | new → closed |
follow-up: 4 comment:3 by , 4 weeks ago
In general
ST_DFullyWithin(A, B, R)
and
ST_MaxDistance(ST_ClosestPoint(A, B), B) ⇐ R
are NOT equivalent.
A counter-example is:
WITH data(a, b) AS (VALUES ('LINESTRING (0 0, 0 9)'::geometry, 'LINESTRING( 1 0, 2 9)'::geometry) ) SELECT ST_DFullyWithin(a, b, 3.0), ST_MaxDistance(ST_ClosestPoint(a, b), b) FROM data;
The maximum distance of the closest point is ~ 9.2, but B is fully within < 3 units of A.
comment:4 by , 3 weeks ago
Replying to mdavis:
In general
ST_DFullyWithin(A, B, R)and
ST_MaxDistance(ST_ClosestPoint(A, B), B) ⇐ Rare NOT equivalent.
A counter-example is:
WITH data(a, b) AS (VALUES ('LINESTRING (0 0, 0 9)'::geometry, 'LINESTRING( 1 0, 2 9)'::geometry) ) SELECT ST_DFullyWithin(a, b, 3.0), ST_MaxDistance(ST_ClosestPoint(a, b), b) FROM data;The maximum distance of the closest point is ~ 9.2, but B is fully within < 3 units of A.
Yes, I realized this, so I closed this ticket. Thanks for your patience!
It may not be necessary to actually calculate the buffer for extending the distance of geometry A to determine whether geometry B is fully included in the buffer for extending the distance of geometry A. Because in reality, the point on geometry A that is closest to geometry B is the one that determines whether geometry B is fully included in the buffer zone of the extended distance of geometry A. We refer to this point as P1.
So, we can simplify this problem as "Is geometry B completely contained within the buffer zone of the extended distance from point P1?" And the buffer zone of a point is a circle, and we only need to consider its radius. Under what circumstances does a circle contain a geometry? We call the point on geometry B that is farthest from point P1 as P2, and finally simplify the problem as "Is the distance between P1 and P2 less than the distance?".
Finally, we can obtain the following equivalent formula:
ST_DFullyWithin(A, B, R) <=> ST_Contains(ST_Buffer(A, R), B) <=> ST_MaxDistance(ST_ClosestPoint(A, B), B) ≤ R
Sorry for the lengthy explanation, and thank you for your patience.