Opened 4 weeks ago

Closed 4 weeks ago

Last modified 3 weeks ago

#5828 closed defect (fixed)

Performace issue in ST_DFullyWithin

Reported by: nbvfgh Owned by: pramsey
Priority: medium Milestone: PostGIS 3.6.0
Component: postgis Version: 3.5.x
Keywords: Cc:

Description

The premise of this issue is that ST_DFullyWithin(A, B, R) and ST_MaxDistance(ST_ClosestPoint(A, B), B) ⇐ R are equivalent.

EXPLAIN ANALYZE SELECT ST_Contains(ST_Buffer(a, 2), a_translate)
FROM (SELECT ST_GeomFromText('POLYGON ((0 0, 0 1, 1 1, 1 0, 0 0))') AS a,
  ST_GeomFromText('POLYGON ((1 1, 1 2, 2 2, 2 1, 1 1))') AS a_translate);
-- Execution Time: 0.017 ms

EXPLAIN ANALYZE SELECT ST_MaxDistance(ST_ClosestPoint(a, a_translate), a_translate) <= 2 
FROM (SELECT ST_GeomFromText('POLYGON ((0 0, 0 1, 1 1, 1 0, 0 0))') AS a,
  ST_GeomFromText('POLYGON ((1 1, 1 2, 2 2, 2 1, 1 1))') AS a_translate);
-- Execution Time: 0.005 ms

We found that the running time of ST_DFullyWithin is several times that of the equivalent query. To eliminate the impact of randomness, we conducted the following experiments:

DO $$
DECLARE
    start_time TIMESTAMP;
    end_time TIMESTAMP;
    i INTEGER;
BEGIN
    start_time := clock_timestamp();

    FOR i IN 1..10000 LOOP
        PERFORM ST_Contains(ST_Buffer(a, i), a_translate)
FROM (SELECT ST_GeomFromText('POLYGON ((0 0, 0 1, 1 1, 1 0, 0 0))') AS a,
  ST_GeomFromText('POLYGON ((1 1, 1 2, 2 2, 2 1, 1 1))') AS a_translate);
    END LOOP;

    end_time := clock_timestamp();
    RAISE NOTICE 'Total execution time for 10000 runs: %', end_time - start_time;
END $$;
-- Total execution time for 10000 runs: 00:00:00.585733

DO $$
DECLARE
    start_time TIMESTAMP;
    end_time TIMESTAMP;
    i INTEGER;
BEGIN
    start_time := clock_timestamp();

    FOR i IN 1..10000 LOOP
        PERFORM ST_MaxDistance(ST_ClosestPoint(a, a_translate), a_translate) <= i 
FROM (SELECT ST_GeomFromText('POLYGON ((0 0, 0 1, 1 1, 1 0, 0 0))') AS a,
  ST_GeomFromText('POLYGON ((1 1, 1 2, 2 2, 2 1, 1 1))') AS a_translate);
    END LOOP;

    end_time := clock_timestamp();
    RAISE NOTICE 'Total execution time for 10000 runs: %', end_time - start_time;
END $$;
-- Total execution time for 10000 runs: 00:00:00.016847

ST_MaxDistance(ST_ClosestPoint(A, B), B) ⇐ R indeed runs more than almost 4 times faster than ST_DFullyWithin(A, B, R) in this case. I know that the current implementation of ST_DFullyWithin(A, B, R) is ST_Contains (ST_Buffer (A, R), B), if the equivalence relationship I proposed holds, I believe there will be a significant improvement in the performance of this function.

Change History (4)

comment:1 by nbvfgh, 4 weeks ago

It may not be necessary to actually calculate the buffer for extending the distance of geometry A to determine whether geometry B is fully included in the buffer for extending the distance of geometry A. Because in reality, the point on geometry A that is closest to geometry B is the one that determines whether geometry B is fully included in the buffer zone of the extended distance of geometry A. We refer to this point as P1.

So, we can simplify this problem as "Is geometry B completely contained within the buffer zone of the extended distance from point P1?" And the buffer zone of a point is a circle, and we only need to consider its radius. Under what circumstances does a circle contain a geometry? We call the point on geometry B that is farthest from point P1 as P2, and finally simplify the problem as "Is the distance between P1 and P2 less than the distance?".

Finally, we can obtain the following equivalent formula: ST_DFullyWithin(A, B, R) <=> ST_Contains(ST_Buffer(A, R), B) <=> ST_MaxDistance(ST_ClosestPoint(A, B), B) ≤ R

Sorry for the lengthy explanation, and thank you for your patience.

Version 0, edited 4 weeks ago by nbvfgh (next)

comment:2 by nbvfgh, 4 weeks ago

Priority: blockermedium
Resolution: fixed
Status: newclosed

comment:3 by mdavis, 4 weeks ago

In general

ST_DFullyWithin(A, B, R)

and

ST_MaxDistance(ST_ClosestPoint(A, B), B) ⇐ R

are NOT equivalent.

A counter-example is:

WITH data(a, b) AS (VALUES
  ('LINESTRING (0 0, 0 9)'::geometry, 'LINESTRING( 1 0, 2 9)'::geometry)
)
SELECT ST_DFullyWithin(a, b, 3.0), 
  ST_MaxDistance(ST_ClosestPoint(a, b), b)
  FROM data;

The maximum distance of the closest point is ~ 9.2, but B is fully within < 3 units of A.

in reply to:  3 comment:4 by nbvfgh, 3 weeks ago

Replying to mdavis:

In general

ST_DFullyWithin(A, B, R)

and

ST_MaxDistance(ST_ClosestPoint(A, B), B) ⇐ R

are NOT equivalent.

A counter-example is:

WITH data(a, b) AS (VALUES
  ('LINESTRING (0 0, 0 9)'::geometry, 'LINESTRING( 1 0, 2 9)'::geometry)
)
SELECT ST_DFullyWithin(a, b, 3.0), 
  ST_MaxDistance(ST_ClosestPoint(a, b), b)
  FROM data;

The maximum distance of the closest point is ~ 9.2, but B is fully within < 3 units of A.

Yes, I realized this, so I closed this ticket. Thanks for your patience!:D

Note: See TracTickets for help on using tickets.