Opened 2 weeks ago

Closed 2 weeks ago

Last modified 13 days ago

#5828 closed defect (fixed)

Performace issue in ST_DFullyWithin

Reported by: nbvfgh Owned by: pramsey
Priority: medium Milestone: PostGIS 3.6.0
Component: postgis Version: 3.5.x
Keywords: Cc:

Description

The premise of this issue is that ST_DFullyWithin(A, B, R) and ST_MaxDistance(ST_ClosestPoint(A, B), B) ⇐ R are equivalent.

EXPLAIN ANALYZE SELECT ST_Contains(ST_Buffer(a, 2), a_translate)
FROM (SELECT ST_GeomFromText('POLYGON ((0 0, 0 1, 1 1, 1 0, 0 0))') AS a,
  ST_GeomFromText('POLYGON ((1 1, 1 2, 2 2, 2 1, 1 1))') AS a_translate);
-- Execution Time: 0.017 ms

EXPLAIN ANALYZE SELECT ST_MaxDistance(ST_ClosestPoint(a, a_translate), a_translate) <= 2 
FROM (SELECT ST_GeomFromText('POLYGON ((0 0, 0 1, 1 1, 1 0, 0 0))') AS a,
  ST_GeomFromText('POLYGON ((1 1, 1 2, 2 2, 2 1, 1 1))') AS a_translate);
-- Execution Time: 0.005 ms

We found that the running time of ST_DFullyWithin is several times that of the equivalent query. To eliminate the impact of randomness, we conducted the following experiments:

DO $$
DECLARE
    start_time TIMESTAMP;
    end_time TIMESTAMP;
    i INTEGER;
BEGIN
    start_time := clock_timestamp();

    FOR i IN 1..10000 LOOP
        PERFORM ST_Contains(ST_Buffer(a, i), a_translate)
FROM (SELECT ST_GeomFromText('POLYGON ((0 0, 0 1, 1 1, 1 0, 0 0))') AS a,
  ST_GeomFromText('POLYGON ((1 1, 1 2, 2 2, 2 1, 1 1))') AS a_translate);
    END LOOP;

    end_time := clock_timestamp();
    RAISE NOTICE 'Total execution time for 10000 runs: %', end_time - start_time;
END $$;
-- Total execution time for 10000 runs: 00:00:00.585733

DO $$
DECLARE
    start_time TIMESTAMP;
    end_time TIMESTAMP;
    i INTEGER;
BEGIN
    start_time := clock_timestamp();

    FOR i IN 1..10000 LOOP
        PERFORM ST_MaxDistance(ST_ClosestPoint(a, a_translate), a_translate) <= i 
FROM (SELECT ST_GeomFromText('POLYGON ((0 0, 0 1, 1 1, 1 0, 0 0))') AS a,
  ST_GeomFromText('POLYGON ((1 1, 1 2, 2 2, 2 1, 1 1))') AS a_translate);
    END LOOP;

    end_time := clock_timestamp();
    RAISE NOTICE 'Total execution time for 10000 runs: %', end_time - start_time;
END $$;
-- Total execution time for 10000 runs: 00:00:00.016847

ST_MaxDistance(ST_ClosestPoint(A, B), B) ⇐ R indeed runs more than almost 4 times faster than ST_DFullyWithin(A, B, R) in this case. I know that the current implementation of ST_DFullyWithin(A, B, R) is ST_Contains (ST_Buffer (A, R), B), if the equivalence relationship I proposed holds, I believe there will be a significant improvement in the performance of this function.

Change History (4)

comment:1 by nbvfgh, 2 weeks ago

Last edited 2 weeks ago by nbvfgh (previous) (diff)

comment:2 by nbvfgh, 2 weeks ago

Priority: blockermedium
Resolution: fixed
Status: newclosed

comment:3 by mdavis, 2 weeks ago

In general

ST_DFullyWithin(A, B, R)

and

ST_MaxDistance(ST_ClosestPoint(A, B), B) ⇐ R

are NOT equivalent.

A counter-example is:

WITH data(a, b) AS (VALUES
  ('LINESTRING (0 0, 0 9)'::geometry, 'LINESTRING( 1 0, 2 9)'::geometry)
)
SELECT ST_DFullyWithin(a, b, 3.0), 
  ST_MaxDistance(ST_ClosestPoint(a, b), b)
  FROM data;

The maximum distance of the closest point is ~ 9.2, but B is fully within < 3 units of A.

in reply to:  3 comment:4 by nbvfgh, 13 days ago

Replying to mdavis:

In general

ST_DFullyWithin(A, B, R)

and

ST_MaxDistance(ST_ClosestPoint(A, B), B) ⇐ R

are NOT equivalent.

A counter-example is:

WITH data(a, b) AS (VALUES
  ('LINESTRING (0 0, 0 9)'::geometry, 'LINESTRING( 1 0, 2 9)'::geometry)
)
SELECT ST_DFullyWithin(a, b, 3.0), 
  ST_MaxDistance(ST_ClosestPoint(a, b), b)
  FROM data;

The maximum distance of the closest point is ~ 9.2, but B is fully within < 3 units of A.

Yes, I realized this, so I closed this ticket. Thanks for your patience!:D

Note: See TracTickets for help on using tickets.