Opened 3 years ago

Closed 3 years ago

Last modified 3 years ago

#4853 closed defect (fixed)

Documentation of ST_ClusterDBSCAN function and minpoints parameter definition issue

Reported by: gisupanalizy Owned by: pramsey
Priority: medium Milestone: PostGIS 2.4.10
Component: postgis Version: 2.4.x
Keywords: documentation window st_clusterdbscan windowing function Cc:

Description

While using ST_ClusterDBSCAN function I was confused seeing that some clusters haven't got created even if they evidently should be. I was assuming that:

ST_ClusterDBSCAN(geom, eps := 1, minpoints := 4) OVER(PARTITION BY CONCAT(country_code, elevation_ft, height_ft, obstacle_type))

would result in clustering points that have the same "PARTITION BY" attributes and the minimum group would need to have 4 points (including core point) in eps distance. According to the docs: https://postgis.net/docs/manual-2.4/ST_ClusterDBSCAN.html An input geometry will be added to a cluster if it is either:

  • A "core" geometry, that is within eps distance of at least minpoints input geometries (including itself) or
  • A "border" geometry, that is within eps distance of core geometry. its surrounding area with radius eps.

But it seems that it's not exactly true. It seems that in order to cluster 4 points (like minpoints parameter is set), the grouping query:

OVER(PARTITION BY CONCAT(country_code, elevation_ft, height_ft, obstacle_type))

needs to result in at least five objects to let clst_id to be created for other four.. Here is an example:

CREATE TABLE IF NOT EXISTS public.point_table
(   
    point_sys_id serial primary key,--System generated Primary Key - Asset Cache
    point_id bigint,
    geom geometry(Point,4326),--Geometry Field
    country_code varchar(4),--Country Code
    elevation_ft numeric(7,2),--Elevation in Feet
    height_ft numeric(7,2),--Height in Feet
    obstacle_type varchar(50)--Obstacle Type
)

INSERT INTO point_table(point_id, geom, country_code, elevation_ft, height_ft, obstacle_type)
VALUES
(1, '0101000020E6100000E4141DC9E5934B40D235936FB6193940', 'ARE', 100, 50, 'BUILDING'),
(2, '0101000020E6100000C746205ED7934B40191C25AFCE193940', 'ARE', 100, 50, 'BUILDING'),
(3, '0101000020E6100000C780ECF5EE934B40B6BE4868CB193940', 'ARE', 100, 50, 'BUILDING'),
(4, '0101000020E6100000A97A358FA5AF4B4074A0C65B724C3940', 'ARE', 100, 50, 'BUILDING'), -- this point is outside of the cluster distance (eps)
(5, '0101000020E6100000ABB2EF8AE0934B404451A04FE4193940', 'ARE', 100, 50, 'BUILDING')

select ST_ClusterDBSCAN(geom, eps := 0.000906495804256269, minpoints := 4) OVER(PARTITION BY CONCAT(country_code, elevation_ft, height_ft, obstacle_type)) as clst_id, 
       point_id, geom, country_code, elevation_ft, height_ft, obstacle_type
from point_table
--where point_id != 4

Running clustering query agains all five points works fine. But once you exclude seemingly irrelevant point_id = 4 (which is outside of eps distance anyway) the clustering stops working (clst_id becomes null), even if still theoretically 4 needed points (according to the docs) are in place.

Once I change the minpoints parameter to 3, clustering works fine for those 4 neighbor points.

Can someone confirm my conclusion that ST_ClusterDBSCAN is not correct or give some good exmplanation for this behavior?

The same issue I've posted on StackOverflow: https://stackoverflow.com/questions/66159186/st-clusterdbscan-function-and-minpoints-parameter-definition

Change History (4)

comment:1 by gisupanalizy, 3 years ago

Keywords: documentation added

comment:2 by Daniel Baston <dbaston@…>, 3 years ago

Resolution: fixed
Status: newclosed

In 4b0e00a/git:

Allow DBSCAN cluster to form when recordset length = minPoints

Fixes #4853

comment:3 by Daniel Baston <dbaston@…>, 3 years ago

In 8d87f1f/git:

Allow DBSCAN cluster to form when recordset length = minPoints

Fixes #4853

comment:4 by Daniel Baston <dbaston@…>, 3 years ago

In 697270f/git:

Allow DBSCAN cluster to form when recordset length = minPoints

Fixes #4853

Note: See TracTickets for help on using tickets.