Documentation of ST_ClusterDBSCAN function and minpoints parameter definition issue
|Reported by:||gisupanalizy||Owned by:||pramsey|
|Keywords:||documentation window st_clusterdbscan windowing function||Cc:|
While using ST_ClusterDBSCAN function I was confused seeing that some clusters haven't got created even if they evidently should be. I was assuming that:
ST_ClusterDBSCAN(geom, eps := 1, minpoints := 4) OVER(PARTITION BY CONCAT(country_code, elevation_ft, height_ft, obstacle_type))
would result in clustering points that have the same "PARTITION BY" attributes and the minimum group would need to have 4 points (including core point) in eps distance. According to the docs: https://postgis.net/docs/manual-2.4/ST_ClusterDBSCAN.html An input geometry will be added to a cluster if it is either:
- A "core" geometry, that is within eps distance of at least minpoints input geometries (including itself) or
- A "border" geometry, that is within eps distance of core geometry. its surrounding area with radius eps.
But it seems that it's not exactly true. It seems that in order to cluster 4 points (like minpoints parameter is set), the grouping query:
OVER(PARTITION BY CONCAT(country_code, elevation_ft, height_ft, obstacle_type))
needs to result in at least five objects to let clst_id to be created for other four.. Here is an example:
CREATE TABLE IF NOT EXISTS public.point_table ( point_sys_id serial primary key,--System generated Primary Key - Asset Cache point_id bigint, geom geometry(Point,4326),--Geometry Field country_code varchar(4),--Country Code elevation_ft numeric(7,2),--Elevation in Feet height_ft numeric(7,2),--Height in Feet obstacle_type varchar(50)--Obstacle Type ) INSERT INTO point_table(point_id, geom, country_code, elevation_ft, height_ft, obstacle_type) VALUES (1, '0101000020E6100000E4141DC9E5934B40D235936FB6193940', 'ARE', 100, 50, 'BUILDING'), (2, '0101000020E6100000C746205ED7934B40191C25AFCE193940', 'ARE', 100, 50, 'BUILDING'), (3, '0101000020E6100000C780ECF5EE934B40B6BE4868CB193940', 'ARE', 100, 50, 'BUILDING'), (4, '0101000020E6100000A97A358FA5AF4B4074A0C65B724C3940', 'ARE', 100, 50, 'BUILDING'), -- this point is outside of the cluster distance (eps) (5, '0101000020E6100000ABB2EF8AE0934B404451A04FE4193940', 'ARE', 100, 50, 'BUILDING') select ST_ClusterDBSCAN(geom, eps := 0.000906495804256269, minpoints := 4) OVER(PARTITION BY CONCAT(country_code, elevation_ft, height_ft, obstacle_type)) as clst_id, point_id, geom, country_code, elevation_ft, height_ft, obstacle_type from point_table --where point_id != 4
Running clustering query agains all five points works fine. But once you exclude seemingly irrelevant point_id = 4 (which is outside of eps distance anyway) the clustering stops working (clst_id becomes null), even if still theoretically 4 needed points (according to the docs) are in place.
Once I change the minpoints parameter to 3, clustering works fine for those 4 neighbor points.
Can someone confirm my conclusion that ST_ClusterDBSCAN is not correct or give some good exmplanation for this behavior?
The same issue I've posted on StackOverflow: https://stackoverflow.com/questions/66159186/st-clusterdbscan-function-and-minpoints-parameter-definition