Opened 4 years ago

Closed 4 years ago

#4850 closed defect (invalid)

ST_ClusterKMeans with M seems to do nothing

Reported by: robe Owned by: komzpa
Priority: medium Milestone: PostGIS 3.1.2
Component: postgis Version: 3.1.x
Keywords: Cc:

Description (last modified by robe)

According to the docs, ST_ClusterKMeans in PostGIS 3.1 supports weights, supplied through the M coordinate.

I thought I could use this to cluster by population density. For example, if I have a high-rise with, say, 300 people and townhouses with, say, 1 to 4 people, I would expect the clusters around the high-rise to contain fewer records. However, passing in M does not seem to make any difference, whereas passing in Z does change the result.

Here is a revised example I was going to put in the docs:

CREATE TABLE parcels AS
SELECT lpad(g.ord::text, 3, '0') AS parcel_id, geom,
       ('{residential, commercial}'::text[])[1 + mod(g.ord, 2)] AS type,
       CASE WHEN g.ord < 3 THEN g.ord * 3000 ELSE 1 END AS population
FROM
    ST_Subdivide(ST_Buffer('SRID=3857;LINESTRING(40 100, 98 100, 100 150, 60 90)'::geometry,
    40, 'endcap=square'), 12) WITH ORDINALITY AS g(geom, ord);

-- no weight
SELECT ST_ClusterKMeans(ST_Centroid(geom), 5) OVER() AS cid, parcel_id, population
FROM parcels
ORDER BY cid, parcel_id;

-- yields
 cid | parcel_id | population
-----+-----------+------------
   0 | 002       |       6000
   0 | 003       |          1
   1 | 006       |          1
   1 | 007       |          1
   2 | 001       |       3000
   3 | 004       |          1
   4 | 005       |          1
(7 rows)

-- with weight by population

SELECT ST_ClusterKMeans(ST_Force3DM(ST_Centroid(geom), population), 5) OVER() AS cid, parcel_id, population
FROM parcels
ORDER BY cid, parcel_id;

-- yields
 cid | parcel_id | population
-----+-----------+------------
   0 | 002       |       6000
   0 | 003       |          1
   1 | 006       |          1
   1 | 007       |          1
   2 | 001       |       3000
   3 | 004       |          1
   4 | 005       |          1
(7 rows)

As you can see, the results are identical. I would have expected parcels 002 and 001 to each get a dedicated cluster, because their populations are so much larger than those of the other parcels.

Change History (3)

comment:1 by robe, 4 years ago

Description: modified

comment:2 by komzpa, 4 years ago

As mentioned in https://postgis.net/docs/ST_ClusterKMeans.html, weight is supported only for POINT inputs. There is no single reasonable way to interpret different weights on the points of a polygon or multipoint, so these cases are not supported.
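Given that requirement, one way to check what the weighted query actually passes to ST_ClusterKMeans is to inspect the geometry type and M value of each input expression. This is a minimal diagnostic sketch, assuming the `parcels` table from the description; the column aliases `input_type` and `input_weight` are illustrative only. Inputs that satisfy the POINT-only rule should report `POINTM` with the population as the M value.

```sql
-- Diagnostic sketch: show what each parcel contributes to ST_ClusterKMeans,
-- i.e. the geometry type of the input and the M value used as its weight.
SELECT parcel_id,
       population,
       GeometryType(ST_Force3DM(ST_Centroid(geom), population)) AS input_type,
       ST_M(ST_Force3DM(ST_Centroid(geom), population))         AS input_weight
FROM parcels
ORDER BY parcel_id;
```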

comment:3 by komzpa, 4 years ago

Resolution: invalid
Status: assigned → closed