Opened 9 years ago
Closed 9 years ago
#3428 closed enhancement (fixed)
Feature request: A function to convert a geometry's coordinates into a MultiPoint
Reported by: | mwtoews | Owned by: | dbaston |
---|---|---|---|
Priority: | medium | Milestone: | PostGIS 2.3.0 |
Component: | postgis | Version: | master |
Keywords: | Cc: |
Description
This feature request is for a function to convert the coordinates from any geometry into a MULTIPOINT geometry. For example:
POINT(30 10) => MULTIPOINT(30 10) LINESTRING Z (30 10 4,10 30 5,40 40 6) => MULTIPOINT Z (30 10 4,10 30 5,40 40 6) POLYGON((30 10,40 40,20 40,10 20,30 10)) => MULTIPOINT(30 10,40 40,20 40,10 20,30 10) POLYGON((35 10,45 45,15 40,10 20,35 10),(20 30,35 35,30 20,20 30)) => MULTIPOINT(35 10,45 45,15 40,10 20,35 10,20 30,35 35,30 20,20 30) MULTIPOINT M (10 40 1,40 30 2,20 20 3,30 10 4) => MULTIPOINT M (10 40 1,40 30 2,20 20 3,30 10 4)
Similar functionality can be done with existing ST_Collect and ST_DumpPoints functions, e.g.:
CREATE OR REPLACE FUNCTION ST_ToMultiPoint(geometry) RETURNS geometry AS 'SELECT ST_CollectionExtract(ST_Collect(geom), 1) FROM (SELECT * FROM ST_DumpPoints($1)) AS dp;' LANGUAGE sql IMMUTABLE;
however this is not optimised and has a dismal performance. An equivalent function in C would be very efficient.
As for a function name, similar to ST_Boundary, it could be ST_Coordinate or similar.
Attachments (1)
Change History (11)
comment:1 by , 9 years ago
comment:2 by , 9 years ago
A quick C implementation in this case takes about 350ms, relative to 1150ms for the SQL version.
comment:3 by , 9 years ago
Version: | → 2.2.x |
---|
The performance degrades with more geometries (e.g. 500 geometries, each with 1000 points):
DROP TABLE IF EXISTS random_lines; SELECT id, ST_MakeLine(array_agg(pt)) AS geom INTO TEMP random_lines FROM ( SELECT id, ST_MakePoint(random(), random()) pt FROM generate_series(1, 500) id, generate_series(1, 1000) ) r GROUP BY id; -- Test 1 WITH rows AS ( SELECT ST_NPoints(ST_ToMultiPoint(geom)) FROM random_lines) SELECT count(*) FROM rows; -- Test 2 WITH rows AS ( SELECT ST_NPoints(geom) FROM random_lines) SELECT count(*) FROM rows;
With PostGIS 2.1 I'm comparing 500 ms vs. 30 ms.
(Where I really noticed this issue was with an older PostGIS 1.5 that my work insists on using, where I'm comparing a painful 22021 ms vs. 30 ms, which is dismal, but on an unsupported release.)
comment:4 by , 9 years ago
Version: | 2.2.x |
---|
comment:5 by , 9 years ago
Timings on my end for the random_lines example: 400ms for the SQL ST_ToMultiPoint, 130ms for C version. It's only about 40 lines of code, so maybe worth adding. I'll defer to others on that.
comment:6 by , 9 years ago
I'm fine for a new C function, but need a good name.
How about ST_Points ? So that ST_Dump(ST_Points(g)) would be equivalent to ST_DumpPoints(g)
comment:7 by , 9 years ago
Milestone: | PostGIS Future → PostGIS 2.3.0 |
---|---|
Owner: | changed from | to
Status: | new → assigned |
Version: | → trunk |
ST_Points sounds good to me.
comment:8 by , 9 years ago
PR at https://github.com/postgis/postgis/pull/85, will merge if no issues.
strk, what do you think of the addition of utility functions to cu_tester.c? Seems like it would be nice share some utility functions among test files to reduce boilerplate.
comment:9 by , 9 years ago
+1 for sharing utility code See also the ASSERT macros with no CU_ prefix in cu_tester.h, which you may want to use inside the transformation_test (they add printing the obtained result when it doesn't match expected)
On my laptop it takes about 1s to call the ST_ToMultiPoint function on a LineString with 1 million points. Not as fast as it could be, but I'm not sure about dismal. The components of the SQL function are all implemented in C, so there's no glaring performance problem, like PLPGSQL array access, that would be resolved by a direct native implementation.
There's certainly overhead — a few serialization-deserialization round trips, some overhead from the SRF and aggregate machinery, but I'm not sure how much better a direct implementation would be. Certainly not a hard one to put together, though…
I'll attach a postscript file with the call graph and execution times for the following: