Opened 3 years ago

Closed 3 years ago

Last modified 3 years ago

#4991 closed defect (invalid)

st_union of single row should keep the row intact

Reported by: michal Owned by: pramsey
Priority: medium Milestone: PostGIS 3.1.5
Component: postgis Version: 3.1.x
Keywords: Cc:

Description

when using st_union as aggregate and applied only to a single row, the result should be identical to the original row. (eg grouping by user_id, the users with just one geometry should by returned the original geometry)

the following query:

select st_astext(st_union(st_geomfromtext('LINESTRING(17.1043382 48.1402074,17.1042455 48.1401187,17.1041991 48.1400716,17.1042055 48.1398829,17.1042068 48.139845,17.1036456 48.139859,17.1027185 48.1398876,17.1025576 48.139914,17.1024604 48.1399439,17.1024039 48.1399764,17.1023737 48.1400056,17.1023571 48.1400384,17.1023541 48.1400711,17.1023659 48.1401087,17.1023811 48.1401357,17.1024202 48.1401652,17.1025061 48.1402134,17.1026764 48.1402846,17.1036586 48.1405359,17.1037086 48.1405476,17.1037503 48.1405545,17.1038012 48.140559,17.1038458 48.14056,17.1038913 48.1405606,17.1039475 48.1405575,17.1039949 48.1405504,17.104034 48.1405409,17.1040669 48.1405298,17.104091 48.1405186,17.1041205 48.140503,17.1041562 48.140481,17.1041879 48.1404584,17.1042111 48.140438,17.1042332 48.1404161,17.1042533 48.1403897,17.1042754 48.1403494,17.1042942 48.140305,17.1043882 48.1400244)')));

returns multilinestring, instead of a linestring: MULTILINESTRING((17.104329639610913 48.14019920899124,17.1042455 48.1401187,17.1041991 48.1400716,17.1042055 48.1398829,17.1042068 48.139845,17.1036456 48.139859,17.1027185 48.1398876,17.1025576 48.139914,17.1024604 48.1399439,17.1024039 48.1399764,17.1023737 48.1400056,17.1023571 48.1400384,17.1023541 48.1400711,17.1023659 48.1401087,17.1023811 48.1401357,17.1024202 48.1401652,17.1025061 48.1402134,17.1026764 48.1402846,17.1036586 48.1405359,17.1037086 48.1405476,17.1037503 48.1405545,17.1038012 48.140559,17.1038458 48.14056,17.1038913 48.1405606,17.1039475 48.1405575,17.1039949 48.1405504,17.104034 48.1405409,17.1040669 48.1405298,17.104091 48.1405186,17.1041205 48.140503,17.1041562 48.140481,17.1041879 48.1404584,17.1042111 48.140438,17.1042332 48.1404161,17.1042533 48.1403897,17.1042754 48.1403494,17.1042942 48.140305,17.104329639610913 48.14019920899124),(17.1043382 48.1402074,17.104329639610913 48.14019920899124),(17.104329639610913 48.14019920899124,17.1043882 48.1400244))

removing the first or the last point in the original query, returns linestring

I assume there are issues with precision, but imho st_union should do nothing on single row entries.

POSTGIS="3.1.1 aaf4c79" [EXTENSION] PGSQL="130" GEOS="3.9.0-CAPI-1.16.2" PROJ="7.2.1" LIBXML="2.9.10" LIBJSON="0.15" LIBPROTOBUF="1.3.3" WAGYU="0.5.0 (Internal)" TOPOLOGY

Change History (12)

comment:1 by michal, 3 years ago

using st_memunion instead of st_union does not produce a multilinestring but a proper linestring (at least in the above example).

comment:2 by komzpa, 3 years ago

Resolution: invalid
Status: newclosed

Result of single-row ST_Union can also be identical to result of ST_UnaryUnion. It is also documented that ST_Union will perform Cascaded Union on your input https://postgis.net/docs/ST_Union.html

If you need to keep your linestring simple because there's more semantics to it, use appropriate functions like ST_Collect that don't modify elements.

comment:3 by michal, 3 years ago

the documentation for st_memunion says Produces the same result as ST_Union, which at least in the above case, is not true (st_union creates multilinestring, st_memunion creates linestring).

even with usage of cascaded union, the union should not split the geometry.

(plus st_union probably does not use cascaded union on linestring, only on polygons, or the comments are outdated)

(of course, I do not know whether the data is singlerow or multirow before running the query).

comment:4 by mdavis, 3 years ago

Hoo boy there's a lot of confusion going on here. Some of it is due to slightly confusing or possibly erroneous documentation, and some is due to (apparently) inconsistent implementation.

The key fact is that it is intentional that Union operation semantics cause *all* lines to be nodedd, even single ones. The input line in this case crosses itself, so it is turned into a MultiLineString.

Given this, it seems like ST_MemUnion may have an optimization that avoids this semantic - which should probably be fixed, so things are consistent.

comment:5 by mdavis, 3 years ago

Why is the semantics of preserving a single-row input required? Multi-row inputs are likely to change, in general. Surely the logic should be able to handle change in the output in all cases?

comment:6 by michal, 3 years ago

I do service like https://mhd.oma.sk/-133867 where I collect data from OSM (linestring or multilinesting), grouping by id in the OSM database. osm2pgsql does a great job of importing and merging data, but sometimes I have to alter the multilinestring (eg removing platforms from tram routes). Then query like select st_union(way) from planet_osm_line where platform is null and osm_id=123 creates usually a linestring (and not multilinestring). Sometimes the number of matching rows is 1, sometimes 2+. but since st_memunion keeps the data intact, I do not have to make if-else before the query (which I like very, very much, please do not change it :) ).

having linestrings allows to use st_linelocatepoint - so info like 'there is a bus stop named AAA after 1.5km'

in this particular linestring, it is a bus line which makes a left turn over a bridge, so when crossing itself the Z index is different. but the Z index is dropped during import, because it's not needed (so the postgis data is 2D only).

comment:7 by mdavis, 3 years ago

Thanks for the detailed explanation. I can see why you want to preserve single linestrings now.

Unfortunately I don't have anything to suggest other than a fairly ugly conditional check before doing the union. You might be able to wrap it in a function to make it easier to use.

comment:8 by mdavis, 3 years ago

I'm puzzled why you use ST_Union at all. Do you need the noding behaviour in some cases? It doesn't sew lines together (for that you need ST_LineMerge). Or why not just use ST_MemUnion?

comment:9 by michal, 3 years ago

well, until today, I did no know ST_MemUnion exists. I should read the docs more.

after reading docs for ST_Union there is no mention that it does not sew linestrings. "Unions the input geometries, merging geometry to produce a result geometry with no overlaps." - for me, this means that it sews linestrings as well as polygons

SELECT ST_AsText(ST_union( ST_GeomFromText('MULTILINESTRING((-44 -34,-45 -33),(-44 -34,-46 -32))') )); removes the overlap but imho it should remove the boundary between the linestrings (producing a linestring, not multilinestring). just as it sews polygons and does not leave border between them: SELECT ST_AsText(ST_union( ST_GeomFromText('MULTIPOLYGON(((-44 -34, -45 -33, -10 -10,-44 -34)),((-44 -34,-46 -32, -50 -50, -44 -34)))') )); - the result is polygon, not multipolygon.

comment:10 by mdavis, 3 years ago

Yes, the ST_Union documentation should mention that it does not sew lines, and have a link to ST_LineMerge.

The reason for not sewing lines is that it is possible to merge the output lines together, but it isn't possible to unmerge them if they were returned as merged. So the operations are more modular this way.

Perhaps ST_Union should take an option to merge lines automatically.

Last edited 3 years ago by mdavis (previous) (diff)

comment:11 by robe, 3 years ago

ST_MemUnion hasn't been touched in eons and does a state ST_Union(geom1,geom2). At a glance ST_Union(geom1,geom2) returns the non-null geometry if one of the geometries is null (as would be the case with an ST_MemUnion with only one geometry). I'm surprised this doesn't use ST_UnaryUnion :) so there is another function we need to clarify the behavior.

At anyrate ST_Union agg is generally much more efficient than ST_MemUnion if you have many geometries. That said I'd go with ST_LineMerge(ST_Union(..)) for performance unless you never have more than a few geometries.

comment:12 by michal, 3 years ago

I've added PR into postgis docs https://git.osgeo.org/gitea/postgis/postgis/pulls/58

not sure if the wording is perfect…

Note: See TracTickets for help on using tickets.