Opened 10 years ago

Last modified 4 years ago

#926 new enhancement

ST_GeometryN, ST_PointN (input of array) return set

Reported by: robe Owned by: pramsey
Priority: medium Milestone: PostGIS Fund Me
Component: postgis Version: main
Keywords: Cc: bitner, skorasaurus

Description

Thinking about ST_DumpPoints and how it doesn't scale with Line strings.

The problem with ST_GeometryN and ST_PointN is that as Paul noted in IRC, requires a memory copy of the whole geometry for each call. See thread - http://logs.qgis.org/postgis/%23postgis.2011-04-27.log

I've had other needs in the past where I needed to explode only a portion of a geometry but not all. ST_Dump becomes slow because it needs to return data I don't care about and ST_PointN/ST_GeometryN are too slow too because if I care about 1000 points of a 100,000 geometry -- memcopy 1000 times is also painful.

Alternative

ST_GeometryN/ST_PointN/ST_InteriorRingN(geom, int[]) returns a set of geometries and takes an arbitrary 1-dimensional array of index positions.

This has a myriad of utilities -- e.g. I could take a huge line and break it into multiple segments easily without incurring memory copy penalty.

My linestring dump would then be.

SELECT ST_PointN(geom, ARRAY(SELECT generate_series(1, ST_NumPoints(geom)));

And of course if I only cared about points from 4 to 6, I would do which I predict would be really fast even for a 100,000 point linestring

SELECT ST_PointN(geom,[4,5,6]);

Then if we swap out the ST_PointN use in ST_DumpPoints with this new implementation, it may beat bitnerd's implementation.

http://pastebin.com/MhR4kr3f

And be much less cryptic looking :).

Change History (11)

comment:1 Changed 10 years ago by bitner

Cc: bitner added

Regina --

Kludgy and heinous, yes. Cryptic??? I think I might take offense! :-)

I do think that this approach does have something to offer with regards to flexibility over a direct c implementation of st_dumppoints.

If we get a good implementation of dumppoints going (even if not), I'll make sure to show you some of the approaches that I have used for data smoothing using my cryptic dumppoints and the newish postgresql windowing functions for linestrings.

David

comment:2 Changed 10 years ago by robe

David, Don't take offense. Perhaps that was not the right word to use. Kludgy and heinous is nicer than cryptic :) Actually I thought a lot of things look cryptic if you haven't seen them before. accessing composite objects looks cryptic, the first time you see it, the way Oracle spatial folks access their geometries with arrays looks cryptic, but many will defend it to the ground why its better and more efficient than what everyone else does and they are probably right.

With that said even the array thing above now that I think about it would look a bit cryptic, but its shorter. Yah shorter is what I was going for.

You see any issues with this approach?

What my idea is keep ST_DumpPoints as an SQL implementation and just do these in C. I can almost imagine doing these myself in C so I think probalby easier than implementing ST_DumpPoints in C.

I think current dump point weak point is the dumping of linestring points as your approach clearly demonstrated, but if we can have these helper functions in C, it will open the door for a lot more fast terse SQL / PLPGSQL function implementations was my thinking.

You think returning points / geometries as a SET is better than an array or would an array make more sense and people can unnest if they want. I'm not sure if there is a performance or usability benefit to outputting it as an array from postgis layer instead of a set.

comment:3 Changed 10 years ago by bitner

I am not sure that I would have a preference between returning an array and using unnest and just returning a SET. I think the reason that both of us are interested in having this is to be able to rely on implementing things quickly in SQL/PLPGSQL (not to mention this would make things also more easy to parse into arrays of coordinates that could be passed into PLR).

One thing that might be a convenience option would to be able to specify an index range in addition to using an array (ie pointn(geom,5,10) as equivalent to pointn(geom,{5,6,7,8,9,10}). One nice trick that could be done using your generate_series approach could be thinning a line by using st_pointn(geom,generate_series(1,st_numpoints(geom),3)) to just get every third point.

comment:4 Changed 10 years ago by robe

David,

I really like the from,to range idea you have. I was thinking to myself if you wanted to grab all 100,000 points of a long linestring that passing in an array of 100,000 integers would be mighty inefficient (granted much more efficient than what we are doing now, but still inefficient). The range idea solves that problem nicely I think and will probably be more used than the input array option.

The skip idea for thining yap that was the along the lines of what I was thinking why it's nice to pass in an array index. I'll look at the C code we have so far to see if I can patch together something that does this wihtout having to learn C.

If it doesn't work I'll come whimpering to the PostGIS Dev group for help. Probably would be worthwhile to do benchmarks on the two approaches.

I'm not sure why I think it but I suspect the return of array might be more efficient.

Do you have some examples of PL/R code how you would stuff such an array into it. Would be curious to see that and yes being able to write fast, efficient, and terse sql / plpgsql functions is my primary objective for this. I see this mem copy issue as our main barrier for that so this is one step in that direction.

comment:5 Changed 10 years ago by bitner

So.... I haven't quite figured out how to actually use this or if I am even barking down the right tree, but I have been trying to make sense of using loess/lowess for some data smoothing.

CREATE OR REPLACE FUNCTION lowess(real[], real[])

RETURNS real[] AS

$BODY$ x <- arg1 y <- arg2 ret <- lowess(y,x,1,1) return(data.frame(ret)) $BODY$

LANGUAGE plr VOLATILE STRICT COST 100;

select lowess(ARRAY[0.1,.51,.98,1.53,1.95,2.5],ARRAY[0,1,2,3,4,5]);

I'm testing smoothing x vs. m, y vs. m for flight path data.

comment:6 Changed 10 years ago by strk

Milestone: PostGIS 2.0.0PostGIS 2.1.0

Not for 2.0, I guess

comment:7 Changed 8 years ago by robe

Milestone: PostGIS 2.1.0PostGIS Future

What I really want is a universal geometry iterator. Perhaps something like described here: http://adpgtech.blogspot.com/2013/03/json-processing-functions-and-new-json.html

But I'm beginning to feel this is a pipe dream.

comment:8 Changed 8 years ago by robe

Milestone: PostGIS FuturePostGIS 2.2.0

comment:9 Changed 7 years ago by skorasaurus

Cc: skorasaurus added

comment:10 Changed 6 years ago by pramsey

Milestone: PostGIS 2.2.0PostGIS Future

comment:11 Changed 4 years ago by robe

Milestone: PostGIS FuturePostGIS Fund Me

Milestone renamed

Note: See TracTickets for help on using tickets.