Opened 16 years ago

Closed 15 years ago

#76 closed enhancement (fixed)

ST_DumpPoints - I think we said we would add to TODO

Reported by: robe Owned by: robe
Priority: medium Milestone: PostGIS 1.5.0
Component: postgis Version: master
Keywords: Cc:

Description (last modified by kneufeld)

As described here http://postgis.refractions.net/pipermail/postgis-devel/2008-June/003142.html

Though I would keep with the path, geom geom_dump structure we have for ST_Dump and ST_DumpRings. Though the path would be rarely used, I think there are times it comes in handy to reference back to a geometry.

Attachments (2)

my_st_dump_points.sql (2.0 KB ) - added by kneufeld 15 years ago.
A proprosed implemenation of ST_DumpPoints by Maxime van Noppen (maxime@…)
my_st_dump_points.2.sql (2.0 KB ) - added by yabo 15 years ago.
ST_DumpPoints implementation in plpgsql using ST_Dump

Download all attachments as: .zip

Change History (15)

comment:1 by kneufeld, 16 years ago

Wow. An actual use case for the 'path' part of ST_DumpRings. http://postgis.refractions.net/pipermail/postgis-users/2009-February/022674.html

I guess it would good to have a similar datatype for ST_DumpPoints.

— Kevin

comment:2 by robe, 16 years ago

I always thought so. You would definitely need to group points back with the specific Polygon/Line they came from and that is why the path is probably even more so important for ST_DumpPoints

comment:3 by pramsey, 16 years ago

<i>(No comment was entered for this change.)</i>

by kneufeld, 15 years ago

Attachment: my_st_dump_points.sql added

A proprosed implemenation of ST_DumpPoints by Maxime van Noppen (maxime@…)

comment:4 by kneufeld, 15 years ago

Description: modified (diff)

I just attached an implementation of ST_DumpPoints by Maxime van Noppen (maxime@…) recently posted on postgis-users (http://postgis.refractions.net/pipermail/postgis-users/2009-August/024118.html)

At first glance it looks promising. Would a C implementation really be noticeably faster?

Thoughts, team?

comment:5 by robe, 15 years ago

Yes it would be noticeably faster. Didn't we already prove that with 1000 fold geometries with ST_Dump is faster than ST_GeometryN. I really want a C implementation.

comment:6 by kneufeld, 15 years ago

Ah yes, you're right. I forgot that little detail. I was thinking in the back of my mind scalability might be an issue and now I remember why.

It'll still be a nice addition for plpgsql collection on the PostGIS wiki though.

comment:7 by robe, 15 years ago

Yap though we need to separate out into two functions (I see pg 8.4 sections and pg 8.3 or less alternatives — we should separate it out better)

comment:8 by robe, 15 years ago

Relevant NOTE form Yabo: On 08/05/09 14:47, Maxime van Noppen wrote:

Thanks. However Robe (cf ticket) and you are right, there is a scalability issue. For the sake of the test I've implemented the ST_Dump function in plpgsql with a recursive call (as my_ST_DumpPoints is) and in some very bad cases it's very slow. For example I've tested to dump a geometry containing 100 nested GEOMETRYCOLLECTION containing each 10 multipolygons. ST_Dump runs in 35ms whereas my_ST_Dump runs in 230ms (sorry if I'm stating obvious things here, I'm quite new to the postgis world).

On the same example (100 nested collections) my_ST_DumpPoints runs in ~2 seconds which is quite alot. I'll try to come up with a C version, but it seems quite more complicated.

Hm, the test case is the other way : 10 nested geometry collections of 100 multipolygons each.

— yabo

—-MY RESPONSE TO Above Yabo — if its any help. Its not so much that C is a better language than plpgsql. Its not really. Its just the interaction between PostgreSQL and the PostGIS c code that is the main problem. I think the issue is that each call to ST_GeometryN etc. incurs a memcopy penalty where as when done in C (as is the case with ST_Dump — the same geometry object is reused).

If you replace your ST_GeometryN calls with ST_Dump — you'd probably also get much better performance but would run into performance issues when you get to many points in a single geom.

Actually if we solved that more annoying issue — we could probably get way better performance with plpgsql/sql functions which are far easier to write than C functions.

in reply to:  8 comment:9 by yabo, 15 years ago

Replying to robe:

Yabo — if its any help. Its not so much that C is a better language than plpgsql. Its not really. Its just the interaction between PostgreSQL and the PostGIS c code that is the main problem. I think the issue is that each call to ST_GeometryN etc. incurs a memcopy penalty where as when done in C (as is the case with ST_Dump — the same geometry object is reused).

I see. So it could be easier to develop some low-level C functions that allows more direct access to geometries ? Would it be possible to write a GeometryN function that doesn't copy the geometry ?

If you replace your ST_GeometryN calls with ST_Dump — you'd probably also get much better performance but would run into performance issues when you get to many points in a single geom.

Very nice improvement indeed ! Though it limits the depth to 32 (as defined in the source code of ST_Dump) the performance improvement is quite nice :

31 nested collections of 100 multipolygons each : with ST_GeometryN : 3700 ms with ST_Dump : 550 ms

I've then tested to dump the points of a linestring containing 10 000 points and it takes ~ 3 seconds. It doesn't seem to scale linearly as with 50 000 points it takes up to ~ 73 seconds…

However I've attached the new code using ST_Dump rather than ST_GeometryN.

Actually if we solved that more annoying issue — we could probably get way better performance with plpgsql/sql functions which are far easier to write than C functions.

It would be great.

by yabo, 15 years ago

Attachment: my_st_dump_points.2.sql added

ST_DumpPoints implementation in plpgsql using ST_Dump

comment:10 by kneufeld, 15 years ago

This ticket is a year old now and the functionality has again been requested on -users.

I recommend that we add the attached plpgsql solution to PostGIS and close this ticket. A new ticket can be added to document the function, another to improve/rewrite it to C, and maybe a third to add regression tests.

I would much rather have a slow, but functional, method in PostGIS, than a really fast / robust non-existent function in C.

comment:11 by robe, 15 years ago

+1 I guess enough people have asked for this that a slower implementation is better than none at all. And since its not trivially obvious and something we want eventually in our code base.

FWIW: Nicklas started working on a C implementation. He has all the recursive logic done, and I had promised to try to get this from the recursive like model he has (which does just write statements for debugging at the moment), into the stack push model employed by the other ST_Dump functions. I'm still not convinced a stack model is the best given that most people think in terms of recursion rather than stacks (stacks is so assembler) though the stack model is probably more efficient for this.

Anyrate I apologize for letting this slide. I don't think I'll have time to look into it before 1.5 release time especially given my deficient C skills and the fact I'm over extended on at least 5 other projects with fairly firm approaching deadlines.

comment:12 by pramsey, 15 years ago

Add and test and close please! :)

comment:13 by kneufeld, 15 years ago

Resolution: acceptedfixed
Status: assignedclosed

committed a slightly modified version of this proposed plpgsql solution to trunk. r4836

Note: See TracTickets for help on using tickets.