Opened 16 years ago
Closed 15 years ago
#76 closed enhancement (fixed)
ST_DumpPoints - I think we said we would add to TODO
Reported by: | robe | Owned by: | robe |
---|---|---|---|
Priority: | medium | Milestone: | PostGIS 1.5.0 |
Component: | postgis | Version: | master |
Keywords: | Cc: |
Description (last modified by )
As described here http://postgis.refractions.net/pipermail/postgis-devel/2008-June/003142.html
Though I would keep with the path, geom geom_dump structure we have for ST_Dump and ST_DumpRings. Though the path would be rarely used, I think there are times it comes in handy to reference back to a geometry.
Attachments (2)
Change History (15)
comment:1 by , 16 years ago
comment:2 by , 16 years ago
I always thought so. You would definitely need to group points back with the specific Polygon/Line they came from and that is why the path is probably even more so important for ST_DumpPoints
by , 15 years ago
Attachment: | my_st_dump_points.sql added |
---|
A proprosed implemenation of ST_DumpPoints by Maxime van Noppen (maxime@…)
comment:4 by , 15 years ago
Description: | modified (diff) |
---|
I just attached an implementation of ST_DumpPoints by Maxime van Noppen (maxime@…) recently posted on postgis-users (http://postgis.refractions.net/pipermail/postgis-users/2009-August/024118.html)
At first glance it looks promising. Would a C implementation really be noticeably faster?
Thoughts, team?
comment:5 by , 15 years ago
Yes it would be noticeably faster. Didn't we already prove that with 1000 fold geometries with ST_Dump is faster than ST_GeometryN. I really want a C implementation.
comment:6 by , 15 years ago
Ah yes, you're right. I forgot that little detail. I was thinking in the back of my mind scalability might be an issue and now I remember why.
It'll still be a nice addition for plpgsql collection on the PostGIS wiki though.
comment:7 by , 15 years ago
Yap though we need to separate out into two functions (I see pg 8.4 sections and pg 8.3 or less alternatives — we should separate it out better)
follow-up: 9 comment:8 by , 15 years ago
Relevant NOTE form Yabo: On 08/05/09 14:47, Maxime van Noppen wrote:
Thanks. However Robe (cf ticket) and you are right, there is a scalability issue. For the sake of the test I've implemented the ST_Dump function in plpgsql with a recursive call (as my_ST_DumpPoints is) and in some very bad cases it's very slow. For example I've tested to dump a geometry containing 100 nested GEOMETRYCOLLECTION containing each 10 multipolygons. ST_Dump runs in 35ms whereas my_ST_Dump runs in 230ms (sorry if I'm stating obvious things here, I'm quite new to the postgis world).
On the same example (100 nested collections) my_ST_DumpPoints runs in ~2 seconds which is quite alot. I'll try to come up with a C version, but it seems quite more complicated.
Hm, the test case is the other way : 10 nested geometry collections of 100 multipolygons each.
— yabo
—-MY RESPONSE TO Above Yabo — if its any help. Its not so much that C is a better language than plpgsql. Its not really. Its just the interaction between PostgreSQL and the PostGIS c code that is the main problem. I think the issue is that each call to ST_GeometryN etc. incurs a memcopy penalty where as when done in C (as is the case with ST_Dump — the same geometry object is reused).
If you replace your ST_GeometryN calls with ST_Dump — you'd probably also get much better performance but would run into performance issues when you get to many points in a single geom.
Actually if we solved that more annoying issue — we could probably get way better performance with plpgsql/sql functions which are far easier to write than C functions.
comment:9 by , 15 years ago
Replying to robe:
Yabo — if its any help. Its not so much that C is a better language than plpgsql. Its not really. Its just the interaction between PostgreSQL and the PostGIS c code that is the main problem. I think the issue is that each call to ST_GeometryN etc. incurs a memcopy penalty where as when done in C (as is the case with ST_Dump — the same geometry object is reused).
I see. So it could be easier to develop some low-level C functions that allows more direct access to geometries ? Would it be possible to write a GeometryN function that doesn't copy the geometry ?
If you replace your ST_GeometryN calls with ST_Dump — you'd probably also get much better performance but would run into performance issues when you get to many points in a single geom.
Very nice improvement indeed ! Though it limits the depth to 32 (as defined in the source code of ST_Dump) the performance improvement is quite nice :
31 nested collections of 100 multipolygons each : with ST_GeometryN : 3700 ms with ST_Dump : 550 ms
I've then tested to dump the points of a linestring containing 10 000 points and it takes ~ 3 seconds. It doesn't seem to scale linearly as with 50 000 points it takes up to ~ 73 seconds…
However I've attached the new code using ST_Dump rather than ST_GeometryN.
Actually if we solved that more annoying issue — we could probably get way better performance with plpgsql/sql functions which are far easier to write than C functions.
It would be great.
by , 15 years ago
Attachment: | my_st_dump_points.2.sql added |
---|
ST_DumpPoints implementation in plpgsql using ST_Dump
comment:10 by , 15 years ago
This ticket is a year old now and the functionality has again been requested on -users.
I recommend that we add the attached plpgsql solution to PostGIS and close this ticket. A new ticket can be added to document the function, another to improve/rewrite it to C, and maybe a third to add regression tests.
I would much rather have a slow, but functional, method in PostGIS, than a really fast / robust non-existent function in C.
comment:11 by , 15 years ago
+1 I guess enough people have asked for this that a slower implementation is better than none at all. And since its not trivially obvious and something we want eventually in our code base.
FWIW: Nicklas started working on a C implementation. He has all the recursive logic done, and I had promised to try to get this from the recursive like model he has (which does just write statements for debugging at the moment), into the stack push model employed by the other ST_Dump functions. I'm still not convinced a stack model is the best given that most people think in terms of recursion rather than stacks (stacks is so assembler) though the stack model is probably more efficient for this.
Anyrate I apologize for letting this slide. I don't think I'll have time to look into it before 1.5 release time especially given my deficient C skills and the fact I'm over extended on at least 5 other projects with fairly firm approaching deadlines.
comment:13 by , 15 years ago
Resolution: | accepted → fixed |
---|---|
Status: | assigned → closed |
committed a slightly modified version of this proposed plpgsql solution to trunk. r4836
Wow. An actual use case for the 'path' part of ST_DumpRings. http://postgis.refractions.net/pipermail/postgis-users/2009-February/022674.html
I guess it would good to have a similar datatype for ST_DumpPoints.
— Kevin