id summary reporter owner description type status priority milestone component version resolution keywords cc cpu platform
542 grass7 vector libraries modifications mmetz grass-dev@… "I want to suggest some more profound changes to the vector model for grass7. These changes would affect topology, the spatial index and maybe the category index, but not the coor file. That means there will be limited forward/backward compatibility: topology would need to be rebuilt before vectors can be accessed. Vector modules would not need to be rewritten, but more efficient library functions could be made available.

My general idea/complaint is that the current topology layout is not tailored towards vector object types; instead several (very) different types (points, lines, boundaries, centroids, faces, kernels) are stored in the same structure. Working with one particular type is a bit inefficient because the desired type has to be selected out of everything stored in this universal structure every single time. I am sure that a lot of time and space can be saved with a redesigned topology layout and vector libraries that make use of it.

As an example, what I want to get rid of is
{{{
/* loop over all primitives; line ids start at 1 */
for (line = 1; line <= nlines; line++) {
    if (!Vect_line_alive(Map, line))
        continue;
    type = Vect_read_line(Map, points, cats, line);
    if (!(type & otype))
        continue;
    /* process line */
}
}}}

In the worst case, the whole coor file is read just to get the few centroids in it. This cannot always be avoided or changed, but could often be replaced with e.g.
{{{
/* loop over centroids only; ids start at 1 */
for (centroid = 1; centroid <= ncentroids; centroid++) {
    /* process centroid */
}
}}}

The current implementation has some consequences that I am not sure are actually desired. E.g. when cleaning a vector with tool=snap (snapping vertices of lines and boundaries), lines and boundaries may be snapped together at the same time: a boundary may be snapped to a line and vice versa.
Maybe this is sometimes desired, but maybe it should be avoided? Another example is removing duplicates: currently this can be done for points and centroids together, and if a point and a centroid have identical coordinates, one of them is deleted (random selection).

With the changes I have in mind, the size of support structures should generally go down, most for point datasets, least for areas. Massive point datasets like LIDAR could be processed more easily on level 2 with topology, because their support structures would shrink by about 70% (rough estimates: spatial index reduced to 25%, topology reduced to 40%).

There are however some problems with my suggestions: 1) IMHO nobody should decide on that alone, 2) the coding is too much for one person, i.e. I can't do all that without help, 3) I'm not really a programmer, 4) I don't know enough about vector geometry algorithms.

Below are more technical details:

== Status quo ==

the coor file holds lines (better: primitives) of types[[BR]]
point[[BR]]
line[[BR]]
boundary[[BR]]
centroid[[BR]]
face (3D boundary, not yet implemented)[[BR]]
kernel (3D centroid, not yet implemented)[[BR]]

structures derived from these types are[[BR]]
nodes[[BR]]
areas[[BR]]
isles[[BR]]
edges (3D areas, not yet implemented)[[BR]]
volumes (3D shapes, not yet implemented)[[BR]]
holes (3D volumes within volumes, like isles in areas, not yet implemented)

topology holds information about[[BR]]
nodes[[BR]]
lines[[BR]]
areas[[BR]]
isles

where lines can be points, lines, boundaries, centroids, faces, or kernels

see [http://trac.osgeo.org/grass/browser/grass/trunk/include/vect/dig_structs.h#L440]

points, lines, boundaries, centroids, faces, kernels are obviously different things, but the current topology layout squeezes all of them into the same structure with information about:

start node (assigned for all types, but not needed for points, centroids, kernels)[[BR]]
end node (used
for lines and boundaries, otherwise unused)[[BR]]
area to left (for boundary, area for centroid, unused for all other types)[[BR]]
area to right (for boundary, unused for all other types)[[BR]]
3D bounding box (completely redundant for points, centroids, kernels)[[BR]]
offset (into coor file)[[BR]]
type (point, line, boundary, centroid, face, or kernel)

== Proposed new layout ==

the coor file would hold the same types as before. To avoid confusion, all coordinate strings would be referred to as primitives (as in the output of current v.build), but that's just naming. IMHO anything but line is fine. Saying that a line can be a line or boundary or point or ... is too philosophical for my taste.

topology would have a separate data structure for each of[[BR]]
points[[BR]]
lines[[BR]]
boundaries[[BR]]
nodes (only needed for lines, boundaries, and faces)[[BR]]
centroids[[BR]]
areas[[BR]]
isles[[BR]]
faces[[BR]]
edges[[BR]]
volumes[[BR]]
holes

An additional small data structure would be needed as a boiled-down replacement of the current P_Line, holding information about primitives.

Similarly, a separate spatial index would be created for each type, instead of lumping all points, lines, boundaries, centroids, faces, and kernels into the same spatial index. Maintaining separate spatial indices is more efficient with regard to both time and space.

I'm reaching the limits of what I can change in the vector libs without breaking compatibility, and I'm sometimes getting frustrated with the waste of time and space for large vectors. IIUR grass7 is an opportunity to introduce changes like these, so I hope to initiate a discussion and to gather more ideas on how to improve grass vector handling.

Regards,

Markus M
" enhancement closed major 8.0.0 Vector svn-trunk fixed martinl All All
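
The contrast the ticket draws between one universal topology record and per-type records can be sketched in C roughly as follows. This is only an illustration under stated assumptions: `UniversalPrim`, `CentroidTopo`, `Box3D`, `collect_by_scan` and `build_centroid_table` are hypothetical names made up for this sketch and are not part of the GRASS vector library, and the `T_*` constants merely mimic type bit flags. The field list follows the one given under "Status quo", including the centroid's area being kept in the "area to left" slot.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical type flags for this sketch (bit flags so they can be OR-ed). */
enum { T_POINT = 1, T_LINE = 2, T_BOUNDARY = 4, T_CENTROID = 8 };

/* Hypothetical 3D bounding box. */
typedef struct { double N, S, E, W, T, B; } Box3D;

/* Universal record: every primitive carries every field,
 * whether or not its type needs it. */
typedef struct {
    int n1, n2;       /* start / end node */
    int left, right;  /* area to left / right; left = area for a centroid */
    Box3D box;        /* redundant for points and centroids */
    long offset;      /* offset into the coor file */
    int type;         /* one of the T_* flags */
} UniversalPrim;

/* Per-type record: a centroid only needs its offset and its area. */
typedef struct {
    long offset;
    int area;
} CentroidTopo;

/* Status-quo style: visit every primitive and filter by type.
 * offsets_out must have room for n entries. Returns the number found. */
size_t collect_by_scan(const UniversalPrim *prims, size_t n, int otype,
                       long *offsets_out)
{
    size_t found = 0;
    for (size_t i = 0; i < n; i++) {
        if (!(prims[i].type & otype))
            continue;
        offsets_out[found++] = prims[i].offset;
    }
    return found;
}

/* Proposed style: build a centroid-only table once (e.g. when topology
 * is built); later loops then touch centroids and nothing else. */
size_t build_centroid_table(const UniversalPrim *prims, size_t n,
                            CentroidTopo *out, size_t cap)
{
    size_t found = 0;
    for (size_t i = 0; i < n; i++) {
        if (prims[i].type != T_CENTROID)
            continue;
        assert(found < cap);              /* caller must size the table */
        out[found].offset = prims[i].offset;
        out[found].area = prims[i].left;  /* area was stored in 'left' */
        found++;
    }
    return found;
}
```

Once such a per-type table exists, a module interested only in centroids would loop over it directly and never look at points, lines or boundaries. The per-type record is also smaller than the universal one, which is the kind of saving the size estimates above rely on; the actual numbers depend on the real structures and are not reproduced by this sketch.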