Opened 11 years ago
Last modified 9 years ago
#2035 new enhancement
v.kcv optimization
Reported by: | mmetz | Owned by: | |
---|---|---|---|
Priority: | normal | Milestone: | 6.4.6 |
Component: | Vector | Version: | svn-trunk |
Keywords: | v.kcv | Cc: | |
CPU: | Unspecified | Platform: | Unspecified |
Description
Jan Ruzicka and Jan Vandrol:
in connection with attempts to rewrite some GRASS module with OpenMP we have rewrote the v.kcv module. The new version is much faster that the previous (no OpenMP is now needed - there is only one loop now with direct writing to the resulting layer). We believe that the new version of the module do the same work as original. We are not familiar with GRASS GIS development so the code is included in the attachment of this message. Hope that it could be useful.
Attachments (2)
Change History (11)
comment:1 by , 11 years ago
comment:2 by , 11 years ago
Any testing of v.kcv must ensure that there is no bias both in the point selection and the partition assignment. Thus a 2-step random selection process seems to provide the least biased results: 1. randomly select a point, 2. randomly assign a partition to this point. This is what v.kcv does in GRASS 6, but in a very inefficient way.
v.kcv has been optimized in trunk r57231.
Some testing with vectors in the North Carolina dataset:
Vector geonames_wake with 1088 points copied to the sqlite mapset:
g.copy vect=geonames_wake,my_geonames_wake # g6 time v.kcv in=my_geonames_wake out=my_geonames_wake_kvc col=part k=10 real 2m27.108s user 0m0.979s sys 0m0.754s # with db_[begin|commit]_transaction() added to local copy real 0m1.931s user 0m0.742s sys 0m0.358s # g7 v.kcv map=my_geonames_wake col=part k=10 real 0m0.625s user 0m0.133s sys 0m0.068s
Vector geodetic_pts with 29939 points copied to the sqlite mapset:
g.copy vect=geodetic_pts,my_geodetic_pts # g6 with db_[begin|commit]_transaction() added to local copy v.kcv in=my_geodetic_pts out=my_geodetic_pts_kvc col=part k=10 real 3m26.652s user 2m52.859s sys 0m30.714s # g7 v.kcv map=my_geodetic_pts col=part k=10 real 0m4.813s user 0m2.836s sys 0m1.527s
In GRASS 7, about 70% of the processing time are used by the database driver sqlite for a vector with about 120 000 points (not feasible to process in GRASS 6). Any further optimization should thus look at the sqlite driver, or even sqlite itself instead of the v.kcv module.
For GRASS 6, I would recommend to add db_[begin|commit]_transaction().
by , 11 years ago
Attachment: | v.kcv.patch added |
---|
follow-up: 4 comment:3 by , 11 years ago
The attached patch is for GRASS 6.4.3 for those interested.
comment:4 by , 11 years ago
Milestone: | 7.0.0 → 6.4.4 |
---|
by , 11 years ago
Attachment: | v.kcv_G64.patch added |
---|
Patch updated for keeping original input parameter name
follow-up: 6 comment:5 by , 11 years ago
I strongly prefer the GRASS65 version of v.kcv, and so would support backporting it, with one serious remark though:
> g.gisenv GISDBASE=/data/GRASS/DATA LOCATION_NAME=nc_spm_08 MAPSET=user1 > v.info -c elev_lid792_randpts@PERMANENT Displaying column types/names for database connection of layer 1: INTEGER|cat DOUBLE PRECISION|value > v.kcv elev_lid792_randpts@PERMANENT k=10 col=partition 100% ERROR: Bug: attempt to update map which is not in current mapset > v.info -c elev_lid792_randpts@PERMANENT Displaying column types/names for database connection of layer 1: INTEGER|cat DOUBLE PRECISION|value INTEGER|partition
In other words: I have just updated the attribute table of a vector map that is not in the current mapset. That looks like a serious breach of our general rules here !
Trying the same in trunk, I get:
ERREUR :Vector map <elev_lid792_randpts> not found in current mapset
Moritz
comment:6 by , 11 years ago
Replying to mlennert:
I strongly prefer the GRASS65 version of v.kcv, and so would support backporting it, with one serious remark though:
...
In other words: I have just updated the attribute table of a vector map that is not in the current mapset.
I have added a test as used in v.build: r60263.
Still you can override that :(
GRASS 6.5.svn (nc_spm_08):~ > v.kcv input=geonames_wake@PERMANENT column=part k=10 100% ERROR: Bug: attempt to update map which is not in current mapset
This appears to affect also at least v.build:
GRASS 6.5.svn (nc_spm_08):~ > g.gisenv | grep MAPSET MAPSET='neteler'; GRASS 6.5.svn (nc_spm_08):~ > v.build geonames_wake@PERMANENT Building topology for vector map <geonames_wake>... Registering primitives... 1088 primitives registered ...
... hence not scope of this ticket.
comment:7 by , 10 years ago
Milestone: | 6.4.4 → 6.4.5 |
---|
For the record: The attachment:v.kcv_G64.patch still awaits the 6.4 backport. OK to apply it?
comment:9 by , 9 years ago
Milestone: | → 6.4.6 |
---|
Some comments on the patch:
The patch makes sense to me and is indeed a substantial speed improval. However, partitions were confused with categories. The patch was written in C++ style, but the module is a C module. I made further changes to the patch: the attribute table of current vector is modified instead of creating a new vector, giving a further speed-up. The vector is opened without topology, that makes it easier for larger point clouds. Updating the database uses db_[begin|commit]_transaction(), giving a further speed-up.
Thanks for your contribution!