Opened 11 years ago

Last modified 9 years ago

#2035 new enhancement

v.kcv optimization

Reported by: mmetz Owned by: grass-dev@…
Priority: normal Milestone: 6.4.6
Component: Vector Version: svn-trunk
Keywords: v.kcv Cc:
CPU: Unspecified Platform: Unspecified

Description

Jan Ruzicka and Jan Vandrol:

in connection with attempts to rewrite some GRASS module with OpenMP we have rewrote the v.kcv module. The new version is much faster that the previous (no OpenMP is now needed - there is only one loop now with direct writing to the resulting layer). We believe that the new version of the module do the same work as original. We are not familiar with GRASS GIS development so the code is included in the attachment of this message. Hope that it could be useful.

Attachments (2)

v.kcv.patch (14.8 KB ) - added by mmetz 11 years ago.
v.kcv_G64.patch (14.9 KB ) - added by neteler 11 years ago.
Patch updated for keeping original input parameter name

Download all attachments as: .zip

Change History (11)

comment:1 by mmetz, 11 years ago

Some comments on the patch:

The patch makes sense to me and is indeed a substantial speed improval. However, partitions were confused with categories. The patch was written in C++ style, but the module is a C module. I made further changes to the patch: the attribute table of current vector is modified instead of creating a new vector, giving a further speed-up. The vector is opened without topology, that makes it easier for larger point clouds. Updating the database uses db_[begin|commit]_transaction(), giving a further speed-up.

Thanks for your contribution!

comment:2 by mmetz, 11 years ago

Any testing of v.kcv must ensure that there is no bias both in the point selection and the partition assignment. Thus a 2-step random selection process seems to provide the least biased results: 1. randomly select a point, 2. randomly assign a partition to this point. This is what v.kcv does in GRASS 6, but in a very inefficient way.

v.kcv has been optimized in trunk r57231.

Some testing with vectors in the North Carolina dataset:

Vector geonames_wake with 1088 points copied to the sqlite mapset:

g.copy vect=geonames_wake,my_geonames_wake

# g6
time v.kcv in=my_geonames_wake out=my_geonames_wake_kvc col=part k=10

real	2m27.108s
user	0m0.979s
sys	0m0.754s

# with db_[begin|commit]_transaction() added to local copy
real	0m1.931s
user	0m0.742s
sys	0m0.358s

# g7
v.kcv map=my_geonames_wake col=part k=10

real	0m0.625s
user	0m0.133s
sys	0m0.068s

Vector geodetic_pts with 29939 points copied to the sqlite mapset:

g.copy vect=geodetic_pts,my_geodetic_pts

# g6 with db_[begin|commit]_transaction() added to local copy
v.kcv in=my_geodetic_pts out=my_geodetic_pts_kvc col=part k=10

real	3m26.652s
user	2m52.859s
sys	0m30.714s

# g7
v.kcv map=my_geodetic_pts col=part k=10

real	0m4.813s
user	0m2.836s
sys	0m1.527s

In GRASS 7, about 70% of the processing time are used by the database driver sqlite for a vector with about 120 000 points (not feasible to process in GRASS 6). Any further optimization should thus look at the sqlite driver, or even sqlite itself instead of the v.kcv module.

For GRASS 6, I would recommend to add db_[begin|commit]_transaction().

by mmetz, 11 years ago

Attachment: v.kcv.patch added

comment:3 by mmetz, 11 years ago

The attached patch is for GRASS 6.4.3 for those interested.

in reply to:  3 comment:4 by neteler, 11 years ago

Milestone: 7.0.06.4.4

Replying to mmetz:

The attached patch is for GRASS 6.4.3 for those interested.

Backported to devbranch65 in r59417.

by neteler, 11 years ago

Attachment: v.kcv_G64.patch added

Patch updated for keeping original input parameter name

comment:5 by mlennert, 11 years ago

I strongly prefer the GRASS65 version of v.kcv, and so would support backporting it, with one serious remark though:

> g.gisenv
GISDBASE=/data/GRASS/DATA
LOCATION_NAME=nc_spm_08
MAPSET=user1

> v.info -c elev_lid792_randpts@PERMANENT
Displaying column types/names for database connection of layer 1:
INTEGER|cat
DOUBLE PRECISION|value

> v.kcv elev_lid792_randpts@PERMANENT k=10 col=partition
 100%
ERROR: Bug: attempt to update map which is not in current mapset

 > v.info -c elev_lid792_randpts@PERMANENT
Displaying column types/names for database connection of layer 1:
INTEGER|cat
DOUBLE PRECISION|value
INTEGER|partition

In other words: I have just updated the attribute table of a vector map that is not in the current mapset. That looks like a serious breach of our general rules here !

Trying the same in trunk, I get:

ERREUR :Vector map <elev_lid792_randpts> not found in current mapset

Moritz

in reply to:  5 comment:6 by neteler, 11 years ago

Replying to mlennert:

I strongly prefer the GRASS65 version of v.kcv, and so would support backporting it, with one serious remark though:

...

In other words: I have just updated the attribute table of a vector map that is not in the current mapset.

I have added a test as used in v.build: r60263.

Still you can override that :(

GRASS 6.5.svn (nc_spm_08):~ > v.kcv input=geonames_wake@PERMANENT column=part k=10
 100%
ERROR: Bug: attempt to update map which is not in current mapset

This appears to affect also at least v.build:

GRASS 6.5.svn (nc_spm_08):~ > g.gisenv  | grep MAPSET
MAPSET='neteler';
GRASS 6.5.svn (nc_spm_08):~ > v.build geonames_wake@PERMANENT 
Building topology for vector map <geonames_wake>...
Registering primitives...
1088 primitives registered
...

... hence not scope of this ticket.

comment:7 by neteler, 10 years ago

Milestone: 6.4.46.4.5

For the record: The attachment:v.kcv_G64.patch still awaits the 6.4 backport. OK to apply it?

comment:8 by martinl, 9 years ago

Milestone: 6.4.5

Ticket retargeted after milestone closed

comment:9 by martinl, 9 years ago

Milestone: 6.4.6
Note: See TracTickets for help on using tickets.