Terrible performance from v.what.rast due to per-iteration db_execute
Reported by: hamish    Owned by:
Keywords: v.what.rast, db_execute, v.to.db    Cc:
I'm running v.what.rast for 175k query points in 6.x, and it's taking a horribly long time. At debug level 1 it shows that the query processing finishes and the "Updating db table" stage begins in less than 1 second. Over an *hour later* I'm still waiting for the dbf process, which is running at 99% cpu! This is a fast workstation, too.
v.out.ascii's columns= option suffered from the same problem last time I tried it, to the point of being unusable with more than ~10k vector points.
The v.colors, v.in.garmin, and v.in.gpsbabel scripts /used to/ suffer from the same thing, but we sped those up by writing all the SQL commands to a temp file and then running db.execute just once. Opening and closing the database seems to carry non-trivial overhead, and when you do that for every single cat it adds up in a pretty impressive way. Even if another DB backend is faster to start+write+stop, I doubt the difference would be more than ~20%, max. And the scaling looks worse than linear: 100k points takes much more than 10x the time of a 10k-point vector map.
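To illustrate why the per-cat pattern hurts so much, here is a small benchmark outside of GRASS entirely. It uses Python's stdlib sqlite3 as a stand-in backend (not the dbf driver, and not GRASS code); the table name and sizes are made up. It compares opening/committing/closing once per record against one connection with a single transaction:

```python
# Hypothetical benchmark, NOT GRASS code: sqlite3 stands in for the backend.
import sqlite3, tempfile, os, time

db = os.path.join(tempfile.mkdtemp(), "bench.db")
con = sqlite3.connect(db)
con.execute("CREATE TABLE pts (cat INTEGER PRIMARY KEY, elev REAL)")
con.executemany("INSERT INTO pts (cat) VALUES (?)", [(i,) for i in range(500)])
con.commit()
con.close()

# Per-record pattern: open, update, commit, close -- once for every cat.
t0 = time.perf_counter()
for i in range(500):
    c = sqlite3.connect(db)
    c.execute("UPDATE pts SET elev = ? WHERE cat = ?", (float(i), i))
    c.commit()
    c.close()
per_record = time.perf_counter() - t0

# Batched pattern: one connection, one transaction for all cats.
t0 = time.perf_counter()
c = sqlite3.connect(db)
c.executemany("UPDATE pts SET elev = ? WHERE cat = ?",
              [(float(i), i) for i in range(500)])
c.commit()
c.close()
batched = time.perf_counter() - t0

print(f"per-record: {per_record:.3f}s  batched: {batched:.3f}s")
```

Even at only 500 rows the batched version wins comfortably; at 175k rows the gap is what turns sub-second work into an hour.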
  g.region rast=elevation
  v.random out=test_100k_pts n=100000
  v.db.addtable test_100k_pts column='cat integer, elev double'   # gets slow too!
  time v.what.rast vect=test_100k_pts rast=elevation column=elev
My current workaround is to add a flag to v.what.rast to optionally print the result to stdout instead of writing it to a db column. (Done locally; I'm still testing some other interpolation improvements, so I haven't committed anything yet.) With that -p flag, the module takes 0.5 seconds to complete when stdout is redirected to /dev/null.
Any thoughts on the idea of writing the SQL commands to a tempfile or pipe, then running db_execute_immediate() just once for all of them?
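As a sketch of that tempfile idea (again using stdlib sqlite3 as an illustrative stand-in, not the GRASS DBMI API): accumulate one UPDATE per category in a file, then hand the whole script to the driver in a single call. Here executescript() plays the role that a single db_execute_immediate() pass, or `db.execute input=file.sql`, would play in GRASS:

```python
# Illustrative sketch only; table name "pts" and the values are invented.
import sqlite3, tempfile, os

sql_path = os.path.join(tempfile.mkdtemp(), "updates.sql")
with open(sql_path, "w") as f:
    # In v.what.rast this loop would emit one UPDATE per sampled cat.
    for cat, elev in [(1, 12.5), (2, 13.0), (3, 9.75)]:
        f.write(f"UPDATE pts SET elev = {elev} WHERE cat = {cat};\n")

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE pts (cat INTEGER PRIMARY KEY, elev REAL)")
con.executemany("INSERT INTO pts (cat) VALUES (?)", [(1,), (2,), (3,)])
with open(sql_path) as f:
    con.executescript(f.read())   # one round trip for all statements

rows = con.execute("SELECT cat, elev FROM pts ORDER BY cat").fetchall()
print(rows)
```

The point is that the driver is started and stopped once, regardless of how many cats are updated.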
(Maybe the per-iteration bsearch() in the loop is inefficient too, but top shows that 'dbf' is the thing eating all the cpu time.)
In trunk it takes about 6 seconds to complete for the 100k random points. I'm not seeing anything obvious in the module changelog, so I guess something in the libraries got fixed? Any hints?