Opened 11 years ago

Closed 8 years ago

#2131 closed defect (wontfix)

Terrible performance from v.what.rast due to per-iteration db_execute

Reported by: hamish
Owned by: grass-dev@…
Priority: major
Milestone: 6.4.6
Component: Database
Version: svn-develbranch6
Keywords: v.what.rast, db_execute, v.to.db
Cc:
CPU: x86-64
Platform: Linux

Description

Hi,

I'm running v.what.rast for 175k query points in 6.x, and it's taking a horribly long time. With debug at level 1 it shows that the query processing is finished and it has moved on to the "Updating db table" stage in less than 1 second. Over an *hour later* I'm still waiting for the dbf process, which is running at 99% CPU! This is a fast workstation, too.

v.out.ascii's columns= option was suffering from the same trouble the last time I tried, to the point where it becomes unusable with more than ~10k vector points.

The v.colors, v.in.garmin, and v.in.gpsbabel scripts /used to/ suffer from the same thing, but we sped them up by writing all the SQL commands to a temp file and then running db.execute just once. It seems that opening and closing the database has non-trivial overhead, and when you do that for every single cat it adds up in a pretty impressive way. Even if another DB backend is faster to start+write+stop, I doubt the difference would be more than ~20%, max. Also, 100k points seems to take much longer than just 10x the time for a 10k point vector map.
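A minimal sketch of why batching helps, assuming Python's built-in sqlite3 module as a stand-in for the GRASS dbf driver (its startup cost differs, so absolute timings will not match). The hypothetical `update_per_statement()` reconnects and commits once per category, like the old scripts did; the hypothetical `update_batched()` reuses one connection and one transaction for every UPDATE.

```python
import os
import sqlite3
import tempfile
import time

UPDATE_SQL = "UPDATE pts SET elev = ? WHERE cat = ?"


def make_table(path, n):
    """Create a hypothetical attribute table 'pts' with cats 1..n and an empty elev column."""
    conn = sqlite3.connect(path)
    with conn:  # commits on success
        conn.execute("CREATE TABLE pts (cat INTEGER PRIMARY KEY, elev DOUBLE PRECISION)")
        conn.executemany("INSERT INTO pts (cat) VALUES (?)", ((c,) for c in range(1, n + 1)))
    conn.close()


def update_per_statement(path, values):
    """Slow pattern: connect, run one UPDATE, commit and disconnect for every category."""
    for cat, elev in values.items():
        conn = sqlite3.connect(path)
        conn.execute(UPDATE_SQL, (elev, cat))
        conn.commit()
        conn.close()


def update_batched(path, values):
    """Batched pattern: one connection and one transaction for all UPDATEs."""
    conn = sqlite3.connect(path)
    with conn:  # a single commit at the end, rollback on error
        conn.executemany(UPDATE_SQL, ((elev, cat) for cat, elev in values.items()))
    conn.close()


def run(update_fn, n):
    """Build a fresh table, apply update_fn, return (seconds spent updating, final rows)."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "attr.db")
        make_table(path, n)
        values = {cat: cat * 0.5 for cat in range(1, n + 1)}  # fake "raster" values
        start = time.perf_counter()
        update_fn(path, values)
        elapsed = time.perf_counter() - start
        conn = sqlite3.connect(path)
        rows = conn.execute("SELECT cat, elev FROM pts ORDER BY cat").fetchall()
        conn.close()
    return elapsed, rows


if __name__ == "__main__":
    n = 300
    slow, rows_slow = run(update_per_statement, n)
    fast, rows_fast = run(update_batched, n)
    assert rows_slow == rows_fast  # same result, different cost
    print(f"{n} updates: per-statement {slow:.3f}s, batched {fast:.3f}s")
```

Both functions leave the table in the same state; only the number of connection setups and commits differs (n versus one). Timings vary with disk and backend, so the sketch prints them rather than asserting a ratio.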

demo:

g.region rast=elevation
v.random out=test_100k_pts n=100000
v.db.addtable test_100k_pts column='cat integer, elev double'   #gets slow too!
time v.what.rast vect=test_100k_pts rast=elevation column=elev

My current workaround is a new flag for v.what.rast that optionally prints the result to stdout instead of writing it to a db column (done locally; I'm still testing some other interpolation improvements, so I haven't committed anything yet). With that -p flag, the module takes 0.5 seconds to complete when stdout is redirected to /dev/null.

Any thoughts on the idea of writing the SQL commands to a tempfile or pipe, then running db_execute_immediate() just once for all of them?
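A sketch of that proposal under stated assumptions: the print flag emits one `cat|value` line per point (the ticket does not give the actual format), Python's sqlite3 stands in for the GRASS DB driver, and `write_sql_script()`, `run_script_once()` and `demo()` are hypothetical helpers, not GRASS functions. In GRASS itself the final step would be the single db_execute_immediate() call (or one db.execute run) proposed above.

```python
import math
import os
import sqlite3
import tempfile


def write_sql_script(lines, table, column, fh):
    """Turn hypothetical 'cat|value' lines into one transaction of UPDATE statements.

    table and column are trusted identifiers; only cat and value come from the input.
    """
    fh.write("BEGIN;\n")
    for line in lines:
        line = line.strip()
        if not line:
            continue
        cat_str, value_str = line.split("|", 1)
        cat = int(cat_str)  # raises ValueError on a malformed category
        try:
            number = float(value_str)
        except ValueError:
            number = None  # e.g. a no-data marker
        value = repr(number) if number is not None and math.isfinite(number) else "NULL"
        fh.write(f"UPDATE {table} SET {column} = {value} WHERE cat = {cat};\n")
    fh.write("COMMIT;\n")


def run_script_once(db_path, script_path):
    """Open the database a single time and execute the whole script."""
    with open(script_path) as fh:
        sql = fh.read()
    conn = sqlite3.connect(db_path)
    try:
        conn.executescript(sql)
    finally:
        conn.close()


def demo(printed):
    """Create a 3-row table, apply printed cat|value lines via one script, return the rows."""
    with tempfile.TemporaryDirectory() as tmp:
        db_path = os.path.join(tmp, "attr.db")
        script_path = os.path.join(tmp, "update.sql")
        conn = sqlite3.connect(db_path)
        with conn:
            conn.execute("CREATE TABLE test_pts (cat INTEGER PRIMARY KEY, elev DOUBLE PRECISION)")
            conn.executemany("INSERT INTO test_pts (cat) VALUES (?)", [(1,), (2,), (3,)])
        conn.close()
        with open(script_path, "w") as fh:
            write_sql_script(printed, "test_pts", "elev", fh)
        run_script_once(db_path, script_path)
        conn = sqlite3.connect(db_path)
        rows = conn.execute("SELECT cat, elev FROM test_pts ORDER BY cat").fetchall()
        conn.close()
    return rows


if __name__ == "__main__":
    # Hypothetical output of the local print-to-stdout flag; the format is an assumption.
    print(demo(["1|102.5", "2|98.25", "3|*"]))
    # -> [(1, 102.5), (2, 98.25), (3, None)]
```

Writing numeric literals with repr() sidesteps quoting, and anything that is not a finite number becomes NULL. A pipe would work the same way as the temp file, since the script is read only once.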

(maybe the per-iteration bsearch() in the loop is inefficient too, but top shows that 'dbf' is the thing eating all the cpu time)

In trunk it takes about 6 seconds to complete the 100k random points. I'm not seeing anything obvious in the module changelog, so I guess something in the libraries got fixed? Any hints?

thanks, Hamish

Change History (5)

comment:1 by hamish, 11 years ago (in reply to: description)

Replying to hamish:

In trunk it takes about 6 seconds to complete the 100k random points. I'm not seeing anything obvious in the module changelog, so I guess something in the libraries got fixed? Any hints?

Actually, trunk is pretty bad too. v.db.addtable takes a couple of minutes to read the categories (the first half is reasonably quick, but then it slows down more and more), and v.what.rast, after running for 13 minutes, is only 9% done. (With the new print-instead-of-update-DB flag it takes only 0.347s to complete.)

any ideas?

thanks, Hamish

comment:2 by neteler, 11 years ago

Keywords: v.to.db added

The slow part appears to be v.to.db, so that needs to be optimized.

comment:3 by neteler, 9 years ago

Milestone: 6.4.4 → 6.4.6

comment:4 by mlennert, 8 years ago

Using the sqlite DB backend, I get results in about 5 s, so the problem seems to be specific to dbf.

I would suggest closing this as wontfix, since alternatives exist.

comment:5 by martinl, 8 years ago

Resolution: wontfix
Status: new → closed