Opened 8 years ago

Last modified 5 years ago

#3069 new enhancement

Evaluate branches in r.mapcalc if-function only when needed

Reported by: wenzeslaus Owned by: grass-dev@…
Priority: normal Milestone: 7.6.2
Component: Raster Version: svn-trunk
Keywords: r.mapcalc, r3.mapcalc, optimization, expression, evaluation, if-statement, if-function Cc:
CPU: Unspecified Platform: Unspecified

Description

The r.mapcalc expression x = if(0, rand(0, 2), 3) takes about the same time to evaluate as x = rand(0, 2) + if(0, 2, 3), although the true-branch of the if-statement is never used. As far as I understand, this is given by fact that if is actually a function in r.mapcalc, so all the arguments are evaluated before the function is called.

It would be a nice enhancement if the values of if-function arguments would be computed only when needed, i.e. x = if(0, rand(0, 2), 3) would take same time as x = if(0, 1, 3) because rand(0, 2) is never used. This would allow for some optimizations when part of the output raster is already determined and there is no need to perform the computation, for example a moving window calculation which needs to be performed only when the central cell has a certain value. Another case when this would be advantageous is when both branches (arguments) are costly to compute, currently both are computed and one thrown away.

Here is the minimal example:

Error: Failed to load processor bash
No macro or processor named 'bash' found

In both case, I get something like 14.5 s and for the expression with trivial unused branch (x = 2 + if(0, 2, 3)) I get less than 6 s.

Change History (7)

in reply to:  description comment:1 by glynn, 8 years ago

Replying to wenzeslaus:

The r.mapcalc expression x = if(0, rand(0, 2), 3) takes about the same time to evaluate as x = rand(0, 2) + if(0, 2, 3), although the true-branch of the if-statement is never used. As far as I understand, this is given by fact that if is actually a function in r.mapcalc, so all the arguments are evaluated before the function is called.

It's largely a consequence of r.mapcalc using vectorised (SIMD) evaluation, similar to e.g. std::valarray in C++, NumPy, or GPU-oriented languages such as GLSL.

The "value" of any expression is a row buffer, an array containing a value for each column. Each function takes row buffers as arguments and populates the row buffer for the result.

The reason for this approach is to reduce the overhead of the interpreter, as the interpreter only need to be run per-row rather than per-cell.

As it stands, conditional evaluation would require first evaluating the condition expression, then analysing the row buffer holding the condition to determine whether a specific case (zero or non-zero for the 2-argument and 3-argument versions, negative, zero or positive for the 4-argument version) doesn't occur in any column.

This could be optimised by analysing the expression to identify sub-expressions which are constant for any given row. This includes literals, and function calls where all arguments are constant and the function itself doesn't "inject" non-constant values (specifically, rand(), col() and x()). Ideally, variables would also need to be flagged as holding constant-per-row values.

In the case of variables, there's also the issue that a given variable may be used within multiple conditionals where the conditions are different. The variable would need to be evaluated for each column where its value is needed for at least one usage (i.e. the union of all of the conditions). That would require either a) associating a validity mask with each variable (indicating the columns for which the value has been calculated), b) performing the evaluation independently for each conditional (typically resulting in redundant evaluation), or c) simply evaluating the variable for all columns in all but the most straightfoward cases.

Needless to say, this isn't a simple change. It isn't one which I have time to work on.

comment:2 by martinl, 8 years ago

Milestone: 7.3.07.4.0

Milestone renamed

comment:3 by neteler, 6 years ago

Milestone: 7.4.07.4.1

Ticket retargeted after milestone closed

comment:4 by neteler, 6 years ago

Milestone: 7.4.17.4.2

comment:5 by martinl, 6 years ago

Milestone: 7.4.27.6.0

All enhancement tickets should be assigned to 7.6 milestone.

comment:6 by martinl, 5 years ago

Milestone: 7.6.07.6.1

Ticket retargeted after milestone closed

comment:7 by martinl, 5 years ago

Milestone: 7.6.17.6.2

Ticket retargeted after milestone closed

Note: See TracTickets for help on using tickets.