Opened 4 years ago

Closed 4 years ago

#2466 closed defect (fixed)

r.param.scale slow on Windows when there is no data in raster

Reported by: annakrat Owned by: grass-dev@…
Priority: normal Milestone: 7.0.0
Component: Raster Version: unspecified
Keywords: r.param.scale, nan Cc:
CPU: Unspecified Platform: MSWindows 8

Description

When I run r.param.scale on Windows with raster map with a lot of no data, the computation is significantly slower (90 s on Ubuntu, 50 min on Windows). r.param.scale doesn't propagate nulls. I read somewhere that computing with nan can be slower on some architectures, could this be the reason?

I attached a patch to propagate nans. I also realized that at one point during preprocessing, there are large numbers computed (especially with big cell size and high window size), so I thought double would be more appropriate.

If there are no objection, I would commit it to test it on Windows.

Attachments (1)

rparamscale.diff (1.1 KB) - added by annakrat 4 years ago.
r.param.scale patch

Download all attachments as: .zip

Change History (6)

Changed 4 years ago by annakrat

Attachment: rparamscale.diff added

r.param.scale patch

comment:1 in reply to:  description ; Changed 4 years ago by glynn

Replying to annakrat:

When I run r.param.scale on Windows with raster map with a lot of no data, the computation is significantly slower (90 s on Ubuntu, 50 min on Windows). r.param.scale doesn't propagate nulls. I read somewhere that computing with nan can be slower on some architectures, could this be the reason?

Normal floating-point operations will be handled directly by the CPU, without the OS getting involved. But it's possible that converting NaN to integer raises an exception which must be handled by the OS.

Aside from performance, it's not guaranteed that converting a NaN to an integer will produce the integer null value (-231), although that happens to be the case on Linux/x86-64.

Code which converts potentially-null values needs to handle nulls explicitly, e.g.

double d = ...;
int x;

if (Rast_is_d_null_value(&d))
    Rast_set_c_null_value(&x, 1);
else
    x = (CELL)floor(d);

Also, conversion should use normally use floor, ceil or rounding (floor(x+0.5)). A C cast rounds toward zero (i.e. downward for positive values, upward for negative values), which is rarely the correct choice.

I attached a patch to propagate nans. I also realized that at one point during preprocessing, there are large numbers computed (especially with big cell size and high window size), so I thought double would be more appropriate.

Are these values ever converted to integer? If so, conversion of out-of-range values needs to be handled explicitly.

comment:2 in reply to:  1 ; Changed 4 years ago by annakrat

Replying to glynn:

Replying to annakrat:

When I run r.param.scale on Windows with raster map with a lot of no data, the computation is significantly slower (90 s on Ubuntu, 50 min on Windows). r.param.scale doesn't propagate nulls. I read somewhere that computing with nan can be slower on some architectures, could this be the reason?

Normal floating-point operations will be handled directly by the CPU, without the OS getting involved. But it's possible that converting NaN to integer raises an exception which must be handled by the OS.

Aside from performance, it's not guaranteed that converting a NaN to an integer will produce the integer null value (-231), although that happens to be the case on Linux/x86-64.

Code which converts potentially-null values needs to handle nulls explicitly, e.g.

double d = ...;
int x;

if (Rast_is_d_null_value(&d))
    Rast_set_c_null_value(&x, 1);
else
    x = (CELL)floor(d);

Also, conversion should use normally use floor, ceil or rounding (floor(x+0.5)). A C cast rounds toward zero (i.e. downward for positive values, upward for negative values), which is rarely the correct choice.

It's not clear to me, what the algorithm should do when there is a null value at one of the cells in the search window. It could just skip the entire computation and write null in the central cell. This would be probably the most correct option but it results in null edges. Or we could assign 0 values but then the result on edges would be distorted by the zeros. Which approach should we implement?

I attached a patch to propagate nans. I also realized that at one point during preprocessing, there are large numbers computed (especially with big cell size and high window size), so I thought double would be more appropriate.

Are these values ever converted to integer? If so, conversion of out-of-range values needs to be handled explicitly.

I don't think so.

comment:3 in reply to:  2 ; Changed 4 years ago by glynn

Replying to annakrat:

It's not clear to me, what the algorithm should do when there is a null value at one of the cells in the search window. It could just skip the entire computation and write null in the central cell. This would be probably the most correct option but it results in null edges. Or we could assign 0 values but then the result on edges would be distorted by the zeros. Which approach should we implement?

AFAICT, you either need to accept that nulls will propagate (i.e. if any cell in the window is null, the result will be null), or replace the least-squares fitting algorithm with something which can handle nulls.

FWIW, that module dates back to at least GRASS 4.x, when neither nulls nor floating-point existed.

comment:4 in reply to:  3 ; Changed 4 years ago by annakrat

Replying to glynn:

AFAICT, you either need to accept that nulls will propagate (i.e. if any cell in the window is null, the result will be null), or replace the least-squares fitting algorithm with something which can handle nulls.

I changed it so that nulls propagate (r62648). I will test on Windows if there will be any speed difference.

comment:5 in reply to:  4 Changed 4 years ago by annakrat

Resolution: fixed
Status: newclosed

Replying to annakrat:

Replying to glynn:

AFAICT, you either need to accept that nulls will propagate (i.e. if any cell in the window is null, the result will be null), or replace the least-squares fitting algorithm with something which can handle nulls.

I changed it so that nulls propagate (r62648). I will test on Windows if there will be any speed difference.

Tested on Windows, it is running fast now. Backported in r62692. Closing ticket.

Note: See TracTickets for help on using tickets.