Ticket #475 (new defect)

Opened 5 years ago

Last modified 5 years ago

r.stats: last bin always has a single cell

Reported by: hamish Owned by: grass-dev@…
Priority: major Milestone: 6.4.0
Component: Raster Version: svn-develbranch6
Keywords: r.stats Cc:
Platform: All CPU: All

Description

Hi, r.stats seems to be doing something funny: the last bin always contains only one cell. Same result with GRASS 5.4.1 - 6.5svn on Linux, OSX, Solaris.

speafish:

G65> g.region rast=elevation.10m
G65> r.stats elevation.10m nsteps=10 -c | nl
 100%
     1  1061.064087-1139.632019 243490
     2  1139.632019-1218.199951 732066
     3  1218.199951-1296.767883 399213
     4  1296.767883-1375.335815 351601
     5  1375.335815-1453.903748 315246
     6  1453.903748-1532.47168 265487
     7  1532.47168-1611.039612 216796
     8  1611.039612-1689.607544 115175
     9  1689.607544-1768.175476 15727
    10  1768.175476-1846.743408 1

G65> r.mapcalc 'bin10 = if(elevation.10m > 1768.175476, 1, null() )'
G65> r.univar bin10 | grep '^n'
n: 12344

other bin counts don't line up either:

G65> r.mapcalc 'bin2 = if(elevation.10m > 1139.632019 && elevation.10m < 1218.199951, 1, null() )'
G65> r.univar bin2 | grep '^n'                                                           
n: 645823

G65> r.mapcalc 'bin9 = if(elevation.10m > 1689.607544 && elevation.10m < 1768.175476, 1, null() )'
G65> r.univar bin9 | grep '^n'                                                           
n: 87599

all-in-one bin gives correct result:

r.stats elevation.10m nsteps=1 -c     
 100%
1061.064087-1846.743408 2654802

r.univar elevation.10m | grep '^n'
n: 2654802

# but,
r.stats elevation.10m nsteps=2 -c
 100%
1061.064087-1453.903748 2654801
1453.903748-1846.743408 1

sum total seems to be ok, just distribution does not match reported range:

r.stats elevation.10m nsteps=5 -c
 100%
1061.064087-1218.199951 1090464
1218.199951-1375.335815 817890
1375.335815-1532.47168 573165
1532.47168-1689.607544 173282
1689.607544-1846.743408 1

echo $((1090464 + 817890 + 573165 + 173282 + 1))
2654802

thanks to Doc Robinson for reporting this.

(I have a vague memory that this was also reported years ago in the mailing list archives??)

thanks, Hamish

Change History

in reply to: ↑ description   Changed 5 years ago by hamish

Replying to hamish:

Hi, r.stats seems to be doing something funny: the last bin always contains only one cell. Same result with GRASS 5.4.1 - 6.5svn on Linux, OSX, Solaris.

apparently in raster/r.stats/stats.c update_cell_stats(), but then I get lost as I'm not really trained to deal with hash tables.

Interestingly if I change the resolution to misalign I only get the first output step:

G65> g.region rast=elevation.dem
G65> r.stats elevation.10m nsteps=2 -c
 100%
1061.064087-1453.903748 294978

Hamish

  Changed 5 years ago by hamish

I see the same bug in 5.0.0, so it probably dates back to the initial FP conversion.

H

follow-up: ↓ 4   Changed 5 years ago by hamish

request for help solving this critical bug: we are reporting bad statistical data from our core stats module. It is used by r.report, d.histogram, etc and on to users' work output.

I'm horrified by this bug but unfamiliar with hash tables and don't know how to proceed with debugging.

thanks, Hamish

in reply to: ↑ 3   Changed 5 years ago by glynn

Replying to hamish:

request for help solving this critical bug: we are reporting bad statistical data from our core stats module. It is used by r.report, d.histogram, etc and on to users' work output. I'm horrified by this bug but unfamiliar with hash tables and don't know how to proceed with debugging.

I think that you're looking in the wrong place.

This appears to be a problem with the quantisation semantics.

In your test case, r.stats calls

G_quant_add_rule(&q, 1061.064087, 1846.743408, 1, 10)

It then assigns the quantisation rules to the input map.

Cell values are obtained using G_get_c_raster_row(), which converts the map's FP values to integer CELL values via G_quant_get_cell_value(). Given the above quantisation rule, it converts FP values as follows:

> print G_quant_get_cell_value(&q, 1846.743408)
$15 = 10
> print G_quant_get_cell_value(&q, 1846.743407)
$17 = 9
> print G_quant_get_cell_value(&q, 1846.743409)
$18 = -2147483648

Essentially, G_quant_add_rule() treats the ranges as half-open, i.e. the values range from low (inclusive) to high (exclusive). While half-open ranges are a common concept (e.g. floor() behaves the same way), the range of a GRASS raster is closed, i.e. both the low and high values are inclusive.

I suggest changing r.stats to use cHigh=nsteps+1, and truncate the quantised values into range (fudging dHigh may seem simpler, but the maximum value may be positive, negative or zero, and the range may be 1e+300 or 1e-300, so choosing a suitable fudge factor isn't straightforward).

  Changed 5 years ago by hamish

I've applied a fix for this bug in devbr6 r37699, and trunk r37700.

please review/test.

One thing I notice is that for multiple input maps of mixed types the numbers for the * second map's ranges are a bit weird + change with the patch.

e.g.:

#spearfish
#   elevation.dem is CELL, elevation.10m is DCELL
g.region rast=elevation.10m
g.region n=n+10

# same before and after patch
G65> r.stats elevation.dem nsteps=5 -c  | tail -n 11
 100%
1831 108
1832 117
1833 108
1834 90
1835 72
1836 63
1837 36
1838 9
1839 27
1840 36
* 25848
# before patch
G65> r.stats elevation.10m nsteps=5 -c
 100%
1061.064087-1218.199951 1090464
1218.199951-1375.335815 817890
1375.335815-1532.47168 573165
1532.47168-1689.607544 173282
1689.607544-1846.743408 1

G65> r.stats elevation.dem,elevation.10m nsteps=5 -c  | tail -n 11
 100%
1836 1532.47168-1689.607544 63
1837 1532.47168-1689.607544 35
1837 1689.607544-1846.743408 1
1838 1532.47168-1689.607544 9
1839 1532.47168-1689.607544 27
1840 1532.47168-1689.607544 36
* 1061.064087-1218.199951 10968
* 1218.199951-1375.335815 2496
* 1375.335815-1532.47168 10395
* 1532.47168-1689.607544 90
* * 1899

note 2x cat 1837 in above..(?) [they add up ok]

# after patch
G65> r.stats elevation.10m nsteps=5 -c
 100%
1061.064087-1218.199951 847076
1218.199951-1375.335815 730296
1375.335815-1532.47168 565685
1532.47168-1689.607544 411802
1689.607544-1846.743408 99943
* 1899

G65> r.stats elevation.dem,elevation.10m nsteps=5 -c  | tail -n 11
 100%
1835 1689.607544-1846.743408 72
1836 1689.607544-1846.743408 63
1837 1689.607544-1846.743408 36
1838 1689.607544-1846.743408 9
1839 1689.607544-1846.743408 27
1840 1689.607544-1846.743408 36
* 1061.064087-1218.199951 10090
* 1218.199951-1375.335815 881
* 1375.335815-1532.47168 8634
* 1532.47168-1689.607544 4344
* * 1899

Hamish

  Changed 5 years ago by hamish

Index: stats.c
===================================================================
--- stats.c     (revision 37700)
+++ stats.c     (working copy)
@@ -98,8 +98,8 @@
 void fix_max_fp_val(CELL *cell, int ncols)
 {
     while (ncols-- > 0) {
-       if (cell[ncols] > nsteps)
-           cell[ncols] = nsteps;
+       if (cell[ncols] > (CELL)nsteps)
+           cell[ncols] = (CELL)nsteps;
            /* { G_debug(5, ". resetting %d to %d\n", cell[ncols], nsteps); } */
     }
     return;

?

  Changed 5 years ago by hamish

  • priority changed from critical to major

downgrading severity because the main reported bug is fixed.

the weird output for multiple input maps of mixed types still remains so not closing it.

Hamish

Note: See TracTickets for help on using tickets.