Opened 12 years ago

Closed 12 years ago

#873 closed enhancement (fixed)

r.univar: allow percentile= to be doubles

Reported by: hamish Owned by: grass-dev@…
Priority: major Milestone: 6.5.0
Component: Raster Version: svn-develbranch6
Keywords: r.univar, stats Cc:
CPU: All Platform: All

Description

Hi,

for r.univar it is useful to have sub-integer percentile output, e.g. for calculating confidence intervals/sigma at 68.2689%, 95.45%, and 99.73%. Currently r.univar only accepts whole integers for the percentile=. Actually it's worse: it accepts floats on the command line but they quietly get truncated into integers.(!)

see http://article.gmane.org/gmane.comp.gis.grass.user/20744

currently r.colors.stddev is hit by this and probably anyone doing stats on map values.

the attached patch (vs. 6.5svn) attempts to fix this. please review. R-bridge users please compare with R's answers.

#spearfish
g.region rast=elevation.10m
r.univar elevation.10m perc=68,68.2689,95,95.45,99,99.73 -e

Hamish

Attachments (1)

runiv_perc_dbl.diff (4.6 KB) - added by hamish 12 years ago.
patch to allow r.univar to use percentile=double

Download all attachments as: .zip

Change History (5)

Changed 12 years ago by hamish

Attachment: runiv_perc_dbl.diff added

patch to allow r.univar to use percentile=double

comment:1 in reply to:  description ; Changed 12 years ago by glynn

Replying to hamish:

for r.univar it is useful to have sub-integer percentile output, e.g. for calculating confidence intervals/sigma at 68.2689%, 95.45%, and 99.73%. Currently r.univar only accepts whole integers for the percentile=. Actually it's worse: it accepts floats on the command line but they quietly get truncated into integers.(!)

See http://article.gmane.org/gmane.comp.gis.grass.user/20744

Currently r.colors.stddev is hit by this and probably anyone doing stats on map values.

Note that r.quantile supports non-integer percentiles, works with large maps, and is significantly more efficient.

comment:2 in reply to:  1 Changed 12 years ago by mmetz

Replying to glynn:

Note that r.quantile supports non-integer percentiles, works with large maps, and is significantly more efficient.

IMHO, it would still be nice if r.univar supports floating point percentiles. The r.univar manual could mention that r.quantile is the better choice if only quantiles and/or percentiles are requested, e.g. under NOTES where it already says that r.univar can use large amounts of system memory for extended statistics. I would suggest to also add r.quantile to the SEE ALSO section.

Markus M

comment:3 Changed 12 years ago by hamish

patch committed to 6.5svn with r40939. (will port to trunk soon)

R-bridge users please compare with R's answers.

I'd still like verification before considering if this should go in 6.4 or not. I'd settle for the list of commands to test it myself.

mmetz:

The r.univar manual could mention that r.quantile is the better choice if only quantiles and/or percentiles are requested

added in the patch.

Hamish

comment:4 Changed 12 years ago by hamish

Resolution: fixed
Status: newclosed

backported to relbr64 with r41377.

Tested in 6.5svn against R's quantile(). mean residuals less than 3mm for Spearfish's elevation.10m.

test method follows.

GRASS 6.5svn:

#spearfish
g.region rast=elevation.10m
Q=`seq -s, 0 5 100`

r.univar -e elevation.10m percentile=$Q -g
n=2654802
null_cells=0
cells=2654802
min=1061.064087
max=1846.743408
range=785.679321
mean=1348.37144603811
mean_of_abs=1348.37144603811
stddev=175.494309916948
variance=30798.2528132259
coeff_var=13.0152793158443
sum=3579659211.6848597527
first_quartile=1196.8
median=1309.37
third_quartile=1480.29
percentile_0=1061.06
percentile_5=1123.09
percentile_10=1153.04
percentile_15=1171.19
percentile_20=1184.32
percentile_25=1196.8
percentile_30=1210.82
percentile_35=1229.45
percentile_40=1251.94
percentile_45=1280.39
percentile_50=1309.37
percentile_55=1346.4
percentile_60=1378.71
percentile_65=1410.04
percentile_70=1440.23
percentile_75=1480.29
percentile_80=1525.85
percentile_85=1568.51
percentile_90=1613.6
percentile_95=1671.23
percentile_100=1846.74


Q=`seq -s, 0 0.1 1`
r.univar -e elevation.10m percentile=$Q -g
n=2654802
null_cells=0
cells=2654802
min=1061.064087
max=1846.743408
range=785.679321
mean=1348.37144603811
mean_of_abs=1348.37144603811
stddev=175.494309916948
variance=30798.2528132259
coeff_var=13.0152793158443
sum=3579659211.6848597527
first_quartile=1196.8
median=1309.37
third_quartile=1480.29
percentile_0=  1061.06
percentile_0_1=1073.82
percentile_0_2=1078.85
percentile_0_3=1083.27
percentile_0_4=1085.54
percentile_0_5=1087.8
percentile_0_6=1090.37
percentile_0_7=1092.09
percentile_0_8=1093.77
percentile_0_9=1095.2
percentile_1=  1096.48

R 2.7.1:

library(spgrass6)
G <- gmeta6()
spear <- rast.get6("elevation.10m", cat = FALSE, ignore.stderr = TRUE)

summary(spear)

Q5 <- quantile(spear$elevation.10m, probs=seq(0,1,0.05), na.rm=TRUE, names=TRUE, type=8)
cat(Q5, sep="\n")
1061.064
1123.092
1153.037
1171.192
1184.320
1196.800
1210.821
1229.449
1251.937
1280.389
1309.368
1346.401
1378.709
1410.040
1440.232
1480.286
1525.850
1568.510
1613.603
1671.230
1846.743


Q1 <- quantile(spear$elevation.10m, probs=seq(0,0.01,0.001), na.rm=TRUE, names=TRUE, type=8)
cat(Q1, sep="\n")
1061.064
1073.824
1078.849
1083.274
1085.538
1087.803
1090.373
1092.091
1093.775
1095.205
1096.479

Octave/Matlab?:

Q5_G = [ ...
1061.06
1123.09
1153.04
1171.19
1184.32
1196.8
1210.82
1229.45
1251.94
1280.39
1309.37
1346.4
1378.71
1410.04
1440.23
1480.29
1525.85
1568.51
1613.6
1671.23
1846.74 ];

Q5_R = [ ...
1061.064
1123.092
1153.037
1171.192
1184.320
1196.800
1210.821
1229.449
1251.937
1280.389
1309.368
1346.401
1378.709
1410.040
1440.232
1480.286
1525.850
1568.510
1613.603
1671.230
1846.743 ];


Q1_G = [ ...
1061.06
1073.82
1078.85
1083.27
1085.54
1087.8
1090.37
1092.09
1093.77
1095.2
1096.48 ];

Q1_R = [ ...
1061.064
1073.824
1078.849
1083.274
1085.538
1087.803
1090.373
1092.091
1093.775
1095.205
1096.479 ];

%summarize residuals
>> mean ( abs(Q5_G - Q5_R) )
ans =  0.001 meter
>> std( abs(Q5_G - Q5_R) )
ans = 0.0014

>> mean ( abs(Q1_G - Q1_R) )
ans =  0.003 meter
>> std( abs(Q1_G - Q1_R) )
ans = 0.0015

Hamish

Note: See TracTickets for help on using tickets.