Opened 15 years ago
Closed 14 years ago
#873 closed enhancement (fixed)
r.univar: allow percentile= to be doubles
| Reported by: | hamish | Owned by: | |
|---|---|---|---|
| Priority: | major | Milestone: | 6.5.0 |
| Component: | Raster | Version: | svn-develbranch6 |
| Keywords: | r.univar, stats | Cc: | |
| CPU: | All | Platform: | All |
Description
Hi,
for r.univar it is useful to have sub-integer percentile output, e.g. for calculating confidence intervals/sigma at 68.2689%, 95.45%, and 99.73%. Currently r.univar only accepts whole integers for the percentile=. Actually it's worse: it accepts floats on the command line but they quietly get truncated into integers.(!)
see http://article.gmane.org/gmane.comp.gis.grass.user/20744
currently r.colors.stddev is hit by this and probably anyone doing stats on map values.
the attached patch (vs. 6.5svn) attempts to fix this. please review. R-bridge users please compare with R's answers.
#spearfish g.region rast=elevation.10m r.univar elevation.10m perc=68,68.2689,95,95.45,99,99.73 -e
Hamish
Attachments (1)
Change History (5)
by , 15 years ago
| Attachment: | runiv_perc_dbl.diff added |
|---|
follow-up: 2 comment:1 by , 15 years ago
Replying to hamish:
for
r.univarit is useful to have sub-integer percentile output, e.g. for calculating confidence intervals/sigma at 68.2689%, 95.45%, and 99.73%. Currentlyr.univaronly accepts whole integers for the percentile=. Actually it's worse: it accepts floats on the command line but they quietly get truncated into integers.(!)
See http://article.gmane.org/gmane.comp.gis.grass.user/20744
Currently
r.colors.stddevis hit by this and probably anyone doing stats on map values.
Note that r.quantile supports non-integer percentiles, works with large maps, and is significantly more efficient.
comment:2 by , 15 years ago
Replying to glynn:
Note that
r.quantilesupports non-integer percentiles, works with large maps, and is significantly more efficient.
IMHO, it would still be nice if r.univar supports floating point percentiles. The r.univar manual could mention that r.quantile is the better choice if only quantiles and/or percentiles are requested, e.g. under NOTES where it already says that r.univar can use large amounts of system memory for extended statistics. I would suggest to also add r.quantile to the SEE ALSO section.
Markus M
comment:3 by , 15 years ago
patch committed to 6.5svn with r40939. (will port to trunk soon)
R-bridge users please compare with R's answers.
I'd still like verification before considering if this should go in 6.4 or not. I'd settle for the list of commands to test it myself.
mmetz:
The r.univar manual could mention that r.quantile is the better choice if only quantiles and/or percentiles are requested
added in the patch.
Hamish
comment:4 by , 14 years ago
| Resolution: | → fixed |
|---|---|
| Status: | new → closed |
backported to relbr64 with r41377.
Tested in 6.5svn against R's quantile(). mean residuals less than 3mm for Spearfish's elevation.10m.
test method follows.
GRASS 6.5svn:
#spearfish g.region rast=elevation.10m Q=`seq -s, 0 5 100` r.univar -e elevation.10m percentile=$Q -g n=2654802 null_cells=0 cells=2654802 min=1061.064087 max=1846.743408 range=785.679321 mean=1348.37144603811 mean_of_abs=1348.37144603811 stddev=175.494309916948 variance=30798.2528132259 coeff_var=13.0152793158443 sum=3579659211.6848597527 first_quartile=1196.8 median=1309.37 third_quartile=1480.29 percentile_0=1061.06 percentile_5=1123.09 percentile_10=1153.04 percentile_15=1171.19 percentile_20=1184.32 percentile_25=1196.8 percentile_30=1210.82 percentile_35=1229.45 percentile_40=1251.94 percentile_45=1280.39 percentile_50=1309.37 percentile_55=1346.4 percentile_60=1378.71 percentile_65=1410.04 percentile_70=1440.23 percentile_75=1480.29 percentile_80=1525.85 percentile_85=1568.51 percentile_90=1613.6 percentile_95=1671.23 percentile_100=1846.74 Q=`seq -s, 0 0.1 1` r.univar -e elevation.10m percentile=$Q -g n=2654802 null_cells=0 cells=2654802 min=1061.064087 max=1846.743408 range=785.679321 mean=1348.37144603811 mean_of_abs=1348.37144603811 stddev=175.494309916948 variance=30798.2528132259 coeff_var=13.0152793158443 sum=3579659211.6848597527 first_quartile=1196.8 median=1309.37 third_quartile=1480.29 percentile_0= 1061.06 percentile_0_1=1073.82 percentile_0_2=1078.85 percentile_0_3=1083.27 percentile_0_4=1085.54 percentile_0_5=1087.8 percentile_0_6=1090.37 percentile_0_7=1092.09 percentile_0_8=1093.77 percentile_0_9=1095.2 percentile_1= 1096.48
R 2.7.1:
library(spgrass6)
G <- gmeta6()
spear <- rast.get6("elevation.10m", cat = FALSE, ignore.stderr = TRUE)
summary(spear)
Q5 <- quantile(spear$elevation.10m, probs=seq(0,1,0.05), na.rm=TRUE, names=TRUE, type=8)
cat(Q5, sep="\n")
1061.064
1123.092
1153.037
1171.192
1184.320
1196.800
1210.821
1229.449
1251.937
1280.389
1309.368
1346.401
1378.709
1410.040
1440.232
1480.286
1525.850
1568.510
1613.603
1671.230
1846.743
Q1 <- quantile(spear$elevation.10m, probs=seq(0,0.01,0.001), na.rm=TRUE, names=TRUE, type=8)
cat(Q1, sep="\n")
1061.064
1073.824
1078.849
1083.274
1085.538
1087.803
1090.373
1092.091
1093.775
1095.205
1096.479
Q5_G = [ ... 1061.06 1123.09 1153.04 1171.19 1184.32 1196.8 1210.82 1229.45 1251.94 1280.39 1309.37 1346.4 1378.71 1410.04 1440.23 1480.29 1525.85 1568.51 1613.6 1671.23 1846.74 ]; Q5_R = [ ... 1061.064 1123.092 1153.037 1171.192 1184.320 1196.800 1210.821 1229.449 1251.937 1280.389 1309.368 1346.401 1378.709 1410.040 1440.232 1480.286 1525.850 1568.510 1613.603 1671.230 1846.743 ]; Q1_G = [ ... 1061.06 1073.82 1078.85 1083.27 1085.54 1087.8 1090.37 1092.09 1093.77 1095.2 1096.48 ]; Q1_R = [ ... 1061.064 1073.824 1078.849 1083.274 1085.538 1087.803 1090.373 1092.091 1093.775 1095.205 1096.479 ]; %summarize residuals >> mean ( abs(Q5_G - Q5_R) ) ans = 0.001 meter >> std( abs(Q5_G - Q5_R) ) ans = 0.0014 >> mean ( abs(Q1_G - Q1_R) ) ans = 0.003 meter >> std( abs(Q1_G - Q1_R) ) ans = 0.0015
Hamish

patch to allow r.univar to use percentile=double