#359 closed defect (worksforme)
r.quantile computes incorrect values with default number of bins
Reported by: | dylan | Owned by: | |
---|---|---|---|
Priority: | major | Milestone: | 6.5.0 |
Component: | Default | Version: | svn-develbranch6 |
Keywords: | r.quantile | Cc: | |
CPU: | x86-32 | Platform: | Linux |
Description
I noticed that when computing quantiles from a highly skewed distribution, incorrect values are returned with the default number of bins. Reducing the number of bins seems to correct the problem
Example, based on attached raster file in GRASS ASCII format:
r.quantile in=beam_150 quantiles=5 0:20.000000:7440.040527 1:40.000000:7512.872559 2:60.000000:7611.160645 3:80.000000:7611.161133 r.quantile in=beam_150 quantiles=5 bins=10') 0:20.000000:7440.040527 1:40.000000:7512.872559 2:60.000000:7570.587402 3:80.000000:7611.161133
Quantiles computed in R:
library(spgrass6) x <- readRAST6('beam_150') quantile(x@data$beam_150, prob=c(0,0.2,0.4,0.6,0.8,1), na.rm=TRUE) 0% 20% 40% 60% 80% 100% 6429.886 7440.040 7512.871 7570.580 7611.161 7638.599
Change History (7)
comment:1 by , 16 years ago
comment:2 by , 16 years ago
Replying to dylan:
I noticed that when computing quantiles from a highly skewed distribution, incorrect values are returned with the default number of bins. Reducing the number of bins seems to correct the problem
Example, based on attached raster file in GRASS ASCII format:
[snip]
I cannot reproduce this. I always get the second set of values regardless of the number of bins used:
$ for n in 1 2 3 4 5 6 7 8 9 10 100 1000 10000 100000 1000000 ; do r.quantile beam_150 quantiles=5 bins=$n > quantile.$n.txt ; done $ md5sum quantile.*.txt 2486fb3ae6eca4988acf21de10636a45 quantile.1.txt 2486fb3ae6eca4988acf21de10636a45 quantile.10.txt 2486fb3ae6eca4988acf21de10636a45 quantile.100.txt 2486fb3ae6eca4988acf21de10636a45 quantile.1000.txt 2486fb3ae6eca4988acf21de10636a45 quantile.10000.txt 2486fb3ae6eca4988acf21de10636a45 quantile.100000.txt 2486fb3ae6eca4988acf21de10636a45 quantile.1000000.txt 2486fb3ae6eca4988acf21de10636a45 quantile.2.txt 2486fb3ae6eca4988acf21de10636a45 quantile.3.txt 2486fb3ae6eca4988acf21de10636a45 quantile.4.txt 2486fb3ae6eca4988acf21de10636a45 quantile.5.txt 2486fb3ae6eca4988acf21de10636a45 quantile.6.txt 2486fb3ae6eca4988acf21de10636a45 quantile.7.txt 2486fb3ae6eca4988acf21de10636a45 quantile.8.txt 2486fb3ae6eca4988acf21de10636a45 quantile.9.txt $ cat quantile.1.txt 0:20.000000:7440.040527 1:40.000000:7512.872559 2:60.000000:7570.587402 3:80.000000:7611.161133
comment:4 by , 15 years ago
Resolution: | → worksforme |
---|---|
Status: | new → closed |
No answer, seems to be unreproducible. Feel free to reopen.
comment:5 by , 15 years ago
I'd be very cautious about ignoring this one and assuming that it magically healed itself. Silent data corruption is rather nasty when you base other work on it.
The URL test dataset is no longer active, so I can't help test.
I wonder if "g.region rast=inputmap" might help though.
Hamish
comment:6 by , 15 years ago
Milestone: | 6.4.0 → 6.5.0 |
---|
Some updates: Looks like things are working now... with a similar dataset I get the following:
in GRASS:
r.quantile in=rtk_hybrid_beam_0150 quantiles=5 Computing histogram 100% Computing bins Binning data 100% Sorting bins 100% Computing quantiles 0:20.000000:7096.507812 1:40.000000:7291.034180 2:60.000000:7412.145020 3:80.000000:7514.283691
in R:
library(spgrass6) x <- readRAST6('rtk_hybrid_beam_0150') quantile(x@data, prob=c(0.2,0.4,0.6,0.8), na.rm=TRUE) 20% 40% 60% 80% 7096.507 7291.034 7412.144 7514.284
Seems close enough for me. The test file is attached.
comment:7 by , 15 years ago
Example referenced above can be found here: http://169.237.35.250/~dylan/temp/rtk_hybrid_beam_0150.asc.gz
Oops. Looks like the attachment is too big. It can be accessed here: http://169.237.35.250/~dylan/temp/beam_150.gz