Opened 11 years ago

Closed 10 years ago

Last modified 10 years ago

#359 closed defect (worksforme)

r.quantile computes incorrect values with default number of bins

Reported by: dylan Owned by: grass-dev@…
Priority: major Milestone: 6.5.0
Component: Default Version: svn-develbranch6
Keywords: r.quantile Cc:
CPU: x86-32 Platform: Linux

Description

I noticed that when computing quantiles from a highly skewed distribution, incorrect values are returned with the default number of bins. Reducing the number of bins seems to correct the problem

Example, based on attached raster file in GRASS ASCII format:

r.quantile in=beam_150 quantiles=5

0:20.000000:7440.040527
1:40.000000:7512.872559
2:60.000000:7611.160645
3:80.000000:7611.161133

r.quantile in=beam_150 quantiles=5 bins=10')

0:20.000000:7440.040527
1:40.000000:7512.872559
2:60.000000:7570.587402
3:80.000000:7611.161133

Quantiles computed in R:

library(spgrass6)

x <- readRAST6('beam_150')

quantile(x@data$beam_150, prob=c(0,0.2,0.4,0.6,0.8,1), na.rm=TRUE)

      0%      20%      40%      60%      80%     100% 
6429.886 7440.040 7512.871 7570.580 7611.161 7638.599

Change History (7)

comment:1 Changed 11 years ago by dylan

Oops. Looks like the attachment is too big. It can be accessed here: http://169.237.35.250/~dylan/temp/beam_150.gz

comment:2 in reply to:  description Changed 11 years ago by glynn

Replying to dylan:

I noticed that when computing quantiles from a highly skewed distribution, incorrect values are returned with the default number of bins. Reducing the number of bins seems to correct the problem

Example, based on attached raster file in GRASS ASCII format:

[snip]

I cannot reproduce this. I always get the second set of values regardless of the number of bins used:

$ for n in 1 2 3 4 5 6 7 8 9 10 100 1000 10000 100000 1000000 ; do r.quantile beam_150 quantiles=5 bins=$n > quantile.$n.txt ; done
$ md5sum quantile.*.txt
2486fb3ae6eca4988acf21de10636a45  quantile.1.txt
2486fb3ae6eca4988acf21de10636a45  quantile.10.txt
2486fb3ae6eca4988acf21de10636a45  quantile.100.txt
2486fb3ae6eca4988acf21de10636a45  quantile.1000.txt
2486fb3ae6eca4988acf21de10636a45  quantile.10000.txt
2486fb3ae6eca4988acf21de10636a45  quantile.100000.txt
2486fb3ae6eca4988acf21de10636a45  quantile.1000000.txt
2486fb3ae6eca4988acf21de10636a45  quantile.2.txt
2486fb3ae6eca4988acf21de10636a45  quantile.3.txt
2486fb3ae6eca4988acf21de10636a45  quantile.4.txt
2486fb3ae6eca4988acf21de10636a45  quantile.5.txt
2486fb3ae6eca4988acf21de10636a45  quantile.6.txt
2486fb3ae6eca4988acf21de10636a45  quantile.7.txt
2486fb3ae6eca4988acf21de10636a45  quantile.8.txt
2486fb3ae6eca4988acf21de10636a45  quantile.9.txt
$ cat quantile.1.txt
0:20.000000:7440.040527
1:40.000000:7512.872559
2:60.000000:7570.587402
3:80.000000:7611.161133

comment:3 Changed 11 years ago by neteler

Dylan, can we close this?

Markus

comment:4 Changed 10 years ago by neteler

Resolution: worksforme
Status: newclosed

No answer, seems to be unreproducible. Feel free to reopen.

comment:5 Changed 10 years ago by hamish

I'd be very cautious about ignoring this one and assuming that it magically healed itself. Silent data corruption is rather nasty when you base other work on it.

The URL test dataset is no longer active, so I can't help test.

I wonder if "g.region rast=inputmap" might help though.

Hamish

comment:6 Changed 10 years ago by dylan

Milestone: 6.4.06.5.0

Some updates: Looks like things are working now... with a similar dataset I get the following:

in GRASS:

r.quantile in=rtk_hybrid_beam_0150 quantiles=5
Computing histogram
 100%
Computing bins
Binning data
 100%
Sorting bins
 100%
Computing quantiles
0:20.000000:7096.507812
1:40.000000:7291.034180
2:60.000000:7412.145020
3:80.000000:7514.283691

in R:

library(spgrass6)
x <- readRAST6('rtk_hybrid_beam_0150')

quantile(x@data, prob=c(0.2,0.4,0.6,0.8), na.rm=TRUE)
     20%      40%      60%      80% 
7096.507 7291.034 7412.144 7514.284 

Seems close enough for me. The test file is attached.

comment:7 Changed 10 years ago by dylan

Example referenced above can be found here: http://169.237.35.250/~dylan/temp/rtk_hybrid_beam_0150.asc.gz

Note: See TracTickets for help on using tickets.