#1668 closed defect (fixed)
r.regression.line F-test incorrect
Reported by: | cmbarton | Owned by: | |
---|---|---|---|
Priority: | normal | Milestone: | 7.0.0 |
Component: | Raster | Version: | unspecified |
Keywords: | Cc: | cmbarton | |
CPU: | Unspecified | Platform: | Unspecified |
Description
One of my students noticed that the "F-test" in r.regression.line does not seem to be calculating F, but instead calculating -(R squared).
For example, from the Spearfish demo data,
r.regression.line map1=elevation.dem@PERMANENT map2=slope@PERMANENT
y = a + b*x
a (Offset): -16.675093
b (Gain): 0.020833
R (sumXY - sumX*sumY/N): 0.481666
N (Number of elements): 2611107
F (F-test significance): -0.232002
meanX (Mean of map1): 1353.724982
sdX (Standard deviation of map1): 176.754565
meanY (Mean of map2): 11.527723
sdY (Standard deviation of map2): 7.645157
0.481666^2 = 0.232002
I haven't checked, but this probably affects all versions of GRASS
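The observation above can be checked with a few lines of arithmetic. This is only a sketch in Python using the two values printed in the report; the variable names are illustrative and are not taken from the module's source.

```python
# Values copied from the r.regression.line output in the report.
r = 0.481666
reported_f = -0.232002

r_squared = r * r

# The reported "F" agrees with -(R squared) to the six printed decimals.
matches_negative_r_squared = abs(reported_f + r_squared) < 1e-6

print(f"R^2 = {r_squared:.6f}, reported F = {reported_f:.6f}")
print("reported F equals -(R^2):", matches_negative_r_squared)
```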
Change History (10)
comment:1 by , 12 years ago
follow-up: 3 comment:2 by , 12 years ago
These DO give VERY different values. For slope vs. elevation in my Spearfish example, the current equation gives a value of -0.2320019531, while the revised equation gives a value of 1.02410689008565E-006.
http://www.weibull.com/DOEWeb/hypothesis_tests_in_simple_linear_regression.htm defines F = SumSq Regression /( SumSq Errors / (count - 2))
follow-up: 4 comment:3 by , 12 years ago
Replying to cmbarton:
> These DO give VERY different values. For slope vs. elevation in my Spearfish example, the current equation gives a value of -0.2320019531, while the revised equation gives a value of 1.02410689008565E-006.
> http://www.weibull.com/DOEWeb/hypothesis_tests_in_simple_linear_regression.htm defines F = SumSq Regression /( SumSq Errors / (count - 2))
Yup, sorry, in my proposal another set of parentheses was missing. It should be:
F = R * R / ((1 - R * R) / (count - 2));
where
R*R = SumSq Regression
1-R*R = SumSq Errors
which in your example gives you a value for F = 788781, i.e. a probability that there is a relationship so close to one that most software will probably just give you 1 after rounding.
Again, I can commit this, but would like to have someone more versed in statistics confirm.
Moritz
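As a rough cross-check of the corrected formula, the sketch below computes F both from R and from the sums of squares in the definition linked above. Strictly, R*R is SumSq Regression divided by the total sum of squares and 1-R*R is SumSq Errors divided by the same total, so the total cancels in the ratio and the two forms agree. The function names and the small data set are made up for illustration; only R and N in the last lines come from the Spearfish output in the report.

```python
import math


def f_from_r(r, count):
    """F for simple linear regression from correlation r and sample size."""
    r2 = r * r
    return r2 / ((1.0 - r2) / (count - 2))


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def f_from_sums_of_squares(xs, ys):
    """F = SumSq Regression / (SumSq Errors / (count - 2)) for y = a + b*x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx              # gain
    a = mean_y - b * mean_x    # offset
    fitted = [a + b * x for x in xs]
    ss_reg = sum((f - mean_y) ** 2 for f in fitted)
    ss_err = sum((y - f) ** 2 for y, f in zip(ys, fitted))
    return ss_reg / (ss_err / (n - 2))


# Made-up data, only to show that the two forms agree.
xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
ys = [2.0, 4.5, 5.0, 8.5, 9.0, 13.0]
f_via_r = f_from_r(pearson_r(xs, ys), len(xs))
f_via_ss = f_from_sums_of_squares(xs, ys)
print(f_via_r, f_via_ss)

# R and N as printed in the Spearfish output of the report.
spearfish_f = f_from_r(0.481666, 2611107)
print(round(spearfish_f))
```

With the rounded R printed by the module, the last value lands within rounding distance of the F = 788781 quoted above.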
follow-up: 5 comment:4 by , 12 years ago
Replying to mlennert:
> Replying to cmbarton:
> > These DO give VERY different values. For slope vs. elevation in my Spearfish example, the current equation gives a value of -0.2320019531, while the revised equation gives a value of 1.02410689008565E-006.
> > http://www.weibull.com/DOEWeb/hypothesis_tests_in_simple_linear_regression.htm defines F = SumSq Regression /( SumSq Errors / (count - 2))
> Yup, sorry, in my proposal another set of parentheses was missing. It should be:
> F = R * R / ((1 - R * R) / (count - 2));
> where
> R*R = SumSq Regression
> 1-R*R = SumSq Errors
> which in your example gives you a value for F = 788781, i.e. a probability that there is a relationship so close to one that most software will probably just give you 1 after rounding.
> Again, I can commit this, but would like to have someone more versed in statistics confirm.
You can compare to the grass7 addon r.regression.multi whose results are identical to those of R.
Markus M
comment:5 by , 12 years ago
Replying to mmetz:
> You can compare to the grass7 addon r.regression.multi whose results are identical to those of R.
This is what I did, and it is actually r.regression.line with the above change that gives the same result as R, not r.regression.multi (regressing Landsat bands 10 and 20 from the NC data set):
F (R): 6268922
F (r.regression.line with change): 6268922.212939
F (r.regression.multi): 6268947.256273
I think I've found the problem in r.regression.multi:
Instead of
F = ((SStot - SSerr) * (count - n_predictors)) / (SSerr * n_predictors);
it should be
F = ((SStot - SSerr) * (count - n_predictors - 1)) / (SSerr * n_predictors);
Moritz
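The effect of the missing "- 1" can be illustrated with a small sketch. The two functions below simply transcribe the two expressions quoted above into Python; the function names and the input numbers are hypothetical and are not taken from r.regression.multi. With one predictor, the corrected form reduces to the count - 2 formula used for r.regression.line, and the uncorrected form is larger by a factor of (count - 1) / (count - 2).

```python
def global_f_uncorrected(ss_tot, ss_err, count, n_predictors):
    """Expression as quoted before the fix (residual df = count - p)."""
    return ((ss_tot - ss_err) * (count - n_predictors)) / (ss_err * n_predictors)


def global_f_corrected(ss_tot, ss_err, count, n_predictors):
    """Expression as proposed in the fix (residual df = count - p - 1)."""
    return ((ss_tot - ss_err) * (count - n_predictors - 1)) / (ss_err * n_predictors)


# Hypothetical sums of squares, chosen for easy arithmetic.
ss_tot, ss_err, count, n_predictors = 100.0, 40.0, 12, 1

f_old = global_f_uncorrected(ss_tot, ss_err, count, n_predictors)
f_new = global_f_corrected(ss_tot, ss_err, count, n_predictors)

# Single-predictor check against F = R*R / ((1 - R*R) / (count - 2)).
r2 = (ss_tot - ss_err) / ss_tot
f_simple = r2 / ((1.0 - r2) / (count - 2))

print(f_old, f_new, f_simple)
```

For large rasters the factor (count - 1) / (count - 2) is very close to one, which fits the small relative difference between the two F values reported above.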
follow-up: 7 comment:6 by , 12 years ago
I've just committed a fix for r.regression.line to trunk, grass65 and grass64_release. I'll leave it up to Markus to decide whether my fix for r.regression.multi is the right one.
Moritz
comment:7 by , 12 years ago
Replying to mlennert:
> I've just committed a fix for r.regression.line to trunk, grass65 and grass64_release. I'll leave it up to Markus to decide whether my fix for r.regression.multi is the right one.
Your fix for r.regression.multi seems correct. Apparently I have only validated the F values for the predictors against R, not the global F. Fixed in r51906.
Markus M
comment:9 by , 12 years ago
Resolution: | → fixed |
---|---|
Status: | new → closed |
Replying to cmbarton:
The values only seem the same because of rounding.
However, the formula for calculating the statistic does not seem correct in the code. IIUC, instead of
I think it should be
but this should be checked by a statistician.
Moritz