Opened 12 years ago

Closed 12 years ago

Last modified 12 years ago

#1668 closed defect (fixed)

r.regression.line F-test incorrect

Reported by: cmbarton
Owned by: grass-dev@…
Priority: normal
Milestone: 7.0.0
Component: Raster
Version: unspecified
Keywords:
Cc: cmbarton
CPU: Unspecified
Platform: Unspecified

Description

One of my students noticed that the "F-test" in r.regression.line does not seem to be calculating F, but instead calculating -(R squared).

For example, from the Spearfish demo data,

r.regression.line map1=elevation.dem@PERMANENT map2=slope@PERMANENT
y = a + b*x
   a (Offset): -16.675093
   b (Gain): 0.020833
   R (sumXY - sumX*sumY/N): 0.481666
   N (Number of elements): 2611107
   F (F-test significance): -0.232002
   meanX (Mean of map1): 1353.724982
   sdX (Standard deviation of map1): 176.754565
   meanY (Mean of map2): 11.527723
   sdY (Standard deviation of map2): 7.645157

0.481666^2 = 0.232002

I haven't checked, but this probably affects all versions of GRASS.

Change History (10)

comment:1 by mlennert, 12 years ago (in reply to: description)

Replying to cmbarton:

One of my students noticed that the "F-test" in r.regression.line does not seem to be calculating F, but instead calculating -(R squared).

For example, from the Spearfish demo data,

r.regression.line map1=elevation.dem@PERMANENT map2=slope@PERMANENT
y = a + b*x
   a (Offset): -16.675093
   b (Gain): 0.020833
   R (sumXY - sumX*sumY/N): 0.481666
   N (Number of elements): 2611107
   F (F-test significance): -0.232002
   meanX (Mean of map1): 1353.724982
   sdX (Standard deviation of map1): 176.754565
   meanY (Mean of map2): 11.527723
   sdY (Standard deviation of map2): 7.645157

0.481666^2 = 0.232002

The values only seem the same because of rounding.

However, the formula for calculating the statistic does not seem correct in the code. IIUC, instead of

F = R * R / (1 - R * R / count - 2);

I think it should be

F = R * R / (1 - R * R) / (count - 2);

but this should be checked by a statistician.

Moritz
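
To make the precedence issue concrete, here is a minimal sketch in C. The function names `f_as_written` and `f_explicit_grouping` are hypothetical and are not taken from the GRASS source; only the expression itself is quoted from the comment above. Because `/` binds tighter than `-`, the denominator is read as `(1 - R*R/count) - 2`, which is close to -1 for a large `count`.

```c
#include <assert.h>

/* Expression as quoted in the ticket. C parses the denominator as
 * (1 - (R*R/count)) - 2, so for large count it is roughly -1. */
double f_as_written(double R, double count)
{
    assert(count != 0.0); /* guard added for this sketch only */
    return R * R / (1 - R * R / count - 2);
}

/* The same expression with the implicit grouping spelled out. */
double f_explicit_grouping(double R, double count)
{
    assert(count != 0.0); /* guard added for this sketch only */
    return (R * R) / ((1.0 - ((R * R) / count)) - 2.0);
}
```

With the rounded values from the report (R = 0.481666, count = 2611107), both functions return about -0.232002, which is why the output looked like -(R squared) even though it is not exactly that.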

comment:2 by cmbarton, 12 years ago

These DO give VERY different values. For slope vs. elevation in my Spearfish example, the current equation gives a value of -0.2320019531, while the revised equation gives a value of 1.02410689008565E-006.

http://www.weibull.com/DOEWeb/hypothesis_tests_in_simple_linear_regression.htm defines F = SumSq Regression /( SumSq Errors / (count - 2))
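
As a worked step, the definition quoted above can be rewritten in terms of the correlation coefficient. This assumes simple linear regression with an intercept, where the total sum of squares splits as SS_T = SS_R + SS_E and R^2 = SS_R / SS_T; the symbols SS_R, SS_E, SS_T and n (the cell count) are notation introduced here, not taken from the ticket.

```latex
F = \frac{SS_R}{SS_E / (n - 2)}
  = \frac{SS_R / SS_T}{(SS_E / SS_T) / (n - 2)}
  = \frac{R^2}{(1 - R^2) / (n - 2)}
```

Dividing numerator and denominator by SS_T leaves F unchanged, so F can be computed from R and n alone.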

comment:3 by mlennert, 12 years ago (in reply to: comment:2)

Replying to cmbarton:

These DO give VERY different values. For slope vs. elevation in my Spearfish example, the current equation gives a value of -0.2320019531, while the revised equation gives a value of 1.02410689008565E-006.

http://www.weibull.com/DOEWeb/hypothesis_tests_in_simple_linear_regression.htm defines F = SumSq Regression /( SumSq Errors / (count - 2))

Yup, sorry, in my proposal another set of parentheses was missing. It should be:

F = R * R / ((1 - R * R) / (count - 2));

where

R*R = SumSq Regression
1-R*R = SumSq Errors

which in your example gives you a value for F = 788781, i.e. a probability that there is a relationship so close to one that most software will probably just give you 1 after rounding.

Again, I can commit this, but would like to have someone more versed in statistics confirm.

Moritz
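
The corrected expression can be sketched as a small C function. This is an illustration under stated assumptions, not the committed GRASS code: the name `f_from_r` and the two guards are hypothetical, and only the formula is taken from the comment above.

```c
#include <assert.h>

/* F statistic for simple linear regression, computed from the
 * correlation coefficient R and the number of cells n.
 * The numerator has 1 degree of freedom, the denominator n - 2. */
double f_from_r(double R, double n)
{
    double r2 = R * R;

    assert(n > 2.0);  /* need at least one residual degree of freedom */
    assert(r2 < 1.0); /* a perfect fit would divide by zero */

    return r2 / ((1.0 - r2) / (n - 2.0));
}
```

Calling `f_from_r(0.481666, 2611107.0)` with the rounded Spearfish values gives roughly 788781, in line with the figure quoted in the comment.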

comment:4 by mmetz, 12 years ago (in reply to: comment:3)

Replying to mlennert:

Replying to cmbarton:

These DO give VERY different values. For slope vs. elevation in my Spearfish example, the current equation gives a value of -0.2320019531, while the revised equation gives a value of 1.02410689008565E-006.

http://www.weibull.com/DOEWeb/hypothesis_tests_in_simple_linear_regression.htm defines F = SumSq Regression /( SumSq Errors / (count - 2))

Yup, sorry, in my proposal another set of parentheses was missing. It should be:

F = R * R / ((1 - R * R) / (count - 2));

where

R*R = SumSq Regression
1-R*R = SumSq Errors

which in your example gives you a value for F = 788781, i.e. a probability that there is a relationship so close to one that most software will probably just give you 1 after rounding.

Again, I can commit this, but would like to have someone more versed in statistics confirm.

You can compare to the grass7 addon r.regression.multi whose results are identical to those of R.

Markus M

comment:5 by mlennert, 12 years ago (in reply to: comment:4)

Replying to mmetz:

You can compare to the grass7 addon r.regression.multi whose results are identical to those of R.

This is what I did and actually r.regression.line with the above change gives the same result as R, not r.regression.multi (regressing the landsat bands 10 and 20 from the NC data set):

F (R): 6268922
F (r.regression.line with change): 6268922.212939
F (r.regression.multi): 6268947.256273

I think I've found the problem in r.regression.multi:

Instead of

F = ((SStot - SSerr) * (count - n_predictors)) / (SSerr * n_predictors);

it should be

F = ((SStot - SSerr) * (count - n_predictors - 1)) / (SSerr * n_predictors);

Moritz
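
The degrees-of-freedom correction can be sketched as follows. The variable names `ss_tot`, `ss_err`, `count` and `n_predictors` mirror those in the comment above; the function name `global_f`, the guard, and the intermediate variables are hypothetical and are not taken from the r.regression.multi source. The extra `- 1` accounts for the intercept, assuming the model is fitted with one.

```c
#include <assert.h>

/* Global F statistic for a regression with an intercept:
 * (explained sum of squares / model df) / (residual sum of squares / residual df). */
double global_f(double ss_tot, double ss_err, double count, int n_predictors)
{
    double df_model = n_predictors;
    double df_resid = count - n_predictors - 1; /* -1 for the intercept */

    assert(n_predictors > 0 && df_resid > 0.0 && ss_err > 0.0);

    return ((ss_tot - ss_err) / df_model) / (ss_err / df_resid);
}
```

With one predictor this reduces to the simple-regression form R*R / ((1 - R*R) / (count - 2)), which is consistent with r.regression.line and r.regression.multi agreeing once both fixes are in place.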

comment:6 by mlennert, 12 years ago

I've just committed a fix for r.regression.line to trunk, grass65 and grass64_release. I'll leave it up to Markus to decide whether my fix for r.regression.multi is the right one.

Moritz

comment:7 by mmetz, 12 years ago (in reply to: comment:6)

Replying to mlennert:

I've just committed a fix for r.regression.line to trunk, grass65 and grass64_release. I'll leave it up to Markus to decide whether my fix for r.regression.multi is the right one.

Your fix for r.regression.multi seems correct. Apparently I have only validated the F values for the predictors against R, not the global F. Fixed in r51906.

Markus M

comment:8 by cmbarton, 12 years ago

Does this mean that it is fixed in GRASS 7 too?

Michael

comment:9 by mlennert, 12 years ago (in reply to: comment:8)

Resolution: fixed
Status: new → closed

Replying to cmbarton:

Does this mean that it is fixed in GRASS 7 too?

Yes, closing the bug.

Moritz

comment:10 by cmbarton, 12 years ago

Cc: cmbarton added

Thanks much

Michael
