Opened 8 years ago

Closed 8 years ago

# r.regression.line F-test incorrect

Reported by: cmbarton · Owned by: grass-dev@… · Priority: normal · Milestone: 7.0.0 · Component: Raster · Version: unspecified · Cc: cmbarton · CPU: Unspecified · Platform: Unspecified

### Description

One of my students noticed that the "F-test" in r.regression.line does not seem to be calculating F, but instead calculating -(R squared).

For example, from the Spearfish demo data,

```
r.regression.line map1=elevation.dem@PERMANENT map2=slope@PERMANENT
y = a + b*x
a (Offset): -16.675093
b (Gain): 0.020833
R (sumXY - sumX*sumY/N): 0.481666
N (Number of elements): 2611107
F (F-test significance): -0.232002
meanX (Mean of map1): 1353.724982
sdX (Standard deviation of map1): 176.754565
meanY (Mean of map2): 11.527723
sdY (Standard deviation of map2): 7.645157
```

0.481666² = 0.232002

I haven't checked, but this probably affects all versions of GRASS.

### comment:1 in reply to:  description Changed 8 years ago by mlennert

> 0.481666² = 0.232002

The values only seem the same because of rounding.

However, the formula for calculating the statistic does not seem correct in the code. IIUC, instead of

```
F = R * R / (1 - R * R / count - 2);
```

I think it should be

```
F = R * R / (1 - R * R) / (count - 2);
```

but this should be checked by a statistician.
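The negative, R-squared-like output follows from C operator precedence: division binds tighter than subtraction, so the original expression divides only `R * R` by `count` and then subtracts 2. Here is a minimal sketch of that parse. The function names `f_as_written` and `f_as_parsed` are invented for illustration and are not taken from the module source:

```c
/* Sketch only: the function names are illustrative, not GRASS code. */

/* The original expression exactly as it appears in the ticket. */
double f_as_written(double R, double count)
{
    return R * R / (1 - R * R / count - 2);
}

/* The same expression with the grouping the C compiler applies made
 * explicit: '/' binds tighter than '-', so only R*R is divided by
 * count, and 2 is subtracted from the whole denominator. */
double f_as_parsed(double R, double count)
{
    return (R * R) / ((1.0 - ((R * R) / count)) - 2.0);
}
```

With R = 0.481666 and count = 2611107, the term `R * R / count` is about 9e-8, so the denominator is very nearly -1 and both functions return approximately -0.232002, i.e. -(R squared).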

Moritz

### comment:2 follow-up:  3 Changed 8 years ago by cmbarton

These DO give VERY different values. For slope vs. elevation in my Spearfish example, the current equation gives a value of -0.2320019531, while the revised equation gives a value of 1.02410689008565E-006.

http://www.weibull.com/DOEWeb/hypothesis_tests_in_simple_linear_regression.htm defines F = SumSq Regression / (SumSq Errors / (count - 2))

### comment:3 in reply to:  2 ; follow-up:  4 Changed 8 years ago by mlennert

> These DO give VERY different values. For slope vs. elevation in my Spearfish example, the current equation gives a value of -0.2320019531, while the revised equation gives a value of 1.02410689008565E-006.
>
> http://www.weibull.com/DOEWeb/hypothesis_tests_in_simple_linear_regression.htm defines F = SumSq Regression / (SumSq Errors / (count - 2))

Yup, sorry, in my proposal another set of parentheses was missing. It should be:

F = R * R / ((1 - R * R) / (count - 2));

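where

R*R = SumSq Regression and 1 - R*R = SumSq Errors, each expressed as a fraction of the total sum of squares (the common factor cancels in the ratio),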

which in your example gives F = 788781, i.e. a probability that there is a relationship so close to one that most software will probably just report 1 after rounding.

Again, I can commit this, but would like to have someone more versed in statistics confirm.
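As a quick check of the corrected grouping, here is a minimal sketch that evaluates it for the Spearfish example (R = 0.481666 and N = 2611107, both taken from the output quoted in the description). The function name `f_statistic` is made up for this illustration and is not the module's actual code:

```c
/* Sketch only: f_statistic is an illustrative name, not the GRASS source.
 * For one predictor: F = R^2 / ((1 - R^2) / (count - 2)),
 * i.e. the explained share of variance divided by the unexplained
 * share per residual degree of freedom (count - 2). */
double f_statistic(double R, double count)
{
    double r2 = R * R;

    return r2 / ((1.0 - r2) / (count - 2.0));
}
```

Calling `f_statistic(0.481666, 2611107)` yields roughly 788781, in line with the value given above.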

Moritz

### comment:4 in reply to:  3 ; follow-up:  5 Changed 8 years ago by mmetz

> Again, I can commit this, but would like to have someone more versed in statistics confirm.

You can compare to the grass7 addon r.regression.multi whose results are identical to those of R.

Markus M

### comment:5 in reply to:  4 Changed 8 years ago by mlennert

> You can compare to the grass7 addon r.regression.multi whose results are identical to those of R.

This is what I did, and it is actually r.regression.line with the above change that gives the same result as R; r.regression.multi does not (regressing Landsat bands 10 and 20 from the NC data set):

- F (R): 6268922
- F (r.regression.line with change): 6268922.212939
- F (r.regression.multi): 6268947.256273

I think I've found the problem in r.regression.multi:

```
F = ((SStot - SSerr) * (count - n_predictors)) / (SSerr * n_predictors);
```

it should be

```
F = ((SStot - SSerr) * (count - n_predictors - 1)) / (SSerr * n_predictors);
```
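To make the degrees-of-freedom difference concrete, the two expressions can be written as small C functions. This is a sketch that assumes the sums of squares are already computed. The function names are hypothetical; only the variable names `SStot`, `SSerr`, `count` and `n_predictors` come from the snippets above:

```c
/* Sketch only: function names are illustrative, not from r.regression.multi. */

/* Global F as quoted first: residual degrees of freedom are taken as
 * count - n_predictors, which leaves out the intercept. */
double global_f_before(double SStot, double SSerr,
                       double count, double n_predictors)
{
    return ((SStot - SSerr) * (count - n_predictors)) /
           (SSerr * n_predictors);
}

/* Global F as proposed: residual degrees of freedom are
 * count - n_predictors - 1, the extra 1 accounting for the intercept. */
double global_f_proposed(double SStot, double SSerr,
                         double count, double n_predictors)
{
    return ((SStot - SSerr) * (count - n_predictors - 1)) /
           (SSerr * n_predictors);
}
```

With a single predictor the two results differ only by the factor (count - 1) / (count - 2), which is consistent with the small gap between the r.regression.multi and R values reported above.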

Moritz

### comment:6 follow-up:  7 Changed 8 years ago by mlennert

I've just committed a fix for r.regression.line to trunk, grass65 and grass64_release. I'll leave it up to Markus to decide whether my fix for r.regression.multi is the right one.

Moritz

### comment:7 in reply to:  6 Changed 8 years ago by mmetz

> I've just committed a fix for r.regression.line to trunk, grass65 and grass64_release. I'll leave it up to Markus to decide whether my fix for r.regression.multi is the right one.

Your fix for r.regression.multi seems correct. Apparently I have only validated the F values for the predictors against R, not the global F. Fixed in r51906.

Markus M

### comment:8 follow-up:  9 Changed 8 years ago by cmbarton

Does this mean that it is fixed in GRASS 7 too?

Michael

### comment:9 in reply to:  8 Changed 8 years ago by mlennert

- Resolution: → fixed
- Status: new → closed

> Does this mean that it is fixed in GRASS 7 too?

Yes, closing the bug.

Moritz