## #1668 closed defect (fixed)

# r.regression.line F-test incorrect

Reported by: | cmbarton | Owned by: | |
---|---|---|---|
Priority: | normal | Milestone: | 7.0.0 |
Component: | Raster | Version: | unspecified |
Keywords: | | Cc: | cmbarton |
CPU: | Unspecified | Platform: | Unspecified |

## Description

One of my students noticed that the "F-test" in r.regression.line does not seem to be calculating F, but instead calculating -(R squared).

For example, from the Spearfish demo data,

    r.regression.line map1=elevation.dem@PERMANENT map2=slope@PERMANENT

    y = a + b*x
    a (Offset): -16.675093
    b (Gain): 0.020833
    R (sumXY - sumX*sumY/N): 0.481666
    N (Number of elements): 2611107
    F (F-test significance): -0.232002
    meanX (Mean of map1): 1353.724982
    sdX (Standard deviation of map1): 176.754565
    meanY (Mean of map2): 11.527723
    sdY (Standard deviation of map2): 7.645157

0.481666^2 = 0.232002

I haven't checked, but this probably affects all versions of GRASS.
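The observation can be checked from the numbers in the report itself. A minimal Python sketch; the variable names are illustrative and are not taken from the GRASS source:

```python
# Values copied from the r.regression.line output in this report.
r = 0.481666
f_reported = -0.232002

# Square R and negate it, then compare with the value printed as "F".
neg_r_squared = -(r * r)

print(f"-(R^2)     = {neg_r_squared:.6f}")
print(f"reported F = {f_reported:.6f}")
```

If the two printed values agree to the six decimals shown, the module is reporting the negated coefficient of determination rather than an F statistic.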

### Change History (10)

### comment:1 by , 11 years ago

### follow-up: 3 comment:2 by , 11 years ago

These DO give VERY different values. For slope vs. elevation in my Spearfish example, the current equation gives a value of -0.2320019531, while the revised equation gives a value of 1.02410689008565E-006.

http://www.weibull.com/DOEWeb/hypothesis_tests_in_simple_linear_regression.htm defines F = SumSq Regression /( SumSq Errors / (count - 2))

### follow-up: 4 comment:3 by , 11 years ago

Replying to cmbarton:

> These DO give VERY different values. For slope vs. elevation in my Spearfish example, the current equation gives a value of -0.2320019531, while the revised equation gives a value of 1.02410689008565E-006.
>
> http://www.weibull.com/DOEWeb/hypothesis_tests_in_simple_linear_regression.htm defines F = SumSq Regression /( SumSq Errors / (count - 2))

Yup, sorry, in my proposal another set of parentheses was missing. It should be:

F = R * R / ((1 - R * R) / (count - 2));

where

R*R = SumSq Regression

1-R*R = SumSq Errors

which in your example gives you a value for F = 788781, i.e. a probability that there is a relationship so close to one that most software will probably just give you 1 after rounding.

Again, I can commit this, but would like to have someone more versed in statistics confirm.

Moritz
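The corrected formula can be tried on the Spearfish numbers from the ticket description. The sketch below is only an illustration, not the GRASS code; `f_statistic_from_r` is a hypothetical helper name. Strictly speaking, R*R and 1 - R*R are the regression and error sums of squares expressed as fractions of the total sum of squares, and the total cancels in the ratio, so the result is the same.

```python
def f_statistic_from_r(r: float, count: int) -> float:
    """F for a simple linear regression (one predictor plus intercept).

    Hypothetical helper, not the GRASS implementation. It evaluates
    F = R^2 / ((1 - R^2) / (count - 2)).
    """
    if count <= 2:
        raise ValueError("need more than two observations")
    r2 = r * r
    if r2 >= 1.0:
        raise ValueError("perfect fit: F is undefined (division by zero)")
    return r2 / ((1.0 - r2) / (count - 2))


# R and N taken from the Spearfish output in the ticket description.
f_value = f_statistic_from_r(0.481666, 2611107)
print(f"F = {f_value:.1f}")
```

The printed value should land close to the F = 788781 quoted in the comment; small differences are expected because R is only given to six decimals.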

### follow-up: 5 comment:4 by , 11 years ago

Replying to mlennert:

> Replying to cmbarton:
>
> > These DO give VERY different values. For slope vs. elevation in my Spearfish example, the current equation gives a value of -0.2320019531, while the revised equation gives a value of 1.02410689008565E-006.
> >
> > http://www.weibull.com/DOEWeb/hypothesis_tests_in_simple_linear_regression.htm defines F = SumSq Regression /( SumSq Errors / (count - 2))
>
> Yup, sorry, in my proposal another set of parentheses was missing. It should be:
>
> F = R * R / ((1 - R * R) / (count - 2));
>
> where
>
> R*R = SumSq Regression
>
> 1-R*R = SumSq Errors
>
> which in your example gives you a value for F = 788781, i.e. a probability that there is a relationship so close to one that most software will probably just give you 1 after rounding.
>
> Again, I can commit this, but would like to have someone more versed in statistics confirm.

You can compare to the grass7 addon r.regression.multi whose results are identical to those of R.

Markus M

### comment:5 by , 11 years ago

Replying to mmetz:

> You can compare to the grass7 addon r.regression.multi whose results are identical to those of R.

This is what I did, and it is actually r.regression.line with the above change that gives the same result as R, while r.regression.multi does not (regressing Landsat bands 10 and 20 from the NC data set):

F (R): 6268922

F (r.regression.line with change): 6268922.212939

F (r.regression.multi): 6268947.256273

I think I've found the problem in r.regression.multi:

Instead of

F = ((SStot - SSerr) * (count - n_predictors)) / (SSerr * n_predictors);

it should be

F = ((SStot - SSerr) * (count - n_predictors - 1)) / (SSerr * n_predictors);

Moritz
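The effect of the missing `- 1` can be illustrated on a tiny data set. The sketch below is not the r.regression.multi code: the data are made up, `global_f` is a hypothetical helper, and it assumes that `n_predictors` counts slope terms only, so the intercept costs one further degree of freedom.

```python
def global_f(ss_tot, ss_err, count, n_predictors):
    """Global F for a model with n_predictors slopes plus an intercept.

    Hypothetical helper; assumes n_predictors excludes the intercept.
    """
    df_error = count - n_predictors - 1  # "- 1" accounts for the intercept
    return ((ss_tot - ss_err) * df_error) / (ss_err * n_predictors)


# Made-up data, used only to exercise the formulas.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [1.2, 1.9, 3.2, 3.8, 5.1, 6.3]
n = len(x)

# Ordinary least squares fit y = a + b*x.
mean_x = sum(x) / n
mean_y = sum(y) / n
sxx = sum((xi - mean_x) ** 2 for xi in x)
sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
b = sxy / sxx
a = mean_y - b * mean_x

# Total and residual sums of squares.
ss_tot = sum((yi - mean_y) ** 2 for yi in y)
ss_err = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))

# Corrected global F, and the same statistic computed from R squared.
f_from_sums = global_f(ss_tot, ss_err, n, 1)
r2 = 1.0 - ss_err / ss_tot
f_from_r = r2 / ((1.0 - r2) / (n - 2))

# Formula without the "- 1", as quoted in the comment.
f_without_minus_one = ((ss_tot - ss_err) * (n - 1)) / (ss_err * 1)

print(f_from_sums, f_from_r, f_without_minus_one)
```

With a single predictor the corrected global F coincides with the R-squared form used for r.regression.line, while the uncorrected version is larger by the factor (count - 1) / (count - 2). That factor is close to one for large rasters, which would be consistent with the small discrepancy reported above.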

### follow-up: 7 comment:6 by , 11 years ago

I've just committed a fix for r.regression.line to trunk, grass65 and grass64_release. I'll leave it up to Markus to decide whether my fix for r.regression.multi is the right one.

Moritz

### comment:7 by , 11 years ago

Replying to mlennert:

> I've just committed a fix for r.regression.line to trunk, grass65 and grass64_release. I'll leave it up to Markus to decide whether my fix for r.regression.multi is the right one.

Your fix for r.regression.multi seems correct. Apparently I have only validated the F values for the predictors against R, not the global F. Fixed in r51906.

Markus M

### comment:9 by , 11 years ago

Resolution: | → fixed |
---|---|
Status: | new → closed |


Replying to cmbarton:

The values only seem the same because of rounding.

However, the formula for calculating the statistic does not seem correct in the code. IIUC, instead of

I think it should be

but this should be checked by a statistician.

Moritz