Opened 11 years ago

Closed 6 years ago

## #576 closed defect (fixed)

# i.pca fails to center data prior to analysis

Reported by: | nikos | Owned by: | |
---|---|---|---|

Priority: | normal | Milestone: | 6.4.4 |

Component: | Raster | Version: | svn-develbranch6 |

Keywords: | i.pca, data centering, prcomp(), R, eigenvectors | Cc: | nikos.alexandris@… |

CPU: | Unspecified | Platform: | Unspecified |

### Description

I have spotted one case where *i.pca* does not work as expected. I have a set of 3 MODIS surface reflectance bands. Performing PCA on those using *i.pca* **does not center** the data before the analysis, that is, the mean of each dimension (band) is not subtracted from the dimension itself to give a dataset that has zero mean which is an integral part of the solution to PCA.

- i.pca on the _raw_ bands gives the following Eigenvalues + Eigenvectors:
PC1 6307563.04 (-0.6353,-0.6485,-0.4192) [98.71%] PC2 78023.63 (-0.7124, 0.2828, 0.6422) [ 1.22%] PC3 4504.60 (-0.2979, 0.7067,-0.6417) [ 0.07%]

- Using the same data with the
*prcomp(x, center=TRUE, scale=FALSE)*function in R, which centers the dataset by default anyway if not told otherwise, gives different results:

PC1 PC2 PC3 mod07_b2 0.4372107 0.83099407 -0.3439413 mod07_b6 0.7210155 -0.09527873 0.6863371 mod07_b7 0.5375718 -0.54806096 -0.6408165

**Note:** the output of *prcomp()* delivers the Principal Components column-wise, while *i.pca* delivers them row-wise.

- Further checking reveals that centering the data manually in grass, e.g. using
r.mapcalc "mod_band_centered = mod_band - mean(mod_band)"

gives (almost) the same results as the *prcomp()* function with the parameter *center=TRUE* (example above). The numbers talk for themself:

PC1 270343.07 (-0.4403,-0.7222,-0.5335) [79.11%] PC2 67140.50 (-0.8275, 0.0957, 0.5533) [19.65%] PC3 4258.14 ( 0.3485,-0.6851, 0.6397) [ 1.25%]

The question is what causes *i.pca*, in this specific case, not to center the dataset?

Thanks, Nikos

**Sources:**

The data are available at: grass location with MODIS bands and MODIS bands as geotiff files
More details in the archive: Testing i.pca (continued...)
and in grass-wiki: Principal Component Analysis

### Change History (4)

### comment:1 follow-up: 2 Changed 8 years ago by

### comment:2 Changed 8 years ago by

Replying to mmetz:

I get identical results with i.pca in trunk r49090 if I create a MASK first, thus excluding any NULL cells (some cells are NULL in one band but not NULL in another band).

Calculating a MASK in order to get results identical to *prcomp(x, center=TRUE, scale=FALSE)* is no longer needed as of r49092.

Markus M

### comment:3 Changed 6 years ago by

Milestone: | 6.5.0 → 6.4.4 |
---|

### comment:4 Changed 6 years ago by

Resolution: | → fixed |
---|---|

Status: | new → closed |

I confirm that this works fine now (GRASS 6.4.4svn (2014) / Revision: 50937), ie:

`i.pca input=mod07_b2,mod07_b6,mod07_b7 rescale=0,0 out=pca_mod_b267_2014`

gives

PC1 778244.03 (-0.4372,-0.7210,-0.5376) [79.20%] PC2 192494.58 (-0.8310, 0.0953, 0.5481) [19.59%] PC3 11876.45 ( 0.3439,-0.6863, 0.6408) [ 1.21%]

which is essentially identical to R's equivalent action via

`prcomp(mod07_b267, center=TRUE, scale=FALSE)`

which in turn gives

Standard deviations: [1] 882.1814 438.7420 108.9791 Rotation: PC1 PC2 PC3 mod07_b2 0.4372107 0.83099407 -0.3439413 mod07_b6 0.7210155 -0.09527873 0.6863371 mod07_b7 0.5375718 -0.54806096 -0.6408165

Just for clarity/completeness, R reports: 1) column-wise and b) the "square roots of the eigenvalues", which in this case match GRASS' i.pca reported eigenvalues. Some R code:

> ev <- c ( 778244.03, 192494.58, 11876.45) # GRASS-reported eigenvalues > sapply(ev, sqrt) [1] 882.1814 438.7420 108.9791

**Note:**See TracTickets for help on using tickets.

Replying to nikos:

[...]

I get identical results with i.pca in trunk r49090 if I create a MASK first, thus excluding any NULL cells (some cells are NULL in one band but not NULL in another band).

`i.pca`

gives nowwhich is identical to the above output of

prcomp(x, center=TRUE, scale=FALSE)Markus M