26 ene. 2015

Calculating the Regression Coefficients in PCR

This is a simply exercise, indeed to use pls regression, we use this time "pcr" (principal components regression), in "R" we can use  this script:
One of the outputs we get from this calculation is mod3.pcr$coefficients, this output contains the regression coefficients, one for every wavelength. The coefficients change depending of the number of principal components we use. In this case we have chosen a maximum of 10, and we the cross validation we will decide the number of PCs to use.
When we calculate the PCs, we get from the original spectra matrix, two matrices: The score matrix (T) and the loading matrix (P) and they are related with the formula X=T.Pt + E.
We use the T matrix for the calculation of the regression coefficient in the PCR.
Here I do the steps to calculate the regression coefficients in R and I will check finally if they match with the coefficients calculated with the PCR function in the PLS package. I use for this exercise the shootout 2002 data as in previous posts.

Another output from the PCR function is mod3.pcr$scores, which is the T matrix.
This matrix has been calculated in another post:

It is easy to check that: T.pcr==T.pca  give TRUE for all.
Now we develop in R (with  the T matrix) this formula:

xhat=(Tt.T)-1. Tt.Y

T.pca<-X1.sg2dmsc_svd_T[,1:10]     (n.a) matrix
T.pca.t<-t(T.pca)                  (a.n) matrix
Tt.T<-T.pca.t%*%T.pca              (a.a) matrix
Tt.T.inv<-solve(Tt.T)              (a.a) matrix
Tt.T.inv.Tt<-Tt.T.inv%*%T.pca.t    (a.n) matrix
x.hat<-Tt.T.inv.Tt%*%Y             (a.1) matrix

and finally

reg.coef<-P.pca%*%x.hat            (k.1) matrix

We check if they match:

> pcr.coef.10[1:10,]
-1.010183  8.223292 14.806965 26.005365 33.214725 20.101344 14.456345 
 9.498212  4.154216 -5.061061 
> reg.coef[1:10,]
-1.010183  8.223292 14.806965 26.005365 33.214725 20.101344 14.456345
 9.498212  4.154216 -5.061061

No hay comentarios:

Publicar un comentario