This is
a simply exercise, indeed to use pls regression, we use this time "pcr" (principal
components regression), in "R" we can use this script:
mod3.pcr<-pcr(Y~X,data=nir.tr1.2dmsc,ncomp=10,validation="LOO")
One of the outputs we get from this
calculation is mod3.pcr$coefficients,
this output contains the regression coefficients, one for every wavelength. The
coefficients change depending of the number of principal components we use. In
this case we have chosen a maximum of 10, and we the cross validation we will
decide the number of PCs to use.
When we calculate the PCs, we get from the
original spectra matrix, two matrices: The score matrix (T) and the loading
matrix (P) and they are related with the formula X=T.Pt + E.
We use the T matrix for the calculation of
the regression coefficient in the PCR.
Here I do the steps to calculate the
regression coefficients in R and I will check finally if they match with the
coefficients calculated with the PCR function in the PLS package. I use for
this exercise the shootout 2002 data as in previous posts.
Another output from the PCR function is mod3.pcr$scores, which is the T matrix.
This matrix has been calculated in another
post:
T.pca<-X1.sg2dmsc_svd_T[,1:10]
It is easy to check that: T.pcr==T.pca give TRUE for all.
Now we develop in R (with the T matrix) this formula:
xhat=(Tt.T)-1. Tt.Y
T.pca<-X1.sg2dmsc_svd_T[,1:10] (n.a) matrix
T.pca.t<-t(T.pca) (a.n) matrixTt.T<-T.pca.t%*%T.pca (a.a) matrix
Tt.T.inv<-solve(Tt.T) (a.a) matrix
Tt.T.inv.Tt<-Tt.T.inv%*%T.pca.t (a.n) matrix
x.hat<-Tt.T.inv.Tt%*%Y (a.1) matrix
and finally
reg.coef=P.xhat
reg.coef<-P.pca%*%x.hat (k.1) matrix
We check if they match:
pcr.coef.10<-as.matrix(mod3.pcr$coefficients[,,10])pcr.coef.10[1:10,]
> pcr.coef.10[1:10,]
-1.010183 8.223292 14.806965 26.005365 33.214725 20.101344 14.456345
9.498212 4.154216 -5.061061
> reg.coef[1:10,]
-1.010183 8.223292 14.806965 26.005365 33.214725 20.101344 14.456345
9.498212 4.154216 -5.061061
No hay comentarios:
Publicar un comentario