29 Mar. 2016

Tutorials with Resemble (Part 4.a)

SVD (Singular Value Decomposition) is the algorithm Resemble uses to
calculate the score and loading matrices in functions such as orthoProjection.
We have seen this method several times in this blog, but this tutorial is
an opportunity to repeat the steps of the calculation:

`pcProj<-orthoProjection(Xr=X_train,X2=NULL,Yr=Y_train,method="pca",`
`                        pcSelection=list("opc",40))`

`names(pcProj)`
`      "scores"       "X.loadings"   "variance" `
`      "sc.sdv"       "n.components" "pcSelection"  `
`      "center"       "scale"        "method"       `
`      "opcEval"   `
`> pcProj$variance   #shown for the first 3 principal components`
`                 pc1        pc2         pc3  `
`sdv        2.0206860 0.30988115 0.115319110  #Standard deviation of each component `
`cumExplVar 0.9712854 0.99412767 0.997291052  #Cumulative explained variance `
`explVar    0.9712854 0.02284228 0.003163383  #Explained variance`
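
The rows of this table are linked: `cumExplVar` is the running sum of `explVar`. A minimal check in base R, assuming only the rounded values printed above (so the last digits may differ slightly from the table):

```r
# Explained variance of pc1..pc3, copied from the table above
explVar <- c(pc1 = 0.9712854, pc2 = 0.02284228, pc3 = 0.003163383)

# Cumulative explained variance is the running sum of explVar
cumExplVar <- cumsum(explVar)
print(round(cumExplVar, 6))
```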

We can check how Resemble calculates these values with the following script:
`############### SINGULAR VALUE DECOMPOSITION ##################`
`Xt_mean<-colMeans(X_train)`
`#We subtract the mean spectrum from every spectrum of the training set.`
`#This is called "centering".`
`#We can use the scale() function for that:`
`X_train_c<-scale(X_train,center = TRUE,scale =FALSE)`
`#Let's calculate "d", "u" and "v" with svd()`
`Xt_svd<-svd(X_train_c)`
`#Now we have the three components: "d", "u" and "v"`
`#To save memory, R returns the reduced decomposition,`
`#with q<-min(n,m) columns in "u" and "v"`
`Xt_svd_U<-Xt_svd$u  #Matrix U  (dim = n x q)`
`Xt_svd_d<-Xt_svd$d  #Diagonal elements of D (singular values):`
`                    #square roots of the eigenvalues of t(X_train_c) %*% X_train_c`
`Xt_svd_d[1:20]`
`Xt_svd_d<-Xt_svd_d[1:20]`
`36.87405917  5.65480039  2.10437628  1.62599314  0.74187387  0.54914598 `
`0.28188014   0.27391151  0.20568644  0.16270108  0.15283486  0.13692129 `
`0.08754608   0.07632065  0.07013343  0.05292563  0.05069955  0.04231237   `
`0.03851919   0.02561542`
`Xt_svd_d2<-Xt_svd_d^2  #d^2 (proportional to the variance of each component)`
`1.359696e+03 3.197677e+01 4.428400e+00 2.643854e+00 5.503768e-01 `
`3.015613e-01 7.945642e-02 7.502752e-02 4.230691e-02 2.647164e-02 `
`2.335850e-02 1.874744e-02 7.664316e-03 5.824841e-03 4.918698e-03 `
`2.801123e-03 2.570444e-03 1.790337e-03 1.483728e-03 6.561497e-04`
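`#Note: Xt_svd_d2 holds only the first 20 components, so this sum is slightly`
`#below the total variance; that probably explains why pc1 is 0.9712877 here`
`#and 0.9712854 in pcProj$variance`
`explVar<-round(Xt_svd_d2/sum(Xt_svd_d2),digits=7)`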
`0.9712877 0.0228423 0.0031634 0.0018886 0.0003932 0.0002154 0.0000568 `
`0.0000536 0.0000302 0.0000189 0.0000167 0.0000134 0.0000055 0.0000042 `
`0.0000035 0.0000020 0.0000018 0.0000013 0.0000011 0.0000005`
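`Xt_svd_V<-Xt_svd$v              #Matrix V  (dim = m x q)`
`Xt_svd_D<-diag(Xt_svd$d)        #Diagonal matrix D (q x q), from all singular values`
`Xt_svd_T<-Xt_svd_U %*% Xt_svd_D #Score Matrix (T)`
`Xt_svd_T<-Xt_svd_T[,1:20]       #First 20 components (dim 334 x 20)`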
`sdev<-apply(Xt_svd_T,2,sd) #standard deviation of each component`
`2.020685995 0.309881152 0.115319110 0.089103875 0.040654438 0.030093014 `
`0.015446937 0.015010258 0.011271547 0.008915964 0.008375299 0.007503241 `
`0.004797496 0.004182346 0.003843288 0.002900307 0.002778318 0.002318704 `
`0.002110838 0.001403716`
`Xt_svd_P<-Xt_svd_V               #Loading Matrix (P)`
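
The relations used in the script can be checked on a small example. The sketch below is not the Resemble code: it uses simulated data (the matrix `X` and all object names are hypothetical stand-ins for `X_train`) and only base R functions, with `prcomp()` as an independent cross-check.

```r
set.seed(1)
# Simulated stand-in for X_train (hypothetical data: 30 samples x 8 variables)
X <- matrix(rnorm(30 * 8), nrow = 30, ncol = 8)
n <- nrow(X)

# Centre the columns and decompose: X_c = U D V'
X_c <- scale(X, center = TRUE, scale = FALSE)
s <- svd(X_c)

# Scores T = U D; projecting X_c onto V gives the same matrix
T_scores <- s$u %*% diag(s$d)
T_proj <- X_c %*% s$v

# Standard deviation of each score column equals d / sqrt(n - 1)
sdev_scores <- apply(T_scores, 2, sd)
sdev_from_d <- s$d / sqrt(n - 1)

# Explained variance, using the sum over ALL singular values
explVar <- s$d^2 / sum(s$d^2)

# Cross-check against base R's prcomp() (column signs may differ, hence abs)
pca <- prcomp(X, center = TRUE, scale. = FALSE)

print(all.equal(T_scores, T_proj, check.attributes = FALSE))
print(all.equal(sdev_scores, sdev_from_d))
print(all.equal(sdev_from_d, pca$sdev))
print(all.equal(abs(T_scores), abs(unname(pca$x))))
```

Each comparison should agree up to numerical tolerance. The second one shows why the `sdv` row of the variance table can be obtained directly from the singular values, without building the score matrix first.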