29 mar 2016

Tutorials with Resemble (Part 4.a)

SVD (Singular Value Decomposition) is the algorithm used in Resemble to
calculate the Scores and Loading matrices in functions like orthoProjection.
We have seen this method several times in this blog, but this tutorial is
an opportunity to repeat the steps of the calculation:

pcProj<-orthoProjection(Xr=X_train,X2=NULL,Yr=Y_train,method="pca",
        pcSelection=list("opc",40))

names(pcProj)
      "scores"       "X.loadings"   "variance" 
      "sc.sdv"       "n.components" "pcSelection"  
      "center"       "scale"        "method"       
      "opcEval"   
> pcProj$variance   #only the first 3 principal components are shown
                 pc1        pc2         pc3  
sdv        2.0206860 0.30988115 0.115319110  #Standard deviation of each component 
cumExplVar 0.9712854 0.99412767 0.997291052  #Cumulative explained variance 
explVar    0.9712854 0.02284228 0.003163383  #Explained variance
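
The three rows of this table are related to each other: the cumulative explained variance is the running sum of the explained variance, and the explained variance of each component is proportional to the square of its standard deviation. A minimal sketch to check this, using only the three values printed above (retyped by hand, so they only agree up to rounding):

```r
# Values copied from the pcProj$variance output above (first 3 components)
sdv        <- c(2.0206860, 0.30988115, 0.115319110)
cumExplVar <- c(0.9712854, 0.99412767, 0.997291052)
explVar    <- c(0.9712854, 0.02284228, 0.003163383)

# cumExplVar should be the cumulative sum of explVar
check_cumsum <- all.equal(cumsum(explVar), cumExplVar, tolerance = 1e-6)

# explVar should be proportional to sdv^2, so the ratios relative to
# the first component should agree (up to rounding of the printed values)
check_ratio <- all.equal(sdv^2 / sdv[1]^2, explVar / explVar[1],
                         tolerance = 1e-4)

check_cumsum
check_ratio
```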

We can check how resemble calculates these values with the following script:
############### SINGULAR VALUE DECOMPOSITION ##################
Xt_mean<-colMeans(X_train)
#We subtract the mean spectrum from every spectrum of the training set.
#This is called "centering".
#We can use this function for that:
X_train_c<-scale(X_train,center = TRUE,scale =FALSE)
#Let's calculate the matrices "d", "u" and "v" with "svd"
Xt_svd<-svd(X_train_c)
#Now we have the three elements: "d", "u" and "v"
#In order to save memory, R uses a different (reduced) convention
#for the matrix dimensions: q = min(n,m)
Xt_svd_U<-Xt_svd$u  #Matrix U  (dim= n.q)
#Diagonal elements of D (singular values): square roots of the
#eigenvalues of t(X_train_c) %*% X_train_c
Xt_svd_d<-Xt_svd$d
Xt_svd_d<-Xt_svd_d[1:20]  #keep only the first 20 singular values
Xt_svd_d
# 36.87405917  5.65480039  2.10437628  1.62599314  0.74187387  0.54914598
#  0.28188014  0.27391151  0.20568644  0.16270108  0.15283486  0.13692129
#  0.08754608  0.07632065  0.07013343  0.05292563  0.05069955  0.04231237
#  0.03851919  0.02561542
Xt_svd_d2<-Xt_svd_d^2  #d^2: proportional to the variance of each component
# 1.359696e+03 3.197677e+01 4.428400e+00 2.643854e+00 5.503768e-01
# 3.015613e-01 7.945642e-02 7.502752e-02 4.230691e-02 2.647164e-02
# 2.335850e-02 1.874744e-02 7.664316e-03 5.824841e-03 4.918698e-03
# 2.801123e-03 2.570444e-03 1.790337e-03 1.483728e-03 6.561497e-04
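#Explained variance. The sum here only covers the 20 retained components,
#which is presumably why the first value (0.9712877) differs slightly
#from the one reported by resemble (0.9712854).
explVar<-round(Xt_svd_d2/sum(Xt_svd_d2),digits=7)
# 0.9712877 0.0228423 0.0031634 0.0018886 0.0003932 0.0002154 0.0000568
# 0.0000536 0.0000302 0.0000189 0.0000167 0.0000134 0.0000055 0.0000042
# 0.0000035 0.0000020 0.0000018 0.0000013 0.0000011 0.0000005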
Xt_svd_V<-Xt_svd$v              #Matrix V  (dim= m.q)
Xt_svd_D<-diag(Xt_svd_d)        #Diagonal matrix D (20*20) built from the vector d
Xt_svd_T<-Xt_svd_U[,1:20] %*% Xt_svd_D  #Score Matrix (T), dim 334*20
#Standard deviation of each component (same as Xt_svd_d/sqrt(n-1));
#the first three values match the "sdv" row of pcProj$variance
sdev<-apply(Xt_svd_T,2,sd)
# 2.020685995 0.309881152 0.115319110 0.089103875 0.040654438 0.030093014
# 0.015446937 0.015010258 0.011271547 0.008915964 0.008375299 0.007503241
# 0.004797496 0.004182346 0.003843288 0.002900307 0.002778318 0.002318704
# 0.002110838 0.001403716
Xt_svd_P<-Xt_svd_V               #Loading Matrix (P)
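
The same steps can be tried without the NIR data. The sketch below is only an illustration: it uses a small random matrix (an assumption, not the X_train of this tutorial) and compares the SVD results with base R's prcomp instead of resemble. The names X, Xc, T_scores and P_load are made up for the example.

```r
set.seed(1)                                   # reproducible synthetic data
X  <- matrix(rnorm(50 * 8), nrow = 50, ncol = 8)
n  <- nrow(X)

Xc <- scale(X, center = TRUE, scale = FALSE)  # centre the columns only
s  <- svd(Xc)                                 # Xc = U D V'

T_scores <- s$u %*% diag(s$d)                 # score matrix  T = U D
P_load   <- s$v                               # loading matrix P = V

sdev_svd    <- s$d / sqrt(n - 1)              # standard deviation per component
explVar_svd <- s$d^2 / sum(s$d^2)             # explained variance (all components)

pca <- prcomp(X, center = TRUE, scale. = FALSE)

# The standard deviations from the SVD agree with prcomp
all.equal(sdev_svd, pca$sdev)
# The standard deviation of each score column is d / sqrt(n - 1)
all.equal(apply(T_scores, 2, sd), sdev_svd)
# Scores times transposed loadings give back the centred matrix
all.equal(T_scores %*% t(P_load), Xc, check.attributes = FALSE)
```

With all components kept, T %*% t(P) reproduces the centred matrix exactly; keeping only the first k columns of T and P gives the rank-k approximation that PCA works with.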
