29 mar. 2016

Tutorials with Resemble (Part 4.a)

SVD (Singular Value Decomposition) is the algorithm Resemble uses to
calculate the Scores and Loading matrices in functions like orthoProjection.
We have seen this method several times in this blog, but this tutorial is
an opportunity to repeat the steps for its calculation:

> pcProj<-orthoProjection(...,
+                         pcSelection=list("opc",40))
> names(pcProj)
      "scores"       "X.loadings"   "variance" 
      "sc.sdv"       "n.components" "pcSelection"  
      "center"       "scale"        "method"       
> pcProj$variance    #only the first 3 principal components are shown
                 pc1        pc2         pc3  
sdv        2.0206860 0.30988115 0.115319110  #Standard deviation of each component 
cumExplVar 0.9712854 0.99412767 0.997291052  #Cumulative explained variance 
explVar    0.9712854 0.02284228 0.003163383  #Explained variance
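The three rows of this table are tied to the singular values by a few standard relations. Here n is the number of training samples (334 in this set), q = min(n, m), and d_a is the a-th singular value of the centred matrix X_c. I am assuming that Resemble divides by the total variance of the centred spectra, that is, by the sum over all q components:

```latex
\begin{aligned}
X_c &= U D V^{\top}, \qquad T = U D, \qquad P = V \\
\mathrm{sdv}_a &= \frac{d_a}{\sqrt{n-1}} \\
\mathrm{explVar}_a &= \frac{d_a^2}{\sum_{j=1}^{q} d_j^2}, \qquad
\mathrm{cumExplVar}_a = \sum_{k=1}^{a} \mathrm{explVar}_k
\end{aligned}
```

For PC1, d_1 = 36.874 and n - 1 = 333, so 36.874 / sqrt(333) ≈ 2.0207, which is the sdv reported by orthoProjection.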

We can check how Resemble calculates these values with this script:
############### SINGULAR VALUE DECOMPOSITION ##################
#We subtract the mean spectrum from every spectrum of the training set.
#This is called "centering".
#We can use this function for that:
X_train_c<-scale(X_train,center = TRUE,scale =FALSE)
#Let's calculate the matrices "d", "u" and "v" with "svd"
Xt_svd<-svd(X_train_c)
#Now we have the three matrices: "d", "u" and "v".
#To save memory, R returns the "thin" SVD: with q<-min(n,m),
#U is n x q, d has q elements and V is m x q
Xt_svd_U<-Xt_svd$u  #Matrix U  (dim= n.q)
Xt_svd_d<-Xt_svd$d  #diagonal elements of D: square roots of the eigenvalues of t(X_train_c)%*%X_train_c
## 36.87405917  5.65480039  2.10437628  1.62599314  0.74187387  0.54914598
##  0.28188014  0.27391151  0.20568644  0.16270108  0.15283486  0.13692129
##  0.08754608  0.07632065  0.07013343  0.05292563  0.05069955  0.04231237
##  0.03851919  0.02561542
Xt_svd_d2<-Xt_svd_d^2  #d^2 is proportional to the variance of each component (d^2/(n-1) = variance of its scores)
## 1.359696e+03 3.197677e+01 4.428400e+00 2.643854e+00 5.503768e-01
## 3.015613e-01 7.945642e-02 7.502752e-02 4.230691e-02 2.647164e-02
## 2.335850e-02 1.874744e-02 7.664316e-03 5.824841e-03 4.918698e-03
## 2.801123e-03 2.570444e-03 1.790337e-03 1.483728e-03 6.561497e-04
Xt_svd_d2/sum(Xt_svd_d2)  #proportion of explained variance of each component
## 0.9712877 0.0228423 0.0031634 0.0018886 0.0003932 0.0002154 0.0000568
## 0.0000536 0.0000302 0.0000189 0.0000167 0.0000134 0.0000055 0.0000042
## 0.0000035 0.0000020 0.0000018 0.0000013 0.0000011 0.0000005
Xt_svd_V<-Xt_svd$v              #Matrix V  (dim= m.q)
Xt_svd_D<-diag(Xt_svd_d)        #Matrix D  (diagonal, dim= q.q)
Xt_svd_T<-Xt_svd_U %*%Xt_svd_D  #Score Matrix (T)
Xt_svd_T<-Xt_svd_T[,1:20]       #keep the first 20 components: dim 334*20
sdev<-apply(Xt_svd_T,2,sd) #standard deviation of each component
## 2.020685995 0.309881152 0.115319110 0.089103875 0.040654438 0.030093014
## 0.015446937 0.015010258 0.011271547 0.008915964 0.008375299 0.007503241
## 0.004797496 0.004182346 0.003843288 0.002900307 0.002778318 0.002318704
## 0.002110838 0.001403716
Xt_svd_P<-Xt_svd_V               #Loading Matrix (P)
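If you want to check these relations without the NIRsoil data, here is a small self-contained sketch on simulated spectra (a random matrix stands in for X_train, so the numbers differ from the ones above). It repeats the same SVD steps and cross-checks them against base R's prcomp, which I use only as a reference, not as a claim about how Resemble works internally:

```r
set.seed(1)
n <- 50; m <- 30
# Simulated "spectra": n samples x m wavelengths (stand-in for X_train)
X <- matrix(rnorm(n * m), n, m) %*% diag(seq(3, 0.1, length.out = m))

Xc <- scale(X, center = TRUE, scale = FALSE)   # centre every column
s  <- svd(Xc)                                  # thin SVD: q = min(n, m)

T_scores   <- s$u %*% diag(s$d)                # score matrix T = U D
P_load     <- s$v                              # loading matrix P = V
sdev       <- apply(T_scores, 2, sd)           # sd of each component
explVar    <- s$d^2 / sum(s$d^2)               # proportion of explained variance
cumExplVar <- cumsum(explVar)                  # cumulative explained variance

# Cross-check with base R's prcomp (the sign of each component may flip)
pc <- prcomp(X, center = TRUE, scale. = FALSE)
all.equal(sdev, pc$sdev)                       # sd = d / sqrt(n - 1)
all.equal(abs(T_scores), abs(unname(pc$x)))    # same scores up to sign
max(abs(Xc - T_scores %*% t(P_load)))          # X_c is rebuilt as T P'
```

Comparing absolute values is needed because an SVD is only unique up to the sign of each pair of singular vectors.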

21 mar. 2016

Resemble Package 1.2.2 available for download

Resemble package version 1.2.2 is available for download from CRAN. 
You can also get the Reference Manual.
Once you have the ZIP file on your PC, select Install Packages in RStudio and install it from the package archive file.
It requires R 3.2.2 or higher.
I have just updated, so I will continue the tutorials with this version, which fixes some bugs from the previous one.
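Before installing, it can help to check the R version from the console. A minimal sketch; the 3.2.2 threshold is the one stated in this post, and the ZIP path in the commented line is a hypothetical example that you should adapt to where you saved the file:

```r
# Minimum R version stated in this post for resemble 1.2.2
required_r <- "3.2.2"
r_ok <- getRversion() >= required_r
if (!r_ok) {
  message("R ", getRversion(), " is too old: update to ", required_r, " or higher")
}
# Installing from the downloaded ZIP (hypothetical path; uncomment and adapt):
# install.packages("C:/Downloads/resemble_1.2.2.zip", repos = NULL, type = "win.binary")
```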

19 mar. 2016

Importing NIRsoil spectra from Resemble into Win ISI

This is the second post in which I import data from R packages (in this case the NIRsoil spectra from the Resemble package) into a Win ISI project. Win ISI is the software I usually use at work. 

The first thing to do is export the training and validation sets to a ".txt" table:
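A minimal sketch of that export with base R's write.table, assuming the spectra are stored as a numeric matrix with one row per sample and one column per wavelength. The helper name export_spectra_txt, the tab separator, the sample ID column and the file names are my own choices for illustration; check the column layout that the CONVERT option of Win ISI expects before relying on it:

```r
# Hypothetical helper: write a spectra matrix (rows = samples,
# columns = wavelengths) to a tab-delimited .txt table
export_spectra_txt <- function(spectra, file, ids = NULL) {
  spectra <- as.matrix(spectra)
  if (is.null(ids)) ids <- seq_len(nrow(spectra))
  out <- data.frame(ID = ids, spectra, check.names = FALSE)
  write.table(out, file = file, sep = "\t", row.names = FALSE,
              col.names = TRUE, quote = FALSE)
  invisible(file)
}

# Toy example: 3 samples x 4 wavelengths (stand-in for the real sets)
set.seed(3)
toy <- matrix(round(runif(12), 4), nrow = 3,
              dimnames = list(NULL, c("1100", "1102", "1104", "1106")))
f <- file.path(tempdir(), "train_set.txt")
export_spectra_txt(toy, f)

# Read the file back to check what was written
back <- read.table(f, header = TRUE, sep = "\t", check.names = FALSE)
```

With the objects from the previous tutorials the call would be something like export_spectra_txt(X_train, "train_set.txt"), and the same for the validation matrix.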

Then, as explained in the post "How to import a TXT spectra file into Win ISI", we import the table into Win ISI with the CONVERT option. From now on we can work with these data sets in both Win ISI and Resemble.

Training Set:

Validation Set:
Follow the tutorials in NIR-Quimiometria.

6 mar. 2016

Tutorials with Resemble (Part 3)

Read the previous tutorials first to follow this post.
“ex1” is a list, and one of its elements is “pcAnalysis”, another list containing the scores of the training spectra matrix and the scores of the validation spectra matrix. The scores are standardized, so each principal component has variance one.
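Standardized scores are just the ordinary PCA scores divided by the standard deviation of each component. Here is a small sketch on simulated data; prcomp stands in for Resemble's projection, and I am assuming the validation scores are divided by the training standard deviations so that both sets share the same scale. I am not claiming this is exactly how mbl builds pcAnalysis:

```r
set.seed(42)
X_tr <- matrix(rnorm(40 * 10), 40, 10)   # simulated training "spectra"
X_va <- matrix(rnorm(15 * 10), 15, 10)   # simulated validation "spectra"

pca   <- prcomp(X_tr, center = TRUE, scale. = FALSE)
sc_tr <- pca$x                            # training scores
sc_va <- predict(pca, newdata = X_va)     # validation scores projected on the training PCs

# Standardize: divide every component by its training standard deviation
sc_tr_std <- sweep(sc_tr, 2, pca$sdev, "/")
sc_va_std <- sweep(sc_va, 2, pca$sdev, "/")

round(apply(sc_tr_std, 2, var), 10)       # 1 for every principal component
```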


As in other posts, once we have the score matrix we can plot the different planes of the PC space to check the samples, choosing the combinations we prefer. In this plot we represent PC1 vs. PC2:


We can also draw ellipses of radius 1 to see the distance to the centroid more clearly.
Training samples are in blue and validation samples are in red.
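A base R sketch of this kind of plot, with no extra packages, using simulated standardized scores instead of the real pcAnalysis output. Because the scores have unit variance in every component, the curves of constant distance to the centroid are circles in this plane. The helper names circle_xy and plot_pc_plane are hypothetical:

```r
set.seed(7)
# Simulated standardized scores (PC1, PC2) standing in for the mbl output
sc_train <- matrix(rnorm(60 * 2), ncol = 2)
sc_val   <- matrix(rnorm(20 * 2), ncol = 2)

# Hypothetical helper: points of a circle of radius r around the centroid (0, 0)
circle_xy <- function(r, n = 200) {
  theta <- seq(0, 2 * pi, length.out = n)
  cbind(r * cos(theta), r * sin(theta))
}

# Hypothetical helper: PC1 vs PC2 plane, training in blue, validation in red
plot_pc_plane <- function(sc_train, sc_val, radii = 1:4) {
  lim <- range(c(sc_train[, 1:2], sc_val[, 1:2], -max(radii), max(radii)))
  plot(sc_train[, 1], sc_train[, 2], col = "blue", pch = 16, asp = 1,
       xlim = lim, ylim = lim, xlab = "PC1", ylab = "PC2")
  points(sc_val[, 1], sc_val[, 2], col = "red", pch = 16)
  for (r in radii) lines(circle_xy(r), lty = 3)   # dotted distance circles
  invisible(lim)
}

pdf(NULL)   # null device so the sketch runs anywhere; remove it to plot on screen
lim <- plot_pc_plane(sc_train, sc_val)
invisible(dev.off())
```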

This is an example of how to draw an ellipse of radius 4:

library(plotrix)
draw.ellipse(0,0,a=4,b=4,      #centred on the centroid (0,0), semi-axes of 4
             angle=0, lty=3)