R & Chemometrics: April 2016

30 Apr 2016

Shootout 2016 files and rules, available to download

You can download the files to play with, work on, or take part in the "2016 Shootout".
The spectral data are in Excel format, so you can use R, Excel (remember the good book about chemometrics in Excel by Alexey L. Pomerantsev), Win ISI, or any other software that can import the data from Excel or CSV.
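For readers who want to start in R, here is a minimal sketch of reading spectra from a CSV export. The layout is an assumption (first column with the sample ID, one column per wavelength); the real Shootout files may be organized differently. The small file written at the start is only a stand-in so the example runs on its own.

```r
# Stand-in CSV so the example is self-contained (hypothetical layout:
# first column = sample ID, remaining columns = one wavelength each).
csv_file <- tempfile(fileext = ".csv")
demo <- data.frame(sample = paste0("s", 1:5),
                   matrix(runif(5 * 4), nrow = 5,
                          dimnames = list(NULL, c("1100", "1102", "1104", "1106"))),
                   check.names = FALSE)
write.csv(demo, csv_file, row.names = FALSE)

# Read the file back, keeping the wavelength column names untouched.
raw <- read.csv(csv_file, check.names = FALSE)

# Spectra as a numeric matrix: one row per sample, one column per wavelength.
spectra <- as.matrix(raw[, -1])
rownames(spectra) <- raw$sample
wavelengths <- as.numeric(colnames(spectra))

# Quick look at the spectra, one line per sample.
matplot(wavelengths, t(spectra), type = "l", lty = 1,
        xlab = "Wavelength", ylab = "Absorbance")
```

With an Excel file, saving the sheet as CSV first keeps everything in base R.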

Comparing PC scores maps (Win ISI & Resemble)

I use Win ISI very often, and I like to compare how its algorithms work against other software, especially the R packages. Win ISI uses SVD to calculate the scores and loadings matrices. Resemble also uses this algorithm.
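To make the SVD idea concrete, here is a minimal sketch in base R with simulated data (not the NIRsoil spectra). It is not the code of Win ISI or Resemble, just the standard relationship between the SVD of the centered matrix and the scores and loadings.

```r
# Simulated "spectra": 50 samples x 8 variables.
set.seed(1)
X <- matrix(rnorm(50 * 8), nrow = 50, ncol = 8)

# Mean-center each column, then decompose: Xc = U D V'.
Xc <- scale(X, center = TRUE, scale = FALSE)
sv <- svd(Xc)

loadings <- sv$v                   # P: one column per PC
scores   <- sv$u %*% diag(sv$d)    # T = U D, the same as Xc %*% P

# Cross-check against prcomp(), which also works through an SVD.
pca <- prcomp(X, center = TRUE, scale. = FALSE)
max(abs(abs(scores) - abs(pca$x)))  # should be close to zero (PC signs may flip)
```

The absolute values are compared because the sign of each PC is arbitrary, which is also why score maps from two programs can appear mirrored.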

In the post "Tutorials with Resemble (part3)", we saw the scores map for the first and second PC.

In the post “Importing NIRsoil spectra from Resemble into WinISI”, we saw how to import and work with NIRsoil spectra in Win ISI.

Win ISI has the function “Create a Score file from a Spectra file”, which calculates the PCs, detects outliers, and lets us plot the sample score maps.

We can also overplot the validation samples on the training samples to check whether the validation samples fit in the space defined by the training samples.

Here is the plot from Resemble, with the training samples in blue and the validation samples in red.

The next picture shows the same plot in Win ISI, with the training samples in blue and the validation samples in green. As you can see, they are the same.
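The overplot can be sketched in base R as follows. This is a minimal sketch with simulated data and the hypothetical names X_tr and X_val: the validation samples are centered with the training means and projected onto the training loadings, so both sets share the same PC space.

```r
set.seed(2)
X_tr  <- matrix(rnorm(60 * 10), 60, 10)   # simulated training spectra
X_val <- matrix(rnorm(20 * 10), 20, 10)   # simulated validation spectra

# PCA on the training set only.
ctr   <- colMeans(X_tr)
Xc_tr <- sweep(X_tr, 2, ctr)
P     <- svd(Xc_tr)$v[, 1:2]              # loadings of PC1 and PC2

# Scores: validation samples use the training center and loadings.
T_tr  <- Xc_tr %*% P
T_val <- sweep(X_val, 2, ctr) %*% P

# Training in blue, validation in red, on common axes.
plot(T_tr, col = "blue", pch = 16, xlab = "PC1", ylab = "PC2",
     xlim = range(c(T_tr[, 1], T_val[, 1])),
     ylim = range(c(T_tr[, 2], T_val[, 2])))
points(T_val, col = "red", pch = 16)
legend("topright", legend = c("training", "validation"),
       col = c("blue", "red"), pch = 16)
```

Validation points that fall far outside the blue cloud are candidates for samples not represented in the training set.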

24 Apr 2016

Classification of different tomato seed cultivars by multispectral visible-near infrared spectroscopy and chemometrics

Hello to all readers of this blog.

You can get this paper for free from:

http://www.impublications.com/content/abstract?code=I05_a1

S. Shrestha, L. Deleuran and R. Gislum, “Classification of different tomato seed cultivars by multispectral visible-near infrared spectroscopy and chemometrics”, J. Spectral Imaging 5, a1 (2016). doi: 10.1255/jsi.2016.a1

Have a nice week.

18 Apr 2016

Spectroscopy Europe: "Sampling Column"

I really recommend visiting the "Sampling Column" web page of "Spectroscopy Europe" magazine, where you can download very useful articles and become an expert in the theory of sampling.
We can surely find some good advice there on how to get a representative sample to analyze in our instruments, and so improve the correlation and errors of our calibrations.

LINK: http://www.spectroscopyeurope.com/articles/sampling

7 Apr 2016

Reviewing some posts (Fatty Acids with R)

Thanks to Christof for his e-mail and his interest in this blog. I know that, like Christof, other people would like to start the tutorial "NIT: Fatty Acids with R".

In the past I sent the "tocino6.txt" file by e-mail to some blog readers, but I may have missed (or lost) some e-mails asking for the file. If that is the case, please let me know again and I will send you the file by e-mail or through a link to a Dropbox folder, where I will add the tutorial data and other information.

I also accidentally deleted some pictures in the past, so I will first review the tutorials of "NIT: Fatty Acids with R", and when they are ready I will write a post linking to all the posts of this tutorial. A new label will also be added to make them easier to find.

For now, the first three posts of this tutorial are ready, with new features, scripts and pictures, for everyone who wants to follow them.

NIT: Fatty acids study in R - Part 001
NIT: Fatty acids study in R - Part 002
NIT: Fatty acids study in R - Part 003
NIT: Fatty acids study in R - Part 004
NIT: Fatty acids study in R - Part 005

You will need the "tocino6.txt" file for posts 1 to 5. After these posts we will validate the model with the file "tocino6val5.txt", and we will combine both in "tocino7.txt" to develop the final regression.
These files are available from Dropbox, or I can send them by e-mail; just let me know you are interested by writing to: cuesta_joseramon@yahoo.es

4 Apr 2016

Looking at the Resemble score plots (2D and 3D)

When testing one of the several chemometric packages, it's always nice to use other packages to get good plots and information.

Now that I am using Principal Components again, it is a good occasion to use packages such as "scatterplot3d", "rgl" or "Rcmdr" (you can download and install them from the CRAN servers) to look at the scores and get a better idea of how they are distributed.

In the previous posts we saw how to get the score matrix "T" from the principal component projection with Resemble: "pcProj$scores".

So now we just have to load the scatterplot3d library:

library(scatterplot3d)

and plot the scores for the first 3 PCs:

scatterplot3d(pcProj$scores[,1:3], color = "blue",
              angle = 55, scale.y = 0.7, pch = 16)

After that, the plot appears.

We can also see the projections of the scores onto the plane formed by PC1 and PC2:

scatterplot3d(pcProj$scores[,1:3], highlight.3d = TRUE,
              angle = 120, type = "h", col.axis = "blue",
              col.grid = "lightblue", cex.axis = 1.3,
              cex.lab = 1.1, pch = 20)

Now install the package "rgl" and load it:

library(rgl)
plot3d(pcProj$scores[,1:3], col="blue", size=3)

We can rotate the cube with the mouse to inspect the population better.

Another nice way to see the scores and projections in 3D is to load the package Rcmdr and pass three columns of the T (scores) matrix to the function "scatter3d". We can move the plot with the mouse to choose the best angle for looking at the scores.

library(Rcmdr)
scatter3d(pcProj$scores[,1],pcProj$scores[,2],pcProj$scores[,3])

1 Apr 2016

Tutorials with Resemble (Part 4.b)

Using the NIRsoil demo spectra from Resemble, we can practice with the function "orthoProjection", as the "resemble.pdf" manual explains.

In this case we use "orthoProjection" with the "pca" method which, as we saw in the previous post, uses the SVD calculation method for PCA.

orthoProjection can evaluate up to 40 PCs here, but an algorithm is used to select the recommended number.

We can start this part of the tutorial with:

pcProj <- orthoProjection(Xr = X_train, X2 = NULL, Yr = Y_train,
                          method = "pca", pcSelection = list("opc", 40))

As we can see, for "pcSelection" we have used the "opc" method, with candidate numbers of components going from 1 to 40. Of course not all of them will be necessary, and the OPC method will decide how many are selected.

In the resemble manual we can read:

“When method = "opc", the selection of the components is carried out by using an iterative method based on the side information concept (Ramirez-Lopez et al. 2013a, 2013b). First let be P a sequence of retained components (so that P = 1, 2, ..., k). At each iteration, the function computes a dissimilarity matrix retaining p_i components. The values of the side information of the samples are compared against the side information values of their most spectrally similar samples. The optimal number of components retrieved by the function is the one that minimizes the root mean squared differences (RMSD) in the case of continuous variables”.
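To get a feeling for the idea in this quote, here is a minimal sketch in base R. It is my own simplified reading of the description, not the code that Resemble runs: the data are simulated, "most spectrally similar" is taken as the single nearest neighbor, and the dissimilarity is the Euclidean distance between unit-variance scores (proportional to the Mahalanobis distance in PC space).

```r
set.seed(4)
n <- 80
X <- matrix(rnorm(n * 12), n, 12)                 # simulated spectra
y <- 2 * X[, 1] + X[, 2] + rnorm(n, sd = 0.1)     # simulated side information (reference values)

Xc <- scale(X, center = TRUE, scale = FALSE)
sv <- svd(Xc)

max_pc <- 10                                      # assumed upper limit for this toy example
rmsd <- numeric(max_pc)
for (p in seq_len(max_pc)) {
  # Columns of U are the scores scaled to equal variance, so Euclidean
  # distances between rows are proportional to Mahalanobis distances.
  Tp <- sv$u[, seq_len(p), drop = FALSE]
  D <- as.matrix(dist(Tp))
  diag(D) <- Inf                                  # a sample is not its own neighbor
  nn <- apply(D, 1, which.min)                    # index of the most similar sample
  rmsd[p] <- sqrt(mean((y - y[nn])^2))            # compare side information with the neighbor's
}

best <- which.min(rmsd)                           # number of PCs with the lowest RMSD
rmsd
best
```

The shape of the rmsd vector plays the same role as the curve shown by plot(pcProj): the selected number of components is where it reaches its minimum.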

If we check (after the pcProj calculation):

> pcProj$n.components

[1] 20

We can see that the number selected is 20.

We can see this more graphically in a plot:

> plot(pcProj)

For this calculation (using the "opc" principal component selection), the reference matrix "Yr" is needed in addition to the spectra matrix "Xr".
We can see the list of RMSD values in:
> pcProj$opcEval

If we use another method, such as cumulative variance, "Yr" is not needed. Of course, if the method used for orthoProjection is "pls" instead of "pca", the reference matrix "Yr" is always needed.
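A selection by cumulative variance can be sketched in base R like this. It is only an illustration with simulated data, and the 0.99 threshold is an assumption chosen for the example; no reference values are involved at any point.

```r
set.seed(5)
# Simulated spectra with three dominant directions plus a little noise.
scores_true <- matrix(rnorm(40 * 3), 40, 3) %*% diag(c(10, 5, 2))
X <- scores_true %*% matrix(rnorm(3 * 15), 3, 15) +
  matrix(rnorm(40 * 15, sd = 0.1), 40, 15)

Xc <- scale(X, center = TRUE, scale = FALSE)
d  <- svd(Xc)$d                       # singular values

explained <- d^2 / sum(d^2)           # fraction of variance per PC
cumvar    <- cumsum(explained)        # cumulative explained variance

threshold <- 0.99                     # assumed cut-off for this example
k <- which(cumvar >= threshold)[1]    # first PC count reaching the threshold
k
```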
The principal component space (with the number of components selected) will be used to calculate the Mahalanobis distance (distance to the PC centroid) for every sample in the validation spectra matrix "Xu".
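As a preview of that step, here is a minimal sketch of the distance calculation in base R with simulated data. The names X_tr and X_val and the choice of k = 3 components are assumptions for the example; with the real data the scores would come from the Resemble projection.

```r
set.seed(3)
X_tr  <- matrix(rnorm(60 * 10), 60, 10)   # simulated training spectra
X_val <- matrix(rnorm(20 * 10), 20, 10)   # simulated validation spectra
k <- 3                                    # assumed number of selected PCs

# PC space defined by the training set.
ctr <- colMeans(X_tr)
P   <- svd(sweep(X_tr, 2, ctr))$v[, 1:k]
T_tr  <- sweep(X_tr, 2, ctr) %*% P
T_val <- sweep(X_val, 2, ctr) %*% P

# mahalanobis() returns the SQUARED distance to the given center.
d2 <- mahalanobis(T_val, center = colMeans(T_tr), cov = cov(T_tr))
md <- sqrt(d2)                            # distance of each validation sample to the PC centroid
md
```

Because PC scores are uncorrelated, the same result comes from dividing each score column by the training standard deviation of that PC and summing the squares.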