R & Chemometrics: June 2015

24 jun 2015

Unscrambler Video: Chemometrics applied to NIR data

It's always nice to see this kind of video (chemometric software tools), explaining how to treat and process NIR data, in this case with the Unscrambler software.

22 jun 2015

How to import a TXT spectra file into Win ISI

In this case I use the gasoline spectra from the R chemometrics package pls.
Now I open the Convert tool in Win ISI:

I select the file gasoline5.txt and set the output format to Win ISI.
Press Begin Conversion.
Answer whether sample numbers are in the text file: YES
Answer whether there are constituents, and how many: in this case, 0
Answer the total number of data points in the spectra: 401
Answer the number of segments in the spectra:
In this case I select two:

One from 900 to 1098 nm, every two nanometers
Another from 1100 to 1700 nm, every two nanometers

The conversion is completed and a new NIR file appears: gasoline5.nir
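
As a quick sanity check on the numbers entered above, the two segments can be counted in R. This is only a sketch: the variable names are my own and nothing here is part of Win ISI.

```r
# Wavelength axis of each segment, as entered in the Convert tool
seg1 <- seq(900, 1098, by = 2)
seg2 <- seq(1100, 1700, by = 2)

# Points per segment and total number of data points
n_points <- c(segment1 = length(seg1), segment2 = length(seg2))
total_points <- sum(n_points)

print(n_points)      # 100 and 301
print(total_points)  # 401
```

The total matches the 401 data points given to the Convert tool.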

21 jun 2015

Scores and reconstruction

While watching the first lesson of the algebra course at MIT, I captured this screenshot because it explains, in an easy way, the reconstruction of a spectrum.

Here x and y are the scores of the spectrum (0,3), and we have to find them knowing that the first loading is (2,-1) and the second is (-1,2), so x and y are the solutions of the equations:

2x - y = 0

-x + 2y = 3

In this case there is an exact solution and the residual is zero, but real spectra have many more than two variables, so we fit the spectrum as well as possible with a linear combination of the loadings weighted by the scores; what is left over is the residual vector e.
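
The small system above can be solved directly in R. The sketch below first solves the 2 x 2 case from the screenshot and then, as an assumption for illustration only, a made-up 3-variable case (P3 and s3 are invented values) where no exact solution exists and the scores are found by least squares.

```r
# Loadings as columns of P; s is the spectrum to reconstruct
P <- cbind(c(2, -1), c(-1, 2))
s <- c(0, 3)

scores <- solve(P, s)            # solves P %*% scores = s
e <- s - drop(P %*% scores)      # residual vector
print(scores)                    # x = 1, y = 2
print(e)                         # zero, up to rounding

# Hypothetical 3-variable case (values invented for illustration)
P3 <- cbind(c(2, -1, 0), c(-1, 2, 1))
s3 <- c(0, 3, 1)
scores3 <- qr.solve(P3, s3)      # least-squares scores
e3 <- s3 - drop(P3 %*% scores3)  # residual is no longer zero
print(scores3)
print(e3)
```

With more variables than loadings the residual e3 is not zero, but it is orthogonal to both loadings, which is what "as well as possible" means in the least-squares sense.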

The loading matrix “P”: a good example of an orthogonal matrix

We know that for an orthogonal matrix A:

A^t.A=A.A^t=I

When we calculate the loading matrix during the PCA process, each loading is orthogonal (perpendicular) to all the others. So, for fun, we can check this condition on the loading matrix in R, Excel, etc.

P is a very large matrix, so we will look at just a few columns (6 loadings or terms) and the same number of rows (6 wavelengths):

> round(gas.loadings[1:6,1:6],digits=4)
          PC1   PC2   PC3    PC4   PC5    PC6
900 nm -0.011 0.022 0.034 -0.039 0.042 -0.020
902 nm -0.010 0.022 0.031 -0.041 0.039 -0.022
904 nm -0.011 0.022 0.030 -0.042 0.036 -0.021
906 nm -0.012 0.024 0.027 -0.045 0.031 -0.012
908 nm -0.013 0.021 0.025 -0.045 0.035 -0.013
910 nm -0.014 0.023 0.023 -0.046 0.036 -0.018

Pt is the transpose, so the columns are the wavelengths and the rows are the loadings:

> round(t(gas.loadings[1:6,1:6]),digits=4)
    900 nm 902 nm 904 nm 906 nm 908 nm 910 nm
PC1 -0.011 -0.010 -0.011 -0.012 -0.013 -0.014
PC2  0.022  0.022  0.022  0.024  0.021  0.023
PC3  0.034  0.031  0.030  0.027  0.025  0.023
PC4 -0.039 -0.041 -0.042 -0.045 -0.045 -0.046
PC5  0.042  0.039  0.036  0.031  0.035  0.036
PC6 -0.020 -0.022 -0.021 -0.012 -0.013 -0.018

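Now we multiply the matrices. Note that the commands below use solve(), which returns the inverse, so they give the identity for any invertible matrix; the orthogonality condition itself is t(P) %*% P = I, computed on the full loading matrix: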

> round((gas.loadings[1:6,1:6])%*% solve((gas.loadings[1:6,1:6])),digits=4)
       900 nm 902 nm 904 nm 906 nm 908 nm 910 nm
900 nm      1      0      0      0      0      0
902 nm      0      1      0      0      0      0
904 nm      0      0      1      0      0      0
906 nm      0      0      0      1      0      0
908 nm      0      0      0      0      1      0
910 nm      0      0      0      0      0      1


> round(((solve(gas.loadings[1:6,1:6]))%*%(gas.loadings[1:6,1:6])),digits=4)
    PC1 PC2 PC3 PC4 PC5 PC6
PC1   1   0   0   0   0   0
PC2   0   1   0   0   0   0
PC3   0   0   1   0   0   0
PC4   0   0   0   1   0   0
PC5   0   0   0   0   1   0
PC6   0   0   0   0   0   1
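
A minimal sketch of the transpose-based check, t(P) %*% P, on a full loading matrix. It uses simulated data and prcomp() as a stand-in because gas.loadings is not reproduced here; X, P and the matrix sizes are assumptions for illustration.

```r
set.seed(1)
# 20 simulated "spectra" with 10 "wavelengths" (random numbers, not real NIR data)
X <- matrix(rnorm(20 * 10), nrow = 20, ncol = 10)

# Loadings of the first 6 principal components (one column per PC)
P <- prcomp(X)$rotation[, 1:6]

# t(P) %*% P: the 6 x 6 identity, because the loadings are orthonormal
PtP <- crossprod(P)
print(round(PtP, 4))

# P %*% t(P): 10 x 10 but only of rank 6, so it cannot be the identity
PPt <- tcrossprod(P)
print(round(diag(PPt), 4))
```

When fewer loadings than wavelengths are kept, only t(P) %*% P is the identity; both products equal I only when P is square, that is, when all the components are retained.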

14 jun 2015

Studying structure in LOCAL for validation

One way to understand the structure of the spectra population is to sort the spectra database by the constituent of interest and split it into groups (eight in this case). One group is kept for validation and the others are used for calibration; in the figure, I use “group_6” for validation and all the rest for calibration.

I continue with all the possible combinations.

It is the same as cross-validation, but in this case I use the LOCAL algorithm. The resulting statistics (RSQ, SEP, ...) help me to understand whether the calibration will perform as it should in routine use, and to find outliers.
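
The grouping scheme can be sketched in R. LOCAL itself is part of Win ISI and is not reproduced here, so a plain lm() fit on simulated data stands in for it. I also assume that the groups are formed by taking every eighth sample after sorting (so each group spans the whole range) and that SEP is the standard deviation of the prediction errors; all names and data below are hypothetical.

```r
set.seed(1)
n <- 80
x <- matrix(rnorm(n * 3), n, 3)                         # simulated predictors
y <- drop(x %*% c(1, 0.5, -0.5)) + rnorm(n, sd = 0.3)   # simulated constituent
dat <- data.frame(y = y, x)                             # columns y, X1, X2, X3

# Sort by the constituent and deal the samples into 8 groups
ord <- order(dat$y)
group <- integer(n)
group[ord] <- rep(1:8, length.out = n)

# Leave each group out in turn: calibrate on the rest, validate on the group
stats <- t(sapply(1:8, function(g) {
  cal <- dat[group != g, ]
  val <- dat[group == g, ]
  fit <- lm(y ~ ., data = cal)          # stand-in model, not LOCAL
  pred <- predict(fit, newdata = val)
  c(group = g, RSQ = cor(val$y, pred)^2, SEP = sd(val$y - pred))
}))
print(round(stats, 3))
```

A group whose RSQ or SEP is clearly worse than the others is a good place to start looking for outliers.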

It was very useful for improving the performance of a calibration for process analysis, where GLOBAL did not perform well and LOCAL seems to be better.

In process analysis we can't expect very nice statistics, but even a modest RSQ can help you see trends and make decisions immediately.