24 jun 2015
Unscrambler Video: Chemometrics applied to NIR data
22 jun 2015
How to import a TXT spectra file into Win ISI
In this case I use the gasoline spectra from the R chemometrics package pls.
Now I open the Convert tool in Win ISI:
I select the file gasoline5.txt and set the output format to Win ISI.
Press Begin Conversion.
Answer whether sample numbers are in the text file: YES
Answer whether there are constituents, and how many: in this case 0
Answer the total number of data points in the spectra: 401
Answer the number of segments of the spectra:
In this case I select two:
One from 900 to 1098 every two nanometers
The other from 1100 to 1700 every two nanometers
The conversion is completed and a new NIR file appears: gasoline5.nir
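Before running the conversion it is worth checking that the two segments add up to the 401 data points declared above. A minimal sketch in R (the variable names are only illustrative, they are not part of Win ISI):

```r
# Wavelength axis of each segment, in nm, as entered in the Convert tool
segment1 <- seq(900, 1098, by = 2)
segment2 <- seq(1100, 1700, by = 2)

length(segment1)                      # 100 points in the first segment
length(segment2)                      # 301 points in the second segment
length(segment1) + length(segment2)   # 401, the declared total

# Joined together, the two segments should form one continuous 2 nm grid
wavelengths <- c(segment1, segment2)
all(diff(wavelengths) == 2)           # TRUE
```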
21 jun 2015
Scores and reconstruction
While watching the first lecture of the MIT linear algebra course, I captured this screenshot because it explains, in an easy way, the reconstruction of a spectrum.
Here x and y are the scores of the spectrum (0,3), and we have to find them knowing that the first loading is (2,-1) and the second is (-1,2), so x and y are the solutions of the equations:
2x - y = 0
-x + 2y = 3
In this case there is an exact solution (x = 1, y = 2) and the residual is zero. Real spectra have many more than two variables, so we fit the spectrum as closely as possible with a linear combination of the loadings multiplied by the scores, and what is left over is the residual vector e.
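The same calculation can be sketched in R. The first part uses the two loadings and the spectrum from the example above. The second part is a made-up three-variable case (the numbers in P3 and spectrum3 are hypothetical) that shows what happens when there are more variables than loadings and a residual remains:

```r
# Loadings as the columns of P, and the spectrum to reconstruct
P <- cbind(c(2, -1), c(-1, 2))
spectrum <- c(0, 3)

# Square case: exact solution of P %*% scores = spectrum
scores <- solve(P, spectrum)
scores                                     # x = 1, y = 2
residual <- spectrum - drop(P %*% scores)
residual                                   # zero

# More variables than loadings (hypothetical 3-point spectrum, 2 loadings)
P3 <- cbind(c(2, -1, 0), c(-1, 2, -1))
spectrum3 <- c(0, 3, 1)
scores3 <- qr.solve(P3, spectrum3)         # least-squares scores
e <- spectrum3 - drop(P3 %*% scores3)      # residual vector e, not zero here
crossprod(P3, e)                           # ~0: e is orthogonal to both loadings
```

Here qr.solve() is used because it gives a least-squares fit when the system is overdetermined; with real PCA loadings, which are orthonormal, the scores can be obtained directly by projecting the spectrum onto the loadings.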
The loading matrix “P”: a good example of an orthogonal matrix
We know that for an orthogonal matrix A (where At is the transpose of A):
At·A = A·At = I
When we calculate the loading matrix during the PCA process, each loading is orthogonal (perpendicular) to all the others, so we can check this condition for fun in R, Excel, etc. with the loading matrix.
P is a very large matrix, so we will look at just a few columns (6 loadings or terms) and the same number of rows (6 wavelengths):
> round(gas.loadings[1:6,1:6],digits=4)
          PC1   PC2   PC3    PC4   PC5    PC6
900 nm -0.011 0.022 0.034 -0.039 0.042 -0.020
902 nm -0.010 0.022 0.031 -0.041 0.039 -0.022
904 nm -0.011 0.022 0.030 -0.042 0.036 -0.021
906 nm -0.012 0.024 0.027 -0.045 0.031 -0.012
908 nm -0.013 0.021 0.025 -0.045 0.035 -0.013
910 nm -0.014 0.023 0.023 -0.046 0.036 -0.018
Pt is the transpose, so the columns are the wavelengths and the rows are the loadings:
> round(t(gas.loadings[1:6,1:6]),digits=4)
    900 nm 902 nm 904 nm 906 nm 908 nm 910 nm
PC1 -0.011 -0.010 -0.011 -0.012 -0.013 -0.014
PC2  0.022  0.022  0.022  0.024  0.021  0.023
PC3  0.034  0.031  0.030  0.027  0.025  0.023
PC4 -0.039 -0.041 -0.042 -0.045 -0.045 -0.046
PC5  0.042  0.039  0.036  0.031  0.035  0.036
PC6 -0.020 -0.022 -0.021 -0.012 -0.013 -0.018
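Now we multiply the two matrices. Note that the commands below use solve(), which returns the inverse of the 6 x 6 block, not its transpose. For a complete orthogonal matrix the inverse and the transpose are the same, but the product of any invertible matrix with its inverse is the identity, so this output on its own does not prove orthogonality; the real check is Pt·P with the complete loading matrix.
> round((gas.loadings[1:6,1:6])%*% solve((gas.loadings[1:6,1:6])),digits=4)
       900 nm 902 nm 904 nm 906 nm 908 nm 910 nm
900 nm      1      0      0      0      0      0
902 nm      0      1      0      0      0      0
904 nm      0      0      1      0      0      0
906 nm      0      0      0      1      0      0
908 nm      0      0      0      0      1      0
910 nm      0      0      0      0      0      1
> round(((solve(gas.loadings[1:6,1:6]))%*%(gas.loadings[1:6,1:6])),digits=4)
    PC1 PC2 PC3 PC4 PC5 PC6
PC1   1   0   0   0   0   0
PC2   0   1   0   0   0   0
PC3   0   0   1   0   0   0
PC4   0   0   0   1   0   0
PC5   0   0   0   0   1   0
PC6   0   0   0   0   0   1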
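A minimal sketch of the orthogonality check with the transpose, in base R. It does not use the gasoline data: X is a synthetic stand-in (an assumption for illustration) and the loadings come from prcomp(), so the numbers will differ from the tables above. It compares three things: Pt·P for the complete loading matrix, Pt·P for a 6 x 6 block cut out of it, and the product of that block with its inverse.

```r
set.seed(1)
# Synthetic stand-in for spectra: 30 samples x 10 variables (not the gasoline data)
X <- matrix(rnorm(30 * 10), nrow = 30)
P <- prcomp(X)$rotation                # complete loading matrix, 10 x 10

# Orthogonality check on the complete matrix: crossprod(P) is t(P) %*% P
dev_full <- max(abs(crossprod(P) - diag(10)))
dev_full                               # numerically zero

# A 6 x 6 block cut out of P is, in general, not orthogonal
P6 <- P[1:6, 1:6]
dev_block <- max(abs(crossprod(P6) - diag(6)))
dev_block                              # clearly different from zero

# Multiplying by the inverse gives the identity for any invertible matrix
dev_inverse <- max(abs(P6 %*% solve(P6) - diag(6)))
dev_inverse                            # numerically zero, whatever P6 is
```

With real spectra there are usually more wavelengths than retained loadings, so P is tall rather than square; in that case Pt·P is still the identity, while P·Pt is not.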
14 jun 2015
Studying structure in LOCAL for validation
One way to understand the structure of the spectra population is to sort the spectra database by the constituent of interest and split it into groups (eight in this case). One group is kept for validation and the others are used for calibration, so in the case of the figure I use “group_6” for validation and all the rest for calibration.
I continue with all possible combinations.
This is the same as cross-validation, but in this case I use the LOCAL algorithm. The resulting statistics (RSQ, SEP, ...) help me to understand whether the calibration will perform as it should in routine use, and to find outliers.
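The grouping scheme can be sketched in R as follows. Several things here are assumptions and not the Win ISI procedure: the data are synthetic, an ordinary least-squares fit (lm.fit) stands in for the LOCAL algorithm, which is not reproduced here, every eighth sorted sample goes to the same group (contiguous blocks of the sorted list would be the other option), and SEP is computed as the root mean square of the validation residuals.

```r
set.seed(1)
n <- 80
X <- matrix(rnorm(n * 5), nrow = n)          # hypothetical predictors
y <- drop(X %*% c(1, -0.5, 0.3, 0, 0.2)) + rnorm(n, sd = 0.2)

# Sort by the constituent and deal the sorted samples into 8 groups
ord <- order(y)
group <- integer(n)
group[ord] <- rep_len(1:8, n)

# Keep each group out in turn: calibrate on the other seven, validate on it
stats <- t(sapply(1:8, function(g) {
  cal <- group != g
  fit <- lm.fit(cbind(1, X[cal, , drop = FALSE]), y[cal])   # stand-in model
  pred <- drop(cbind(1, X[!cal, , drop = FALSE]) %*% fit$coefficients)
  res <- y[!cal] - pred
  c(group = g, n = sum(!cal),
    RSQ = cor(pred, y[!cal])^2,
    SEP = sqrt(mean(res^2)))
}))
stats    # one row of validation statistics per group left out
```

A group whose RSQ or SEP is clearly worse than the others is the place to look for outliers or for a part of the range that the calibration does not cover well.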
This was very useful for improving the performance of a calibration for process analysis, where GLOBAL did not perform well and LOCAL seems to be better.
In process analysis we can't expect very nice statistics, but even a modest RSQ can help you to see trends and make decisions immediately.