R & Chemometrics: November 2013

22 nov 2013

Interview Brad Swarbrick (CAMO): Introduction to Multivariate Data Analysis

From YouTube: "Brad Swarbrick, Vice President of Business Development at CAMO Software, gives a short introduction to multivariate data analysis, discusses some of its applications and how these powerful analytical tools are being used to improve products and manufacturing processes in a wide range of industries".

19 nov 2013

Monochromator: Grating / Encoder

One of the most common patterns in the noise test of a monochromator instrument appears when the encoder fails (see the post: Diagnostics: Lamp Bias / Encoder Peak / Water Vapour Bands).

It affects the noise level (the RMS and peak-to-peak noise go out of their tolerance limits), and it can also affect the wavelength scale.

The encoder is the part that measures, with high accuracy and precision, the position of every wavelength that comes out of the grating. The same assembly includes the motor that moves the grating, as you can see in the video.

The video shows how the polychromatic light from a lamp comes out of a slit and hits the grating, which generates monochromatic light that goes through another slit to the sample and the detectors. This type of monochromator is known as pre-dispersive. (See the post: NIR: Post-dispersive concept to see the difference.)

One complete scan is one movement of the grating from left to right. To decrease the noise, 32 scans are normally acquired and averaged (32 for the reference and 32 for the sample).
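As a rough illustration of why the scans are averaged: for random, uncorrelated noise, averaging N scans is expected to reduce the RMS noise by about the square root of N. The R sketch below simulates this with made-up numbers (the number of data points and the noise level are assumptions, not instrument specifications).

```r
set.seed(1)
n_points <- 700    # hypothetical number of data points per scan
n_scans  <- 32     # scans averaged, as in the post
sigma    <- 1e-4   # hypothetical noise level of a single scan

# Each column is one simulated scan containing only random noise
scans <- matrix(rnorm(n_points * n_scans, sd = sigma), nrow = n_points)

single_rms   <- sqrt(mean(scans[, 1]^2))   # RMS noise of one scan
averaged     <- rowMeans(scans)            # average of the 32 scans
averaged_rms <- sqrt(mean(averaged^2))     # RMS noise after averaging

# Expected to be close to sqrt(32), about 5.7, for uncorrelated noise
ratio <- single_rms / averaged_rms
ratio
```

A failing encoder produces a systematic peak rather than random noise, so averaging would not be expected to remove it in the same way.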

When the shaft of the encoder motor (which moves the grating) does not move correctly (due to high friction), we can see a peak at approximately 1500 nm.

If this persists, it may be time to replace the encoder.
The noise spectra of an instrument with encoder problems can look similar to this:

Here we can clearly see the peak at 1500 nm, and also other noise due to the water vapour bands.

In this picture we can see the grating and the motor-encoder.

Once the new encoder is mounted, it must be adjusted so that the peaks of the internal polystyrene filter match the correct standard wavelengths. This point is important because we don't want any change in the performance of the calibrations, so the polystyrene spectra before and after the encoder change should match as closely as possible.

After the encoder change we can scan a sealed check sample and compare the spectrum and results with previous ones (from dates when the instrument was performing well, or when it was performing badly because of the encoder noise) and draw our own conclusions.

The picture shows a check cell (a sealed cup) with a product inside (soya meal). Sealing the cup keeps changes in composition from one day to the next to a minimum.

15 nov 2013

NIR Instruments approved for grain trade (Australia)

Follow this link to see the list of instruments approved for grain trade in Australia (National Measurement Institute).

13 nov 2013

NIR to control the process

These are two on-line NIR instruments that control production on the two lines of a flour mill company.

With these two instruments, all the production can be monitored for parameters such as moisture, protein and ash.

If you want to see how the product passes through the window, see the post:

"On-line:Reflectance Sample presentation"

To get a representative sample for validation and calibration, we have to take the physical sample from a point near the sample window, at the moment we acquire the spectrum. This way we get the maximum correlation between the spectra and the lab data for the sample.

These instruments must be installed in a place free of turbulence, so that the sample passes across the window without any gaps in the product flow. In any case, we can train the model to discard spectra acquired when the flow is disturbed or when no product is flowing. All these spectra will be outliers, with high Mahalanobis distance values, and will not count in the batch statistics.

We can use global, local or discriminant models to predict the samples. The last one (discriminant) is quite interesting when we have many different products. We can train the model to distinguish which product is flowing, so that we get the name of the product together with the chemical values from a specific equation for that product (this involves a lot of chemometrics).
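The idea of discarding spectra with a high Mahalanobis distance can be sketched in R with the base function mahalanobis(). This is only an illustration with simulated scores: the three score columns, the data and the chi-square cutoff are assumptions, and the on-line software will use its own limits.

```r
set.seed(2)
# Hypothetical scores (e.g. three PCA scores) for 200 spectra of normal product flow
scores_ok <- matrix(rnorm(200 * 3), ncol = 3)
center <- colMeans(scores_ok)
covmat <- cov(scores_ok)

# Three new spectra: two with normal flow, one acquired with no product at the window
new_scores <- rbind(c( 0.1, -0.3, 0.5),
                    c(-0.8,  0.4, 0.2),
                    c( 9.0, -7.0, 8.0))

# mahalanobis() returns squared distances to the center
d2 <- mahalanobis(new_scores, center, covmat)

# Assumed cutoff: 99.9 % quantile of a chi-square with 3 degrees of freedom
cutoff <- qchisq(0.999, df = 3)
is_outlier <- d2 > cutoff

# Only the non-outliers would count in the batch statistics
keep <- new_scores[!is_outlier, , drop = FALSE]
```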

LOCAL calibrations can give accurate results and are really easy to maintain.

I'm testing these two types of models and I will come back to this post as soon as I get values from some validation sets I'm waiting for.

4 nov 2013

Step-up & Stepwise Regressions with R

We know that a spectrum has a series of variables ("x1", "x2", ..., "xm", one at each wavelength), which are the photometric responses for a certain sample. Once we have a certain number of samples "n", we want to develop a model to predict a constituent "y".

Practicing with a table from the book "Introduction to Multivariate Statistical Analysis in Chemometrics", we can lay the groundwork for understanding the concepts of "Step-up" and "Stepwise" regression.

Let's see the RSQ of each wavelength vs. the "y" variable:

(cor(x1,y1))^2   # 0.4768583
(cor(x2,y1))^2   # 0.4112143
(cor(x3,y1))^2   # 0.02528684

We can see that variables "x1" and "x2" have some correlation with "y1", but variable "x3" has a very poor correlation with "y1".

So we start the model with just one term (x1) and then add a new term that improves the model with a high F-test value. We can try "x2" and "x3", and we clearly see that the model improves by adding "x2" (the XY plot is better for x1_2 than for x1_3).
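The table from the book is not reproduced here, so the following R sketch uses simulated data (the values and coefficients are invented for illustration) to show how the RSQ of each variable against "y1" can be computed in one step, and that it matches the R-squared of a one-term regression.

```r
set.seed(3)
n  <- 50
x1 <- rnorm(n)
x2 <- rnorm(n)
x3 <- rnorm(n)
# Simulated constituent: depends on x1 and x2 only (assumed coefficients)
y1 <- 1.4 + 4.4 * x1 + 4.1 * x2 + rnorm(n, sd = 2)

# Squared correlation of each variable with y1
X   <- cbind(x1, x2, x3)
rsq <- apply(X, 2, function(x) cor(x, y1)^2)
rsq

# For a single predictor, the squared correlation equals the R-squared of lm()
rsq_lm <- summary(lm(y1 ~ x1))$r.squared
```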

res1_2<-lm(y1~x1+x2)

#Intercept= 1.3528 x1= 4.4328 x2= 4.1164

This improves the RSQ to:

(cor(x1_2,y1))^2   # 0.8880725

Adding "x3" to the model as a third term barely improves the RSQ, because its coefficient is very low (practically zero).

x1_2_3<-1.41833 + 4.42317*x1 + 4.10108*x2 - 0.03574*x3
(cor(x1_2_3,y1))^2   # 0.8883945
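The F test mentioned for adding a new term can be obtained in R by comparing nested models with anova(). The sketch below again uses simulated data (not the table from the book), so the numbers will differ from the ones in this post.

```r
set.seed(4)
n  <- 50
x1 <- rnorm(n)
x2 <- rnorm(n)
x3 <- rnorm(n)
y1 <- 1.4 + 4.4 * x1 + 4.1 * x2 + rnorm(n, sd = 2)   # x3 has no real effect

m1   <- lm(y1 ~ x1)             # one term
m12  <- lm(y1 ~ x1 + x2)        # step up with x2
m123 <- lm(y1 ~ x1 + x2 + x3)   # step up with x3

# Each row tests whether the added term reduces the RSS significantly
tab <- anova(m1, m12, m123)
tab

f_x2 <- tab$F[2]   # F value for adding x2 to the model with x1
f_x3 <- tab$F[3]   # F value for adding x3 to the model with x1 and x2
```

With data like these we would expect a large F value for "x2" and a small one for "x3", which is the step-up criterion for keeping or rejecting a term.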

We can use the function step to find the best combination of variables for a model. With just 3 variables, we can proceed like this:
lm_all<-lm(y1~x1+x2+x3)
lm_step<-step(lm_all,direction="both")
summary(lm_step)

The summary gives us the best variables to use and the regression coefficients b0, b1 and b2 (as we can see, the same results as for "res1_2"):
#(Intercept):1.3528 x1= 4.4328 x2= 4.1164
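A self-contained version of the step() call, using simulated data in place of the book table, could look like the sketch below. Note that step() uses the AIC penalty by default (k = 2); passing k = log(n) makes it use the BIC penalty instead. The data frame "dat" and its values are assumptions for illustration.

```r
set.seed(5)
n   <- 50
dat <- data.frame(x1 = rnorm(n), x2 = rnorm(n), x3 = rnorm(n))
dat$y1 <- 1.4 + 4.4 * dat$x1 + 4.1 * dat$x2 + rnorm(n, sd = 2)

lm_all <- lm(y1 ~ x1 + x2 + x3, data = dat)

# Default criterion is AIC; trace = 0 hides the step-by-step printout
lm_step <- step(lm_all, direction = "both", trace = 0)
selected <- names(coef(lm_step))
selected

# Same search with the BIC penalty
lm_step_bic <- step(lm_all, direction = "both", trace = 0, k = log(nrow(dat)))
names(coef(lm_step_bic))
```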

--------------------------------------------------------------------------------------------------------------------------

Let's now use the package "chemometrics" with NIR data:

The spectra are:

We use the stepwise function from the chemometrics package to get values such as:

$usedtime

$bic (Bayesian information criterion)

$models
$varnames
RSS (residual sum of squares) is a statistic that decreases every time we add a new term, so the BIC formula includes a penalty for each term added.

BIC is a statistic that decreases as the model improves; variables are added or dropped until the BIC cannot be reduced any further.
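To make the relation between RSS and the BIC penalty concrete, the sketch below computes the BIC of a linear model by hand and checks it against the base R function BIC(). The data are simulated, and the stepwise function of the chemometrics package may compute its $bic values with a formula that differs by constant terms, so this is only an illustration of the principle.

```r
set.seed(6)
n  <- 40
x1 <- rnorm(n)
x2 <- rnorm(n)
x3 <- rnorm(n)
y  <- 2 + 3 * x1 + rnorm(n)   # only x1 is really related to y

fit_small <- lm(y ~ x1)
fit_big   <- lm(y ~ x1 + x2 + x3)

# BIC for a Gaussian linear model: a fit term based on RSS plus a penalty p * log(n)
bic_by_hand <- function(fit) {
  n_obs <- length(residuals(fit))
  rss   <- sum(residuals(fit)^2)
  p     <- length(coef(fit)) + 1   # coefficients plus the residual variance
  n_obs * (log(2 * pi) + log(rss / n_obs) + 1) + p * log(n_obs)
}

rss_small <- sum(residuals(fit_small)^2)
rss_big   <- sum(residuals(fit_big)^2)   # never larger than rss_small

c(small = bic_by_hand(fit_small), big = bic_by_hand(fit_big))
c(small = BIC(fit_small),         big = BIC(fit_big))
```

The RSS of the bigger model can only go down, but its BIC goes down only if the reduction in RSS outweighs the extra log(n) penalty for each added term.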

The wavelengths ($varnames) selected are:

"X1115.0" "X1185.0" "X1215.0" "X1385.0" "X1420.0" "X1500.0" "X1565.0" "X1585.0" "X1690.0" "X1715.0" "X1720.0" "X1815.0" "X1995.0" "X2070.0" "X2100.0" "X2195.0"

So we can go ahead with the regression:

y ~ X1115.0 + X1185.0 + X1215.0 + X1385.0 + X1420.0 + X1500.0 +
X1565.0 + X1585.0 + X1690.0 + X1715.0 + X1720.0 + X1815.0 +
X1995.0 + X2070.0 + X2100.0 + X2195.

We have found the 16 wavelengths for the model, but now we want to check how it performs, so we need some validation data to calculate the SEP (standard error of prediction) that we can expect for future samples predicted with this model.

We are going to do this using cross validation (CV). With this type of validation, the total sample set is divided into groups, for example four; one group is used for validation, and an MLR regression is fitted on the other three with the 16 wavelengths (X matrix) and the parameter glucose (Y matrix). Each group takes its turn as the validation group. The samples that belong to each group are selected randomly. The whole process is repeated a number of times (in this case 100), each time with a new random split, so it seems a good way to validate our model. This type of cross validation is called "k-fold cross validation" (here repeated k-fold, with k = 4).
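The procedure can be sketched in base R as follows. This is not the code used for this post: the data are simulated (four hypothetical wavelength columns "w1" to "w4" instead of the selected NIR wavelengths), and the SEP is taken here as the standard deviation of the cross-validation residuals.

```r
set.seed(7)
n   <- 120
dat <- data.frame(matrix(rnorm(n * 4), ncol = 4,
                         dimnames = list(NULL, paste0("w", 1:4))))
dat$y <- 2 + 1.5 * dat$w1 - 0.8 * dat$w2 + rnorm(n, sd = 0.5)

k     <- 4     # number of groups
n_rep <- 100   # number of repetitions
sep   <- numeric(n_rep)

for (r in seq_len(n_rep)) {
  # Random assignment of every sample to one of the k groups
  fold <- sample(rep(seq_len(k), length.out = n))
  pred <- numeric(n)
  for (g in seq_len(k)) {
    test_idx <- fold == g
    fit <- lm(y ~ ., data = dat[!test_idx, ])              # MLR on the other groups
    pred[test_idx] <- predict(fit, newdata = dat[test_idx, ])
  }
  sep[r] <- sd(dat$y - pred)   # SEP for this repetition
}

mean(sep)    # average SEP over the repetitions
range(sep)   # spread of the SEP between random splits
```

The spread of the SEP values gives an idea of how much the estimate depends on the random split into groups.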

This time I prepared a video to explain the validation procedure.
