18 dic. 2013

4th Derivative to check noise


Derivatives amplify the noise in the spectra, and the effect grows with the derivative order (first, second, third, fourth, ...). We can use this as an advantage to check for noise in the raw spectra caused by scans on unstable instruments, poor sample presentation, spectra with high absorbance values, and other sources of noise.
In the video we can see the fourth derivative applied to the raw spectra and how some of them show extreme noise. The scale of the absorbance axis is distorted by these extremely noisy spectra, which we have to remove from the database.
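A minimal sketch of this screening idea in base R (synthetic spectra standing in for a real database; the matrix name and the 3x-median threshold are illustrative choices, not a standard):

```r
set.seed(1)
# Synthetic example: 10 smooth spectra plus one very noisy one
wl <- seq(1100, 2500, by = 2)
spectra <- t(sapply(1:10, function(i) sin(wl / 300) + rnorm(length(wl), sd = 0.001)))
spectra <- rbind(spectra, sin(wl / 300) + rnorm(length(wl), sd = 0.05))

# The 4th derivative amplifies high-frequency noise much more than the signal,
# so the standard deviation of the 4th difference is a simple noise score
d4 <- t(apply(spectra, 1, diff, differences = 4))
noise_score <- apply(d4, 1, sd)
which(noise_score > 3 * median(noise_score))  # flags the noisy 11th spectrum
```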

22 nov. 2013

Interview Brad Swarbrick (CAMO): Introduction to Multivariate Data Analysis


From YouTube: "Brad Swarbrick, Vice President of Business Development at CAMO Software, gives a short introduction to multivariate data analysis, discusses some of its applications and how these powerful analytical tools are being used to improve products and manufacturing processes in a wide range of industries".

19 nov. 2013

Monochromator: Grating / Encoder

One of the most common patterns in the noise test of a monochromator instrument appears when an encoder fails (see the post: Diagnostics: Lamp Bias / Encoder Peak / Water Vapour Bands).
It affects the noise level (out of tolerance limits for the RMS and peak-to-peak noise), but it can also affect the wavelength scale.
The encoder is the part that measures, with high accuracy and precision, the position of every wavelength coming out of the grating; at the same time a motor moves the grating, as you can see in the video.
The video shows how the polychromatic light from a lamp comes out of a slit and hits the grating, which sends monochromatic light through another slit to the sample and detectors. This type of monochromator is known as pre-dispersive (see the post: NIR: Post-dispersive concept for the difference).
One complete scan is a movement of the grating from left to right. To decrease the noise, normally 32 scans are acquired and averaged (32 for the reference and 32 for the sample).
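The effect of this averaging can be sketched numerically: white noise averaged over 32 scans drops by roughly a factor of sqrt(32) (the noise level below is illustrative, not an instrument specification):

```r
set.seed(42)
scan_sd <- 0.01     # hypothetical single-scan noise level
n_rep   <- 100000   # repetitions to estimate the noise reliably

one_scan <- rnorm(n_rep, sd = scan_sd)                         # single scans
avg_32   <- rowMeans(matrix(rnorm(32 * n_rep, sd = scan_sd), ncol = 32))

sd(one_scan)   # close to 0.010
sd(avg_32)     # close to 0.010 / sqrt(32), i.e. roughly 0.0018
```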
When the axis of the encoder motor (which moves the grating) does not move correctly (due to high friction), we can see a peak at approximately 1500 nm.
If this persists, it may be time to replace it.
The noise spectra of an instrument with encoder problems can look like this:
Here we can clearly see the peak at 1500 nm, and also other noise due to the water vapour bands.
 
In this picture we can see the grating and the motor-encoder
 
 
Once mounted, it is necessary to adjust it so that the peaks of the internal polystyrene filter match the correct standard wavelengths. This point is important because we don't want any change in the performance of the calibrations, so the spectra of the polystyrene before and after the encoder change should match as closely as possible.

We can scan (after the encoder change) a sealed check sample and compare the spectrum and results with previous ones (from dates when the instrument was performing well, or performing badly due to encoder noise) and draw our own conclusions.

The picture shows a check cell (a sealed cup) with a product inside (soya meal). Sealing the cup prevents changes in the composition from one day to another.

 
 


 
 
 

13 nov. 2013

NIR to control the process

These are two NIR on-line instruments controlling the production on the two lines of a flour mill company.
With these two instruments all the production can be monitored for parameters such as moisture, protein and ash.
If you want to see how the product passes through the window, see the post:
To get a representative sample for validation and calibration, we have to take the sample physically from a point near the sample window at the moment we acquire the spectrum. This way we will get the maximum correlation between the spectra and the lab data for the sample.
These instruments must be installed in a place free of turbulence, so we can see the sample pass through the window without any gaps in the product flow. Anyway, we can train the model to discard samples where there are disturbances, or where no product is flowing. All these samples will be outliers, with high Mahalanobis distance values, and will not count in the batch statistics.
We can use global, local, or discriminant models to predict the samples. The last one (discriminant) is quite interesting when we have many different products: we can train the model to recognize which product is flowing, so we get the name of the product at the same time as the chemical values from a particular equation for each product (this takes a lot of chemometrics).
LOCAL calibrations can give accurate results and are really easy to maintain.
I'm testing these two types of models and I will come back to this post as soon as I get values from some validation sets I'm waiting for.


 
 

4 nov. 2013

Step-up & Stepwise Regressions with R

We know that a spectrum is a series of variables ("x1", "x2", ..., "xm", one at each wavelength), which are the photometric responses for a certain sample. Once we have a certain number of samples "n", we want to develop a model to predict a constituent "y".
Practicing with a table from the book "Introduction to Multivariate Statistical Analysis in Chemometrics", we can set the basis to understand the concepts of "step-up" and "stepwise" regression.
Let's see the RSQ of each wavelength vs. the "y" variable:
(cor(x1,y1))^2    0.4768583
(cor(x2,y1))^2    0.4112143
(cor(x3,y1))^2    0.02528684
 
 
We can see that variables "x1" and "x2" have some correlation with "y1", but "x3" has a very poor correlation.
So we start the model with just one term (x1) and then add the new term that improves the model with the highest F-test value. Trying "x2" and "x3", we clearly see that the model improves by adding "x2" (a better XY plot for x1_2 than for x1_3).
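The same comparison can be sketched with nested models and R's anova() F-test (hypothetical data standing in for the book's table; the coefficients only loosely mimic the example):

```r
set.seed(7)
# Hypothetical data: y1 depends on x1 and x2 but not on x3
x1 <- rnorm(30); x2 <- rnorm(30); x3 <- rnorm(30)
y1 <- 1.3 + 4.4 * x1 + 4.1 * x2 + rnorm(30, sd = 0.5)

m1   <- lm(y1 ~ x1)        # one-term model
m1_2 <- lm(y1 ~ x1 + x2)   # candidate: add x2
m1_3 <- lm(y1 ~ x1 + x3)   # candidate: add x3

anova(m1, m1_2)  # large F value: adding x2 clearly improves the model
anova(m1, m1_3)  # small F value: adding x3 does not
```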
 
 
res1_2<-lm(y1~x1+x2)
#Intercept= 1.3528 x1= 4.4328 x2= 4.1164
Improving the RSQ to:
(cor(x1_2,y))^2    0.8880725
 
Adding "x3" to the model as a third term does not improve the RSQ, because its coefficient is very low (practically zero).
x1_2_3<-1.41833 + 4.42317*x1 + 4.10108*x2 -0.03574*x3 
(cor(x1_2_3,y1))^2   0.8883945

We can use the function step to find the best combination of variables for a model. In the case of just 3 variables, we can proceed like this:
lm_all<-lm(y~x1+x2+x3)
lm_step<-step(lm_all,direction="both")
summary(lm_step)


The summary gives us the best variables to use and the regression coefficients b0, b1 and b2 (as we can see, the same results as for "res1_2"):
#(Intercept):1.3528 x1= 4.4328 x2= 4.1164

--------------------------------------------------------------------------------------------------------------------------
Let´s use now the package "Chemometrics", with NIR data:
The spectra are:
 

 
We use the stepwise function from the chemometrics package, which returns values such as:
$usedtime
$models
$varnames
RSS (residual sum of squares) is a statistic which decreases every time we add a new term, so the BIC formula applies a penalty for each added term.
BIC is a statistic which decreases as the model improves; variables are added or dropped until the BIC cannot be reduced any further.
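This trade-off can be sketched in base R (hypothetical data; the formula below is the usual linear-model BIC up to an additive constant, not the exact internal computation of the stepwise function):

```r
set.seed(3)
# BIC for a linear model: n*log(RSS/n) + k*log(n) (up to a constant).
# RSS always drops when a term is added, but the k*log(n) penalty may not pay off.
n  <- 50
x1 <- rnorm(n); noise_var <- rnorm(n)
y  <- 2 + 3 * x1 + rnorm(n)

bic <- function(fit) {
  rss <- sum(resid(fit)^2)
  k   <- length(coef(fit))
  n * log(rss / n) + k * log(n)
}

bic(lm(y ~ x1))              # model with the useful term
bic(lm(y ~ x1 + noise_var))  # smaller RSS, but usually a larger BIC
```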

The wavelengths ($varnames) selected are:
"X1115.0" "X1185.0" "X1215.0" "X1385.0" "X1420.0" "X1500.0" "X1565.0" "X1585.0" "X1690.0" "X1715.0" "X1720.0" "X1815.0" "X1995.0" "X2070.0" "X2100.0" "X2195.0"
So we can go ahead with the regression:
y ~ X1115.0 + X1185.0 + X1215.0 + X1385.0 + X1420.0 + X1500.0 +
X1565.0 + X1585.0 + X1690.0 + X1715.0 + X1720.0 + X1815.0 +
X1995.0 + X2070.0 + X2100.0 + X2195.0
 
We have found the 16 wavelengths for the model, but now we want to check how it performs, so we need some data for validation in order to calculate the SEP (standard error of prediction) that we expect for future samples predicted with this model.
We are going to do it using cross validation (CV). As you know, with this type of validation the total sample set is divided into groups, for example four; one group is used for validation, and with the other three an MLR regression is built with the selected wavelengths (X matrix) and the parameter glucose (Y matrix). The samples belonging to each group are selected randomly, and the process is repeated a number of times (in this case 100), so it seems a sound way to validate our model. This type of cross validation is called "k-fold cross validation".
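A minimal sketch of one round of this k-fold procedure in base R (synthetic data standing in for the selected wavelengths and the glucose values):

```r
set.seed(11)
n <- 80
X <- matrix(rnorm(n * 16), ncol = 16)        # stand-in for the selected wavelengths
y <- X %*% runif(16) + rnorm(n, sd = 0.3)    # stand-in for the reference values

k <- 4
folds <- sample(rep(1:k, length.out = n))    # random assignment to 4 groups
press <- 0
for (i in 1:k) {
  val  <- folds == i
  fit  <- lm(y[!val] ~ X[!val, ])            # MLR on the other 3 groups
  pred <- cbind(1, X[val, ]) %*% coef(fit)   # predict the left-out group
  press <- press + sum((y[val] - pred)^2)
}
sqrt(press / n)   # cross-validation error for this round
```

Repeating the loop with new random folds (here, 100 times) and pooling the errors gives the full k-fold estimate described above.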
This time I prepared a video to explain the validation regression procedure.
 
 
 

Recommended book in NIR-Quimiometría to follow the tutorials:

25 oct. 2013

Applying "SNV + Detrend" to the spectra with R

Normally Detrend is used together with SNV as a math treatment to remove scatter. We saw in the last post how to apply Detrend to the spectra using the R package "pracma".
There are other posts in this blog about SNV (standard normal variate), and we can combine them to get the SNV + Detrend math treatment.
First let's apply the SNV:
yarn_snv<-scale(t(yarn$NIR),center=TRUE,scale=TRUE)
wavelengths1<-seq(1,268,by=1)  #para "yarn"
matplot(wavelengths1,yarn_snv,type="l",main="Yarn spectra with +SNV",xlab="Wavelength (nm)",ylab="1/R (log 1/R)",lty=1,col=1)

 

and now the Detrend:
library(pracma)
yarndt_NIR<-detrend(t(yarn$NIR),tt="linear")
matplot(wavelengths1,yarndt_NIR,type="l",lty=1,xlab="Wavelength (nm)",ylab="1/R")
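Both steps can be wrapped into one helper; here is a base-R sketch (the linear detrend mirrors what pracma::detrend does, and the layout assumes one spectrum per column, as produced by t(yarn$NIR); the data below are synthetic):

```r
# SNV + Detrend in one call: input has one spectrum per column
snv_detrend <- function(spectra_cols) {
  snv <- scale(spectra_cols, center = TRUE, scale = TRUE)  # SNV per spectrum
  idx <- seq_len(nrow(snv))
  apply(snv, 2, function(s) resid(lm(s ~ idx)))            # remove linear trend
}

# Synthetic check: spectra with different slopes and offsets
set.seed(5)
raw <- sapply(1:5, function(i) i + i * seq(0, 1, length.out = 100) + rnorm(100, sd = 0.05))
corrected <- snv_detrend(raw)
colMeans(corrected)   # essentially zero: offset and trend removed
```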


 

 


23 oct. 2013

Applying "Detrend" to the spectra with R

There is a nice math treatment I usually apply to NIR spectra: Detrend.
Normally it goes together with SNV, but this time I am going to apply it alone.
I am going to use the yarn data, with these raw spectra:
 

 
Now we load the R package "pracma":
 
library(pracma)
yarndt_NIR<-detrend(t(yarn$NIR),tt="linear")
wavelengths1<-seq(1,268,by=1)  #para "yarn"
matplot(wavelengths1,yarndt_NIR,type="l",lty=1,
+xlab="Wavelength(nm)",ylab="1/R")
 
The spectra change to this:
 
We can see how the "trend" effect (a shift in the baseline) is corrected and the baseline adjusted.


25 sept. 2013

1-VR , R squared or coefficient of Determination

One of the most important statistics to check the performance of a model is the R square (RSQ), or coefficient of determination. This video (from the Khan Academy) describes its calculation and reminds me of the 1-VR statistic from Win ISI.
The statistic called 1-VR (one minus variance ratio) describes the variance explained during the cross validation process.
Being:
VR=(SECV*SECV) / (SD*SD)
where:

SECV = Standard Error Cross Validation
SD = Standard Deviation of Reference Values

So
1-VR = 1 - ((SECV*SECV) / (SD*SD))
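As a quick numeric illustration (the SECV and SD values are hypothetical):

```r
secv <- 0.45   # hypothetical standard error of cross validation
sd_y <- 1.80   # hypothetical SD of the reference values

one_minus_vr <- 1 - (secv^2 / sd_y^2)
one_minus_vr   # 0.9375: the model explains about 94% of the variance in CV
```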



17 sept. 2013

Minimizing Squared Error to Regression Line (Khan Academy)

In this series of videos from the Khan Academy, we can see how complex the calculation of the slope and intercept is, compared with the way we are used to doing it with a scientific calculator or with the formulas from Excel or any other program. Anyway, the first video is really interesting, and surely you will want to continue to the end to see the results of all these calculations.
 
 
 
 
 

13 sept. 2013

Sort and Select samples in Win ISI

Sometimes it is necessary to sort our sample sets for a better selection of the calibration and validation sets, so that they have almost the same distribution. Let's see how to do it in Win ISI.
1) We are going to sort this CAL file by ascending protein value.
2) Now we are going to select every 5th sample for the validation set and the rest for the calibration set.
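The same sort-and-select logic can be sketched in R (hypothetical protein values standing in for the CAL file):

```r
set.seed(9)
protein <- runif(50, min = 10, max = 18)     # hypothetical constituent values

ord     <- order(protein)                    # 1) sort ascending by protein
val_idx <- ord[seq(5, length(ord), by = 5)]  # 2) every 5th sorted sample
cal_idx <- setdiff(ord, val_idx)             # the rest for calibration

length(val_idx)   # 10 validation samples
length(cal_idx)   # 40 calibration samples
```

Because the validation samples are taken at regular steps through the sorted values, both sets end up with almost the same distribution.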

9 sept. 2013

The importance of well trained operators for NIR success

(Imagine) We have developed a good calibration taking care of a lot of issues, such as the performance of the instrument (precision and accuracy), the variability of the samples and the sources of variance in our calibration set, the instrument drift (checked with a check sample), the lab conditions (temperature, humidity, ...), and the way the sample is presented to the instrument: ground, well homogenized, diluted, clean of impurities, chopped, dry, ....

Now this model is in routine use and different operators are going to prepare and scan the samples. According to the spectra and results, certain samples will be sent to the laboratory for reference analysis in order to increase our database and improve the calibration. At this point we must be careful, and we have to train the operators to present the sample as well as possible.

We have to explain to them the importance of good sampling, how to grind and homogenize the sample correctly, how to pack it, etc. This includes the importance of cleaning dust from the instrument, running the diagnostics and check cell periodically, cleaning the cups, ....

It happens very often that because of very busy operators, lack of personnel, carelessness, boredom and other things, the results and the correlation of the spectra with the constituents are quite poor, and we won't get the expected results from the NIR, thinking that the problem is the model. We would then be adding all these variables to our calibration set, and the results will lose value.

So take all this into account, keep the operators as well trained as possible, and tell them the importance of their work in the success of the NIR.

1 sept. 2013

Shootout 2002 files

If somebody is interested in playing with the 2002 shootout, the files can still be downloaded from this link:
http://www.idrc-chambersburg.org/ss20022012.html

I will write some posts about these files, so you can follow them better. There is some literature about this shootout 2002, such as this one from Eigenvector:
Development of an NIR Method for Determining the Active Ingredient in Pharmaceutical Tablets

NSAS files can be imported into Win ISI or Vision.
JCAMP files can be imported into other chemometric software.

If you want to download them in Matlab format, visit:
http://www.eigenvector.com/data/tablets/

The post: Calibrating and validating with VISION (a quick view) has a video showing the files loaded in Vision.

 

8 ago. 2013

Electromagnetic Spectrum Basics

 
A nice video which explains the electromagnetic spectrum basics.

Lockheed Martin Ships NIR Camera for James Webb Space Telescope

"Lockheed Martin [NYSE: LMT], under a contract from the University of Arizona, has completed assembly and testing of the Near Infrared Camera (NIRCam) and has shipped the instrument to the NASA Goddard Space Flight Center in Greenbelt, Md. NIRCam is the prime near-infrared imaging instrument for NASA’s James Webb Space Telescope (JWST)".
Read more in the link.
Lockheed Martin · Lockheed Martin Ships Near Infrared Camera For James Webb Space Telescope to NASA Goddard Space Flight Center

6 ago. 2013

Calibrating and validating with VISION (a quick view)



This is a quick exercise on how to use VISION to develop a regression and validate it with an independent test file.
In this case I used the Shootout files from 2002, where we have the same calibration samples scanned on two instruments (of the same type); these files are CALIBRA1 (scanned on Instrument 1) and CALIBRA2 (scanned on Instrument 2).
We have another sample set for testing the calibration: TEST1 (test samples scanned on Instrument 1) and TEST2 (test samples scanned on Instrument 2). The idea of the Shootout exercise was to develop a calibration on Instrument 1 and transfer it to Instrument 2 in the best way possible.
In this exercise I just develop a PLS calibration with CALIBRA1 and test it with TEST1; how this calibration performs with TEST2 we will see in another post.

24 jul. 2013

How it is made: Pet Food


We know the importance of NIR for controlling our raw ingredients and the final product, for parameters such as protein, fat, ash and moisture, as well as the challenge of predicting new parameters such as cook value, peroxides, ... (see the post: Managing the production of consistent quality pet foods with NIR). This can be done off-line in the lab, or at-line.
Anyway, the challenge is to use NIR on the production line (on-line), measuring parameters such as the moisture at the output of the dryer in order to stay within constant limits for all our batches.

Managing the production of consistent quality pet foods with NIR

 
This is an article from Ben Helm, Commercial Manager at Premier Pet Nutrition and published in nirperformance.com.
 
Apart from the classical parameters measured in dry pet food (protein, moisture, ash and fat), Ben talks about three new pet food NIR innovations:
1: Cook value
2: Peroxide value
3: Water activity

The article talks about the importance of controlling our raw materials in order to maintain the quality of the batches, and about the limitations of NIR (due to the very low contents, in ppb) for parameters such as mycotoxins.
 

19 jul. 2013

Fundamentals: IR vs. Raman

This video shows us the basic differences between Raman spectroscopy and infrared spectroscopy.
 

12 jul. 2013

PHOTOGRAPHY: Interview to Michael Nichols



He is one of the leading nature photographers worldwide, so this is an amazing interview for all of us who love nature photography and his pictures.
The moment when the tiger goes into the water pool and all the effort pays off is exciting.
All the tips and resources he uses to capture his fantastic pictures are amazing.

23 jun. 2013

Score Matrix "T" in 3D

We have seen the score matrix several times in this blog. Now I create a vector with the score values of every sample for each of the first 3 principal components.
Tcomp1<-T[,1]  ........................T values for the first PC
Tcomp2<-T[,2]  ........................T values for the second PC
Tcomp3<-T[,3]  ........................T values for the third PC

Now there are different options to see this T matrix in 3D.
One of them can be:
library(scatterplot3d)
scatterplot3d(Tcomp1,Tcomp2,Tcomp3)
 and we get a plot.

Another way is to use the rgl package:
library(rgl)
plot3d(Tcomp1,Tcomp2,Tcomp3,col="red",size=5)

We can move the plot with the mouse and see the different faces of the cube. 

19 jun. 2013

"II Jornada NIR" - Trouw Nutrition - Masterlab


 
Today I took part as a speaker in the "II Jornadas NIR" organized by Trouw Nutrition and Masterlab España, with a talk on "Fundamentos de la Tecnología NIR" (NIR Technology Fundamentals) and "Buenas Prácticas en el uso del NIR" (Good Practices in the Use of NIR). Apart from giving my own talk, I followed the rest of the presentations closely: Oscar Benito (NIR Manager at Masterlab España), Javier Lallana (Laboratory Head at Masterlab España), Angela Gutierrez (ASC Nutreco), Elena Albanell (UAB, Barcelona) and Begoña de la Roza (Head of the SERIDA centre).
Angel Sacristan (Director of Masterlab España) gave an interesting introduction to Trouw Nutrition and Masterlab España, which are without doubt pioneering companies in the use of NIR and currently run an impressive network of NIR instruments connected through the RINA platform.
Their plans for the future are certainly very interesting, with strong growth both nationally and internationally.
Thanks to Trouw, Masterlab and FOSS for allowing me to take part with a talk.
 

11 jun. 2013

Video: Simple Harmonic Motion (Movimiento Armónico Simple)

These days I am preparing a talk and presentation on the theoretical fundamentals of NIR, and it is always worth reviewing these videos, since simple harmonic motion is the basis for understanding molecular vibration and all its applications in the Mid Infrared (MIR) and the Near Infrared (NIR). Once it is complete, I hope to share it with the readers of the blog.

Other videos worth reviewing:

William Herschel experiment
Video: Middle Infrared (MIR) Theory Fundamentals
Molecular Vibration (Mid Infrared)
Hooke's Law (Walter Lewin)

and some other posts tagged Fundamentos

10 jun. 2013

Working in transflectance


In the case of some liquid and viscous samples, when the light (in this post-dispersive NIR instrument) crosses the sample we can lose a lot of energy (transmitted light which does not reach the detectors).
See this case (in reality there is not so much light; the photograph over-exposed the core of light), looking at the sample from the top:
 
So it may become necessary to use a reflector and work in transflectance. The video shows how the reflector is placed over the sample cup, seen from below.
When doing this, take care that no bubbles or other undesired material are in the pathlength under the reflection area.
 
 
In the case of chocolate it is not necessary to use the reflector:
 
 
 

6 jun. 2013

Definitions to describe process control

off-line - Manual sampling from the process, sent to the laboratory for NIR analysis. In this case (normally) one or a few samples are analyzed for every batch, and by the time we get the result it can be too late to change the regulation of the process, with all the disadvantages that implies (re-processing the whole batch, ...).


at-line - Manual sampling from the process, analyzed on a NIR sited close to the production line. It is normally used by the plant operators. It can be installed in the process control room, and the operators scan samples whenever they consider it necessary (based on their experience with the process). The response to problems in the process is quicker than in the off-line case.
on-line - Automated sampling sent through a sample line (by-pass) to be analyzed by a process NIR. This system is mostly used with liquids, where there are many issues related to pressure and vacuum, and it requires a well-designed bypass to remove bubbles and other problems in order to get a representative spectrum.
We can also use this system to carry flour or feed by gravity to a bypass cell (when a valve opens), so it can be analyzed in transmittance or reflectance; after the analysis two other valves can open, one to return the sample to the process pipe and the other to take the sample out for laboratory reference analysis. This can be a way to get a more representative sample with more correlation with the acquired spectra, but the disadvantage is that the percentage of the batch analyzed is smaller than with the in-line system.
in-line - A fiber-optic probe connected to a process NIR and inserted into the process line (reactor, pipes, ...); it can work in transmittance or transflectance. It can also be a process NIR window placed in the process line to work in diffuse reflectance.

19 may. 2013

Mahalanobis: "This time with the NIPALS " T " matrix"

Using the NIPALS algorithm for PCA, I get two objects:
T : score matrix
P : loadings matrix
library(chemometrics)
sflw.msc3.tra_nipals<-nipals(sflw.msc3.tra$NIRmsc,a=10,it=160)
"sflw.msc3.tra$NIRmsc" is my training set spectra treated with MSC.
names(sflw.msc3.tra_nipals)
[1] "T" "P"
dim(sflw.msc3.tra_nipals$T)
#T matrix dimension (n . p)
[1] 107  10
n = number of samples
p = number of PCs
dim(sflw.msc3.tra_nipals$P)
#P matrix dimension (m . p)
[1] 2800   10
m = number of wavelengths
p = number of PCs

Let's use the drawMahal (classical) function, in this case on the T matrix:
T_nipals<-sflw.msc3.tra_nipals$T
Xn.pc1pc2<-T_nipals[,1:2]
Xn.pc1pc3<-T_nipals[,c(1,3)] 
Xn.pc1pc4<-T_nipals[,c(1,4)]
Xn.pc2pc3<-T_nipals[,c(2,3)]
Xn.pc2pc4<-T_nipals[,c(2,4)]
Xn.pc3pc4<-T_nipals[,c(3,4)]   
par(mfrow=c(2,3))
drawMahal(Xn.pc1pc2,center=apply(Xn.pc1pc2,2,mean),
            covariance=cov(Xn.pc1pc2),quantile=0.975)
drawMahal(Xn.pc1pc3,center=apply(Xn.pc1pc3,2,mean),
            covariance=cov(Xn.pc1pc3),quantile=0.975)
drawMahal(Xn.pc1pc4,center=apply(Xn.pc1pc4,2,mean),
            covariance=cov(Xn.pc1pc4),quantile=0.975)
drawMahal(Xn.pc2pc3,center=apply(Xn.pc2pc3,2,mean),
            covariance=cov(Xn.pc2pc3),quantile=0.975)
drawMahal(Xn.pc2pc4,center=apply(Xn.pc2pc4,2,mean),
            covariance=cov(Xn.pc2pc4),quantile=0.975)
drawMahal(Xn.pc3pc4,center=apply(Xn.pc3pc4,2,mean),
            covariance=cov(Xn.pc3pc4),quantile=0.975)

 
To understand these plots better, we can have a look at the P (loadings) matrix:
 P_nipals<-sflw.msc3.tra_nipals$P
par(mfrow=c(2,2))
matplot(data.points,P_nipals[,1],type="l",lty=1,xlab="nm",
        ylab="log 1/R")
matplot(data.points,P_nipals[,2],type="l",lty=1,xlab="nm",
        ylab="log 1/R",col="blue")
matplot(data.points,P_nipals[,3],type="l",lty=1,xlab="nm",
        ylab="log 1/R",col="green")
matplot(data.points,P_nipals[,4],type="l",lty=1,xlab="nm",
        ylab="log 1/R",col="brown")
 
 
Looking at the loading plots we can get an idea of which type of variability each loading represents. In the first and second (especially the first) we can see some bands related to water and fat.
 

 


15 may. 2013

Robust Mahalanobis Ellipse

In this plot we compare two Mahalanobis ellipses: one using as center the mean of the columns and the classical covariance (the same plot as in the previous post, but this time with colors and symbols added to the samples according to their sample set), and one based on robust statistics (in this case the minimum covariance determinant, MCD).
The MCD is calculated with the function covMcd(). This function gives new robust values of center and cov to use instead of the classical ones.

We see how the distribution of the samples inside the ellipse changes, and also the number of outliers detected.
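A minimal sketch of the two estimates side by side (this assumes the robustbase package, which provides covMcd(); the score values are synthetic):

```r
library(robustbase)   # covMcd()

set.seed(21)
scores <- cbind(rnorm(100), rnorm(100))   # synthetic stand-in for two score columns
scores[1:5, ] <- scores[1:5, ] + 6        # a few artificial outliers

# Classical estimates: pulled towards the outliers
classical_center <- colMeans(scores)
classical_cov    <- cov(scores)

# Robust MCD estimates: barely affected by the outliers
mcd <- covMcd(scores)
mcd$center
mcd$cov
```

These robust center and cov values are the ones we can pass to drawMahal() in place of the classical mean and covariance.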


12 may. 2013

Detecting Outliers (Mahalanobis)


This post continues another post (Median Absolute Deviation). We have our score and loading matrices, and I want to check for outliers. We are going to use the Mahalanobis distance for this purpose, applied to our score matrix.
What is the dimension of our score matrix in this case? We used 4 PCs, so:
> dim(sflw.msc.rpc$scores)
[1] 211   4

We have in the rows the samples, and in the columns the scores of the samples for each PC.
We have seen how to plot the pairs for all combinations of these four PCs; now what I want is to draw ellipses based on the Mahalanobis distance to detect outliers.
It is really helpful to have the book "Introduction to Multivariate Statistical Analysis in Chemometrics" by Kurt Varmuza and Peter Filzmoser; I really recommend it. They have developed the R package "chemometrics", so let's use it:

>library(chemometrics)

This is a subset with the first two PCs: PC1 and PC2

>X.pc1pc2<-sflw.msc.rpc$scores[,1:2]

Now let’s use the function:
 
>drawMahal(X.pc1pc2,center=apply(X.pc1pc2,2,mean),
 +covariance=cov(X.pc1pc2),quantile=0.975)

This plot appears, showing a nice ellipse with some samples outside it (outliers).
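The same outlier test can be done numerically with base R's mahalanobis(), comparing the squared distances against the chi-square quantile that matches the drawMahal setting (synthetic scores for illustration):

```r
set.seed(2)
X <- cbind(rnorm(200), rnorm(200))   # synthetic stand-in for two score columns
X[1:3, ] <- X[1:3, ] + 5             # three artificial outliers

d2 <- mahalanobis(X, center = colMeans(X), cov = cov(X))  # squared distances
cutoff <- qchisq(0.975, df = 2)      # matches quantile = 0.975 in drawMahal

which(d2 > cutoff)                   # samples outside the ellipse
```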

 
 

8 may. 2013

Median absolute deviation

Median absolute deviation - Wikipedia, the free encyclopedia

   
This is a robust statistic to use instead of the standard deviation. You can see the details of the theory in this Wikipedia link.
This post just shows how we can use this statistic with R for the calculation of the principal components.
Using the R Library: pcaPP, we use the function:
PCAgrid (x, k = 2, method = c ("mad", "sd", "qn"),.............)
In this function we select the method to use, "mad" being the default. "x" is our spectral matrix, and "k" is the number of components to compute.
Doing these calculations we get a matrix with the loadings and another with the scores, apart from other statistics and values.
Following the usual convention, let's call P our loadings matrix and T our scores matrix.
library(pcaPP)
sflw.msc.rpc<-PCAgrid(sflw.msc$NIRmsc,k=4,scale=mad)
P<-sflw.msc.rpc$loadings
T<-sflw.msc.rpc$scores
pairs(T[,1:4],col=c("red","blue","green","brown")[sflw.msc$Set])
I can see the samples used in the calibration set in green and others coming from different instruments in other colors, in order to study the patterns and get to know my database better.


 
This post continues in: Detecting outliers (Mahalanobis)
 
 

28 abr. 2013

Validating R-PLS Sunflower Seed Model (Part 01)

I have ten new sunflower seed samples with laboratory data, and I am going to use them to validate the performance of a model developed in R with PLS:
Sunflower seed Regressions with "R" - 001
First, I have a look at the spectra of the validation set (red) compared with the training spectra (blue), without any math treatment applied:

and afterwards with the MSC applied:

I clearly see some differences, but the idea is to check whether the calibration is robust enough to predict the samples according to the statistics from the summary of the regression.
In the summary of Sunflower seed Regressions with "R" - 001, we decided to use 7 terms for our predictions, so:

predict(sflw.g00rmn,ncomp=4,newdata=sflw.msc2.val)

                     G00rmn
171     46.25923
173     53.07202
176     53.48508
177     53.27027
178     46.05511
179     46.73826
180     50.95862
181     52.44956
182     47.59493
183     46.51557

The error is:

Let's have a look at the "Reference vs. Predicted" plot:

predplot(sflw.g00rmn,ncomp=7,newdata=sflw.msc3.val,
asp=1,line=TRUE,col=c("red"))