R & Chemometrics: October 2012

29 oct 2012

Working with Shootout - 2012 in R (001)

I have downloaded the ASCII files of the Shootout 2012 from the IDRC (see: Shootout 2012 files), so I can work with the data to develop a model and predict the Validation Set.
For that task I have a "Calibration Set" and a "Test Set".
Details of the task are given in the "instructions" on the IDRC web page.
The spectra were acquired on an FTIR instrument, and the spacing between wavelengths (X axis) is nonlinear, so I replaced the X values with 1.0, 2.0, ..., 372.0:
wavelengths<-seq(1.0,372.0,by=1)
I had to rearrange the data to import it into R, and organize the data frame so that I could start looking at the spectra and at the distribution of the reference values.
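A minimal sketch of that import step, assuming an ASCII layout of one sample per row with whitespace-separated values and no header (the real Shootout files may differ, for example with sample IDs or a header row). The helper name read_spectra is hypothetical; it also labels the columns with the 1 to 372 index used instead of the FTIR X axis:

```r
# Hypothetical helper: read an ASCII spectra file (one sample per row, no header)
# into a numeric matrix and label the columns with the 1..n_points index.
read_spectra <- function(path, n_points = 372) {
  spectra <- as.matrix(read.table(path, header = FALSE))
  stopifnot(ncol(spectra) == n_points)  # guard against a different file layout
  colnames(spectra) <- seq_len(n_points)
  spectra
}

# Demo with a temporary file of simulated values (not the Shootout data)
tmp <- tempfile(fileext = ".txt")
write.table(matrix(runif(3 * 372), nrow = 3), tmp,
            row.names = FALSE, col.names = FALSE)
spectra <- read_spectra(tmp)
dim(spectra)  # 3 372
```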
As in other posts, I am going to use the "Chemometrics with R" package.
If we plot the calibration samples without any treatment, we can see what look like two groups of samples. Since we are working in transmittance, this suggests that there are probably differences in the pathlength:
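The pathlength argument comes from the Beer-Lambert law: absorbance A = -log10(T) = ε·c·l is proportional to the pathlength l, so a longer path scales the absorbance at every point by the same factor, while the transmittance itself changes nonlinearly. A quick base-R check with made-up values (eps_c and the two pathlengths are hypothetical):

```r
# Beer-Lambert: A = -log10(T), with T as a fraction (T% / 100)
absorbance_from_T_percent <- function(T_pct) -log10(T_pct / 100)

# Hypothetical absorptivity x concentration at three spectral points
eps_c <- c(0.2, 0.5, 0.9)
T_short <- 100 * 10^(-eps_c * 1.0)  # T% with pathlength 1.0 (arbitrary units)
T_long  <- 100 * 10^(-eps_c * 1.5)  # same sample, pathlength 1.5

# Same ratio (1.5) at every point: a multiplicative effect in absorbance units
ratio <- absorbance_from_T_percent(T_long) / absorbance_from_T_percent(T_short)
ratio
```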

Now we can apply MSC (Multiplicative Scatter Correction) to reduce these physical effects and enhance the chemical changes:
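The idea behind MSC can be sketched in a few lines of base R: each spectrum x is regressed on a reference spectrum r (by default the mean spectrum) as x = a + b·r, and then corrected as (x - a)/b, which removes the additive offset and the multiplicative scaling of each sample. This is a minimal sketch with a hypothetical helper name, not the package function used in the post; packaged implementations may differ in details:

```r
# Minimal MSC sketch (hypothetical helper, not a package function).
# Each row x of X is fitted as x = a + b * reference, then corrected to (x - a) / b.
msc_sketch <- function(X, reference = NULL) {
  X <- as.matrix(X)
  if (is.null(reference)) reference <- colMeans(X)  # mean spectrum as reference
  corrected <- t(apply(X, 1, function(x) {
    coefs <- coef(lm(x ~ reference))  # coefs[1] = intercept a, coefs[2] = slope b
    (x - coefs[1]) / coefs[2]
  }))
  dimnames(corrected) <- dimnames(X)
  corrected
}

# Demo: three simulated spectra that differ only by offset and scale
ref <- sin(seq(0, 3, length.out = 50))
X <- rbind(0.1 + 1.2 * ref, -0.2 + 0.8 * ref, 0.05 + 1.0 * ref)
Xc <- msc_sketch(X)  # after MSC all three rows coincide with the mean spectrum
```

With the post's data this would be something like NITmsc <- msc_sketch(NIT), where NIT is a hypothetical name for the matrix of raw spectra.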

MSC works really well here, and we can see that most of the variability lies approximately between points 200 and 240:

wave_var<-seq(200.0,240.0,by=1)

matplot(wave_var,t(NITmsc[,200:240]),lty=3,pch=20,
        lwd=0.1,xlab="wavelengths",ylab="T%")

Now we can see at least three clusters.
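The observation that most of the variability sits between points 200 and 240 can also be checked numerically rather than by eye, by computing the standard deviation of every spectral point across the samples. A minimal base-R sketch; the helper names and the simulated data are hypothetical:

```r
# Standard deviation of each spectral point (column) across all samples (rows)
point_sd <- function(X) apply(X, 2, sd)

# Hypothetical helper: indices of the n points with the largest spread
top_variance_points <- function(X, n = 10) {
  order(point_sd(X), decreasing = TRUE)[seq_len(n)]
}

# Simulated data: 30 samples x 372 points, with sample-to-sample
# variation added only between points 200 and 240
set.seed(2)
X <- matrix(rnorm(30 * 372, sd = 0.001), nrow = 30)
X[, 200:240] <- X[, 200:240] + rnorm(30, sd = 0.05)  # one offset per sample, recycled over the columns
top <- top_variance_points(X, 10)
range(top)
```

On the post's data the call would be top_variance_points(NITmsc, 20), and plot(point_sd(NITmsc), type = "l") shows the whole profile.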

Let's now look at the histogram of the reference values (Active):

hist(Active,col="blue")

With this we can start to draw some conclusions before going further.

26 oct 2012

Getting a representative sample – 02

Sunflower seed requires a special procedure to be analysed by NIR. It is important to get a representative sample (see: Getting a representative sample – 01), and the sample must be presented to the instrument in the best possible way in order to reduce the sampling error. Customers require a representative prediction value, because a lot of money is involved.

Grind the sample

Be sure to empty the grinder completely

Homogenize the sample well

After this step we fill the cup and analyze the sample on the NIR instrument.

Thanks to Helena Rios of AVICON.

25 oct 2012

Shootout 2012 files

Visit the Shootout 2012 webpage to get the files and practice chemometrics with software such as Unscrambler, Vision, Matlab, WinISI, ...

The data are also available in ASCII format, so they can be imported into programs like R.

This is a traditional event of the IDRC (International Diffuse Reflectance Conference), which takes place every two years in Chambersburg, Pennsylvania (USA).

Participants work with the data to get the best statistics and the most original approaches. This year the data are spectra of pharmaceutical tablets.

Participants get a Training Set, a Test Set (with lab data), and a Validation Set without lab values.

Once the model is built, the validation file is predicted and the results are sent to the shootout chair for evaluation, together with a presentation of the approach used.

Karl Norris (IDRC-2012)

21 oct 2012

Looking at the PCA scores with GGobi

In this post I continue with the unsupervised exploration of the oil spectra that we started in a previous post (PCA with "ChemoSpec" - 001).
In the manual "ChemoSpec: An R Package for Chemometric Analysis of Spectroscopic Data" (page 23) there is a brief description of how to get very nice plots of our data in Principal Component space using the GGobi software and the rggobi package.
With the function
plotScoresG(oils,class)
GGobi opens and lets me see the plots in different ways: I can look at the different PC score planes, rotate them, and so on, to get a better understanding of the clusters.
*Olive oil samples are yellow and sunflower oil samples are red.

20 oct 2012

PCA with "ChemoSpec" - 001

In my last post about the "ChemoSpec" package (Hierarchical Cluster Analysis (ChemoSpec) - 02), we saw two clusters (one for olive oil, the other for sunflower oil), as well as some sub-clusters within the sunflower oil.
Continuing with the manual "ChemoSpec: An R Package for Chemometric Analysis of Spectroscopic Data" by Bryan A. Hanson, I decided to apply PCA to the oil data.
PCA is an unsupervised method, and it will give me another view of the clusters.

Let's first look at the HCA plot from (Hierarchical Cluster Analysis (ChemoSpec) - 02):

Let's calculate the PCA for the same data (remember that the spectra were mathematically treated with the second derivative). I will use the "classical" option, one of the two main options (classical and robust).

class<-classPCA(oils,choice="noscale")
plotScores(oils,title="OilsSpectra",class,
           pcs=c(1,2),ellipse="none",tol=0.01)

Note that both plots give similar information: one cluster for olive oil (red points, on the left) and, on the right, three sub-clusters for the sunflower oil.

These two PCs explain almost all the variance (99.4%).
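The same pipeline (second-derivative pretreatment, a classical PCA without scaling, and the percentage of variance per PC behind a figure like 99.4%) can be sketched in base R. This is not ChemoSpec's implementation: the plain finite-difference derivative, reading "noscale" as centred-but-unscaled, the helper names and the simulated data are all assumptions:

```r
# Second derivative as a plain second difference along each spectrum (row).
# ChemoSpec or other tools may use a smoothed (e.g. Savitzky-Golay) derivative instead.
second_derivative <- function(X) t(apply(X, 1, diff, differences = 2))

# Centred but unscaled PCA (our assumption for what choice = "noscale" means)
pca_noscale <- function(X) prcomp(X, center = TRUE, scale. = FALSE)

# Percentage of the total variance explained by each principal component
explained_variance <- function(pca) 100 * pca$sdev^2 / sum(pca$sdev^2)

# Hypothetical demo data: two groups of simulated spectra with a band in different places
set.seed(1)
x <- seq(0, 1, length.out = 100)
band <- function(centre) exp(-((x - centre) / 0.05)^2)
group_a <- t(replicate(10, band(0.3) + rnorm(100, sd = 0.002)))
group_b <- t(replicate(10, band(0.6) + rnorm(100, sd = 0.002)))
X <- rbind(group_a, group_b)

pca <- pca_noscale(second_derivative(X))
round(sum(explained_variance(pca)[1:2]), 1)  # share of variance in PC1 + PC2
```

plot(pca$x[, 1:2], col = rep(c("red", "blue"), each = 10), pch = 20) then gives a PC1 vs PC2 score plot of the two simulated groups.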

16 oct 2012

Win ISI 2012 Course (Segovia)

These are some of the photos taken during the Win ISI 2012 course held in Segovia. The speakers were Dr. Begoña de la Roza (SERIDA), with a very interesting talk on the use of NIR for forage analysis; Antonio Serrano of "NIR Soluciones", who gave the training on developing global calibrations and on all the preliminary work (preparing the training and validation sets, math treatments, etc.); and finally myself, with a practical talk on developing discriminant models and implementing them in routine analysis.

Thanks to the attendees, and to Begoña and Antonio.

15 oct 2012

Equations with indicator variables - part 1

Sometimes it is necessary to merge spectra files from different instruments (standardized or not) to get a bigger database with more variability, a wider range, etc.

Ideally all the laboratory values would come from the same lab, but this is not usually the case: the lab data often come from different labs (probably one for each instrument). In that case we can add indicator variables to help the software account for the differences between labs.

Suppose we have two spectra files: one from instrument 1 with lab values from Lab1, and the other from instrument 2 with lab values from Lab2. In this case we create an indicator variable with zeros for the instrument1_lab1 spectra:

And ones for the instrument2_lab2 spectra:

In the case of three instruments and three labs, we need two indicator variables:

For instrument 1_lab1:

0 0

For instrument 2_lab2:

1 0

For instrument 3_lab3:

0 1

In the case of four instruments and four labs, we need three indicator variables:

For instrument 1_lab1:

0 0 0

For instrument 2_lab2:

1 0 0

For instrument 3_lab3:

0 1 0

For instrument 4_lab4:

0 0 1

And so on.

Of course, we can also use this method with a single instrument and lab data from four labs; in that case the groups would be:

Instrument 1_lab1, instrument 1_lab2, instrument 1_lab3 and instrument 1_lab4
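In R, this coding (k - 1 indicator columns for k labs, with the first lab as the all-zeros reference) is the default treatment coding produced by model.matrix(). A minimal sketch; the helper name lab_indicators, the simulated spectra, and appending the columns with cbind() for a generic regression are assumptions (Win ISI handles indicator variables in its own way, which is not shown here):

```r
# Indicator (dummy) variables for k labs: k - 1 columns of 0/1,
# with the first level (here lab1) coded as all zeros.
# Pass lab_levels explicitly if alphabetical order is not what you want
# (for example "lab10" sorts before "lab2").
lab_indicators <- function(lab, lab_levels = sort(unique(lab))) {
  lab <- factor(lab, levels = lab_levels)
  model.matrix(~ lab)[, -1, drop = FALSE]  # drop the intercept column
}

labs <- c("lab1", "lab2", "lab3", "lab4", "lab2")
ind <- lab_indicators(labs)
ind  # lab1 -> 0 0 0, lab2 -> 1 0 0, lab3 -> 0 1 0, lab4 -> 0 0 1

# Hypothetical spectra (5 samples x 10 points) with the indicators appended
spectra <- matrix(runif(5 * 10), nrow = 5)
X_aug <- cbind(spectra, ind)
dim(X_aug)  # 5 13
```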

8 oct 2012

Updating to Win ISI 4.xx (does not recognize the dongle)

When installing Win ISI 4 on a new computer, first check whether the software is a full version or an upgrade.
If the software is an upgrade and you install it as a full version, it will not recognize the dongle. By default the installation program chooses "Full Version", so this mistake is easy to make.

7 oct 2012

William Herschel experiment

This is a good link to a page where the experiment of Sir William Herschel (the discovery of the NIR region) is reproduced with cheap materials and very well described.
An Example of the Herschel Experiment

3 oct 2012

Workshop on process control with NIR

Today, October 3, a workshop was held for NIR users to show them the advantages of on-line NIR instruments. The sessions took place in a hotel in the centre of Segovia, where several interesting talks were given, followed by a visit to the Garese factory, where our hosts (Lorenzo and Patricia) showed us their magnificent facilities and explained the benefits they get from controlling their feed with a process NIR instrument.
Interesting comments came up about how to handle the large amount of data these instruments generate, in order to get the most out of them.

The sampler used to collect a representative sample for analysis by the reference method is made by Barreal, and you can see it at:

http://www.cmbarreal.com/index_archivos/Page1117.htm

1 oct 2012

Getting a representative sample - 01

In NIR we analyze a small sample, which has to be representative of a huge amount of material, so it is important to split the first sample taken from the truck (with a sampler or probe) into a smaller, representative sample. This video shows an easy way to do it.

After this, the sample is ground, homogenized, and placed in a small cup to be analyzed in the instrument.

Thanks to Emilio of "Cereales La Almarcha".