27 Jan 2013

LOCAL: Creating the RED files

This video shows how to reduce a file in Win ISI. Before watching it, I recommend seeing the other videos about LOCAL models:

LOCAL Equations with Win ISI 4 (Part: 1)
LOCAL in Win ISI 4 (Part 2): "Monitor"

I will add other videos showing how to implement these RED files for LOCAL models in the routine software.


26 Jan 2013

Jeff Beck, one of the Greatest

You know that from time to time I like to add some videos about my favorite guitar players to this blog. One of them is Jeff Beck.
He has in common with Adrian Belew (I added some of his videos) that they both like the Beatles a lot.
He is one of the Greatest.
Enjoy!

24 Jan 2013

Determination of acidity


I have opened a new label on the blog to archive information and videos about the reference methods commonly used to develop NIR calibrations.
In this case, Aula Láctea in Galicia has a YouTube channel dedicated to the analysis and processing of dairy products, and this is one of its videos, which explains the reference method for the analysis of acidity in dairy products.


Calculating the total dry extract

I have opened a new label on the blog to archive information and videos about the reference methods commonly used to develop NIR calibrations.
In this case, Aula Láctea in Galicia has a YouTube channel dedicated to the analysis and processing of dairy products, and this is one of its videos, which explains the reference method for the analysis of the total dry extract in dairy products.

23 Jan 2013

Tools to use and share "R"

Reading the blog Revolutions and its post "A beginner's guide to sharing and collaboration with R", I found interesting information about how to share projects with R, how to develop R packages, and more.

The guide in Noam Ross's blog post "Don't R alone! A guide to tools for collaboration with R" is really useful.

20 Jan 2013

Looking at boxplots (Shootout 2012)

Boxplots are a nice way to compare the three sample sets of the Shoot-out 2012 data files.
There is a categorical variable (Set) in the data frame with the labels Cal (Training Set), Test (Test Set) and Val (Validation Set).

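# IMPORTING THE SAMPLE SETS #
shootout2012.raw <- read.csv("Shootout2012_R.csv", header = TRUE)
# ORGANIZING THE DATA FRAME #
# Columns 4 to 375 hold the spectra; keep them as one matrix column
NIT <- as.matrix(shootout2012.raw[, 4:375])
Active <- shootout2012.raw[, 3]
# Set as a factor, so it can be used as the grouping variable
Set <- factor(shootout2012.raw[, 2])

shootout2012 <- data.frame(Set = Set, Active = Active, NIT = I(NIT))

names(shootout2012)
#  "Set"    "Active" "NIT"
# One box per sample set; data = avoids the need for attach()
boxplot(Active ~ Set, data = shootout2012, main = "Shootout 2012",
        xlab = "Sample Sets", col = "grey")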
 


aggregate(Active ~ Set, summary, data=shootout2012)
   Set Active.Min. Active.1st Qu. Active.Median Active.Mean Active.3rd Qu. Active.Max.
1  Cal       4.740          6.680         8.390       7.550          8.750       9.790
2 Test       5.120          7.050         7.950       7.386          8.142       8.480
3  Val       4.610          7.240         8.000       7.520          8.135       8.580


Previous posts on this blog about the Shoot-out 2012 data are listed below.
See also the label "Shootout 2012".

"Sample Sets" plots (Shootout-2012)
Shootout 2012: Test & Val Sets projections
Working with Shootout - 2012 in R (001)
Shootout 2012 files

11 Jan 2013

Revolutions: Elements of Statistical Learning

The blog Revolutions has interesting information about some files we can download from a web site.

Go to the Revolutions post:

Elements of Statistical Learning: free book download

where you will find details about how to use all this material.


8 Jan 2013

LOCAL in Win ISI 4 (Part 2): "Monitor"

We have seen in a previous post how to get the best configuration for a LOCAL calibration (minimum and maximum number of samples and factors). See: LOCAL Equations with Win ISI 4 (Part: 1)

We have a RED file which has the math treatments applied and, if necessary, a reduced number of data points. This RED file can also be limited to a certain range of wavelengths (NIR, NIR + VIS, ..., or selected ranges within these areas). The better prepared the RED file, the better and faster the results we will get.
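
To make the idea concrete, here is a minimal sketch in R of the three operations mentioned above (wavelength range, math treatment, data point reduction). It is only an illustration under my own assumptions: the function reduce_spectra, its arguments and the gap-difference first derivative are hypothetical and are not the Win ISI settings or file format, and the spectra are simulated.

```r
# Hypothetical helper, not a Win ISI function: crop the wavelength range,
# apply a gap-difference first derivative and keep every k-th data point.
reduce_spectra <- function(X, wl, wl.range = range(wl), gap = 4, keep.every = 2) {
  X <- as.matrix(X)
  # 1. Keep only the selected wavelength range
  in.range <- wl >= wl.range[1] & wl <= wl.range[2]
  X <- X[, in.range, drop = FALSE]
  wl <- wl[in.range]
  # 2. Math treatment: difference between points "gap" positions apart
  n <- ncol(X)
  left <- 1:(n - gap)
  right <- (gap + 1):n
  D <- X[, right, drop = FALSE] - X[, left, drop = FALSE]
  wl.d <- (wl[left] + wl[right]) / 2   # centre of each difference
  # 3. Reduction: keep one data point out of every "keep.every"
  sel <- seq(1, ncol(D), by = keep.every)
  list(X = D[, sel, drop = FALSE], wl = wl.d[sel])
}

# Simulated spectra (not real data): 3 samples, 400 to 2498 nm every 2 nm
wl <- seq(400, 2498, by = 2)
X <- rbind(0.001 * wl, 0.002 * wl, sin(wl / 200))

# Keep only the NIR range, then derive and reduce
red <- reduce_spectra(X, wl, wl.range = c(1100, 2498))
dim(red$X)      # far fewer columns than the original 1050
range(red$wl)
```

Fewer data points mean less work for every on-the-fly calibration, which is why the preparation of this file affects the speed as well as the predictions.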

Of course, the predictions will change depending on the RED file, so it is important to find the best configuration for this file by trial and error before implementing it in routine.

7 Jan 2013

LOCAL Equations with Win ISI 4 (Part: 1)

There is quite a lot of theory behind the LOCAL algorithms (from Infrasoft International); this is just a first, quick approach to them.

The idea is to develop an equation for a certain constituent, based on the most adequate samples from a database (a RED file which has the math treatments already applied). Of course, if there are no adequate samples we won't get a prediction. In that case we can send the sample to the lab, and add the spectrum with its corresponding lab value to the database for future reprocessing.

During the calibration (on the fly) procedure, a certain number of factors and a certain number of samples are chosen, always between the maximum and minimum that we have set in the prediction model configuration.

There is no risk of overfitting; the calibration is PLS.
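
The general idea can be sketched in a few lines of R. This is a rough illustration under my own assumptions and not the Win ISI implementation: the function local_predict and its arguments are hypothetical, correlation is assumed as the measure of similarity, the number of factors is fixed, and principal component regression (base R) is swapped in for PLS so that the example runs without extra packages. The spectra are simulated.

```r
# Hypothetical sketch of a "local" prediction; NOT the Win ISI algorithm.
# library.X: spectra (samples in rows), library.y: lab values,
# x.new: the spectrum of the unknown sample.
local_predict <- function(library.X, library.y, x.new,
                          min.samples = 15, max.samples = 30,
                          n.factors = 3, min.cor = 0.9) {
  # 1. Similarity of the unknown spectrum to every library spectrum
  r <- as.vector(cor(t(library.X), x.new))
  # 2. Keep the most similar samples, up to max.samples
  idx <- head(order(r, decreasing = TRUE), max.samples)
  idx <- idx[r[idx] >= min.cor]
  # Not enough adequate samples: no prediction (send the sample to the lab)
  if (length(idx) < min.samples) return(NA_real_)
  # 3. Calibrate on the fly with the selected samples only
  #    (principal component regression used here instead of PLS)
  pca <- prcomp(library.X[idx, , drop = FALSE], center = TRUE)
  k <- seq_len(n.factors)
  scores <- as.data.frame(pca$x[, k, drop = FALSE])
  fit <- lm(y ~ ., data = data.frame(y = library.y[idx], scores))
  # 4. Project the unknown spectrum and predict
  new.scores <- matrix(x.new - pca$center, nrow = 1) %*%
    pca$rotation[, k, drop = FALSE]
  unname(predict(fit, newdata = as.data.frame(new.scores)))
}

# Simulated library (not real data): two bands with varying concentrations
set.seed(42)
wl <- seq(1100, 2498, by = 2)
peak1 <- exp(-((wl - 1500) / 60)^2)
peak2 <- exp(-((wl - 2100) / 90)^2)
conc <- cbind(runif(200, 5, 15), runif(200, 20, 40))
library.X <- conc %*% rbind(peak1, peak2) / 100 +
  matrix(rnorm(200 * length(wl), sd = 1e-4), nrow = 200)
library.y <- conc[, 1]

# Unknown sample with a "true" value of 10 for the first constituent
x.new <- (10 * peak1 + 30 * peak2) / 100
pred <- local_predict(library.X, library.y, x.new)
pred

# A spectrum that resembles nothing in the library gives no prediction
pred.bad <- local_predict(library.X, library.y, rnorm(length(wl)))
pred.bad
```

The second call returns NA, which corresponds to the case described above where the sample goes to the lab and is later added to the database.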

I will include more posts and videos about this interesting calibration method.


3 Jan 2013

R and Data Mining: Examples and Case Studies

I really recommend downloading the PDF files for some of the chapters of the book
“R and Data Mining: Examples and Case Studies”. The link is:
http://www.rdatamining.com/docs

You can also download the R code and other interesting material to practice with R. It is very good for beginners.

You can see the full table of contents in the blog RDataMining.

1 Jan 2013

Standard Normal Variate (SNV: Other way)

This is another way to pre-treat a spectra set with the SNV math treatment
(Standard Normal Variate). The other one is described in a previous post.

In this post, I use the R function "sweep".

library(pls)   # provides the gasoline data set
data(gasoline)

# In a first step I calculate the average value
# of all the data points for every spectrum and
# subtract it from every data point of that
# spectrum. The spectra are in rows, so I transpose
# the matrix to use the base R function "colMeans".
# The mean value of every spectrum is now zero.
NIR.1 <- sweep(gasoline$NIR, MARGIN = 1,
               STATS = colMeans(t(gasoline$NIR)), FUN = "-")

# apply(..., 1, sd) calculates the SD of all the data
# points of every spectrum (sd() on a whole matrix
# returns a single value in current versions of R).
# We now divide the value of every data point
# by the SD of all the values of that spectrum.
NIR.2 <- sweep(NIR.1, MARGIN = 1,
               STATS = apply(NIR.1, 1, sd), FUN = "/")

# Now every spectrum has a mean of zero and an SD of 1.
# The gasoline spectra run from 900 to 1700 nm in 2 nm steps.
wavelengths <- seq(900, 1700, by = 2)
# Use matplot to plot the spectra.
matplot(wavelengths, t(NIR.2), type = "l", lty = 1,
        xlab = "nm", ylab = "log 1/R",
        main = "SNV Gasoline Spectra", col = "blue")

We have to take into account that in the gasoline matrix the rows are the
samples and the columns are the wavelengths, so we have to transpose the matrix
for some calculations.

Gasoline is a data set included in the “pls” package. It is not a good set for seeing the benefits
of the SNV math treatment (not enough scatter), but you can try
with other data sets such as "yarn".
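
If you want to check the result without installing any package, here is a minimal sketch that wraps the two "sweep" steps in a small helper and applies it to simulated spectra. The function name snv and the simulated data are my own assumptions for illustration; every simulated spectrum is the same band with a different offset and a different multiplicative factor, which is the kind of scatter effect SNV is meant to remove.

```r
# Hypothetical helper: SNV for a matrix with the samples in rows
snv <- function(X) {
  X <- as.matrix(X)
  # Subtract the mean of every spectrum from its data points
  centred <- sweep(X, MARGIN = 1, STATS = rowMeans(X), FUN = "-")
  # Divide every spectrum by its own standard deviation
  sweep(centred, MARGIN = 1, STATS = apply(centred, 1, sd), FUN = "/")
}

# Simulated spectra (not real data): one band shape with a different
# offset and multiplicative factor for each of 5 samples
set.seed(1)
wl <- seq(900, 1700, by = 2)
band <- exp(-((wl - 1200) / 80)^2)
X <- t(sapply(1:5, function(i) runif(1, 0, 0.5) + runif(1, 0.8, 1.2) * band))

X.snv <- snv(X)
round(rowMeans(X.snv), 10)       # mean of every spectrum
round(apply(X.snv, 1, sd), 10)   # SD of every spectrum

# Before and after: the treated spectra should overlap
matplot(wl, t(X), type = "l", lty = 1, xlab = "nm", main = "Simulated spectra")
matplot(wl, t(X.snv), type = "l", lty = 1, xlab = "nm", main = "After SNV")
```

Because the simulated spectra differ only in offset and scale, they fall on top of each other after the treatment; real spectra will not overlap completely, since they also differ in composition.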