R & Chemometrics: mayo 2012

29 may 2012

Exporting from Win ISI / Importing into R

Chemometric software´s have the option to export a matrix to a TXT file (in this case a constituents matrix), in a way we can import it easily into R, to work with. It is the first step to go into the R world.
I use in this case Win ISI Software, but sure you can do it with othersoftware´s as well.
See the video:
Exporting from Win ISI / Importing into R

Mahalanobis distance with "R" (Exercice)

I have developed this exercise with Excel in another post for the same calculations , I am going to develop it this time with "R".

These are data of lead concentration in fish

     Age Length Weight   mg/Kg

1    28    31   130.0    68.12
2    24    28   143.0   127.89
3    28    20   136.0   89.03
4    32    34   130.5    78.28
5    22    15   125.0 134.08
6    26    37   147.5   135.31
7    24    19   135.0   130.48
8    28    22   125.0   86.48
9    24    26   127.0   129.47
10   30    21   139.0   82.43
11   22    20   121.5   127.41
12   30    38   150.5    71.21
13   24    17   120.0   132.06
14   26    20   125.0    90.85
We import the data into R.
x<-read.table("C:\\lead_fish.txt",header=TRUE)
We are going to apply the Mahalanobis Distance formula:

D^2 = (x - μ)' Σ^-1 (x - μ)

We calculate μ (mean) with:

mean<-colMeans(x)

Age Length Weight mg/Kg
26.28571 24.85714 132.50000 105.93571

We calculate Σ (covariance matrix (Sx)) with:

Sx<-cov(x)

> Sx
Age Length Weight mg/kg
Age 9.758242 12.81319 12.07692 -72.15407
Length 12.813187 56.90110 49.11538 -70.62066
Weight 12.076923 49.11538 92.80769 -46.06962
mg/Kg -72.154066 -70.62066 -46.06962 714.00118

The default value for the Mahalanobis function is inverted=FALSE, so the function will calculate the inverse of Sx. If we calculated appart remember to change to TRUE.

See R help:

http://127.0.0.1:25215/library/stats/html/mahalanobis.html

O.K. Let´s go:

>D2<-mahalanobis(x,mean,Sx)

> D2
[1] 5.571677 2.863499 2.686127 7.766153 2.379621 6.366793 2.135347 1.538248
[9] 2.018812 5.143830 3.082734 5.470313 3.158651 1.818195

These are the values in the Diagonal Matrix we saw with the calculations in Excel.

25 may 2012

Monitor: Adding "RER" and "RPD" statistics

I continue developing the Monitor function in R. The idea is to get statistics which help me to understand the performance of my model.

Of course the validation set must be free of outliers (X or Y).

I add this time two new statistics: RER and RPD.

These statistics must be treated with caution, because depends of the range, standard deviation and number of samples for the validation set.

See the new video for the calculation of these statistics in "R":
Monitor: Adding "RER" and "RPD" statistics

23 may 2012

First Derivative (Prof. Gilbert Straw Lecture)

We talked in this blog quite often about the first derivative as one of the main math treatments to apply to the spectra. This video is good way to understand the calculus behind this math treatment.

Ferris Bueller's Day Off

It is always good to see Gilbert Strang (MIT) lectures. At one point of the lecture, Prof. Strang talks about one film, and justifies the idea of the protagonist.

This is the film Prof. Gilbert Strang talks about on his lecture,.., I remember the film but not the title so I have to use Google to find it.

Sadly correct calculus was not applied by the manufacturer and a disaster happens.

22 may 2012

Water Spectrum - "Aquaphotomics"

As a reply to an e-mail I have received from Ronald (thanks for reading this blog regularly) I write this post, hoping it will be helpful for people working with spectra of water.
Water Spectrum is as you can see in this link quite complex, that is the reason a new term appears to describe the understanding of interaction of water with light: Aquaphotomics. This name was given by Prof. Roumiana Tsenkova.
You will find insteresting this link:
http://nirslab.org/en/research/aquaphotomics-introduction.html
There are some articles in the NIRnews magazine about "Aquaphotonics", and one of them about the Aquaphotomics: wavelengths involved in the speciation of metal ions (Zn 2+ , Pb 2+ and Ag + ) in aqueous solutions.

20 may 2012

Campeonato IBER-GT (Circuito del Jarama)

Automovilismo, un álbum en Flickr.

From time to time I like to add pictures I take to this blog.
I´ve been this sunday morning testing the new lens of my camera.

Jornada de Setas: "Perrechico"

Gracias Nacho por transmitirnos una parte de tus conocimientos sobre este fascinante mundo de la Micología. Un abrazo.

18 may 2012

Homemade Spectroscopy

Can you imagine, a spectrophotometer with a digital photo frame for the source, salt and pepper shakers for the sample presentation and a digital camera for the detector?.

This work:

Spectroscopy for Secondary Students: Exploring the Limits of Analysis at Home

was awarded from CAMO at the Eight Winter Symposium on Chemometric with the second place in the poster presentations.

Eight Winter Symposium on Chemometrics (WSC-8) took place in Moscow oblast, Feb 27-March 2 2012 in park hotel Drakino with about 60 participants

CAMO awards for the best poster presentation were given to
             1st place: Alexej Skvortsov (SPb Polytechnic University)
             2d place: Chris Marks Richmond USA
             3d place: Elena Vostroknutova (Ural Electrochemical Integrated Plant)

Chemometrics in Excel - Training

http://rcs.chph.ras.ru/Tutorials/courses/CAC2012/CAC2012.pdf

"Some considerations about NIR Spectroscopy"

This is the title of an article by Pierre Dardenne, about his closing speech during the NIR 2009 Conference. Sure it will be very helpful for all of you.
It can be downloaded free in "pdf" format from the Web:

Some considerations about NIR Spectroscopy: Closing speech at NIR 2009.

17 may 2012

Monitor: Removing zero values from the data set.

I continue developing the Monitor function. This time a video from "r twotorials”:

“how to access different records within a data frame by using logical tests in r”, gave me the idea to remove the zero values from a data set.

When somebody give you a table like the one I show, if the laboratory did not analyze a sample for any reason, I write a cero. So cero is not a reference value.

It will be diferent if we have for example: 0,001 or 0,00005, in this case we are managing very low concentrations, but we don`t say just "0".

Anyway we can do a similar approach if we prefer "NA" indeed "0".

So, I prepare a new video with how the function performs right now:

Monitor: Removing zero values from the data set.

Hope you like.

15 may 2012

Improving script_002: “Monitor”

I read in an article that Ian Cowe said that what normally chemometricians do is to look to the graphics, of course interpret those graphics. So I still go on trying to develop a function can help me to understand the graphics and all the statistics there are behind.

I add some more lines to the monitor function:

plot(x~y,main="X-Y plot",xlab="predicted",ylab="reference")

abline(0,1,col="blue")

abline(intercept,slope,col="red")

abline(intercept+(2.5*sep),slope,col="red",lty=4)

abline(intercept-(2.5*sep),slope,col="red",lty=4)

I can do the same for the residual plot.

There are two warning lines which advice if the residual exceeds 2,5*SEP value. That is the T value warning limit.

Another line can be add, called the T value action limit (3*SEP).

Graphics show the 0-1 abline (blue) and the calculated slope-intercept abline (red). Limits are with dashed red lines.

We can see that almost both lines red and blue are almost over-plotted in this case.

14 may 2012

Livestock Food Productivity: Farmeron Foundations #3

Farmeron is a company founded by Matija Kopic. Really an interesting company.

Livestock Food Productivity: Farmeron Foundations #3

On Line: Reflectance Sample presentation

This is a way to present the sample to the instrument to work "on line" in reflectance mode. Diode Array detectors guarantee to acquire spectra every few seconds, and if the model is working properly all the production is under control.

10 may 2012

Should I adjust the Bias?

A bias or systematic error is quite common when monitoring predictions vs reference data. Anyway we must have certain control limits to decide if the Bias is significant or not.
Procedures (as for example ISO 12099 )give details about how to calculate the Bias Control Limits (BCL). The idea is a "T test" to calculate if the differences between the mean predicted values mean and the mean reference values are significant different from cero.

This limits will be a certain percentage of the SEP (Standard Error of Prediction).

We can add all this calculations into R and improve the Monitor function to receive a warning if the adjustment of the Bias is or not necessary.
I record this video: Should I adjust the Bias? to see how fast is R to run these monitor, and how we can customize the plots the way we like.
Function can be expanded with more warnings, like (for example) if it is necessary to adjust the slope.

6 may 2012

Improving script_001: “Monitor”

After having a look to this video: http://www.screenr.com/UxH8 from rtwotutorials, and reading some tutorials, I decided to modified the script from the previous post: Practicing Script with “ R”: Monitor ,

in order to make it more robust .

If there are NA values in our X and Y variables, the results for all the statistics will be NA (see also the video: http://www.screenr.com/loS8), that is not nice, so it´s better to write a warning and stop the analysis to check our data set.

So the new script is:

monitor3<-function(x,y){
x1<-is.na(x)
y1<-is.na(y)
if(mean(x1|y1)>0){
         print("There are NA values in X or Y, remove these samples for calculation")
        }else{
n<-length(y)
res<-y-x
par(mfrow=c(2,2))
hist(res,col="blue")
plot(x~y,xlab="predicted",ylab="reference")
abline(0,1,col="blue")
l<-seq(1:n)
plot(res~l,col=2)
abline(h=0,col="blue")
boxplot(x,y,col="green")
{rmsep<-sqrt(sum((y-x)^2)/n)
cat("RMSEP:",rmsep,"\n")}
{(bias<-mean(res))
cat("Bias :",bias,"\n")}
{sep<-sd(res)
cat("SEP :",sep,"\n")}
{r<-cor(x,y)
cat("Corr :",r,"\n")}
{rsq<-(r^2)
cat("RSQ :",rsq,"\n")}
}
}

After having some samples with NA values (not reference value for that sample or not predicted value) the output will be:

"There are NA values in X or Y, remove these samples for calculation"

If not it will continue with the calculations:

RMSEP: 0.4373108
Bias : 0.03814815
SEP : 0.4439425
Corr : 0.4864254
RSQ : 0.2366097

Possible improvements I think right now:

The function gives me the position of the samples with NA values, so location is easier

that the function removes these samples and the calculations can be done without them.

4 may 2012

Practicing Script with “ R”: Monitor

These are samples analyzed by a reference method (column: Protein) and by an analytical method with a certain model (column: IFTpro). The idea is to create a Monitor Report for some basic statistics (RMSEP, Bias, SEP, R,RSQ) to see how well the model performs.
Sample Protein  IFTpro
3      12.85    12.95
4      12.68    12.59
5      11.94    12.12
6      12.07    12.25
7      12.53    12.35
8      11.82    12.20
9      12.58    12.18
10     12.35    12.27
11     12.38    12.32
12     12.15   12.31
13     12.75    12.28
14     12.51    12.07
15     11.92    12.20
16     12.14   12.24
17     12.33    12.27
18     12.15    12.10
20     11.82    11.94
21     11.82   12.05
22     12.36    12.05
23     12.06    11.91
24     11.87    11.98
25     11.81   11.80
26     11.53    11.64
27     11.75    11.84
I take this as a practice with R to write some script.
This is the script:
monitor2<-function(x,y){
n<-length(y)
res<-y-x
par(mfrow=c(2,2))
hist(res,col="blue")
plot(x~y,xlab="predicted",ylab="reference",lty=1)
abline(0,1,col="blue")
l<-seq(1:n)
plot(res~l)
abline(0,0,col="blue")
{rmsep<-sqrt(sum((y-x)^2)/n)
cat("RMSEP:",rmsep,"\n")}
{(bias<-mean(res))
cat("Bias :",bias,"\n")}
{sep<-sd(res)
cat("SEP :",sep,"\n")}
{r<-cor(x,y)
cat("Corr :",r,"\n")}
{rsq<-(r^2)
cat("RSQ :",rsq,"\n")}
}The statistics for this case are:
> monitor2(semola1$Protein,semola1$IFTpro)
RMSEP: 0.2219797
Bias : -0.01083333
SEP : 0.2264838
Corr : 0.772607
RSQ : 0.5969215
And the plots:

I realized that there are a lot of things to improve. to make this script more robust. So I will continue reading tutorials, R help pages, and posts from R blogger,...looking at videos, webinars, reading books,.... to continue improving. Anyway, feel free to take this scrip and add to me feedback.

1 may 2012

Introduction to R Statistical Software: Application to Plant Breeding Webinar - eXtension

Introduction to R Statistical Software: Application to Plant Breeding Webinar - eXtension

Monitoring some statistics with "R"

I´ve been practicing after reading a couple of tutorials:

R: A self-learn tutorial

Programming in R

to create a basic function to monitor some basic statistics as RMSEP, Bias, SEP, Correlation and RSQ.

I´ve been doing this with other software`s, so it´s time for “R”. This is the script, please add feedback to improve it.

monitor2<-function(x,y){

n<-length(y)

res<-y-x

{rmsep<-sqrt(sum((y-x)^2)/n)

cat("RMSEP:",rmsep,"\n")}

{(bias<-mean(res))

cat("Bias :",bias,"\n")}

{sep<-sd(res)

cat("SEP :",sep,"\n")}

{r<-cor(x,y)

cat("Corr :",r,"\n")}

{rsq<-(r^2)

cat("RSQ :",rsq,"\n")}

}

Example:

> x<-c(1:10)

> y<-c(2:11)

> monitor2(x,y)

RMSEP: 1

Bias : 1

SEP : 0

Corr : 1

RSQ : 1