Chemometric software´s have the option to export a matrix to a TXT file (in this case a constituents matrix), in a way we can import it easily into R, to work with. It is the first step to go into the R world.
I use in this case Win ISI Software, but sure you can do it with othersoftware´s as well.
See the video:
Exporting from Win ISI / Importing into R
29 may 2012
Mahalanobis distance with "R" (Exercice)
I have developed this exercise with Excel in another post for the same calculations , I am going to develop it this time with "R".
These are data of lead concentration in fish
Age Length Weight mg/Kg
1 28 31 130.0 68.12
2 24 28 143.0 127.89
3 28 20 136.0 89.03
4 32 34 130.5 78.28
5 22 15 125.0 134.08
6 26 37 147.5 135.31
7 24 19 135.0 130.48
8 28 22 125.0 86.48
9 24 26 127.0 129.47
10 30 21 139.0 82.43
11 22 20 121.5 127.41
12 30 38 150.5 71.21
13 24 17 120.0 132.06
14 26 20 125.0 90.85
We import the data into R.
x<-read.table("C:\\lead_fish.txt",header=TRUE)
We are going to apply the Mahalanobis Distance formula:
D^2 = (x - μ)' Σ^-1 (x - μ)
We calculate μ (mean) with:
mean<-colMeans(x)
Age Length Weight mg/Kg
26.28571 24.85714 132.50000 105.93571
26.28571 24.85714 132.50000 105.93571
We calculate Σ (covariance matrix (Sx)) with:
Sx<-cov(x)
> Sx
Age Length Weight mg/kg
Age 9.758242 12.81319 12.07692 -72.15407
Length 12.813187 56.90110 49.11538 -70.62066
Weight 12.076923 49.11538 92.80769 -46.06962
mg/Kg -72.154066 -70.62066 -46.06962 714.00118
Age Length Weight mg/kg
Age 9.758242 12.81319 12.07692 -72.15407
Length 12.813187 56.90110 49.11538 -70.62066
Weight 12.076923 49.11538 92.80769 -46.06962
mg/Kg -72.154066 -70.62066 -46.06962 714.00118
The default value for the Mahalanobis function is inverted=FALSE, so the function will calculate the inverse of Sx. If we calculated appart remember to change to TRUE.
See R help:
O.K. Let´s go:
>D2<-mahalanobis(x,mean,Sx)
> D2
[1] 5.571677 2.863499 2.686127 7.766153 2.379621 6.366793 2.135347 1.538248
[9] 2.018812 5.143830 3.082734 5.470313 3.158651 1.818195
These are the values in the Diagonal Matrix we saw with the calculations in Excel.
[1] 5.571677 2.863499 2.686127 7.766153 2.379621 6.366793 2.135347 1.538248
[9] 2.018812 5.143830 3.082734 5.470313 3.158651 1.818195
These are the values in the Diagonal Matrix we saw with the calculations in Excel.
25 may 2012
Monitor: Adding "RER" and "RPD" statistics
I continue developing the Monitor function in R. The idea is to get statistics which help me to understand the performance of my model.
Of course the validation set must be free of outliers (X or Y).
I add this time two new statistics: RER and RPD.
These statistics must be treated with caution, because depends of the range, standard deviation and number of samples for the validation set.
See the new video for the calculation of these statistics in "R":Monitor: Adding "RER" and "RPD" statistics
23 may 2012
First Derivative (Prof. Gilbert Straw Lecture)
Ferris Bueller's Day Off
It is always good to see Gilbert Strang (MIT) lectures. At one point of the lecture, Prof. Strang talks about one film, and justifies the idea of the protagonist.
This is the film Prof. Gilbert Strang talks about on his lecture,.., I remember the film but not the title so I have to use Google to find it.
Sadly correct calculus was not applied by the manufacturer and a disaster happens.
22 may 2012
Water Spectrum - "Aquaphotomics"
As a reply to an e-mail I have received from Ronald (thanks for reading this blog regularly) I write this post, hoping it will be helpful for people working with spectra of water.
Water Spectrum is as you can see in this link quite complex, that is the reason a new term appears to describe the understanding of interaction of water with light: Aquaphotomics. This name was given by Prof. Roumiana Tsenkova.
You will find insteresting this link:
http://nirslab.org/en/research/aquaphotomics-introduction.html
There are some articles in the NIRnews magazine about "Aquaphotonics", and one of them about the Aquaphotomics: wavelengths involved in the speciation of metal ions (Zn 2+ , Pb 2+ and Ag + ) in aqueous solutions.
Water Spectrum is as you can see in this link quite complex, that is the reason a new term appears to describe the understanding of interaction of water with light: Aquaphotomics. This name was given by Prof. Roumiana Tsenkova.
You will find insteresting this link:
http://nirslab.org/en/research/aquaphotomics-introduction.html
There are some articles in the NIRnews magazine about "Aquaphotonics", and one of them about the Aquaphotomics: wavelengths involved in the speciation of metal ions (Zn 2+ , Pb 2+ and Ag + ) in aqueous solutions.
20 may 2012
Campeonato IBER-GT (Circuito del Jarama)
Automovilismo, un álbum en Flickr.
From time to time I like to add pictures I take to this blog.I´ve been this sunday morning testing the new lens of my camera.
Jornada de Setas: "Perrechico"
Gracias Nacho por transmitirnos una parte de tus conocimientos sobre este fascinante mundo de la Micología. Un abrazo.
18 may 2012
Homemade Spectroscopy
Can you imagine, a spectrophotometer with a digital photo frame for the source, salt and pepper shakers for the sample presentation and a digital camera for the detector?.
This work:
was awarded from CAMO at the Eight Winter Symposium on Chemometric with the second place in the poster presentations.
This work:
was awarded from CAMO at the Eight Winter Symposium on Chemometric with the second place in the poster presentations.
Eight Winter Symposium on Chemometrics (WSC-8) took place in Moscow oblast, Feb 27-March 2 2012 in park hotel Drakino with about 60 participants
CAMO awards for the best poster presentation were given to
1st place: Alexej Skvortsov (SPb Polytechnic University)
2d place: Chris Marks Richmond USA
3d place: Elena Vostroknutova (Ural Electrochemical Integrated Plant)
1st place: Alexej Skvortsov (SPb Polytechnic University)
2d place: Chris Marks Richmond USA
3d place: Elena Vostroknutova (Ural Electrochemical Integrated Plant)
"Some considerations about NIR Spectroscopy"
This is the title of an article by Pierre Dardenne, about his closing speech during the NIR 2009 Conference. Sure it will be very helpful for all of you.
It can be downloaded free in "pdf" format from the Web:
It can be downloaded free in "pdf" format from the Web:
Some considerations about NIR Spectroscopy: Closing speech at NIR 2009.
17 may 2012
Monitor: Removing zero values from the data set.
I continue developing the Monitor function. This time a video from "r twotorials”:
“how to access different records within a data frame by using logical tests in r”, gave me the idea to remove the zero values from a data set.
When somebody give you a table like the one I show, if the laboratory did not analyze a sample for any reason, I write a cero. So cero is not a reference value.
It will be diferent if we have for example: 0,001 or 0,00005, in this case we are managing very low concentrations, but we don`t say just "0".
Anyway we can do a similar approach if we prefer "NA" indeed "0".
So, I prepare a new video with how the function performs right now:
Hope you like.
15 may 2012
Improving script_002: “Monitor”
I read in an article that Ian Cowe said that what normally chemometricians do is to look to the graphics, of course interpret those graphics. So I still go on trying to develop a function can help me to understand the graphics and all the statistics there are behind.
I add some more lines to the monitor function:
plot(x~y,main="X-Y plot",xlab="predicted",ylab="reference")
abline(0,1,col="blue")
abline(intercept,slope,col="red")
abline(intercept+(2.5*sep),slope,col="red",lty=4)
abline(intercept-(2.5*sep),slope,col="red",lty=4)
I can do the same for the residual plot.
There are two warning lines which advice if the residual exceeds 2,5*SEP value. That is the T value warning limit.
Another line can be add, called the T value action limit (3*SEP).
Graphics show the 0-1 abline (blue) and the calculated slope-intercept abline (red). Limits are with dashed red lines.
14 may 2012
Livestock Food Productivity: Farmeron Foundations #3
On Line: Reflectance Sample presentation
This is a way to present the sample to the instrument to work "on line" in reflectance mode. Diode Array detectors guarantee to acquire spectra every few seconds, and if the model is working properly all the production is under control.
10 may 2012
Should I adjust the Bias?
A bias or systematic error is quite common when monitoring predictions vs reference data. Anyway we must have certain control limits to decide if the Bias is significant or not.
Procedures (as for example ISO 12099 )give details about how to calculate the Bias Control Limits (BCL). The idea is a "T test" to calculate if the differences between the mean predicted values mean and the mean reference values are significant different from cero.
Procedures (as for example ISO 12099 )give details about how to calculate the Bias Control Limits (BCL). The idea is a "T test" to calculate if the differences between the mean predicted values mean and the mean reference values are significant different from cero.
This limits will be a certain percentage of the SEP (Standard Error of Prediction).
We can add all this calculations into R and improve the Monitor function to receive a warning if the adjustment of the Bias is or not necessary.
I record this video: Should I adjust the Bias? to see how fast is R to run these monitor, and how we can customize the plots the way we like.
Function can be expanded with more warnings, like (for example) if it is necessary to adjust the slope.
I record this video: Should I adjust the Bias? to see how fast is R to run these monitor, and how we can customize the plots the way we like.
Function can be expanded with more warnings, like (for example) if it is necessary to adjust the slope.
6 may 2012
Improving script_001: “Monitor”
After having a look to this video: http://www.screenr.com/UxH8 from rtwotutorials, and reading some tutorials, I decided to modified the script from the previous post: Practicing Script with “ R”: Monitor ,
in order to make it more robust .
If there are NA values in our X and Y variables, the results for all the statistics will be NA (see also the video: http://www.screenr.com/loS8), that is not nice, so it´s better to write a warning and stop the analysis to check our data set.
So the new script is:
monitor3<-function(x,y){x1<-is.na(x)
y1<-is.na(y)
if(mean(x1|y1)>0){
print("There are NA values in X or Y, remove these samples for calculation")
}else{
n<-length(y)
res<-y-x
par(mfrow=c(2,2))
hist(res,col="blue")
plot(x~y,xlab="predicted",ylab="reference")
abline(0,1,col="blue")
l<-seq(1:n)
plot(res~l,col=2)
abline(h=0,col="blue")
boxplot(x,y,col="green")
{rmsep<-sqrt(sum((y-x)^2)/n)
cat("RMSEP:",rmsep,"\n")}
{(bias<-mean(res))
cat("Bias :",bias,"\n")}
{sep<-sd(res)
cat("SEP :",sep,"\n")}
{r<-cor(x,y)
cat("Corr :",r,"\n")}
{rsq<-(r^2)
cat("RSQ :",rsq,"\n")}
}
}
After having some samples with NA values (not reference value for that sample or not predicted value) the output will be:
"There are NA values in X or Y, remove these samples for calculation"If not it will continue with the calculations:
RMSEP: 0.4373108
Bias : 0.03814815
SEP : 0.4439425
Corr : 0.4864254
RSQ : 0.2366097
Possible improvements I think right now: Bias : 0.03814815
SEP : 0.4439425
Corr : 0.4864254
RSQ : 0.2366097
The function gives me the position of the samples with NA values, so location is easier
or
that the function removes these samples and the calculations can be done without them.
4 may 2012
Practicing Script with “ R”: Monitor
These are samples analyzed by a reference method (column: Protein) and by an analytical method with a certain model (column: IFTpro). The idea is to create a Monitor Report for some basic statistics (RMSEP, Bias, SEP, R,RSQ) to see how well the model performs.
Sample Protein IFTpro
3 12.85 12.95
4 12.68 12.59
5 11.94 12.12
6 12.07 12.25
7 12.53 12.35
8 11.82 12.20
9 12.58 12.18
10 12.35 12.27
11 12.38 12.32
12 12.15 12.31
13 12.75 12.28
14 12.51 12.07
15 11.92 12.20
16 12.14 12.24
17 12.33 12.27
18 12.15 12.10
20 11.82 11.94
21 11.82 12.05
22 12.36 12.05
23 12.06 11.91
24 11.87 11.98
25 11.81 11.80
26 11.53 11.64
27 11.75 11.84
I take this as a practice with R to write some script.
This is the script:
monitor2<-function(x,y){
n<-length(y)
res<-y-x
par(mfrow=c(2,2))
hist(res,col="blue")
plot(x~y,xlab="predicted",ylab="reference",lty=1)
abline(0,1,col="blue")
l<-seq(1:n)
plot(res~l)
abline(0,0,col="blue")
{rmsep<-sqrt(sum((y-x)^2)/n)
cat("RMSEP:",rmsep,"\n")}
{(bias<-mean(res))
cat("Bias :",bias,"\n")}
{sep<-sd(res)
cat("SEP :",sep,"\n")}
{r<-cor(x,y)
cat("Corr :",r,"\n")}
{rsq<-(r^2)
cat("RSQ :",rsq,"\n")}
}The statistics for this case are:
> monitor2(semola1$Protein,semola1$IFTpro)
RMSEP: 0.2219797
Bias : -0.01083333
SEP : 0.2264838
Corr : 0.772607
RSQ : 0.5969215
And the plots:
Sample Protein IFTpro
3 12.85 12.95
4 12.68 12.59
5 11.94 12.12
6 12.07 12.25
7 12.53 12.35
8 11.82 12.20
9 12.58 12.18
10 12.35 12.27
11 12.38 12.32
12 12.15 12.31
13 12.75 12.28
14 12.51 12.07
15 11.92 12.20
16 12.14 12.24
17 12.33 12.27
18 12.15 12.10
20 11.82 11.94
21 11.82 12.05
22 12.36 12.05
23 12.06 11.91
24 11.87 11.98
25 11.81 11.80
26 11.53 11.64
27 11.75 11.84
I take this as a practice with R to write some script.
This is the script:
monitor2<-function(x,y){
n<-length(y)
res<-y-x
par(mfrow=c(2,2))
hist(res,col="blue")
plot(x~y,xlab="predicted",ylab="reference",lty=1)
abline(0,1,col="blue")
l<-seq(1:n)
plot(res~l)
abline(0,0,col="blue")
{rmsep<-sqrt(sum((y-x)^2)/n)
cat("RMSEP:",rmsep,"\n")}
{(bias<-mean(res))
cat("Bias :",bias,"\n")}
{sep<-sd(res)
cat("SEP :",sep,"\n")}
{r<-cor(x,y)
cat("Corr :",r,"\n")}
{rsq<-(r^2)
cat("RSQ :",rsq,"\n")}
}The statistics for this case are:
> monitor2(semola1$Protein,semola1$IFTpro)
RMSEP: 0.2219797
Bias : -0.01083333
SEP : 0.2264838
Corr : 0.772607
RSQ : 0.5969215
And the plots:
I realized that there are a lot of things to improve. to make this script more robust. So I will continue reading tutorials, R help pages, and posts from R blogger,...looking at videos, webinars, reading books,.... to continue improving. Anyway, feel free to take this scrip and add to me feedback.
1 may 2012
Monitoring some statistics with "R"
I´ve been practicing after reading a couple of tutorials:
to create a basic function to monitor some basic statistics as RMSEP, Bias, SEP, Correlation and RSQ.
I´ve been doing this with other software`s, so it´s time for “R”. This is the script, please add feedback to improve it.
monitor2<-function(x,y){
n<-length(y)
res<-y-x
{rmsep<-sqrt(sum((y-x)^2)/n)
cat("RMSEP:",rmsep,"\n")}
{(bias<-mean(res))
cat("Bias :",bias,"\n")}
{sep<-sd(res)
cat("SEP :",sep,"\n")}
{r<-cor(x,y)
cat("Corr :",r,"\n")}
{rsq<-(r^2)
cat("RSQ :",rsq,"\n")}
}
Example:
> x<-c(1:10)
> y<-c(2:11)
> monitor2(x,y)
RMSEP: 1
Bias : 1
SEP : 0
Corr : 1
RSQ : 1
Suscribirse a:
Entradas (Atom)