## 9 mar. 2012

### NIT: Fatty acids study in R - Part 005 (Calibration)

There are several algorithms for running a PLS regression (I recommend consulting the books “Introduction to Multivariate Statistical Analysis in Chemometrics” by Kurt Varmuza & Peter Filzmoser and “Chemometrics with R” by Ron Wehrens).
We are going to use the pls package, and we are going to develop a calibration for the constituent that looks most promising: Oleic Acid (C18:1).
Of course, we are going to use the MSC pretreatment. For cross-validation we are going to use “leave one out”.

I set a maximum of 16 PLS terms.
library(pls)
# Step [1]: PLS model with up to 16 terms and leave-one-out cross-validation
C18_1_reg <- plsr(C18_1 ~ NITmsc, ncomp = 16, data = fattyac_msc,
  validation = "LOO")

Looking at the cross-validation statistics, 12 seems to be the best number of terms to use. Anyway, let's look at the plots.
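par(mfrow = c(1, 2))                            # two plots side by side
plot(C18_1_reg, "validation", estimate = "CV")  # CV RMSEP against number of terms
abline(v = 12, col = "red")                     # mark the 12-term model
plot(C18_1_reg, ncomp = 12, "prediction")       # predicted vs. lab values, 12 terms
abline(0, 1, col = "red")                       # line of perfect agreement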

We can see that the RMSEP increases after term 12.
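
The same reading can be checked numerically. What follows is only a minimal sketch: the fatty acid file is not included here, so it uses the “gasoline” NIR data set that ships with the pls package as a stand-in (my assumption, not the data of this post), and the names fit, rmsep_cv and best are just illustrative.

```r
library(pls)

# Stand-in data (assumption): gasoline NIR set from pls, NOT the fatty acid data
data(gasoline)

# Same kind of call as step [1]: up to 16 terms, leave-one-out cross-validation
fit <- plsr(octane ~ msc(NIR), ncomp = 16, data = gasoline, validation = "LOO")

# RMSEP() returns an object whose $val array holds the error estimates;
# the first value belongs to the intercept-only model (0 terms)
cv <- RMSEP(fit, estimate = "CV")
rmsep_cv <- drop(cv$val)           # 17 values: 0, 1, ..., 16 terms

# Position of the minimum, shifted by one because position 1 means 0 terms
best <- which.min(rmsep_cv) - 1

print(round(rmsep_cv, 3))
cat("Number of terms with the lowest CV RMSEP:", best, "\n")
```

With the real data, the model from step [1] would take the place of fit. The term with the lowest RMSEP is only a starting point; if a smaller model gives almost the same error, it is usually worth considering.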
In the XY plot we can see a few samples (such as sample 219) with a high residual, probably due to a wrong lab value in some cases. If you prefer, you can look at the residual plot, where sample 219 appears in the upper right corner.
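
A residual plot of this kind could be drawn along the following lines. This is a sketch under assumptions: it again uses the “gasoline” data from the pls package as a stand-in for the fatty acid data, it plots the calibration residuals stored in the model (not cross-validated ones), and the names fit, fitted_k, resid_k and worst are only illustrative.

```r
library(pls)

# Stand-in data (assumption): gasoline NIR set from pls, NOT the fatty acid data
data(gasoline)
fit <- plsr(octane ~ msc(NIR), ncomp = 16, data = gasoline, validation = "LOO")

k <- 12                                # number of terms, the value chosen in the post
fitted_k <- fit$fitted.values[, 1, k]  # fitted values of the k-term model
resid_k  <- fit$residuals[, 1, k]      # lab value minus fitted value

# Residuals against fitted values, using the row number as plotting symbol
plot(fitted_k, resid_k, type = "n",
     xlab = "fitted", ylab = "residual (lab - fitted)")
text(fitted_k, resid_k, labels = seq_along(resid_k), cex = 0.7)
abline(h = 0, col = "red")

# Row numbers of the three largest absolute residuals
worst <- order(abs(resid_k), decreasing = TRUE)[1:3]
print(worst)
```

Labelling the points with the row number makes it easy to match a point in the plot with a row of the data frame.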

If we check the residual list (C18_1_reg$residuals), the extreme sample 66 fits the model well, so we decide to keep it. Now we can remove sample 219 from the training set by removing its row from the data frame.
C18_1_reg <- plsr(C18_1 ~ NITmsc, ncomp = 16,
  data = fattyac1_msc, validation = "LOO")
We repeat step [1], but this time we replace "fattyac_msc" with "fattyac1_msc", and look at the plots and statistics again. In this case I overlay the plots with the outlier (red) and without the outlier (blue).
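
The removal and the overlay can be sketched as follows. Again this is only an illustration under assumptions: the “gasoline” data from the pls package stands in for the fatty acid data, the row out_row is a hypothetical outlier picked just for the example, and only the RMSEP curves are overlaid. With the real data the equivalent step would presumably be something like fattyac1_msc <- fattyac_msc[-219, ], assuming that sample 219 sits in row 219.

```r
library(pls)

# Stand-in data (assumption): gasoline NIR set from pls, NOT the fatty acid data
data(gasoline)
out_row <- 5                        # hypothetical outlier row, for illustration only

# Negative indexing drops the row; the original data frame is left untouched
gasoline1 <- gasoline[-out_row, ]

# Same model fitted with and without the removed sample
fit_all <- plsr(octane ~ msc(NIR), ncomp = 16, data = gasoline,
                validation = "LOO")
fit_red <- plsr(octane ~ msc(NIR), ncomp = 16, data = gasoline1,
                validation = "LOO")

# Cross-validated RMSEP for 0..16 terms in both cases
cv_all <- drop(RMSEP(fit_all, estimate = "CV")$val)
cv_red <- drop(RMSEP(fit_red, estimate = "CV")$val)

# Overlay: red = with the sample, blue = without it
n_terms <- 0:16
plot(n_terms, cv_all, type = "l", col = "red",
     ylim = range(cv_all, cv_red),
     xlab = "number of terms", ylab = "RMSEP (CV)")
lines(n_terms, cv_red, col = "blue")
legend("topright", legend = c("with outlier", "without outlier"),
       col = c("red", "blue"), lty = 1)
```

Keeping both models under different names makes the comparison easier than overwriting the first one.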

We keep this last model and we will test it in the future with new samples (independent test set).

If you want to follow this tutorial, please send me an e-mail. I'll send you the “txt” file as an attachment.