## 3 mar. 2012

### NIT: Fatty acids study in R - Part 002

> library(chemometrics)
> fatmsc_nipals<-nipals(fat_msc,a=10,it=160)
> CPs<-seq(1,10,by=1)
> matplot(CPs,t(fatmsc_nipals\$T),lty=1,pch=21,
+ xlab="PC_number",ylab="Explained_Var")
In the 2D plot, we can see that with 3 or 4 principal components, almost all the variance is explained. We see also how samples are well projected over the first PC, but how one sample seems to be a clear outlier when projected over the second PC.
Also another (or the same) sample seems to be an outlier when projected over the 4th PC.

A look to the PC planes will show us the distribution of the samples, for all the combinations of the first four Principal Components.
> pairs(fatmsc_nipals\$T[,1:4],col="red")

Let´s calculate the Mahalanobis distances to study better our sample population and to find outliers:
> fatmsc_nipals4pc<-fatmsc_nipals\$T[,1:4]
> Moutlier(fatmsc_nipals4pc,quantile =0.975, plot=TRUE)
\$md
[1]  3.1025142  1.6787234  1.1155225  1.7130001  2.6460755  1.3341133
[7]  1.6411590  1.8790763  0.8541428  1.1332246  2.3236420  2.8866269
[13]  1.3146715  1.2759619  1.4875943  1.0603605  1.4880568  0.9665029
[19]  1.8239722  1.6562767  1.9359163  1.6708538  2.2936930  1.4845431
[25]  2.6051294  1.1436901  0.3571686  2.0533289  1.4248877  0.7107637
[31]  1.2504387  2.0814050  1.8502729  3.1433570  2.2836840  0.3547710
[37]  2.9424866  2.6208234  0.6914464  2.9649615  1.7914432  2.2730964
[43]  3.0530848  1.1569603  1.5458923  1.3590878  1.9744677  1.8299434
[49]  1.6564926  2.7850876  2.9147344  2.9858931  2.0337672  2.6220121
[55]  2.4169714  0.9873321  2.7614810  3.4578931  4.3510717  2.1840045
[61]  3.4219424  2.6471133  2.5050841  1.6500068  2.2036638 11.6174968
[67]  3.1637271  3.0938694  2.3489142  2.5605777  1.7651892  0.7602064
[73]  0.8169349  1.1276683  1.0530317  0.9008947  1.5501520  0.8291586
[79]  1.8831524  0.5590048  2.3312774  2.0025709  0.7148548  1.9298735
[85]  1.7581300  1.9388953  1.4556749  2.0408671  1.7715642  2.5011261
[91]  4.4534119  2.8088303  4.2640203  0.9677583  6.1127505  1.2239764
[97]  5.9621142  0.9987361  1.5365592  0.8917701  0.6152401  0.8996054
[103]  1.8370282  1.3580873  0.7873400  0.9220825  1.8619488  1.9298884
[109]  1.4912294  0.9832971  0.9842641  1.2018128  0.7935046  0.8925428
[115]  1.2003102  1.4462257  1.2691323  1.8269249  1.2838734  0.9981628
[121]  1.9145605  1.7954542  1.5230153  1.3347716  1.1156095  1.5871748
[127]  1.4889242  1.1780966  1.4165463  1.0057897  1.6742841  1.7999796
[133]  1.2231126  1.3167038  1.7676869  1.7475316  1.5718934  0.7844088
[139]  0.7250911  0.8394164  0.9434329  1.3583476  0.9143295  1.5666855
[145]  0.8250539  0.5027369  1.6273106  1.8940848  0.8493707  1.4611669
[151]  0.3644340  0.7813530  1.6332761  1.0557438  1.2848675  1.0695355
[157]  1.7891441  0.6474083  0.8387371  0.9655893  1.6508979  1.4765710
[163]  2.6846350  1.9820580  2.0689903  1.5834826  1.2542036  0.8494160
[169]  1.3529783  0.8451586  1.6718654  2.5892144  1.3678979  1.4070544
[175]  1.3870741  1.2010282  1.3446915  1.4648297  1.4599712  1.5161282
[181]  1.2140609  2.1280737  1.1751724  1.5939065  0.8337121  1.0548981
[187]  1.2061079  1.1519596  1.4011917  1.1339365  1.3009569  1.1758361
[193]  0.9313623  0.9973675  1.3783733  1.3145118  1.4065661  2.2898204
[199]  1.3149368  1.6195627  1.3458978  1.1028901  1.5325457  1.4918670
[205]  1.6747645  1.0730898  1.3003462  2.2767521  1.2188084  1.4188156
[211]  1.2551781  1.1094945  1.7552917  1.6537534  1.0851287  1.1067528
[217]  1.4062079  1.6325028  2.0682626

\$cutoff
[1] 3.338156

We can see that sample 66 has a MD>11, so we can suspect which sample is looking to the plots, but you can selected and overplotted to be sure:

matplot(wavelengths,fat_msc[66,],lty=1,type="l",
col="red",lwd=3,pch=NULL,xlab="nm",ylab="abs")
par(new=TRUE)
matplot(wavelengths,t(fat_msc),lty=2,type="l",
pch=NULL,xlab="nm",ylab="abs")

If you want to follow this tutorial, please send me an e_mail. I´ll send you the “txt” file attached, or a dropbox link.