> library(chemometrics)
> fatmsc_nipals<-nipals(fat_msc,a=10,it=160)
> CPs<-seq(1,10,by=1)
> matplot(CPs,t(fatmsc_nipals$T),lty=1,pch=21,
+ xlab="PC_number",ylab="Explained_Var")
In the 2D plot, we can see that with 3 or 4 principal components, almost all the variance is explained. We see also how samples are well projected over the first PC, but how one sample seems to be a clear outlier when projected over the second PC.Also another (or the same) sample seems to be an outlier when projected over the 4th PC.
A look to the PC planes will show us the distribution of the samples, for all the combinations of the first four Principal Components.
> pairs(fatmsc_nipals$T[,1:4],col="red")
Let´s calculate the Mahalanobis distances to study better our sample population and to find outliers:
> fatmsc_nipals4pc<-fatmsc_nipals$T[,1:4]
> Moutlier(fatmsc_nipals4pc,quantile =0.975, plot=TRUE)
$md
[1] 3.1025142 1.6787234 1.1155225 1.7130001 2.6460755 1.3341133
[7] 1.6411590 1.8790763 0.8541428 1.1332246 2.3236420 2.8866269
[13] 1.3146715 1.2759619 1.4875943 1.0603605 1.4880568 0.9665029
[19] 1.8239722 1.6562767 1.9359163 1.6708538 2.2936930 1.4845431
[25] 2.6051294 1.1436901 0.3571686 2.0533289 1.4248877 0.7107637
[31] 1.2504387 2.0814050 1.8502729 3.1433570 2.2836840 0.3547710
[37] 2.9424866 2.6208234 0.6914464 2.9649615 1.7914432 2.2730964
[43] 3.0530848 1.1569603 1.5458923 1.3590878 1.9744677 1.8299434
[49] 1.6564926 2.7850876 2.9147344 2.9858931 2.0337672 2.6220121
[55] 2.4169714 0.9873321 2.7614810 3.4578931 4.3510717 2.1840045
[61] 3.4219424 2.6471133 2.5050841 1.6500068 2.2036638 11.6174968
[67] 3.1637271 3.0938694 2.3489142 2.5605777 1.7651892 0.7602064
[73] 0.8169349 1.1276683 1.0530317 0.9008947 1.5501520 0.8291586
[79] 1.8831524 0.5590048 2.3312774 2.0025709 0.7148548 1.9298735
[85] 1.7581300 1.9388953 1.4556749 2.0408671 1.7715642 2.5011261
[91] 4.4534119 2.8088303 4.2640203 0.9677583 6.1127505 1.2239764
[97] 5.9621142 0.9987361 1.5365592 0.8917701 0.6152401 0.8996054
[103] 1.8370282 1.3580873 0.7873400 0.9220825 1.8619488 1.9298884
[109] 1.4912294 0.9832971 0.9842641 1.2018128 0.7935046 0.8925428
[115] 1.2003102 1.4462257 1.2691323 1.8269249 1.2838734 0.9981628
[121] 1.9145605 1.7954542 1.5230153 1.3347716 1.1156095 1.5871748
[127] 1.4889242 1.1780966 1.4165463 1.0057897 1.6742841 1.7999796
[133] 1.2231126 1.3167038 1.7676869 1.7475316 1.5718934 0.7844088
[139] 0.7250911 0.8394164 0.9434329 1.3583476 0.9143295 1.5666855
[145] 0.8250539 0.5027369 1.6273106 1.8940848 0.8493707 1.4611669
[151] 0.3644340 0.7813530 1.6332761 1.0557438 1.2848675 1.0695355
[157] 1.7891441 0.6474083 0.8387371 0.9655893 1.6508979 1.4765710
[163] 2.6846350 1.9820580 2.0689903 1.5834826 1.2542036 0.8494160
[169] 1.3529783 0.8451586 1.6718654 2.5892144 1.3678979 1.4070544
[175] 1.3870741 1.2010282 1.3446915 1.4648297 1.4599712 1.5161282
[181] 1.2140609 2.1280737 1.1751724 1.5939065 0.8337121 1.0548981
[187] 1.2061079 1.1519596 1.4011917 1.1339365 1.3009569 1.1758361
[193] 0.9313623 0.9973675 1.3783733 1.3145118 1.4065661 2.2898204
[199] 1.3149368 1.6195627 1.3458978 1.1028901 1.5325457 1.4918670
[205] 1.6747645 1.0730898 1.3003462 2.2767521 1.2188084 1.4188156
[211] 1.2551781 1.1094945 1.7552917 1.6537534 1.0851287 1.1067528
[217] 1.4062079 1.6325028 2.0682626
We can see that sample 66 has a MD>11, so we can suspect which sample is looking to the plots, but you can selected and overplotted to be sure:
matplot(wavelengths,fat_msc[66,],lty=1,type="l",
col="red",lwd=3,pch=NULL,xlab="nm",ylab="abs")
par(new=TRUE)
matplot(wavelengths,t(fat_msc),lty=2,type="l",
pch=NULL,xlab="nm",ylab="abs")
If you want to follow this tutorial, please send me an e_mail. I´ll send you the “txt” file attached, or a dropbox link.
No hay comentarios:
Publicar un comentario