Sending all 47 samples to the lab would be too costly, so we want to send a subset that captures the structure of the new fish3 samples and at the same time adds new variability to the current database. This way, the next calibration we build will be more representative of the fish meal population, hopefully giving better accuracy for the fish4 set, and so on.
This time the idea is to do it manually: project the samples and select a certain number of them based on the money we want to spend on laboratory analysis.
We saw how to calculate the projections in a previous post, so here we simply draw the new season's scores (in red) over the PC space of the previous seasons, with the scores for fish1 and fish2 in blue.
#PC1 vs PC2
drawMahal2(fish_1d_all_pc$T[,c(1,2)],
           center=apply(fish_1d_all_pc$T[,c(1,2)],2,mean),
           covariance=cov(fish_1d_all_pc$T[,c(1,2)]),
           quantile=0.975,xlab="PC1",ylab="PC2",col="blue",
           main="Fish3 projected on Fish1+2 PCs")
par(new=TRUE)
drawMahal2(fish3_1d_T[,c(1,2)],
           center=apply(fish_1d_all_pc$T[,c(1,2)],2,mean),
           covariance=cov(fish_1d_all_pc$T[,c(1,2)]),
           quantile=0.975,xlab="PC1",ylab="PC2",col="red")
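For readers who did not see the earlier post, the projection step can be sketched roughly as follows. This is only an illustration under assumptions: the spectra are simulated, `prcomp` stands in for whatever PCA routine produced `fish_1d_all_pc`, and the names `X_old`, `X_new`, `pca_old` and `T_new` are hypothetical. The point it shows is that the new samples are centred with the means of the old seasons and multiplied by the old loadings, so that old and new scores live in the same PC space.

```r
# Simulated stand-ins for the spectra (the real matrices are not shown in this post)
set.seed(1)
X_old <- matrix(rnorm(100 * 20), nrow = 100)  # hypothetical Fish1+2 spectra
X_new <- matrix(rnorm(47 * 20), nrow = 47)    # hypothetical Fish3 spectra

# PCA on the old seasons only (mean-centred, not scaled)
pca_old <- prcomp(X_old, center = TRUE, scale. = FALSE)

# Project the new samples: centre with the OLD means,
# then multiply by the OLD loadings (first 5 PCs)
X_new_c <- sweep(X_new, 2, pca_old$center, "-")
T_new   <- X_new_c %*% pca_old$rotation[, 1:5]

# predict() applies the same centring and rotation, so both routes should agree
T_check <- predict(pca_old, newdata = X_new)[, 1:5]
all.equal(unname(T_new), unname(T_check))
```

Centring with the means of the new season instead would shift the red cloud onto the blue one and hide exactly the differences we are trying to see.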
We notice that many of the Fish3 samples are outside of the ellipse formed by Fish1 + Fish2 samples. We can check the other score maps:
The score maps show clearly that many of the samples are outliers.
We can calculate the Mahalanobis distance for every sample to check how many are outside the cutoff limits:
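# Mean score vector of the Fish1+2 samples (centre of the PC space)
fish1y2_T_mean<-colMeans(fish_1d_all_pc$T)
# Mahalanobis distance of each Fish3 sample to that centre, using the first 5 PCs
fish3_d_mahal<-sqrt(mahalanobis(fish3_1d_T[,1:5],
                                fish1y2_T_mean[1:5],
                                cov(fish_1d_all_pc$T[,1:5])))
fish3_d_mahal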
     1F3      2F3      3F3      4F3      5F3      6F3
1.808546 2.410371 1.817420 2.674472 3.882431 8.894017
     7F3      8F3      9F3     10F3     11F3     12F3
3.956726 4.129471 4.194344 4.608712 5.637123 4.561250
    13F3     14F3     15F3     16F3     17F3     18F3
5.020347 3.427444 3.330089 3.348741 4.039378 4.006860
    19F3     20F3     21F3     22F3     23F3     24F3
7.992560 8.237418 5.813657 5.672932 6.258595 5.949936
    25F3     26F3     27F3     28F3     29F3     30F3
5.763188 6.121845 7.013205 6.370521 6.109572 3.944506
    31F3     32F3     33F3     34F3     35F3     36F3
2.576640 2.675160 3.993163 6.187790 7.092179 7.118996
    37F3     38F3     39F3     40F3     41F3     42F3
3.460574 4.061820 4.202667 2.000492 1.806180 3.880191
    43F3     44F3     45F3     46F3     47F3
3.871592 4.937029 3.415351 4.433763 8.009396
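To turn these distances into a count, we need a cutoff, which is not given explicitly above. The sketch below assumes one: the square root of the 97.5% chi-square quantile with 5 degrees of freedom (one per PC), by analogy with the `quantile=0.975` used for the ellipses. The distances are copied from the output above; the names `d`, `cutoff` and `outside` are only illustrative.

```r
# Distances copied from the output printed above (samples 1F3 ... 47F3)
d <- c(1.808546, 2.410371, 1.817420, 2.674472, 3.882431, 8.894017,
       3.956726, 4.129471, 4.194344, 4.608712, 5.637123, 4.561250,
       5.020347, 3.427444, 3.330089, 3.348741, 4.039378, 4.006860,
       7.992560, 8.237418, 5.813657, 5.672932, 6.258595, 5.949936,
       5.763188, 6.121845, 7.013205, 6.370521, 6.109572, 3.944506,
       2.576640, 2.675160, 3.993163, 6.187790, 7.092179, 7.118996,
       3.460574, 4.061820, 4.202667, 2.000492, 1.806180, 3.880191,
       3.871592, 4.937029, 3.415351, 4.433763, 8.009396)
names(d) <- paste0(1:47, "F3")

# ASSUMED cutoff: sqrt of the 97.5% chi-square quantile, 5 degrees of freedom
cutoff <- sqrt(qchisq(0.975, df = 5))

# Flag and count the samples beyond the cutoff
outside <- d > cutoff
sum(outside)
names(d)[outside]
```

With this assumed cutoff (about 3.58), 34 of the 47 Fish3 samples fall outside. A different quantile or number of PCs would of course change the count.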
Without any doubt there are samples in this set that will expand the database, but we have to capture the structure of this variability while removing samples that are close neighbours of one another.
We proceed with this option in the next post.
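As a rough preview of what "capturing the structure while dropping neighbours" could look like in code, here is a greedy maximin selection in the spirit of Kennard-Stone. It is a swapped-in technique for illustration, not necessarily the manual procedure used in the next post. The scores are simulated, the distances are plain Euclidean distances in score space, and `select_maximin` and the budget `k` are hypothetical names.

```r
# Hypothetical scores: 47 samples x 5 PCs (stand-in for the Fish3 scores)
set.seed(42)
scores <- matrix(rnorm(47 * 5), nrow = 47,
                 dimnames = list(paste0(1:47, "F3"), paste0("PC", 1:5)))

# Pick k samples so that each new pick is as far as possible
# from the samples already chosen (neighbours are skipped naturally)
select_maximin <- function(scores, k) {
  D <- as.matrix(dist(scores))  # Euclidean distances between all samples
  # Start from the sample farthest from the centroid of the scores
  centred <- sweep(scores, 2, colMeans(scores), "-")
  chosen <- unname(which.max(rowSums(centred^2)))
  while (length(chosen) < k) {
    # Distance from every sample to its nearest already-chosen sample
    nearest <- apply(D[, chosen, drop = FALSE], 1, min)
    nearest[chosen] <- -Inf     # never pick the same sample twice
    chosen <- c(chosen, unname(which.max(nearest)))
  }
  rownames(scores)[chosen]
}

k <- 10  # hypothetical budget: number of samples we can afford to send
selected <- select_maximin(scores, k)
selected
```

Because each pick maximises the distance to its nearest already-selected sample, two samples sitting next to each other are unlikely to both be sent, and `k` ties the selection directly to the laboratory budget.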