15 mar 2020

Tidyverse and Chemometrics (part 16): Projecting a new season

We developed the fish meal calibration with the Fish1 and Fish2 seasons (see previous post). Meanwhile, we have acquired spectra on the NIR instrument for 47 samples from a new season (Fish3). The model processes these samples and gives results, and one obvious question is: how accurate are those results?

Sending all 47 samples to the lab would be too costly, so we want to send a subset that captures the structure of the new Fish3 samples and, at the same time, adds new variability to the current database. That way, the next calibration we develop will be more representative of the fish meal population, and we can hope for better accuracy on the Fish4 set, and so on.

The idea this time is to do it manually: project the samples and select a certain number of them based on the money we want to spend on laboratory analysis.

We saw how to calculate the projections in a previous post, so here we simply draw the new season scores (in red) over the PC space of the previous seasons, with the Fish1 and Fish2 scores shown in blue.

#PC1 vs PC2
# Fish1+2 scores (blue) with the ellipse at the 0.975 quantile
drawMahal2(fish_1d_all_pc$T[,c(1,2)],
          center=apply(fish_1d_all_pc$T[,c(1,2)],2,mean),
          covariance=cov(fish_1d_all_pc$T[,c(1,2)]),
          quantile=0.975,xlab="PC1",ylab="PC2",col="blue",
          main="Fish3 projected on Fish1+2 PCs")
# Overlay the next plot on the same device
par(new=TRUE)
# Fish3 projected scores (red), drawn with the Fish1+2 center and covariance
drawMahal2(fish3_1d_T[,c(1,2)],
          center=apply(fish_1d_all_pc$T[,c(1,2)],2,mean),
          covariance=cov(fish_1d_all_pc$T[,c(1,2)]),
          quantile=0.975,xlab="PC1",ylab="PC2",col="red")
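
As a reminder of what "projecting" means here, the following is a minimal sketch with simulated data, not the code from the earlier post. It assumes a plain PCA fitted with `prcomp` as a stand-in for the model behind `fish_1d_all_pc`; the names `X_old`, `X_new`, `P`, `T_old` and `T_new` are hypothetical. The key point is that the new samples are centered with the means of the old samples and then multiplied by the old loadings.

```r
set.seed(1)

# Simulated stand-ins for the pre-treated spectra (hypothetical data)
X_old <- matrix(rnorm(60 * 10), nrow = 60)              # previous seasons
X_new <- matrix(rnorm(12 * 10, mean = 0.5), nrow = 12)  # new season

# PCA fitted on the previous seasons only
pca  <- prcomp(X_old, center = TRUE, scale. = FALSE)
n_pc <- 5
P     <- pca$rotation[, 1:n_pc]   # loadings of the old PC space
T_old <- pca$x[, 1:n_pc]          # scores of the old samples

# Projection: center the new samples with the OLD means, then apply the loadings
T_new <- scale(X_new, center = pca$center, scale = FALSE) %*% P

# Cross-check against predict(), which performs the same operation
all.equal(unname(T_new), unname(predict(pca, X_new)[, 1:n_pc]))
```

In this sketch `T_old` and `T_new` play the roles of `fish_1d_all_pc$T` and `fish3_1d_T` in the plot above.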



We notice that many of the Fish3 samples fall outside the ellipse formed by the Fish1 + Fish2 samples. We can check the other score maps:

The score maps clearly show that many of the samples are outliers.
We can calculate the Mahalanobis distance of every sample to check how many fall outside the cutoff limits:

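# Mean score vector of the Fish1+2 samples
fish1y2_T_mean<-colMeans(fish_1d_all_pc$T)
# Mahalanobis distance of each Fish3 sample to the Fish1+2 center (first 5 PCs)
fish3_d_mahal<-sqrt(mahalanobis(fish3_1d_T[,c(1:5)],
                    fish1y2_T_mean[c(1:5)],
                    cov(fish_1d_all_pc$T[,c(1:5)])))
fish3_d_mahal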

     1F3      2F3      3F3      4F3      5F3      6F3 
1.808546 2.410371 1.817420 2.674472 3.882431 8.894017 
     7F3      8F3      9F3     10F3     11F3     12F3 
3.956726 4.129471 4.194344 4.608712 5.637123 4.561250 
    13F3     14F3     15F3     16F3     17F3     18F3 
5.020347 3.427444 3.330089 3.348741 4.039378 4.006860 
    19F3     20F3     21F3     22F3     23F3     24F3 
7.992560 8.237418 5.813657 5.672932 6.258595 5.949936 
    25F3     26F3     27F3     28F3     29F3     30F3 
5.763188 6.121845 7.013205 6.370521 6.109572 3.944506 
    31F3     32F3     33F3     34F3     35F3     36F3 
2.576640 2.675160 3.993163 6.187790 7.092179 7.118996 
    37F3     38F3     39F3     40F3     41F3     42F3 
3.460574 4.061820 4.202667 2.000492 1.806180 3.880191 
    43F3     44F3     45F3     46F3     47F3 
3.871592 4.937029 3.415351 4.433763 8.009396 
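
The post does not state which cutoff is used, so the following is only a sketch under an assumption: that the limit is the square root of the 0.975 chi-square quantile with 5 degrees of freedom, matching the `quantile=0.975` of the score plots and the 5 PCs used in the distance. The distances are copied from the output above; the names `d`, `cutoff` and `outside` are hypothetical.

```r
# Mahalanobis distances copied from the output above
d <- c(1.808546, 2.410371, 1.817420, 2.674472, 3.882431, 8.894017,
       3.956726, 4.129471, 4.194344, 4.608712, 5.637123, 4.561250,
       5.020347, 3.427444, 3.330089, 3.348741, 4.039378, 4.006860,
       7.992560, 8.237418, 5.813657, 5.672932, 6.258595, 5.949936,
       5.763188, 6.121845, 7.013205, 6.370521, 6.109572, 3.944506,
       2.576640, 2.675160, 3.993163, 6.187790, 7.092179, 7.118996,
       3.460574, 4.061820, 4.202667, 2.000492, 1.806180, 3.880191,
       3.871592, 4.937029, 3.415351, 4.433763, 8.009396)
names(d) <- paste0(seq_along(d), "F3")

# Assumed cutoff: chi-square quantile with as many df as PCs in the distance
n_pc   <- 5
cutoff <- sqrt(qchisq(0.975, df = n_pc))

outside <- d > cutoff
sum(outside)                             # number of samples beyond the cutoff
names(d)[outside]                        # which samples they are
sort(d[outside], decreasing = TRUE)      # most extreme samples first
```

A different quantile or a fixed limit (for example 3) would change the count, so the cutoff should be set to whatever rule the calibration normally uses.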

Without any doubt, there are samples in this set that will expand the database, but we have to capture the structure of this variability, removing samples that could be neighbours.

We proceed with this option in the next post. 
