7 feb 2020

Tidyverse and Chemometrics (part 11): Removing redundant samples

In the last post we saw how the fish2 spectra were projected onto the principal component space built from the fish1 spectra. Some of the fish2 samples clearly fell outside the ellipse, while others fell inside it but not very close to the fish1 samples. At the same time, some fish2 samples are neighbors of each other, and some are probably neighbors of fish1 samples.
We can measure the distance from one sample to another with the Mahalanobis distance. We usually call this the “Mahalanobis Neighborhood Distance”, and depending on the threshold we set around each sample, more or fewer samples are selected as neighbors.
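To make the idea of a neighborhood threshold concrete, here is a minimal base R sketch. The toy `scores` matrix, the `maha_pairwise` helper and the threshold of 1 are made up for illustration; they are not the fish data, and library functions may scale the distance differently, so threshold values are not directly comparable.

```r
# Hypothetical toy scores: 6 samples on 2 principal components.
set.seed(1)
scores <- matrix(rnorm(12), nrow = 6, ncol = 2)

# Pairwise Mahalanobis distances between the rows of a score matrix.
# Assumption: the covariance is estimated from these same scores.
maha_pairwise <- function(scores) {
  s_inv <- solve(cov(scores))
  n <- nrow(scores)
  d <- matrix(0, n, n)
  for (i in seq_len(n)) {
    # mahalanobis() returns SQUARED distances from every row to row i
    d2 <- mahalanobis(scores, center = scores[i, ], cov = s_inv,
                      inverted = TRUE)
    d[, i] <- sqrt(pmax(d2, 0))
  }
  d
}

d <- maha_pairwise(scores)

# Two samples are neighbors when their distance is below the threshold
# (a sample is not counted as its own neighbor).
threshold <- 1
neighbors <- d < threshold & row(d) != col(d)
n_neighbors <- colSums(neighbors)
n_neighbors
```

Raising `threshold` makes more samples count as neighbors, and lowering it makes fewer, which is the trade-off described above.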
The first thing I want to do is remove neighbors within fish2, in order to select a percentage (approx. 50%) of the fish2 samples to send to the lab for reference analysis. For this I use the function “puchwein” from the “prospectr” package, and I note the selected samples to make a “fish2_selected” file. In this case, “puchwein” creates a new PC space with the fish2 samples and measures the distances between their scores. With this configuration 21 samples are selected, so 19 are redundant.
 
library(prospectr)
# pc=0.95 keeps the PCs explaining 95% of the variance of fish2
pchw_fish2<-puchwein(nir_fish2_2d,pc=0.95,k=0.2,
                     min.sel=20,details=TRUE,
                     .center = TRUE,.scale = FALSE)
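The idea of dropping redundant samples can be sketched in base R as a greedy, single-pass neighbor removal. This is a simplified illustration and not the prospectr implementation of “puchwein”; the toy `scores`, the helper `select_non_redundant` and the `limit` of 1 are hypothetical.

```r
# Hypothetical toy scores: 40 samples on 3 components.
set.seed(42)
scores <- matrix(rnorm(120), nrow = 40, ncol = 3)

# Greedy single-pass neighbor removal (simplified sketch).
select_non_redundant <- function(scores, limit) {
  s_inv <- solve(cov(scores))
  # Squared Mahalanobis distance of every sample to the center
  h <- mahalanobis(scores, colMeans(scores), s_inv, inverted = TRUE)
  # Visit the samples from the most extreme to the most central
  candidates <- order(h, decreasing = TRUE)
  selected <- integer(0)
  while (length(candidates) > 0) {
    pick <- candidates[1]
    selected <- c(selected, pick)
    # Distance from the picked sample to every remaining candidate
    d <- sqrt(mahalanobis(scores[candidates, , drop = FALSE],
                          scores[pick, ], s_inv, inverted = TRUE))
    # Drop the pick itself and everything inside its neighborhood
    candidates <- candidates[d > limit & candidates != pick]
  }
  selected
}

selected <- select_non_redundant(scores, limit = 1)
redundant <- setdiff(seq_len(nrow(scores)), selected)
c(selected = length(selected), redundant = length(redundant))
```

A larger `limit` widens each neighborhood, so fewer samples are selected and more are flagged as redundant.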
Now we have to check whether the selected samples have neighbors among the fish1 samples, so we use fDiss from prospectr to compute the distances between the fish1 scores and the scores obtained by projecting the 21 selected samples onto the fish1 PC space. With a threshold of 0.2, four of the selected samples have neighbors in fish1 (positions 5, 13, 19 and 20), so we remove them from the selected set, which leaves 17 samples (21 minus 4) to send to the laboratory.
 
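# Distances between the fish1 scores and the projected fish2 selected scores
summary(fDiss(Xr=nir2d_pc$T[,1:5],
             X2=fish2_2d_T_sel1[,1:5],
             method = "mahalanobis",
             center = TRUE, scaled = FALSE))
# Selected fish2 samples that have a fish1 neighbor (threshold 0.2)
neig_f2f1<-fish2_2d_T_sel1[c(5,13,19,20),]
# Final set to send to the laboratory
fish2_2d_T_sel2<-fish2_2d_T_sel1[-c(5,13,19,20),]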
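Instead of reading the positions off the summary by eye, the threshold check can be sketched in base R as below. Everything here is hypothetical: the toy `fish1_scores` and `fish2_scores`, the helper `cross_maha`, and the choice to estimate the covariance from the reference set only, which may differ from what fDiss does internally.

```r
# Hypothetical toy scores on 5 components (not the real fish data).
set.seed(7)
fish1_scores <- matrix(rnorm(100), nrow = 20, ncol = 5)
fish2_scores <- matrix(rnorm(50), nrow = 10, ncol = 5)
# Force one fish2 sample to sit almost on top of a fish1 sample
fish2_scores[3, ] <- fish1_scores[8, ] + 0.01

# Cross distances: rows = reference samples, columns = new samples.
# Assumption: covariance estimated from the reference scores only.
cross_maha <- function(ref, new) {
  s_inv <- solve(cov(ref))
  sapply(seq_len(nrow(new)), function(j) {
    sqrt(mahalanobis(ref, new[j, ], s_inv, inverted = TRUE))
  })
}

d <- cross_maha(fish1_scores, fish2_scores)

# A new sample has a neighbor when its smallest distance to any
# reference sample is below the threshold.
threshold <- 0.2
has_neighbor <- which(apply(d, 2, min) < threshold)

neighbors_of_fish1 <- fish2_scores[has_neighbor, , drop = FALSE]
keep <- setdiff(seq_len(nrow(fish2_scores)), has_neighbor)
fish2_kept <- fish2_scores[keep, , drop = FALSE]
has_neighbor
```

Using `setdiff` rather than negative indexing is deliberate: if no sample falls under the threshold, `x[-integer(0), ]` would return zero rows instead of all of them.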
Once that is done, we can overplot all the samples again:
  • Fish1 samples in blue.
  • Fish2 redundant samples in red.
  • Fish2 selected samples in green.
  • Fish2 neighbors of Fish1 in orange.
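A colour-coded overplot like this one can be sketched with base R graphics, used here only to keep the example free of dependencies. The data frame `scores_df`, the group sizes, and the helpers `make_group` and `plot_groups` are hypothetical stand-ins for the real score tables.

```r
# Hypothetical PC1/PC2 scores for the four groups (toy sizes).
set.seed(3)
make_group <- function(n, label) {
  data.frame(PC1 = rnorm(n), PC2 = rnorm(n), group = label)
}
scores_df <- rbind(
  make_group(30, "fish1"),
  make_group(10, "fish2 redundant"),
  make_group(8,  "fish2 selected"),
  make_group(3,  "fish2 neighbor of fish1")
)

# Colour key matching the list above
group_cols <- c("fish1" = "blue",
                "fish2 redundant" = "red",
                "fish2 selected" = "green",
                "fish2 neighbor of fish1" = "orange")

# Scatter plot of the first two scores, one colour per group
plot_groups <- function(df, cols) {
  plot(df$PC1, df$PC2,
       col = cols[as.character(df$group)], pch = 19,
       xlab = "PC1", ylab = "PC2")
  legend("topright", legend = names(cols), col = cols, pch = 19)
}
```

Calling `plot_groups(scores_df, group_cols)` draws the four groups on the same axes with a legend.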



Now we wait for the laboratory values. See you in the next post.
