10 Jan 2020

Tidyverse and Chemometrics (part 3)

Once we have settled on a math treatment that suits our spectra, we can check visually for outliers. We found none in the second derivative spectra of the fish meal samples, so we move on to checking the samples in the principal component space.

For this calculation I recommend loading the "chemometrics" library and running the "nipals" algorithm, passing the X matrix (in our case treated with the second derivative) and the number of PCs we want to check.

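library(chemometrics)
# NIPALS PCA on the second derivative spectra, keeping 10 PCs.
# The result is named nir_pc, the object used by the drawMahal calls.
nir_pc <- nipals(nir_2d, 10)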

#Two matrices are generated:
# T = Scores Matrix
# P = Loadings Matrix
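To give an idea of what the algorithm does, here is a minimal base R sketch of the NIPALS iteration. It is not the implementation found in the chemometrics package: the function name nipals_sketch, the starting vector and the convergence rule are assumptions chosen for illustration, and it runs on synthetic data instead of nir_2d.

```r
# Minimal NIPALS sketch (illustrative only, not the chemometrics code).
nipals_sketch <- function(X, a, it = 100, tol = 1e-8) {
  Xh <- scale(X, center = TRUE, scale = FALSE)   # mean-center the columns
  scores   <- matrix(0, nrow(Xh), a)
  loadings <- matrix(0, ncol(Xh), a)
  for (h in seq_len(a)) {
    # Assumed starting score vector: the column with the largest variance
    th <- Xh[, which.max(apply(Xh, 2, var))]
    for (i in seq_len(it)) {
      ph <- crossprod(Xh, th) / sum(th^2)        # regress columns on the scores
      ph <- ph / sqrt(sum(ph^2))                 # normalize the loading vector
      th_new <- Xh %*% ph                        # update the scores
      converged <- sum((th_new - th)^2) < tol * sum(th_new^2)
      th <- th_new
      if (converged) break
    }
    Xh <- Xh - th %*% t(ph)                      # deflate: remove this component
    scores[, h]   <- th
    loadings[, h] <- ph
  }
  list(T = scores, P = loadings)
}

# Synthetic stand-in for the spectra: 40 samples, 20 variables, 3 latent factors
set.seed(1)
X <- matrix(rnorm(40 * 3), 40, 3) %*% diag(c(5, 2, 1)) %*%
  matrix(rnorm(3 * 20), 3, 20) + matrix(rnorm(40 * 20, sd = 0.01), 40, 20)
res <- nipals_sketch(X, 3)
dim(res$T)   # one row per sample, one column per PC
dim(res$P)   # one row per variable, one column per PC
```

The point of the sketch is the shape of the result: T has one row per sample and P one row per wavelength, so the scores maps below only need columns of T.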


Now we are going to work with the T matrix. We will see how the scores describe ellipses in the PC space, which is a curious characteristic of NIR spectra, and we will define the limits of those ellipses based on the Mahalanobis distance.

The chemometrics package has the function "drawMahal", to which we pass the T matrix calculated with NIPALS to draw the ellipses. There is one ellipse for each pair of PCs, which is what we call a PC scores map; in general, an ellipsoid is drawn in the whole PC space.
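The same limit can be checked numerically. The sketch below uses only base R and synthetic scores (not our fish meal data), and it assumes that the 0.975 ellipse corresponds to the chi-square quantile with 2 degrees of freedom, which is the usual convention for this kind of plot. The planted outlier in row 1 is hypothetical.

```r
set.seed(1)
# Stand-in for two columns of the T matrix: 40 samples, 2 PCs
scores <- matrix(rnorm(40 * 2), ncol = 2)
scores[1, ] <- c(6, 0)                 # hypothetical outlier along the first PC

center     <- colMeans(scores)
covariance <- cov(scores)

# mahalanobis() returns SQUARED distances to the center
d2 <- mahalanobis(scores, center, covariance)

# Squared-distance cutoff for the 97.5% ellipse in two dimensions
cutoff <- qchisq(0.975, df = ncol(scores))

# Samples falling outside the ellipse
outside <- which(d2 > cutoff)
outside
```

Samples listed in outside are the ones that would fall beyond the 0.975 ellipse on the scores map.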

Here is the code to check the maps PC1 vs PC2, PC1 vs PC3 and PC2 vs PC3:

# Scores map PC1 vs PC2
drawMahal(nir_pc$T[, c(1, 2)],
          center = colMeans(nir_pc$T[, c(1, 2)]),
          covariance = cov(nir_pc$T[, c(1, 2)]),
          quantile = 0.975, xlab = "PC1", ylab = "PC2",
          col = "blue")

# Scores map PC1 vs PC3
drawMahal(nir_pc$T[, c(1, 3)],
          center = colMeans(nir_pc$T[, c(1, 3)]),
          covariance = cov(nir_pc$T[, c(1, 3)]),
          quantile = 0.975, xlab = "PC1", ylab = "PC3",
          col = "blue")

# Scores map PC2 vs PC3
drawMahal(nir_pc$T[, c(2, 3)],
          center = colMeans(nir_pc$T[, c(2, 3)]),
          covariance = cov(nir_pc$T[, c(2, 3)]),
          quantile = 0.975, xlab = "PC2", ylab = "PC3",
          col = "blue")




As we can see, the 40 samples are widely distributed, and one of them is an outlier along PC1. We want to fill the space with more samples while spending as little as possible on laboratory analysis, and this will be the idea throughout this tutorial.
What is the reason for the outlier? Checking the raw spectra, and especially the math-treated spectra, we can see that one of the samples stands apart from the rest: its water band, at approximately 1940 nm, is lower than in the others. This means that this sample is drier than the others, which is why it is an outlier on the first principal component.
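A check of this kind can be sketched in base R. Everything in the example is synthetic and hypothetical: the wavelength axis, the Gaussian shape used for the water band and the dry sample planted in row 7 are assumptions for illustration, not our fish meal spectra.

```r
set.seed(42)
wavelengths <- seq(1100, 2498, by = 2)      # hypothetical wavelength axis (nm)
n <- 40

# Synthetic raw absorbance: flat baseline plus a water band centered at 1940 nm
band  <- exp(-((wavelengths - 1940) / 30)^2)
water <- rnorm(n, mean = 0.5, sd = 0.03)    # band height for each sample
water[7] <- 0.2                             # one hypothetical dry sample
spectra <- 0.3 + outer(water, band) +
  matrix(rnorm(n * length(wavelengths), sd = 0.002), n)

# Column closest to 1940 nm and the band height of every sample there
col_1940    <- which.min(abs(wavelengths - 1940))
band_height <- spectra[, col_1940]

# The sample with the lowest water band is the driest one
driest <- which.min(band_height)
driest
```

With real data, the same comparison would be run on the raw spectra matrix and its own wavelength vector, and the result compared with the sample flagged on the PC1 axis.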

 

