17 Oct 2021

Modelling complex spectral data (soil) with the resemble package (V)

This post continues with the vignette “Modelling complex spectral data with the resemble package” (Leonardo Ramirez-Lopez and Alexandre M.J.-C. Wadoux).

Continuing from the last post: once we have calculated the principal components (11 in the last PCA, which used the OPC method), each pair of components (for example PC1-PC2, PC2-PC3, PC1-PC3, …) defines a plane. The training spectra are projected onto these components to obtain the score of every spectrum on each PC, and all those scores are stored in a score matrix “T”. Together the projections form a cloud. With just two components it would be a 2D cloud, which makes it easy to interpret the distance between each sample and the mean, or between a sample and its neighbors. With more dimensions the cloud is multivariate and visual inspection becomes more difficult, so we have to examine the projections one at a time on 2D planes or in 3D plots.
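As a rough illustration of what the score matrix contains, the sketch below projects centred data onto PCA loadings. It uses synthetic random data and base R's prcomp() rather than the soil spectra and the resemble function from the post. The names X, P and T_scores are made up for the example (T_scores is used instead of T, because T is shorthand for TRUE in R).

```r
# Minimal sketch with synthetic data (NOT the soil spectra used in the post)
set.seed(1)
X <- matrix(rnorm(50 * 20), nrow = 50, ncol = 20)  # 50 "spectra", 20 "wavelengths"

# Centre the data and obtain the loadings with base R's prcomp()
pca <- prcomp(X, center = TRUE, scale. = FALSE)
P <- pca$rotation[, 1:3]            # loadings of the first three components

# Scores: projection of the centred spectra onto the loadings
X_centred <- sweep(X, 2, pca$center)
T_scores <- X_centred %*% P         # one row per spectrum, one column per PC

# Compare the manual projection with the scores stored by prcomp()
all.equal(unname(T_scores), unname(pca$x[, 1:3]))
```

Each row of T_scores holds the coordinates of one spectrum in the reduced space, and plotting any two columns against each other gives one of the planes described above.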

Measures such as the Mahalanobis distance to the mean or to the neighbors help us decide whether a sample may be an outlier, whether it has very close neighbors (so it is already represented by samples that are, in theory, similar), or whether it has no close neighbors and is therefore a good sample to add to the database to improve its structure and make it more robust.
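A minimal sketch of both distances, assuming a score matrix is already available. Here it is simulated with random numbers and one deliberately extreme sample. The cutoff of 3 is only an assumed rule of thumb for this example, not a value taken from the vignette, and the object names are hypothetical.

```r
set.seed(1)
# Hypothetical score matrix: 100 samples x 3 components (simulated)
scores <- matrix(rnorm(100 * 3), ncol = 3)
scores[1, ] <- c(6, -6, 6)           # plant one extreme sample

# Squared Mahalanobis distance of every sample to the centre of the cloud
d2 <- mahalanobis(scores, center = colMeans(scores), cov = cov(scores))
md <- sqrt(d2)

# Assumed rule of thumb for this sketch: flag samples with distance > 3
outliers <- which(md > 3)

# Distance of each sample to its nearest neighbour.
# Whitening the scores makes Euclidean distance equal to Mahalanobis distance.
whitened <- scores %*% solve(chol(cov(scores)))
nn <- as.matrix(dist(whitened))
diag(nn) <- Inf                      # ignore the distance of a sample to itself
nearest <- apply(nn, 1, min)
```

A sample with a large md is a candidate outlier. A sample with a small value in nearest already has a close neighbor, while a large value suggests it covers a sparsely populated region of the cloud.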

Let's look at one of those score planes for the previous code, the one formed by PC1 and PC2:

# Scores of the training set on the PC1-PC2 plane
plot(pca_tr_opc$scores[, 1], pca_tr_opc$scores[, 2],
     xlim = range(pca_tr_opc$scores[, 1]),
     ylim = range(pca_tr_opc$scores[, 2]),
     xlab = "PC1", ylab = "PC2")

We can project the testing data onto the same plane to obtain the scores of those samples:

pca_projected <- predict(pca_tr_opc, newdata = testing$spc_p)

# Overlay the testing scores (red) on the existing training plot
points(pca_projected[, 1], pca_projected[, 2], col = "red")

If we had only two PCs, this plane would be enough to show the whole cloud, but we have 11 PCs. Adding one more dimension to the plot (3D) lets us see the cloud more clearly.

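library(plotly)
# Training scores as a data frame for plotly
T_training <- as.data.frame(pca_tr_opc$scores)
# Interactive 3D scatter plot of the first three components
plot_ly(T_training, x = ~T_training[, 1], y = ~T_training[, 2],
        z = ~T_training[, 3], type = "scatter3d", mode = "markers",
        alpha = 0.7)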

Try it yourself: rotate the plot to view the cloud from different angles.
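Beyond three components, one possible way to inspect several 2D planes at once is a scatterplot matrix. The sketch below uses base R's pairs() on a simulated score matrix. The matrix and the training/testing split are hypothetical stand-ins, not the objects from the post.

```r
set.seed(1)
# Hypothetical score matrix standing in for the real scores (simulated)
scores <- matrix(rnorm(200 * 4), ncol = 4,
                 dimnames = list(NULL, paste0("PC", 1:4)))

# Hypothetical grouping: first 150 rows "training", last 50 "testing"
set_colour <- rep(c("black", "red"), times = c(150, 50))

# One panel per pair of components (PC1-PC2, PC1-PC3, ...)
pairs(scores, col = set_colour, pch = 16, cex = 0.6)
```

With real scores, a subset of columns (for example the first four or five) keeps the panels readable.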
