Continuing with the vignette "Modelling complex spectral data with the resemble package" (Leonardo Ramirez-Lopez and Alexandre M.J.-C. Wadoux)
Continuing from the last post: once we have calculated the principal components (11 in the case of the last PCA run with the OPC method), we can define planes from combinations of two of those components (for example PC1-PC2, PC2-PC3, PC1-PC3, ...), and the training spectra are projected onto each plane to obtain the scores of every spectrum on each component. All those scores are kept in a score matrix "T". The projections form a cloud of points which, in the case of just two components, is a 2D cloud, making it easy to interpret the distances between every sample and the mean or its neighbors. With more dimensions the cloud is multivariate and visual inspection becomes more difficult, so we have to check the projections individually in 2D or 3D planes.
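As a quick sketch of what that score matrix looks like (assuming, as the code further below suggests, that T_training is simply the scores element of the PCA object from the previous post), we can extract it and check its dimensions:
# the score matrix "T" for the training set: one row per spectrum, one column per component
T_training <- pca_tr_opc$scores
dim(T_training)          # number of training spectra x number of components (11 here)
head(T_training[, 1:2])  # scores on the first two components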
Algorithms like the Mahalanobis distance to the mean or to the neighbors help us check whether a sample could be an outlier, whether it has very close neighbors (so it is already represented by similar samples in the database), or whether it has no close neighbors and is therefore a good sample to improve the structure of the database and make it more robust.
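As a rough illustration of that idea (using the base R mahalanobis() function rather than a specific resemble helper), the distance of every training sample to the centre of the score cloud could be computed like this:
# squared Mahalanobis distance of each training sample to the mean of the scores
d2 <- mahalanobis(T_training,
                  center = colMeans(T_training),
                  cov = cov(T_training))
# unusually large distances point to candidate outliers
head(sort(d2, decreasing = TRUE))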
Let's look, for the previous code, at one of those score planes: the one formed by PC1 and PC2:
plot(pca_tr_opc$scores[, 1], pca_tr_opc$scores[, 2], xlab = "PC1", ylab = "PC2",
     ylim = c(min(pca_tr_opc$scores[, 2]), max(pca_tr_opc$scores[, 2])))
We can project the testing data onto the same plane to obtain the scores of those samples:
# project the testing spectra onto the PC space of the training set
pca_projected <- predict(pca_tr_opc, newdata = testing$spc_p)
# overlay the testing scores (in red) on the previous plot, keeping the same axis limits
par(new = TRUE)
plot(pca_projected[, 1], pca_projected[, 2], col = "red",
     xlim = c(min(pca_tr_opc$scores[, 1]), max(pca_tr_opc$scores[, 1])),
     ylim = c(min(pca_tr_opc$scores[, 2]), max(pca_tr_opc$scores[, 2])),
     xlab = " ", ylab = " ")
To inspect three components at once we can make an interactive 3D scatter plot of the training scores with the plotly package:
library(plotly)   # plot_ly() comes from the plotly package
# T_training is the training score matrix (pca_tr_opc$scores)
plot_ly(x = ~T_training[, 1], y = ~T_training[, 2],
        z = ~T_training[, 3], alpha = 0.7)
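To see the testing samples in that same 3D view, one option (a sketch, assuming pca_projected has been computed as above) is to add them as a second trace in red:
plot_ly(x = ~T_training[, 1], y = ~T_training[, 2], z = ~T_training[, 3],
        type = "scatter3d", mode = "markers", alpha = 0.7, name = "training") %>%
  add_trace(x = ~pca_projected[, 1], y = ~pca_projected[, 2], z = ~pca_projected[, 3],
            type = "scatter3d", mode = "markers",
            marker = list(color = "red"), name = "testing")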