This is post number 7 in the series about the vignette "Modelling complex spectral data with the resemble package" (Leonardo Ramirez-Lopez and Alexandre M.J.-C. Wadoux), where we try to understand how the resemble package works and use its functions to analyze complex products such as soil.
We continue from post 6, where we saw how the dissimilarity matrix is calculated in the principal component space (orthogonal space), how new samples can be projected into that space to get their scores, and how the dissimilarity matrix between the training set and the test set is calculated.
Now the idea is to select, for every sample in the test set, a certain number of training samples that are its nearest neighbors; the number of neighbors to choose is set with "knn". These selected training samples are set apart and a new principal component space is calculated from them alone, which gives a new dissimilarity matrix with new distance values. This new dissimilarity matrix has NA values for the training samples that were not selected.
This is the part of the vignette called "Combine k-nearest neighbors and dissimilarity measures in the orthogonal space", where several examples with a "knn" value of 200 are developed using different PCA methods, so you can practice.
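To make the two steps described above more concrete, here is a minimal NumPy sketch. It is not the resemble code and it does not call resemble: the function names (`pca_fit`, `score_distance`, `local_distances`), the simulated data, the number of components (5) and the choice of a Euclidean distance on variance-scaled scores (a Mahalanobis-type distance in the score space) are all assumptions made for illustration. The scaling that resemble applies to its distances may differ.

```python
import numpy as np

def pca_fit(X, n_comp):
    """Center X and return (mean, loadings, standard deviation of each score)."""
    mu = X.mean(axis=0)
    # SVD of the centered matrix: rows of Vt are the PC loadings
    _, s, Vt = np.linalg.svd(X - mu, full_matrices=False)
    loadings = Vt[:n_comp].T
    score_sd = s[:n_comp] / np.sqrt(X.shape[0] - 1)
    return mu, loadings, score_sd

def score_distance(X_ref, x_new, n_comp):
    """Distance from x_new to every row of X_ref in the PC space built on X_ref.
    Scores are scaled to unit variance (assumed Mahalanobis-type distance)."""
    mu, P, sd = pca_fit(X_ref, n_comp)
    t_ref = (X_ref - mu) @ P / sd   # scores of the reference samples
    t_new = (x_new - mu) @ P / sd   # projected scores of the new sample
    return np.sqrt(((t_ref - t_new) ** 2).sum(axis=1))

def local_distances(X_train, x_test, k, n_comp):
    """Global distances, then distances recomputed in a PCA of the k neighbors."""
    d_global = score_distance(X_train, x_test, n_comp)
    nn = np.argsort(d_global)[:k]                 # indices of the k nearest
    d_local = np.full(X_train.shape[0], np.nan)   # NaN plays the role of NA
    d_local[nn] = score_distance(X_train[nn], x_test, n_comp)
    return d_global, d_local, nn

# Simulated data standing in for the spectra (hypothetical, not the vignette data)
rng = np.random.default_rng(0)
X_train = rng.normal(size=(500, 20))
x_test = rng.normal(size=20) + 1.5   # a test sample shifted away from the cloud

d_global, d_local, nn = local_distances(X_train, x_test, k=200, n_comp=5)
print("non-selected training samples (NaN):", int(np.isnan(d_local).sum()))
```

With 500 training samples and k = 200, the 300 training samples that are not selected keep a missing value in the local distance vector, which mirrors the NA entries in the new dissimilarity matrix.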
For the first test set sample, the histogram of the distances between that sample and the training samples is:
We have chosen a test sample that lies quite far from the majority of the training samples, but you can try other test set samples and you will get different distributions.
Setting apart the 200 closest samples and computing a new PCA on them, the NH distances from this sample to the rest are: