Continuing with the vignette "Modelling complex spectral data with the resemble package" (Leonardo Ramirez-Lopez and Alexandre M.J.-C. Wadoux).
Imagine that just two principal components were enough to explain 99% of the variance, so that we could see all the samples in a single plane. It is then easy to calculate the distance between one sample and all the rest: just draw a line to each of them and measure its length. We can then write these values in a matrix whose diagonal is zero (the distance between a sample and itself). In this case we are working with the training set, so we have 618 samples (618 dots) and the matrix has 618 rows and 618 columns (618x618).
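To make the idea concrete, here is a minimal sketch in plain Python (not the resemble package, which is written in R). It assumes plain Euclidean distance between the scores on the two components, and the four samples and their score values are made up for illustration. The helper name `distance_matrix` is hypothetical.

```python
import math

def distance_matrix(scores):
    """Return the symmetric matrix of Euclidean distances between samples."""
    n = len(scores)
    d = [[0.0] * n for _ in range(n)]  # the diagonal stays at zero
    for i in range(n):
        for j in range(i + 1, n):
            dist = math.dist(scores[i], scores[j])  # length of the line i-j
            d[i][j] = dist
            d[j][i] = dist  # same distance in both directions
    return d

# Hypothetical scores of four samples on PC1 and PC2 (not real data)
pc_scores = [(0.0, 0.0), (3.0, 4.0), (0.1, 0.0), (-3.0, 4.0)]
d = distance_matrix(pc_scores)

for row in d:
    print([round(value, 2) for value in row])
```

With 618 samples instead of four, the same function would return the 618x618 matrix described above.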
We can see cases where the samples are very close together (blue circles), so their neighbor distances are very small, and we can expect (as we also saw in the previous post) that their constituent values will be very similar.
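Finding those close neighbors amounts to sorting one row of the distance matrix. The sketch below is only an illustration of that idea: the small matrix is made up, and the helper name `nearest_neighbours` is hypothetical, not a function from the vignette.

```python
def nearest_neighbours(d, i, k):
    """Indices of the k samples closest to sample i, excluding i itself."""
    others = [j for j in range(len(d)) if j != i]
    return sorted(others, key=lambda j: d[i][j])[:k]

# Hypothetical 4x4 distance matrix (zero diagonal, symmetric)
d = [
    [0.0, 5.0, 0.1, 5.0],
    [5.0, 0.0, 4.9, 6.0],
    [0.1, 4.9, 0.0, 5.1],
    [5.0, 6.0, 5.1, 0.0],
]

neighbours = nearest_neighbours(d, 0, 2)
print(neighbours)  # sample 2 comes first: it is only 0.1 away from sample 0
```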
If we need more components to explain the variance (11, as we saw in the previous post), the dimensions of the matrix stay the same (618x618), but the distances are no longer measured in a plane; they are measured in a multidimensional space.
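A quick way to convince ourselves of this is to compute the matrix for the same number of samples with 2 and with 11 components. This is a sketch under assumptions: the scores are random numbers rather than real PCA scores, the distance is plain Euclidean, and only six samples are used to keep the output small.

```python
import math
import random

random.seed(0)  # made-up scores, fixed seed so the run is repeatable

n_samples = 6

def random_scores(n_components):
    """Hypothetical score table: one row per sample, one column per component."""
    return [[random.gauss(0.0, 1.0) for _ in range(n_components)]
            for _ in range(n_samples)]

def distance_matrix(scores):
    """Euclidean distance between every pair of rows."""
    return [[math.dist(a, b) for b in scores] for a in scores]

d2 = distance_matrix(random_scores(2))    # samples in a plane
d11 = distance_matrix(random_scores(11))  # samples in an 11-dimensional space

shape2 = (len(d2), len(d2[0]))
shape11 = (len(d11), len(d11[0]))
print(shape2, shape11)  # both depend only on the number of samples
```

The number of components changes how each distance is calculated, but not the number of rows and columns.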
The vignette calls this matrix the "dissimilarity matrix", and it is of great importance in the development of calculations and algorithms.