19 nov 2021

Modelling complex spectral data (soil) with the resemble package (X)

Let's continue with the vignette "Modelling complex spectral data with the resemble package" (Leonardo Ramirez-Lopez and Alexandre M.J.-C. Wadoux).

Throughout the tutorial we have seen how to measure the distance, in an orthogonal space, from a spectrum to all the spectra of a training set. There are different kinds of distances. In the orthogonal space I usually use the Mahalanobis distance, but you can use others, such as the Euclidean distance. We just have to select the distance method we want when calculating the dissimilarities.
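
As a rough illustration of the idea (in Python with NumPy rather than R, and not the resemble code), the sketch below projects spectra onto the first principal components of a training set and computes both the Euclidean and the Mahalanobis distance from a new spectrum to every training spectrum. The function names (`pca_scores`, `euclidean_diss`, `mahalanobis_diss`) and the synthetic data are hypothetical. Because principal component scores are uncorrelated, the Mahalanobis distance here reduces to a Euclidean distance on scores divided by their standard deviations.

```python
import numpy as np


def pca_scores(X_train, X_new, n_comp):
    """Project training and new spectra onto the first n_comp principal
    components of the mean-centred training set."""
    mean = X_train.mean(axis=0)
    Xc = X_train - mean
    # Rows of Vt are the principal component loadings
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    P = Vt[:n_comp].T                      # loadings: n_wavelengths x n_comp
    return Xc @ P, (X_new - mean) @ P      # training scores, new-sample scores


def euclidean_diss(T_train, t_new):
    """Euclidean distance from one score vector to every training score vector."""
    return np.sqrt(((T_train - t_new) ** 2).sum(axis=1))


def mahalanobis_diss(T_train, t_new):
    """Mahalanobis distance in the score space. PC scores are uncorrelated, so
    this equals a Euclidean distance on scores scaled by their standard deviation."""
    sd = T_train.std(axis=0, ddof=1)
    return euclidean_diss(T_train / sd, t_new / sd)


# Usage with synthetic data (50 "spectra" x 200 wavelengths)
rng = np.random.default_rng(0)
X_train = rng.normal(size=(50, 200))
x_new = rng.normal(size=(1, 200))
T_train, t_new = pca_scores(X_train, x_new, n_comp=5)
d_mahalanobis = mahalanobis_diss(T_train, t_new[0])
d_euclidean = euclidean_diss(T_train, t_new[0])
print(np.argsort(d_mahalanobis)[:5])   # indices of the five nearest training samples
```

With the Euclidean distance, the components with the largest variance dominate; the Mahalanobis distance gives each component the same weight, which is one reason it is a common choice in a score space.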

Instead of a distance, we can calculate the correlation (R) between one sample and each of the samples in the training set. For this we use the spectrum (all the wavelengths we select), with or without math treatments. Normally we apply some math treatments to remove scatter effects or to increase the resolution of overlapped bands.
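
A minimal sketch of the correlation approach, again in Python with NumPy and not taken from resemble: `correlation_to_training` (a hypothetical name) returns Pearson's R between one spectrum and each training spectrum. As an assumed example of a math treatment, `first_derivative` uses a simple difference between adjacent wavelengths as a crude stand-in for a smoothed (for example Savitzky-Golay) derivative.

```python
import numpy as np


def first_derivative(X):
    """Crude first-derivative treatment: difference between adjacent wavelengths."""
    return np.diff(X, axis=1)


def correlation_to_training(X_train, x_new):
    """Pearson correlation (R) between one spectrum and each training spectrum."""
    Xc = X_train - X_train.mean(axis=1, keepdims=True)   # centre each spectrum
    xc = x_new - x_new.mean()
    num = Xc @ xc
    den = np.sqrt((Xc ** 2).sum(axis=1) * (xc ** 2).sum())
    return num / den


# Usage with three synthetic training spectra and one new spectrum
wav = np.linspace(0.0, 1.0, 100)
X_train = np.vstack([np.sin(6 * wav), np.cos(6 * wav), 2 * np.sin(6 * wav) + 0.5])
x_new = np.sin(6 * wav) + 0.1
r_raw = correlation_to_training(X_train, x_new)
r_d1 = correlation_to_training(first_derivative(X_train),
                               first_derivative(x_new[None, :])[0])
print(r_raw.round(3), r_d1.round(3))
```

Pearson's R is already insensitive to adding a constant to a whole spectrum or multiplying it by a factor, so in this sketch it is the derivative that changes which training spectra look most similar. Note also that a higher R means more similar, so any threshold on R works in the opposite direction to a distance threshold.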

One approach is to select a fixed number of samples (for example 100). In this way we always take the 100 samples closest to the new sample, but some of them may be far enough away to differ in composition, and therefore not be good enough to build a custom calibration that predicts this new sample accurately.

Another approach is to select a number of samples within a range (for example between 100 and 200), where, in addition, each selected sample must meet the requirement of lying below a certain distance value (threshold), or above it in the case of correlation. With the selected samples we can develop a regression (PLS) to predict the new sample. If not enough samples are found, we won't get any result.
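
The two selection strategies and the local PLS step can be sketched as follows (Python with NumPy, hypothetical function names, synthetic data). `select_fixed_k` always returns the k closest samples, while `select_threshold_range` keeps only samples below a distance threshold, caps them at `k_max`, and returns `None` when fewer than `k_min` qualify, following the behaviour described above. For the regression, `pls1_fit` is a minimal NIPALS PLS1 written out by hand so the example needs nothing beyond NumPy; it is not resemble's implementation.

```python
import numpy as np


def select_fixed_k(d, k):
    """Indices of the k training samples with the smallest distance."""
    return np.argsort(d)[:k]


def select_threshold_range(d, threshold, k_min, k_max):
    """Samples closer than `threshold`, at most k_max of them (closest first).
    Returns None when fewer than k_min samples pass the threshold."""
    order = np.argsort(d)
    inside = order[d[order] < threshold]
    if len(inside) < k_min:
        return None                       # not enough neighbors: no prediction
    return inside[:k_max]


def pls1_fit(X, y, n_comp):
    """Minimal NIPALS PLS1 (single response). Returns means and coefficients."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xk, yk = X - x_mean, y - y_mean
    W, P, q = [], [], []
    for _ in range(n_comp):
        w = Xk.T @ yk
        w = w / np.linalg.norm(w)         # weight vector
        t = Xk @ w                        # scores
        tt = t @ t
        p = Xk.T @ t / tt                 # X loadings
        qk = (yk @ t) / tt                # y loading
        Xk = Xk - np.outer(t, p)          # deflate X and y
        yk = yk - qk * t
        W.append(w)
        P.append(p)
        q.append(qk)
    W, P, q = np.array(W).T, np.array(P).T, np.array(q)
    B = W @ np.linalg.solve(P.T @ W, q)   # regression coefficients
    return x_mean, y_mean, B


def pls1_predict(model, X_new):
    x_mean, y_mean, B = model
    return (X_new - x_mean) @ B + y_mean


# Usage: local PLS for one new sample, with synthetic data
rng = np.random.default_rng(0)
X_train = rng.normal(size=(300, 20))
y_train = 2 * X_train[:, 0] + rng.normal(scale=0.1, size=300)
x_new = rng.normal(size=(1, 20))
d = np.sqrt(((X_train - x_new) ** 2).sum(axis=1))   # any dissimilarity could be used
idx = select_threshold_range(d, threshold=np.quantile(d, 0.5), k_min=100, k_max=200)
if idx is None:
    print("not enough neighbors, no prediction")
else:
    model = pls1_fit(X_train[idx], y_train[idx], n_comp=5)
    print(pls1_predict(model, x_new))
```

For a correlation instead of a distance, the comparison in `select_threshold_range` would be reversed: keep samples with R above the threshold, sorted from highest to lowest.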

Of course, this method also has drawbacks. For example, the selected samples may be very similar in composition, so there is not enough variability to develop the PLS models.

When using a distance, we can compute it in a PLS space, where the response variable is taken into account, so different samples can be selected for each constituent of the same sample.
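
To illustrate why the neighbors can change from one constituent to another, the sketch below (Python with NumPy; hypothetical names, not resemble's code) fits a small NIPALS PLS1 model for a given response `y`, projects the training spectra and the new spectrum into that PLS score space, and computes Mahalanobis distances there. Since the projection depends on `y`, calling `pls_mahalanobis` with a different constituent generally gives different distances and therefore different neighbors.

```python
import numpy as np


def pls_projection(X, y, n_comp):
    """NIPALS PLS1 on mean-centred X and y. Returns the X mean and a matrix R
    that maps centred spectra to PLS scores (T = Xc @ R)."""
    x_mean = X.mean(axis=0)
    Xk, yk = X - x_mean, y - y.mean()
    W, P = [], []
    for _ in range(n_comp):
        w = Xk.T @ yk
        w = w / np.linalg.norm(w)
        t = Xk @ w
        tt = t @ t
        p = Xk.T @ t / tt
        Xk = Xk - np.outer(t, p)           # deflate X
        yk = yk - ((yk @ t) / tt) * t      # deflate y
        W.append(w)
        P.append(p)
    W, P = np.array(W).T, np.array(P).T
    return x_mean, W @ np.linalg.inv(P.T @ W)


def pls_mahalanobis(X_train, y_train, x_new, n_comp):
    """Mahalanobis distance from x_new to each training spectrum in the PLS score
    space built for this particular response. PLS scores are uncorrelated, so
    scaling each score by its standard deviation is enough."""
    x_mean, R = pls_projection(X_train, y_train, n_comp)
    T = (X_train - x_mean) @ R
    t_new = (x_new - x_mean) @ R
    sd = T.std(axis=0, ddof=1)
    return np.sqrt((((T - t_new) / sd) ** 2).sum(axis=1))


# Usage: same spectra, two different constituents
rng = np.random.default_rng(0)
X_train = rng.normal(size=(100, 30))
y_a = X_train[:, 0] + 0.1 * rng.normal(size=100)    # synthetic constituent A
y_b = X_train[:, 10] + 0.1 * rng.normal(size=100)   # synthetic constituent B
x_new = rng.normal(size=30)
nn_a = np.argsort(pls_mahalanobis(X_train, y_a, x_new, n_comp=3))[:10]
nn_b = np.argsort(pls_mahalanobis(X_train, y_b, x_new, n_comp=3))[:10]
print(sorted(nn_a), sorted(nn_b))
```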

The vignette explains this whole process very well, so it is straightforward to follow using its code.

There are cases where we want to force some samples to be part of the models; this is called “spiking the neighborhoods”.
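
One simple way to spike a neighborhood, sketched in Python with a hypothetical `spike_neighbors` helper: the forced samples are appended to whatever neighbors were selected, skipping any that are already present. How to handle the neighborhood size is an assumption here (it grows by the number of newly added spiked samples); another option would be to replace the farthest neighbors so that the size stays fixed.

```python
import numpy as np


def spike_neighbors(neighbor_idx, spike_idx):
    """Return the neighbor indices with the spiked (forced) samples appended.
    Original neighbors keep their order; duplicates are skipped."""
    out = [int(i) for i in neighbor_idx]
    seen = set(out)
    for i in spike_idx:
        i = int(i)
        if i not in seen:          # only add samples that are not already neighbors
            out.append(i)
            seen.add(i)
    return np.array(out, dtype=int)


# Usage: samples 3 and 42 must take part in every local model
neighbors = np.array([17, 3, 8, 25])
print(spike_neighbors(neighbors, [3, 42]).tolist())   # [17, 3, 8, 25, 42]
```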


