13 Oct 2021

Modelling complex spectral data (soil) with the resemble package (III)

Continuing with the vignette "Modelling complex spectral data with the resemble package" (Leonardo Ramirez-Lopez and Alexandre M.J.-C. Wadoux).

(From the previous post) These 825 samples are divided into two sets: one for training (value 1 in the variable train) and another for testing (value 0).

 NIRsoil %>%
    count(train)

    train       n
     0         207
     1         618

Let's create these two data frames:

training <- NIRsoil[NIRsoil$train == 1, ]
testing <- NIRsoil[NIRsoil$train == 0, ]


Now the vignette explains how to proceed with the dimensionality reduction and the methods we can use for that purpose: pca, pca.nipals and pls.

You can practice each one (as in the vignette code) using the default selection rule, which keeps every component that explains more than 1% of the variance, or you can set the cumulative variance you want to reach.

The pca method uses the singular value decomposition (SVD) algorithm and pca.nipals uses the NIPALS algorithm. The pls method uses the response variable as well as the predictor variables (absorbances): it maximizes the covariance between the latent variables of the predictors and the response, so we have to specify which response variable to use.

Just follow the vignette examples for these three methods, and check how many components are retained for each one.
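To make the two selection rules concrete, here is a minimal base R sketch of a PCA computed with the SVD. It does not use resemble or the NIRsoil spectra: the matrix X is synthetic and the names explained, n_var and n_cumvar are only illustrative. The thresholds (1% per component, 99% cumulative) are assumptions chosen for the example.

```r
# Synthetic data (hypothetical), standing in for a matrix of spectra
set.seed(1)
X <- matrix(rnorm(50 * 10), nrow = 50, ncol = 10)
X[, 2] <- X[, 1] + rnorm(50, sd = 0.1)  # make two columns strongly correlated

# Centre the columns and decompose with the SVD
Xc <- scale(X, center = TRUE, scale = FALSE)
sv <- svd(Xc)

# Proportion of variance explained by each component
explained <- sv$d^2 / sum(sv$d^2)

# Rule 1: keep the components that each explain more than 1% of the variance
n_var <- sum(explained > 0.01)

# Rule 2: keep the first components needed to reach 99% cumulative variance
n_cumvar <- which(cumsum(explained) >= 0.99)[1]

# Scores of the samples on the components retained by rule 1
scores <- Xc %*% sv$v[, seq_len(n_var), drop = FALSE]
```

The two rules do not have to agree: the first looks at each component on its own, the second at the running total.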

For a more advanced use of these methods we can configure the "pc_selection" argument. It is a list with the selection method ("var", "cumvar", "manual" or "opc") and a value. In the "var" case the value is the variance each component must explain; in the "cumvar" case it is the total cumulative variance we want to explain. We will see the "opc" case in more detail in another post.

Take into account that we create the ortho-projections with the training set. Afterwards we can project new data (the testing set) onto the planes created with the training set to get the scores of this new testing data.
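As a sketch, the lists for "pc_selection" could look like this. The values 0.01, 0.99 and 9 are only examples, and the call to ortho_projection() assumes that the preprocessed spectra are stored in training$spc_p as in the previous post; the call is skipped if that object or the package is not available.

```r
# Keep components that each explain more than 1% of the variance
pcs_var <- list(method = "var", value = 0.01)

# Keep components until 99% of the cumulative variance is explained
pcs_cumvar <- list(method = "cumvar", value = 0.99)

# Fix the number of components by hand (9 is an arbitrary example)
pcs_manual <- list(method = "manual", value = 9)

# Assumption: training$spc_p holds the preprocessed spectra (previous post)
has_data <- exists("training") && !is.null(training$spc_p)
if (has_data && requireNamespace("resemble", quietly = TRUE)) {
  pca_cumvar <- resemble::ortho_projection(
    Xr = training$spc_p,
    method = "pca",
    pc_selection = pcs_cumvar
  )
}
```

The same lists can be passed with any of the three projection methods; only the meaning of value changes with the selection method.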
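The idea of projecting new data can be sketched in base R with synthetic matrices (X_train and X_test are hypothetical, and 3 components is an arbitrary choice). The key point is that the centring and the loadings come only from the training set. In resemble, as far as I know, the equivalent is to call predict() on the object returned by ortho_projection(), or to pass the new spectra in the Xu argument.

```r
# Synthetic training and testing matrices (hypothetical data)
set.seed(2)
X_train <- matrix(rnorm(40 * 6), nrow = 40)
X_test <- matrix(rnorm(10 * 6), nrow = 10)

# Centre with the training means and get the loadings from the training set only
center <- colMeans(X_train)
sv <- svd(sweep(X_train, 2, center))
loadings <- sv$v[, 1:3, drop = FALSE]  # keep 3 components (arbitrary)

# Scores of the training samples
train_scores <- sweep(X_train, 2, center) %*% loadings

# Project the testing samples with the SAME centre and loadings
test_scores <- sweep(X_test, 2, center) %*% loadings
```

The testing samples never influence the projection; they are only placed in the space defined by the training samples.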
