R & Chemometrics: Using Random Forest models in Soil Near Infrared Analysis (part 3)

24 May 2022


Once we have tuned the model with cross-validation and found the best value for "mtry", we can develop the final model that we will use in routine analysis and check its performance with the test set we set aside. That check is what we will do in part 4, in the next post.

In this one I show the code for the final model and a plot of the importance of every predictor variable (wavelength) in it.

I compare the importance scores, obtained with the SG second-derivative spectra, against the raw calcite spectrum.
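For readers new to these scores, here is a small sketch on synthetic data (not the soil spectra) of what they measure. All the names in it (`toy`, `toy_fit`, `toy_imp`, `top_var`) are hypothetical. It uses `importance()` from the randomForest package directly; I am assuming that `varImp()` from caret reports this same permutation measure for a regression forest.

```r
# Toy illustration on synthetic data; every name here is hypothetical.
library(randomForest)

set.seed(123)
n <- 200
X <- as.data.frame(matrix(rnorm(n * 5), ncol = 5))
names(X) <- paste0("x", 1:5)

# Only x2 carries signal; the other four predictors are pure noise.
toy <- data.frame(y = 3 * X$x2 + rnorm(n, sd = 0.5), X)

toy_fit <- randomForest(y ~ ., data = toy, importance = TRUE, ntree = 200)

# type = 1 is the permutation importance: the mean increase in out-of-bag MSE
# when a predictor is permuted. scale = FALSE skips the division by its
# standard deviation.
toy_imp <- importance(toy_fit, type = 1, scale = FALSE)
print(round(toy_imp[, 1], 3))

# Name of the predictor with the largest score
top_var <- rownames(toy_imp)[which.max(toy_imp[, 1])]
print(top_var)
```

The predictor that really drives `y` should stand out clearly from the noise variables, which is the same reading we give to the peaks in the importance plot for the wavelengths.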

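library(randomForest)   # randomForest()
library(caret)          # varImp()

# Final model with the tuned mtry; importance = TRUE stores the importance scores
CaCO3_rf_NIRfit <- randomForest(CaCO3 ~ ., data = CaCO3spcSG_train,
                                importance = TRUE, ntree = 500,
                                mtry = 28)

# Unscaled importance score for every wavelength
rfImp <- varImp(CaCO3_rf_NIRfit, scale = FALSE)

# Wavelength axis: 1110 to 2488 nm in 2 nm steps (690 points)
wavelengths <- seq(1110, 2488, 2)

# Plot the importance scores
matplot(wavelengths, rfImp, type = "l", ylab = "Importance",
        xlab = "Wavelength (nm)", col = "blue", lwd = 2)
par(new = TRUE)
# Overplot the calcite spectrum (on its own y scale, with the y axis hidden)
matplot(wavelengths, calcite_spectrum_2nm[356:1045, ], type = "l",
        xlab = " ", ylab = " ", yaxt = "n", col = "red")
legend("topleft", # Add legend to plot
       legend = c("Importance scores", "Calcite spectrum"),
       col = c("blue", "red"),
       lty = 1)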
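To read the plot numerically, it can help to list the wavelengths with the highest scores. The sketch below does that with a made-up importance vector (`imp`, a bump placed arbitrarily at 2340 nm plus noise) standing in for the real scores; `imp`, `top_idx` and `top_wl` are hypothetical names.

```r
# Hypothetical stand-in for the real importance scores:
# a smooth bump centred (arbitrarily) at 2340 nm plus a little noise.
wavelengths <- seq(1110, 2488, 2)
set.seed(1)
imp <- exp(-((wavelengths - 2340) / 20)^2) +
  runif(length(wavelengths), min = 0, max = 0.05)

# Rank the wavelengths by importance and keep the ten largest
top_idx <- order(imp, decreasing = TRUE)[1:10]
top_wl <- data.frame(wavelength = wavelengths[top_idx],
                     importance = imp[top_idx])
print(top_wl)
```

With the real model, `imp` would be replaced by the scores in the importance object, for example `rfImp$Overall` if, as I assume, `varImp()` returns a data frame with a single `Overall` column.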
