Once we have tuned the model with cross-validation and found the best value for "mtry", we can fit the final model that we will use in routine and check its performance on the test set we set aside. We will do that in part 4, in the next post.
In this post I show the code for the final model and plot the importance of each predictor variable (wavelength) in it.
The model is trained on the SG second-derivative spectra, and I compare its importance scores with the raw calcite spectrum.
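library(randomForest)

# Final model: 500 trees, mtry = 28 as selected during tuning
CaCO3_rf_NIRfit <- randomForest(CaCO3 ~ ., data = CaCO3spcSG_train,
                                importance = TRUE, ntree = 500,
                                mtry = 28)
# Assumption: rfImp is not defined in this post; here it is taken as %IncMSE
rfImp <- importance(CaCO3_rf_NIRfit, type = 1)
# Importance scores against wavelength (1110 to 2488 nm in 2 nm steps).
# The trailing arguments of both matplot() calls were cut off in the original;
# xlab, col, lty and axes below are assumptions that match the legend.
matplot(seq(1110, 2488, 2), rfImp, type = "l", ylab = "Importance",
        xlab = "Wavelength (nm)", col = "blue", lty = 1)
par(new = TRUE)
# Overplot the calcite spectrum on its own y scale
matplot(seq(1110, 2488, 2), calcite_spectrum_2nm[356:1045, ], type = "l",
        col = "red", lty = 1, axes = FALSE, xlab = "", ylab = "")
legend("topleft",  # Add legend to plot
       legend = c("Importance Scores", "Calcite spectrum"),
       col = c("Blue", "red"),
       lty = 1)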
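Reading the most important wavelengths off the plot by eye can be imprecise, so it may help to list them as a table. The following is only a minimal sketch in base R: `top_wavelengths()` is a hypothetical helper (not part of randomForest), and `fake_imp` is a synthetic stand-in with an arbitrary peak, used here so the example runs without the real data. With the real model you would pass the importance scores as a numeric vector instead.

```r
# Hypothetical helper: return the n wavelengths with the highest importance.
top_wavelengths <- function(wavelength, importance, n = 5) {
  stopifnot(length(wavelength) == length(importance))
  ord <- order(importance, decreasing = TRUE)   # highest score first
  head(data.frame(wavelength = wavelength[ord],
                  importance = importance[ord]), n)
}

# Same wavelength axis as in the plot: 1110 to 2488 nm in 2 nm steps
wl <- seq(1110, 2488, 2)

# Synthetic scores (assumption): one arbitrary peak plus a little noise
set.seed(1)
fake_imp <- exp(-((wl - 2340) / 20)^2) + runif(length(wl), 0, 0.05)

top <- top_wavelengths(wl, fake_imp, n = 5)
print(top)
```

The table is sorted from the highest to the lowest score, so the first rows point to the region of the spectrum the model relies on most.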