20 Jan 2020

Tidyverse and Chemometrics (part 8)

Today it is time to develop a calibration with our first set of fishmeal data, math-treated with SNV + second derivative.
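SNV and derivatives are simple row-wise operations on the spectra matrix (rows = samples, columns = wavelengths). Below is a minimal base-R sketch. The functions `snv()` and `second_deriv()` and the simulated spectra are hypothetical, not the code used in this series. SNV centres and scales each spectrum by its own mean and standard deviation. The second derivative is taken here as a plain second-order difference along the wavelengths. The post does not say which derivative algorithm was used, and Savitzky-Golay or gap-segment derivatives are common alternatives.

```r
# Standard Normal Variate: centre and scale each spectrum (row)
# by its own mean and standard deviation
snv <- function(X) {
  X <- as.matrix(X)
  t(apply(X, 1, function(s) (s - mean(s)) / sd(s)))
}

# Second derivative by simple finite differences along the wavelength axis.
# ASSUMPTION: plain differencing; the post does not state the algorithm used
# (Savitzky-Golay and gap-segment derivatives are common alternatives).
second_deriv <- function(X) {
  X <- as.matrix(X)
  t(apply(X, 1, diff, differences = 2))
}

# Hypothetical example: 3 simulated spectra with 10 wavelengths each
set.seed(1)
spectra <- matrix(runif(30), nrow = 3)
treated <- second_deriv(snv(spectra))
dim(treated)  # 3 rows, 8 columns: differencing drops two points per spectrum
```

Plain differencing shortens each spectrum by two points and amplifies noise. This is one reason smoothing derivatives are usually preferred on real NIR data.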
 
We have just 40 samples and only one outlier in the PCA calculation using the Mahalanobis distance (an extreme sample for dry matter content). So this time I use all the samples in the training set, and I look at the cross-validation error to select the number of terms for the PLS model.
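The outlier check with the Mahalanobis distance can be sketched in base R with `prcomp()` and `mahalanobis()`. The spectra are projected onto a few principal components, and the sketch measures how far each sample lies from the centre of the score cloud. The simulated data, the choice of 5 components and the chi-square cut-off are assumptions for illustration. They are not the settings used in the earlier posts of this series.

```r
# ASSUMPTIONS: simulated data, 5 components and a chi-square 97.5 % cut-off
set.seed(123)
X <- matrix(rnorm(40 * 50), nrow = 40)  # 40 hypothetical pre-treated spectra
X[40, ] <- X[40, ] + 4                  # make the last sample extreme

pca    <- prcomp(X, center = TRUE, scale. = FALSE)
ncomp  <- 5
scores <- pca$x[, 1:ncomp]

# Squared Mahalanobis distance of every sample to the centre of the scores
d2 <- mahalanobis(scores, center = colMeans(scores), cov = cov(scores))

cutoff   <- qchisq(0.975, df = ncomp)
outliers <- which(d2 > cutoff)
outliers
```

PCA scores are uncorrelated. The squared distance is therefore just the sum of each squared score divided by the variance of its component.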
 
Soon we will have an independent validation set, and we will see how this model performs on it.
 
To develop the model for protein, this time we use caret, with leave-one-out cross-validation (LOOCV):

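library(caret)
# PLS regression of protein on the NIR spectra, X variables not scaled;
# LOOCV over 1 to 10 terms, keeping the held-out predictions for every number of terms
model_fish1_prot_pls <- train(protein ~ nir_1d, data = fish1_1d_df1,
                              method = "pls", scale = FALSE,
                              trControl = trainControl(method = "LOOCV",
                                                       savePredictions = "all"),
                              tuneLength = 10)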
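caret runs the LOOCV loop internally. The base-R sketch below makes explicit what the cross-validation error for each number of terms is. It uses a textbook NIPALS PLS1 on centred, unscaled X (as with `scale = FALSE`) and refits it 40 times, leaving one sample out each time. The functions `pls1_fit()`, `pls1_predict()` and `loocv_rmsecv()` and the simulated data are hypothetical. This is not caret's or the pls package's implementation.

```r
# Textbook NIPALS PLS1 on centred (not scaled) X
pls1_fit <- function(X, y, ncomp) {
  xm <- colMeans(X)
  ym <- mean(y)
  E  <- sweep(X, 2, xm)                  # centred X
  f  <- y - ym                           # centred response
  W  <- P <- matrix(0, ncol(X), ncomp)
  q  <- numeric(ncomp)
  for (a in seq_len(ncomp)) {
    w  <- drop(crossprod(E, f))
    w  <- w / sqrt(sum(w^2))             # weight vector
    ts <- drop(E %*% w)                  # scores
    tt <- sum(ts^2)
    P[, a] <- drop(crossprod(E, ts)) / tt  # X loadings
    q[a]   <- sum(f * ts) / tt             # y loading
    E <- E - outer(ts, P[, a])           # deflate X
    f <- f - ts * q[a]                   # deflate y
    W[, a] <- w
  }
  list(W = W, P = P, q = q, xm = xm, ym = ym)
}

# Predictions with 1..ncomp terms (one column per number of terms)
pls1_predict <- function(fit, Xnew) {
  Xc <- sweep(Xnew, 2, fit$xm)
  sapply(seq_along(fit$q), function(k) {
    Wk <- fit$W[, 1:k, drop = FALSE]
    Pk <- fit$P[, 1:k, drop = FALSE]
    B  <- Wk %*% solve(crossprod(Pk, Wk), fit$q[1:k])  # B = W (P'W)^-1 q
    fit$ym + drop(Xc %*% B)
  })
}

# Leave one sample out, refit, predict it; RMSECV for each number of terms
loocv_rmsecv <- function(X, y, ncomp) {
  n <- nrow(X)
  pred <- matrix(NA_real_, n, ncomp)
  for (i in seq_len(n)) {
    fit <- pls1_fit(X[-i, , drop = FALSE], y[-i], ncomp)
    pred[i, ] <- pls1_predict(fit, X[i, , drop = FALSE])
  }
  sqrt(colMeans((pred - y)^2))
}

# Hypothetical data: 40 samples, 100 "wavelengths", 3 latent factors
set.seed(42)
n <- 40; p <- 100
latent <- matrix(rnorm(n * 3), n, 3)
X <- latent %*% matrix(rnorm(3 * p), 3, p) + matrix(rnorm(n * p, sd = 0.1), n, p)
y <- drop(latent %*% c(1, -0.5, 0.3)) + rnorm(n, sd = 0.1)

rmsecv <- loocv_rmsecv(X, y, ncomp = 10)
which.min(rmsecv)
```

For each `ncomp`, the RMSE that caret reports for LOOCV should correspond to this square root of the mean squared leave-one-out residual.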


Now let's see how many PLS factors we need for our first protein model.

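library(tidyverse)
# Cross-validation error (RMSE) for each number of PLS terms
model_fish1_prot_pls$results %>%
  ggplot(aes(x = ncomp, y = RMSE)) +
  geom_point() +
  geom_line()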



As we can see, 8 PLS terms is the best option for the model, so we keep it and will check how it predicts new samples in the future.
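By default, caret keeps the number of terms with the lowest cross-validation RMSE. `trainControl()` also has a `selectionFunction` argument (`"best"`, `"oneSE"`, `"tolerance"`) for preferring simpler models. The base-R sketch below contrasts the lowest-error rule with a tolerance rule modelled on caret's `tolerance()`, which picks the smallest number of terms whose RMSE is within `tol` percent of the best. The RMSECV values and the helper `tolerance_terms()` are made up for illustration. They are not the results of this model.

```r
# Made-up RMSECV curve for 1 to 10 PLS terms (illustration only, not this model)
rmsecv <- c(2.10, 1.55, 1.30, 1.18, 1.12, 1.09, 1.07, 1.05, 1.06, 1.08)

# Rule 1 ("best"): the number of terms with the lowest cross-validation error
best_terms <- which.min(rmsecv)

# Rule 2 ("tolerance"): the fewest terms whose error is within tol % of the
# lowest one; hypothetical helper modelled on caret's tolerance()
tolerance_terms <- function(rmse, tol = 1.5) {
  perc <- (rmse - min(rmse)) / min(rmse) * 100  # % worse than the best
  min(which(perc <= tol))
}

best_terms                        # 8
tolerance_terms(rmsecv)           # 8
tolerance_terms(rmsecv, tol = 5)  # 6
```

A larger tolerance accepts a little more cross-validation error in exchange for a more parsimonious model, which can be more robust on new samples.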

To see how the model performs, let's look at how each sample is predicted by a model built with the remaining 39 samples of the calibration set:
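# Keep the LOOCV predictions of the 8-term model (observed and predicted columns)
loocv_prot_8 <- subset(model_fish1_prot_pls$pred, ncomp == 8)
loocv_prot_8 <- loocv_prot_8[, c("obs", "pred")]
plot(loocv_prot_8$obs, loocv_prot_8$pred,
     xlab = "Observed protein", ylab = "Predicted protein (LOOCV)",
     main = "Caret PLS 8 terms")
abline(0, 1)  # 1:1 line; perfect predictions would fall on it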
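A few numbers can complement the scatter plot. The base-R sketch below computes common summary statistics for observed versus LOOCV-predicted values: RMSECV, R², bias, the bias-corrected standard error (SECV) and RPD. RPD is taken here as the SD of the reference values divided by SECV, which is one of several definitions in use. `cv_stats()` and the five obs/pred values are hypothetical. With the caret model, you would pass `loocv_prot_8$obs` and `loocv_prot_8$pred`.

```r
# Hypothetical helper: summary statistics for observed vs predicted values
cv_stats <- function(obs, pred) {
  res  <- pred - obs                           # prediction residuals
  n    <- length(obs)
  bias <- mean(res)                            # mean residual
  secv <- sqrt(sum((res - bias)^2) / (n - 1))  # bias-corrected error
  c(RMSECV = sqrt(mean(res^2)),
    R2     = cor(obs, pred)^2,
    Bias   = bias,
    SECV   = secv,
    RPD    = sd(obs) / secv)                   # one common RPD definition
}

# Hypothetical observed and predicted protein values (%), illustration only
obs  <- c(62.1, 65.4, 68.0, 70.2, 71.5)
pred <- c(63.0, 64.8, 67.1, 71.0, 71.9)
round(cv_stats(obs, pred), 3)
```

RMSECV combines both error sources. Its square equals the squared bias plus SECV² × (n − 1)/n, so a large gap between RMSECV and SECV points to a systematic offset rather than scatter.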

The fit does not look very good, but it is our first starter calibration for protein in fish meal.
