15 Aug 2021

Tidymodels: Modeling hotel bookings in R using tidymodels and recipes

To apply the tidymodels and tidyverse packages to spectroscopic data we first have to become familiar with their functions, so it helps to practice with tutorials built on other data sets, in this case hotel bookings data.
So let's watch Julia work with this data set, and then try to practice yourself with the material available on the tidytuesday and tidymodels websites.

There is also a tutorial about Infratec meat spectra with tidymodels (not in video) that uses PLS, so we will look at it in another post.

12 Aug 2021

Time based validation improvement for pH in cocoa paste

The best way to check the performance of a calibration is with a new time-based validation set. In this case a calibration was developed to predict pH in cocoa paste. The configuration was chosen by picking the best performance in a cross-validation using groups.
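As a rough illustration of cross-validation using groups, here is a minimal Python sketch. The function name `group_folds` and the batch labels are hypothetical, and this is not necessarily how the chemometric software builds its groups. The only idea it shows is that all samples sharing a group label are kept in the same fold, so a group is never split between training and test.

```python
from collections import defaultdict

def group_folds(groups, n_folds):
    """Split sample indices into folds, keeping each group in a single fold."""
    members = defaultdict(list)
    for idx, label in enumerate(groups):
        members[label].append(idx)
    folds = [[] for _ in range(n_folds)]
    # Place the largest groups first, each into the currently smallest fold.
    for label in sorted(members, key=lambda g: (-len(members[g]), str(g))):
        smallest = min(range(n_folds), key=lambda k: len(folds[k]))
        folds[smallest].extend(members[label])
    return folds

# Hypothetical group labels (for example, production batches) for 8 samples.
batches = ["A", "A", "B", "B", "B", "C", "D", "D"]
folds = group_folds(batches, n_folds=2)

for k, test_idx in enumerate(folds):
    train_idx = [i for i in range(len(batches)) if i not in test_idx]
    print("fold", k, "test:", sorted(test_idx), "train:", train_idx)
```

Each fold is used once as the test set while the model is fitted on the remaining folds.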

After this the model was installed in routine use, and two months later new samples with reference values were collected, so we can run the validation and look at the statistics. This is the performance:

One of the samples, with a high GH, is classified as an outlier, so we might be tempted to treat it as a lab error and exclude it. But that would mean excluding a sample without a good reason, and that must not be done.

Why not develop the model again with the old training set, trying to find a better math treatment or configuration, and validate it with this new time-based set? The results are surprising with a "None-0-0-1-1" math treatment and 15 PLS terms:

RSQ increases and the sample with the high pH is predicted well. Keep in mind that we have a very short range for pH, so we cannot expect a high RSQ value.
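To see why a short range limits RSQ, here is a small Python sketch with made-up numbers (not the cocoa paste data). It assumes RSQ is the squared correlation between reference and predicted values, and the function name `validation_stats` is hypothetical. The same prediction errors are applied to a narrow and to a wide reference range: RMSEP stays the same, but RSQ is lower for the narrow range.

```python
import math

def validation_stats(reference, predicted):
    """Return bias, RMSEP, SEP (bias-corrected) and RSQ for a validation set."""
    n = len(reference)
    residuals = [p - r for r, p in zip(reference, predicted)]
    bias = sum(residuals) / n
    rmsep = math.sqrt(sum(e * e for e in residuals) / n)
    sep = math.sqrt(sum((e - bias) ** 2 for e in residuals) / (n - 1))
    mean_ref = sum(reference) / n
    mean_pred = sum(predicted) / n
    sxy = sum((r - mean_ref) * (p - mean_pred)
              for r, p in zip(reference, predicted))
    sxx = sum((r - mean_ref) ** 2 for r in reference)
    syy = sum((p - mean_pred) ** 2 for p in predicted)
    rsq = sxy ** 2 / (sxx * syy)  # assumption: RSQ as squared correlation
    return {"bias": bias, "rmsep": rmsep, "sep": sep, "rsq": rsq}

# Made-up prediction errors, identical for both sets.
errors = [0.05, -0.05, 0.05, -0.05, 0.05, -0.05]
narrow_ref = [5.4, 5.5, 5.6, 5.7, 5.8, 5.9]  # short range
wide_ref = [4.0, 5.0, 6.0, 7.0, 8.0, 9.0]    # wide range

narrow = validation_stats(narrow_ref, [r + e for r, e in zip(narrow_ref, errors)])
wide = validation_stats(wide_ref, [r + e for r, e in zip(wide_ref, errors)])

print("narrow range:", narrow)
print("wide range:  ", wide)
```

For this reason it is better to judge a calibration with a short range by its error statistics and not by RSQ alone.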