R & Chemometrics: June 2018

29 Jun 2018

Plots in Resemble (Part 1)

Resemble offers a number of plots that are very useful for reports or papers. Here I use the same sets as in the previous post and plot the PCA scores, so I can see the training matrix (Xr) scores and the validation matrix (Xu) scores overlaid.

The validation set is a randomly selected 35% of the whole sample population, which was collected over a long period.

We can see that the validation samples more or less cover the space of the training samples.

> par(mfrow=c(1,2))
> plot(local.mbl, g = "pca", pcs=c(1,2)) 
> plot(local.mbl, g = "pca", pcs=c(1,3))
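For readers without the fitted `local.mbl` object, a similar overlap check can be sketched in base R with `prcomp`. This is only an illustration: `Xr_sim` and `Xu_sim` are simulated stand-ins for the training and validation spectra, and the projection is a plain PCA fitted on the centered training data, which is not necessarily what the Resemble plot computes internally.

```r
set.seed(1)
# Simulated stand-ins for the spectra (hypothetical data, not the petfood set)
Xr_sim <- matrix(rnorm(200 * 20), nrow = 200)  # training
Xu_sim <- matrix(rnorm(70 * 20), nrow = 70)    # validation

# Fit the PCA on the training set only (centered, not scaled)
pca <- prcomp(Xr_sim, center = TRUE, scale. = FALSE)

# Project both sets into the same score space
scores_r <- pca$x
scores_u <- predict(pca, newdata = Xu_sim)

# Overlay the validation scores (red) on the training scores (grey)
op <- par(mfrow = c(1, 2))
for (pcs in list(c(1, 2), c(1, 3))) {
  plot(scores_r[, pcs], col = "grey50", pch = 16,
       xlab = paste0("PC", pcs[1]), ylab = paste0("PC", pcs[2]))
  points(scores_u[, pcs], col = "red", pch = 16)
}
par(op)
```

If the red points sit inside the grey cloud in both panels, the validation set is reasonably covered by the training set, at least in the first three components.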

28 Jun 2018

Using correlation in LOCAL (Resemble package)

> ctrl.mbl <- mblControl(sm = "cor",
                   pcSelection = list("cumvar", 0.999),
                   valMethod = "NNv",
                   scaled = FALSE, center = TRUE)
> local.mbl <- mbl(Yr = Yr, Xr = Xr, Yu = Yu, Xu = Xu,
                   mblCtrl = ctrl.mbl,
                   dissUsage = "none",
                   k = seq(40, 150, by = 10),
                   pls.c = c(5, 15),
                   method = "wapls1")

Predicting sample: 1  ----------
Predicting sample: 2  ----------
Predicting sample: 3  ----------
Predicting sample: 4  ----------
Predicting sample: 5  ----------

--------------------------------

--------------------------------

> plot(predicted.local,Yu)

This time I use correlation as the similarity measure (as Win ISI does) and pass a sequence of values for k to find the best number of samples to select for the LOCAL algorithm.

As we can see, the predictions improve as more samples are included in the calibration (red dots). They might improve further with even more samples, but toward the end of the plot the curve starts to stabilize.
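To make the idea of selecting neighbours by correlation more concrete, here is a minimal sketch in base R. It does not reproduce what `mbl` does internally: the data are simulated, `nearest_k` is a hypothetical helper, and the dissimilarity (1 - r) / 2 is an assumed definition chosen for the illustration.

```r
set.seed(2)
Xr_sim <- matrix(rnorm(100 * 30), nrow = 100)  # hypothetical training spectra
xu_sim <- rnorm(30)                            # one hypothetical unknown spectrum

# Correlation between the unknown and every training spectrum
r <- as.vector(cor(t(Xr_sim), xu_sim))

# Assumed dissimilarity: 0 when r = 1, 1 when r = -1
d <- (1 - r) / 2

# Hypothetical helper: indices of the k most similar training samples
nearest_k <- function(d, k) order(d)[seq_len(k)]

neighbours_40 <- nearest_k(d, 40)
neighbours_90 <- nearest_k(d, 90)
```

Each larger neighbourhood contains the smaller ones, so moving along the sequence of k only adds progressively less similar samples to the local calibration.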

27 Jun 2018

LOCAL Calibrations with Resemble

The Resemble package is really interesting, so I am trying to work with it and understand it better, also to help me with LOCAL calibrations in Win ISI.

We can get predictions for different numbers of locally selected calibration samples used to predict the unknown, so we can see which option works best. A minimum and a maximum number of terms are set, and a weighted average of the predictions is calculated.

In this case I use an external validation set of petfood spectra (Xu) with reference data for protein (Yu), and I want to know the statistics (RMSE and R squared) when 90 local samples are selected:

> predicted.local <- as.numeric(Yu_anl$Nearest_neighbours_90)
> rmse.local <- sqrt(mean((Yu - predicted.local)^2))
> R2.local <- cor(Yu, predicted.local)^2
> R2.local
[1] 0.9507232
> rmse.local
[1] 1.163304

plot(predicted.local,Yu)
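The same two statistics can be computed for every neighbourhood size at once. The sketch below uses simulated data: `Yu_sim` and `pred_sim` are made up, the column names only imitate the `Nearest_neighbours_90` pattern shown above, and `fit_stats` is a hypothetical helper.

```r
set.seed(3)
Yu_sim <- rnorm(50, mean = 30, sd = 5)   # hypothetical reference values
# Hypothetical predictions table, one column per neighbourhood size
pred_sim <- data.frame(
  Nearest_neighbours_40 = Yu_sim + rnorm(50, sd = 2.0),
  Nearest_neighbours_90 = Yu_sim + rnorm(50, sd = 1.0)
)

# RMSE and squared correlation of one prediction vector against the reference
fit_stats <- function(pred, ref) {
  c(rmse = sqrt(mean((ref - pred)^2)),
    r2   = cor(ref, pred)^2)
}

# One row per column of the predictions table
stats_by_k <- t(sapply(pred_sim, fit_stats, ref = Yu_sim))
stats_by_k
```

With the real table the call would presumably be the same with `Yu_anl` and `Yu`, assuming every column holds numeric predictions (the post converts the column with `as.numeric` first).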

There are a lot of options to explore, so I will continue checking this package.

26 Jun 2018

Memory-based learning methods and tools (Resemble / LOCAL)

This is the link to a presentation that helps us understand the concept of LOCAL. We will work with it in the next posts using the Resemble package, and we have covered, and will continue to cover, LOCAL in Win ISI.

Developing LOCAL calibrations with R

We can also use LOCAL in R with the Resemble package. These days I am testing the package with a set of petfood spectra (with protein reference values) imported from Win ISI, with SNV and a second-derivative math treatment applied. Then I select 65% of the samples for training and the rest for testing.
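A random 65/35 split like this one can be sketched in a few lines of base R. The data here are simulated (`X_sim` and `Y_sim` are hypothetical stand-ins for the imported spectra and protein values), and the seed is arbitrary; only the names `Xr`, `Yr`, `Xu` and `Yu` follow the ones used in these posts.

```r
set.seed(4)
X_sim <- matrix(rnorm(100 * 10), nrow = 100)  # hypothetical spectra
Y_sim <- rnorm(100)                           # hypothetical protein values

n <- nrow(X_sim)
train_idx <- sample(n, size = round(0.65 * n))  # 65% drawn at random

Xr <- X_sim[train_idx, , drop = FALSE]   # training spectra
Yr <- Y_sim[train_idx]                   # training reference values
Xu <- X_sim[-train_idx, , drop = FALSE]  # test spectra
Yu <- Y_sim[-train_idx]                  # test reference values
```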

The prediction process in Resemble can be configured to search for the best number of samples or factors, so there are a lot of options and functions to explore in this package.

This is a plot of the results for a standard configuration taken from the reference manual, which I will try to dig deeper into to find the best configuration.

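library(resemble)

# Control settings: PLS dissimilarity, up to 40 components chosen by "opc",
# nearest-neighbour validation
ctrl <- mblControl(sm = "pls",
                   pcSelection = list("opc", 40),
                   valMethod = c("NNv"),
                   scaled = FALSE, center = TRUE)

# Local models for k = 30, 45, ..., 150 neighbours; weighted average PLS
# with 7 to 20 terms
ex1 <- mbl(Yr = Yr, Xr = Xr, Yu = NULL, Xu = Xu,
           mblCtrl = ctrl,
           dissUsage = "predictors",
           k = seq(30, 150, 15),
           method = "wapls1",
           pls.c = c(7, 20))

# Table of predictions, one column per value of k
Yu_anl <- getPredictions(ex1)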

It seems clear that some of the configurations overfit, but I am just starting to learn the package, so more posts will follow on my progress with it.

14 Jun 2018

Precision and Accuracy LAB vs NIR

I am finally happy with this plot, which tries to explain the precision and accuracy of the laboratory vs. the NIR predictions for a set of samples and subsamples. I will explain these plots in more detail in coming posts.

6 Jun 2018

Subsample Predictions Boxplot

This is a boxplot of meat meal predictions with four subsamples per sample. A representative sample of a certain batch was divided into four subsamples, and each one was analyzed on a NIR instrument. So we get four predictions, one for every subsample, and the boxplot gives an idea of the variance in the predictions for every sample based on its subsamples.

The colors are there because the subsamples were sent to two different labs, and each lab is represented by one color. The colors have some transparency because in some cases two subsamples went to one lab and two to the other, in other cases all four went to the same lab, and in some cases three went to one lab and one to the other.

All these studies give an idea of the complexity of the meat meal product.
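A plot of this kind can be sketched in base R as follows. Everything here is simulated: `sub_sim` is a hypothetical table with six samples of four subsamples each, and the lab assignment is random. Overlaying semi-transparent points on plain boxes is one possible way to show the lab, not necessarily how the original figure was drawn.

```r
set.seed(5)
# Hypothetical data: 6 samples, 4 subsamples each, sent to lab "A" or "B"
sub_sim <- data.frame(
  sample = factor(rep(paste0("S", 1:6), each = 4)),
  pred   = rnorm(24, mean = rep(seq(50, 55, length.out = 6), each = 4), sd = 0.5),
  lab    = sample(c("A", "B"), 24, replace = TRUE)
)

# Semi-transparent colours so overlapping points from both labs stay visible
lab_cols <- c(A = adjustcolor("blue", alpha.f = 0.4),
              B = adjustcolor("red",  alpha.f = 0.4))

# One box per sample, summarising its subsample predictions
boxplot(pred ~ sample, data = sub_sim,
        xlab = "Sample", ylab = "NIR prediction")
# Overlay the individual subsamples, coloured by destination lab
points(as.integer(sub_sim$sample), sub_sim$pred,
       col = lab_cols[sub_sim$lab], pch = 16)

# Spread of the predictions within each sample
pred_sd <- tapply(sub_sim$pred, sub_sim$sample, sd)
```

The vector `pred_sd` puts a number on what the box heights show: the standard deviation of the four predictions within each sample.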

3 Jun 2018

Average and subsample residuals

To better understand the performance of a model, different blind subsamples of each sample were sent to a laboratory, so in some cases we have the lab values of four subsamples of a sample and in other cases of two. There are two cases with only one subsample.

For every sample we calculate the average of the lab values and the average of the predictions over its subsamples, and their difference gives the red-dot residuals.

We also have the residual of every subsample vs. its prediction; those are the gray dots.

The plot (made with R) gives a good picture of the performance of the model and of how the average performs better in most cases than the individual subsamples.

We can see the warning limits (2·SEL) and action limits (3·SEL), and how the predictions for the averages fall within the warning limits.
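The calculation behind such a plot can be sketched as follows. The data are simulated, with every sample having four subsamples for simplicity, and the SEL (standard error of the laboratory) is assumed here to be the pooled standard deviation of the lab replicates; the value used for the original plot may have been obtained differently.

```r
set.seed(6)
# Hypothetical data: 8 samples, 4 blind subsamples each
n_samples <- 8
n_sub <- 4
sample_id <- factor(rep(seq_len(n_samples), each = n_sub))
true_val  <- rep(rnorm(n_samples, mean = 55, sd = 3), each = n_sub)
lab       <- true_val + rnorm(n_samples * n_sub, sd = 0.6)  # lab values
nir       <- true_val + rnorm(n_samples * n_sub, sd = 0.4)  # NIR predictions

# Grey dots: residual of every subsample (lab minus prediction)
res_sub <- lab - nir
# Red dots: residual of the per-sample averages
res_avg <- tapply(lab, sample_id, mean) - tapply(nir, sample_id, mean)

# Assumed SEL: pooled standard deviation of the lab replicates
lab_dev <- lab - ave(lab, sample_id)
sel <- sqrt(sum(lab_dev^2) / (length(lab) - nlevels(sample_id)))

plot(as.integer(sample_id), res_sub, col = "grey60", pch = 16,
     xlab = "Sample", ylab = "Residual (lab - NIR)",
     ylim = c(-1, 1) * max(3.2 * sel, abs(res_sub)))
points(seq_len(n_samples), res_avg, col = "red", pch = 16)
abline(h = c(-2, 2) * sel, lty = 2)  # warning limits
abline(h = c(-3, 3) * sel, lty = 3)  # action limits
```

Averaging over subsamples cancels part of the random error in both the lab values and the predictions, which is why the red dots tend to sit closer to zero than the grey ones.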