13 jul. 2018

Validating Resemble Model with Monitor function

Continuing with this post evaluating the LOCAL model developed in Resemble. This time I use the Monitor function (one of the Monitor functions I am developing).

I create different subsets from the validation sample set for the different categories, In this case is for one time of puppies, and I am evaluating the moisture content. We can see that there are two outliers that increase the SEP, so we have to see if we remove this samples for some reasons.

Let´s validate first with this type of puppy subset and check the statistics:

> val1.moi.pup1<-subset(val1.moi,ID1u_moi=="PUPPY-1")
> val1.moi.pup1<-cbind.data.frame(val1.moi.pup1$Sample.u_moi, 
+                                 val1.moi.pup1$Yu_moi, 
+                                 val1.moi.pup1$predicted.moi.local)
> monitor10c24xyplot(val1.moi.pup1)

Samples with the Sample IDs 463 y 456 are out of the action limits and the monitor function shows their position in the table:
$ResWarning [1] id ref pred res <0 rows> (or 0-length row.names) $ResAction id ref pred res 34 456 3.7 7.793351 -4.093351 32 463 4.9 7.881543 -2.981543

Now we can remove this samples knowing their position and recalculate:

6 jul. 2018

Plots in Resemble (part 2)

Good results for the prediction of the validation samples (Xu, Yu) for protein. This is the XY plot where we can see in different colors the classes of the validation samples (different types of petfood). the SEP is 0.88 (without removing outliers) . Defining the data frame by classes will allow us to see the SEP for every class so we can check which class needs for more samples in the training database (Xr, Yr) or to check for other reasons.


El error SEP de las muestras de validación de proteína es   :  0.887 
El R cuadrado para las muestras de validación de proteína es:  0.962 

29 jun. 2018

Plots in Resemble (Part 1)

Resemble allow a certain number of plots which are very useful for your works or personal papers. In this case I use the same sets than of the previous post and I plot the PCA scores, where I can see the training matrix (Xr) scores and the validation matrix (Xu) scores overlapped.
Validation set is a 35% (randomly selected) of the whole sample population obtained from a long time period.
We can see how the validation samples cover more or less the space of the training samples.

> par(mfrow=c(1,2))
> plot(local.mbl, g = "pca", pcs=c(1,2)) 
> plot(local.mbl, g = "pca", pcs=c(1,3))

28 jun. 2018

Using correlation in LOCAL (Resemble package)

> ctrl.mbl <- mblControl(sm = "cor",
                  pcSelection = list("cumvar", 0.999),
                  valMethod = "NNv",
                  scaled = FALSE, center = TRUE)
 > local.mbl <- mbl(Yr = Yr, Xr = Xr, Yu = Yu, Xu = Xu,
mblCtrl = ctrl.mbl,
                   dissUsage = "none",
                   k = seq(40, 150, by = 10),
                   pls.c = c(5, 15),
                   method = "wapls1")

Predicting sample: 1  ----------
Predicting sample: 2  ----------
Predicting sample: 3  ----------
Predicting sample: 4  ----------
Predicting sample: 5  ----------
> plot(predicted.local,Yu)


This time I use correlation (as Win ISI use) and try to find the best number of samples to select for the LOCAL algorithm with a sequence. 

As we can see the predictions improve with more samples in the calibration (red dots), maybe could be better win more samples by at the end of the plot it start to stabilize.

27 jun. 2018

LOCAL Calibrations with Resemble

Really interesting the Resemble package so I am trying to work and understand it better even to help me in the case of Win ISI LOCAL calibrations.
We can get predictions for different combinations of local selected samples for the calibration to predict the unknown, so we can see the best option. We use a certain number of terms (min. and max.) and a weighted average  is calculated.
In this case I use an external validation set of petfood Xu with Reference data (protein) Yu, and I want to know the statistics (RMSE and R square) for the case of 90 local samples selected:

predicted.local <- as.numeric(Yu_anl$Nearest_neighbours_90)
> rmse.local <- sqrt(mean((Yu - predicted.local)^2))
> R2.local <- cor(Yu, predicted.local)^2
> R2.local
[1] 0.9507232
> rmse.local
[1] 1.163304
There are a lot of options to explore, so I will continue checking this package.

26 jun. 2018

Memory based learning methods and tools (Resamble / LOCAL)

This is the link to this presentation which help us to understand the concept of LOCAL that we will treat during next posts with the "Resamble package" and we have treated and we will continue with LOCAL in Win ISI.

Developing LOCAL calibrations with R

We can use also LOCAL in R with the Resemble package. I am testing the package these days with a set of petfood spectra (with protein reference values) imported from Win ISI with SNV and a second derivative math treatment. After, I select 65% for training and the rest for test.
The get predictions process of Resemble allow a configuration to check for the better number of sample or factors for the better prediction, so there are a lot of options and functions to check in this package.
This is a plot of the results for a standard configuration from the reference manual, that I would try to go more deep into, trying to find the best configuration.

ctrl <- mblControl(sm = "pls",
                   pcSelection = list("opc", 40),
                   valMethod = c("NNv"),
                   scaled = FALSE, center = TRUE)

ex1 <- mbl(Yr = Yr, Xr = Xr, Yu = NULL, Xu = Xu,
           mblCtrl = ctrl,
           distUsage = "predictors",
           k = seq(30, 150, 15),
           method = "wapls1",
           pls.c= c(7, 20))


Clearly seems that some of the configurations have overfitting, but I am just starting to learn the package so more post will come up giving my progress with this package.

14 jun. 2018

Precission and Accuracy LAB vs NIR

Finally happy with this plot trying to explain the precision and accuracy of the laboratory vs. the NIR predictions for a set of samples and subsamples. I will explain more detail of this plots in coming posts.

6 jun. 2018

Subsample Predictions Boxplot

This is a boxplot where there are four subsamples of meat meal predictions. A representative of a certain batch has been divided in four subsamples and analyzed in a NIR. So we get four predictions, one for every subsamples, so the  the boxplot gives an idea of the variance in the predictions for every sample based on their subsamples.
The colors are because the subsamples had been send to two different labs, so one is represented by one color. Colors had certain transparency because in some cases, two samples went to a lab and two to the other, in other cases the four subsamples went to the same lab and even in some cases three to one lab and one to another.
All these studies give an idea of the complexity of the meat meal product.

3 jun. 2018

Average and subsample residuals

In order to understand better the performance of a model, different blind subsamples of a sample had been sent to a laboratory, so in some cases we have the lab values of four subsamples of a sample and in other cases two subsamples of a sample. There are two cases with only one subsample.
For every subsample we calculate the average for the lab, and the average for the predictions, to get the red dot residuals.
We have also the residual of every subsample vs its prediction and those are the gray dots.
The plot (with R) gives a nice information about the performance of the model and how the average performs better in most cases than the individual subsamples.
We can see the warning (2.SEL) and action limits (3.SEL), and how the predictions for the average fall into the warning limits.