16 ago. 2018

Checking the slopes in Validation Groups

This is a study to develop calibrations for meat on a reflectance instrument from 1100 to 1650 nm. Normally meat is measured in transmittance, but this is an approach to do it in reflectance.
 
I have just 64 samples with fat laboratory data. I split the spectra into 4 sets of 16 samples and merge 3 of them, leaving the remaining one for external validation. So I have 48 samples for training and 16 for validation, and I can develop four calibrations and validate them with 4 external sets.
 
Considering that we have few samples in the training set, I have to use few terms. The SEPs for external or cross validation are quite high, but the idea here is to see the changes in the slope for the four validation sets.
 
The reason is that we have few samples, and the slope value will stabilize as more samples are included in the calibration and validation sets.
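A minimal sketch of this rotation, assuming hypothetical objects X (spectra matrix, 64 rows) and Y (fat reference values), and using the pls package as the regression tool (the post does not say which software was used):

library(pls)

d <- data.frame(fat = Y, NIR = I(X))          # hypothetical spectra and fat values
set.seed(1)                                   # hypothetical seed
groups <- sample(rep(1:4, each = 16))         # 64 samples -> 4 random groups of 16
slopes <- numeric(4)
for (g in 1:4) {
  train <- groups != g                        # 48 samples for training
  fit   <- plsr(fat ~ NIR, ncomp = 4, data = d[train, ])   # few terms
  pred  <- as.numeric(predict(fit, newdata = d[!train, ], ncomp = 4))
  slopes[g] <- coef(lm(d$fat[!train] ~ pred))[2]   # slope of reference vs. predicted
}
slopes                                        # compare how the four slopes change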
 
 
 
To improve the SEP we have to check the sample presentation method for this product and the procedure used to obtain the laboratory reference values.

3 ago. 2018

Monitoring the performance with the histogram

NIR can be used to detect the levels of food additives and check if they are within the right limits.
In this case there are several types of doughs, and two levels of additive concentration are used depending on the type. So we always have the same reference data.
A calibration is developed and we have new data to validate it. NIR will give results which I expect to cover the reference value with a Gaussian distribution.
Using the Monitor function I can see the prediction distribution vs. the reference distribution and check whether the expectations hold.
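A rough base-R sketch of the same idea (not the Monitor function itself), assuming the predictions are in a hypothetical vector pred and the nominal reference value in ref.value:

hist(pred, breaks = 15, col = "grey80",
     main = "Predictions vs. reference level", xlab = "Additive concentration")
abline(v = ref.value, col = "red", lwd = 2)             # nominal reference value
abline(v = mean(pred), col = "blue", lwd = 2, lty = 2)  # centre of the predictions
# A gap between the two vertical lines points to a bias (S/I adjustment)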
 

For the higher concentration the fit is fine, while for the lower concentration it is skewed (that is why the S/I adjustment is suggested). This can be a first approach to continue with this application using more accurate reference values.

13 jul. 2018

Validating Resemble Model with Monitor function

Continuing with this post, evaluating the LOCAL model developed in Resemble. This time I use the Monitor function (one of the Monitor functions I am developing).

I create different subsets from the validation sample set for the different categories. In this case it is for one type of puppy food, and I am evaluating the moisture content. We can see that there are two outliers that increase the SEP, so we have to check whether there is a reason to remove these samples.

Let's validate first with this puppy subset and check the statistics:

> val1.moi.pup1<-subset(val1.moi,ID1u_moi=="PUPPY-1")
> val1.moi.pup1<-cbind.data.frame(val1.moi.pup1$Sample.u_moi, 
+                                 val1.moi.pup1$Yu_moi, 
+                                 val1.moi.pup1$predicted.moi.local)
 
> monitor10c24xyplot(val1.moi.pup1)



Samples with the IDs 463 and 456 are outside the action limits, and the monitor function shows their position in the table:
 
$ResWarning
[1] id   ref  pred res 
<0 rows> (or 0-length row.names)

$ResAction
    id ref     pred       res
34 456 3.7 7.793351 -4.093351
32 463 4.9 7.881543 -2.981543
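As a rough illustration (not the actual logic inside monitor10c24xyplot), the residuals could be flagged against warning and action limits; the ±2·SEP and ±3·SEP limits below are an assumption:

res.moi <- val1.moi.pup1[, 2] - val1.moi.pup1[, 3]   # reference - predicted
sep.moi <- sd(res.moi)                               # SD of the residuals (bias corrected)
act.idx  <- which(abs(res.moi) > 3 * sep.moi)        # assumed action limits: +/- 3*SEP
warn.idx <- which(abs(res.moi) > 2 * sep.moi &
                  abs(res.moi) <= 3 * sep.moi)       # assumed warning limits: +/- 2*SEP
val1.moi.pup1[act.idx, ]                             # samples outside the action limits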

Now we can remove these samples, knowing their position, and recalculate:
val1.moi.pup1<-val1.moi.pup1[-c(32,34),]
monitor10c24xyplot(val1.moi.pup1)





6 jul. 2018

Plots in Resemble (part 2)

Good results for the prediction of the validation samples (Xu, Yu) for protein. This is the XY plot where we can see in different colors the classes of the validation samples (different types of petfood). The SEP is 0.88 (without removing outliers). Splitting the data frame by class will allow us to see the SEP for every class, so we can check which class needs more samples in the training database (Xr, Yr), or look for other reasons; see the sketch after the output below.

plot(predicted.local,Yu_prot,col=val1.prot$ID1u_prot,lwd=2)

The SEP for the protein validation samples is      :  0.887 
The R squared for the protein validation samples is:  0.962 
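A minimal sketch of that per-class check, assuming the residuals line up with the class factor ID1u_prot in the validation data frame val1.prot:

res.prot <- Yu_prot - predicted.local          # residuals of the validation set
tapply(res.prot, val1.prot$ID1u_prot, sd)      # SEP (SD of residuals) per petfood class
table(val1.prot$ID1u_prot)                     # samples per class, to judge each SEP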
 
 

29 jun. 2018

Plots in Resemble (Part 1)

Resemble allows a certain number of plots which are very useful for your work or personal papers. In this case I use the same sets as in the previous post and I plot the PCA scores, where I can see the training matrix (Xr) scores and the validation matrix (Xu) scores overlapped.
 
The validation set is 35% (randomly selected) of the whole sample population, obtained over a long time period.
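A minimal sketch of such a random split, assuming hypothetical objects X (spectra matrix) and Y (reference values); the post does not show how the split was actually done:

set.seed(123)                                   # hypothetical seed
idx <- sample(nrow(X), round(0.35 * nrow(X)))   # 35% of the samples for validation
Xu <- X[idx, ];  Yu <- Y[idx]                   # validation set
Xr <- X[-idx, ]; Yr <- Y[-idx]                  # training set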
 
We can see how the validation samples cover more or less the space of the training samples.

> par(mfrow=c(1,2))
> plot(local.mbl, g = "pca", pcs=c(1,2)) 
> plot(local.mbl, g = "pca", pcs=c(1,3))
 

28 jun. 2018

Using correlation in LOCAL (Resemble package)

> ctrl.mbl <- mblControl(sm = "cor",
+                        pcSelection = list("cumvar", 0.999),
+                        valMethod = "NNv",
+                        scaled = FALSE, center = TRUE)
> local.mbl <- mbl(Yr = Yr, Xr = Xr, Yu = Yu, Xu = Xu,
+                  mblCtrl = ctrl.mbl,
+                  dissUsage = "none",
+                  k = seq(40, 150, by = 10),
+                  pls.c = c(5, 15),
+                  method = "wapls1")
 

Predicting sample: 1  ----------
Predicting sample: 2  ----------
Predicting sample: 3  ----------
Predicting sample: 4  ----------
Predicting sample: 5  ----------
--------------------------------
--------------------------------
 
> plot(predicted.local,Yu)

  
 

 
This time I use correlation (as Win ISI does) and try to find the best number of samples to select for the LOCAL algorithm with a sequence.

As we can see, the predictions improve with more samples in the calibration (red dots); maybe they could be better with even more samples, but towards the end of the plot they start to stabilize.
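A minimal sketch of that comparison, assuming the predictions for every k can be extracted with getPredictions() from resemble 1.x (one column per number of neighbours):

pred.k <- getPredictions(local.mbl)                          # one column per k (40, 50, ..., 150)
rmse.k <- sapply(pred.k, function(p) sqrt(mean((Yu - p)^2))) # RMSE of the validation set per k
plot(seq(40, 150, by = 10), rmse.k, type = "b",
     xlab = "Local samples selected (k)", ylab = "RMSE (external validation)")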
 
 
 
 
 

27 jun. 2018

LOCAL Calibrations with Resemble

The Resemble package is really interesting, so I am trying to work with it and understand it better, even to help me in the case of Win ISI LOCAL calibrations.
We can get predictions for different numbers of locally selected calibration samples used to predict the unknown, so we can see the best option. We use a certain range of terms (min. and max.) and a weighted average of their predictions is calculated.
 
In this case I use an external validation set of petfood (Xu) with reference data for protein (Yu), and I want to know the statistics (RMSE and R squared) for the case of 90 local samples selected:

> predicted.local <- as.numeric(Yu_anl$Nearest_neighbours_90)
> rmse.local <- sqrt(mean((Yu - predicted.local)^2))
> R2.local <- cor(Yu, predicted.local)^2
> R2.local
[1] 0.9507232
> rmse.local
[1] 1.163304
 
plot(predicted.local,Yu)
There are a lot of options to explore, so I will continue checking this package.
 
 
 

26 jun. 2018

Memory based learning methods and tools (Resemble / LOCAL)

This is the link to a presentation which helps us understand the concept of LOCAL, which we will treat in the next posts with the Resemble package, and which we have already treated and will continue treating with LOCAL in Win ISI.