14 Sept. 2018

Monitoring Validations (Case 001)

I use R regularly to study the validation of different equations; in this case it is an equation for cereals, which includes barley, wheat, rye, corn, oat, triticale, etc. The Monitor function here compares the starch values of one instrument considered the Master (Y axis) against another considered the Host (X axis).
 
The idea is to check whether there are differences important enough to take action: adjusting the Bias, or the Slope and Intercept, or even standardizing the instruments.
 
In this case the Monitor function gives a warning to check whether there are groups or extreme samples, which may suggest an adjustment of slope and intercept.
 
 
And indeed there is a gap between two groups of samples, so we have to consider what is happening: we have a group of barley samples with lower starch values and a group of wheat and corn samples with higher starch values.
 
To evaluate this better we have to make subsets, check how the prediction statistics behave by group, and proceed in the best way, as in the sketch below.
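As a rough idea of how that check could look, this sketch splits a hypothetical data frame (I call it starch, with columns cereal, host and master; the names are mine, not from the Monitor functions) and computes the main statistics per group:

# Sketch: prediction statistics (bias, SEP, slope/intercept) for each cereal group
group.stats <- function(df) {
  res <- df$master - df$host              # residuals Master - Host
  fit <- lm(master ~ host, data = df)     # linear fit of Master vs. Host
  c(n = nrow(df), bias = mean(res), sep = sd(res),
    intercept = unname(coef(fit)[1]), slope = unname(coef(fit)[2]))
}
sapply(split(starch, starch$cereal), group.stats)

This gives one column of statistics per group (barley, wheat, corn, ...), so it is easy to see which group is driving the warning.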
 


30 Aug. 2018

Comparing Posteriors: Estimating Practical Differences Between Models


It is not the first time Max Kuhn appears on this blog, this time with a talk (at the latest New York R Conference) with advice on how to estimate which is the best model based on its statistics in R. We can certainly pick up good advice for finding the best possible model for our data sets.

29 Aug. 2018

2018 New York R Conference Highlights


On April 20 this year the New York R Conference was held with great success.
Just look at the great atmosphere in the video of the conference.

16 Aug. 2018

Checking the slopes in Validation Groups

This is a study to develop calibrations for meat on a reflectance instrument from 1100 to 1650 nm. Normally meat is measured in transmittance, but this is an approach to do it in reflectance.
 
I have just 64 samples with fat laboratory data. I split the spectra into 4 sets of 16 samples and merge 3 of them, leaving the remaining one for external validation. So I have 48 samples for training and 16 for validation, and by rotating the sets I can develop four calibrations and validate them with 4 external sets.
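Something like this sketch is what I mean by the splits (the object name meat and the random assignment with sample are just for illustration):

# Sketch: assign the 64 samples at random to 4 groups of 16
set.seed(123)
fold <- sample(rep(1:4, each = 16))
# for every rotation, 3 groups (48 samples) go to training and 1 group (16) to validation
splits <- lapply(1:4, function(k)
  list(train = meat[fold != k, ], valid = meat[fold == k, ]))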
 
Considering that we have few samples in the training set, I have to use few terms. The SEPs for the external and cross validations are quite high, but the idea here is to see how the slope changes across the four validation sets.
 
The reason is that we have few samples, and the slope value will stabilize as more samples are included in the calibration and validation sets.
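To check that, I look at the slope of the reference values against the predictions in every validation set; something along these lines (it assumes each validation data frame already carries the laboratory values in fat and the model predictions in pred, names which are only illustrative):

# Sketch: slope of laboratory fat vs. predicted fat for each of the 4 validation sets
slopes <- sapply(splits, function(s) coef(lm(fat ~ pred, data = s$valid))[2])
round(slopes, 2)   # expected to stabilize as more samples are added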
 
 
 
To improve the SEP we have to check the sample presentation method for this product and the procedure used to obtain the laboratory reference values.

3 Aug. 2018

Monitoring the performance with the histogram

NIR can be used to detect the levels of food additives and check whether they are within the right limits.
In this case there are several types of doughs, which use two levels of additive concentration depending on the type, so we always have the same reference values.
A calibration is developed and we have new data to validate. NIR will give results which I expect to cover the reference value with a Gaussian distribution.
Using the Monitor function I can see the prediction distribution vs. the reference distribution and check whether these expectations hold.
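A simple way to look at the two distributions outside the Monitor function is to overlay their histograms; a sketch, assuming a data frame dough with the reference values in ref and the NIR predictions in pred (both names hypothetical):

# Sketch: overlay the predicted and reference distributions
hist(dough$pred, breaks = 20, col = rgb(0, 0, 1, 0.4),
     main = "Additive concentration", xlab = "Concentration")
hist(dough$ref, breaks = 20, col = rgb(1, 0, 0, 0.4), add = TRUE)
legend("topright", legend = c("predicted", "reference"),
       fill = c(rgb(0, 0, 1, 0.4), rgb(1, 0, 0, 0.4)))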
 

The distribution for the higher concentration is fine, while the one for the lower concentration is skewed (that is why the S/I adjustment is suggested). This can be a first approach, to be continued for this application with more accurate reference values.

13 Jul. 2018

Validating a Resemble Model with the Monitor function

Continuing with this post, I evaluate the LOCAL model developed in Resemble, this time using the Monitor function (one of the Monitor functions I am developing).

I create different subsets from the validation sample set for the different categories; in this case it is for one type of puppy food, and I am evaluating the moisture content. We can see that there are two outliers that increase the SEP, so we have to see whether there is a reason to remove these samples.

Let's validate first with this puppy-type subset and check the statistics:

> # subset the validation set to the PUPPY-1 type
> val1.moi.pup1<-subset(val1.moi,ID1u_moi=="PUPPY-1")
> # keep the sample ID, the laboratory moisture and the LOCAL prediction
> val1.moi.pup1<-cbind.data.frame(val1.moi.pup1$Sample.u_moi, 
+                                 val1.moi.pup1$Yu_moi, 
+                                 val1.moi.pup1$predicted.moi.local)

> monitor10c24xyplot(val1.moi.pup1)



Samples with IDs 463 and 456 are outside the action limits, and the Monitor function shows their positions in the table:
 
$ResWarning
[1] id   ref  pred res
<0 rows> (or 0-length row.names)

$ResAction
    id  ref     pred       res
34 456  3.7 7.793351 -4.093351
32 463  4.9 7.881543 -2.981543

Now we can remove these samples, knowing their row positions, and recalculate:
val1.moi.pup1<-val1.moi.pup1[-c(32,34),]   # drop rows 32 and 34 (IDs 463 and 456)
monitor10c24xyplot(val1.moi.pup1)          # run the monitor again on the cleaned subset
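To see how much those two samples were inflating the error, the SEP can also be recomputed on the cleaned subset with a small helper like this one (my own helper, not part of the Monitor functions; the column positions follow the data frame built above, reference in column 2 and prediction in column 3):

# Sketch: bias-corrected SEP of the remaining samples
sep <- function(ref, pred) sd(ref - pred)
sep(val1.moi.pup1[, 2], val1.moi.pup1[, 3])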





6 Jul. 2018

Plots in Resemble (Part 2)

Good results for the prediction of the validation samples (Xu, Yu) for protein. This is the XY plot where we can see, in different colors, the classes of the validation samples (different types of petfood). The SEP is 0.88 (without removing outliers). Defining the data frame by classes will allow us to see the SEP for every class, so we can check which class needs more samples in the training database (Xr, Yr) or has other issues to check.

plot(predicted.local, Yu_prot, col = val1.prot$ID1u_prot, lwd = 2)   # XY plot colored by petfood class

The SEP error for the protein validation samples is   :  0.887 
The R squared for the protein validation samples is   :  0.962 
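The per-class SEP mentioned above can be obtained from the same objects used in the plot; a sketch (the calculation is mine, not a Resemble function):

# Sketch: bias-corrected SEP for every petfood class of the validation set
res <- Yu_prot - predicted.local
round(tapply(res, val1.prot$ID1u_prot, sd), 3)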
 
 

29 Jun. 2018

Plots in Resemble (Part 1)

Resemble allows a number of plots which are very useful for your work or personal papers. In this case I use the same sets as in the previous post and plot the PCA scores, where I can see the training matrix (Xr) scores and the validation matrix (Xu) scores overlapped.
 
The validation set is 35% (randomly selected) of the whole sample population, obtained over a long time period.
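For reference, the random 35% selection can be done with something like this (the objects X, holding all the spectra, and Y, holding the reference values, are hypothetical names):

# Sketch: random 35% of the samples for validation (Xu, Yu), the rest for training (Xr, Yr)
set.seed(1)
idx <- sample(nrow(X), round(0.35 * nrow(X)))
Xu <- X[idx, ];  Yu <- Y[idx]
Xr <- X[-idx, ]; Yr <- Y[-idx]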
 
We can see how the validation samples more or less cover the space of the training samples.

> par(mfrow=c(1,2))                        # two score plots side by side
> plot(local.mbl, g = "pca", pcs=c(1,2))   # PC1 vs PC2: Xr and Xu scores overlapped
> plot(local.mbl, g = "pca", pcs=c(1,3))   # PC1 vs PC3