30 oct 2015

Looking at the Boxplots

Looking at the boxplots can give us a quick idea of which of the different adjustments would work best. In this case the comparison is between the reference values, the predicted values corrected by the bias, the predicted values corrected by slope/intercept, and the predicted values without any correction.
Boxplots also help us to check the distribution and whether there are outliers in the data sets.
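As a minimal sketch of the kind of comparison I mean (ref, pred, bias, slope and intercept are hypothetical placeholders for the reference values, the raw predictions and the validation statistics; the correction formulas assume bias = mean(pred - ref) and a regression of reference on predicted):

pred.bias <- pred - bias                  # predictions corrected by the bias
pred.si   <- intercept + slope * pred     # predictions corrected by slope/intercept
boxplot(list(Reference = ref,
             Predicted = pred,
             Bias.corr = pred.bias,
             SI.corr   = pred.si),
        main = "Reference vs. predictions (raw and corrected)",
        ylab = "constituent value")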

22 oct 2015

Improving plots in the Monitor Function

I am trying to find the best way to check the performance of a model by comparing the reference values with the predicted values and to see the effect of a bias adjustment, so after working on the function a plot is generated.
I will probably add two more plots, but I would not want to overload the plot with information.
I will see.
At the moment the information generated by the function is:

> monitor10c12(Muestra,HUM_NUT,HUM_ING,sortref=TRUE)
WARNING: More than 20 samples are needed to run the Validation 
Validation Samples  = 16 
RMSEP    : 0.742 
Bias     : -0.707 
SEP      : 0.233 
Corr     : 0.989 
RSQ      : 0.979 
Slope    : 0.907 
Intercept: 0.275 
RER      : 16.1   Fair 
RPD      : 6.17   Excellent 
BCL(+/-): 0.124 
      ***Bias adjustment is recommended***
Residual Std Dev is : 0.198 
    ***Slope adjustment is recommended***
Using SEP as std dev the residual distribution is: 
  Residuals into 68%   prob (+/- 1SEP)    = 0     % = 0 
  Residuals into 95%   prob (+/- 2SEP)    = 3     % = 18.8 
  Residuals into 99.5% prob (+/- 3SEP)    = 7     % = 43.8 
  Residuals outside 99.5% prob (+/- 3SEP) = 9     % = 56.2 
  Samples outside UAL  = 0 
  Samples outside UWL  = 0 
  Samples inside   WL  = 3 
  Samples outside LWL  = 13 
  Samples outside LAL  = 9 
With Bias correction the Residual Distribution would be:
  Residuals into 68%   prob (+/- 1SEP)     = 13     % = 81.2 
  Residuals into 95%   prob (+/- 2SEP)     = 15     % = 93.8 
  Residuals into 99.5% prob (+/- 3SEP)     = 16     % = 100 
  Residuals outside  99.5% prob (> 3SEP)   = 0      % = 0 
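The monitor10c12 function is my own, but as a rough sketch (not the actual code of the function), the main statistics it reports could be computed from hypothetical vectors ref (reference values) and pred (predicted values) along these lines; the BCL and slope tests follow a common t-test convention and are only an assumption here:

n     <- length(ref)
res   <- pred - ref                              # residuals (sign convention assumed)
rmsep <- sqrt(mean(res^2))                       # root mean square error of prediction
bias  <- mean(res)                               # systematic error
sep   <- sqrt(sum((res - bias)^2) / (n - 1))     # standard error of prediction (bias corrected)
fit   <- lm(ref ~ pred)                          # regression of reference on predicted
intercept <- coef(fit)[1]; slope <- coef(fit)[2]
corr  <- cor(ref, pred); rsq <- corr^2
rer   <- diff(range(ref)) / sep                  # range error ratio
rpd   <- sd(ref) / sep                           # ratio of performance to deviation
bcl   <- qt(0.975, n - 1) * sep / sqrt(n)        # bias confidence limit (95 %)
if (abs(bias) > bcl) cat("***Bias adjustment is recommended***\n")
se.slope <- summary(fit)$coefficients[2, 2]      # standard error of the slope
if (abs(slope - 1) > qt(0.975, n - 2) * se.slope)
  cat("***Slope adjustment is recommended***\n")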


13 oct 2015

Is my model performing as expected? (Part 2)

It is really true that a picture is worth a thousand words, and this was the case when, in the post "Is my model performing as expected? (Part 1)", I was trying to explain the decision that a bias should be adjusted by looking at the distribution statistics and errors.
But if we overplot the current residuals without adjustment (red dots) and the residuals with the adjustment (green dots), we can see how the distribution moves into the warning limits.
# residuals without any adjustment (red squares)
plot(res ~ l, main = "Residuals", ylim = c(-5 * sep, 5 * sep),
     sub = "orange 95% prob / red 99.8% prob", pch = 15, col = 2,
     xlab = "sample position", ylab = "residual")
abline(h = 0, col = "blue")            # zero line
abline(h =  2 * sep, col = "orange")   # upper warning limit (UWL, +2 SEP)
abline(h = -2 * sep, col = "orange")   # lower warning limit (LWL, -2 SEP)
abline(h =  3 * sep, col = "red")      # upper action limit  (UAL, +3 SEP)
abline(h = -3 * sep, col = "red")      # lower action limit  (LAL, -3 SEP)
# residuals after the bias adjustment (green points), overlaid on the same axes
points(l, Table2$res.corr1, col = 3)

9 oct 2015

Is my model performing as expected? (Part 1)


Hi all, I am quite busy so I have little time to spend on the blog; anyway, I have continued working with R, trying to develop functions in order to check whether our models perform as expected or not.
Residual plots and the limits (UAL, UWL, LWL, LAL) we draw on them will help us to make decisions, and developing some functions can give us suggestions that help us make good decisions.
So I am trying to work on this.
We always want to compare the results from a Host to a Master, the predicted NIR results with the Lab results,….
In all these predictions we have to provide realistic statistics, not overly optimistic ones; otherwise we will not really understand how our model performs. Validation statistics and residual plots will help us to understand whether our standardization is performing fine, whether we have a bias problem, or whether the validation samples should be included in the data set and the model recalibrated.
In this case it is important to know the error of our calibration, which can be for example the SECV (standard error of cross validation), to compare this error with the RMSEP of the validation, and after that with the SEP (the validation error corrected for bias).
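As a side reminder of my own (not something the function reports): these errors are linked, since the SEP is essentially the RMSEP with the bias removed, so approximately RMSEP^2 = Bias^2 + (n-1)/n * SEP^2. A quick check with the validation statistics reported below (n = 9):

n <- 9; bias <- -0.593; sep <- 0.189
sqrt(bias^2 + (n - 1) / n * sep^2)    # ~0.62, which matches the reported RMSEP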
It is important to see how the samples are distributed in the residual plot within the warning limits (UWL and LWL) and the action limits (UAL and LAL): are they distributed randomly? Do they have a bias? If I correct the bias, does the distribution become random and fall within the limits? There are several questions whose answers will help us to improve the model, and to understand and explain to others the results we obtain.
This is a case where the model performs with a Bias:
Validation Samples  = 9
RMSEP    : 0.62
Bias     : -0.593
SEP      : 0.189
Corr     : 0.991
RSQ      : 0.983
Slope    : 0.928
Intercept: 0.111
RER      : 18.8   Fair
RPD      : 7.02   Excellent
BCL(+/-) : 0.143
***Bias adjustment is recommended***
The residual plot confirms that we have a bias:
Using SEP as std dev the residual distribution is:
  Residuals into 68%   prob (+/- 1SEP)    = 0
  Residuals into 95%   prob (+/- 2SEP)    = 1
  Residuals into 99.5% prob (+/- 3SEP)    = 4
  Residuals outside 99.5% prob (+/- 3SEP) = 5
  Samples outside UAL  = 0
  Samples outside UWL  = 0
  Samples inside   WL  = 1
  Samples outside LWL  = 8
  Samples outside LAL  = 5
With Bias correction the Residual Distribution would be:
  Residuals into 68%   prob (+/- 1SEP)    = 7
  Residuals into 95%   prob (+/- 2SEP)    = 9
  Residuals into 99.5% prob (+/- 3SEP)    = 9
  Residuals outside 99.5% prob (> 3SEP)   = 0
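These counts could be reproduced with a short sketch like the following, assuming res holds the validation residuals and sep the SEP reported above:

in1  <- sum(abs(res) <= 1 * sep)          # residuals within +/- 1 SEP (~68 % expected)
in2  <- sum(abs(res) <= 2 * sep)          # residuals within +/- 2 SEP (~95 % expected)
in3  <- sum(abs(res) <= 3 * sep)          # residuals within +/- 3 SEP (~99.5 % expected)
out3 <- sum(abs(res) >  3 * sep)          # residuals outside the action limits
res.corr <- res - mean(res)               # residuals after a simple bias correction
in1.c <- sum(abs(res.corr) <= 1 * sep)
in2.c <- sum(abs(res.corr) <= 2 * sep)
in3.c <- sum(abs(res.corr) <= 3 * sep)
cat("within 1 SEP:", in1, "-> with bias correction:", in1.c, "\n")
cat("within 2 SEP:", in2, "-> with bias correction:", in2.c, "\n")
cat("within 3 SEP:", in3, "-> with bias correction:", in3.c, "\n")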
With the bias correction the statistics are better, and this confirms that probably a non-robust standardization has been done between the two instruments that we are comparing.
This can help us to check other standardizations, or to decide whether we need other approaches such as a repeatability file in the calibration or mixing spectra from both instruments.