9 Oct. 2015

Is my model performing as expected? (Part 1)

Hi all, I am quite busy, so I have little time to spend on the blog. Still, I have continued working with R, trying to develop functions to check whether our models perform as expected.
Residual plots, and the limits (UAL, UWL, LWL, LAL) we draw on them, help us make decisions, but functions that print suggestions can make those decisions easier.
So I am working on this.
We always want to compare something: the results from a Host with those from a Master, the predicted NIR results with the Lab results, and so on.
In all these predictions we have to provide realistic statistics, not overly optimistic ones; otherwise we will not really understand how our model performs. Validation statistics and the residual plots help us understand whether our standardization is performing well, whether we have a bias problem, or whether the validation samples should be included in the data set and the model recalibrated.
In this case it is important to know the error of our calibration, which can be the SECV (standard error of cross-validation), for example, and to compare it with the RMSEP of the validation, and then with the SEP (the validation error corrected for bias).
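To make the three validation numbers concrete, here is a minimal sketch in Python (the functions mentioned in this post are written in R and are not shown, so this is not that code). It assumes the residual is defined as predicted minus reference and that the SEP uses an n - 1 denominator; the function name and the data are made up for illustration.

```python
import math

def validation_stats(reference, predicted):
    """Return n, bias, RMSEP and SEP for paired reference/predicted values."""
    if len(reference) != len(predicted) or len(reference) < 2:
        raise ValueError("need two equal-length sequences with at least 2 samples")
    # Assumed sign convention: residual = predicted - reference
    residuals = [p - r for r, p in zip(reference, predicted)]
    n = len(residuals)
    bias = sum(residuals) / n                              # mean residual
    rmsep = math.sqrt(sum(e * e for e in residuals) / n)   # total error, bias included
    # SEP: standard deviation of the residuals around the bias (n - 1 denominator assumed)
    sep = math.sqrt(sum((e - bias) ** 2 for e in residuals) / (n - 1))
    return {"n": n, "bias": bias, "rmsep": rmsep, "sep": sep}

# Made-up numbers for illustration only (not the samples from this post)
lab = [10.0, 11.0, 12.0, 13.0]
nir = [9.5, 10.4, 11.6, 12.5]
stats = validation_stats(lab, nir)
print(stats)
```

With a large bias the RMSEP is much bigger than the SEP, which is the pattern to look for when comparing them.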
It is important to see how the samples are distributed in the residual plot relative to the warning limits (UWL and LWL) and the action limits (UAL and LAL). Are they distributed randomly? Do they have a bias? If I correct the bias, does the distribution become random and fall within the limits? There are several such questions, and having the right answers will help us improve the model and understand and explain to others the results we obtain.
This is a case where the model performs with a Bias:
Validation Samples  = 9
RMSEP    : 0.62
Bias     : -0.593
SEP      : 0.189
Corr     : 0.991
RSQ      : 0.983
Slope    : 0.928
Intercept: 0.111
RER      : 18.8   Fair
RPD      : 7.02   Excellent
BCL(+/-) : 0.143
***Bias adjustment is recommended***
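As a quick sanity check on the numbers above, the reported RMSEP, bias and SEP can be tied together with the usual identity RMSEP² = bias² + (n - 1)/n · SEP², which holds when the SEP is computed with an n - 1 denominator (an assumption here). The sketch below, in Python, also applies the rule "flag a bias when |bias| is larger than the BCL"; that rule is my assumption about how the recommendation is triggered.

```python
import math

# Values copied from the report above
n = 9
bias = -0.593
sep = 0.189
bcl = 0.143

# RMSEP^2 = bias^2 + (n - 1)/n * SEP^2  (assumes SEP uses an n - 1 denominator)
rmsep_check = math.sqrt(bias ** 2 + (n - 1) / n * sep ** 2)
print(round(rmsep_check, 3))  # 0.619, which agrees with the reported 0.62

# Assumed decision rule: a bias is significant when |bias| exceeds the BCL
bias_flag = abs(bias) > bcl
print(bias_flag)  # True
```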
The residual plot confirms that we have a bias:
Using SEP as std dev the residual distribution is:
  Residuals into 68%   prob (+/- 1SEP)    = 0
  Residuals into 95%   prob (+/- 2SEP)    = 1
  Residuals into 99.5% prob (+/- 3SEP)    = 4
  Residuals outside 99.5% prob (+/- 3SEP) = 5
  Samples outside UAL  = 0
  Samples outside UWL  = 0
  Samples inside   WL  = 1
  Samples outside LWL  = 8
  Samples outside LAL  = 5
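The counts against the limits can be sketched like this in Python. It assumes the warning limits sit at +/- 2 SEP and the action limits at +/- 3 SEP around zero, which is consistent with the counts printed above but is not confirmed by the original R code. The function name and the residuals are made up.

```python
def classify_against_limits(residuals, sep):
    """Count residuals relative to warning (+/- 2 SEP) and action (+/- 3 SEP) limits."""
    uwl, lwl = 2 * sep, -2 * sep   # assumed warning limits
    ual, lal = 3 * sep, -3 * sep   # assumed action limits
    return {
        "outside_UAL": sum(e > ual for e in residuals),
        "outside_UWL": sum(e > uwl for e in residuals),   # includes those beyond UAL
        "inside_WL": sum(lwl <= e <= uwl for e in residuals),
        "outside_LWL": sum(e < lwl for e in residuals),   # includes those beyond LAL
        "outside_LAL": sum(e < lal for e in residuals),
    }

# Made-up residuals for illustration only (not the samples from this post)
toy_residuals = [-0.7, -0.5, -0.45, -0.3, 0.1]
counts = classify_against_limits(toy_residuals, sep=0.2)
print(counts)
```

When most samples pile up beyond one of the lower limits and none beyond the upper ones, as here, the residuals are shifted rather than scattered, which points to a bias.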
With Bias correction the Residual Distribution would be:
  Residuals into 68%   prob (+/- 1SEP) =7
  Residuals into 95%   prob (+/- 2SEP) =9
  Residuals into 99.5% prob (+/- 3SEP) =9
  Residuals out  99.5% prob (> 3SEP)   =0
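The "what if we correct the bias" counts can be sketched by subtracting the mean residual and counting again, here in Python with made-up residuals and hypothetical function names. The counts are cumulative (everything within 1 SEP is also within 2 SEP), which is how the figures above appear to be reported.

```python
import math

def band_counts(residuals, sep):
    """Cumulative counts within +/- 1, 2 and 3 SEP, plus the count beyond 3 SEP."""
    within = {k: sum(abs(e) <= k * sep for e in residuals) for k in (1, 2, 3)}
    return {
        "within_1": within[1],
        "within_2": within[2],
        "within_3": within[3],
        "outside_3": len(residuals) - within[3],
    }

def as_percent(count, n):
    """Express a count as a percentage of n, rounded to one decimal."""
    return round(100.0 * count / n, 1)

# Made-up residuals for illustration only (not the samples from this post)
toy = [-0.8, -0.6, -0.5, -0.4, -0.2]
n = len(toy)
bias = sum(toy) / n
sep = math.sqrt(sum((e - bias) ** 2 for e in toy) / (n - 1))

before = band_counts(toy, sep)
after = band_counts([e - bias for e in toy], sep)  # bias-corrected residuals
print(before)
print(after)
print(as_percent(after["within_1"], n))
```

Applied to the counts reported here, 7 of 9 samples within +/- 1 SEP is `as_percent(7, 9)`, that is 77.8 %.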
With the bias correction the statistics are better, which suggests that the standardization between the two instruments we are comparing is probably not robust.
This can help us check other standardizations, or decide whether we need other approaches, such as a repeatability file in the calibration or mixing spectra from both instruments.


1 comment:

  1. With Bias correction the Residual Distribution would be:
    Residuals into 68% prob (+/- 1SEP) = 7 % = 77.8
    Residuals into 95% prob (+/- 2SEP) = 9 % = 100
    Residuals into 99.5% prob (+/- 3SEP) = 9 % = 100
    Residuals outside 99.5% prob (> 3SEP) = 0 % = 0

    Maybe it is better to look at the percentage of samples within the different limits. We can see that if the bias is adjusted the distribution is very good, and we only have to confirm this with new samples in the future.