1 Dec. 2014

Recalculating the PLSR without outliers

When we developed the regression, we did not remove any outliers from the calibration set. Now we are going to remove the 5 samples that are clearly outliers (samples 19, 122, 126, 127 and 150). This lets us report two results in the summary of the Shootout 2002: the Standard Errors of Prediction with all the samples, and the Standard Errors of Prediction without these 5 samples.

The same five samples are outliers both in the Training Set scanned in Instrument 1 and in the Training Set scanned in Instrument 2. Because the problem appears on both instruments, it is unlikely to come from the spectra. It seems clear that the lab values of these samples do not correlate with the spectra as well as the others do.
First, we remove the samples from Training Set 1:
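The R code for this step is not shown in this copy of the post. Here is a minimal sketch of the idea in Python. It assumes the spectra are stored as one list of absorbances per sample, in the same order as the lab values. `remove_samples` is a hypothetical helper, not the author's code. It drops the samples by their 1-based sample numbers, as R indexing would, and keeps spectra and lab values aligned.

```python
# Hypothetical sketch: drop samples by their 1-based sample numbers (R-style),
# keeping the spectra and the lab values aligned.
OUTLIERS = [19, 122, 126, 127, 150]  # sample numbers quoted in the post


def remove_samples(spectra, lab, sample_ids):
    """Return copies of spectra and lab values without the given samples.

    spectra: list of spectra (one list of absorbances per sample)
    lab: list of reference (lab) values, in the same order as spectra
    sample_ids: 1-based sample numbers to drop
    """
    if len(spectra) != len(lab):
        raise ValueError("spectra and lab values must have the same length")
    drop = {i - 1 for i in sample_ids}  # convert to 0-based indices
    keep = [i for i in range(len(lab)) if i not in drop]
    return [spectra[i] for i in keep], [lab[i] for i in keep]
```

The same call, with the same sample numbers, would be applied to the Training Set scanned in Instrument 2 before predicting it.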


Next, we develop the new regression model without outliers, using the math treatments we consider appropriate (MSC + second derivative):
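MSC (multiplicative scatter correction) regresses each spectrum on a reference spectrum, usually the mean spectrum, and removes the fitted offset and slope. The second derivative is then taken on the corrected spectra. Below is a minimal plain-Python sketch of that sequence; the post's R code is not shown. The derivative here is a plain second difference with a gap of one point and no smoothing, which is an assumption. The original treatment may have used a Savitzky-Golay or gap-segment derivative instead.

```python
def msc(spectra, reference=None):
    """Multiplicative scatter correction.

    Each spectrum x is fitted as x = a + b * ref by least squares, where ref
    defaults to the mean spectrum, and is replaced by (x - a) / b.
    """
    n_points = len(spectra[0])
    if reference is None:
        reference = [sum(s[j] for s in spectra) / len(spectra) for j in range(n_points)]
    ref_mean = sum(reference) / n_points
    sxx = sum((r - ref_mean) ** 2 for r in reference)
    corrected = []
    for x in spectra:
        x_mean = sum(x) / n_points
        b = sum((r - ref_mean) * (v - x_mean) for r, v in zip(reference, x)) / sxx
        a = x_mean - b * ref_mean
        corrected.append([(v - a) / b for v in x])
    return corrected


def second_derivative(spectrum):
    """Plain second difference x[i-1] - 2*x[i] + x[i+1] (loses one point at each end)."""
    return [spectrum[i - 1] - 2 * spectrum[i] + spectrum[i + 1]
            for i in range(1, len(spectrum) - 1)]


def msc_second_derivative(spectra):
    """Apply MSC first, then the second difference, to every spectrum."""
    return [second_derivative(s) for s in msc(spectra)]
```

The order matters: MSC is fitted on the raw spectra, and the derivative is applied afterwards.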


Comparing the summaries of the models with and without outliers, we see the expected improvement.
We decide to use 3 terms in the model to predict the other sets. First, we predict the Training Set scanned in Instrument 2, also without the 5 outliers:

monit.tr2
colnames(monit.tr2a)<-c("Y.tr.lab","Y.tr2.pred")
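"Using 3 terms" means the PLS model keeps only its first three latent variables when it predicts a new set. The sketch below shows this with a generic textbook PLS1 (NIPALS) in plain Python. It is not the R package or function used in the post; `pls1_fit` and `pls1_predict` are hypothetical names. A new sample is predicted by projecting it onto each term in turn and deflating it, exactly as the calibration matrix was deflated.

```python
def _dot(u, v):
    """Inner product of two equal-length sequences."""
    return sum(a * b for a, b in zip(u, v))


def pls1_fit(X, y, n_terms):
    """Fit a PLS1 model (single response) with the NIPALS algorithm."""
    n, k = len(X), len(X[0])
    x_mean = [sum(row[j] for row in X) / n for j in range(k)]
    y_mean = sum(y) / n
    E = [[row[j] - x_mean[j] for j in range(k)] for row in X]  # centred X
    f = [v - y_mean for v in y]                                 # centred y
    terms = []
    for _ in range(n_terms):
        w = [sum(E[i][j] * f[i] for i in range(n)) for j in range(k)]  # X'y
        norm = _dot(w, w) ** 0.5
        if norm == 0:  # y is already fully explained
            break
        w = [v / norm for v in w]
        t = [_dot(row, w) for row in E]  # scores
        tt = _dot(t, t)
        p = [sum(E[i][j] * t[i] for i in range(n)) / tt for j in range(k)]  # X loadings
        q = _dot(f, t) / tt  # y loading
        # Deflate X and y before extracting the next term.
        E = [[E[i][j] - t[i] * p[j] for j in range(k)] for i in range(n)]
        f = [f[i] - q * t[i] for i in range(n)]
        terms.append((w, p, q))
    return x_mean, y_mean, terms


def pls1_predict(model, X_new):
    """Predict new samples by projecting them term by term."""
    x_mean, y_mean, terms = model
    preds = []
    for row in X_new:
        e = [v - m for v, m in zip(row, x_mean)]
        y_hat = y_mean
        for w, p, q in terms:
            t = _dot(e, w)
            y_hat += q * t
            e = [ej - t * pj for ej, pj in zip(e, p)]
        preds.append(y_hat)
    return preds
```

Pairing the lab values with `pls1_predict(model, X_tr2)`, where `X_tr2` is a hypothetical matrix of pretreated Instrument 2 spectra, gives the same kind of two-column table as `monit.tr2a`: lab value and predicted value.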

Now with this table we can run the Monitor function:


The results show an improvement in the RMSEP. The SEP statistic tells us the error once it is corrected for the bias. The Monitor function now recommends a bias adjustment.
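SEP is the standard deviation of the residuals around their mean, so it measures the error after the bias is removed. The three statistics are therefore linked by RMSEP^2 = Bias^2 + SEP^2 * (n - 1) / n. The monitor output for this set reports Bias = -2.249, SEP = 2.875 and n = 150, which give about 3.643. That matches the reported RMSEP of 3.642 up to rounding. Below is a small Python sketch of these definitions. Taking residuals as lab minus predicted is an assumption about the sign convention of the Monitor function.

```python
import math


def validation_stats(lab, pred):
    """RMSEP, bias and SEP for paired lab and predicted values.

    Residuals are taken as lab - predicted (assumed sign convention).
    """
    n = len(lab)
    res = [y - yhat for y, yhat in zip(lab, pred)]
    rmsep = math.sqrt(sum(r * r for r in res) / n)
    bias = sum(res) / n
    # SEP: spread of the residuals around the bias (n - 1 denominator)
    sep = math.sqrt(sum((r - bias) ** 2 for r in res) / (n - 1))
    return rmsep, bias, sep


def rmsep_from_bias_sep(bias, sep, n):
    """Recover RMSEP from Bias and SEP: RMSEP^2 = Bias^2 + SEP^2 * (n - 1) / n."""
    return math.sqrt(bias ** 2 + sep ** 2 * (n - 1) / n)
```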
The distribution of the residuals shows the bias problem, but it is quite uniform once we correct the bias.
N Validation Samples  = 150 
N Calibration Samples = 150 
N Calibration Terms   = 3 
RMSEP    : 3.642 
Bias     : -2.249 
SEP      : 2.875 
UECLs    : 3.327 
***SEP is below BCLs (O.K)***
Corr     : 0.9917 
RSQ      : 0.9834 
Slope    : 1.002 
Intercept: 1.874 
RER      : 29.92   Good 
RPD      : 7.759   Very Good 
BCL(+/-): 0.4637 
***Bias adjustment is recommended***
Residual Std Dev is : 2.884 
***Slope adjustment is not necessary***
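The reported BCL is consistent with the common bias control limit BCL = t * SEP / sqrt(n). With t of about 1.976 (two-sided 95%, 149 degrees of freedom), SEP = 2.875 and n = 150, that formula gives about 0.464, against 0.4637 reported. Here the absolute bias (2.249) is far above this limit, which is why a bias adjustment is recommended. The sketch below assumes that formula; it has not been confirmed against the Monitor function's source. The hard-coded t value is an approximation. Correcting the bias, as in the residual plot comment, is sketched as shifting every prediction by the bias, with residuals taken as lab minus predicted.

```python
import math

T_95_149DF = 1.976  # approx. two-sided 95% Student t value for 149 df (assumption)


def bias_control_limit(sep, n, t=T_95_149DF):
    """Assumed bias control limit: t * SEP / sqrt(n)."""
    return t * sep / math.sqrt(n)


def needs_bias_adjustment(bias, bcl):
    """A bias adjustment is suggested when |bias| exceeds the limit."""
    return abs(bias) > bcl


def adjust_bias(pred, bias):
    """Shift predictions so that the mean residual (lab - pred) becomes zero."""
    return [p + bias for p in pred]
```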
