19 jun. 2017

Comparing Residuals, GH and T when validating

When looking at the validation statistics, it is important to look at three values together for every sample: the Residual, the GH and the T value. Using this data set (fiber), these values let us check whether a sample is being badly extrapolated, whether the model is not robust for it, or whether there are other issues.
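
As a rough sketch of how the three values can be computed side by side, the Python function below (NumPy only) uses assumed definitions, since the post does not spell them out: Residual = lab value minus predicted value, T = Residual divided by the SEC (standard error of calibration), and GH = squared Mahalanobis distance in the PCA score space of the calibration spectra, divided by the number of components. The name `validation_stats` and its parameters are hypothetical.

```python
import numpy as np


def validation_stats(X_cal, X_val, y_val, y_pred, sec, n_components):
    """Per-sample Residual, T and GH for a validation set.

    Assumed definitions (not taken from any specific software):
      Residual = reference (lab) value - predicted value
      T        = Residual / SEC
      GH       = squared Mahalanobis distance in PCA score space / n_components
    """
    X_cal = np.asarray(X_cal, dtype=float)
    X_val = np.asarray(X_val, dtype=float)
    # Centre everything on the calibration mean spectrum.
    mean = X_cal.mean(axis=0)
    Xc = X_cal - mean
    # PCA of the calibration spectra via SVD.
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    loadings = Vt[:n_components].T              # (n_wavelengths, k)
    scores_cal = Xc @ loadings                  # (n_cal, k)
    scores_val = (X_val - mean) @ loadings      # (n_val, k)
    # Inverse covariance of the calibration scores (atleast_2d handles k = 1).
    cov = np.atleast_2d(np.cov(scores_cal, rowvar=False))
    cov_inv = np.linalg.inv(cov)
    # Squared Mahalanobis distance of each validation sample.
    d2 = np.einsum("ij,jk,ik->i", scores_val, cov_inv, scores_val)
    gh = d2 / n_components
    residual = np.asarray(y_val, dtype=float) - np.asarray(y_pred, dtype=float)
    t = residual / sec
    return residual, t, gh
```

Dividing by the number of components makes the calibration samples average close to 1 (exactly (n-1)/n when they are scored against themselves), so one GH cutoff can be used whatever the number of terms in the model.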

In this case, as we can see, there are samples with a very high GH, and those samples have in common that their T statistic is negative (in the left tail of the Gaussian bell) and also quite large in magnitude.
These samples also have the highest residual values.
Something is telling us that these samples are special and are not well represented by the equation. The PCA is working fine and detects these samples as outliers, but we need to find out what makes them special.
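
The pattern described above (high GH together with a large T and a large residual) can be screened for automatically. This is a minimal sketch: the cutoffs GH > 3 and |T| > 2.5 are assumed rule-of-thumb limits, and `flag_suspicious` is a hypothetical helper that takes the per-sample statistics from the validation report.

```python
import numpy as np


def flag_suspicious(residual, t, gh, gh_limit=3.0, t_limit=2.5):
    """Indices of samples that are both far from the calibration space
    (GH > gh_limit) and badly predicted (|T| > t_limit), sorted by
    absolute residual, largest first. Cutoffs are assumptions."""
    residual = np.asarray(residual, dtype=float)
    t = np.asarray(t, dtype=float)
    gh = np.asarray(gh, dtype=float)
    mask = (gh > gh_limit) & (np.abs(t) > t_limit)
    idx = np.flatnonzero(mask)
    # Put the samples with the largest absolute residual first.
    order = np.argsort(-np.abs(residual[idx]), kind="stable")
    return idx[order]
```

Checking `t[idx] < 0` on the returned indices shows whether the flagged samples share the negative T sign seen in this fiber validation.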

These samples are soy meal and have higher fat values than the samples in the calibration, so the model did not learn enough about the interaction between the fiber bands and the fat bands. This makes these samples very interesting for making the calibration more robust.
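
A quick way to confirm this kind of extrapolation is to compare each validation sample's reference value for a related constituent (here fat) with the range covered by the calibration set. The helper below is hypothetical and only checks one constituent at a time; it does not model the band interactions themselves.

```python
import numpy as np


def outside_calibration_range(y_cal, y_val):
    """True where a validation reference value lies outside the
    [min, max] range of the calibration reference values."""
    y_cal = np.asarray(y_cal, dtype=float)
    y_val = np.asarray(y_val, dtype=float)
    lo, hi = y_cal.min(), y_cal.max()
    return (y_val < lo) | (y_val > hi)
```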

After checking this, we can add these samples to the calibration to improve the results of the next validation.
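
A minimal sketch of that step, assuming spectra are stored as NumPy arrays (one row per sample) and that the flagged samples are moved out of the validation set so they are not used to validate a model they now help build. `add_to_calibration` is a hypothetical helper; the model still has to be recalculated afterwards.

```python
import numpy as np


def add_to_calibration(X_cal, y_cal, X_val, y_val, idx):
    """Move validation samples at positions idx into the calibration set.
    Returns (X_cal_new, y_cal_new, X_val_new, y_val_new)."""
    X_cal, y_cal = np.asarray(X_cal), np.asarray(y_cal)
    X_val, y_val = np.asarray(X_val), np.asarray(y_val)
    idx = np.asarray(idx, dtype=int)
    # Samples that stay in the validation set.
    keep = np.ones(len(y_val), dtype=bool)
    keep[idx] = False
    X_cal_new = np.vstack([X_cal, X_val[idx]])
    y_cal_new = np.concatenate([y_cal, y_val[idx]])
    return X_cal_new, y_cal_new, X_val[keep], y_val[keep]
```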

In Excel we can see graphically the interaction between the Residuals, GH and T values:

