3 Feb 2021

Checking overfitting in validation

Validation is an important tool to improve a calibration. When developing the calibration we use cross validation to select the number of terms, and we take no further action to avoid overfitting. Later we get the surprise that our predictions for new samples show a bias for certain types of samples, and the error is much larger than expected compared to the SECV (standard error of cross validation) that we use as a reference.

This is the case, for example, for wheat bran, where cross validation for moisture (Humedad) suggested 9 terms, and we kept that decision. Some time later we received 61 new samples, and the validation gives these results:


If we had chosen 3 terms when developing the calibration, the results would be:



We can see a clear improvement.
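The comparison between the two models rests on the usual external validation statistics. As a sketch, with made-up numbers standing in for the 61 wheat bran samples, this is how the bias, SEP, and RMSEP of a validation set can be computed:

```python
import numpy as np

# Hypothetical validation data: lab reference values vs NIR predictions
# (synthetic numbers for illustration only, not the real wheat bran set)
rng = np.random.default_rng(1)
y_ref = rng.normal(loc=12.0, scale=1.0, size=61)       # lab moisture, %
y_pred = y_ref + 0.4 + rng.normal(scale=0.3, size=61)  # predictions with a bias

residuals = y_pred - y_ref
bias = np.mean(residuals)             # systematic offset
sep = np.std(residuals, ddof=1)       # SEP: error corrected for bias
rmsep = np.sqrt(np.mean(residuals ** 2))
print(f"bias = {bias:.2f}, SEP = {sep:.2f}, RMSEP = {rmsep:.2f}")
```

An overfitted model typically shows an RMSEP well above the SECV, often with a noticeable bias for certain sample types; the model with fewer terms keeps these statistics closer to what cross validation promised.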
I suggest using some additional techniques, apart from cross validation, to select the number of terms. Lately I have been trying some bootstrap techniques.



