R & Chemometrics: Shootout 2012: Test & Val Sets projections

7 Nov 2012

Shootout 2012: Test & Val Sets projections

After looking at the spectra of the calibration set, it is clear that we have at least three clusters, and that these may be related to the concentration of the active ingredient in the tablets. The same three clusters show up in the PC1-PC2 score map.

I have imported the test set into R and projected it onto the PC1-PC2 score map (developed with the calibration samples), and I found another cluster.
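As a rough illustration of this projection step, here is a minimal sketch using base R's prcomp() and predict(). The spectra are simulated stand-ins (the names cal, test and make_spectra are hypothetical, not the Shootout data), and mean-centring without scaling is an assumption, not necessarily the preprocessing used for the plots in this post.

```r
# Simulated stand-ins for the real spectra (rows = samples, columns = wavelengths).
# These are NOT the Shootout data; they only make the sketch runnable.
set.seed(1)
n_wl <- 50
make_spectra <- function(n, shift) {
  base <- matrix(rnorm(n * n_wl, sd = 0.05), nrow = n)
  # Add a sloping baseline whose size depends on 'shift'.
  sweep(base, 2, shift * seq(0, 1, length.out = n_wl), "+")
}
cal  <- make_spectra(30, shift = 1)  # hypothetical calibration spectra
test <- make_spectra(10, shift = 2)  # hypothetical test spectra

# PCA on the calibration set only (mean-centred, not scaled: an assumption).
pca <- prcomp(cal, center = TRUE, scale. = FALSE)

# Scores of the calibration samples on PC1 and PC2.
cal_scores <- pca$x[, 1:2]

# Project the test set: predict() subtracts the calibration means and
# multiplies by the calibration loadings, so the test set does not
# influence the score map itself.
test_scores <- predict(pca, newdata = test)[, 1:2]
```

The important point is that the PCA is fitted on the calibration samples alone; the new samples are only centred with the calibration means and multiplied by the calibration loadings.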

If we read the Chemometrics Shootout rules, we see:

“This year’s challenge will consist in developing the best model for the active ingredient using the calibration data. However, the most important task will be to build a model that will be robust to production scale differences. In addition, the quality of the presentation and the reasoning behind the approach taken will be used to determine the winner”.

So predicting this test set as accurately as possible is an important part of approaching the challenge.

And what about the validation set? We don't know the reference values, but we can again project the samples onto the PC1-PC2 score map (developed with the calibration samples) to see whether more clusters appear, or whether the samples are represented in the training set.

As we can see, some test and validation samples do not overlap with any samples of the calibration set, so we have to take this into account when developing the model.
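The overlap was judged by eye here, but it can also be checked numerically. The sketch below is one possible way, not the method used in this post: for each new sample it takes the distance to the nearest calibration sample in the PC1-PC2 plane and flags the sample if that distance is larger than any nearest-neighbour distance found inside the calibration set. The function names nn_dist and outside_calibration, the threshold rule and the small example scores are all hypothetical.

```r
# Distance from each row of 'new' to its nearest row of 'ref' (score space).
nn_dist <- function(new, ref) {
  apply(new, 1, function(p) min(sqrt(colSums((t(ref) - p)^2))))
}

# Flag new samples that are farther from every calibration sample than
# any calibration sample is from its own nearest neighbour.
# The cut-off (maximum within-calibration distance) is an arbitrary choice.
outside_calibration <- function(new, ref) {
  d <- as.matrix(dist(ref))
  diag(d) <- Inf                     # ignore the distance of a sample to itself
  threshold <- max(apply(d, 1, min))
  nn_dist(new, ref) > threshold
}

# Hypothetical PC1-PC2 scores: four calibration samples on a unit square,
# one new sample inside the square and one far away from it.
cal_scores <- cbind(PC1 = c(0, 1, 0, 1), PC2 = c(0, 0, 1, 1))
new_scores <- rbind(c(0.5, 0.5), c(10, 10))
flags <- outside_calibration(new_scores, cal_scores)
```

Because the calibration set forms several clusters, a nearest-neighbour rule like this seems more suitable than a single distance to the overall centre, but it only looks at two components and should be read as a first screening.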

R is really wonderful for making these plots:

Black circles: Calibration Samples

Red triangles: Test Samples

Green crosses: Validation Samples
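A plot with this legend can be drawn with base graphics along the following lines. This is only a sketch: the scores are random numbers standing in for the real projections, the output file name is hypothetical, and pch = 4 is my assumption for the "crosses" symbol.

```r
# Hypothetical PC1-PC2 scores; in practice these come from the PCA projection.
set.seed(2)
cal_scores  <- cbind(PC1 = rnorm(30, 0, 1), PC2 = rnorm(30, 0, 1))
test_scores <- cbind(PC1 = rnorm(10, 3, 1), PC2 = rnorm(10, 1, 1))
val_scores  <- cbind(PC1 = rnorm(10, 1, 1), PC2 = rnorm(10, 3, 1))

# Common axis limits so that no projected sample falls outside the plot.
all_scores <- rbind(cal_scores, test_scores, val_scores)
xlim <- range(all_scores[, "PC1"])
ylim <- range(all_scores[, "PC2"])

out_file <- file.path(tempdir(), "score_map.pdf")  # hypothetical output file
pdf(out_file)
# Calibration samples first, then the projected sets on top.
plot(cal_scores, pch = 1, col = "black", xlim = xlim, ylim = ylim,
     xlab = "PC1", ylab = "PC2", main = "PC1-PC2 score map")
points(test_scores, pch = 2, col = "red")    # red triangles
points(val_scores, pch = 4, col = "green")   # green crosses
legend("topleft", legend = c("Calibration", "Test", "Validation"),
       pch = c(1, 2, 4), col = c("black", "red", "green"))
invisible(dev.off())
```

Setting xlim and ylim from all three sets matters here, because plot() would otherwise size the axes from the calibration scores only and could hide exactly the samples that fall outside them.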
