14 sept. 2012

Unscrambler (Jam Exercise) - 004

In the previous posts I have been practicing with Unscrambler, using some of the demo files (Jam) from the book “Multivariate Data Analysis - in practice” and following its tutorials.
In this post I continue with an important step: comparing the models to decide which one, PCR or PLS1, is better at predicting the Y parameter “preference”. To do this we have to look at the residual variance left by each model, taking into account, of course, the number of terms, over-fitting, and so on.
If we look at the Y residual variance plot for the PCR model, we see that the residual variance increases with the first PC. That does not look good, but think about why it happens.
PCA does not take the Y matrix into account, so the first PC can describe some important structure in X that is not related to the Y parameter. Once that structure has been extracted, the second PC correlates better with the Y matrix, but still not as well as the first PLS1 term. This type of plot helps us understand what is happening.
Let's now look at the PLS1 residual variance plot for Y: the first term already gives a much better prediction, because the Y matrix is part of the calculation in PLS1.
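To see the same effect outside Unscrambler, here is a minimal sketch in Python with NumPy. It does not use the Jam file: the data are synthetic, built so that the largest structure in X is unrelated to y, and the function names (`pcr_fit_predict`, `pls1_fit_predict`, `loo_residual_variance`) are my own. I assume leave-one-out cross-validation for the residual variance, since a calibration residual variance cannot increase when a PC is added to a least-squares fit; the validation used for the plots in Unscrambler may be different.

```python
import numpy as np

def pcr_fit_predict(X_train, y_train, X_test, n_comp):
    """PCR: PCA on X only, then least-squares regression of y on the scores."""
    x_mean, y_mean = X_train.mean(axis=0), y_train.mean()
    if n_comp == 0:
        return np.full(len(X_test), y_mean)
    Xc = X_train - x_mean
    # The loadings come from X alone; y plays no role in choosing them.
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    P = Vt[:n_comp].T
    coef, *_ = np.linalg.lstsq(Xc @ P, y_train - y_mean, rcond=None)
    return y_mean + (X_test - x_mean) @ P @ coef

def pls1_fit_predict(X_train, y_train, X_test, n_comp):
    """PLS1 (NIPALS): each weight vector follows the covariance between X and y."""
    x_mean, y_mean = X_train.mean(axis=0), y_train.mean()
    if n_comp == 0:
        return np.full(len(X_test), y_mean)
    E, f = X_train - x_mean, y_train - y_mean
    W, P, q = [], [], []
    for _ in range(n_comp):
        w = E.T @ f
        w /= np.linalg.norm(w)
        t = E @ w                      # scores for this term
        p = E.T @ t / (t @ t)          # X loadings
        qa = f @ t / (t @ t)           # y loading
        E = E - np.outer(t, p)         # deflate X
        f = f - qa * t                 # deflate y
        W.append(w); P.append(p); q.append(qa)
    W, P, q = np.array(W).T, np.array(P).T, np.array(q)
    b = W @ np.linalg.solve(P.T @ W, q)  # regression vector in X space
    return y_mean + (X_test - x_mean) @ b

def loo_residual_variance(fit_predict, X, y, n_comp):
    """Mean squared leave-one-out prediction error for a given number of terms."""
    errors = []
    for i in range(len(y)):
        keep = np.arange(len(y)) != i
        pred = fit_predict(X[keep], y[keep], X[i:i + 1], n_comp)
        errors.append(y[i] - pred[0])
    return float(np.mean(np.square(errors)))

# Synthetic data (an assumption, not the Jam data): "big" dominates X but is
# unrelated to y, while "small" is a weaker structure in X that drives y.
rng = np.random.default_rng(0)
n = 20
angle = 2 * np.pi * np.arange(n) / n
big, small = 3.0 * np.sin(angle), np.cos(angle)
X = (np.outer(big, [1.0, 1.0, 0.0]) + np.outer(small, [0.0, 0.1, 1.0])
     + 0.05 * rng.normal(size=(n, 3)))
y = small + 0.05 * rng.normal(size=n)

pcr_var = [loo_residual_variance(pcr_fit_predict, X, y, k) for k in range(3)]
pls_var = [loo_residual_variance(pls1_fit_predict, X, y, k) for k in range(3)]
for k in range(3):
    print(f"terms={k}  PCR={pcr_var[k]:.4f}  PLS1={pls_var[k]:.4f}")
```

In this toy case the first PC follows the "big" structure, so PCR needs the second PC before y is explained, while the first PLS1 term is already pulled toward the part of X that covaries with y.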
We have to decide on the best number of terms for the model. Software such as Unscrambler can suggest the best option for you, but you can move the number up or down. You are in control, but we have to check more plots and statistics before deciding on the best option.
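One simple way to think about such a suggestion is sketched below in Python. This is only an assumed heuristic of mine, not the rule Unscrambler applies: take the smallest number of terms whose validation residual variance is close to the minimum. The function name `suggest_n_terms`, the `tolerance` parameter, and the variance values are invented for illustration and do not come from the Jam models.

```python
def suggest_n_terms(residual_variances, tolerance=0.15):
    """Smallest number of terms whose residual variance is within
    `tolerance` (relative) of the lowest residual variance.

    residual_variances[k] is the validation residual variance with k terms.
    """
    limit = min(residual_variances) * (1.0 + tolerance)
    return next(k for k, v in enumerate(residual_variances) if v <= limit)

# Hypothetical residual variances for 0, 1, 2, 3 and 4 terms.
example = [1.00, 0.40, 0.22, 0.20, 0.23]
suggested = suggest_n_terms(example)          # prefers fewer terms
strict_minimum = suggest_n_terms(example, tolerance=0.0)
print(suggested, strict_minimum)
```

A tolerance above zero favours the simpler model when an extra term improves the residual variance only slightly, which is one way to guard against over-fitting; the final choice should still be checked against the other plots and statistics.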

