Once we have the terms, the samples are projected onto the PC terms, so every
sample has a score for every term. We therefore have a score matrix with “N”
samples (rows) and “A” components (columns).
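As a minimal sketch of that projection, the following uses base R `prcomp` on a made-up matrix (the random `X`, `N = 20` and `A = 4` are assumptions for illustration, not the soy meal spectra) to show that the scores form an N x A matrix and that they are the centred data multiplied by the loadings:

```r
# Synthetic stand-in for a spectra matrix: N samples x 10 variables (assumed data)
set.seed(1)
N <- 20
A <- 4
X <- matrix(rnorm(N * 10), nrow = N, ncol = 10)

# Centre X and compute the principal components
pc <- prcomp(X, center = TRUE, scale. = FALSE)

# Scores for the first A terms: one row per sample, one column per term
scores <- pc$x[, 1:A]
dim(scores)

# The same scores obtained by projecting the centred data onto the loadings
X_centred <- scale(X, center = TRUE, scale = FALSE)
scores_by_projection <- X_centred %*% pc$rotation[, 1:A]
all.equal(unname(scores), unname(scores_by_projection))
```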
The variance explained by each term can be due to different sources or to a
mixture of sources.
In the case of PLS we are looking for a compromise: explaining as much variance
as possible in X while, at the same time, explaining as much variance as
possible in Y. The PLS algorithm also produces a score matrix, and these scores
tend to be more correlated with the constituent than the scores calculated
with PC.
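To illustrate the difference between the two criteria, here is a small base R sketch on made-up data (the random `X` and the hypothetical constituent `y` are assumptions). It computes only the first term of each method: the first PC direction ignores `y`, while the first PLS weight vector for a single constituent is proportional to X'y, the unit direction that maximizes the covariance between the scores and `y`. It is not necessarily how the PLS model in this post was fitted.

```r
set.seed(2)
N <- 30
X <- matrix(rnorm(N * 8), nrow = N, ncol = 8)
# Hypothetical constituent driven mostly by one variable plus noise
y <- 2 * X[, 3] + rnorm(N, sd = 0.5)

Xc <- scale(X, center = TRUE, scale = FALSE)
yc <- y - mean(y)

# First PC term: direction of maximum variance in X (y is not used)
p1 <- prcomp(Xc, center = FALSE)$rotation[, 1]
t_pc <- drop(Xc %*% p1)

# First PLS term (single y): unit weight vector proportional to X'y,
# the direction of maximum covariance between the scores and y
w1 <- drop(crossprod(Xc, yc))
w1 <- w1 / sqrt(sum(w1^2))
t_pls <- drop(Xc %*% w1)

# Covariance and correlation of each first score vector with y
abs(cov(t_pc, yc))
abs(cov(t_pls, yc))
abs(cor(t_pc, yc))
abs(cor(t_pls, yc))
```

The covariance of the PLS score with `y` can never be smaller (in absolute value) than that of the PC score, because both directions have unit length and the PLS one is chosen to maximize it. The correlation is usually higher as well, but that is not guaranteed term by term.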
In the case of the soy meal in the conveyor, we can calculate the correlation
between the scores for each of the four PC terms and the protein:
> cor(scores_4t_pc[,1:4],soy_ift_prot1r1$Prot)
                [,1]
PC term 1 -0.2105997
PC term 2  0.3445256
PC term 3  0.1647146
PC term 4 -0.6888083
We can
do the same, but with the scores of the PLS regression:
> cor(Prot_plsr_r1$scores[,1:4],soy_ift_prot1r1$Prot)
            [,1]
Comp 1 0.2129742
Comp 2 0.4193727
Comp 3 0.5348858
Comp 4 0.4425912
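Since the sign of a score vector is arbitrary, it is the absolute value of each correlation that matters. A quick way to put the two outputs side by side is sketched below; the numbers are copied from the results above, and the names `cor_pc`, `cor_pls` and `comparison` are just illustrative:

```r
# Correlations with protein, copied from the two outputs in this post
cor_pc  <- c(-0.2105997, 0.3445256, 0.1647146, -0.6888083)
cor_pls <- c(0.2129742, 0.4193727, 0.5348858, 0.4425912)

# One row per term, comparing absolute correlations
comparison <- data.frame(
  term       = 1:4,
  abs_pc     = abs(cor_pc),
  abs_pls    = abs(cor_pls),
  pls_higher = abs(cor_pls) > abs(cor_pc)
)
comparison
```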
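As we can see, the correlations are higher for the PLS scores in the first
three terms, but there are some curiosities about the PC scores (the fourth PC
term has the strongest correlation of all, in absolute value) that we can try
to check in future posts.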