Once we have the terms, the samples are projected onto each of the PC terms, so every sample has a score for every term. This gives us a score matrix with N samples (rows) and A components (columns).
The variance explained by each term can come from a single source or from a mixture of sources.
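The projection described above can be sketched in a few lines of plain Python. This is only an illustration: the data matrix `X` and the two orthonormal `loadings` are hypothetical values made up for the example (in practice the loadings come out of the PCA itself), and the spectra are assumed to be mean-centered before projection.

```python
import math

# Hypothetical data: N = 4 samples (rows) x 3 variables (columns).
X = [
    [2.0, 1.0, 0.5],
    [1.0, 3.0, 1.5],
    [4.0, 2.0, 2.5],
    [3.0, 4.0, 3.5],
]

# Hypothetical orthonormal loadings for A = 2 terms (one list per term).
# A real PCA would compute these from the data.
s3, s2 = math.sqrt(3.0), math.sqrt(2.0)
loadings = [
    [1 / s3, 1 / s3, 1 / s3],
    [1 / s2, -1 / s2, 0.0],
]

# Mean-center every column (assumed pretreatment).
n = len(X)
means = [sum(col) / n for col in zip(*X)]
Xc = [[v - m for v, m in zip(row, means)] for row in X]

# The score of a sample on a term is the dot product of the centered
# sample with that term's loading vector.
T = [[sum(x * p for x, p in zip(row, term)) for term in loadings] for row in Xc]

# T is the score matrix: N rows (samples) by A columns (terms).
for row in T:
    print(["%.3f" % score for score in row])
```

Each row of `T` holds the scores of one sample, and each column holds the scores of all samples on one term.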
In the case of PLS, we look for a compromise: explaining as much variance as possible in X while, at the same time, explaining as much variance as possible in Y. The PLS algorithm also produces a score matrix, and these scores are more correlated with the constituent than the scores calculated with PCA.
In the case of the soy meal on the conveyor, we can calculate the correlation between the scores of each of the four PCs and the protein:
We can do the same, but with the scores of the PLS regression:
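As a minimal sketch of this comparison, the code below uses synthetic data, not the soy meal spectra, and looks only at the first component of each method. The helper names (`first_pc_loading`, `first_pls_weight`, and so on) are hypothetical; the first PC loading is found by power iteration and the first PLS weight is taken as the normalized X'y, which is the usual first step of PLS1. No numbers from the original post are reproduced.

```python
import math
import random

def center_columns(X):
    """Subtract each column mean from its column."""
    n = len(X)
    means = [sum(col) / n for col in zip(*X)]
    return [[v - m for v, m in zip(row, means)] for row in X]

def unit(v):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def project(X, w):
    """Scores: dot product of every sample (row) with the vector w."""
    return [sum(x * wj for x, wj in zip(row, w)) for row in X]

def xt_times(X, v):
    """Compute X'v: one value per column of X."""
    return [sum(vi * xij for vi, xij in zip(v, col)) for col in zip(*X)]

def pearson(a, b):
    """Pearson correlation between two equal-length lists."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    sab = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    saa = sum((x - ma) ** 2 for x in a)
    sbb = sum((y - mb) ** 2 for y in b)
    return sab / math.sqrt(saa * sbb)

def first_pc_loading(X, iterations=500):
    """First PC loading by power iteration on X'X (direction of maximum variance)."""
    w = unit([1.0] * len(X[0]))
    for _ in range(iterations):
        w = unit(xt_times(X, project(X, w)))
    return w

def first_pls_weight(X, y):
    """First PLS1 weight: normalized X'y (direction of maximum covariance with y)."""
    return unit(xt_times(X, y))

# Synthetic data: a large source of variance unrelated to y
# and a smaller source that drives y.
rng = random.Random(0)
X_raw, y_raw = [], []
for _ in range(30):
    a = rng.gauss(0.0, 3.0)  # large source, unrelated to the constituent
    b = rng.gauss(0.0, 1.0)  # small source, related to the constituent
    X_raw.append([a + rng.gauss(0.0, 0.1), 0.8 * a + b,
                  b + rng.gauss(0.0, 0.1), -a + 0.5 * b])
    y_raw.append(2.0 * b + rng.gauss(0.0, 0.2))

X = center_columns(X_raw)
y_mean = sum(y_raw) / len(y_raw)
y = [v - y_mean for v in y_raw]

t_pc = project(X, first_pc_loading(X))      # first PC scores
t_pls = project(X, first_pls_weight(X, y))  # first PLS scores

r_pc = pearson(t_pc, y)
r_pls = pearson(t_pls, y)
print("correlation of PC1 scores with y:  %.3f" % r_pc)
print("correlation of PLS1 scores with y: %.3f" % r_pls)
```

For the first component, the PLS score cannot be less correlated with y (in absolute value) than the first PC score, because the PLS weight maximizes the covariance with y while the PC loading maximizes the variance of the scores. That argument covers only the first component.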
As we can see, the correlations are higher for the PLS scores, but there are some curiosities in the PC scores that we can try to check in future posts.