Over the next few days I will write some posts about a famous Unscrambler exercise described in the book "Multivariate Data Analysis - in practice", in order to improve my own knowledge of this software.
This exercise uses raspberry samples from 4 different locations, harvested at 3 different times.
The files are named C"a"H"b", where C indicates the location and "a" has a value of 1 for location 1, 2 for location 2, 3 for location 3, and 4 for location 4.
H indicates the harvest time and "b" has a value of 1 for the early harvest, 2 for the middle harvest and 3 for the late harvest.
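The naming scheme above can be decoded with a few lines of Python. This is only a sketch: the helper parse_sample_name and the pattern are my own hypothetical names, and I am assuming the names look exactly like "C1H3", with no extension or extra characters.

```python
import re

# Hypothetical helper: assumes names look exactly like "C1H3".
NAME_PATTERN = re.compile(r"^C([1-4])H([1-3])$")
HARVEST_LABELS = {1: "early", 2: "middle", 3: "late"}


def parse_sample_name(name):
    """Return (location, harvest_label) for a name such as 'C3H2'."""
    match = NAME_PATTERN.match(name.strip())
    if match is None:
        raise ValueError(f"unexpected sample name: {name!r}")
    location = int(match.group(1))
    harvest = HARVEST_LABELS[int(match.group(2))]
    return location, harvest


# All location/harvest combinations implied by the naming scheme.
all_names = [f"C{a}H{b}" for a in range(1, 5) for b in range(1, 4)]

if __name__ == "__main__":
    for name in all_names:
        print(name, parse_sample_name(name))
```

Grouping the samples by the decoded harvest label is a simple way to color the score plot by harvest time.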
When developing a PCR model (X variables = a series of sensory parameters, Y = the average preference of 114 consumers for each sample), the scores and loadings are calculated as in PCA.
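To make the idea concrete outside of Unscrambler, here is a minimal PCR sketch with NumPy. It is not what Unscrambler does internally. I am assuming mean-centering only (no scaling of the variables), and the data are random numbers that stand in for the sensory table, not the real Jam values. The function name pcr_fit is hypothetical.

```python
import numpy as np


def pcr_fit(X, y, n_components):
    """Minimal PCR sketch: PCA on centered X, then regress y on the scores."""
    Xc = X - X.mean(axis=0)          # assumption: centering only, no scaling
    yc = y - y.mean()

    # PCA through the SVD: Xc = U * diag(s) * Vt
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    s_k = s[:n_components]
    scores = U[:, :n_components] * s_k        # T, one column per PC
    loadings = Vt[:n_components].T            # P, one column per PC

    # The score columns are orthogonal, so each PC gets its own coefficient.
    q = scores.T @ yc / s_k**2

    # Fraction of X and Y variance explained by each PC separately.
    x_explained = s_k**2 / np.sum(s**2)
    y_explained = q**2 * s_k**2 / (yc @ yc)

    return {
        "scores": scores,
        "loadings": loadings,
        "coef": loadings @ q,                 # coefficients for centered X
        "x_explained": x_explained,
        "y_explained": y_explained,
    }


# Synthetic stand-in data: 12 samples, 6 made-up "sensory" variables.
rng = np.random.default_rng(0)
X = rng.normal(size=(12, 6))
y = X @ rng.normal(size=6) + 0.1 * rng.normal(size=12)

model = pcr_fit(X, y, n_components=3)

if __name__ == "__main__":
    print("X variance per PC:", np.round(model["x_explained"], 3))
    print("Y variance per PC:", np.round(model["y_explained"], 3))
```

Because the components are found from X alone, a PC can explain a lot of X-variance and very little Y-variance, which is the pattern we see for PC1 in this exercise.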
In the PC1 vs PC2 score plot, the samples harvested early (H1) clearly form a group.
We see the variance in the taste (X) variables explained by each component: 48% along PC1, 28% along PC2 and 21% along PC3.
The “Y” variable is not well represented by PC1 (only 1%), but PC2 explains 57% of the “Y” variance and PC3 explains 34%.
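The percentages above can be added up to see how much of X and Y the model captures with one, two or three components. This small sketch assumes the figures quoted are per-component values (not cumulative ones), which seems right since they add up to less than 100%.

```python
from itertools import accumulate

# Per-component explained variance (%), as quoted in the text above.
x_explained = {"PC1": 48, "PC2": 28, "PC3": 21}
y_explained = {"PC1": 1, "PC2": 57, "PC3": 34}

# Running totals after 1, 2 and 3 components.
x_cumulative = list(accumulate(x_explained.values()))
y_cumulative = list(accumulate(y_explained.values()))

if __name__ == "__main__":
    for pc, x_cum, y_cum in zip(x_explained, x_cumulative, y_cumulative):
        print(f"up to {pc}: X = {x_cum}%, Y = {y_cum}%")
```

With three components the totals are 97% for X and 92% for Y, and almost all of the Y contribution comes from PC2 and PC3.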
Sweetness has a small loading in the PC1 vs PC2 plot (so it can be considered unimportant there), but it becomes an important variable along PC3.
We can see correlations between the “X” variables:
Which variables are inversely correlated with “thickness”? Redness and color are, and both are characteristic of maturity (late harvest).
Which samples are thicker (harvested early, middle or late), and why? The plot suggests that the samples harvested early are the thicker ones, while the samples harvested late have lower values for this parameter.
We can draw many conclusions from these plots if we study them carefully, such as which samples, and from which locations, the consumers prefer. We see that the late-harvested samples from locations 1 and 3 are preferred for the intensity of their red color.
For details from CAMO about this Jam data set, see this link: http://www.camo.com/products/unscrambler/trial.swf