5 Feb. 2018

Using category variables to understand the spectral population

When we work with a data set, it is important to gather all the information we can about it, so that we can create category variables to better understand the sample population. In the soy meal data set, I know that some of the samples come from Brazil, others from the USA, and for the rest... I don't know where they came from (probably from the same countries, but the samples were not labeled when acquired). So I created a category variable called "Origin" with a group 1 (from Brazil), a group 2 (from USA) and a group 3 (origin unknown).
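As a rough sketch of how such a variable could be coded, assume the origin labels are available in a character vector with NA for the unlabeled samples. The names `country` and `origin` and the values below are made up for illustration; they are not taken from the soy meal data set.

```r
# Hypothetical country labels; NA marks samples that were not labeled.
country <- c("Brazil", "USA", NA, "Brazil", NA, "USA")

# match() returns the position of each label in the lookup vector:
# 1 = Brazil, 2 = USA. Anything not found (including NA) gets 3 = unknown.
origin <- match(country, c("Brazil", "USA"), nomatch = 3L)

# Count how many samples fall in each group.
table(origin)
```

Keeping the codes as integers means the variable can be passed directly as a plotting color.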
 
In R, color 1 corresponds to black, 2 to red and 3 to green. We can plot the Mahalanobis distance ellipse on PC1 and PC2 to see how the samples are grouped.
 
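library(chemometrics)  # provides drawMahal()

# T_msc is assumed to hold the PC1 and PC2 scores (two columns)
drawMahal(T_msc,center=apply(T_msc,2,mean),
          covariance=cov(T_msc),quantile=0.975,
          col=soy_ift_conv$Origin,
          xlab="PC1",ylab="PC2")
# Legend colors use the same codes as Origin: 1, 2, 3
legend("topleft",legend=c("Brazil", "USA","Unknown"),
       col=1:3,pch=1, cex=0.8,
       title="Origin")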

It's a pity not to have complete information about the green samples. In any case, consider other types of category variables as well: for example, sort the samples by a constituent value and add a category for high, medium and low protein.
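One possible way to build such a protein category is to cut the constituent values at their terciles. This is only a sketch: the vector `protein` and its values are invented for illustration, and splitting at terciles is an assumption (fixed thresholds would work just as well).

```r
# Hypothetical protein values (%), for illustration only.
protein <- c(44.1, 46.8, 48.3, 45.2, 47.5, 49.0)

# Split at the terciles so each class holds about a third of the samples.
breaks <- quantile(protein, probs = c(0, 1/3, 2/3, 1))
protein_class <- cut(protein, breaks = breaks,
                     labels = c("low", "medium", "high"),
                     include.lowest = TRUE)

# as.integer() turns the factor levels into the codes 1, 2, 3,
# which can then be used as plotting colors.
protein_col <- as.integer(protein_class)
table(protein_class)
```

The resulting `protein_col` could be passed as the `col` argument of the score plot in the same way as the Origin variable.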
 
