This 2D plot helps us decide the number of PCs. It is easy to create in R once we have decomposed the X matrix into a P matrix (loadings) and a T matrix (scores).
For this plot, we just need the T matrix.
> CPs<-seq(1,10,by=1)
> matplot(CPs,t(Xnipals$T),lty=1,pch=21,
+ xlab="PC_number",ylab="Score")
Each dot on a vertical line represents the score of one sample for that particular PC. We ran the NIPALS calculations for 10 PCs, so there are 10 vertical lines, and each one shows the projections of the samples onto that PC. The score of a sample for a PC is its distance to the mean along that PC.
For every PC, we can calculate the standard deviation and the variance of all the scores.
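As a minimal sketch of that calculation, the code below computes the standard deviation and the variance of the scores column by column. Since `Xnipals` is not reproduced here, the scores come from `prcomp` on simulated data; `X`, `T_scores`, `sd_scores`, `var_scores` and `pct_var` are hypothetical names used only for illustration. With your own results, `T_scores` would simply be `Xnipals$T`.

```r
# Hypothetical stand-in for Xnipals$T: scores from a PCA on simulated data
set.seed(1)
X <- matrix(rnorm(50 * 10), nrow = 50, ncol = 10)
T_scores <- prcomp(X, center = TRUE, scale. = FALSE)$x   # samples x PCs

sd_scores  <- apply(T_scores, 2, sd)   # standard deviation of the scores, per PC
var_scores <- sd_scores^2              # variance of the scores, per PC

# Share of each PC in the variance captured by the extracted PCs
pct_var <- 100 * var_scores / sum(var_scores)

round(cbind(sd = sd_scores, var = var_scores, pct = pct_var), 2)
```

Note that `pct_var` is relative to the PCs that were extracted. If fewer PCs than variables were calculated, it is not the percentage of the total variance of X.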
As we can see, the first 2 PCs account for almost all the variance, and for the remaining PCs the projections become progressively narrower.
This plot is useful for choosing how many components to keep, and also for detecting outliers and extreme samples.
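One possible way to back up the visual inspection with numbers is to standardize the scores of each PC and flag the samples that fall far from the mean. This is only a sketch under assumptions: the data are simulated, one extreme sample (row 5) is added on purpose, and the cutoff of 3 standard deviations is an arbitrary choice, not a rule taken from the plot. With your own results you would replace `T_scores` by `Xnipals$T`.

```r
# Simulated data with one artificially extreme sample (row 5); values are illustrative
set.seed(2)
X <- matrix(rnorm(40 * 6), nrow = 40, ncol = 6)
X[5, ] <- X[5, ] + 8
T_scores <- prcomp(X, center = TRUE)$x   # hypothetical stand-in for Xnipals$T

# Standardize the scores of each PC (mean 0, sd 1)
z <- scale(T_scores)

# Flag samples whose score exceeds 3 standard deviations on any PC (assumed cutoff)
extreme <- which(apply(abs(z), 1, max) > 3)
extreme
```

Row 5 should be among the flagged samples. Other rows can also appear just by chance, so it is worth looking at them in the plot before discarding anything.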