12 May 2013

Detecting Outliers (Mahalanobis)


This post continues from the previous one (Median Absolute Deviation). We have our score and loading matrices, and now I want to check for outliers. For this we are going to use the Mahalanobis distance, computed on the score matrix.
What are the dimensions of our score matrix in this case? We used 4 PCs, so:
> dim(sflw.msc.rpc$scores)
[1] 211   4

The rows are the samples, and the columns are the scores of each sample on each PC.
We have already seen how to plot the pairs for all combinations of these four PCs. Now I want to draw ellipses based on the Mahalanobis distance to detect outliers.
The book “Introduction to Multivariate Statistical Analysis in Chemometrics” by Kurt Varmuza and Peter Filzmoser is really helpful here, and I strongly recommend having it. The authors have developed the R package “chemometrics”, so let's use it:
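For reference, the squared Mahalanobis distance of the score vector $\mathbf{x}_i$ of sample $i$ to a center $\boldsymbol{\mu}$, given a covariance matrix $\boldsymbol{\Sigma}$, is shown below. Assuming the scores are roughly multivariate normal, $d_i^2$ approximately follows a chi-square distribution with $p$ degrees of freedom ($p = 2$ for PC1 and PC2). The ellipse drawn at a given quantile is then the set of points where $d^2$ equals that chi-square quantile, and samples outside it are treated as outliers. In the call below, $\boldsymbol{\mu}$ and $\boldsymbol{\Sigma}$ are estimated by the sample mean and covariance of the scores.

```latex
d_i^2 = (\mathbf{x}_i - \boldsymbol{\mu})^{\top} \boldsymbol{\Sigma}^{-1} (\mathbf{x}_i - \boldsymbol{\mu}),
\qquad
\text{sample } i \text{ is an outlier if } d_i^2 > \chi^2_{p;\,0.975}
```

For $p = 2$ the cutoff is $\chi^2_{2;\,0.975} = -2\ln(0.025) \approx 7.38$.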

> library(chemometrics)

Here is a subset with the first two PCs, PC1 and PC2:

> X.pc1pc2 <- sflw.msc.rpc$scores[, 1:2]

Now let’s call the function drawMahal():
 
> drawMahal(X.pc1pc2, center = colMeans(X.pc1pc2),
+           covariance = cov(X.pc1pc2), quantile = 0.975)

The resulting plot shows the ellipse for the 0.975 quantile, with some samples falling outside it; these are our candidate outliers.
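The plot shows the outliers, but to get a list of which samples fall outside the ellipse, one option is to compute the squared Mahalanobis distances directly with base R's mahalanobis() and compare them with a chi-square cutoff. This is a minimal sketch, assuming the ellipse corresponds to the chi-square quantile with one degree of freedom per column. The function name flag_mahal_outliers and the simulated data are hypothetical, for illustration only.

```r
# Hypothetical helper: returns the row indices of samples whose squared
# Mahalanobis distance exceeds the chi-square cutoff.
# Assumes X is a numeric matrix of scores (rows = samples, columns = PCs),
# e.g. sflw.msc.rpc$scores[, 1:2].
flag_mahal_outliers <- function(X, quantile = 0.975,
                                center = colMeans(X),
                                covariance = cov(X)) {
  # Squared Mahalanobis distance of every sample to the center
  d2 <- mahalanobis(X, center = center, cov = covariance)
  # Cutoff: chi-square quantile with one degree of freedom per column
  cutoff <- qchisq(quantile, df = ncol(X))
  # Indices of samples outside the tolerance ellipse (ellipsoid if > 2 PCs)
  which(d2 > cutoff)
}

# Hypothetical data: 200 bivariate normal "scores" plus two planted outliers
set.seed(1)
X <- matrix(rnorm(400), ncol = 2)
X <- rbind(X, c(8, 8), c(-7, 9))

out <- flag_mahal_outliers(X)
print(out)  # should include rows 201 and 202 (the planted outliers)
```

With the real data, flag_mahal_outliers(X.pc1pc2) would list the samples outside the PC1/PC2 ellipse, and flag_mahal_outliers(sflw.msc.rpc$scores) would use all four PCs at once (df = 4).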

 
 
