29 May 2012

Mahalanobis distance with "R" (Exercise)

I worked through this exercise with Excel in another post; this time I am going to carry out the same calculations with "R".
These are data on lead concentration in fish:

     Age Length Weight   mg/Kg

1    28    31   130.0    68.12
2    24    28   143.0   127.89
3    28    20   136.0    89.03
4    32    34   130.5    78.28
5    22    15   125.0   134.08
6    26    37   147.5   135.31
7    24    19   135.0   130.48
8    28    22   125.0    86.48
9    24    26   127.0   129.47
10   30    21   139.0    82.43
11   22    20   121.5   127.41
12   30    38   150.5    71.21
13   24    17   120.0   132.06
14   26    20   125.0    90.85

We import the data into R.
x <- read.table("C:\\lead_fish.txt", header = TRUE)
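If the text file is not at hand, the same table can be typed in directly. This is only a minimal sketch: the column name mg.Kg is an assumption (it is the syntactic name that read.table builds from the header "mg/Kg" with its default check.names = TRUE).

```r
# Lead concentration in fish, entered by hand from the table above.
# Assumption: the last column is called mg.Kg, the name read.table()
# would derive from the header "mg/Kg" by default.
x <- data.frame(
  Age    = c(28, 24, 28, 32, 22, 26, 24, 28, 24, 30, 22, 30, 24, 26),
  Length = c(31, 28, 20, 34, 15, 37, 19, 22, 26, 21, 20, 38, 17, 20),
  Weight = c(130, 143, 136, 130.5, 125, 147.5, 135, 125, 127, 139,
             121.5, 150.5, 120, 125),
  mg.Kg  = c(68.12, 127.89, 89.03, 78.28, 134.08, 135.31, 130.48,
             86.48, 129.47, 82.43, 127.41, 71.21, 132.06, 90.85)
)

# Quick check of the structure: 14 fish, 4 numeric variables.
str(x)
```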
We are going to apply the Mahalanobis Distance formula:
D^2 = (x - μ)' Σ^-1 (x - μ)
We calculate μ (mean) with:
mean<-colMeans(x)
   Age     Length     Weight     mg/Kg
 26.28571  24.85714 132.50000 105.93571
We calculate Σ, the covariance matrix (Sx), with:
Sx<-cov(x)
> Sx
              Age    Length    Weight     mg/Kg
Age      9.758242  12.81319  12.07692 -72.15407
Length  12.813187  56.90110  49.11538 -70.62066
Weight  12.076923  49.11538  92.80769 -46.06962
mg/Kg  -72.154066 -70.62066 -46.06962 714.00118
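With μ and Σ available, the formula can be applied by hand to a single fish. The sketch below does this for the first row and compares the result with the built-in function. The names X, mu, d and D2_1 are only illustrative, and the data are retyped so that the block runs on its own.

```r
# Data retyped from the table (column name mg.Kg is an assumption).
x <- data.frame(
  Age    = c(28, 24, 28, 32, 22, 26, 24, 28, 24, 30, 22, 30, 24, 26),
  Length = c(31, 28, 20, 34, 15, 37, 19, 22, 26, 21, 20, 38, 17, 20),
  Weight = c(130, 143, 136, 130.5, 125, 147.5, 135, 125, 127, 139,
             121.5, 150.5, 120, 125),
  mg.Kg  = c(68.12, 127.89, 89.03, 78.28, 134.08, 135.31, 130.48,
             86.48, 129.47, 82.43, 127.41, 71.21, 132.06, 90.85)
)

X  <- as.matrix(x)
mu <- colMeans(X)   # mean vector (mu in the formula)
Sx <- cov(X)        # covariance matrix (Sigma in the formula)

# (x - mu) for the first fish
d <- X[1, ] - mu

# D^2 = (x - mu)' Sigma^-1 (x - mu); drop() turns the 1 x 1 matrix into a number
D2_1 <- drop(t(d) %*% solve(Sx) %*% d)

# The built-in function should give the same value for this row
D2_1
mahalanobis(X[1, ], mu, Sx)
```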
By default, the mahalanobis function uses inverted=FALSE, so it calculates the inverse of Sx itself. If we have already calculated the inverse separately, we must remember to pass it with inverted=TRUE.
See the R help page (?mahalanobis).
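To illustrate the inverted argument, here is a small sketch that calls the function both ways and checks that the results agree. The names mu, Sx_inv, D2_a and D2_b are only illustrative, and the data are retyped so that the block runs on its own.

```r
# Data retyped from the table (column name mg.Kg is an assumption).
x <- data.frame(
  Age    = c(28, 24, 28, 32, 22, 26, 24, 28, 24, 30, 22, 30, 24, 26),
  Length = c(31, 28, 20, 34, 15, 37, 19, 22, 26, 21, 20, 38, 17, 20),
  Weight = c(130, 143, 136, 130.5, 125, 147.5, 135, 125, 127, 139,
             121.5, 150.5, 120, 125),
  mg.Kg  = c(68.12, 127.89, 89.03, 78.28, 134.08, 135.31, 130.48,
             86.48, 129.47, 82.43, 127.41, 71.21, 132.06, 90.85)
)

mu <- colMeans(x)
Sx <- cov(x)

# Case 1: pass the covariance matrix and let mahalanobis() invert it
D2_a <- mahalanobis(x, mu, Sx)

# Case 2: invert it ourselves and say so with inverted = TRUE
Sx_inv <- solve(Sx)
D2_b <- mahalanobis(x, mu, Sx_inv, inverted = TRUE)

# Both calls should return the same 14 squared distances
all.equal(D2_a, D2_b)
```

Passing Sx_inv without inverted = TRUE would not raise an error; the function would simply invert it again and return wrong distances, so the flag is easy to forget.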

OK, let's go:
> D2 <- mahalanobis(x, mean, Sx)
> D2
 [1] 5.571677 2.863499 2.686127 7.766153 2.379621 6.366793 2.135347 1.538248
 [9] 2.018812 5.143830 3.082734 5.470313 3.158651 1.818195

These are the values on the main diagonal of the matrix we obtained with the calculations in Excel.
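The full matrix can also be obtained in R by multiplying the centred data, the inverse covariance matrix and the transposed centred data. This is only a sketch with illustrative names (X, Xc, M), and the data are retyped so that the block runs on its own. Note that the off-diagonal entries are cross-products between pairs of fish, not distances, so they can be negative.

```r
# Data retyped from the table (column name mg.Kg is an assumption).
x <- data.frame(
  Age    = c(28, 24, 28, 32, 22, 26, 24, 28, 24, 30, 22, 30, 24, 26),
  Length = c(31, 28, 20, 34, 15, 37, 19, 22, 26, 21, 20, 38, 17, 20),
  Weight = c(130, 143, 136, 130.5, 125, 147.5, 135, 125, 127, 139,
             121.5, 150.5, 120, 125),
  mg.Kg  = c(68.12, 127.89, 89.03, 78.28, 134.08, 135.31, 130.48,
             86.48, 129.47, 82.43, 127.41, 71.21, 132.06, 90.85)
)

X  <- as.matrix(x)
Xc <- sweep(X, 2, colMeans(X))          # subtract each column mean
M  <- Xc %*% solve(cov(X)) %*% t(Xc)    # 14 x 14 matrix

# The main diagonal should reproduce the output of mahalanobis()
D2 <- mahalanobis(X, colMeans(X), cov(X))
all.equal(unname(diag(M)), unname(D2))

# Top-left corner of the full matrix
round(M[1:3, 1:3], 2)
```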


4 comments:

  1. What are edad, long, peso, mg.kg?

    It would help to have at least a basic understanding of what the data represent.

  2. I have changed the headers to English. These are data I found in a YouTube video with no other details. They are samples of fish, giving their age, weight, length and lead concentration in "ppm". I will add new, better documented exercises in future posts.

  3. Hi! I am interested in evaluating the statistical distance, measuring the difference between two multivariate means with the Mahalanobis distance. I am following the paper: "Statistical assessment of mean differences between two dissolution data sets". Yi Song, Drug Information Journal, 1996.

    I am struggling with the multiple time point dissolution. Any help would be really appreciated.
    Thanks! Elba

  4. Best post about Mahalanobis in "R"!!
    But I have a question...
    Is there an easy way to calculate the full Mahalanobis matrix with R, not only the main diagonal?
    like:
    [,1] [,2]
    [1,] 5.57 -0.7 ...
    [2,] -0.7 2.86....
    [3,] ..................

    thanks!
