17 Jul. 2012

Hierarchical Cluster Analysis (ChemoSpec) - 01

In previous posts I used the ChemoSpec package on some oil data (olive and sunflower). My spectra now cover the range from 1100 nm to 2200 nm and are raw (no mathematical pretreatment). I want to use the “Hierarchical Cluster Analysis” in ChemoSpec to look for clusters in my data. Of course I hope to see the olive oil in one cluster and the sunflower oil in another, but other clusters may well appear.
Anyway, this is just a quick test, and we will surely learn much more about the data after treating the spectra with derivatives (this will be done in another post).
So, after importing the “csv” files into R, we can plot the raw spectra, with each oil type (olive and sunflower) in its own color.

We can see some ranges of the spectra where there is a clear difference, and as explained in a previous post, these differences are related to the fatty acid concentrations.
Let's run the “Hierarchical Cluster Analysis” from ChemoSpec:
hcaSpectra(oils, title = "Raw Spectra / oils")

We can see that 3 sunflower oil samples are quite different from all the others (olive or sunflower), and that the rest of the samples form two clusters (olive oil and sunflower oil).
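To get a feeling for how a dendrogram like this is built, here is a minimal sketch in base R. It does not use my oil data: the spectra are synthetic, the band positions and the helper names (`band`, `make`) are made up for illustration, and I am assuming Euclidean distance with complete linkage. Three of the simulated sunflower samples get a constant baseline offset, to show that an offset alone is enough to push samples into their own branch.

```r
# Synthetic stand-in for the oil spectra (NOT the real data)
set.seed(1)
wl <- seq(1100, 2200, by = 10)                       # wavelength axis, nm
band <- function(center, width) exp(-((wl - center) / width)^2)

# Two hypothetical oil types that differ in the height of one band
olive     <- band(1720, 40) + 0.5 * band(2140, 30)
sunflower <- band(1720, 40) + 0.8 * band(2140, 30)

# n noisy replicates of a base spectrum, one spectrum per row
make <- function(base, n) {
  t(replicate(n, base + rnorm(length(wl), sd = 0.005)))
}
X <- rbind(make(olive, 5), make(sunflower, 8))
rownames(X) <- c(paste0("olive_", 1:5), paste0("sunfl_", 1:8))

# Add a constant baseline offset to three sunflower samples
X[11:13, ] <- X[11:13, ] + 0.5

d  <- dist(X, method = "euclidean")    # pairwise distances between spectra
hc <- hclust(d, method = "complete")   # agglomerative clustering
plot(hc, main = "Raw spectra (synthetic)")

# Cut the tree into three groups and compare with the known labels
groups <- cutree(hc, k = 3)
print(table(groups, sub("_.*", "", rownames(X))))
```

In this toy example the three offset samples end up in a group of their own, and the remaining samples split by oil type.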
We could draw some other conclusions, but what I am going to do next is apply a second derivative to the spectra and see what more can be learned from the treated set, while getting some practice with R.
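One thing a second derivative does is remove any constant offset between spectra, so two samples that differ only in baseline become close again. Below is a small sketch with synthetic data. The helper `second_diff` is my own name, and I use plain second differences from base R as a rough stand-in for a proper derivative; in practice a smoothing derivative such as Savitzky-Golay is usually preferred, because plain differences amplify noise.

```r
# Two synthetic spectra of the "same" sample (NOT the real data)
set.seed(1)
wl   <- seq(1100, 2200, by = 10)           # wavelength axis, nm
spec <- exp(-((wl - 1720) / 40)^2)         # one hypothetical absorption band

a <- spec + rnorm(length(wl), sd = 0.005)
b <- spec + rnorm(length(wl), sd = 0.005) + 0.5   # same shape, shifted baseline

# Second finite difference as a crude second derivative
second_diff <- function(x) diff(x, differences = 2)

# Euclidean distance between the two spectra, before and after
raw_dist   <- sqrt(sum((a - b)^2))
deriv_dist <- sqrt(sum((second_diff(a) - second_diff(b))^2))

cat("distance on raw spectra:       ", raw_dist, "\n")
cat("distance on second differences:", deriv_dist, "\n")
```

The distance on the raw spectra is dominated by the offset, while after differencing only the noise is left.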

Related Posts:
Hierarchical Cluster Analysis (ChemoSpec) - 02

2 comments:

  1. Could it be that the 3 samples cluster far off because they are not properly aligned? (Shifts would result in all values having different Euclidean distances in the columns of the matrix.)

    1. You are totally right: these three samples have a baseline shift with respect to the others, and that is the reason. It is clear that we have to remove these baseline shifts with, for example, a second derivative.