27 abr. 2012

20 free R tutorials


I want to keep this post: 20 free R tutorials (and one reference card) , from the blog Revolutions in a new label created on this blog "R-Tutorials", because it´s a way to find a lot of interesting tutorials about R.
No excuse to start with R:
Download "R"  it´s free and start practising with the tutorials.

21 abr. 2012

R^2 Spectrum

We have seen in the previous post, how to calculate the correlation spectrum, but other simple way to show  how the bands correlate to the constituent of interest is to calculate R^2. This way we remove the negative part of the correlation spectrum.
Xmsc<-NIR_msc
Ymoi<-demo_raw$Moisture
cor_spec<-cor(Ymoi,Xmsc[,1:700])
rsq_spec<-(cor(Ymoi,Xmsc[,1:700]))^2
cov_spec<-cov(Ymoi,Xmsc[,1:700])*50
matplot(wave_nir,t(cor_spec),lty=1,pch="*",xlab="nm",ylab="log(1/R)")
matplot(wave_nir,t(rsq_spec),lty=1,pch="*",xlab="nm",ylab="log(1/R)")
matplot(wave_nir,t(cov_spec),lty=1,pch="*",xlab="nm",ylab="log(1/R)")
#We merge the R /R^2/Cov spectrum with the sample spectra treated with MSC.
cor_spec<-rbind(cor_spec,NIR_msc)
rsq_spec<-rbind(rsq_spec,NIR_msc)
cov_spec<-rbind(cov_spec,NIR_msc)
matplot(wave_nir,t(cor_spec),lty=1,pch="*",xlab="nm",ylab="log(1/R)")
matplot(wave_nir,t(rsq_spec),lty=1,pch="*",xlab="nm",ylab="log(1/R)")
matplot(wave_nir,t(cov_spec),lty=1,pch="*",xlab="nm",ylab="log(1/R)")

In order to see better the Covariance Spectrum, I multiplied by a factor,We can see how the covariance spectrum gives sharp bands an gives us a better idea where the variation due to moisture is.

18 abr. 2012

"Correlation / Covariance" Spectrum (This time with "R")

I treat this matter with other software´s, and of course you can do the same with "R".
Once I have the spectra of my samples with a math treatment, I want to draw a correlation spectrum to see which wavelengths have better correlation with the constituent of interest.
In this example I want to see the correlation of the wavelengths treated with MSC (Multiple Scatter Correction) respect  to the Moisture value of the Demo file, but only in the NIR range (1100 to 2498 nm = 700 data points).
>Xmsc<-demoNIR_msc$NIRmsc
>Ymoi<-demoNIR_msc$Moisture
>cor_spec<-cor(Ymoi,Xmsc[,1:700])
>matplot(wave_nir,t(cor_spec),lty=1,pch="*",
+ xlab="data points",ylab="log(1/R)")
#We merge the correlation spectrum with
#the sample spectra treated with MSC.
>cor_spec1<-rbind(cor_spec,waveNIR_msc)
>matplot(wave_nir,t(cor_spec1),lty=1,pch="*",
+ xlab="data points",ylab="log(1/R)")

We can se the correlation plot between -1.00 and 1.00. We are interested in positive correlations, so it is easy to see where the bands for moisture are (combination band 1940nm aprox. and first overtone 1450 nm aprox.).
We can calculate also the covariance spectrum for moisture:
>cov_spec<-cov(Ymoi,Xmsc[,1:700])
>matplot(wave_nir,t(cov_spec),lty=1,pch="*",
+ xlab="data points",ylab="log(1/R)")


See also:
R^2 Spectrum

17 abr. 2012

More Spectra patterns (1ª derivative)

In the case of the first derivative for the absortion band, the maximum becomes a cero crossing.
Using SG filters, we can calculate it with R, and to see, like in the last posts, the Corrgram matrix.
Corrgram for the first derivative for this band:
Let´s see the three corrgram patterns together: (MSC, 1ª derivative, 2ª derivative)

Math Spectra Patterns

I was working today with "R" to get more patterns with the Corrgram. In the demo raw spectra I wanted this time to look to a band as much Gaussian as possible. I select it and trim the spectra to that region treated with the MSC ("Multiple Scatter Correction").
After I decided to apply another Math treatment (SG filter 2ª derivative). With the 2ª derivative the peak maximum in MSC, change to peak minimum.
The idea of this post is to look to the patterns and understand this with the wavelengths correlation matrix:
With MSC Treatment:
With SG 2ª derivative:

13 abr. 2012

CORRGRAM: Correlation Matrix (Wavelengths)

With the "Corrgram" package we can see patterns that can help us to recognize possible inter-correlations in a big matrix. This could be the case to see the correlation to every wavelength respect to all others. This way we can see the high correlation respect to their neighbors and between the different overtones.  
I have selected just 40 (in steps of 2nm) consecutives wavelengths in this case.

12 abr. 2012

CORRGRAM: Correlation Matrix (Constituents)

Thanks a lot to Kevin W., for his comment in my previous post.
Corrgram, it a nice package and I found very nice information to understand it a little bit better on Internet apart from the R help page.

I worked with it for a while and the plots are very nice and useful.
This is one of them with the Demo data for the fish meal parameters (previous posts).

You can choose other options. The links I wrote give you good details.



9 abr. 2012

Correlation Matrix (Constituents)

It is important to understand as better as possible our sample set before to develop the regression. Continuing with the “Y” matrix (constituent’s matrix) we have to observe the correlation matrix.
In the R Graph Gallery, we can get the code to draw a nice Correlation Matrix Plot, with the X-Y plots and the Pearson correlation values, apart from some more details.
I used the demo sample set. If we use the NA values, this code does not give correlation values for the constituents with NA values. If we have “0” indeed NA, correlation values are nor well calculated because the “0” values, so until we can modify this code, I deleted the two samples with “0” values, so we calculate the Correlation Matrix with 64 samples.
It is obvious the high inverse correlation between DM (Dry Matter) and Moisture (Moisture = 100-DM).
This sample set is of fish meal and where the protein content is very important. The way to control the protein is to add or ash, that is the reason for the high negative correlation.
All the conclusions we can get for this plot are important to study latter the loadings, regression coefficients,….

Looking to the difference spectrum

From the previous post, we can make the difference spectrum (once the samples are sorted by moisture) between the sample with the lowest moisture value (position 1), from the sample with the highest moisture value (position 66). This spectra will help us to understand where are the band positions (should be positive) for the moisture. Of course other bands will appear, because these two samples have different values for the other constituents, so we must be careful with this.
It is important to know the inter-correlations between the different constituents.
I multiplied the difference spectra by 10, in order to see better the bands.
min_moi<-moiNIR_msc$NIRmsc[1,]
max_moi<-moiNIR_msc$NIRmsc[66,]
diff_moi<-max_moi-min_moi
diff_moi10<-ext_moi*10
mix_moi<-rbind(min_moi,max_moi,diff_moi10)
matplot(wave_nir,t(mix_moi),lty=1,pch=".",
+ xlab="wavelength",ylab="log(1/R)")
We can see two big positive bands at 1940 nm (combination band), and other at 1450 nm (1º overtone) due to the water.
We can do the same procedure with other math treatments, but the spectra can be more difficult to interpret.
The difference spectra will help us to understand better the loadings spectra.

8 abr. 2012

Sorting the "Sample Sets" by constituents


I use to see the videos from:
http://www.twotorials.com/
and the video:
How to order and sort stuff in R
is really useful to apply this concept to organize and understand better our sample sets before to proceed to develop a calibration.
The idea of this post is after watching the video to create a new dataframe sorted by the "Moisture" constituent in an ascending order.
After that we can subtract the spectra with the highest moisture value from the spectra with the lowest moisture value. This way we can study he difference spectrum in order to get some conclusions about the band positions for moisture, and other constituents.
Let´s start with the "demoNIR_msc" data frame.
As we can see in the video, we can use the functions:
sort:
> sort(demoNIR_msc$Moisture)
 [1] 3.98 5.05 5.20 5.32 5.34 5.41 5.51 5.53 5.57 5.63 5.64 5.67 5.71 5.73 5.77
[16] 5.82 5.85 5.85 5.86 5.87 5.89 5.90 5.91 5.99 5.99 6.04 6.05 6.09 6.10 6.14
[31] 6.16 6.20 6.22 6.23 6.31 6.33 6.33 6.34 6.36 6.38 6.40 6.43 6.47 6.56 6.56
[46] 6.57 6.59 6.61 6.66 6.66 6.69 6.73 6.73 6.84 6.88 6.95 7.07 7.10 7.12 7.30
[61] 7.42 7.43 7.48 8.00 8.12 8.17
order:
> order(demoNIR_msc$Moisture)
 [1]  4 12 15 18 29 14 17 20 36 38 66 37 39 30 63 13 16 33 65  7  3 28  5 10 25
[26] 32  6 56 64 11 21 31 42  8  2 48 49 62 41 58 26 24 59 27 55  9 34 22 54 61
[51]  1 23 46 35 19 47 50 45 51 40 52 53 60 57 44 43
Now we prepare a new dataframe, sorted by moisture values in ascending order:
> moiNIR_msc<-demoNIR_msc[order(demoNIR_msc$Moisture),]
> moiNIR_msc[,1:5]
   Protein   Fat   Ash    DM Moisture
4    74.51 10.51 14.88 96.02     3.98
12   70.24 10.73 18.61 94.95     5.05
15   71.17 12.14 16.26  94.8      5.2
18   71.29 12.08 15.78 94.68     5.32
29   71.71 10.94 16.72 94.66     5.34
14    72.2 11.73 15.64 94.59     5.41
17   70.97 12.82  16.1 94.49     5.51
20   68.95 12.53 16.42 94.47     5.53
36      76 11.37 11.68 94.43     5.57
38   64.56  9.46 25.55 94.37     5.63
66    73.4  9.15 16.82 94.36     5.64
37   78.06  11.5 10.86 94.33     5.67
39   64.46  9.36  26.5 94.29     5.71
30   71.99 10.68 17.12 94.27     5.73
63   72.65 10.14 17.32 94.23     5.77
13   72.43 11.98 14.85 94.18     5.82
16   71.73 12.33  15.3 94.15     5.85
33   76.14 12.67 11.09 94.15     5.85
65   73.95  9.07 16.35 94.14     5.86
7    72.64 10.39 16.44 94.13     5.87
3    73.14 10.51 15.29 94.11     5.89
28   72.46 11.44 15.57  94.1      5.9
5    72.29 10.08 15.39 94.09     5.91
10   73.51 10.53 15.64 94.01     5.99
25   75.05  9.83  14.9 94.01     5.99
32   73.76 10.37 15.45 93.96     6.04
6    70.21 11.06 17.87 93.95     6.05
56   79.07 11.52  8.88 93.91     6.09
64   69.76 10.01 19.81  93.9      6.1
11   71.57 10.65 17.36 93.86     6.14
21   70.66 11.65 17.16 93.84     6.16
31   73.04 11.04  15.5  93.8      6.2
42   71.17  9.57 19.36 93.78     6.22
8    75.19 10.33 14.27 93.77     6.23
2     72.9 14.56 10.95 93.69     6.31
48   67.63   6.5  24.6 93.67     6.33
49   65.18  6.39 28.22 93.67     6.33
62   72.48  10.2 17.43 93.66     6.34
41   73.09 10.74 16.28 93.64     6.36
58      68     0     0 93.62     6.38
26   75.35 11.53 12.91  93.6      6.4
24    70.9 10.66 18.34 93.57     6.43
59   70.64 10.53 17.77 93.53     6.47
27   70.15 10.77 17.16 93.44     6.56
55      77  9.16 13.63 93.44     6.56
9    73.71 10.52 15.24 93.43     6.57
34   66.88  9.37 22.79 93.41     6.59
22   70.58 11.51 17.48 93.39     6.61
54   74.15     9 16.42 93.34     6.66
61   72.04  9.71 17.61 93.34     6.66
1    72.19 15.08 11.76 93.31     6.69
23   70.94 12.07 15.81 93.27     6.73
46    69.2  8.98  21.6 93.27     6.73
35   68.53 11.28 20.09 93.16     6.84
19   70.22 12.15 16.02 93.12     6.88
47   66.63  9.55 24.25 93.05     6.95
50   74.52  8.28 16.67 92.93     7.07
45   64.41  8.06  27.1  92.9      7.1
51   73.47  10.2 16.33 92.88     7.12
40    64.8 10.44 24.54  92.7      7.3
52   71.04  9.47 19.16 92.58     7.42
53   70.75  7.74 20.97 92.57     7.43
60   69.33 10.15 18.68 92.52     7.48
57   68.89 10.73 19.82    92        8
44   64.17  7.27 28.23 91.88     8.12
43   68.81  8.83 21.26 91.83     8.17
We can use the same procedure for any of the other constituents

3 abr. 2012

Vacaciones Semana Santa - 2012 / Holidays: Easter - 2012

Esta semana me la he tomado de vacaciones para venir a mi tierra (Avilés-Asturias), donde el Centro Cultural Niemeyer se ha convertido sin duda en un punto emblemático.
Week of Holidays, visiting Asturias. These are two pictures of the Niemeyer Center in Avilés - Asturias.