## 9 feb. 2012

### "R": Predicting a Test Set (Gasoline)

```
> library(pls)
> data(gasoline)
> # 60 NIR spectra of gasoline (octane is the constituent)
> # We divide the whole set into a training set and a test set.
> gasTrain <- gasoline[1:50, ]
> gasTest <- gasoline[51:60, ]
```
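Here the first 50 spectra go to training and the last 10 to testing. If the sample order carries some structure (for example, the order in which samples were measured), a random split is a common alternative. Below is a minimal sketch in base R. The helper `split_train_test()` and its `seed` argument are hypothetical names for illustration and are not part of the pls package.

```r
# Hypothetical helper (not part of pls): randomly split the rows of a
# data frame into a training set and a test set with n_test rows.
split_train_test <- function(data, n_test, seed = 1) {
  set.seed(seed)                          # reproducible split (resets the global RNG state)
  test_idx <- sample(nrow(data), n_test)  # row indices drawn without replacement
  list(train = data[-test_idx, , drop = FALSE],
       test  = data[test_idx, , drop = FALSE])
}

# Usage with the gasoline data (requires the pls package):
# library(pls); data(gasoline)
# parts <- split_train_test(gasoline, n_test = 10)
# gasTrain <- parts$train
# gasTest  <- parts$test
```

Row indexing keeps the `NIR` matrix column aligned with `octane`, just as `gasoline[1:50, ]` does in the transcript.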

```
> # Let's develop the PLSR model with the training set and LOO CV
> gas1 <- plsr(octane ~ NIR, ncomp = 10, data = gasTrain, validation = "LOO")
> summary(gas1)
Data:   X dimension: 50 401
        Y dimension: 50 1
Fit method: kernelpls
Number of components considered: 10

VALIDATION: RMSEP
Cross-validated using 50 leave-one-out segments.
       (Intercept)  1 comps  2 comps  3 comps  4 comps  5 comps  6 comps
CV           1.545    1.357   0.2966   0.2524   0.2476   0.2398   0.2319
adjCV        1.545    1.356   0.2947   0.2521   0.2478   0.2388   0.2313
       7 comps  8 comps  9 comps  10 comps
CV      0.2386   0.2316   0.2449    0.2673

TRAINING: % variance explained
        1 comps  2 comps  3 comps  4 comps  5 comps  6 comps  7 comps  8 comps
X         78.17    85.58    93.41    96.06    96.94    97.89    98.38    98.85
octane    29.39    96.85    97.89    98.26    98.86    98.96    99.09    99.16
        9 comps  10 comps
X         99.02     99.19
octane    99.28     99.39
```
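The CV row of the summary can also be scanned programmatically. Below is a minimal sketch in base R using the cross-validated RMSEP values copied from the output above. `choose_ncomp()` and its `tol` argument are a hypothetical rule of thumb for illustration, not a pls function: it returns the smallest number of components whose RMSEP is within a relative tolerance of the minimum.

```r
# Cross-validated RMSEP copied from the summary above:
# element 1 is the intercept-only model, element k + 1 is k components.
cv_rmsep <- c(1.545, 1.357, 0.2966, 0.2524, 0.2476, 0.2398,
              0.2319, 0.2386, 0.2316, 0.2449, 0.2673)

# Hypothetical rule (not part of pls): smallest number of components whose
# RMSEP is within a relative tolerance `tol` of the minimum RMSEP.
# With tol = 0 this is simply the component count with the lowest RMSEP.
choose_ncomp <- function(rmsep, tol = 0) {
  threshold <- min(rmsep) * (1 + tol)
  which(rmsep <= threshold)[1] - 1  # subtract 1 for the intercept-only entry
}

choose_ncomp(cv_rmsep)              # returns 8 (lowest CV RMSEP)
choose_ncomp(cv_rmsep, tol = 0.05)  # returns 5 (simplest model within 5%)
```

The plain minimum (8 components, 0.2316) is only marginally lower than 6 components (0.2319), which is why a tolerance that favours smaller models can be helpful. With the fitted model, the same vector should be available without copying numbers, for example via `drop(RMSEP(gas1, estimate = "CV")$val)`.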

```
> # For this exercise we choose 3 components.
> # Let's predict our test set with this 3-component model.
> predict(gas1, ncomp = 3, newdata = gasTest)
, , 3 comps

     octane
51 87.94907
52 87.30484
53 88.21420
54 84.86945
55 85.24244
56 84.57502
57 87.37650
58 86.78971
59 89.10282
60 86.97223
```
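To judge these predictions we compare them with the reference octane values of the test set. The RMSEP is the square root of the mean squared difference between reference and predicted values. A minimal base-R sketch follows; the helper name `rmsep_manual()` is hypothetical and only illustrates the formula.

```r
# Root mean squared error of prediction: sqrt(mean((observed - predicted)^2)).
rmsep_manual <- function(observed, predicted) {
  stopifnot(length(observed) == length(predicted))
  sqrt(mean((observed - predicted)^2))
}

# Usage with the model above (requires pls and the objects gas1, gasTest):
# pred3 <- drop(predict(gas1, ncomp = 3, newdata = gasTest))
# rmsep_manual(gasTest$octane, pred3)
```

`predict()` returns a 3-D array (samples x responses x components), so `drop()` turns the 10 x 1 x 1 result into a plain vector. The value obtained this way should agree with the 3-component entry that `RMSEP(gas1, newdata = gasTest)` reports.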

```
> # To plot the predicted against the measured octane values:
> predplot(gas1, ncomp = 3, newdata = gasTest, asp = 1, line = TRUE)

> # Let's look at the RMSEP statistic on the test set. It is a very useful
> # tool for deciding whether 3 components is a good choice or whether we
> # should use more or fewer components.
> RMSEP(gas1, newdata = gasTest)
(Intercept)      1 comps      2 comps      3 comps      4 comps      5 comps
     1.5369       1.1696       0.2445       0.2341       0.3287       0.2780
    6 comps      7 comps      8 comps      9 comps     10 comps
     0.2703       0.3301       0.3571       0.4090       0.6116
```
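Putting the cross-validated and test-set RMSEP side by side makes the comparison easier. The sketch below only uses the numbers printed in this post; the variable names are illustrative. The CV estimate is lowest at 8 components, while the test-set RMSEP is lowest at 3 and is clearly worse from 7 components on, which suggests that the larger models fit the training set more closely than they generalize. With only 10 test samples, the test-set figures are themselves fairly noisy.

```r
# RMSEP values copied from the outputs above (ncomp = 0 is the intercept-only model).
ncomp <- 0:10
cv    <- c(1.545, 1.357, 0.2966, 0.2524, 0.2476, 0.2398,
           0.2319, 0.2386, 0.2316, 0.2449, 0.2673)
test  <- c(1.5369, 1.1696, 0.2445, 0.2341, 0.3287, 0.2780,
           0.2703, 0.3301, 0.3571, 0.4090, 0.6116)

# One row per model size; diff > 0 means the test set is harder than CV suggested.
comparison <- data.frame(ncomp, cv, test, diff = test - cv)
print(comparison, digits = 4)

# Component counts at which each estimate reaches its minimum
ncomp[which.min(cv)]    # returns 8
ncomp[which.min(test)]  # returns 3
```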

```
> # 3 components is fine: the test-set RMSEP is 0.234.
> # We could also consider only 2 components (test-set RMSEP 0.2445).
> # The cross-validated RMSEP of the 3-component model was 0.252.
> # R is a wonderful tool for developing regressions and for better
> # understanding what is behind the algorithms.
> # There is plenty of material on the internet to start working with R.
> # Thanks to Bjorn-Helge Mevik & Ron Wehrens for their good tutorials
> # about the pls package; they helped me to understand this program
> # better and to keep learning (I have ordered some books).
```