18 May 2019

set.seed function in R and also in Win ISI

It is common to see a call to the "set.seed" function with a fixed number at the beginning of some code. The idea is to get reproducible results when working with functions that rely on random number generation. This is the case, for example, in Artificial Neural Network models, where the initial weights are chosen randomly and are then updated during the learning process.
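As a minimal sketch of the idea, using only base R (the seed value 123 and the variable names are arbitrary choices for this illustration), resetting the same seed before a random draw gives back exactly the same numbers:

```r
# Two consecutive draws without resetting the seed: the generator moves on,
# so the second draw is different from the first.
a <- runif(3)
b <- runif(3)
print(identical(a, b))

# Resetting the same seed before each draw reproduces the same numbers.
set.seed(123)
x <- runif(3)
set.seed(123)
y <- runif(3)
same_with_seed <- identical(x, y)
print(same_with_seed)
```

The seed value itself has no meaning; what matters is that the same value is set before the same sequence of random calls.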

Let's see what happens if set.seed() is not used:
library(nnet)
data(airquality)

model <- nnet(Ozone ~ Wind, data = airquality, size = 4, linout = TRUE)

The training output is:

# weights:  13
initial  value 340386.755571
iter  10 value 125143.482617
iter  20 value 114677.827890
iter  30 value 64060.355881
iter  40 value 61662.633170
final  value 61662.630819
converged

 
If we repeat the same process:

model <- nnet(Ozone ~ Wind, data = airquality, size = 4, linout = TRUE)

The training output is different:

# weights:  13
initial  value 326114.338213
iter  10 value 125356.496387
iter  20 value 68060.365524
iter  30 value 61671.200838
final  value 61662.628120
converged
 

 
But if we fix the seed to a certain value (whichever you like):

set.seed(1)
model <- nnet(Ozone ~ Wind, data = airquality, size = 4, linout = TRUE)

# weights:  13
initial  value 336050.392093
iter  10 value 67199.164471
iter  20 value 61402.103611
iter  30 value 61357.192666
iter  40 value 61356.342240
final  value 61356.324337
converged

 
and repeat the code with the same seed:

set.seed(1)
model <- nnet(Ozone ~ Wind, data = airquality, size = 4, linout = TRUE)

we obtain the same results:

# weights:  13
initial  value 336050.392093
iter  10 value 67199.164471
iter  20 value 61402.103611
iter  30 value 61357.192666
iter  40 value 61356.342240
final  value 61356.324337
converged
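Instead of comparing the printed iterations by eye, the fitted weights can be compared directly. This is only a sketch: the helper fit_once() is a name made up for this example, and trace = FALSE is used just to hide the iteration output.

```r
library(nnet)
data(airquality)

# Hypothetical helper: set the seed, then fit the same network as above.
fit_once <- function(seed) {
  set.seed(seed)
  nnet(Ozone ~ Wind, data = airquality, size = 4, linout = TRUE, trace = FALSE)
}

m1 <- fit_once(1)
m2 <- fit_once(1)

# With the same seed, the final weight vectors should be exactly equal.
weights_match <- identical(m1$wts, m2$wts)
print(weights_match)
```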


The same seed idea is used in chemometric programs such as Win ISI to select samples randomly.
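In R, a seeded random selection of samples can be sketched with sample(). This is not the Win ISI procedure, only an illustration of the same idea; the airquality data and the number of samples (n_select = 20) are assumptions for the example.

```r
data(airquality)

n_select <- 20  # hypothetical number of samples to pick

# Same seed before each call, so the same rows are selected both times.
set.seed(1)
idx1 <- sample(nrow(airquality), n_select)
set.seed(1)
idx2 <- sample(nrow(airquality), n_select)

same_selection <- identical(idx1, idx2)
print(same_selection)

# The selected rows could then be used, for example, as a calibration set.
selected <- airquality[idx1, ]
print(dim(selected))
```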

2 May 2019

Using "tecator" data with Caret (part 4)

I add one more type of regression to the "tecator" meat data; in this case it is Ridge Regression.
Ridge Regression uses all the predictors, but penalizes the size of their coefficients so that they cannot take large values.
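To see the penalty at work, here is a small base R sketch. It is not the caret model fitted on the tecator data: it uses the built-in mtcars data as a stand-in, the textbook closed-form ridge solution, and a helper ridge_coef() and lambda values chosen only for illustration.

```r
# Standardize the predictors and center the response, so that no intercept
# is needed and all coefficients are penalized on the same scale.
X <- scale(as.matrix(mtcars[, c("wt", "hp", "disp")]))
y <- mtcars$mpg - mean(mtcars$mpg)

# Closed-form ridge solution: (X'X + lambda * I)^(-1) X'y
ridge_coef <- function(X, y, lambda) {
  p <- ncol(X)
  drop(solve(crossprod(X) + lambda * diag(p), crossprod(X, y)))
}

lambdas <- c(0, 1, 10, 100)  # lambda = 0 is ordinary least squares
coefs <- sapply(lambdas, function(l) ridge_coef(X, y, l))
colnames(coefs) <- paste0("lambda=", lambdas)
print(round(coefs, 3))

# The overall size of the coefficient vector shrinks as lambda grows.
coef_norms <- sqrt(colSums(coefs^2))
print(coef_norms)
```

All three predictors stay in the model for every lambda; only the size of their coefficients changes.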

We can see that it does not fit as well as PCR or PLS in the case of spectroscopy data, but it is quite common to use it on other data in Machine Learning applications. Ridge Regression is a type of regularization. There are two common types of regularization, L1 and L2, and Ridge Regression uses the L2 penalty.
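For reference, the two penalties can be written side by side in the usual textbook notation. The symbols are assumptions for this note: y is the response, X the predictor matrix, beta the coefficients and lambda >= 0 the penalty strength. L1 is the penalty used by the lasso and L2 the one used by Ridge Regression.

```latex
% L1 penalty (lasso): sum of absolute values of the coefficients
\hat{\beta}^{\text{L1}} = \arg\min_{\beta} \; \lVert y - X\beta \rVert_2^2 + \lambda \sum_{j=1}^{p} \lvert \beta_j \rvert

% L2 penalty (ridge): sum of squared coefficients
\hat{\beta}^{\text{L2}} = \arg\min_{\beta} \; \lVert y - X\beta \rVert_2^2 + \lambda \sum_{j=1}^{p} \beta_j^2
```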

In the plot you can also see the RMSE for the validation set:

Of course PLS works better, but we must try other models and see how they affect the values.