R & Chemometrics: May 2019

18 May 2019

set.seed function in R and also in Win ISI

It is common to see the set.seed() function called with a fixed number at the beginning of a script. The idea is to get reproducible results when working with functions that rely on random number generation. This is the case, for example, in artificial neural network models, where the weights are initialized randomly and then updated during the learning process.
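As a minimal sketch of the idea in base R (the seed value 123 and the variable names are arbitrary choices for illustration), resetting the seed before each draw reproduces the same pseudo-random numbers, while drawing again without resetting does not:

```r
# Sketch: the same seed reproduces the same pseudo-random draws.
set.seed(123)                 # 123 is an arbitrary choice
first_draw <- runif(3)

set.seed(123)                 # reset the generator to the same state
second_draw <- runif(3)

same_seed_matches <- identical(first_draw, second_draw)
print(same_seed_matches)      # TRUE: both draws started from the same state

third_draw <- runif(3)        # no reset: the generator has moved on
print(identical(first_draw, third_draw))
```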

Let's see what happens if set.seed() is not used:
library(nnet)
data(airquality)
model=nnet( Ozone~Wind, data=airquality, size=4, linout=TRUE )

The training output (the value of the fitting criterion at each iteration) is:

# weights: 13
initial value 340386.755571
iter 10 value 125143.482617
iter 20 value 114677.827890
iter 30 value 64060.355881
iter 40 value 61662.633170
final value 61662.630819
converged

If we repeat the same process:

model=nnet( Ozone~Wind, data=airquality, size=4, linout=TRUE )

The training output is different, because the initial weights are different:

# weights: 13
initial value 326114.338213
iter 10 value 125356.496387
iter 20 value 68060.365524
iter 30 value 61671.200838
final value 61662.628120
converged

But if we fix the seed to a certain value (whichever you like):

set.seed(1)
model=nnet( Ozone~Wind, data=airquality, size=4, linout=TRUE )

# weights: 13
initial value 336050.392093
iter 10 value 67199.164471
iter 20 value 61402.103611
iter 30 value 61357.192666
iter 40 value 61356.342240
final value 61356.324337
converged

and repeat the code with the same seed:

set.seed(1)
model=nnet( Ozone~Wind, data=airquality, size=4, linout=TRUE )

we obtain the same results:

# weights: 13
initial value 336050.392093
iter 10 value 67199.164471
iter 20 value 61402.103611
iter 30 value 61357.192666
iter 40 value 61356.342240
final value 61356.324337
converged
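Instead of comparing the traces by eye, the fitted weights can be compared directly. This is only a sketch, assuming the nnet package is installed; fit_with_seed is a hypothetical helper introduced here, and trace=FALSE is used just to suppress the iteration output:

```r
library(nnet)
data(airquality)

# Hypothetical helper: fix the seed, then fit the same network as above.
fit_with_seed <- function(seed) {
  set.seed(seed)
  nnet(Ozone ~ Wind, data = airquality, size = 4,
       linout = TRUE, trace = FALSE)
}

model_a <- fit_with_seed(1)
model_b <- fit_with_seed(1)

# The wts component holds the fitted weights of each network.
weights_match <- isTRUE(all.equal(model_a$wts, model_b$wts))
print(weights_match)
```

With the same seed both fits start from the same initial weights, so the comparison should report that the final weights match.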

A seed is also used in chemometric programs such as Win ISI to select samples randomly.
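How Win ISI does this internally is not shown here. As a rough analogue in R, and only as a sketch, a reproducible random split into calibration and validation samples could look like this (the 70/30 proportion, the seed value and the variable names are assumptions for illustration):

```r
data(airquality)
n <- nrow(airquality)

# Fix the seed so the same samples are selected every time.
set.seed(1)
cal_idx <- sample(n, size = round(0.7 * n))   # rows for calibration

calibration <- airquality[cal_idx, ]
validation  <- airquality[-cal_idx, ]         # remaining rows

# Repeating the selection with the same seed gives the same rows.
set.seed(1)
cal_idx_again <- sample(n, size = round(0.7 * n))
print(identical(cal_idx, cal_idx_again))
```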

2 May 2019

Using "tecator" data with Caret (part 4)

I add one more type of regression to the "tecator" meat data, in this case ridge regression.
Ridge regression uses all the predictors, but penalizes the size of their coefficients so that they cannot take large values.

We can see that it does not fit as well as PCR or PLS in the case of spectroscopy data, but it is quite common to use it with other data in machine learning applications. Ridge regression is a type of regularization; of the two common types, L1 and L2, ridge uses the L2 penalty.
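To illustrate how the penalty shrinks the coefficients, here is a minimal base R sketch of the closed-form ridge estimate. It is not the caret code used for the tecator data: the ridge_coef helper, the mtcars example data and the lambda values are all assumptions chosen for illustration.

```r
# Hypothetical helper: closed-form ridge estimate on standardized predictors,
# beta = (X'X + lambda * I)^(-1) X'y
ridge_coef <- function(X, y, lambda) {
  Xs <- scale(X)             # center and scale the predictors
  yc <- y - mean(y)          # center the response, so no intercept is needed
  p  <- ncol(Xs)
  solve(crossprod(Xs) + lambda * diag(p), crossprod(Xs, yc))
}

X <- as.matrix(mtcars[, c("disp", "hp", "wt")])
y <- mtcars$mpg

beta_ols   <- ridge_coef(X, y, lambda = 0)    # no penalty: least squares
beta_ridge <- ridge_coef(X, y, lambda = 10)   # penalized coefficients

# The sum of squared coefficients is smaller with the penalty.
print(c(ols = sum(beta_ols^2), ridge = sum(beta_ridge^2)))
```

All three predictors stay in the model; the penalty only pulls their coefficients towards zero as lambda grows.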

In the plot you can also see the RMSE for the validation set:

Of course PLS works better, but we must try other models and see how they affect the values.
