R & Chemometrics: Dividing the Sample Set in two (Validation & Training)

29 Mar 2012

Dividing the Sample Set in two (Validation & Training)

The Demo sample set contains 66 samples. In this post we'll see one way to divide it into two parts: one for validation and another for training (calibration).

The selection will be random, using the “sample” command, which by default draws without replacement, so no sample is picked twice. I decided to select 10 samples for validation and keep the rest for training.

demo_raw_val<-demo_raw[sample(66,10),]

If you repeat this command several times, you will get a different set each time.
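If you want the same draw every time, one option is to fix the random seed and keep the selected row numbers in a vector. This is only a sketch: the data frame below is a stand-in for demo_raw (the real one comes from earlier posts), and the seed value 123 and the name val_idx are my own arbitrary choices.

```r
# Stand-in for demo_raw: 66 rows with a made-up moisture column (assumption)
set.seed(123)                            # arbitrary seed; fixes the random draw
demo_raw <- data.frame(id = 1:66, moisture = rnorm(66, mean = 10, sd = 1))

# Draw 10 distinct row numbers out of 66 and keep them for later use
val_idx <- sample(nrow(demo_raw), 10)

# Validation set: the rows that were drawn
demo_raw_val <- demo_raw[val_idx, ]
```

Keeping val_idx around means the same rows can be reused later without reading them off the console.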

In my case the samples selected are:

Samples: 25, 50, 8, 49, 39, 12, 16, 63, 35 and 41

These numbers are row indices, so we create the training set by removing those rows:

demo_raw_train<-demo_raw[c(-25,-50,-8,-49,-39,-12,-16,-63,-35,-41),]
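The same thing can be written with the row numbers stored once in a vector, since a minus sign in front of an index vector drops those rows. A small sketch, again with a stand-in data frame in place of demo_raw; the name val_idx is hypothetical.

```r
# Stand-in for demo_raw: only the row count matters here (assumption)
demo_raw <- data.frame(id = 1:66)

# The ten rows selected for validation in this post
val_idx <- c(25, 50, 8, 49, 39, 12, 16, 63, 35, 41)

# drop = FALSE keeps the result a data frame even with a single column
demo_raw_val   <- demo_raw[val_idx, , drop = FALSE]
demo_raw_train <- demo_raw[-val_idx, , drop = FALSE]

nrow(demo_raw_train)   # 66 - 10 rows remain for training
```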

We will create the same sample sets for the other data frames, the ones with math treatments (demo_msc and demo_snv):

demo_msc_train<-demo_msc[c(-25,-50,-8,-49,-39,-12,-16,-63,-35,-41),]

demo_snv_train<-demo_snv[c(-25,-50,-8,-49,-39,-12,-16,-63,-35,-41),]

demo_msc_val<-demo_msc[c(25,50,8,49,39,12,16,63,35,41),]

demo_snv_val<-demo_snv[c(25,50,8,49,39,12,16,63,35,41),]
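Typing the ten numbers once per data frame invites copy-and-paste slips. A possible alternative is a small helper that splits any data frame with the same index vector. The helper name split_set and the stand-in data frames are assumptions for illustration only.

```r
# Stand-ins for the three data frames used in this post (assumption)
demo_raw <- data.frame(id = 1:66, treatment = "raw")
demo_msc <- data.frame(id = 1:66, treatment = "msc")
demo_snv <- data.frame(id = 1:66, treatment = "snv")

val_idx <- c(25, 50, 8, 49, 39, 12, 16, 63, 35, 41)

# Hypothetical helper: returns the training and validation parts of df
split_set <- function(df, val_idx) {
  list(train = df[-val_idx, , drop = FALSE],
       val   = df[val_idx, , drop = FALSE])
}

# Apply the same split to every data frame
sets <- lapply(list(raw = demo_raw, msc = demo_msc, snv = demo_snv),
               split_set, val_idx = val_idx)

# Example: the SNV validation set
demo_snv_val <- sets$snv$val
```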

It is important to look at the summary of each sample set, to check and compare the statistics for the different constituents.
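As a rough sketch of that comparison, the statistics of one constituent can be placed side by side for both sets. The moisture values below are simulated, and the helper name describe is hypothetical; with the real data you would use the constituent columns of demo_raw.

```r
# Simulated stand-in for demo_raw with a moisture column (assumption)
set.seed(1)
demo_raw <- data.frame(moisture = rnorm(66, mean = 10, sd = 1))

val_idx <- c(25, 50, 8, 49, 39, 12, 16, 63, 35, 41)
demo_raw_val   <- demo_raw[val_idx, , drop = FALSE]
demo_raw_train <- demo_raw[-val_idx, , drop = FALSE]

# Built-in summaries, one per set
summary(demo_raw_train)
summary(demo_raw_val)

# Hypothetical helper: a few statistics in one row
describe <- function(x) c(n = length(x), mean = mean(x), sd = sd(x),
                          min = min(x), max = max(x))

# One row per set, so the two can be compared at a glance
stats <- rbind(train = describe(demo_raw_train$moisture),
               val   = describe(demo_raw_val$moisture))
print(round(stats, 2))
```

Ideally the validation range falls inside the training range and the means and standard deviations are similar.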

It also helps to look at the distribution plots, for example for moisture.
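One way to draw such plots is with two histograms that share the same breaks, so the bars are directly comparable. This is a sketch with simulated moisture values, not the plot from the real data.

```r
# Simulated stand-in for the moisture column of demo_raw (assumption)
set.seed(1)
demo_raw <- data.frame(moisture = rnorm(66, mean = 10, sd = 1))

val_idx <- c(25, 50, 8, 49, 39, 12, 16, 63, 35, 41)
train <- demo_raw$moisture[-val_idx]
val   <- demo_raw$moisture[val_idx]

# Common breaks computed from the whole set
brk <- pretty(range(demo_raw$moisture), n = 10)

# Two panels, one above the other
op <- par(mfrow = c(2, 1))
h_train <- hist(train, breaks = brk, main = "Training set", xlab = "Moisture")
h_val   <- hist(val, breaks = brk, main = "Validation set", xlab = "Moisture")
par(op)   # restore the previous plot layout
```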
