R & Chemometrics: Splitting spectral data into training and test sets

9 feb 2018

Splitting spectral data into training and test sets

It is common to split spectral data into a calibration (training) set and a validation (test) set. The split can be made in different ways (randomly, in a structured way, ...), and with a different percentage of the samples assigned to each set.
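As an illustration of the random option with a chosen percentage, here is a minimal sketch. It uses simulated data rather than real spectra: the names `X_demo`, `train_frac`, `train_idx` and `test_idx` are hypothetical, and the 70% fraction is just an assumed value for the example.

```r
# Simulated stand-in for a spectral matrix: 10 samples x 5 wavelengths (hypothetical data)
set.seed(123)                                   # makes the random split reproducible
X_demo <- matrix(rnorm(10 * 5), nrow = 10)

train_frac <- 0.7                               # assumed fraction of samples for training
n <- nrow(X_demo)
n_train <- round(train_frac * n)

# Draw the training row indices at random, without replacement
train_idx <- sort(sample(seq_len(n), size = n_train))
# Every remaining row goes to the test set
test_idx <- setdiff(seq_len(n), train_idx)

X_demo_tr  <- X_demo[train_idx, , drop = FALSE]
X_demo_val <- X_demo[test_idx, , drop = FALSE]
```

Setting the seed is optional, but without it a different set of samples is drawn on every run.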

This is a simple case: we are going to select 50% of the samples for calibration (training) and the other 50% for validation (test). This way we have a Training Set and a Validation Set.

One simple way to proceed is to sort the samples randomly or by some structure (constituent value, date of acquisition, type of product, ...), and then select the odd samples for the Training Set and the even samples for the Test Set.
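The sorting step could be sketched as follows for the constituent-value case. This is only an illustration on made-up data: `X_demo`, `y_demo`, `odd_pos` and `even_pos` are hypothetical names, and the constituent values are invented for the example.

```r
# Hypothetical data: 8 samples x 4 wavelengths, with one constituent value per sample
set.seed(1)
X_demo <- matrix(rnorm(8 * 4), nrow = 8)
y_demo <- c(12.1, 9.8, 14.3, 10.5, 11.2, 13.7, 9.1, 12.9)

# Row order that sorts the samples by increasing constituent value
ord <- order(y_demo)
X_sorted <- X_demo[ord, , drop = FALSE]
y_sorted <- y_demo[ord]

# Odd positions in the sorted order go to training, even positions to test
odd_pos  <- seq(1, nrow(X_sorted), by = 2)
even_pos <- seq(2, nrow(X_sorted), by = 2)

X_tr  <- X_sorted[odd_pos, , drop = FALSE]
y_tr  <- y_sorted[odd_pos]
X_val <- X_sorted[even_pos, , drop = FALSE]
y_val <- y_sorted[even_pos]

# Both sets should cover a similar range of the constituent
range(y_tr)
range(y_val)
```

Because the samples alternate along the sorted constituent, both sets end up spanning roughly the same range of values, which is the point of sorting before taking odd and even samples.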

## DIVIDE DATA SETS INTO CALIBRATION AND VALIDATION SETS
## X_msc, Prot and wavelengths are assumed to exist already in the workspace
## We create a sequence with the odd row numbers
odd<-seq(1,nrow(X_msc),by=2)
## We create a sequence with the even row numbers
even<-seq(2,nrow(X_msc),by=2)
## We take the odd samples for the training set
X_msc_tr<-X_msc[odd,]
Prot_tr<-Prot[odd,]
## We take the even samples for the validation set
X_msc_val<-X_msc[even,]
Prot_val<-Prot[even,]
## Training spectra in blue; ylim covers both sets so nothing is clipped
matplot(wavelengths,t(X_msc_tr),type="l",lty=1,xlab="wavelengths",
        ylab="Absorbance",col="blue",ylim=range(X_msc,na.rm=TRUE))
## Validation spectra in red, added to the same axes so both share one scale
## (overlaying a second matplot with par(new=TRUE) would rescale the y axis)
matlines(wavelengths,t(X_msc_val),lty=1,col="red")

We can see in the plot the training spectra in blue and the test spectra in red. We will continue practicing with these sets in the next posts.
