28 Feb. 2016

Tutorials with Resemble (Part 1)

This is the first in a series of posts about the resemble package. Its examples use a NIR spectra data set called NIRsoil (loaded here from the prospectr package), which contains a training set and a validation set. As usual when we have spectra, it is useful to look at them first.
You can follow the examples in the reference manual; I follow them here too, but I change the names of the data sets.
We overplot the training spectra in black and the validation spectra in blue, in this case without any math treatment:
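require(prospectr)
data(NIRsoil)

# Wavelength axis: 1100 to 2498 nm in 2 nm steps
wavelength <- seq(1100, 2498, by = 2)
# Training spectra: rows where NIRsoil$train is 1
X_train <- NIRsoil$spc[as.logical(NIRsoil$train), ]
matplot(wavelength, t(X_train), type = "l", col = "black",
        xlab = "Wavelength(nm)", ylab = "Absorbance", ylim = c(0, 1))
# Validation spectra: the remaining rows
X_val <- NIRsoil$spc[!as.logical(NIRsoil$train), ]
# Draw the next plot on top of the current one (same axis limits)
par(new = TRUE)
matplot(wavelength, t(X_val), type = "l", col = "blue",
        xlab = "Wavelength(nm)", ylab = "Absorbance", ylim = c(0, 1))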
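As an alternative to overlaying two plots with par(new=TRUE), both sets can be stacked and drawn in a single matplot call, with one colour per spectrum. The following is only a sketch on simulated spectra: the names wav_sim, sim_spectra, train_sim, val_sim, all_sim and set_col are made up for the illustration and the values are not real NIR data.

```r
# Simulated stand-ins for the two spectra matrices (values are made up)
set.seed(3)
wav_sim <- seq(1100, 2498, by = 2)
sim_spectra <- function(n) {
  # Each row is a flat baseline plus a small slope along the wavelength axis
  t(sapply(seq_len(n), function(i) runif(1, 0.2, 0.6) + 1e-4 * (wav_sim - 1100)))
}
train_sim <- sim_spectra(8)
val_sim <- sim_spectra(4)

# Stack both sets and give each spectrum the colour of its set
all_sim <- rbind(train_sim, val_sim)
set_col <- c(rep("black", nrow(train_sim)), rep("blue", nrow(val_sim)))

# One call, one pair of axes; matplot draws one line per column of t(all_sim)
matplot(wav_sim, t(all_sim), type = "l", lty = 1, col = set_col,
        xlab = "Wavelength (nm)", ylab = "Absorbance")
legend("topleft", legend = c("training", "validation"),
       col = c("black", "blue"), lty = 1)
```

With a single call the axis limits are computed once from all the spectra, so there is no need to fix ylim by hand to keep the two layers aligned.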

The scatter clearly affects the shape of the spectra, so we can apply an anti-scatter math treatment such as SNV+DT, using one of the functions of the "prospectr" package:
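# detrend() applies SNV and then removes a polynomial trend along the wavelength axis
X_train_snvdt <- detrend(X = X_train, wav = as.numeric(colnames(X_train)))
X_val_snvdt <- detrend(X = X_val, wav = as.numeric(colnames(X_val)))
matplot(wavelength, t(X_train_snvdt), type = "l", xlab = "", ylab = "",
        col = "black", ylim = c(-1, 3))
# Overlay the treated validation spectra on the same axes
par(new = TRUE)
matplot(wavelength, t(X_val_snvdt), type = "l", xlab = "", ylab = "",
        col = "blue", ylim = c(-1, 3))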
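To see what an SNV+DT treatment does to each spectrum, here is a minimal base R sketch. It assumes that SNV means centring and scaling every spectrum by its own mean and standard deviation, and that detrending means keeping the residuals of a second-order polynomial fitted against wavelength. The helper snv_dt and the simulated spectra are hypothetical; this is an illustration of the idea, not the prospectr code.

```r
# Hypothetical helper illustrating SNV followed by detrending (not prospectr's code)
snv_dt <- function(X, wav) {
  # SNV: centre and scale each spectrum (row) by its own mean and sd
  X_snv <- t(apply(X, 1, function(s) (s - mean(s)) / sd(s)))
  # Detrend: keep the residuals of a 2nd-order polynomial in wavelength
  X_dt <- t(apply(X_snv, 1, function(s) residuals(lm(s ~ poly(wav, 2)))))
  colnames(X_dt) <- colnames(X)
  X_dt
}

# Simulated "spectra": one common band plus a random offset and slope per sample
set.seed(1)
wav <- seq(1100, 2498, by = 2)
band <- exp(-((wav - 1900) / 60)^2)
X_sim <- t(sapply(1:5, function(i) {
  0.3 * band + runif(1, 0.1, 0.5) + runif(1, 0, 2e-4) * (wav - 1100)
}))
colnames(X_sim) <- wav

X_sim_snvdt <- snv_dt(X_sim, wav)
range(rowMeans(X_sim))        # raw spectra sit at different baselines
range(rowMeans(X_sim_snvdt))  # after the treatment every row mean is close to 0
```

The offset and slope differ from sample to sample in the simulated data, which mimics a scatter effect; after the treatment the spectra share a common baseline and mainly the band remains.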

This time I want to test how resemble performs what I know as local regression, where for every sample in the validation set a specific model is built from similar samples found in the training set, according to a neighbourhood criterion. We can select the number of samples chosen for each model and the maximum number of factors (PLS terms or principal components).
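The local regression idea can be sketched in a few lines of base R. This is not how resemble implements it: the sketch assumes Euclidean distance as the neighbourhood criterion and uses an ordinary least-squares fit in place of the local PLS model. The function local_predict and the simulated data are hypothetical.

```r
# Hypothetical sketch of local regression in base R.
# Assumptions: Euclidean distance defines the neighbourhood and a plain
# least-squares fit stands in for the local PLS model.
local_predict <- function(Xr, Yr, Xu, k) {
  sapply(seq_len(nrow(Xu)), function(i) {
    # Distance from the i-th unknown sample to every training sample
    d <- sqrt(rowSums(sweep(Xr, 2, Xu[i, ])^2))
    # Indices of the k most similar training samples
    nn <- order(d)[seq_len(k)]
    # Fit a model on those neighbours only and predict the unknown sample
    local <- data.frame(y = Yr[nn], Xr[nn, , drop = FALSE])
    fit <- lm(y ~ ., data = local)
    unname(predict(fit, newdata = data.frame(Xu[i, , drop = FALSE])))
  })
}

# Simulated data (made up): a non-linear response on three predictors
set.seed(42)
vars <- c("v1", "v2", "v3")
Xr <- matrix(runif(200 * 3), ncol = 3, dimnames = list(NULL, vars))
Yr <- sin(3 * Xr[, 1]) + Xr[, 2]^2 + rnorm(200, sd = 0.05)
Xu <- matrix(runif(20 * 3), ncol = 3, dimnames = list(NULL, vars))
Yu <- sin(3 * Xu[, 1]) + Xu[, 2]^2

pred_local <- local_predict(Xr, Yr, Xu, k = 30)
sqrt(mean((pred_local - Yu)^2))  # RMSE of the local models on the unknown samples
```

Each unknown sample gets its own model, so the cost grows with the number of samples to predict; the number of neighbours k plays the same role as the k argument discussed in this post.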
In this case I want to check the function "getPredictions".
We have to use the lab values (here CEC) as the reference "Y", and drop the samples that have no lab value.
X_val <- NIRsoil$spc[!as.logical(NIRsoil$train), ]
Y_val <- NIRsoil$CEC[!as.logical(NIRsoil$train)]
Y_train <- NIRsoil$CEC[as.logical(NIRsoil$train)]
X_train <- NIRsoil$spc[as.logical(NIRsoil$train), ]
# Remove samples without a CEC value; filter X before overwriting Y
X_val <- X_val[!is.na(Y_val), ]
Y_val <- Y_val[!is.na(Y_val)]
X_train <- X_train[!is.na(Y_train), ]
Y_train <- Y_train[!is.na(Y_train)]
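The order of the filtering steps matters: the spectra have to be subset while the lab values still contain their NAs, otherwise the information about which rows to drop is lost. A small toy example (made-up numbers, hypothetical names X_toy, Y_toy and keep) shows the same pattern with the mask stored once:

```r
# Toy data (made up): 5 samples, 3 spectral variables, two missing lab values
X_toy <- matrix(1:15, nrow = 5, dimnames = list(paste0("s", 1:5), NULL))
Y_toy <- c(10.2, NA, 8.7, NA, 12.1)

keep <- !is.na(Y_toy)                 # TRUE where a lab value exists
X_toy <- X_toy[keep, , drop = FALSE]  # drop the spectra without a lab value
Y_toy <- Y_toy[keep]                  # drop the missing lab values themselves

rownames(X_toy)                # samples that remain
nrow(X_toy) == length(Y_toy)   # spectra and lab values stay aligned
```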
And now we will follow the functions in the reference manual with the same configuration. The idea is to go into more detail on the configuration of these functions in other examples.
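# mblControl(), mbl() and getPredictions() come from the resemble package
require(resemble)
ctrl <- mblControl(sm = "pls",
                   pcSelection = list("opc", 40),
                   valMethod = c("NNv"),
                   scaled = TRUE, center = TRUE)
# Yu = NULL: the validation lab values are not passed to mbl
ex1 <- mbl(Yr = Y_train, Xr = X_train, Yu = NULL, Xu = X_val,
           mblCtrl = ctrl,
           distUsage = "predictors",
           k = seq(30, 150, 15),
           method = "wapls1",
           pls.c = c(7, 20))
predictions <- getPredictions(ex1)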
The predictions object contains the results for the validation spectra from the different models developed with 30, 45, ..., 150 samples, as we configured in the "mbl" function (k = seq(30, 150, 15)).
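Since the lab values of the validation set are known, one natural next step is to compare the predictions for each neighbourhood size against them. The sketch below does this on a simulated table, because the exact layout and column names of the real predictions object are not shown here: it assumes one row per validation sample and one column per value of k. The names pred_sim, y_ref and rmse_by_k, and the k_ column labels, are hypothetical.

```r
# Simulated stand-in for a predictions table: one row per validation sample,
# one column per neighbourhood size (column names are hypothetical)
set.seed(7)
k_values <- seq(30, 150, 15)
y_ref <- rnorm(50, mean = 12, sd = 4)   # made-up reference values
pred_sim <- sapply(k_values, function(k) y_ref + rnorm(50, sd = 1 + k / 100))
colnames(pred_sim) <- paste0("k_", k_values)

# Root mean squared error of each column against the reference values
rmse_by_k <- apply(pred_sim, 2, function(p) sqrt(mean((p - y_ref)^2)))
rmse_by_k
names(which.min(rmse_by_k))   # neighbourhood size with the lowest error
```

With the real objects, the same apply() call could be tried on the predictions and Y_val, provided the rows are in the same order and the layout matches the assumption above.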

Continued in "Tutorials with Resemble (Part 2)".

3 comments:

1. Hi Jose, when I copy the fourth line of the code,
matplot(wavelength,t(X_train),type="l",col="black",
xlab="Wavelength(nm)",ylab="Absorbance",ylim=c(0,1))

I can't get it to plot.