29 feb. 2016

Tutorials with Resemble (Part 2)

In the first post of the Resemble tutorials we used this call to obtain the predictions for every spectrum in the NIRsoil validation set:
ex1 <- mbl(Yr = Y_train, Xr = X_train, Yu = NULL, Xu = X_val,
           mblCtrl = ctrl,
           distUsage = "predictors",
           k = seq(30, 150, 15),
           method = "wapls1",
           pls.c = c(7, 20))
Now we can use the function "plot.mbl", whose usage in the reference manual is:
plot(x, g = c("validation", "pca"), param = "rmse", pcs = c(1, 2), ...)
Calling it on our object, plot(ex1), gives some interesting plots that help us better understand the performance of the different models.
The first plot shows the error (RMSE) obtained with each number of neighbors (k):

As we can see, the lowest error is obtained with k = 120 neighbors.
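The RMSE curve in that plot can also be computed by hand from a matrix of predictions with one column per value of k (assumed here to be the layout of the predictions for the different k values). A minimal base-R sketch follows; the helper name rmse_by_k and the toy numbers are hypothetical, used only for illustration.

```r
# Hypothetical helper: RMSE of each column of a prediction matrix
# (one column per number of neighbours k) against the observed values
rmse_by_k <- function(pred, obs) {
  pred <- as.matrix(pred)
  # obs is recycled down each column, so row j is compared with obs[j]
  sqrt(colMeans((pred - obs)^2))
}

# Toy example with made-up numbers (not NIRsoil results)
obs  <- c(10, 12, 14, 16)
pred <- cbind(k30 = c(11, 13, 13, 17),
              k45 = c(10, 12, 15, 16))
err <- rmse_by_k(pred, obs)
err
names(which.min(err))  # the k with the lowest RMSE
```

With the real data, the same idea would be applied to the predictions of the validation set and the observed Y_val values.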
The next plot is a score plot (PC1 vs PC2) with the training samples (Xr) in one color and the validation samples (Xu) in a different color.
Other score maps can be plotted by changing the component numbers in "pcs", for example pcs = c(1, 3).
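The idea behind this score plot can be reproduced with base R's prcomp: fit a PCA on the training spectra (Xr) and project the validation spectra (Xu) onto it. This is a sketch under that assumption; the plot produced by plot.mbl may compute its PCA differently (for example with scaling, or on both sets together), and the function name pca_scores is hypothetical.

```r
# Hypothetical helper: PCA fitted on the training set, validation projected onto it
pca_scores <- function(Xr, Xu, pcs = c(1, 2), scale. = FALSE) {
  pca <- prcomp(Xr, center = TRUE, scale. = scale.)
  list(train = pca$x[, pcs, drop = FALSE],
       val   = predict(pca, newdata = Xu)[, pcs, drop = FALSE])
}

# Toy data standing in for spectra (rows = samples, columns = wavelengths)
set.seed(1)
Xr <- matrix(rnorm(50 * 10), nrow = 50)
Xu <- matrix(rnorm(20 * 10), nrow = 20)
sc <- pca_scores(Xr, Xu, pcs = c(1, 2))

# Training scores in black, validation scores in blue
plot(sc$train, col = "black", pch = 16,
     xlim = range(sc$train[, 1], sc$val[, 1]),
     ylim = range(sc$train[, 2], sc$val[, 2]))
points(sc$val, col = "blue", pch = 16)
```

Validation points falling far from the training cloud would be the ones with fewer similar samples available for their local models.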
We can also see the correlation plot by setting param = "r2" instead of "rmse".
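A value of R² per number of neighbors can be sketched as the squared Pearson correlation between each column of predictions and the observed values. That is one common definition and an assumption here; the reference manual should be checked for exactly how "r2" is computed in the plot. r2_by_k is a hypothetical helper and the numbers are made up.

```r
# Hypothetical helper: squared correlation between each prediction column and obs
r2_by_k <- function(pred, obs) {
  pred <- as.matrix(pred)
  apply(pred, 2, function(p) cor(p, obs)^2)
}

# Toy example with made-up numbers (not NIRsoil results)
obs  <- c(10, 12, 14, 16)
pred <- cbind(k30 = c(11, 13, 13, 17),
              k45 = c(10, 12, 15, 16))
r2 <- r2_by_k(pred, obs)
r2
```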

28 feb. 2016

Tutorials with Resemble (Part 1)

This is the first of a series of posts about the resemble package. The examples in its reference manual use a NIR spectra data set called NIRsoil (available in the prospectr package), which contains a training set and a validation set. As usual when we work with spectra, it is always useful to look at them first.
You can follow the examples in the reference manual; here I follow them too, but I change the names of the data sets.
We overplot the training spectra in black and the validation spectra in blue, in this case without any math treatment:
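require(prospectr)   # provides the NIRsoil data set and the detrend function
data(NIRsoil)

wavelength <- seq(1100, 2498, by = 2)   # wavelengths (nm) of the spectra
# Training spectra in black
X_train <- NIRsoil$spc[as.logical(NIRsoil$train), ]
matplot(wavelength, t(X_train), type = "l", col = "black",
        xlab = "Wavelength (nm)", ylab = "Absorbance", ylim = c(0, 1))
# Validation spectra in blue, drawn on top of the previous plot
X_val <- NIRsoil$spc[!as.logical(NIRsoil$train), ]
par(new = TRUE)
matplot(wavelength, t(X_val), type = "l", col = "blue",
        xlab = "Wavelength (nm)", ylab = "Absorbance", ylim = c(0, 1))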

The scatter clearly affects the shape of the spectra, so we could apply an anti-scatter math treatment such as SNV+DT (standard normal variate followed by detrending), using the "detrend" function of the "prospectr" package:
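To see what this treatment does, SNV followed by detrending can be sketched by hand: each spectrum is centered and scaled by its own mean and standard deviation (SNV), then a second-order polynomial in wavelength is fitted to it and only the residuals are kept. To my understanding this mirrors what prospectr's detrend does, but the sketch below is illustrative (snv_detrend is a hypothetical name), not the package code.

```r
# Hypothetical helper: SNV followed by 2nd-order polynomial detrending (row-wise)
snv_detrend <- function(X, wav, degree = 2) {
  X <- as.matrix(X)
  # SNV: center and scale every spectrum (row) by its own mean and sd
  snv <- t(apply(X, 1, function(x) (x - mean(x)) / sd(x)))
  # Polynomial design matrix in wavelength (orthogonal polynomials for stability)
  P <- cbind(1, poly(wav, degree = degree))
  # Least-squares fit of all spectra at once; the residuals are the treated spectra
  fit <- lm.fit(P, t(snv))
  out <- t(fit$residuals)
  dimnames(out) <- dimnames(X)
  out
}

# Toy spectra on the NIRsoil wavelength grid (made-up shapes)
wav <- seq(1100, 2498, by = 2)
X <- rbind(0.5 + 0.2 * sin(wav / 100) + 1e-4 * (wav - 1800),
           1.0 + 0.4 * sin(wav / 100))
Z <- snv_detrend(X, wav)
```

Applied to the real data, the call would be snv_detrend(X_train, as.numeric(colnames(X_train))).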
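# SNV+DT; the column names of the spectra matrices are the wavelengths
X_train_snvdt <- detrend(X = X_train, wav = as.numeric(colnames(X_train)))
X_val_snvdt <- detrend(X = X_val, wav = as.numeric(colnames(X_val)))
matplot(wavelength, t(X_train_snvdt), type = "l", xlab = "", ylab = "",
        col = "black", ylim = c(-1, 3))
par(new = TRUE)   # overplot the treated validation spectra
matplot(wavelength, t(X_val_snvdt), type = "l", xlab = "", ylab = "",
        col = "blue", ylim = c(-1, 3))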

This time I want to test how resemble performs what I know as local regression: for every sample in the validation set, a specific model is built with the most similar samples found in the training set, according to a neighbourhood criterion. We can select the number of samples used for each model, and the number of PLS factors or principal components.
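The local regression idea can be sketched in base R: for each validation spectrum, find its k nearest training spectra and fit a small model on those neighbours only. To keep the sketch short and dependency-free, it measures similarity with the Euclidean distance between principal component scores and uses an ordinary least-squares regression on the first scores as the local model. The mbl call in this post uses a PLS-based similarity (sm = "pls") and a weighted average of PLS models ("wapls1") instead, so this only illustrates the principle; local_predict and its arguments are hypothetical.

```r
# Hypothetical sketch of memory-based (local) regression, not resemble's code
local_predict <- function(Xr, Yr, Xu, k = 30, ncomp = 5) {
  k <- min(k, nrow(Xr))
  # Global PCA of the training spectra, used only to measure similarity
  pca <- prcomp(Xr, center = TRUE)
  Tr  <- pca$x[, 1:ncomp, drop = FALSE]
  Tu  <- predict(pca, newdata = Xu)[, 1:ncomp, drop = FALSE]
  preds <- numeric(nrow(Tu))
  for (i in seq_len(nrow(Tu))) {
    # Euclidean distance (in score space) from sample i to every training sample
    d  <- sqrt(colSums((t(Tr) - Tu[i, ])^2))
    nn <- order(d)[seq_len(k)]   # indices of the k nearest neighbours
    # Local model: least squares fitted on the neighbours only
    local <- data.frame(y = Yr[nn], Tr[nn, , drop = FALSE])
    fit   <- lm(y ~ ., data = local)
    preds[i] <- predict(fit, newdata = as.data.frame(Tu[i, , drop = FALSE]))
  }
  preds
}

# Toy data (random numbers, not NIRsoil)
set.seed(3)
Xr <- matrix(rnorm(120 * 20), nrow = 120)
Xu <- matrix(rnorm(10 * 20), nrow = 10)
pc <- prcomp(Xr)
Yr <- 2 + pc$x[, 1] - 0.5 * pc$x[, 2]   # response linear in the first two PCs
yhat <- local_predict(Xr, Yr, Xu, k = 30, ncomp = 5)
```

Because the toy response is exactly linear in the scores, every local model recovers it; with real spectra, k controls the trade-off between having enough samples per model and keeping them truly similar.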
In this case I want to check the function "getPredictions".
We also need the laboratory reference values (the "Y" matrix); here we use CEC, and we remove the samples without a reference value:
# Spectra and CEC reference values for the validation and training sets
X_val <- NIRsoil$spc[!as.logical(NIRsoil$train), ]
Y_val <- NIRsoil$CEC[!as.logical(NIRsoil$train)]
Y_train <- NIRsoil$CEC[as.logical(NIRsoil$train)]
X_train <- NIRsoil$spc[as.logical(NIRsoil$train), ]
# Keep only the samples with a CEC reference value
X_val <- X_val[!is.na(Y_val), ]
Y_val <- Y_val[!is.na(Y_val)]
X_train <- X_train[!is.na(Y_train), ]
Y_train <- Y_train[!is.na(Y_train)]
And now we will follow the functions in the reference manual with the same configuration. In other examples the idea is to go into more detail about the configuration of these functions.
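require(resemble)   # provides mblControl, mbl and getPredictions
# Control parameters: PLS-based similarity, components selected with the
# "opc" method (up to 40), nearest-neighbour validation ("NNv")
ctrl <- mblControl(sm = "pls",
                   pcSelection = list("opc", 40),
                   valMethod = c("NNv"),
                   scaled = TRUE, center = TRUE)
# Local models for every validation spectrum with k = 30, 45, ..., 150 neighbours;
# "wapls1" combines PLS models with between 7 and 20 factors (pls.c)
ex1 <- mbl(Yr = Y_train, Xr = X_train, Yu = NULL, Xu = X_val,
           mblCtrl = ctrl,
           distUsage = "predictors",
           k = seq(30, 150, 15),
           method = "wapls1",
           pls.c = c(7, 20))
predictions <- getPredictions(ex1)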
The predictions matrix contains the results for the validation spectra from the models developed with 30, 45, ..., 150 neighbors, as configured in the "mbl" function (k = seq(30, 150, 15)).

Continued in "Tutorials with Resemble (Part 2)".