This is the first of a series of posts about the
resemble package. Its examples use a NIR spectra data set called
NIRsoil (loaded below from the prospectr package), which contains a training set and a validation set. As usual when we
have spectra, it is always useful to look at them.
You can follow the examples in the reference manual. Here I follow them too, but I change the names of the data sets.
We overplot the training spectra in black and the validation
spectra in blue, in this case without any math treatment:
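require(prospectr)
data(NIRsoil)
# Wavelength axis of the spectra: 1100 to 2498 nm in steps of 2 nm
wavelength <- seq(1100, 2498, by = 2)
# Training spectra (rows flagged with train == 1)
X_train <- NIRsoil$spc[as.logical(NIRsoil$train), ]
matplot(wavelength, t(X_train), type = "l", col = "black",
        xlab = "Wavelength(nm)", ylab = "Absorbance", ylim = c(0, 1))
# Validation spectra (all the remaining rows)
X_val <- NIRsoil$spc[!as.logical(NIRsoil$train), ]
par(new = TRUE)  # draw the next plot over the current one
matplot(wavelength, t(X_val), type = "l", col = "blue",
        xlab = "Wavelength(nm)", ylab = "Absorbance", ylim = c(0, 1))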
It is clear that scatter affects the shape of the spectra, so we can apply an anti-scatter math treatment such as SNV+DT (standard normal variate plus detrend), using the "detrend" function of the "prospectr" package:
# SNV followed by detrend (DT), applied to each set separately
X_train_snvdt <- detrend(X = X_train, wav = as.numeric(colnames(X_train)))
X_val_snvdt <- detrend(X = X_val, wav = as.numeric(colnames(X_val)))
matplot(wavelength, t(X_train_snvdt), type = "l", xlab = "", ylab = "",
        col = "black", ylim = c(-1, 3))
par(new = TRUE)  # draw the next plot over the current one
matplot(wavelength, t(X_val_snvdt), type = "l", xlab = "", ylab = "",
        col = "blue", ylim = c(-1, 3))
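To make clearer what the SNV+DT treatment does to each spectrum, here is a minimal sketch in base R. It is not the prospectr code: `snv_detrend` is a hypothetical helper, the spectra are synthetic, and I assume the detrend step is a second-order polynomial fitted against wavelength, which is how I understand the default behaviour.

```r
# Hypothetical helper, not part of prospectr: SNV followed by a
# second-order polynomial detrend, applied row by row.
snv_detrend <- function(X, wav) {
  # SNV: centre and scale each spectrum (row) by its own mean and sd
  X_snv <- (X - rowMeans(X)) / apply(X, 1, sd)
  # Detrend: keep the residuals of a quadratic fit against wavelength
  t(apply(X_snv, 1, function(y) residuals(lm(y ~ poly(wav, 2)))))
}

# Synthetic spectra (made up for illustration): one absorption band plus
# a random offset and slope that mimic scatter
set.seed(1)
wav <- seq(1100, 2498, by = 2)
band <- exp(-((wav - 1900) / 60)^2)
X_demo <- t(sapply(1:5, function(i) {
  runif(1, 0.2, 0.6) + runif(1, 0, 3e-4) * (wav - 1100) + 0.3 * band
}))
X_demo_snvdt <- snv_detrend(X_demo, wav)

# After the treatment the offsets and slopes are gone and the band remains
matplot(wav, t(X_demo_snvdt), type = "l", xlab = "Wavelength(nm)", ylab = "")
```

The treatment works on each spectrum independently, so it can be applied to the training and validation sets separately without leaking information between them.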
On this occasion I want to test how resemble performs what I know as local regression: for every sample in the validation set, a specific model is built with similar samples found in the training set, according to a neighbourhood criterion. We can select the number of samples chosen for the model, and the maximum number of factors (PLS terms or principal components).
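The local regression idea can be sketched in a few lines of base R. This is only an illustration under my own assumptions, not what resemble does internally: `local_pcr` is a hypothetical helper, neighbours are chosen by plain Euclidean distance, the local model is a principal component regression, and the data are synthetic.

```r
# Hypothetical helper (not the resemble implementation): for each new
# sample, fit a principal component regression on its k nearest
# training samples and predict from that local model.
local_pcr <- function(X_tr, y_tr, X_new, k = 30, ncomp = 3) {
  sapply(seq_len(nrow(X_new)), function(i) {
    x_new <- X_new[i, , drop = FALSE]
    # Euclidean distance from the new sample to every training sample
    d <- sqrt(rowSums(sweep(X_tr, 2, x_new[1, ])^2))
    nn <- order(d)[seq_len(k)]
    # Local PCA on the neighbours, then regression on the scores
    pca <- prcomp(X_tr[nn, , drop = FALSE], center = TRUE)
    scores <- pca$x[, seq_len(ncomp), drop = FALSE]
    fit <- lm(y ~ ., data = data.frame(y = y_tr[nn], scores))
    # Project the new sample onto the local components and predict
    new_scores <- predict(pca, x_new)[, seq_len(ncomp), drop = FALSE]
    unname(predict(fit, data.frame(new_scores)))
  })
}

# Synthetic data (made up): 2 latent factors drive both X and y
set.seed(42)
n <- 120; p <- 50
scores_true <- matrix(rnorm(n * 2), n, 2)
loadings <- matrix(rnorm(2 * p), 2, p)
X_all <- scores_true %*% loadings + matrix(rnorm(n * p, sd = 0.01), n, p)
y_all <- 10 + 2 * scores_true[, 1] - scores_true[, 2] + rnorm(n, sd = 0.05)
train <- 1:100
pred <- local_pcr(X_all[train, ], y_all[train], X_all[-train, ],
                  k = 30, ncomp = 3)
rmse <- sqrt(mean((pred - y_all[-train])^2))  # error on held-out samples
```

The two settings mentioned above appear here as `k` (number of neighbours) and `ncomp` (number of components); in a real application both would be tuned rather than fixed.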
In this case I want to check the function "getPredictions".
For this we need the lab values (here CEC), which act as the reference values "Y".
# Spectra (X) and CEC lab values (Y) for the validation and training sets
X_val <- NIRsoil$spc[!as.logical(NIRsoil$train), ]
Y_val <- NIRsoil$CEC[!as.logical(NIRsoil$train)]
Y_train <- NIRsoil$CEC[as.logical(NIRsoil$train)]
X_train <- NIRsoil$spc[as.logical(NIRsoil$train), ]
# Drop the samples without a lab value (filter X first, then Y)
X_val <- X_val[!is.na(Y_val), ]
Y_val <- Y_val[!is.na(Y_val)]
X_train <- X_train[!is.na(Y_train), ]
Y_train <- Y_train[!is.na(Y_train)]
And now we will follow the functions in the reference manual with the same configuration. The idea is to go into more detail on the configuration of these functions in later examples.
require(resemble)
# Control settings, as in the reference manual example
ctrl <- mblControl(sm = "pls",
                   pcSelection = list("opc", 40),
                   valMethod = c("NNv"),
                   scaled = TRUE, center = TRUE)
# One local model per validation sample, for each neighbourhood size k
ex1 <- mbl(Yr = Y_train, Xr = X_train, Yu = NULL, Xu = X_val,
           mblCtrl = ctrl,
           distUsage = "predictors",
           k = seq(30, 150, 15),
           method = "wapls1",
           pls.c = c(7, 20))
predictions <- getPredictions(ex1)
The predictions matrix contains the results for the validation spectra from the different models developed with 30, 45, ..., 150 samples, as configured in the "mbl" function (k = seq(30, 150, 15)).
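Since we kept the lab values of the validation set, a natural next step is to compare each column of the predictions with them. The sketch below shows one way to do it with a hypothetical helper, `rmse_by_column`, on made-up data; the column names `k30`, `k45` and `k60` are invented and are not the names returned by "getPredictions".

```r
# Hypothetical helper: RMSE of each column of a prediction table
# against the reference values
rmse_by_column <- function(pred, y) {
  apply(as.matrix(pred), 2, function(p) sqrt(mean((p - y)^2)))
}

# Made-up stand-in for the real objects: 3 columns, one per value of k
set.seed(7)
y_demo <- rnorm(50, mean = 12, sd = 4)
pred_demo <- data.frame(k30 = y_demo + rnorm(50, sd = 2.0),
                        k45 = y_demo + rnorm(50, sd = 1.5),
                        k60 = y_demo + rnorm(50, sd = 1.0))
rmse_demo <- rmse_by_column(pred_demo, y_demo)
names(which.min(rmse_demo))  # column with the lowest RMSE
```

With the real objects the call would be something like rmse_by_column(predictions, Y_val), assuming the rows of the predictions are in the same order as the rows of X_val.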
Continued in "Tutorials with Resemble (Part 2)".