11 Oct 2021

Modelling complex spectral data (soil) with the resemble package (II)

Continuing with the vignette Modelling complex spectral data with the resemble package (Leonardo Ramirez-Lopez and Alexandre M.J.-C. Wadoux), it is now time to look at the predictor variables: the absorbance values of the soil samples, acquired on a NIR (Near Infrared Reflectance) instrument in the range from 1100 to 2498 nm in 2 nm steps, so we have 700 data points. We prepare a vector with the wavelengths and call it "wavs" (same as the vignette).

# NIRsoil ships with prospectr; the %>% pipe needs magrittr to be loaded
wavs <- NIRsoil$spc %>% colnames() %>% as.numeric()

Now we can look at the raw spectra (spectra without any treatment):

matplot(x = wavs, y = t(NIRsoil$spc),  
        xlab = "Wavelengths, nm",
        ylab = "Absorbance", type = "l", 
        lty = 1, col = "#5177A133")

Now it is time to treat the spectra. In the vignette, the spectra are resampled to fewer data points (one every 5 nm) and treated with a first derivative (Savitzky-Golay) using a first-order polynomial and a window of 5 points. These reduction and signal-improvement steps are very useful to prepare the spectra "X" matrix for the further reduction techniques applied afterwards, and they save computation time.

We just have to use the code of the vignette, but of course we are free to use other mathematical treatments to reduce scatter effects or improve the resolution.
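As an illustration of one such alternative scatter treatment, here is a minimal base R sketch of the standard normal variate (SNV): each spectrum is centred and scaled by its own mean and standard deviation. The toy matrix `X_toy` and the helper `snv_sketch()` are hypothetical names made up for this example, not part of the vignette; prospectr offers `standardNormalVariate()` for the same purpose.

```r
# Hypothetical toy data: 3 "spectra" (rows) x 6 wavelengths (columns).
# Rows 2 and 3 are the same shape as row 1 with a different offset and
# multiplicative factor, a crude stand-in for scatter effects.
base_shape <- c(0.2, 0.4, 0.9, 0.7, 0.3, 0.1)
X_toy <- rbind(1.0 * base_shape,
               1.5 * base_shape + 0.2,
               0.8 * base_shape - 0.1)

# SNV sketch: centre and scale every row by its own mean and sd
snv_sketch <- function(X) {
  t(apply(X, 1, function(s) (s - mean(s)) / sd(s)))
}

X_snv <- snv_sketch(X_toy)

# After SNV the offset and scale differences between rows are removed
round(X_snv, 3)
```

On this toy matrix the three rows become identical after SNV, because they only differed by an offset and a positive scale factor. The vignette itself takes the derivative route: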

NIRsoil$spc_p <- NIRsoil$spc %>% 
  # reduce the number of data points (one every 5 nm)
  resample(wav = wavs, new.wav = seq(min(wavs), max(wavs), by = 5)) %>% 
  # and apply the Savitzky-Golay function to the spectra:
  # polynomial order = 1
  # window = 5
  # first derivative
  savitzkyGolay(p = 1, w = 5, m = 1)
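To get a feeling for what the settings p = 1, w = 5, m = 1 do, here is a minimal base R sketch on a synthetic straight-line signal (hypothetical data, not NIRsoil). It assumes the textbook Savitzky-Golay weights for a first derivative from a linear fit over 5 points, (-2, -1, 0, 1, 2) / 10, and applies them with `stats::filter()`. It is only an illustration of the idea, not the prospectr implementation, and any scaling by the wavelength step is left out.

```r
# Hypothetical signal on a 5 nm grid: a straight line rising 0.005 per point
grid_5nm <- seq(1100, 2495, by = 5)
signal   <- 0.5 + 0.001 * (grid_5nm - 1100)

# Savitzky-Golay weights: first derivative, linear fit, window of 5 points
sg_weights <- c(-2, -1, 0, 1, 2) / 10

# stats::filter() performs a convolution, which applies the weights in
# reversed order, so we reverse them to get the intended moving sum
d1 <- as.numeric(stats::filter(signal, rev(sg_weights), sides = 2))

# The two points at each edge have no full window and come back as NA
sum(is.na(d1))

# Everywhere else the estimated slope per data point is constant
range(d1, na.rm = TRUE)
```

For a straight line the filter returns the slope per data point (0.005 here) at every position with a full window. The four edge positions are the reason a window of 5 shortens each spectrum by 4 data points.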

Let's create a new vector with the reduced set of wavelengths:

new_wavs <- as.matrix(as.numeric(colnames(NIRsoil$spc_p)))

and plot the spectra to see their appearance:

matplot(x = new_wavs, y = t(NIRsoil$spc_p),
        xlab = "Wavelengths, nm",
        ylab = "1st derivative",
        type = "l", lty = 1, col = "#5177A133")

The data frame "NIRsoil" now holds two spectral matrices: the raw spectra (spc) and the resampled spectra treated with the SG first derivative (spc_p).

We can check the dimensions of these matrices:

names(NIRsoil)
    "Nt" "Ciso" "CEC" "train" "spc" "spc_p"
dim(NIRsoil$spc)
    825 700
dim(NIRsoil$spc_p)
    825 276
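The column counts can be checked with a little arithmetic. The sketch below rebuilds the two wavelength grids in base R (the names `grid_raw`, `grid_resampled` and `window_size` are made up for this example) and assumes, consistent with the dimensions shown above, that a window of 5 drops 2 points at each end of the spectrum.

```r
# Raw grid: 1100 to 2498 nm in 2 nm steps
grid_raw <- seq(1100, 2498, by = 2)

# Resampled grid: same range, one point every 5 nm
grid_resampled <- seq(min(grid_raw), max(grid_raw), by = 5)

# A window of 5 points loses (5 - 1) / 2 = 2 points at each edge
window_size <- 5
n_after_sg  <- length(grid_resampled) - (window_size - 1)

length(grid_raw)        # 700
length(grid_resampled)  # 280
range(grid_resampled)   # 1100 2495
n_after_sg              # 276
```

So the 700 original data points become 280 after resampling and 276 after the derivative, which matches the 825 x 276 dimensions of spc_p.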

In the next post we will continue with the preprocessing and preparation of the data as the vignette suggests, trying to understand the different procedures so we can model the soil spectral data as well as possible.
