29 Dec 2021

Getting accurate predictions from large and diverse spectral libraries

 


This is a webinar (organized by Soil.Spectroscopy.Org) that I joined some time ago and have watched again now that I am working with the LUCAS database.

In my case, I have separated the Spanish soils from the LUCAS database and am working with them. I am finding some very interesting conclusions, which I will share with you throughout 2022.

Leonardo Ramirez Lopez is a well-known developer of R chemometrics packages, and he presents a nice simulation of how to work with the LUCAS database, in which a large number of samples from all over the European Union were analysed with a single NIR instrument in a single laboratory. In the simulation he adds noise to the predictors (spectra) and to the responses (reference values), which is the situation we normally find in practice.
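
As a rough illustration of the idea of adding noise to both predictors and responses, the sketch below perturbs a small simulated data set with Gaussian noise. This is not the code from the webinar; the matrix X, the vector y and the noise levels are all made-up values chosen only for the example.

```r
set.seed(42)

# Hypothetical data: 20 samples x 100 wavelengths, plus one reference value each
X <- matrix(runif(20 * 100, min = 0.2, max = 0.8), nrow = 20)
y <- rnorm(20, mean = 30, sd = 5)

# Noise levels are arbitrary assumptions for illustration
sd_spectra   <- 0.005  # noise added to the predictors (spectra)
sd_reference <- 1      # noise added to the responses (reference values)

X_noisy <- X + matrix(rnorm(length(X), sd = sd_spectra), nrow = nrow(X))
y_noisy <- y + rnorm(length(y), sd = sd_reference)
```

A model fitted on X_noisy and y_noisy can then be compared with one fitted on X and y to see how much each source of noise degrades the predictions.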




24 Dec 2021

Merry Christmas Everybody

Slade was one of the groups I liked a long time ago, and it is the group I have chosen this year to wish a Merry Christmas to all the NIR_quimiometria followers. Have a very happy and safe Christmas.

15 Dec 2021

Mahalanobis distance to detect the gypsum samples

With all the soil samples we calculate the principal component scores and then the Mahalanobis distance from each sample's scores to the mean. We draw a cut-off at 3, which is the value normally used to flag as outliers the samples that lie above it.



We can take those samples apart and look at their spectra to check their characteristics. In this case the spectra show that almost all of them are gypsum samples, which we could not see clearly when they were plotted with the rest of the samples because of the overlapping.



We can then select the rest of the samples, calculate a new PC analysis and continue studying the characteristics of the samples.
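
The steps in this post can be sketched roughly as follows. The data are simulated (the LUCAS spectra are not used here), the number of components is an arbitrary choice, and I assume the cut-off of 3 is applied to the plain Mahalanobis distance. Some software applies it to a squared or rescaled version instead, so treat this as one possible reading rather than the exact calculation behind the plot.

```r
set.seed(1)

# Hypothetical data: 200 ordinary spectra plus 5 atypical ones (rows 201 to 205)
n_normal <- 200
n_odd    <- 5
p        <- 50
spectra  <- matrix(rnorm((n_normal + n_odd) * p), ncol = p)
odd_rows <- (n_normal + 1):(n_normal + n_odd)
spectra[odd_rows, 1:10] <- spectra[odd_rows, 1:10] + 6

# PCA and scores for the first few components (3 is an arbitrary choice)
pca    <- prcomp(spectra, center = TRUE, scale. = FALSE)
scores <- pca$x[, 1:3]

# mahalanobis() returns squared distances, so take the square root
d <- sqrt(mahalanobis(scores, center = colMeans(scores), cov = cov(scores)))

cutoff   <- 3
keep     <- d <= cutoff
outliers <- which(!keep)

# Set the flagged samples apart and recompute the PCA on the rest
spectra_rest <- spectra[keep, , drop = FALSE]
pca_rest     <- prcomp(spectra_rest, center = TRUE, scale. = FALSE)
```

Plotting d against the sample index with abline(h = cutoff) gives the kind of chart described above, and spectra[outliers, ] holds the samples to inspect separately.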

14 Dec 2021

A loading related to gypsum

Following the chain of posts about gypsum in the soil samples, we continue by calculating the principal components of the sample set and plotting the first 5 loadings. We then draw vertical lines where the reference gypsum spectrum has its main peaks, and we can see that the 5th loading (light blue) clearly shows the gypsum peaks.

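# Plot the first 5 PCA loadings against wavelength
# (the row names of the rotation matrix hold the wavelengths)
matplot(as.numeric(rownames(pcspectra_snvdt$rotation)),
        pcspectra_snvdt$rotation[, 1:5],
        type = "l", main = "Loadings spectra",
        ylab = "Loading", xlab = "wavelength")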

abline(v = c(994, 1204, 1445, 1489, 1537, 1748, 
             1944, 2215, 2266), col = "green", 
       lty = "dotted")



10 Dec 2021

Are there soil samples with gypsum in our sample set?

One way to find out is to overplot the reference spectrum of gypsum on our soil sample set (in this case treated with Detrend to remove some of the scatter effects). The picture is not very clear because of the large number of samples, but it does seem that there are samples with gypsum in our sample set.

Next we could look for ways to check which of our samples have the highest correlation with the gypsum reference spectrum, or look at other metrics such as distances.
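
A minimal sketch of the correlation idea, using simulated spectra rather than the LUCAS data: the reference spectrum, the soil spectra and the three samples that contain the reference signal are all invented for the example.

```r
set.seed(7)

wl  <- seq(400, 2500, by = 10)
# Made-up reference spectrum with two peaks (stand-in for a mineral reference)
ref <- exp(-((wl - 1445) / 40)^2) + 2 * exp(-((wl - 1944) / 40)^2)

# Made-up soil spectra: sloping baseline plus noise, one sample per row
n     <- 30
slope <- (wl - 400) / 2100
soil  <- t(sapply(seq_len(n), function(i) {
  runif(1, 0.2, 0.5) + runif(1, 0.1, 0.4) * slope + rnorm(length(wl), sd = 0.02)
}))

# Add the reference signal to three samples
with_ref <- c(5, 12, 20)
soil[with_ref, ] <- sweep(soil[with_ref, ], 2, 0.8 * ref, "+")

# Correlation of every sample (row) with the reference spectrum
r <- as.vector(cor(t(soil), ref))

# Samples ranked from most to least similar to the reference
ranking <- order(r, decreasing = TRUE)
head(ranking, 3)
```

With real data the separation will be less clean than in this toy case, so the ranking is better read as a list of candidates to inspect than as a classification.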

Having a set of reference spectra for gypsum, calcite, kaolinite, iron oxide, etc. is a good way to explore and play with the data by overplotting them on our sample set.

In this case, because the reference gypsum spectrum and the soil sample set were scanned on different NIR instruments, a common range was used (400 to 2500 nm).
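
One way the trimmed objects used in the plot (wavelength2 and mineralRef2) could be built is sketched below. How they were actually created is not shown in the post, so this is an assumption; the reference values are made up, and the interpolation step is only needed if the two instruments use different wavelength steps.

```r
# Reference grid: 350 to 2500 nm in 1 nm steps
wavelength <- seq(350, 2500, by = 1)
# Made-up reference spectrum, only to have something to subset
reference  <- sin(wavelength / 200) + 1

# Keep only the range shared by both instruments
common      <- wavelength >= 400 & wavelength <= 2500
wavelength2 <- wavelength[common]
mineralRef2 <- reference[common]

# If the step sizes also differ, the reference can be interpolated onto
# the soil wavelengths (hypothetical 2 nm grid here)
soil_wavelength  <- seq(400, 2500, by = 2)
ref_on_soil_grid <- approx(wavelength2, mineralRef2, xout = soil_wavelength)$y
```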

# Soil spectra in semi-transparent grey (column names hold the wavelengths)
matplot(x = as.numeric(colnames(lucas_spain_spcdt)), t(lucas_spain_spcdt),
        type = "l",
        col = rgb(red = 0.5, green = 0.5, blue = 0.5, alpha = 0.3),
        ylim = c(-1.5, 4.0),
        xlab = "wavelength", ylab = "Absorbance")

# Overlay the gypsum reference spectrum on the same axes
matlines(wavelength2, mineralRef2, col = "red")

# Mark the main gypsum peaks
abline(v = c(994, 1204, 1445, 1489, 1537, 1748,
             1944, 2215, 2266), col = "green",
       lty = "dotted")




8 Dec 2021

Identifying gypsum peaks

We can identify the peaks in the gypsum spectrum using the function "peaks" from the package "IDPmisc".

You can obtain the gypsum spectrum from data(mineralRef) in the package "soilspec". The spectrum is in reflectance, and I converted it to absorbance using the log 1/R transformation.
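
The log 1/R conversion can be written as a small helper. This is a sketch that assumes a base 10 logarithm (the usual convention in NIR) and reflectance expressed as a fraction between 0 and 1. The function name and the example values are mine, not part of any package.

```r
# Convert reflectance (fraction between 0 and 1) to absorbance: A = log10(1 / R)
reflectance_to_absorbance <- function(reflectance) {
  log10(1 / reflectance)
}

# Made-up reflectance values
R <- c(1, 0.5, 0.1, 0.01)
A <- reflectance_to_absorbance(R)
```

If the reflectance is stored as a percentage, it would need to be divided by 100 before the conversion.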

wavelength <- seq(350, 2500, by = 1)
matplot(wavelength, mineralRef$gypsum, 
        type = "l", col = "blue", 
        xlab = "wavelength", 
        ylab = "Absorbance", 
        main = "Gypsum" )

With this code we plot the spectrum and can see the peaks visually, which helps us decide the values for the arguments of the function.

Now let's get the peaks:

library(IDPmisc)  # provides peaks()

ppeaks_gypsum <- peaks(wavelength,
                       mineralRef$gypsum,
                       minPH = 0.03)

> ppeaks_gypsum
      x         y   w
1   350 0.1364889 114
3   994 0.1623408  39
4  1204 0.2884482  67
5  1445 1.0510481 127
8  1489 0.8744469  11
7  1537 0.7541421  14
6  1748 0.7571059  54
9  1944 2.1366552  83
10 2215 1.3251868  43
11 2266 1.2514346  15
2  2488 2.1942896 191

In the "x" column we have the peak wavelength, in "y" the absorbance value, and in "w" the width of the peak at half maximum.

Now we add the vertical lines to see the marked peaks:

abline(v = ppeaks_gypsum$x, col = "red")

Just change the minPH value or the other arguments of the peaks function to get more or fewer peaks.



We can exclude the vertical lines at the extreme wavelengths.
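
A possible way to drop those edge peaks is sketched below. The peak table is copied from the output above so that the snippet runs on its own, and the 50 nm margin is an arbitrary choice of mine.

```r
# Peak table copied from the output above
ppeaks_gypsum <- data.frame(
  x = c(350, 994, 1204, 1445, 1489, 1537, 1748, 1944, 2215, 2266, 2488),
  y = c(0.1364889, 0.1623408, 0.2884482, 1.0510481, 0.8744469, 0.7541421,
        0.7571059, 2.1366552, 1.3251868, 1.2514346, 2.1942896),
  w = c(114, 39, 67, 127, 11, 14, 54, 83, 43, 15, 191)
)

# Drop peaks closer than `margin` nm to either end of the 350 to 2500 nm range
margin      <- 50  # arbitrary assumption
interior    <- ppeaks_gypsum$x > 350 + margin & ppeaks_gypsum$x < 2500 - margin
peaks_inner <- ppeaks_gypsum[interior, ]

peaks_inner$x
# 994 1204 1445 1489 1537 1748 1944 2215 2266
```

Calling abline(v = peaks_inner$x, col = "red") then marks only the interior peaks, which are the same wavelengths used for the vertical lines in the later gypsum posts.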

We can use this spectrum to compare it with our soil samples, checking for any trace of gypsum in them.