R & Chemometrics: October 2014

28 oct 2014

Diagnostics: Some solvent vapors in the noise spectra

A few days ago, Oscar (a follower of this blog) sent me some strange diagnostics from his NIR instrument. It was curious that when I plotted each of the 10 noise spectra from the test (the final diagnostic result is an average of the statistics of the ten spectra), the peaks grew more and more, always in the same direction (negative in this case) and at the same wavelengths, as if the NIR were measuring something.

The way the NIR acquires each noise spectrum is to measure first the background on a ceramic plate and then scan the ceramic again as a sample. In the ideal case there is no difference, so we should see something close to a flat line along zero; if we zoom into the spectra and the background is stable, we start to see the noise due to the instrument hardware. This noise must be random, without special patterns.
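To picture the difference between random noise and a repeating signature, here is a small simulated sketch. Everything in it is an assumption for illustration (the wavelength axis, the noise level and the hypothetical band at 1500 nm); it does not reproduce the instrument's own diagnostic calculation.

```r
set.seed(1)
n_scans    <- 10
wavelength <- seq(1100, 2498, by = 2)        # assumed wavelength axis (nm)
n_points   <- length(wavelength)

# Ideal case: ceramic vs. ceramic, so each noise spectrum is random around zero
noise <- matrix(rnorm(n_scans * n_points, sd = 2e-5), nrow = n_scans)

# Hypothetical vapour band: a negative peak that grows with every scan
band       <- -exp(-((wavelength - 1500) / 15)^2)
with_vapor <- noise + outer(seq_len(n_scans) * 5e-5, band)

# RMS of each scan, then the average over the ten scans
rms_clean <- mean(sqrt(rowMeans(noise^2)))
rms_vapor <- mean(sqrt(rowMeans(with_vapor^2)))

# Correlation between the last two scans: close to zero for random noise,
# high when the same pattern repeats at the same wavelengths
cor_clean <- cor(noise[9, ], noise[10, ])
cor_vapor <- cor(with_vapor[9, ], with_vapor[10, ])

print(c(rms_clean = rms_clean, rms_vapor = rms_vapor))
print(c(cor_clean = cor_clean, cor_vapor = cor_vapor))
```

The averaged RMS alone only says that the noise is higher; looking at the scans one by one (or at how they correlate) is what reveals that something systematic is being measured.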

There are cases where mechanical problems give noise peaks at certain wavelengths, so we can identify an encoder problem, a filter (order sorter) problem, …

Sometimes stray light reaches the detectors, or the lamp, the laboratory temperature or the detector temperature are unstable. These and other cases are the reason we see special signatures in the noise spectra.

So, coming back to Oscar's spectra, something seemed to be in the air of the laboratory that produced those patterns. It was not the water vapor signature (as we saw in other cases); it was something else, and it was obvious because of the smell (similar to ammonia).

The noise spectra show how the signature was decreasing, and a little later that same day the noise was fine (random) and the smell had gone as well.

Thanks, Oscar, for the information.

23 oct 2014

Understanding better Hyperspectral Image Analysis

In my last post I recommended the article:
"Near infrared hyperspectral image analysis using R, Part 5", which appears in the NIR News Vol. 25 No. 7 (November 2014).

In the second part of the tutorial, you can develop an animation to see the spectrum of every pixel in a single line of the bread.
We can see the noise spectra of the background, and the spectra of the different pixels along the line of the bread at a certain Y level.

The animation of the first tutorial is in the post: "Animated visualisation of hyperspectral data using R".

It is really great to see how the scientific community is using R, and we hope to see more articles and papers in the future.

Authors:
Y. Dixit, R. Cama, C. Sullivan, L. Alvarez Jubete
School of Food Science & Environmental Health, Dublin Institute of Technology, Cathal Brugha Street, Dublin 1, Ireland
A. Ktenioudaki
Department of Food Chemistry & Technology, Teagasc Food Research Center Ashtown, Ashtown, Dublin 15, Ireland

22 oct 2014

Animated visualisation of hyperspectral data using R

A very useful article in the latest issue of NIR News, Vol. 25 No. 7 (November 2014), with an amazing tutorial about how to develop animated visualizations of hyperspectral NIR images.
See the second part of the tutorial in the post: "Understanding better Hyperspectral Image Analysis".

16 oct 2014

SG 2nd Derivative + MSC

As you know, derivatives remove the baseline offset and curvature in the spectra, but they should be combined with anti-scatter math treatments if we want to remove the scatter effects which affect the correlation between the constituents of interest and the spectral bands. There are some cases (especially when developing discriminant models) where it is not convenient to apply the anti-scatter math treatments, and we just use the derivatives alone.
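As a reminder of what the anti-scatter step does, here is a minimal base-R sketch of multiplicative scatter correction: each spectrum is regressed on a reference (here the mean spectrum) and then corrected with its own offset and slope. The helper `msc_simple` and the simulated spectra are hypothetical, written only for illustration; they are not the function used in the script of this post.

```r
# Hypothetical helper: MSC of the rows of X against a reference spectrum
msc_simple <- function(X, reference = colMeans(X)) {
  corrected <- t(apply(X, 1, function(x) {
    coefs <- lm.fit(cbind(1, reference), x)$coefficients  # x ~ a + b * reference
    (x - coefs[1]) / coefs[2]                             # remove offset, rescale
  }))
  dimnames(corrected) <- dimnames(X)
  corrected
}

# Simulated spectra: the same shape with a different offset and slope per sample
set.seed(7)
base_shape <- sin(seq(0, 2 * pi, length.out = 60))
offsets    <- runif(6, -0.1, 0.1)
slopes     <- runif(6, 0.8, 1.2)
X_sim      <- outer(slopes, base_shape) + offsets   # row i: offsets[i] + slopes[i] * shape

X_msc <- msc_simple(X_sim)

# Largest remaining difference between any corrected spectrum and the mean spectrum
max_residual <- max(abs(sweep(X_msc, 2, colMeans(X_sim))))
print(max_residual)
```

In this artificial case the only differences between samples are offset and slope, so after the correction all the spectra collapse onto the reference. With real spectra the chemical information remains as the residual around that line.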

Following the Shoot-out tutorial and the paper "Shoot-out 2002: transfer of calibration for content of active in a pharmaceutical tablet" by David W. Hopkins (NIR news Vol. 14 No. 5, 2003), I tried the math treatment recommended by the author and applied MSC after the SG2D1104, just to have a look at the spectra:

The red spectra were calculated in the previous post; for the green ones (SG second derivative combined with MSC), I used the following script:

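> library(pls)   # msc() is assumed here to be the one from the pls package
> # apply() left the spectra in columns, so transpose to samples x wavelengths
> X1_sg2dmsc<-msc(t(X1_sg2d_pracma))
> matplot(wavelength2[11:281],t(X1_sg2dmsc[,11:281]),type="l",
+ xlab="Wavelength (nm)",ylab="1/R (SG 2nd der + MSC)",lty=1,
+ col=3,main="SG-2D1104 + MSC")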
If we want to see them over-plotted
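One possible way to do it is to draw the first set with matplot and add the second one with matlines on the same axes. This is only a sketch with simulated stand-in matrices (`sg2d` and `sg2d_msc` are hypothetical names, with samples in rows), not the shoot-out spectra:

```r
set.seed(3)
wl       <- seq(1100, 1300, by = 2)                               # assumed wavelength axis (nm)
sg2d     <- matrix(rnorm(5 * length(wl), sd = 0.01), nrow = 5)    # stand-in for the red spectra
sg2d_msc <- sg2d * 0.8 + 0.002                                    # stand-in for the green spectra

y_range <- range(sg2d, sg2d_msc)                                  # common y axis for both sets
matplot(wl, t(sg2d), type = "l", lty = 1, col = 2, ylim = y_range,
        xlab = "Wavelength (nm)", ylab = "1/R (SG 2nd der)",
        main = "SG-2D1104 with and without MSC")
matlines(wl, t(sg2d_msc), lty = 1, col = 3)                       # second set on the same axes
legend("topright", legend = c("SG 2nd der", "SG 2nd der + MSC"),
       col = c(2, 3), lty = 1)
```

Fixing ylim from both matrices before the first call avoids the second set being clipped.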

15 oct 2014

Max Kuhn Interviewed by DataScience.LA at useR 2014

Learn more about this great R developer (Caret Package) in this link

Applying SG to all our X matrix (Pracma Package)

"R" is without any doubt a great and wonderful community, and it is nice to see how the package developers and maintainers help you whenever you have any doubts.

That was the case some time ago, when I was writing some posts about the ChemoSpec package and Bryan Hanson helped me with some doubts. After the last post, I wrote an e-mail to Hans Werner (pracma package), and he replied quickly, telling me the reason the "savgol" function takes a vector instead of a matrix, and giving me some ideas for applying Savitzky-Golay to the whole spectra matrix.

Of course, one of the ways is to use the apply function. When applying the SG filters, there is a reduction in the number of usable data points at both ends of the wavelength range, depending on the window size.

So I tried this way, to see all the spectra together:

> library(pracma)
# This script is for the first derivative
> X1_sg1d_pracma<-apply(nir.training1$X,1,savgol,11,4,1)
> matplot(wavelength2[11:281],(X1_sg1d_pracma
+[11:281,]),type="l",xlab="Wavelength (nm)",
+ ylab="1/R (SG 1st derivative)",lty=1,col=1,main="SG-1D1104")

# This script is for the second derivative
> X1_sg2d_pracma<-apply(nir.training1$X,1,savgol,11,4,2)
> matplot(wavelength2[11:281],(X1_sg2d_pracma
+[11:281,]),type="l",xlab="Wavelength (nm)",
+ ylab="1/R (SG 2nd derivative)",lty=1,col=1,main="SG-2D1104")
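One detail worth keeping in mind with this approach: when apply works over the rows of a matrix, each result comes back as a column, so the output has wavelengths in rows and samples in columns (which is why the plots above index the rows). The sketch below shows this with a simulated matrix and a simple moving average as a stand-in for the filter; `smooth5` is a hypothetical helper, not a pracma function.

```r
set.seed(2)
X <- matrix(rnorm(4 * 30), nrow = 4)       # 4 simulated spectra x 30 wavelengths

# Stand-in filter: 5-point moving average, NA where the window does not fit
smooth5 <- function(x) as.numeric(stats::filter(x, rep(1 / 5, 5), sides = 2))

out <- apply(X, 1, smooth5)                # one filtered spectrum per COLUMN
print(dim(out))                            # wavelengths x samples

X_filtered <- t(out)                       # back to samples x wavelengths
keep       <- 3:28                         # drop the points lost at both ends
X_filtered <- X_filtered[, keep]
print(dim(X_filtered))
```

Transposing once, right after apply, keeps the same samples-in-rows layout as the original X matrix for any later treatment.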

13 oct 2014

Savitzky Golay filters with Pracma Package

The pracma package has the function "savgol", which applies a Savitzky-Golay filter to a vector (in our case a spectrum).
The function is:
savgol(T, fl, forder, dorder)

And the arguments are:
T... Vector of signals to be filtered.
fl... Filter length (for instance fl = 51..151), has to be odd.
forder... Filter order (2 = quadratic filter, 4 = quartic).
dorder... Derivative order (0 = smoothing, 1 = first der, etc.).
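To make these arguments more concrete, here is a minimal base-R sketch of the idea behind a Savitzky-Golay filter: a polynomial of order forder is fitted by least squares inside a moving window of fl points, and the value (or derivative) of that polynomial at the centre of the window is the output. The helpers `sg_weights` and `sg_filter` are hypothetical and written only for illustration; they are not pracma's implementation, and the scaling of the derivatives and the handling of the end points may differ from savgol.

```r
# Hypothetical helper: weights for the centre point of the window
# (unit spacing between data points is assumed)
sg_weights <- function(fl, forder, dorder) {
  stopifnot(fl %% 2 == 1, forder < fl, dorder <= forder)
  half <- (fl - 1) / 2
  A <- outer(-half:half, 0:forder, `^`)   # powers of the position in the window
  H <- qr.solve(A, diag(fl))              # least-squares fit: coefficients = H %*% window
  factorial(dorder) * H[dorder + 1, ]     # d-th derivative of the fitted polynomial at 0
}

# Hypothetical helper: apply the weights along a spectrum
sg_filter <- function(x, fl, forder, dorder) {
  w <- sg_weights(fl, forder, dorder)
  # stats::filter is a convolution, so the weights are reversed;
  # the (fl - 1) / 2 points at each end become NA
  as.numeric(stats::filter(x, rev(w), sides = 2))
}

# Check on a known curve: the second derivative of x^2 is 2 everywhere
x  <- (1:50)^2
d2 <- sg_filter(x, fl = 11, forder = 4, dorder = 2)
print(range(d2, na.rm = TRUE))
print(sum(is.na(d2)))                     # points lost at the two ends
```

With fl = 11 the window needs five neighbours on each side, which is where the loss of points at both ends of the spectrum comes from.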

As you know, in my last posts I have been using the shoot-out 2002 data to develop a tutorial, and I read an article by the winner of this shoot-out in which he treats the shoot-out spectra with Savitzky-Golay (a filter length of 11, quartic, second derivative) using the Unscrambler software.

So I tried these values as the arguments of the pracma SG filter, and the bands look exactly the same as the ones in the article, so this option looks good for working with this data. Anyway, I will also try the functions from other packages.

In the case of the pracma package we have to use a vector (a single spectrum), so some work has to be done to convert the whole matrix of spectra, but the results look great.

X1_sg_pracma<-as.matrix(savgol(nir.training1$X[1,],11,4,2))

10 oct 2014

PCAs with three different methods and projections (Test and Val Set)

The shoot-out 2002 data set is composed of 155 samples for the Training Set (blue), 460 for the Test Set (red), and a few more samples for the Validation Set (green).

I have computed the Principal Component Analysis in three different ways, and I would like to show you the score plot of PC1 vs PC2 developed with the Training Set, together with the projections of the Test and Validation Sets onto that space.

This first plot is in the case of using PRCOMP:

Second case is using NIPALS for the calculation of the PCAs:

And the third is using SVD for the calculation of the PCAs:

As you can see, there are no differences, and we can conclude that the Training Set covers the variability of the samples in the Test Set and Validation Set, so we don't have to extrapolate outside the calibration space.
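While the full code is not posted, the following is a minimal sketch of why the three routes agree. It uses simulated data, not the shoot-out spectra; `sim_spectra` and `nipals_pca` are hypothetical helpers written for this example, and the scores are compared in absolute value because the sign of a principal component is arbitrary.

```r
set.seed(6)
grid   <- seq(0, pi, length.out = 50)
shapes <- cbind(sin(grid), cos(grid))               # two hypothetical spectral shapes
sim_spectra <- function(n) {
  conc <- cbind(rnorm(n, sd = 5), rnorm(n, sd = 2))
  conc %*% t(shapes) + matrix(rnorm(n * 50, sd = 0.01), nrow = n)
}
X_train <- sim_spectra(40)
X_test  <- sim_spectra(15)

center <- colMeans(X_train)
Xc     <- sweep(X_train, 2, center)                 # mean-centred training set

# 1) prcomp
pc            <- prcomp(X_train, center = TRUE, scale. = FALSE)
scores_prcomp <- pc$x[, 1:2]

# 2) SVD of the centred matrix: scores = U * D
s          <- svd(Xc)
scores_svd <- s$u[, 1:2] %*% diag(s$d[1:2])

# 3) NIPALS: one component at a time, deflating the matrix after each one
nipals_pca <- function(E, ncomp = 2, tol = 1e-18, maxit = 500) {
  scores <- matrix(0, nrow(E), ncomp)
  loads  <- matrix(0, ncol(E), ncomp)
  for (a in seq_len(ncomp)) {
    t_a <- E[, which.max(apply(E, 2, var))]         # start from the most variable column
    for (it in seq_len(maxit)) {
      p_a   <- crossprod(E, t_a) / sum(t_a^2)       # loadings from current scores
      p_a   <- p_a / sqrt(sum(p_a^2))               # normalise to length 1
      t_new <- E %*% p_a                            # scores from current loadings
      done  <- sum((t_new - t_a)^2) < tol * sum(t_new^2)
      t_a   <- t_new
      if (done) break
    }
    scores[, a] <- t_a
    loads[, a]  <- p_a
    E <- E - t_a %*% t(p_a)                         # deflation
  }
  list(scores = scores, loadings = loads)
}
nip <- nipals_pca(Xc, ncomp = 2)

# Projection of the test set: centre with the TRAINING means, multiply by the loadings
test_scores <- sweep(X_test, 2, center) %*% pc$rotation[, 1:2]

print(all.equal(abs(unname(scores_prcomp)), abs(scores_svd)))
print(all.equal(abs(unname(scores_prcomp)), abs(nip$scores), tolerance = 1e-6))
```

The important detail for the projections is that the Test and Validation samples are centred with the means of the Training Set, not with their own means; otherwise they would not land in the same score space.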

I will write a post with the code (quite long) if you are interested. Let me know.

7 oct 2014

Adding Category Variables to a Data frame in R

Normally I use data frames to manage NIR data. In my case they are usually composed of an X or spectra matrix (dataframe$X) and a Y or constituent matrix (dataframe$Y). But when we want to manage and understand plots, like score plots, it is useful to classify the samples with some category variables.

These category variables can be "location", "type", "customer", "product", ...

In the case of the shoot-out data, the samples can be classified by their content of the main parameter as:

"Low"             (if the sample has less than 160 mg)
"Medium"          (between 160 and 221 mg)
"High"            (more than 221 mg)

Let's create the variable in the data frame of the training set for instrument 1:
nir.training1$type[nir.training1$Y<=160] <- "Low"
nir.training1$type[nir.training1$Y>160 & nir.training1$Y<221] <- "Medium"
nir.training1$type[nir.training1$Y>=221] <- "High"
Now we have a new variable in the data frame called "type".
Check it with:

names(nir.training1)

and apart from X and Y we have type.
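An alternative that does the three assignments in one step is the base function cut, which also returns a factor (handy for colouring score plots). This is a sketch on a small made-up data frame, `df_demo`, not the shoot-out data; the limits follow the text above, so a sample with exactly 221 mg falls in "Medium".

```r
# Made-up constituent values (mg), only to show the three classes
df_demo <- data.frame(Y = c(150, 160, 161, 200, 221, 222, 240))

# Intervals are closed on the right: (-Inf,160], (160,221], (221,Inf)
df_demo$type <- cut(df_demo$Y,
                    breaks = c(-Inf, 160, 221, Inf),
                    labels = c("Low", "Medium", "High"))

print(df_demo)
print(table(df_demo$type))   # number of samples per class
```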

We proceed the same way for the other dataframes.

Another option is to create one big data frame with all the spectra from the different instruments and sets, and then create one category variable for the instrument (A and B) and another for the set (Training, Test and Validation).
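A minimal sketch of that idea, with small simulated matrices standing in for the real spectra (all the object names here are hypothetical, and only two sets per instrument are used to keep it short):

```r
set.seed(5)
n_wl <- 20                                            # assumed number of wavelengths
make_spectra <- function(n) matrix(rnorm(n * n_wl), nrow = n)

X_A_train <- make_spectra(5); X_A_test <- make_spectra(3)   # instrument A
X_B_train <- make_spectra(5); X_B_test <- make_spectra(3)   # instrument B

# Category variables first, in the same order in which the spectra are stacked
all_spectra <- data.frame(
  instrument = factor(rep(c("A", "B"), each = 8)),
  set        = factor(rep(c("Training", "Test", "Training", "Test"),
                          times = c(5, 3, 5, 3)))
)

# Spectra matrix as a single column of the data frame, like dataframe$X
all_spectra$X <- rbind(X_A_train, X_A_test, X_B_train, X_B_test)

counts <- table(all_spectra$instrument, all_spectra$set)
print(counts)
```

With the two factors in place, a subset such as the Test Set of instrument B is just `all_spectra[all_spectra$instrument == "B" & all_spectra$set == "Test", ]`.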

1 oct 2014

An introduction to the "resemble" package

Some time ago I wrote a post: An introduction to the "prospectr" package. There I gave a list of some chemometric packages for R. A few days ago someone in a NIRS forum mentioned another chemometric package, so the options for doing chemometrics with R are growing day by day.

This package is "resemble". This is the link to the reference manual.

More details at:

http://cran.r-project.org/web/packages/resemble/

http://l-ramirez-lopez.github.io/resemble/

The author explains that with this package algorithms such as LOCAL and locally weighted PLS regression can be easily reproduced.
I really want to test it as soon as I can.