R & Chemometrics: 2017

20 Dec 2017

Previous steps for a LOCAL calibration (part II)

Continuing with the previous post:
Previous steps for a LOCAL calibration

We can check the Mahalanobis distances (GH) of the samples of one product with respect to the samples of another product, and check the average, maximum and minimum GH and how many samples are over a certain cutoff. The idea is to create a multiproduct library, but not with products that are very different: they have to keep certain similarities in order to get the maximum benefit from the LOCAL calibrations.
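The comparison described above can be sketched in R. This is a minimal sketch on synthetic data (not the meat meal spectra), assuming that GH can be approximated as the squared Mahalanobis distance in the reference product's PC space divided by the number of PCs, so that the reference samples themselves average about 1; Win ISI's exact scaling may differ. `make_set` and `gh_distance` are hypothetical helpers; `prcomp`, `predict` and `mahalanobis` are base R.

```r
set.seed(123)
p <- 60                                        # number of wavelengths (synthetic)
pure <- matrix(rnorm(3 * p), 3, p)             # three hypothetical underlying patterns
make_set <- function(n, shift) {
  scores <- matrix(rnorm(n * 3), n, 3)
  scores[, 2] <- scores[, 2] + shift           # move the set along pattern 2 (e.g. protein)
  scores %*% pure + matrix(rnorm(n * p, sd = 0.05), n, p)
}
X_ref <- make_set(105, shift = 0)              # reference product (the "blue" library)
X_new <- make_set(40, shift = 2)               # another product to compare

# ASSUMPTION: GH ~ squared Mahalanobis distance in the reference PC space / number of PCs
gh_distance <- function(X_train, X_test, n_pc) {
  pca <- prcomp(X_train, center = TRUE, scale. = FALSE)
  T_train <- pca$x[, 1:n_pc, drop = FALSE]
  T_test  <- predict(pca, X_test)[, 1:n_pc, drop = FALSE]   # project into the reference space
  mahalanobis(T_test, center = colMeans(T_train), cov = cov(T_train)) / n_pc
}

gh_new <- gh_distance(X_ref, X_new, n_pc = 3)
round(c(min = min(gh_new), mean = mean(gh_new), max = max(gh_new),
        over_3 = sum(gh_new > 3)), 3)
```

With this scaling, a cutoff of 3.00 flags the samples that sit far outside the reference cloud.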

In the case of the green samples (pork meat meal type 2), the GH values range from 1.109 to 6.375 with respect to the PC space of the blue samples (pork meat meal type 1), with 17 of 105 samples over 3.00 and an average of 2.43.

In the case of the green samples (pork meat meal type 3), the GH values range from 1.561 to 8.650, with 24 of 40 samples over 3.00 and an average of 3.71, with respect to the PC space of the blue samples (pork meat meal type 1).

In the case of the red samples (pork meat meal type 4), the GH values range from 0.912 to 14.462, with 211 of 402 samples over 3.00 and an average of 3.91, with respect to the PC space of the blue samples (pork meat meal type 1).

The high GH values are mainly due to the second PC, because these samples have higher protein than the samples in the blue group.

It seems that all these meat meal products can be merged into a single library to develop a LOCAL calibration, and we will see its configuration, extension, maintenance and validation in coming posts.

8 Dec 2017

Previous steps for a LOCAL calibration.

With four different sets of meat meal (four types of pork meat meal), I develop a lib file for each one. I view one of them in the 3D graph and add the others as secondary files to see how they match each other. Looking at the correlation of the scores with the constituents in their own libraries, it is clear that for all of them moisture is the main source of variation, and it is explained by the first principal component in all of them. Protein is the constituent most highly correlated with the second principal component in the four libraries.
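To see which constituent each principal component explains, one option is to correlate the scores with the reference values. A minimal sketch with synthetic data, where the moisture and protein "profiles" are hypothetical sine and cosine shapes chosen only so that moisture dominates the variance:

```r
set.seed(10)
n <- 80; p <- 100
t_idx <- (1:p) / p
S_moisture <- sin(2 * pi * t_idx)          # hypothetical moisture profile
S_protein  <- cos(2 * pi * t_idx)          # hypothetical protein profile (orthogonal to moisture)
moisture <- rnorm(n, mean = 8, sd = 1.5)   # synthetic reference values
protein  <- rnorm(n, mean = 65, sd = 5)

X <- 3 * outer(moisture, S_moisture) + 0.2 * outer(protein, S_protein) +
     matrix(rnorm(n * p, sd = 0.05), n, p)

pca <- prcomp(X, center = TRUE, scale. = FALSE)
# correlation of the first three score vectors with each constituent
score_cor <- cor(pca$x[, 1:3], cbind(moisture, protein))
round(score_cor, 2)
```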

Three of the sets almost overlap in protein range, but one of them has a broader range at low protein, so the idea is to see this in the score maps.

In this plot, we see the scores of the four sets overlapped in the PC space of one of the libraries.

Dark blue: protein range from 44.26 to 77.3
Green: protein range from 69.50 to 79.51
Light blue: protein range from 69.50 to 82.90
Red: protein range from 66.51 to 87

Looking at the score maps that include the second principal component for all the groups, and at the plot of the dark blue group divided into thirds, we can draw some conclusions.
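A score map like the one described can be drawn in base R by projecting another set into the PC space of the reference library with `predict` and splitting the reference set into thirds of protein with `quantile` and `cut`. This sketch uses synthetic spectra; only the protein ranges are borrowed from the dark blue and red ranges listed above, and the helper `simulate_set` and the colors are hypothetical.

```r
set.seed(20)
p <- 100
t_idx <- (1:p) / p
S_moist <- sin(2 * pi * t_idx)               # hypothetical moisture profile
S_prot  <- cos(2 * pi * t_idx)               # hypothetical protein profile
simulate_set <- function(n, prot_range) {
  moisture <- rnorm(n, mean = 6, sd = 1)
  protein  <- runif(n, prot_range[1], prot_range[2])
  X <- 2 * outer(moisture, S_moist) + 0.1 * outer(protein, S_prot) +
       matrix(rnorm(n * p, sd = 0.02), n, p)
  list(X = X, protein = protein)
}
ref   <- simulate_set(100, c(44.26, 77.3))   # plays the dark blue library
other <- simulate_set(60,  c(66.51, 87))     # plays the red set

pca <- prcomp(ref$X, center = TRUE, scale. = FALSE)
T_ref   <- pca$x[, 1:2]
T_other <- predict(pca, other$X)[, 1:2]      # red set projected into the dark blue PC space

# dark blue group divided into thirds of protein
thirds <- cut(ref$protein, breaks = quantile(ref$protein, c(0, 1/3, 2/3, 1)),
              include.lowest = TRUE, labels = c("low", "mid", "high"))
plot(T_ref, col = c("navy", "royalblue", "skyblue")[thirds], pch = 16,
     xlim = range(T_ref[, 1], T_other[, 1]), ylim = range(T_ref[, 2], T_other[, 2]),
     xlab = "PC1", ylab = "PC2")
points(T_other, col = "red")
legend("topright", legend = c(levels(thirds), "other set"),
       col = c("navy", "royalblue", "skyblue", "red"), pch = c(16, 16, 16, 1))
```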

These are preliminary studies for building a LOCAL calibration, so more details will come in the next posts.

26 Nov 2017

Binning function in the prospectr package.

In NIR spectra most wavelengths are highly correlated with their neighbours, so we can reduce the number of data points in the spectra (leaving more space between wavelengths) to make the spectral matrix easier to manage.

Commercial software has functions to do this. For example, in Win ISI the spectra of a NIR 5000, which has 700 wavelengths, keep their 700 data points if we configure the wavelength selection as 1100-2498,2, and are reduced to 350 if we select 1100-2498,4. This process does not lose relevant information for the development of a calibration, so it is applied quite often.

In prospectr, the binning function lets us define the bins in two ways:

1:
X.bin <- binning(X, bin.size = 10)
In this case each group of ten consecutive data points is averaged into one data point.

2:
X.bin2 <- binning(X, bins = 50)
In this case the spectra matrix is reduced to 50 equally spaced data points (the averages of 50 bins).
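To make the averaging and the Win ISI arithmetic concrete, here is a base R sketch. `bin_average` is a hypothetical helper written to mimic the averaging that `prospectr::binning` performs (it assumes the number of columns is a multiple of `bin.size`); with prospectr installed you would simply call `binning()` as shown above. The spectra are synthetic.

```r
# Hypothetical helper: average each group of `bin.size` consecutive columns,
# mimicking the averaging done by prospectr::binning (assumes ncol(X) %% bin.size == 0)
bin_average <- function(X, bin.size) {
  p <- ncol(X)
  stopifnot(p %% bin.size == 0)
  groups <- rep(seq_len(p / bin.size), each = bin.size)
  t(apply(X, 1, function(row) tapply(row, groups, mean)))
}

wav_2nm <- seq(1100, 2498, by = 2)   # 1100-2498 every 2 nm
wav_4nm <- seq(1100, 2498, by = 4)   # every 4 nm (last value 2496)
length(wav_2nm)                      # 700
length(wav_4nm)                      # 350

set.seed(1)
X <- matrix(runif(5 * 700), nrow = 5)                    # 5 synthetic spectra
X.bin <- bin_average(X, bin.size = 10)
dim(X.bin)                                               # 5 70
wav.bin <- tapply(wav_2nm, rep(1:70, each = 10), mean)   # centre wavelength of each bin
```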

23 Nov 2017

Side Project: Interactive Win ISI Tutorial

See the video at:
https://www.linkedin.com/feed/update/urn:li:activity:6337683623763812353

13 Nov 2017

Bandwidth in the NIR Spectral Region

A question from Gabriela (thanks for your kind words about the blog) about the importance of resolution and bandwidth in NIR instruments led me to the paper by Karl Norris: "Limitations of Instrument Resolution and Noise on Quantitative analysis of Constituents with very Narrow Bandwidth in the NIR Spectral Region".

In this paper, Karl Norris concludes that instruments with a 10 nm bandpass and a good signal-to-noise ratio can measure constituents with bandwidths as narrow as 2 nm. It is not necessary to increase the resolution to detect them, and if a higher resolution comes with more noise, the quantitative analysis becomes less accurate.

The experiment was done with talc (2 nm bandwidth) in Avicel.

2 Nov 2017

Waiting for the instrument to warm up before calibrating (DS2500)

Before proceeding with the instrument calibration of a DS2500, it is important to check that the instrument is properly stabilized. It is not enough to see that the instrument has passed the diagnostics: you have to run the diagnostics several times and check that the "deltas" (differences between the nominal and found values) for the checked wavelengths are stable and have stopped drifting due to the warm-up of the instrument.

If the deltas are close to zero for all wavelengths, the calibration with the ERC is not necessary, but if there is a slope in the values or a systematic difference, it is better to calibrate to bring the values close to zero for all wavelengths.

We don't want to see drifts in the deltas as the instrument warms up; ideally, the differences in the deltas between the repetitions of the wavelength check should be random.
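If the deltas from the repeated diagnostics are exported, the drift check can be sketched in R as below. The wavelengths, the warm-up curve and the noise level are synthetic, and comparing the slope of the deltas against the run number (early runs versus later runs) is one possible way to judge stability, not the instrument software's procedure; no manufacturer tolerance is assumed.

```r
set.seed(4)
wl <- c(1200, 1400, 1600, 1800, 2000, 2200)   # hypothetical check wavelengths (nm)
n_runs <- 8
warm_up <- 0.08 * exp(-(1:n_runs) / 1.5)      # synthetic drift that fades during warm-up (nm)
# deltas = nominal - found: one row per diagnostics run, one column per wavelength
deltas <- sapply(wl, function(w) warm_up + rnorm(n_runs, sd = 0.005))
colnames(deltas) <- wl

# slope of the deltas against the run number for one wavelength
run_slope <- function(d) {
  run <- seq_along(d)
  unname(coef(lm(d ~ run))[2])
}
slope_first <- apply(deltas[1:4, ], 2, run_slope)   # early runs: still drifting
slope_last  <- apply(deltas[5:8, ], 2, run_slope)   # later runs: should look random
round(rbind(slope_first, slope_last), 4)

# once stable: offset (intercept) and slope of the deltas across wavelengths in the last run
last_run <- deltas[n_runs, ]
coef(lm(last_run ~ wl))
```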

At this point we can continue with the calibration of the instrument (video).

23 Oct 2017

Enable Auto Archiving - ISIscan™

Testing the PC Standardization (Part 1)

PCA standardization is one of the standardization algorithms available in Win ISI. We start with a REP file containing spectra of the same samples scanned on two different instruments (the samples get the same names on both). When we select PCA standardization and go to the option "Create a Score file from a spectra file", we see that the option to create a PCS file is activated, so we get a PCS file in addition to the PCA file.

This PCS file is then used to reduce our CAL file (the one we use to develop the calibration). When we use "Reduce", we can see that a PCS file can be added as an optional input.

We will use the reduced output file to develop the calibration.

9 Oct 2017

How many samples are needed for a calibration?

One of the questions usually asked is: how many samples are needed for a calibration, and for how long do I have to keep adding samples to it?

What is certainly necessary is calibration data from different years. At the beginning we may have a nice SEC but a not-so-nice SECV or SEP; as we add data from the following years, we will see the SEC increase while the SECV and SEP decrease, getting closer to the SEC and eventually similar to it, but not below it.

The idea is to keep adding samples and variability while the SECV is significantly different from the SEC and while the SEP is significantly different from the SECV.
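The three statistics can be computed as in the sketch below, using a simple `lm` regression on synthetic data instead of a real NIR calibration. Assumptions: SECV is taken from leave-one-out residuals (Win ISI normally uses cross-validation groups), and SEP is the bias-corrected standard error on an independent validation set.

```r
set.seed(42)
n <- 60
cal <- data.frame(x1 = rnorm(n), x2 = rnorm(n))
cal$y <- 2 + 1.5 * cal$x1 - 0.8 * cal$x2 + rnorm(n, sd = 0.5)   # synthetic calibration set
fit <- lm(y ~ x1 + x2, data = cal)
k <- length(coef(fit)) - 1                                        # number of terms

# SEC: residual standard error of the calibration
sec <- sqrt(sum(residuals(fit)^2) / (n - k - 1))
# SECV (leave-one-out): for a linear model the LOO residuals follow from the hat values
loo_res <- residuals(fit) / (1 - hatvalues(fit))
secv <- sqrt(mean(loo_res^2))

# SEP: bias-corrected standard error on an independent validation set
nv <- 30
val <- data.frame(x1 = rnorm(nv), x2 = rnorm(nv))
val$y <- 2 + 1.5 * val$x1 - 0.8 * val$x2 + rnorm(nv, sd = 0.5)
res_val <- val$y - predict(fit, newdata = val)
sep <- sqrt(sum((res_val - mean(res_val))^2) / (nv - 1))

round(c(SEC = sec, SECV = secv, SEP = sep), 3)
```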

23 Sep 2017

Draft of Win ISI diagram

Working on the main diagram of Win ISI for a presentation. This is a draft and I have to add more tools from new versions.

18 Sep 2017

Diagnostics: Peak to Peak (P2P)

It shows whether there are extreme peaks in the noise spectrum (as in this case, due to encoder noise).

It is the absolute difference between the absorbance of the highest peak and the absorbance of the lowest peak.

Manufacturers set the limit for this value according to the quality of the instrument components.
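As a worked example of the definition, this sketch computes P2P on a synthetic noise spectrum with one artificial spike (the wavelength range and noise level are assumptions, not instrument specifications), together with the RMS noise for comparison:

```r
set.seed(7)
wavelength <- seq(400, 2498, by = 2)              # hypothetical scan range (1050 points)
noise <- rnorm(length(wavelength), sd = 20e-6)    # synthetic noise spectrum (AU)
noise[500] <- 300e-6                              # artificial spike, e.g. encoder noise

p2p <- max(noise) - min(noise)                    # peak to peak: highest minus lowest peak
rms_noise <- sqrt(mean((noise - mean(noise))^2))  # RMS noise, for comparison
wavelength[which.max(noise)]                      # where the extreme peak is
c(p2p_microAU = p2p * 1e6, rms_microAU = rms_noise * 1e6)
```

A single spike barely changes the RMS but dominates the P2P, which is why P2P is the value that reveals extreme peaks.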

7 Sep 2017

PUZZLE: Spectra Reconstruction

I usually explain the concept of spectra reconstruction as trying to complete a puzzle. We have the pieces (loadings), which, once multiplied by the scores, have different sizes but the same pattern.

We fit all the pieces, but the puzzle may not fit perfectly: some gaps or spaces may remain unfilled. In spectra reconstruction this is the error matrix, the part of the puzzle that is not completed. It can be small or large.

One application of this concept is deciding whether a spectrum belongs to a certain category.

You will find other posts about spectra reconstruction in this blog.
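In matrix terms, the puzzle is X = T P' + E: the scores T times the transposed loadings P' rebuild the spectra, and E is the part not completed. A minimal sketch with base R `prcomp` on synthetic data; `reconstruct` is a hypothetical helper, and the "foreign" sample is built with an extra random pattern to play the role of a spectrum from another category:

```r
set.seed(11)
p <- 80
patterns <- matrix(rnorm(2 * p), 2, p)              # two synthetic "pieces" (pure patterns)
X <- matrix(rnorm(50 * 2), 50, 2) %*% patterns +
     matrix(rnorm(50 * p, sd = 0.01), 50, p)         # 50 training spectra
pca <- prcomp(X, center = TRUE, scale. = FALSE)
a <- 2
P <- pca$rotation[, 1:a, drop = FALSE]              # loadings (p x a)

reconstruct <- function(X_new) {
  Xc <- sweep(X_new, 2, pca$center)                 # center with the training mean
  T_new <- Xc %*% P                                 # scores
  X_hat <- sweep(T_new %*% t(P), 2, pca$center, "+")  # reconstructed spectra
  E <- X_new - X_hat                                # error matrix: the missing puzzle part
  list(X_hat = X_hat, E = E, rms = sqrt(rowMeans(E^2)))
}

train_fit <- reconstruct(X)
foreign <- matrix(rnorm(2), 1, 2) %*% patterns + 0.5 * matrix(rnorm(p), 1, p)
foreign_fit <- reconstruct(foreign)
c(max_train_rms = max(train_fit$rms), foreign_rms = foreign_fit$rms)
```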

6 Sep 2017

Sub-sample and sub-scan concept

When we analyze heterogeneous samples, it is common to use large cups. The cup rotates and stops at certain positions called sub-samples (suppose eight). At each sub-sample, several sub-scans (normally four) are acquired.

So we have eight sub-sample spectra, each the average of four sub-scans: thirty-two scans in total.

We can predict each of the sub-samples to get eight predictions and calculate their standard deviation for every constituent, in order to see how heterogeneous the sample composition is.

We can export the average spectrum of all the sub-samples, the eight spectra of the eight sub-samples or the thirty-two individual sub-scans for further study.
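The bookkeeping of sub-scans, sub-samples and predictions can be sketched in R with a 3-D array. The spectra are synthetic and the linear model (`b0`, `b`) is hypothetical; it only stands in for a real calibration equation:

```r
set.seed(3)
n_sub <- 8; n_scan <- 4; p <- 100
# synthetic sub-scans indexed as [sub-sample, sub-scan, wavelength]
scans <- array(rnorm(n_sub * n_scan * p, mean = 0.5, sd = 0.01),
               dim = c(n_sub, n_scan, p))

sub_spectra  <- apply(scans, c(1, 3), mean)   # 8 x 100: each sub-sample = mean of its 4 sub-scans
avg_spectrum <- colMeans(sub_spectra)         # average spectrum of the whole cup
all_scans    <- matrix(aperm(scans, c(2, 1, 3)), nrow = n_sub * n_scan)  # 32 x 100 sub-scans

# hypothetical linear prediction model (intercept + coefficients)
b0 <- 10
b  <- rnorm(p, sd = 0.5)
pred_sub <- drop(b0 + sub_spectra %*% b)      # eight sub-sample predictions
c(mean = mean(pred_sub), sd = sd(pred_sub))   # SD shows the heterogeneity of the sample
```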

5 Sep 2017

Thanks a lot (more than 250,000 visits to this blog)

Thanks to all of you who read and follow this blog. We have passed 250,000 visits over the life of this blog, and I am really happy about that.

These are the main countries that visit the blog, but I appreciate visits from many other places.

Thanks so much!

Maximum Distance (Discriminant Method)

In other posts I wrote about the Maximum Distance algorithm for discriminating products. In summary, we take the spectra of a training set of samples, apply a math treatment and calculate the standard deviation (s) at each wavelength; limits based on s are then set around the average spectrum. A new sample must fall within the defined limits to be classified as a member of this group.

As a result, the limits are wider in the spectral regions with more variability than in the regions with lower variability.

First we can look at the spectra (gray) with the math treatment applied and draw the average spectrum (red).

We can also overplot the standard deviation spectrum (green) with the average spectrum (red) to imagine how the limits will fit.

I will try to export the spectra to Excel to show this better in a new post.
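A minimal sketch of the idea in R, assuming limits of mean ± k·s with a hypothetical k = 3 and accepting a sample only if every wavelength falls inside them (which is the same as requiring its largest standardized deviation to be at most k). The training spectra are synthetic, the math treatment is the first derivative by differences used in the Resemble tutorials, and Win ISI's exact rule and cutoff may differ.

```r
set.seed(21)
p <- 100
t_idx <- (1:p) / p
mean_profile <- 0.5 + 0.2 * sin(2 * pi * t_idx)                  # synthetic product spectrum
X_train <- t(replicate(40, mean_profile + rnorm(p, sd = 0.002)))  # 40 x 100 training spectra

d1 <- function(X) t(diff(t(X), lag = 1, differences = 1))   # math treatment: first derivative
X_der <- d1(X_train)
avg_spectrum <- colMeans(X_der)
s <- apply(X_der, 2, sd)                                     # SD at each wavelength

k <- 3                                                       # ASSUMPTION: limit multiplier
upper <- avg_spectrum + k * s
lower <- avg_spectrum - k * s

max_distance <- function(x) max(abs(x - avg_spectrum) / s)   # largest standardized deviation
classify <- function(x) max_distance(x) <= k                 # inside the limits everywhere

x_typical <- d1(matrix(mean_profile, nrow = 1))
x_other   <- d1(matrix(0.5 + 0.2 * sin(4 * pi * t_idx), nrow = 1))
c(typical = classify(x_typical), other = classify(x_other))

matplot(t(X_der), type = "l", col = "gray", lty = 1,
        xlab = "Data point", ylab = "First derivative")
lines(avg_spectrum, col = "red")
lines(upper, lty = 2)
lines(lower, lty = 2)
```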

4 Sep 2017

How to load the Check Sample product in MOSAIC

On the DSs or DAs, the check sample comes with a USB pen drive containing some files (DA1650, DS2500 and DS2500F) with the extension "mcf". If we import these files into Mosaic, the parameter, prediction model and check cell product are installed at the same time, so it is not necessary to go step by step and the instrument is ready to analyze the Check Sample more quickly.

Go to your Network, right-click and choose "Import instrument group configuration". In Explorer, open the appropriate "msc" file on the USB pen drive, and the configuration will be loaded and ready to work in NOVA if we use Mosaic Solo, or after synchronization if we use Mosaic Network.

7 Aug 2017

Certified Reference Materials for Spectroscopy

A catalogue of “Starna” (www.starna.com) Certified Reference Materials for spectrometers came with the last issue of Spectroscopy Europe.

The materials have a function and a range. The function is the purpose of the filter (to check absorbance, accuracy, stray light or resolution).

The range can cover the UV, visible, NIR and FTIR regions of the electromagnetic spectrum.

In the case of NIR there are some filters like:

NIR Neutral Density Glass References (800 - 3200nm): To check the absorbance accuracy and linearity of NIR spectrometers.

Metal on Quartz filters (250 – 3200 nm): With Absorbance and Transmittance values certified at different wavelengths.

NIR Solution References (900 – 2600 nm): With 14 certified peaks for wavelength qualification purposes.

Chloroform Stray Light Cell (at approx. 2365 nm): To check Stray Light.

Polystyrene NIR References (NIR and MIR range): With 14 certified peaks in the MIR spectrum and, in addition, eight peaks in the NIR spectrum. These calibration values are traceable to NIST SRM 2065.

Didymium Glass Filter (430 – 890 nm): This filter has 11 peaks covering this range (four peaks over 700 nm).

Wide Range Wavelength Reference (335 – 1945 nm): This filter has 20 peaks in this range (nine of them over 700 nm). It is equivalent to NIST SRM 2065.

You can download the catalog from:
http://www.starna.com/images/reference_material_catalogue2017.pdf

1 Aug 2017

Checking Wavelength Accuracy (XDS)

It is important to check the wavelength accuracy of the peaks, using a NIST standard if possible, as in this case.

The manufacturer sends a file showing the accuracy of the instrument against this standard, and we have to verify it periodically to see whether it shifts.

The verification tells us whether the deviation is larger than recommended, even if the diagnostics pass. If it is, we perform an Instrument Calibration, and the values come closer to the values the instrument had when it left the factory.

Delta value is the difference between the nominal value and the found value.

11 Jul 2017

Dror Sharon: This Tiny Molecular Sensor can Identify a Viagra Pill | WIR...

I am a reader of Wired magazine, and it is nice to see how NIR technology is becoming part of this digitalized world; it will be amazing to see what this technology can bring in the future.

3 Jul 2017

Lid Adjustment - NIRS™ DS2500

The first DS2500 instruments (Generation 1) don't have this system to adjust the door gap, and adjusting it is more difficult. If your instrument does not have the two screws to remove the cover shown in the video, it is because it is a "Generation 1" instrument.

The instrument shown in the video is a "Generation 2".

Considering and checking subsample variation

When analyzing a heterogeneous sample, several subsamples are acquired from a large cup, and finally the average of all the subsamples is shown as the result. We normally give importance to this average result and compare it to the lab value to know the accuracy of our measurement. Even so, it is important to look at the individual results of the different subsamples.

One reason for this is to check how homogeneous our sample is, by looking at the standard deviation of the predictions for each constituent. We can also see that, depending on the math treatment applied in the equation, the standard deviation of the subsample predictions changes, and in some cases becomes quite large. This is something we have to consider to build a robust calibration.

Always remember to look at the subsample spectra and draw conclusions by comparing the spectral RMS with the SD of the different subsamples.
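This sketch compares the two things mentioned above on synthetic sub-sample spectra. Assumptions: the spectral RMS is taken as the root mean square of each sub-sample spectrum around the average spectrum, and the two prediction models are hypothetical (constant coefficients on the raw spectra and on the first derivative). Sub-sample 5 has an offset on purpose:

```r
set.seed(5)
p <- 200
base <- 0.5 + 0.2 * sin(seq(0, 3 * pi, length.out = p))          # synthetic sample spectrum
offsets <- c(0, 0.002, -0.001, 0.003, 0.015, -0.002, 0.001, 0)   # sub-sample 5 deliberately different
sub_spectra <- t(sapply(offsets, function(o) base + o + rnorm(p, sd = 0.001)))  # 8 x 200
avg_spectrum <- colMeans(sub_spectra)

# ASSUMPTION: spectral RMS = RMS of each sub-sample spectrum around the average spectrum
rms <- sqrt(rowMeans(sweep(sub_spectra, 2, avg_spectrum)^2))
round(rms, 4)

# hypothetical models: constant coefficients on raw spectra and on the first derivative
b_raw <- rep(5, p)
b_der <- rep(5, p - 1)
pred_raw <- drop(sub_spectra %*% b_raw)
pred_der <- drop(t(diff(t(sub_spectra))) %*% b_der)
round(c(sd_raw = sd(pred_raw), sd_der = sd(pred_der)), 3)
```

With these made-up coefficients the raw model is sensitive to the offset of sub-sample 5, while the derivative removes it, so the SD of the sub-sample predictions depends on the math treatment.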

22 Jun 2017

How to check the cooling liquid - NIRS™ DS2500

If for any reason we have to change the liquid in the cooling circuit, we can see how the liquid is drawn in by the pump. We repeat the process several times, checking that the air comes out of the circuit and that the tank is filled and purged.

Filter Replacement - NIRS™ DS2500

Lamp Replacement - NIRS™ DS2500

Instrument Calibration - NIRS™ DS2500

Checking Temperatures in DS2500 (Lamp)

For the DS2500 to perform optimally, we have to pay attention to the lamp temperature when running the diagnostics. I consider around 35ºC to be fine.

Sometimes we find high temperatures like the one in the picture, and even if the report says it is OK, this temperature can affect the instrument itself and the results.

One of the causes of this temperature increase is that the pump tank has lost water, so it is a good idea to check the level and refill it if necessary.

Checking pump level video

Check that the pump is pumping. We should see some turbulence in the water and hear a slight noise from the pump.

Check whether the water is too dirty or contains algae.

Check that the fan is working (its job is to keep the water cool) and that the filter is clean, so that the fan can do its job better.

Changing the filter

The temperature of the room or laboratory where the instrument is located is also important: a higher room temperature also increases the lamp temperature.

After checking all these points, and making sure that the lamp is fine, it may be time to run an instrument calibration:

Instrument Calibration

19 Jun 2017

Comparing Residuals, GH and T when validating

When looking at the validation statistics, it is important to look at three values at the same time for every sample: the residual, the GH and the T value. With these data (fiber), we can check whether a sample is extrapolating badly, whether the model is not robust, or other issues.

In this case, there are samples with a very high GH, and these samples have in common that their T statistic is negative (in the left tail of the Gaussian bell) and also quite large in absolute value.
These samples also have the highest residual values.
This tells us that these samples have something special and are not well represented by the equation. PCA is working fine and detects these samples as outliers, but we need to know what makes these samples special.

These samples are soy meal with higher fat values than the ones in the calibration, so the model did not learn enough about the interaction between the fiber bands and the fat bands. These samples are therefore very interesting for making the calibration more robust.

After checking this, we can add these samples to the calibration to improve the results of the next validation.

In Excel we can see graphically the relationship between the residuals, GHs and T values.

22 May 2017

Mosaic 7.12 is now available on our Europe server

Mosaic version 7.12 is now available on our Europe server.

Once you try to connect, you should be asked to automatically download and install the new client.
User accounts and passwords remain the same.

Ports used for NOVA:

Configure the ports correctly with your IT department for a successful synchronization.

7 May 2017

Easy way to check the eigenvalues with the T (scores) matrix

Another interesting matrix multiplication is the product of the transpose of the score matrix T by T itself:

Tt <- t(T)
Tt %*% T

This product gives us a square a × a matrix, where “a” is the number of loadings or PCs chosen. Its off-diagonal values are zero, because the scores are orthogonal, and its diagonal contains the eigenvalues (of t(X) %*% X for the centered spectra X), which are related to the amount of variance explained by each loading.

If we plot the diagonal, we can see how the eigenvalues decrease with each loading. This plot can help us decide how many loadings or PCs to choose.
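The relationship can be checked in base R with `prcomp`. In this sketch (synthetic data) the scores matrix is called `Tmat` rather than `T`, because `T` is R's shortcut for `TRUE`; the diagonal of `t(Tmat) %*% Tmat` equals (n - 1) times the variances that `prcomp` reports as `sdev^2`:

```r
set.seed(8)
n <- 30
X <- matrix(rnorm(n * 12), n, 12) %*% matrix(rnorm(12 * 12), 12, 12)  # synthetic correlated data
pca <- prcomp(X, center = TRUE, scale. = FALSE)

a <- 5
Tmat <- pca$x[, 1:a]          # scores matrix T (first a PCs)
Tt <- t(Tmat)
TtT <- Tt %*% Tmat            # a x a matrix, diagonal up to rounding
eig <- diag(TtT)              # eigenvalues of t(X) %*% X for the centered X

round(TtT, 6)
plot(seq_len(a), eig, type = "b", xlab = "PC", ylab = "Eigenvalue")   # decreasing curve
```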


6 May 2017

Checking the orthogonality of P (loadings) matrix

One of the objects we got with the script in the post "Tutorials with Resemble (Part 3 - orthoProjection)" was the loadings matrix (X.loadings), which we usually call the P matrix in this blog.

One of the characteristics of the loadings matrix “P” obtained from a PCA is that if we multiply it by its transpose we get the identity matrix “I”:

P <- X.loadings

Pt <- t(X.loadings)

P %*% Pt   # returns the identity matrix "I"

In the “I” matrix, the diagonal values are “1” and all the other cells are “0”, indicating that the loadings are orthonormal: each one has unit length and all of them are orthogonal to each other.

Exercise:

Check it yourself, and extract the diagonal of the resulting matrix with diag().
Plot the first loadings:

1 vs 2: a plane
1, 2 and 3: a cube
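Whether the identity comes from `P %*% t(P)` or `t(P) %*% P` depends on how the loadings are stored. `prcomp` stores them with the wavelengths in rows and the loadings in columns, so there the identity is `t(P) %*% P`; with the loadings in rows, as X.loadings appears to be in the post, it is `P %*% t(P)`. A sketch of the check and the first part of the exercise with synthetic data:

```r
set.seed(9)
X <- matrix(rnorm(40 * 20), 40, 20)     # synthetic data: 40 samples x 20 wavelengths
pca <- prcomp(X, center = TRUE, scale. = FALSE)

a <- 3
P <- pca$rotation[, 1:a]                # 20 x 3: wavelengths in rows, loadings in columns
PtP <- t(P) %*% P                       # 3 x 3 identity (up to rounding)
round(PtP, 10)
diag(PtP)                               # all ones: each loading has unit length

plot(P[, 1], P[, 2], xlab = "Loading 1", ylab = "Loading 2")   # loadings 1 vs 2
```

Note that with this orientation `P %*% t(P)` is a 20 x 20 projection matrix, not the identity.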

19 Apr 2017

How to load a REP file in a MOSAIC LOCAL Prediction Model

If we use the MONITOR in Win ISI or a LOCAL Prediction Model in ISI Scan, there is a field to load the REP file (a ".nir" file that contains the variation we want to minimize in the model, such as temperature, differences between instruments, or differences between the pathlengths of the gold reflectors). This way, LOCAL uses the REP file when developing the calibration.

In MOSAIC the REP file must be loaded in a different way.

As usual, we load the ".RED" file, reduced with the appropriate math treatment, and set the maximum and minimum number of factors and samples... but where do I load the repeatability file (.NIR)?

😏...Easy but tricky.

Change the extension of the repeatability file from ".NIR" to ".REP", give it the same name as the ".RED" file, and put both files in the same folder. Now, when you import the ".RED" file into the LOCAL Prediction Model, the ".REP" file goes with it. Just check it on the Links tab of the LOCAL P.M.

As you know, something similar happens when we load an ".EQA" file: the ".PCA" and ".LIB" files are loaded too.

Thanks to Montse for testing this feature...😉

24 Mar 2017

Tutorials with Resemble (Part 3 - orthoProjection)

Using orthoProjection:

One of the functions of resemble is “orthoProjection”, which can be used with different options. Let's check the simplest one in this post:

oP <- orthoProjection(Xr = der.Xr, X2 = NULL,
                      Yr = NULL, method = "pca",
                      pcSelection = list("cumvar", 0.99),
                      center = TRUE, scaled = FALSE,
                      cores = 1)
We can use the training data from the previous post, smoothed with the SG filter and converted to the first derivative: der.Xr.
The method we use is “pca”, so we don't need the reference data “Yr” (Yr = NULL). We don't use any additional set, so X2 = NULL.
The number of components is chosen to explain a cumulative variance of 99%.
We center the spectra, but we don't scale them.
Now run this script in R (make sure that the resemble package is loaded with library(resemble)).

Now we can check the values we get:

> names(oP)
[1] "scores" "X.loadings" "variance" "sc.sdv" "n.components"
[6] "pcSelection" "center" "scale" "method"

> attach(oP)

> scores
Matrix T of scores
> X.loadings
Matrix P of loadings
> variance
We can see the eigenvalues and the explained and cumulative variance
> sc.sdv
Standard deviations of the scores (their squares are the eigenvalues)
> n.components
Number of components chosen to explain 99% of the variance
> pcSelection
cumvar 0.99
> center
average spectrum
> scale
1
> method
pca(svd)

Check all these values and matrices.

3.1. Practice plotting the average spectrum (Exercises page).
3.2. Play with the cumulative variance (Exercises page).
3.3. Plot the loadings (Exercises page).
3.4. Plot combinations of score maps (Exercises page).

And enjoy Chemometrics with R!
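If you want to compare the results with base R, the `cumvar` selection can be reproduced approximately with `prcomp`. This is a sketch on synthetic spectra, assuming the rule is "the smallest number of components whose cumulative explained variance reaches 0.99"; resemble's implementation may differ in details. It also covers exercises 3.1 to 3.3:

```r
set.seed(12)
p <- 100
t_idx <- (1:p) / p
profiles <- rbind(sin(2 * pi * t_idx), cos(2 * pi * t_idx), sin(4 * pi * t_idx))
X <- matrix(rnorm(60 * 3), 60, 3) %*% (profiles * c(3, 1, 0.3)) +
     matrix(rnorm(60 * p, sd = 0.01), 60, p)          # 60 synthetic spectra

pca <- prcomp(X, center = TRUE, scale. = FALSE)
explained <- pca$sdev^2 / sum(pca$sdev^2)              # explained variance per component
cumvar <- cumsum(explained)                            # cumulative variance
n_components <- which(cumvar >= 0.99)[1]               # ASSUMPTION: first PC reaching 99%

scores <- pca$x[, 1:n_components, drop = FALSE]        # T matrix
loadings <- pca$rotation[, 1:n_components, drop = FALSE]  # P matrix (wavelengths in rows)
average_spectrum <- pca$center                         # what "center" holds

plot(average_spectrum, type = "l", xlab = "Data point", ylab = "Average absorbance")  # 3.1
plot(cumvar, type = "b", xlab = "Components", ylab = "Cumulative variance")          # 3.2
matplot(loadings, type = "l", lty = 1, xlab = "Data point", ylab = "Loading")        # 3.3
```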

23 Mar 2017

Tutorials with Resemble (part 2)

If you have practised with the post "Tutorials with Resemble (part 1)", you can continue adding more script following the recommendations of the resemble package. This time we add another math treatment after the SG filter.

Once the "sg" function has been applied, we can calculate the first derivative to better define the variance in the spectra. The resemble manual shows us how to convert the spectra to a first derivative using differences. We can do it for the calibration and the validation sets:

der.Xr <- t(diff(t(Xr), lag = 1, differences = 1))

der.Xu <- t(diff(t(Xu), lag = 1, differences = 1))

In this case we lose one data point on the left side of the spectra, so we have to redefine the wavelengths to plot the first derivative.

wavelength_der<-seq(1112,2488,by=2)

matplot(wavelength_der,t(der.Xr),type="l",col="black",
xlab="Wavelength(nm)",ylab="Absorbance")

and we get this plot:

Practise doing the same for the validation set Xu and overplotting the spectra with the training set Xr.

Do you see significant differences?

Enjoy using Chemometrics with R.

22 Mar 2017

Win ISI and ISI Scan 4.10 available (Windows 10 compatible)

Go to http://www.winisi.com and download it from the Download section

20 Mar 2017

Tutorials with Resemble (part 1)

I see that some of you are interested in the "resemble" package, so I am going to rewrite some of the posts with this package, so that we can better understand the LOCAL concept we have been treating with Win ISI.

The examples use the NIRsoil data that we can get from the "prospectr" package.
require(prospectr)
data(NIRsoil)
To plot the raw spectra, just write this script:
wavelength<-seq(1100,2498,by=2)
matplot(wavelength,t(NIRsoil$spc),type="l",col="blue",
        xlab="Wavelength(nm)",ylab="Absorbance",ylim=c(0,1))
The resemble manual recommends applying an SG filter without derivatives to smooth the spectra, so we proceed as in the manual:
sg <- savitzkyGolay(NIRsoil$spc, p = 3, w = 11, m = 0)
NIRsoil$spc <- sg
Now the spectra are truncated on both sides (five data points at each end with w = 11), so we have to create a new wavelength vector:
wavelength_sg<-seq(1110,2488,by=2)
and we can plot the spectra filtered:
matplot(wavelength_sg,t(NIRsoil$spc ),type="l",col="black",
        xlab="Wavelength(nm)",ylab="Absorbance",ylim=c(0,1))
You won't see much difference from the raw spectra.

Now we split the data into a training set (Xr, Yr) and a validation set (Xu, Yu):
       #VALIDATION
Xu <- NIRsoil$spc[!as.logical(NIRsoil$train),]
Yu <- NIRsoil$CEC[!as.logical(NIRsoil$train)]
    #TRAINING
Xr <- NIRsoil$spc[as.logical(NIRsoil$train),]
Yr <- NIRsoil$CEC[as.logical(NIRsoil$train)]
and we remove the samples without reference values from both sets:
Xu <- Xu[!is.na(Yu),]
Xr <- Xr[!is.na(Yr),]
Yu <- Yu[!is.na(Yu)]
Yr <- Yr[!is.na(Yr)]

Practise making plots of the spectra of the different sets again: overlap the training and validation sets with different colors, and enjoy using R for chemometrics.

6 Mar 2017

Neighborhood Mahalanobis distance matrix

Working with the chemometric packages in R helps us better understand commercial chemometric software.

In resemble we can use the function fDiss to get a matrix of distances between all the samples in a spectral data set. We get a square, symmetric matrix with zeroes in the diagonal, because the distance between a sample and itself in the PCA space is zero. This way we can find redundant information and remove it from the spectral set. Finally, we get a well-distributed cloud of samples, and the average spectrum is more representative of all of them.

Here I just trim the matrix to see how close the spectra of the first 10 samples are to each other.
The spectra used are the NIRsoil data from the prospectr package.
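Because PCA scores are uncorrelated, the Mahalanobis distance in the PC space equals the Euclidean distance between the scores divided by their standard deviations. A base R sketch of such a neighbourhood distance matrix with synthetic data (not `fDiss` itself); the 0.5 threshold used to flag close pairs is hypothetical:

```r
set.seed(13)
X <- matrix(rnorm(50 * 30), 50, 30)          # synthetic spectra: 50 samples x 30 variables
pca <- prcomp(X, center = TRUE, scale. = FALSE)

a <- 5
scores <- pca$x[, 1:a]
# scores are uncorrelated, so scaling each by its SD turns Euclidean into Mahalanobis distance
scores_std <- sweep(scores, 2, apply(scores, 2, sd), "/")
D <- as.matrix(dist(scores_std))             # 50 x 50, symmetric, zeroes in the diagonal

round(D[1:10, 1:10], 2)                      # trimmed: the first 10 samples

# hypothetical threshold to flag near-duplicate (redundant) pairs
close_pairs <- which(D < 0.5 & upper.tri(D), arr.ind = TRUE)
```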