9 oct. 2017

How many samples are needed for a calibration?

One of the questions normally asked is: how many samples are needed for a calibration? In other words, for how long do I have to keep adding samples to a calibration?

Of course, what is needed is calibration data from different years. At the beginning we can have a nice SEC but a not so nice SECV or SEP. As soon as we add data from the following years, we will see the SEC increase while the SECV and SEP decrease and come closer to the SEC. The three statistics keep converging, but the SECV and SEP do not fall below the SEC.
The idea is to continue adding samples and variability while the SECV is significantly different from the SEC and while the SEP is significantly different from the SECV.
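One rough way to judge whether two of these errors are "significantly different" is an F-ratio on the squared errors. The sketch below is only an illustration: the function name compare_errors, the error values and the degrees of freedom are hypothetical, and the F-test assumes the two errors are independent estimates, which is only approximately true for SEC and SECV.

```r
# Rough F-ratio comparison of two standard errors (illustrative sketch only).
# se_small / se_large: the smaller and the larger error (e.g. SEC and SECV).
# df_small / df_large: degrees of freedom behind each error (assumed known).
compare_errors <- function(se_small, se_large, df_small, df_large, alpha = 0.05) {
  f_ratio <- (se_large / se_small)^2            # ratio of variances
  f_crit  <- qf(1 - alpha, df_large, df_small)  # one-sided critical value
  list(f_ratio = f_ratio, f_crit = f_crit, different = f_ratio > f_crit)
}

# Hypothetical values: an early calibration and a more mature one.
early  <- compare_errors(se_small = 0.30, se_large = 0.45, df_small = 100, df_large = 100)
mature <- compare_errors(se_small = 0.30, se_large = 0.31, df_small = 100, df_large = 100)
print(early$different)
print(mature$different)
```

While the comparison still reports a difference, the calibration set probably needs more samples and variability.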

23 sept. 2017

Draft of Win ISI diagram

Working on the main diagram of Win ISI for a presentation. This is a draft and I have to add more tools from new versions.

18 sept. 2017

Diagnostics : Peak to Peak (P2P)

It is the way to see whether we have extreme peaks in the noise spectra (in this case, due to encoder noise).
It is the absolute difference between the absorbance at the highest peak and the absorbance at the lowest peak.
The manufacturers fix the limit for this value according to the quality of the instrument components.
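As a minimal sketch of the calculation, assuming the noise spectrum is available as a numeric vector of absorbances. The simulated noise, the artificial spike and the limit are hypothetical values, not manufacturer specifications.

```r
# Peak-to-peak (P2P) of a noise spectrum: highest minus lowest absorbance.
p2p <- function(noise) {
  max(noise) - min(noise)
}

set.seed(1)
noise <- rnorm(700, mean = 0, sd = 2e-5)  # simulated noise spectrum (hypothetical)
noise[350] <- 3e-4                        # artificial spike, e.g. an encoder glitch
p2p_limit <- 2e-4                         # hypothetical limit, not a real specification

value <- p2p(noise)
cat("P2P:", value, "- within limit:", value <= p2p_limit, "\n")
```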

7 sept. 2017

PUZZLE: Spectra Reconstruction

I usually explain the concept of spectra reconstruction as trying to solve a puzzle. We have the pieces (the loadings) which, once multiplied by the scores, have different sizes but the same pattern.

We fit all the pieces, but it can be that the puzzle does not fit correctly and that some gaps or spaces are left unfilled. This is the concept of spectra reconstruction, where we have an error matrix which is the part of the puzzle not completed. It can be small or large.

One application of this concept is to decide whether a spectrum belongs to a certain category.
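The puzzle idea can be sketched with a PCA in base R: the reconstructed spectra are the scores multiplied by the loadings (plus the mean spectrum), and the error matrix is what is left over. The data here are simulated and the helper reconstruct is a hypothetical name, so take it as an illustration rather than the way any commercial software does it.

```r
set.seed(1)
n <- 30; p <- 50
X <- matrix(rnorm(n * p), n, p)   # simulated "spectra" (hypothetical data)
pca <- prcomp(X, center = TRUE, scale. = FALSE)

# Rebuild the spectra from the first `a` scores (T) and loadings (P).
reconstruct <- function(pca, a) {
  T_a <- pca$x[, 1:a, drop = FALSE]
  P_a <- pca$rotation[, 1:a, drop = FALSE]
  sweep(T_a %*% t(P_a), 2, pca$center, "+")   # add the mean spectrum back
}

E5  <- X - reconstruct(pca, 5)    # error matrix: the unfilled part of the puzzle
E10 <- X - reconstruct(pca, 10)   # more pieces, smaller gaps

rms5 <- sqrt(rowMeans(E5^2))      # one residual RMS per sample
print(round(head(rms5), 3))
```

A sample with an unusually large residual RMS is a candidate for not belonging to the category the PCA was built on.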

In this blog you will find posts about spectra reconstruction.

6 sept. 2017

Sub-sample and sub-scan concept

When we analyze heterogeneous samples, it is normal to use large cups. The cup rotates and stops at certain positions called sub-samples (suppose eight). At each sub-sample several sub-scans (normally four) are acquired.
So we have eight sub-sample spectra, each the average of four sub-scans. In total we have thirty-two scans.
We can predict each of the sub-samples to get eight predictions and calculate the standard deviation for every constituent, in order to see the heterogeneity of the sample composition.
We can export the average spectrum of all the sub-samples, the eight spectra of the eight sub-samples, or the thirty-two sub-scans for further study.
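The bookkeeping can be sketched in R as follows. The scans are simulated and the regression coefficients b0 and b are hypothetical, standing in for a real prediction model.

```r
set.seed(123)
n_sub <- 8; n_scan <- 4; n_wav <- 100

# Simulated sub-scans: sub-sample x sub-scan x wavelength (hypothetical data).
scans <- array(rnorm(n_sub * n_scan * n_wav, mean = 0.5, sd = 0.01),
               dim = c(n_sub, n_scan, n_wav))

sub_spectra  <- apply(scans, c(1, 3), mean)  # 8 sub-sample spectra (mean of 4 sub-scans)
avg_spectrum <- colMeans(sub_spectra)        # average spectrum of the whole cup

# Hypothetical linear model standing in for a real equation.
b0 <- 10
b  <- rnorm(n_wav, sd = 0.5)
pred <- as.vector(b0 + sub_spectra %*% b)    # one prediction per sub-sample

cat("Mean prediction:", mean(pred), " SD between sub-samples:", sd(pred), "\n")
```

A large SD between sub-sample predictions points to a heterogeneous sample, or to a model that is sensitive to the variation between sub-samples.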

5 sept. 2017

Thanks a lot (more than 250,000 visits to this blog)

Thanks to all of you who read and follow this blog. We have passed 250,000 visits in the life of this blog and I am really happy about that.
Of course these are the main countries that visit the blog, but I appreciate visits from many other places.

                  thanks so much

Maximum Distance (Discriminate Method)

I wrote in other posts about the Maximum Distance algorithm for discriminating products. In summary, we take the spectra of a set of training samples, apply a math treatment, and calculate the standard deviation at each wavelength (s) in order to set limits around the average spectrum. New samples must fall within the defined limits in order to be classified as members of this group.

Therefore, in the areas of the spectra where there is more variability, the limits will be wider than in the areas of lower variability.
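A minimal sketch of the idea in R, assuming the limits are the average spectrum plus or minus k times the standard deviation at each wavelength. The training data are simulated and k = 3 is a hypothetical choice, not necessarily the setting used by the commercial software.

```r
set.seed(10)
n <- 40; p <- 60
sd_true <- seq(0.5, 2, length.out = p)                       # variability grows along the axis
train <- sapply(sd_true, function(sdv) rnorm(n, 0, sdv))     # n x p treated spectra (simulated)

avg <- colMeans(train)        # average spectrum
s   <- apply(train, 2, sd)    # standard deviation at each wavelength
k   <- 3                      # hypothetical width of the limits

upper <- avg + k * s
lower <- avg - k * s

# Largest standardized distance of a spectrum from the average spectrum.
max_distance <- function(x) max(abs(x - avg) / s)
# A sample belongs to the group if it stays inside the limits everywhere.
in_group <- function(x) all(x >= lower & x <= upper)

outlier <- avg
outlier[5] <- avg[5] + 10 * s[5]   # push one wavelength far outside the limits
print(in_group(avg))
print(in_group(outlier))
```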

We can first look at the spectra (gray) with the math treatment applied, and draw the average spectrum (red):

Then we can overplot the standard deviation spectrum (green) on the average spectrum (red), in order to imagine how the limits will fit.

I will try to work on this by exporting the spectra to Excel, to show it better in a new post.


4 sept. 2017

How to load the Check Sample product in MOSAIC

On the DS or DA instruments, the check sample comes with a USB pen drive containing some files (DA1650, DS2500 and DS2500F) with the extension "mcf". If we import one of these files into Mosaic, the parameters, the prediction model and the check cell product are installed at the same time, so there is no need to go step by step and the instrument is ready to analyze the Check Sample more quickly.

Go to your Network, right-click with the mouse and choose "Import instrument group configuration". Browse to the appropriate "msc" file on the USB pen drive. The configuration will be loaded and ready to work in NOVA if we use Mosaic Solo or, after synchronization, if we use Mosaic Network.

7 Aug. 2017

Certified Reference Materials for Spectroscopy

A catalogue of “Starna” (www.starna.com) Certified Reference Materials for spectrometers came with the last issue of Spectroscopy Europe.
The materials have a function and a range. The function is the purpose of the filter: to check absorbance, accuracy, stray light or resolution.
The range can cover the UV, visible, NIR and FTIR regions of the electromagnetic spectrum.

In the case of NIR there are some filters like:

NIR Neutral Density Glass References (800 – 3200 nm): To check the absorbance accuracy and linearity of NIR spectrometers.

Metal on Quartz filters (250 – 3200 nm): With Absorbance and Transmittance values certified at different wavelengths.

NIR Solution References (900 – 2600 nm): With 14 certified peaks for wavelength qualification purposes.

Chloroform Stray Light Cell (at approx. 2365 nm): To check Stray Light.

Polystyrene NIR References (NIR and MIR range): With 14 certified peaks in the MIR spectrum. In addition, eight peaks in the NIR spectrum. These calibration values are traceable to NIST SRM 2065.

Didymium Glass Filter (430 – 890 nm): This filter has 11 peaks covering this range (four peaks over 700 nm).

Wide Range Wavelength Reference (335 – 1945 nm): This filter has 20 peaks in this range (nine of them over 700 nm). It is equivalent to NIST SRM 2065.

You can download the catalog from:

1 ago. 2017

Checking Wavelength Accuracy (XDS)

It is important to check the accuracy of the wavelength peaks using, if possible, a NIST standard, as in this case.

The manufacturer sends a file showing the accuracy of the instrument against this standard, and we have to verify it periodically to see whether it shifts.

The verification tells us if the deviation is larger than recommended, even if the diagnostics pass. In that case we perform an Instrument Calibration, and the values come closer to those the instrument had when it left the factory.
The Delta value is the difference between the nominal value and the found value.
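The Delta check can be sketched in a few lines of R. The peak positions and the tolerance below are hypothetical numbers chosen for illustration, not values from a real standard or from the instrument report.

```r
# Hypothetical nominal and found peak positions (nm) and tolerance.
nominal   <- c(1000, 1500, 2000)
found     <- c(1000.2, 1499.7, 2000.6)
tolerance <- 0.5

delta <- nominal - found   # Delta = nominal value minus found value

report <- data.frame(nominal, found, delta,
                     out_of_tolerance = abs(delta) > tolerance)
print(report)
```

Keeping these reports over time makes it easy to see whether the peaks are drifting before the diagnostics fail.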

11 jul. 2017

Dror Sharon: This Tiny Molecular Sensor can Identify a Viagra Pill | WIR...

I am a reader of Wired magazine, and it is nice to see how NIR technology is becoming part of this digitalized world. It will be amazing to see what this technology can bring in the future.

3 jul. 2017

Lid Adjustment - NIRS™ DS2500

The first DS2500 instruments (Generation 1) don't have this system to adjust the gap of the door, and the adjustment is more difficult. If your instrument does not have the two screws for removing the cover shown in the video, it is because it is a "Generation 1" instrument.
The instrument shown in the video is a "Generation 2".

Considering and checking subsample variation

When analyzing a heterogeneous sample, several subsamples are acquired from a large cup, and an average of all the subsamples is shown as the final result. It is this average result that we normally give importance to and compare with the lab value to know the accuracy of our measurement. Still, it is important to look at the individual results for the different subsamples.

One reason for this is to check how homogeneous our sample is, by looking at the standard deviation of the predictions for each constituent. But we can see that, depending on the math treatment applied in the equation, the standard deviation of the subsample predictions changes, and in some cases becomes quite large. This is something we have to consider in order to build a robust calibration.
Always remember to look at the subsample spectra and draw conclusions by comparing the spectral RMS with the SD for the different subsamples.

22 jun. 2017

How to check the cooling liquid - NIRS™ DS2500

If for any reason we have to change the liquid in the cooling circuit, we can see how the liquid is drawn in by the pump. We repeat the process several times, checking that the air comes out of the circuit and that the tank is filled and purged.


Filter Replacement - NIRS™ DS2500

Lamp Replacement - NIRS™ DS2500

Instrument Calibration - NIRS™ DS2500

Checking Temperatures in DS2500 (Lamp)

For the DS2500 to perform optimally, we have to pay attention to the lamp temperature when running the diagnostics. I consider around 35 ºC to be fine.
Sometimes we find high temperatures like the one in the picture and, even though the report says it is OK, this temperature can affect the instrument itself and the results.
One of the causes of this temperature increase is that the pump tank has lost water, so it is a good idea to check the level and top it up if necessary.

Checking pump level video
Check that the pump is pumping. We should see some turbulence in the water and hear a slight noise from the pump.
Check whether the water is too dirty or has algae.
Check that the fan is working; its job is to keep the water cold. Also check that the filter is clean, so that the fan can do its job better.

Changing the filter

The temperature of the room or laboratory where the instrument is located is also important. A higher room temperature will also increase the lamp temperature.

After checking all these points, and being sure that the lamp is fine, it may be the moment to run an instrument calibration:

Instrument Calibration


19 jun. 2017

Comparing Residuals, GH and T when validating

When looking at the validation statistics, it is important to look at three values at the same time for every sample: the residual, the GH and the T value. From this data (fiber), we can check whether a sample is extrapolating badly, whether the model is not robust, or whether there are other issues.

In this case, as we can see, there are samples with a very high GH, and those samples have in common that the T statistic is negative (in the left tail of the Gaussian bell) and also quite large in absolute value.
These samples also have the highest residual values.
Something is telling us that these samples are special and are not well represented by the equation. The PCA is working fine and is detecting these samples as outliers, but we need to know what makes these samples special.

These samples are soy meal and have higher fat values than the ones in the calibration, so the model did not learn enough about the interaction between the fiber bands and the fat bands. These samples are therefore very interesting for making the calibration more robust.

After checking this, we can add these samples to the calibration to improve the results of the next validation.
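Looking at the three values together can be sketched like this in R. Everything here is an assumption for illustration: the validation values are invented, the residual is taken as lab minus predicted, the T value is taken as the residual divided by a hypothetical SEC, and the limits (GH above 3, absolute T above 2.5) are commonly used defaults that may differ in your software.

```r
# Hypothetical validation results for fiber (invented numbers).
val <- data.frame(
  lab  = c(3.1, 3.4, 2.9, 3.8, 5.2, 5.6),
  pred = c(3.0, 3.5, 3.0, 3.7, 6.1, 6.4),
  GH   = c(0.8, 1.1, 0.6, 1.4, 4.2, 5.0)
)
sec <- 0.25                                   # hypothetical SEC of the equation

val$residual <- val$lab - val$pred            # assumed sign convention
val$t_value  <- val$residual / sec            # assumed definition of T
val$flag     <- val$GH > 3 | abs(val$t_value) > 2.5

print(val[val$flag, ])                        # samples worth a closer look
```

Samples flagged on all three values at once are the ones worth studying first, and possibly adding to the calibration.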

Graphically, in Excel, we can see the interaction between the residuals, GHs and T values:

22 may. 2017

Mosaic 7.12 is now available on our Europe server

Mosaic version 7.12 is now available on our Europe server.
When you try to connect, you should be prompted to download and install the new client automatically.
User accounts and passwords remain the same.

Ports used for NOVA:
Configure the ports correctly with your IT department for a successful synchronization.

7 may. 2017

Easy way to check the eigenvalues with the T (scores) matrix

Another interesting matrix multiplication is the product of the transpose of the score matrix T by T itself (Tt%*%T):

This product gives us a square matrix (a × a), “a” being the number of loadings or PCs chosen, and its diagonal holds the eigenvalues, which are related to the amount of variance explained by each loading.

If we plot the diagonal we can see how the eigenvalue decreases with every loading. This plot can help us decide how many loadings or PCs to choose.
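This can be checked with base R on simulated data. The sketch uses prcomp rather than Resemble, and the data matrix is hypothetical. With centered data, the diagonal of the product equals (n - 1) times the variance of each component.

```r
set.seed(42)
X <- matrix(rnorm(40 * 20), 40, 20)      # simulated data (hypothetical)
pca <- prcomp(X, center = TRUE, scale. = FALSE)

a <- 6
T_scores <- pca$x[, 1:a]                 # score matrix T (n x a)
TtT <- crossprod(T_scores)               # t(T) %*% T, an a x a matrix
eig <- diag(TtT)                         # eigenvalues, one per component

print(round(TtT, 6))                     # off-diagonal cells are practically zero
if (interactive()) {
  plot(eig, type = "b", xlab = "PC", ylab = "Eigenvalue")
}
```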


6 may. 2017

Checking the orthogonality of P (loadings) matrix

One of the values we got in the script of the post "Tutorials with Resemble (Part 3 - orthoProjection)" was the loadings matrix (X.loadings), or what we usually call the P matrix in this blog.

One of the characteristics of the loadings matrix “P” obtained from a PCA is that if we multiply it by its transpose we get the identity matrix “I”:

P%*%Pt = I

In the “I” matrix the diagonal values are “1” and all the other cells are “0”, indicating that all the loadings are orthogonal to each other.

  • Check it yourself and extract the diagonal of the product.
  • Represent in a graphic the first loadings:
    • 1 vs 2      : a plane
    • 1, 2 and 3: a cube
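A quick way to try this without Resemble is base R's prcomp on simulated data. The sketch assumes the loadings are stored with one loading per row (a × p), so that P%*%Pt is the small a × a identity; prcomp stores them by column, hence the transpose.

```r
set.seed(5)
X <- matrix(rnorm(40 * 20), 40, 20)   # simulated data (hypothetical)
pca <- prcomp(X, center = TRUE, scale. = FALSE)

a <- 3
P <- t(pca$rotation[, 1:a])           # loadings as rows: a x p
PPt <- P %*% t(P)                     # should be the a x a identity matrix

print(round(PPt, 10))
print(diag(PPt))                      # the diagonal: all ones

if (interactive()) {
  matplot(t(P), type = "l", lty = 1, xlab = "Variable", ylab = "Loading")
}
```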

19 Apr. 2017

How to load a REP file in a MOSAIC LOCAL Prediction Model

If we use the MONITOR in Win ISI or a LOCAL Prediction Model in ISI Scan, there is a field to load the REP file (a ".nir" file which includes the variation we want to minimize in the model, such as temperature, differences between instruments, differences between the pathlengths of the gold reflectors, etc.). This way LOCAL uses the REP file when developing the calibration.

In MOSAIC the REP file must be loaded in a different way.

As usual we load the ".RED" file, reduced with the appropriate math treatment, and we set the maximum and minimum number of factors and samples. But where do I load the repeatability file (.NIR)?

😏...Easy but tricky.

Rename the extension of the repeatability file from ".NIR" to ".REP" and give this file the same name as the ".RED" file; put them both in the same folder. Now when you import the ".RED" file into the LOCAL Prediction Model, the ".REP" file will go with it. Just check it on the Links tab of the LOCAL P.M.
As you know, something similar happens when we load an ".EQA" file: the ".PCA" and ".LIB" files are loaded as well.

Thanks to Montse for testing this feature...😉

24 mar. 2017

Tutorials with Resemble (Part 3 - orthoProjection)

Using orthoProjection:
One of the functions of Resemble is “orthoProjection”, and we can use it with different options. Let's check the simplest one in this post:
oP<-orthoProjection(Xr=der.Xr, X2 = NULL,
                    Yu = NULL,method = "pca",
                    pcSelection = list("cumvar",0.99),
                    center = TRUE, scaled = FALSE,
                    cores = 1)
We can use the training data from the previous post, with the SG filter (just for smoothing) and the first derivative: der.Xr
The method we use is “pca”, so we don't need the reference data “Yr”. We don't use any additional set, so X2 = NULL.
The number of terms is chosen to explain a cumulative variance of 99%.
We center the spectra, and we don't scale them.
Now run this script in R (make sure that the package Resemble is loaded: library(resemble)).
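To see what a cumulative-variance rule of 99% means, here is a base R sketch with prcomp on simulated data. It is not the Resemble implementation, only an illustration of the rule: the number of terms is the first one at which the cumulative explained variance reaches 0.99.

```r
set.seed(3)
n <- 50; p <- 80
latent <- matrix(rnorm(n * 3), n, 3)            # three hidden sources of variation
loads  <- matrix(rnorm(3 * p), 3, p)
X <- latent %*% loads + matrix(rnorm(n * p, sd = 0.05), n, p)   # simulated spectra

pca <- prcomp(X, center = TRUE, scale. = FALSE)  # centered, not scaled

explained <- pca$sdev^2 / sum(pca$sdev^2)        # variance explained by each term
cumvar    <- cumsum(explained)                   # cumulative explained variance
n_comp    <- which(cumvar >= 0.99)[1]            # first term reaching 99%

cat("Terms needed for 99% of the variance:", n_comp, "\n")
```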

Now we can check the values we get:
[1] "scores" "X.loadings" "variance" "sc.sdv" "n.components"
[6] "pcSelection" "center" "scale" "method"

  • scores: matrix T of scores
  • X.loadings: matrix P of loadings
  • variance: the eigenvalues, and the cumulative and explained variance
  • n.components: number of terms chosen to explain 99% of the variance
  • pcSelection: cumvar 0.99
  • center: the average spectrum

Check all these values and matrices.
3.1.......Practice plotting the average spectrum. (page Exercises)
3.2.......Play with the accumulative variance.     (page Exercises)
3.3.......Plot the loadings.                                 (page Exercises)
3.4.......Plot combinations of score Maps            (page Exercises)

And enjoy Chemometrics with R!

23 mar. 2017

Tutorials with Resemble (part 2)

If you have practised with the post Tutorials with Resemble (part 1), you can continue adding more script following the recommendations of the Resemble package. This time we add another math treatment on top of the previous SG filter.
Once the "sg" function has been applied, we can calculate the first derivative to better define the variance in the spectra. The Resemble manual shows how to convert the spectra to a first derivative using differences. We can do it for the calibration and the validation sets:

der.Xr <- t(diff(t(Xr), lag = 1, differences = 1))
der.Xu <- t(diff(t(Xu), lag = 1, differences = 1))

In this case we lose one data point at the left of the spectra, so we have to redefine the wavelengths in order to plot the first derivative.
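A self-contained sketch of that step, using simulated spectra and a hypothetical wavelength axis (wav) instead of the NIRsoil data: the derivative has one column less than the spectra, so the first wavelength is dropped.

```r
wav <- seq(1110, 2488, by = 2)                          # hypothetical wavelength axis
X <- t(sapply(1:5, function(i) sin(wav / 200 + i / 10)))  # 5 simulated spectra

der <- t(diff(t(X), lag = 1, differences = 1))          # first derivative by differences
wav_der <- wav[-1]                                      # drop the first wavelength

cat("Columns in derivative:", ncol(der), " wavelengths:", length(wav_der), "\n")
if (interactive()) {
  matplot(wav_der, t(der), type = "l", lty = 1, col = "black",
          xlab = "Wavelength (nm)", ylab = "First derivative")
}
```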


and we get this plot:

Practise doing the same for the validation set Xu, overplotting its spectra on the training set Xr.
Do you see significant differences?
Enjoy using Chemometrics with R.

20 mar. 2017

Tutorials with Resemble (part 1)

I see that some of you are interested in the package "Resemble", so I'm going to rewrite some of the posts with this package, so that we can better understand the LOCAL concept we have been treating with Win ISI.

The examples use the NIRsoil data that we can get from the package "prospectr".
If we want to plot the raw spectra, just write this script:

The Resemble manual recommends applying an SG filter without derivatives to smooth the spectra, so in this case we proceed as in the manual:
sg <- savitzkyGolay(NIRsoil$spc, p = 3, w = 11, m = 0)
NIRsoil$spc <- sg
Now the spectra are truncated on both sides, so we have to create a new wavelength vector (wavelength_sg) for the truncated range,
and then we can plot the filtered spectra:
matplot(wavelength_sg, t(NIRsoil$spc), type = "l", col = "black")

You won't see much difference from the raw spectra.
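The truncated wavelength axis can be worked out from the filter window. This sketch assumes the NIRsoil spectra run from 1100 to 2498 nm in 2 nm steps; with a window of w = 11 points, the filter drops (w - 1) / 2 = 5 points on each side.

```r
wavelength <- seq(1100, 2498, by = 2)   # assumed NIRsoil wavelength axis (700 points)
w <- 11                                 # SG window size used above
half <- (w - 1) / 2                     # points lost on each side

wavelength_sg <- wavelength[(half + 1):(length(wavelength) - half)]

cat("Points:", length(wavelength_sg),
    " range:", min(wavelength_sg), "to", max(wavelength_sg), "nm\n")
```

It is worth checking that length(wavelength_sg) matches the number of columns of the filtered spectra before plotting.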

Now we split the data into a training set (Xr, Yr) and a validation set (Xu, Yu):
Xu <- NIRsoil$spc[!as.logical(NIRsoil$train),]
Yu <- NIRsoil$CEC[!as.logical(NIRsoil$train)]
Xr <- NIRsoil$spc[as.logical(NIRsoil$train),]
Yr <- NIRsoil$CEC[as.logical(NIRsoil$train)]
and we remove the samples without reference values from both sets:
Xu <- Xu[!is.na(Yu),]
Xr <- Xr[!is.na(Yr),]
Yu <- Yu[!is.na(Yu)]
Yr <- Yr[!is.na(Yr)]

Practise making plots of the spectra of the different sets again. Overlay the training and validation sets with different colors, and enjoy using R for chemometrics.

6 mar. 2017

Neighborhood Mahalanobis distance matrix

Working with the chemometric packages in R helps us to better understand other commercial chemometric software.

In Resemble we can use the function fDiss to get a matrix of distances between all the samples in a spectral data set. We get a square, symmetric matrix with zeroes on the diagonal, because the distance between a sample and itself in the PCA space is zero. This way we can see redundant information and remove it from the spectra set. In the end we get a well-distributed cloud of samples, and the average spectrum is more representative of all of them.

Here I just trim the matrix in order to see how close the spectra of the first 10 samples are to each other.
The spectra used were the NIRsoil data from R.
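The same kind of matrix can be sketched in base R: a Mahalanobis distance in the PCA space is the Euclidean distance between scores that have been divided by their standard deviations. The data are simulated, and the scaling may differ from the one fDiss applies, so the numbers are not meant to match Resemble.

```r
set.seed(7)
X <- matrix(rnorm(25 * 40), 25, 40)     # simulated spectra (hypothetical)
pca <- prcomp(X, center = TRUE, scale. = FALSE)

a <- 5
Z <- sweep(pca$x[, 1:a], 2, pca$sdev[1:a], "/")   # standardized scores
D <- as.matrix(dist(Z))                           # sample-to-sample distances

print(round(D[1:10, 1:10], 2))          # trimmed view: first 10 samples
```

Pairs of samples with very small distances carry nearly the same information, so one of each pair is a candidate for removal.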

5 mar. 2017

Weighted Average (LOCAL)

We have seen in the post LOCAL optimization how, when giving a prediction, LOCAL uses the whole range of PLS terms we have fixed in the options (Min to Max number of terms), and the result is a weighted average of the predictions of all the models. So choosing the right range is important to get more accurate predictions.
Looking at the Resemble R package documentation you can see some explanation of how the calculations are made:

"Weighted average pls ("wapls1"): It uses multiple models generated by multiple pls components (i.e. between a minimum and a maximum number of pls components). At each local partition the final predicted value is a weighted average of all the predicted values generated by the multiple pls models. The weight for each component is calculated as follows:

$$w_j = \frac{1}{s_{1:j} \times g_j}$$

where $s_{1:j}$ is the root mean square of the spectral residuals of the unknown (or target) sample when a total of $j$ pls components are used and $g_j$ is the root mean square of the regression coefficients corresponding to the $j$th pls component (see Shenk et al., 1997 for more details).
"wapls1" is not compatible with valMethod = "loc_crossval" since the weights are computed based on the sample to be predicted at each local iteration."
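A small numeric sketch of the weighting, assuming each weight is the inverse of the product of the two root mean squares, w_j = 1 / (s_{1:j} × g_j), and that the final prediction is the weight-normalized average. The function name wapls_predict and all the numbers are hypothetical.

```r
# Weighted average of the predictions from models with different numbers of terms.
# pred: prediction from each model; s: RMS of the spectral residuals of the
# unknown sample; g: RMS of the regression coefficients of each model.
wapls_predict <- function(pred, s, g) {
  w <- 1 / (s * g)              # assumed weight for each number of terms
  sum(w * pred) / sum(w)        # weight-normalized average
}

pred <- c(10, 13)               # hypothetical predictions from two models
s    <- c(1.0, 0.5)             # the second model fits the spectrum better
g    <- c(1.0, 1.0)

print(wapls_predict(pred, s, g))
```

Models that leave smaller spectral residuals, or have smaller regression coefficients, pull the final value towards their own prediction.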