R & Chemometrics: 2020

30 Dec 2020

ALL THE BEST FOR THIS YEAR 2021

Spectra and Traffic Signs?

Several of the Data Camp courses are dedicated to "Machine Learning" and use databases of different kinds, but those of us who want to apply these techniques and R packages to spectroscopy would like to do so with spectra and, in the case of this blog, with NIR spectra. It is therefore interesting to extrapolate the examples and exercises used in those courses to spectroscopy as far as possible.

One of the cases used in the Machine Learning courses is how autonomous vehicles recognize traffic signs in order to make certain decisions (for example, stopping when they see a Stop sign). This requires a database of photographs of traffic signs in different situations (weather, time of day, angle, ...). These images will have different backgrounds (trees, sky, ...), brightness levels, etc., so a large number of images is needed. Here we could also think of a large database of spectra acquired on different instruments, with different sample presentations, operators, etc.

A traffic sign image is a pixel matrix (for example 4 × 4) with three color layers, so we have 16 × 3 = 48 variables, which will also be correlated with each other, just like spectral variables. When the car sees a traffic sign, it compares it with the database and, using a distance-based algorithm such as KNN (k-nearest neighbors), determines what type of sign it is and makes a decision that may or may not be correct. This series of decisions is validated to check whether the image recognition model is working correctly. The same would happen when we acquire a spectrum and the model predicts it as belonging to a species, region, ..., and based on that decides, for example, to apply another quantitative method. In both cases we use models that compress the space, removing the correlations between variables with methods such as PCA (Principal Component Analysis).
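The nearest-neighbor idea can be sketched in a few lines of Python. This is only a minimal illustration, not the method used in the courses or in any NIR software: the function name `knn_predict`, the toy four-variable "spectra" and the class labels are invented for the example, and plain Euclidean distance on the raw variables is assumed (no PCA compression).

```python
import math
from collections import Counter

def knn_predict(library, labels, query, k=3):
    """Classify `query` by majority vote among its k nearest library rows."""
    # Euclidean distance from the query to every library spectrum
    distances = [math.dist(row, query) for row in library]
    # Indices of the k smallest distances
    nearest = sorted(range(len(library)), key=lambda i: distances[i])[:k]
    votes = Counter(labels[i] for i in nearest)
    return votes.most_common(1)[0][0]

# Toy library: hypothetical absorbances at 4 wavelengths
library = [
    [0.10, 0.20, 0.30, 0.40],
    [0.11, 0.21, 0.29, 0.41],
    [0.12, 0.19, 0.31, 0.39],
    [0.50, 0.60, 0.70, 0.80],
    [0.52, 0.61, 0.69, 0.82],
    [0.49, 0.58, 0.71, 0.79],
]
labels = ["wheat", "wheat", "wheat", "soy", "soy", "soy"]

unknown = [0.51, 0.59, 0.70, 0.81]
predicted = knn_predict(library, labels, unknown, k=3)
print(predicted)
```

Changing `k` changes which library samples vote, which is why the choice of "k" has to be validated against samples with known classes.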

Those of us who work with LOCAL calibrations can see a certain similarity: we set the value of "k", the number of neighbors of the unknown sample (as in the KNN function), within a certain range of values, and depending on the samples selected the model gives a result that can be compared with the real value to check how well it works.

5 Dec 2020

FOSS IBERIA NIR FORUM 2020 (Video Summary)

1 Dec 2020

Nouvelles stratégies pour la modélisation LOCAL

Interesting lecture from Pierre Dardenne:

FOSS CALIBRATOR TIPS (SPLITTING RULE)

When we import a "cal" file into Foss Calibrator, we can split it into a training set and a validation set. We choose the percentage of samples that goes to training (80 by default) and to validation (20 by default), and how the split is made: randomly, preserving the same distribution for every parameter, or based on time (the most recent 20% goes to validation and the older samples go to training).
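As a rough illustration of two of these splitting rules, here is a small Python sketch. It is not how Foss Calibrator implements the split: the function names, the `date` field and the toy sample list are assumptions made for the example, and the distribution-preserving option is left out.

```python
import random

def random_split(samples, train_fraction=0.8, seed=0):
    """Shuffle a copy of the samples and cut it at train_fraction."""
    shuffled = samples[:]
    random.Random(seed).shuffle(shuffled)
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]

def time_split(samples, train_fraction=0.8):
    """Oldest samples go to training, the most recent ones to validation."""
    ordered = sorted(samples, key=lambda s: s["date"])
    cut = round(len(ordered) * train_fraction)
    return ordered[:cut], ordered[cut:]

# Ten hypothetical samples with ISO dates (ISO strings sort chronologically)
samples = [{"id": i, "date": f"2020-01-{i + 1:02d}"} for i in range(10)]

train_r, valid_r = random_split(samples)
train_t, valid_t = time_split(samples)
print([s["id"] for s in valid_t])
```

A time-based split is usually the harder test, because the validation samples are all newer than anything the model has seen.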

Alternatively, we can import the "cal" file as a Validation set (used in the models for validation), as a Training set (used to develop the calibration), or as None. This last option is useful to hide the set, in some way, from the development of the model and change it later to Validation to check the performance on this particular set.

6 nov 2020

Selecting between ANN, MPLS or LOCAL calibrations

What is the best algorithm to analyze pH in soil? I tried MPLS, ANN and LOCAL. The models were developed with a training set, and we check their performance with a test set.

We can see that the performance is almost the same for ANN and LOCAL versus the MPLS model.

LOCAL models have the advantage that we get the GH and NH values, so we can recalculate the statistics after removing the samples with high GH values, which would be marked in red if the test samples were analyzed in routine.

Sorry for some black cuts in the video.

28 oct 2020

SPAIN COVID-19 reports in "R"

An interesting webpage where we can follow the evolution of COVID-19 in Spain. It is developed with R, so I recommend that all R users take a look at it.

Link added on left side of the blog.

Please take care and keep safe.

30 sept 2020

Monitor function: Improved boxplot distribution

Adding the "edaplot" function to the predicted and reference values gives a better idea of their distributions and helps to understand how the model works, so this option is used to update the monitor boxplot function.

25 sept 2020

Monitor Package Demo 03

28 Aug 2020

NIR - Comparison of the DS2500 vs previous models - Webinar

9 jul 2020

FOSS CALIBRATOR: Tutorial 007

This time we want to test the performance of the models with a new sample set that I have imported as a Validation set (so it is not divided into training and validation as in other cases).

First we check whether there are any strange spectra (which is not the case), and then we go to Models _ Predict to see how the new samples appear in the XY validation plot versus the samples used during the development of the model. A clear bias appears, so we have to improve the model by adding this new variability (new company, new batches, samples much more recent than those used in the calibration, new instrument, different laboratory, ...).
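The bias seen in an XY plot can also be put into numbers. The sketch below is a generic calculation, not the output of Foss Calibrator: the function name, the sign convention (predicted minus reference) and the five toy values are assumptions for illustration.

```python
import math

def bias_and_sep(predicted, reference):
    """Return (bias, SEP) for paired predicted and reference values."""
    residuals = [p - r for p, r in zip(predicted, reference)]
    n = len(residuals)
    # Bias: mean of the residuals (predicted minus reference)
    bias = sum(residuals) / n
    # SEP: standard deviation of the residuals after removing the bias
    sep = math.sqrt(sum((e - bias) ** 2 for e in residuals) / (n - 1))
    return bias, sep

# Hypothetical validation set where every prediction is too high
reference = [10.0, 11.0, 12.0, 13.0, 14.0]
predicted = [10.6, 11.4, 12.5, 13.6, 14.4]

bias, sep = bias_and_sep(predicted, reference)
print(round(bias, 2), round(sep, 2))
```

A bias that is large compared with the SEP suggests a systematic shift (for example a new instrument or laboratory) rather than random error.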

3 jun 2020

FOSS CALIBRATOR: Tutorial 006

A calibration is robust if its predictions remain consistent regardless of how the training and validation sets are chosen, so we can try different training and validation sets selected randomly, by time, by preserving the parameter distribution, etc.

Foss Calibrator can help quite a lot in this part as you can see in the video.

2 jun 2020

Water Absorption in Wheat Flour

Understanding your product and its parameters is very important for the development of the calibration. XY plots and residual plots help to understand them, especially if you use colors, symbols or other resources to create them.

This case is a validation of water absorption in wheat flour. All the samples are represented: low, medium and high strength (Flojas, Media Fuerza y Fuerza), some with additives and others without.

Filters and other functions of ggplot or R help to see the figures in more detail and get interesting information to improve the calibration.

27 may 2020

FOSS CALIBRATOR: Tutorial 005

Time to create the outlier model to predict the Mahalanobis distances in the principal component space.

FOSS CALIBRATOR: Tutorial 004

This is video number 4 of the Foss Calibrator tutorials in Spanish, where a model is developed using the MPLS algorithm. After the model calculation we can see several plots and statistics.

Foss Calibrator is very fast for this type of model, so we can run several almost at the same time and choose the best one.

Review the statistics and plots, looking for patterns, outliers, etc.

12 may 2020

Creating a Single Sample Standardization

We create two single sample standardization files: one with the NIR5000 as Master and the DS2500F as Host, and the other with the NIR5000 as Host and the DS2500F as Master.

Depending on the scenario we can use one or the other.

Choosing one sample for the standardization

See first the other three videos:
Trim Spectra
RMS of subsamples
RMS between same samples in different instruments

In this video we want to select one of the samples to create a standardization file to transfer the soy meal database from the NIR5000 to the DS2500.

One rule of thumb is to select the sample with the lowest GH value, so that the sample is near the average of the spectral population of the soy database from which I developed the calibration on the NIR5000.

11 may 2020

RMS between same samples in different instruments

See first the previous two videos:
Trim Spectra
RMS of subsamples

Now we average the repacks and compare the RMS between the same samples scanned on different instruments, not yet standardized, using the Contrast spectra function of Win ISI.

Obviously the RMS values are higher than those between repacks of a sample on one instrument, because we are adding the difference between instruments. The idea is that after the standardization the RMS of the same sample between the two instruments should be similar to or, if possible, lower than the sampling error (RMS between repacks on the same instrument).

RMS of Subsamples

See first the previous video:

Trim spectra

It is important to know the sampling error, and for this reason the calculation of the RMS of subsamples is very important.

The RMS is a way to obtain a value for the spectral differences between all the repacks.

To calculate the RMS, simply scan different repacks of the same sample, homogenizing the sample between subsamples. Try to do it as well as you can in order to get a low RMS.
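One possible way to compute such an RMS is sketched below in Python. It is a generic root-mean-square calculation, not necessarily the exact formula or scaling used by Win ISI: the function names and the three toy repacks are assumptions, and each repack is compared against the mean spectrum of all repacks.

```python
import math

def rms_difference(spectrum_a, spectrum_b):
    """Root mean square of the point-by-point difference of two spectra."""
    squared = [(a - b) ** 2 for a, b in zip(spectrum_a, spectrum_b)]
    return math.sqrt(sum(squared) / len(squared))

def rms_of_repacks(repacks):
    """RMS of every repack against the mean spectrum of all repacks."""
    n = len(repacks)
    mean_spectrum = [sum(column) / n for column in zip(*repacks)]
    return [rms_difference(r, mean_spectrum) for r in repacks]

# Three hypothetical repacks of one sample, 4 data points each
repacks = [
    [0.20, 0.30, 0.40, 0.50],
    [0.22, 0.32, 0.42, 0.52],
    [0.18, 0.28, 0.38, 0.48],
]

values = rms_of_repacks(repacks)
print([round(v, 4) for v in values])
```

The same `rms_difference` function can be applied to the averaged spectra of one sample scanned on two instruments, which gives a value to compare against the repack RMS.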

Some products are quite heterogeneous, and for that reason the RMS can increase. If we grind the sample, the RMS will decrease considerably.

The RMS value will be useful for some comparisons during the standardization or database transfer.

TRIM SPECTRA

We have a certain number of samples scanned on two instruments (a NIR5000 and a DS2500F). Several repacks of each sample have been scanned on both instruments, because they have different sample presentations and different cups.

A sample with a certain ID was well homogenized and its content was split between the two cups. We repeated the process several times to increase the probability that the same sample was scanned on both instruments, which helps to see the differences between the instruments.

When we want to compare spectra files from different instruments, they must have the same range and the same number of data points. In the video we trim the spectra from a DS2500F (850-2500 nm, 0.5 nm) to the range and data points of a NIR5000 (1100-2500 nm, 2 nm).
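Because the 2 nm grid is a subset of the 0.5 nm grid, the trimming can be pictured as keeping every fourth data point from 1100 nm onwards. The Python sketch below shows only that idea; the function name `trim_spectrum` is hypothetical, and the software used in the video may interpolate or process the data differently.

```python
def trim_spectrum(absorbances, start=850.0, step=0.5,
                  new_start=1100.0, new_end=2500.0, new_step=2.0):
    """Keep only the points that fall on the coarser target grid."""
    stride = new_step / step                    # source points per target point
    offset = (new_start - start) / step         # index of the first kept point
    count = (new_end - new_start) / new_step + 1
    if any(v != int(v) for v in (stride, offset, count)):
        raise ValueError("target grid is not a subset of the source grid")
    stride, offset, count = int(stride), int(offset), int(count)
    if offset + (count - 1) * stride >= len(absorbances):
        raise ValueError("source spectrum does not cover the target range")
    return absorbances[offset::stride][:count]

# Use the wavelength axis itself as dummy data so the result is easy to check
n_points = int((2500 - 850) / 0.5) + 1
wavelengths = [850 + 0.5 * i for i in range(n_points)]

trimmed = trim_spectrum(wavelengths)
print(len(wavelengths), len(trimmed), trimmed[0], trimmed[-1])
```

After trimming, both files have the same number of data points at the same wavelengths, so the spectra can be overplotted or subtracted point by point.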

After that we can overplot or subtract the spectra.

The idea of all these upcoming videos is to show the process of database transfer from a NIR5000 to a DS2500 or DS2500F instrument.

26 Apr 2020

FOSS CALIBRATOR: Tutorial - 003

In the Sample menu, apart from viewing the spectra, we have the option to inspect the samples in a principal component space, looking for GH outliers. In this case, after changing the default configuration to the one I chose in the previous videos, I decide to remove the samples with a GH higher than 4.00.

These samples are marked as spectral outliers, and we can see them in red overlaid on the rest of the spectra, so we can inspect possible reasons for those GH values.
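To give an idea of what a GH filter does, here is a small Python sketch. It rests on several assumptions that are not taken from the software: the PC scores are already available, GH is taken as the squared Mahalanobis distance in the score space divided by the number of components, and the function names and toy scores are invented for the example.

```python
def gh_values(scores):
    """GH per sample from a list of PC-score rows (one row per sample)."""
    n = len(scores)
    n_pcs = len(scores[0])
    columns = list(zip(*scores))
    means = [sum(c) / n for c in columns]
    variances = [
        sum((v - m) ** 2 for v in c) / (n - 1) for c, m in zip(columns, means)
    ]
    result = []
    for row in scores:
        # PC scores are uncorrelated, so the squared Mahalanobis distance
        # reduces to a sum of squared, variance-scaled scores
        d2 = sum((v - m) ** 2 / var for v, m, var in zip(row, means, variances))
        result.append(d2 / n_pcs)
    return result

def flag_outliers(scores, limit=4.0):
    """Indices of the samples whose GH is above the limit."""
    return [i for i, gh in enumerate(gh_values(scores)) if gh > limit]

# 16 hypothetical samples around the center plus one far away on PC1
scores = [[x, y] for x in (-1.0, 1.0) for y in (-1.0, 1.0)] * 4 + [[10.0, 0.0]]

outliers = flag_outliers(scores, limit=4.0)
print(outliers)
```

Once the flagged samples are removed, the means and variances change, which is why the PCs are normally recalculated afterwards.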

We can recalculate the PCs once these samples are removed, but we will do that in the Models menu in the next videos. This time the idea was to remove what we can consider clear spectral outliers.

In the Sample menu we also have the option to run PLS to get an idea of how the calibration will work and to check whether there are clear reference outliers. If those values were typed incorrectly, we can edit them and enter the correct value.

PCA and PLS will be treated in detail in the Models menu, where we use the validation set and get the performance statistics in the case of PLS or MPLS models.

FOSS CALIBRATOR: Importing lab values into a ".nir" file
FOSS CALIBRATOR: Tutorial 001
FOSS CALIBRATOR: Tutorial 002

23 Apr 2020

FOSS CALIBRATOR: Tutorial 002

In this second tutorial we continue looking at the spectra in more detail, searching for noise that can be due to the sample presentation or other causes. Unless that noisy area contains important information, we can remove it from the calculation of outlier models and prediction models.

Using a higher derivative order or smaller gaps can help to detect noise.

If there is important information in the noisy area, try higher gaps or a lower derivative order to see if the shape of the spectra improves.
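The effect of the gap on noise can be seen with a very simplified finite difference. The Python sketch below is not the math treatment applied by the software: `gap_derivative` and `spread` are hypothetical helpers, and the noise is a deliberately simple alternating pattern added to a straight line.

```python
def gap_derivative(spectrum, gap=1):
    """First-order finite difference taken across `gap` data points."""
    return [spectrum[i + gap] - spectrum[i] for i in range(len(spectrum) - gap)]

def spread(values):
    """Range of the values, used here as a crude noise indicator."""
    return max(values) - min(values)

# Straight-line "spectrum" with point-to-point alternating noise
signal = [0.1 * i for i in range(20)]
noise = [0.01 if i % 2 == 0 else -0.01 for i in range(20)]
noisy = [s + n for s, n in zip(signal, noise)]

spread_gap1 = spread(gap_derivative(noisy, gap=1))
spread_gap2 = spread(gap_derivative(noisy, gap=2))
print(spread_gap1 > spread_gap2)
```

With a gap of 1 the alternating noise shows up clearly in the derivative, while a gap of 2 happens to cancel it in this toy case. Real noise is not this regular, but wider gaps generally make it less visible.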

If we are discriminating, we have to check that the bands of interest are clearly higher than the noise.

Other tutorials:
FOSS CALIBRATOR: Importing lab values into a ".nir" file
FOSS CALIBRATOR: Tutorial 001

19 Apr 2020

FOSS CALIBRATOR: Tutorial 001

After importing the "nir" file into the project and linking the ".csv" file with the lab values to it, we have generated three logical sets (the total, the training and the validation sets).

A random split was chosen, and a warning advises that the validation set is not covered by the training set. We could recalculate or choose another split method, but we continue with these sets.

Without generating any outlier models yet, we explore the data in the PCA and PLS spaces, looking for reference or spectral outliers.

Other tutorials:
FOSS CALIBRATOR: Tutorial 002
FOSS CALIBRATOR: Importing lab values into a ".nir" file

17 Apr 2020

FOSS CALIBRATOR: Importing and adding lab values to a ".nir" file

As you can see in the video, a project is created in Foss Calibrator and a cocoa nir file is imported.

This file cannot be divided into a training and a test set until a parameter is created and the lab values are imported from a ".csv" file.

Other tutorials:
FOSS CALIBRATOR: Tutorial 002
FOSS CALIBRATOR: Tutorial 001