R & Chemometrics: February 2021

20 Feb 2021

Copying plot values

One of the new features in the Foss Calibrator update is that we can copy the values behind a plot (for example, an XY plot of predicted vs. actual values) and paste them into an Excel sheet.

The copy includes the sample position and sample number, the sample date and time, and the predicted and reference values.

We can do the same with the GH and NH plots.

Move reference outliers to a reference outlier sample set (Visual Check)

Once the clear spectral outliers are removed, we develop the model (in this case N2 in soil) and draw the XY plot (reference vs. predicted values) for the calibration set (blue points) and for the validation set (more than 1000 samples set aside from the total set).

Now it is our decision whether to remove some of the samples as reference outliers in the two sets:

Only samples that clearly look like reference outliers should be marked. Then you can check the statistics and decide whether to run the model with a different math treatment or calibration strategy (ANN in this case).
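The visual check can be complemented with a simple numeric screen. The following is only a minimal sketch, not what Foss Calibrator does internally: it flags samples whose residual (predicted minus reference) is large compared with the root mean square of all residuals. The function name `flag_reference_outliers`, the limit of 2.5, and the toy values are assumptions made for illustration.

```python
from math import sqrt

def flag_reference_outliers(reference, predicted, limit=2.5):
    """Return indices of samples whose standardized residual exceeds `limit`.

    Hypothetical helper: the scaling and the default limit are assumptions,
    not taken from any calibration software.
    """
    residuals = [p - r for p, r in zip(predicted, reference)]
    n = len(residuals)
    # Root mean square of the residuals, used here as the scale.
    rms = sqrt(sum(e * e for e in residuals) / n)
    if rms == 0:
        return []
    return [i for i, e in enumerate(residuals) if abs(e) / rms > limit]

# Toy values for illustration only (not real soil data).
reference = [1.0, 1.2, 0.9, 1.1, 1.3, 1.0, 0.8, 1.2, 1.1, 2.5]
predicted = [1.05, 1.15, 0.95, 1.1, 1.25, 1.0, 0.85, 1.2, 1.05, 1.1]

candidates = flag_reference_outliers(reference, predicted)
print(candidates)  # indices to inspect on the XY plot before marking them
```

A flagged sample is only a candidate: the final decision should still come from looking at the plot, as described above.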

Move spectra outliers to a spectra outliers sample set (Visual Check)

One of the steps we should take when developing a new calibration is to inspect the spectra visually, with the idea of removing or marking the clear outliers. In Win ISI, if we have a lot of samples it is easy to see the outliers, but it takes a lot of time to find them in order to delete them.

Foss Calibrator improves on this: we can select them with the mouse and mark them as spectral outliers.

There are many reasons for a sample to be a spectral outlier: the instrument was not warmed up, a failure in the instrument (lamp or mechanical noise), poor sample presentation, temperature, or simply that the sample is very different from the rest.

In this example we work with soil samples, and we start by selecting the ones that look noisy or different from the rest:

We can keep those samples for closer inspection in a spectral outlier sample set. Because that set also has lab data, we can validate with it to check whether the calibration can extrapolate.

6 Feb 2021

Why R? Webinar 026 - David Smith - R at Microsoft

Interesting webinar with David Smith. I have followed David for some time, along with his wonderful blog, which had a post every day of the week. Now he is quite busy since Revolutions was acquired by Microsoft.

It is interesting to see how R is growing and how major companies are interested in people with an R background.

I can see that one important NIR manufacturer has hired an R expert (author of several packages) as Head of Data Science.

3 Feb 2021

Checking overfitting in validation

Validation is an important tool to improve the calibration. When developing the calibration we use cross validation to select the number of terms, and often we take no further action to avoid overfitting. Later we are surprised to find that our predictions for new samples have a bias for certain types of samples, and that the error is much higher than expected from the SECV (standard error of cross validation) that we use as a reference.

This is the case, for example, for wheat bran, where the CV for moisture (Humedad) suggested 9 terms and we kept that choice. Some time later we have 61 new samples, and the validation gives these results:

If we had chosen 3 terms when developing the calibration, the results would have been:

We can see a large improvement.

I suggest using other techniques in addition to cross validation to select the number of terms. Lately I have been trying some bootstrap techniques.
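The statistics usually compared in this kind of validation can be sketched as follows. This is a minimal illustration, assuming the bias is defined as the mean of predicted minus reference and the SEP is the bias-corrected standard deviation of the residuals; the function name `validation_stats` and the toy numbers are hypothetical and are not the wheat bran results.

```python
from math import sqrt

def validation_stats(reference, predicted):
    """Bias, RMSEP and bias-corrected SEP for a validation set."""
    residuals = [p - r for p, r in zip(predicted, reference)]
    n = len(residuals)
    # Assumed sign convention: positive bias means the model over-predicts.
    bias = sum(residuals) / n
    # RMSEP includes the bias; SEP removes it.
    rmsep = sqrt(sum(e * e for e in residuals) / n)
    sep = sqrt(sum((e - bias) ** 2 for e in residuals) / (n - 1))
    return {"n": n, "bias": bias, "rmsep": rmsep, "sep": sep}

# Toy values for illustration only: every prediction is 0.5 too high.
reference = [10.0, 11.0, 12.0, 13.0]
predicted = [10.5, 11.5, 12.5, 13.5]

stats = validation_stats(reference, predicted)
print(stats)
```

In this toy case all of the error is bias, so the SEP is zero while the RMSEP equals the bias. An RMSEP well above the SECV of the calibration is the kind of warning sign discussed in this post.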
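As one possible illustration of a bootstrap approach (the post does not say which technique is used, so this is an assumption), the sketch below resamples validation residuals with replacement to get a percentile interval for the RMSEP of each candidate number of terms. It does not refit the model, which a fuller bootstrap would do. The residuals are synthetic, generated only for the example, and the names `bootstrap_rmse` and `residuals_by_terms` are hypothetical.

```python
import random
from math import sqrt

def rmse(residuals):
    """Root mean square of a list of residuals."""
    return sqrt(sum(e * e for e in residuals) / len(residuals))

def bootstrap_rmse(residuals, n_boot=1000, seed=0):
    """Approximate 95% percentile interval for the RMSE via bootstrap."""
    rng = random.Random(seed)
    n = len(residuals)
    estimates = []
    for _ in range(n_boot):
        # Resample the residuals with replacement, same size as the original.
        sample = [residuals[rng.randrange(n)] for _ in range(n)]
        estimates.append(rmse(sample))
    estimates.sort()
    lower = estimates[int(0.025 * n_boot)]
    upper = estimates[int(0.975 * n_boot) - 1]
    return lower, upper

# Synthetic residuals (NOT real data): one candidate model with small,
# unbiased errors and one with larger, biased errors.
data_rng = random.Random(42)
residuals_by_terms = {
    3: [data_rng.gauss(0.0, 0.2) for _ in range(60)],
    9: [data_rng.gauss(0.3, 0.4) for _ in range(60)],
}

intervals = {k: bootstrap_rmse(v) for k, v in residuals_by_terms.items()}
for terms, (lower, upper) in intervals.items():
    print(f"{terms} terms: RMSEP interval {lower:.3f} to {upper:.3f}")
```

If the intervals for two candidates overlap heavily, the extra terms are probably not buying anything, and the simpler model is the safer choice.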