R & Chemometrics: Questions about NIR Modelling (001)

5 Jul 2022

I sometimes receive very interesting emails from readers, so I have created this post to answer one of them and to leave it open for comments, where readers can add what they have learned from their own experience.

Choosing the wavelengths that correspond to the studied parameter (is it better to keep the whole scan or to select the part that represents the targeted parameter? If the latter, how is it done?)

Normally the whole scan is used, and the PLS algorithm builds latent variables (PLS terms) that represent the spectral variance and its covariance with the studied parameter. By looking at the regression coefficients, and knowing the wavelengths at which the correlated constituents absorb, you can try to interpret the regression and decide whether certain wavelength zones could be excluded (such as flat, near-zero zones, ...). Regression coefficients are nevertheless very difficult to interpret because of the math treatments applied (especially derivatives).
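As a rough screening aid for the idea above, the following Python sketch lists contiguous wavelength zones where the regression coefficients stay close to zero. The function name `near_zero_zones`, the 5% relative threshold and the minimum run length are illustrative assumptions, not settings from any particular NIR package, and the coefficients are assumed to have been exported from an existing PLS model.

```python
def near_zero_zones(wavelengths, coefficients, rel_threshold=0.05, min_points=3):
    """Return (start_nm, end_nm) zones where |coefficient| stays below
    rel_threshold * max|coefficient| for at least min_points consecutive points.
    Both default values are arbitrary choices made for illustration."""
    limit = rel_threshold * max(abs(b) for b in coefficients)
    zones = []
    start = None  # index where the current near-zero run began
    for i, b in enumerate(coefficients):
        if abs(b) < limit:
            if start is None:
                start = i
        else:
            if start is not None and i - start >= min_points:
                zones.append((wavelengths[start], wavelengths[i - 1]))
            start = None
    # Close a run that reaches the end of the spectrum.
    if start is not None and len(coefficients) - start >= min_points:
        zones.append((wavelengths[start], wavelengths[-1]))
    return zones


# Made-up coefficients on a 10 nm grid, for illustration only.
wl = list(range(1100, 1200, 10))
b = [0.0, 0.001, -0.002, 0.5, -0.8, 0.3, 0.0, 0.01, -0.01, 0.0]
print(near_zero_zones(wl, b))
```

A flagged zone is only a candidate for exclusion: the model should be refitted without it and the statistics compared, because coefficients obtained after derivative treatments can be misleading.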

Another option is to choose a few specific wavelengths. The first is normally the one at which the parameter absorbs (for example, 1940 nm for water); you then keep adding wavelengths for other constituents that interfere with water, or for zones that do not absorb but where scatter is observed. The software normally helps you with these selections, and you always have the statistics to see whether an added wavelength improves the regression. This type of algorithm is normally called MLR (Multiple Linear Regression).
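The wavelength-by-wavelength procedure can be sketched in plain Python as a forward selection. This is a minimal illustration, not how any commercial package implements it: the helper names (`solve`, `fit_mlr`, `forward_select`), the use of the standard error of calibration (SEC) as the statistic, and the stopping rule are all assumptions. Spectra are assumed to be lists of absorbances on a common wavelength grid, and columns are addressed by index.

```python
def solve(a, b):
    """Solve the square system a x = b (Gaussian elimination, partial pivoting)."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = s / m[r][r]
    return x


def fit_mlr(spectra, y, cols):
    """Least-squares fit of y on an intercept plus the absorbances at cols.
    Returns (coefficients, SEC). Needs more samples than terms and columns
    that are not exactly collinear (no safeguard is included here)."""
    rows = [[1.0] + [s[c] for c in cols] for s in spectra]
    p = len(cols) + 1
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(p)]
    coefs = solve(xtx, xty)
    resid = [yi - sum(b * v for b, v in zip(coefs, r)) for r, yi in zip(rows, y)]
    sec = (sum(e * e for e in resid) / (len(y) - p)) ** 0.5
    return coefs, sec


def forward_select(spectra, y, first_col, max_terms=3):
    """Start from the analyte wavelength (first_col) and keep adding the
    wavelength that lowers the SEC most, stopping when nothing improves it."""
    selected = [first_col]
    best = fit_mlr(spectra, y, selected)[1]
    while len(selected) < max_terms:
        candidates = [c for c in range(len(spectra[0])) if c not in selected]
        trial = min(candidates, key=lambda c: fit_mlr(spectra, y, selected + [c])[1])
        sec = fit_mlr(spectra, y, selected + [trial])[1]
        if sec >= best:
            break
        selected.append(trial)
        best = sec
    return selected, best
```

In practice `first_col` would be the index of the grid point closest to the analyte band (for water, the point nearest 1940 nm), and the statistic should also be checked on samples not used in the fit, since the SEC alone tends to favour adding terms.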

How to split the samples between calibration and validation (is there a test to do?)

Randomly assign 80% of the samples to the Training Set and the remaining 20% to the Test Set. If you have a lot of samples from different years, you can use other approaches (the older samples for calibration and the newer ones for validation, ...). Either way, if the calibration is robust, you should get similar results.
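Both splitting approaches are easy to sketch in Python with the standard library. The function names, the fixed random seed and the `(sample_id, year)` layout are assumptions made for the example.

```python
import random


def random_split(sample_ids, test_fraction=0.2, seed=0):
    """Shuffle the sample IDs and return (training_set, test_set).
    A fixed seed keeps the split reproducible."""
    ids = list(sample_ids)
    random.Random(seed).shuffle(ids)
    n_test = round(len(ids) * test_fraction)
    return ids[n_test:], ids[:n_test]


def split_by_year(samples, first_validation_year):
    """samples is a list of (sample_id, year) pairs. Older samples go to
    calibration, samples from first_validation_year onwards to validation."""
    train = [s for s, year in samples if year < first_validation_year]
    test = [s for s, year in samples if year >= first_validation_year]
    return train, test


train_ids, test_ids = random_split(range(50))
print(len(train_ids), len(test_ids))
```

Running both splits on the same data and comparing the validation statistics is one way to check the robustness mentioned above.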

The criteria for choosing the tests to be performed for pre-processing (2nd derivative, SNV, MSC, etc.).

Normally the criterion is to choose the simplest math treatment. For scatter, if you have a lot of samples and all the possible variability is represented, you can try MSC; if not, one of the best options is to combine SNV and Detrend.
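To make the difference between the two scatter corrections concrete, here is a minimal Python sketch of SNV and MSC. It assumes each spectrum is a list of absorbances on a common wavelength grid and that the mean spectrum of the set is the MSC reference; the function names are illustrative. Detrend, which is usually a polynomial baseline fit applied after SNV, is not shown.

```python
from statistics import mean, stdev


def snv(spectrum):
    """Standard Normal Variate: centre and scale one spectrum using its own
    mean and standard deviation, so no other samples are needed."""
    m = mean(spectrum)
    s = stdev(spectrum)
    return [(a - m) / s for a in spectrum]


def msc(spectra, reference=None):
    """Multiplicative Scatter Correction: regress each spectrum on a reference
    (by default the mean spectrum of the set), then remove the fitted offset
    and slope."""
    if reference is None:
        reference = [mean(col) for col in zip(*spectra)]
    ref_mean = mean(reference)
    sxx = sum((r - ref_mean) ** 2 for r in reference)
    corrected = []
    for spec in spectra:
        spec_mean = mean(spec)
        sxy = sum((r - ref_mean) * (a - spec_mean) for r, a in zip(reference, spec))
        slope = sxy / sxx
        intercept = spec_mean - slope * ref_mean
        corrected.append([(a - intercept) / slope for a in spec])
    return corrected
```

The dependence of MSC on the mean spectrum is why it needs a sample set that covers the expected variability, whereas SNV treats every spectrum on its own.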

The first derivative is difficult to interpret (a maximum in the raw spectrum becomes a zero crossing); the second derivative is easier to interpret (the same maximum becomes a minimum at the same position). The important setting is the gap you use: not too long, because you can lose information, and not too short, because you add noise.
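The effect of the derivatives on a band can be checked with a small Python sketch of gap derivatives on a synthetic Gaussian peak. The function names and the peak are assumptions for the example; the differences are left unscaled (not divided by the wavelength step) and no segment smoothing is applied, which differs from what some packages do.

```python
import math


def gap_first_derivative(spectrum, gap):
    """Difference between the points gap steps to the right and to the left.
    Output index j corresponds to input index j + gap."""
    return [spectrum[i + gap] - spectrum[i - gap]
            for i in range(gap, len(spectrum) - gap)]


def gap_second_derivative(spectrum, gap):
    """Second difference using the same gap on both sides of each point.
    Output index j corresponds to input index j + gap."""
    return [spectrum[i + gap] - 2 * spectrum[i] + spectrum[i - gap]
            for i in range(gap, len(spectrum) - gap)]


# Synthetic absorption band centred at index 50 (illustrative data only).
peak = [math.exp(-((i - 50) / 8) ** 2) for i in range(101)]
gap = 3
d1 = gap_first_derivative(peak, gap)
d2 = gap_second_derivative(peak, gap)

centre = 50 - gap  # position of the raw maximum in the derivative arrays
print(d1[centre])              # first derivative at the band maximum
print(d2.index(min(d2)) + gap)  # input index of the second-derivative minimum
```

Repeating the calculation with different `gap` values on real spectra is a simple way to see the trade-off between noise at short gaps and lost detail at long ones.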

2 comments:

Hisham, 5 July 2022, 11:40
Thank you very much for the explanations and clarifications.