29 Jul 2014

Comparing regressions with different scatter math-treatments (Shootout 2002 Tutorial)


In a previous post, we developed a calibration on the Training Set from Instrument 1 without any math treatment, knowing that this was not the best choice, and we looked at the LOO (leave-one-out) cross-validation errors to check its performance.

Now we develop regressions with several scatter-correction math treatments, compare their cross-validation errors, and decide which of them performs better. The shootout also supplies a test set, so validating against it will give us a better idea of how the calibration performs on independent data.
 
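The first thing to do is to convert the “X1. Training” and “X1. Test” matrices with the math treatment we want to use: SNV (Standard Normal Variate), Detrend, SNV + Detrend and MSC (Multiplicative Scatter Correction). Each treated training matrix then goes into a data frame together with Y. The t() calls transpose the Detrend and SNV + Detrend matrices, which apparently were stored with one spectrum per column, so that every X has one sample per row:
> nir.train1_snv     <- data.frame(X = I(X1_snv), Y = I(Y))
> nir.train1_detrend <- data.frame(X = I(t(X1_detrend)), Y = I(Y))
> nir.train1_snvdt   <- data.frame(X = I(t(X1_snvdt)), Y = I(Y))
> nir.train1_msc     <- data.frame(X = I(X1_msc), Y = I(Y))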
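The post does not show how the treated matrices were computed, so the following is only a minimal base-R sketch of what SNV, Detrend and MSC do. The helper names snv(), detrend() and msc_basic(), the wavelength axis and the simulated spectra are assumptions made for illustration; they are not the author's code nor the shootout data. Samples are assumed to be in rows and wavelengths in columns.

```r
# Simulated stand-in spectra: 5 samples (rows) x 700 wavelengths (columns).
set.seed(42)
wl <- seq(1100, 2498, by = 2)                 # hypothetical wavelength axis (nm)
band <- exp(-((wl - 1900) / 150)^2)           # one broad absorption band
X_raw <- t(sapply(1:5, function(i) {
  runif(1, 0.8, 1.2) * band +                 # multiplicative scatter effect
    runif(1, 0, 0.3) +                        # additive baseline offset
    runif(1, 0, 2e-4) * (wl - 1100) +         # sloping baseline
    rnorm(length(wl), sd = 0.002)             # noise
}))

# SNV: centre and scale every spectrum (row) by its own mean and sd.
snv <- function(X) {
  X <- as.matrix(X)
  (X - rowMeans(X)) / apply(X, 1, sd)
}

# Detrend: subtract a polynomial (second order by default) fitted to each
# spectrum as a function of wavelength.
detrend <- function(X, wavelengths, degree = 2) {
  X <- as.matrix(X)
  t(apply(X, 1, function(s) residuals(lm(s ~ poly(wavelengths, degree)))))
}

# MSC: regress each spectrum on a reference (here the mean spectrum) and
# remove the fitted offset and slope.
msc_basic <- function(X, ref = colMeans(X)) {
  X <- as.matrix(X)
  t(apply(X, 1, function(s) {
    b <- coef(lm(s ~ ref))
    (s - b[1]) / b[2]
  }))
}

X_snv   <- snv(X_raw)
X_dt    <- detrend(X_raw, wl)
X_snvdt <- detrend(snv(X_raw), wl)            # SNV followed by Detrend
X_msc   <- msc_basic(X_raw)
```

All four results keep the samples-in-rows layout, so they can go into data.frame(X = I(...), Y = I(Y)) without a t() call. For a test set, MSC should reuse the reference spectrum from the training set, which is why msc_basic() takes ref as an argument.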
Now we can develop the PLS regressions:
 
> library(pls)
> mod1_snv <- plsr(Y ~ X, data = nir.train1_snv,
+                  ncomp = 10, validation = "LOO")
> mod1_detrend <- plsr(Y ~ X, data = nir.train1_detrend,
+                      ncomp = 10, validation = "LOO")
> mod1_snvdt <- plsr(Y ~ X, data = nir.train1_snvdt,
+                    ncomp = 10, validation = "LOO")
> mod1_msc <- plsr(Y ~ X, data = nir.train1_msc,
+                  ncomp = 10, validation = "LOO")
We can plot the RMSEP values against the number of components (or terms) to get a better idea of how the models perform. In the plot, the black line is the model on the raw spectra (no math treatment) and the green line is the model with Detrend alone; the remaining curves (SNV, SNV + Detrend and MSC) almost overlap.
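A comparison plot of this kind could be produced along the following lines. This is a minimal sketch rather than the code behind the figure: it assumes the pls package is installed, uses the gasoline data set shipped with pls as a stand-in for the shootout spectra, and defines a hypothetical snv() helper. Only raw, SNV and MSC are compared to keep it short.

```r
library(pls)          # assumed to be installed
data(gasoline)        # example NIR data from pls, NOT the shootout data

# Hypothetical helper: row-wise Standard Normal Variate.
snv <- function(X) {
  X <- unclass(as.matrix(X))
  (X - rowMeans(X)) / apply(X, 1, sd)
}

treatments <- list(
  Raw = gasoline$NIR,
  SNV = snv(gasoline$NIR),
  MSC = msc(gasoline$NIR)   # msc() is provided by pls
)

# One LOO cross-validated PLS model per treatment; keep the CV RMSEP
# for 1..10 components (the intercept-only model is left out).
cv_rmsep <- sapply(treatments, function(X) {
  d <- data.frame(Y = gasoline$octane, X = I(X))
  fit <- plsr(Y ~ X, data = d, ncomp = 10, validation = "LOO")
  drop(RMSEP(fit, estimate = "CV", intercept = FALSE)$val)
})

# Overlay the curves, one colour per treatment.
matplot(1:10, cv_rmsep, type = "b", pch = 1, lty = 1, col = 1:3,
        xlab = "Number of components", ylab = "RMSEP (LOO CV)")
legend("topright", legend = colnames(cv_rmsep), col = 1:3, lty = 1, pch = 1)
```

The cv_rmsep matrix has one row per number of components and one column per treatment, so the same object can also be printed to compare the numbers directly.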
 
To see more detail, we have to look at the numbers provided by the summary of each model. The smallest values are marked in yellow.
 
 
But the question is: do we have to choose the number of components that gives the smallest RMSEP?
We will answer this question soon.
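As a preview of why the answer may be "not necessarily", the sketch below compares the number of components at the minimum cross-validation RMSEP with the one suggested by the "one sigma" heuristic of selectNcomp() in pls, which picks the simplest model whose error is within one standard error of the best one. This is only an illustration under stated assumptions: it needs a pls version that provides selectNcomp(), and it uses the gasoline data shipped with pls rather than the shootout data. It is not necessarily the criterion the later posts will use.

```r
library(pls)          # assumed to be installed, with selectNcomp() available
data(gasoline)        # example NIR data from pls, NOT the shootout data

fit <- plsr(octane ~ NIR, data = gasoline, ncomp = 10, validation = "LOO")

# CV RMSEP for 1..10 components (intercept-only model excluded).
cv_rmsep <- drop(RMSEP(fit, estimate = "CV", intercept = FALSE)$val)

n_min <- unname(which.min(cv_rmsep))                      # smallest RMSEP
n_1se <- selectNcomp(fit, method = "onesigma", plot = FALSE)

cat("Components at minimum CV RMSEP:", n_min, "\n")
cat("Components by one-sigma rule:  ", n_1se, "\n")
```

A more parsimonious model is often preferred when the extra components reduce the cross-validation error only marginally, because those components may be fitting noise.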
In a future post we will do the same with derivatives combined with scatter corrections, to see whether we get better RMSEP values, and we will check them with an external validation (don't forget that these RMSEP values are for cross-validation).
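The difference between cross-validation and external validation can be sketched as follows. The example assumes the pls package is installed and again uses the gasoline data shipped with it, split into 50 training and 10 test samples, as a stand-in for the shootout Training and Test sets; the split and the object names are illustrative only.

```r
library(pls)          # assumed to be installed
data(gasoline)        # example NIR data from pls, NOT the shootout data

train <- gasoline[1:50, ]     # stand-in for the Training set
test  <- gasoline[51:60, ]    # stand-in for the independent Test set

fit <- plsr(octane ~ NIR, data = train, ncomp = 10, validation = "LOO")

# RMSEP from leave-one-out cross-validation on the training samples.
rmsep_cv <- drop(RMSEP(fit, estimate = "CV", intercept = FALSE)$val)

# RMSEP on samples the model has never seen.
rmsep_test <- drop(RMSEP(fit, newdata = test, estimate = "test",
                         intercept = FALSE)$val)

print(round(rbind(CV = rmsep_cv, Test = rmsep_test), 3))
```

If a math treatment is applied beforehand, the test spectra need the same treatment as the training spectra before they are passed as newdata.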
 
 
 
 
 
 
