27 Jan 2015

Comparing spectra for standardization

These are spectra of a check sample acquired with and without an option called "compatibility mode". The sample is the same in both spectra, but they differ because in one of them an algorithm is applied so that the sample appears to have been acquired on another instrument (different bandwidth, and some differences in the wavelength scale). If we compare the spectra without math treatments, we can see small differences (an offset), but we cannot tell whether there is a shift in the wavelength scale:
Now we apply a first derivative to these spectra:
Here we can see some wavelength shifts from one instrument to the other, but we zoom into the areas in yellow to see them in more detail.
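To illustrate why the derivative helps, here is a minimal sketch in base R with synthetic data (not the check sample spectra). The wavelength axis, the Gaussian band, the 2 nm shift and the 0.05 offset are all assumptions made up for the example, and the derivative is a simple point-to-point difference rather than the math treatment of any particular software.

```r
# Synthetic example: one Gaussian absorption band "measured" twice
wl <- seq(1100, 2498, by = 2)                  # wavelength axis in nm (assumed)
band <- function(center) exp(-(wl - center)^2 / (2 * 30^2))

spec.a <- band(1700)                           # reference spectrum
spec.b <- band(1702) + 0.05                    # 2 nm shift plus a constant offset

# First derivative as a simple point-to-point difference
d1.a <- diff(spec.a)
d1.b <- diff(spec.b)

# In the raw spectra the constant offset dominates the difference
raw.diff <- spec.b - spec.a
# In the derivative the offset cancels and only the effect of the shift remains
d1.diff <- d1.b - d1.a

range(raw.diff)
range(d1.diff)
```

In this sketch the raw difference is centred on the offset, while the difference between the derivatives is centred on zero and only shows the displacement of the band.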
 
 
Recently I read some advice from Mark Whesterhaus: one way to decide which type of standardization to apply is to subtract one spectrum from the other and check whether the difference looks like a raw spectrum, a first derivative or a second derivative. Depending on this, you can choose the most suitable standardization algorithm.
This is the raw spectrum of a sample scanned on the Master instrument subtracted from the raw spectrum of the same sample scanned on Instrument 2 (Host):

The difference spectrum is very similar to the first derivative spectrum of the sample, confirming that there is a wavelength shift between the instruments that must be considered during the standardization.
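The reason a pure wavelength shift produces a difference that looks like a first derivative is that, for a small shift d, f(x - d) - f(x) is approximately -d times f'(x). The following base R sketch checks this on a made-up spectrum of two Gaussian bands; the bands, the wavelength axis and the 0.5 nm shift are assumptions for illustration only.

```r
wl <- seq(1100, 2498, by = 2)          # wavelength axis in nm (assumed)
n  <- length(wl)

# Hypothetical spectrum made of two Gaussian bands; 'shift' moves it along the axis
spectrum <- function(shift = 0) {
  exp(-(wl - shift - 1450)^2 / (2 * 40^2)) +
    0.6 * exp(-(wl - shift - 1940)^2 / (2 * 60^2))
}

master <- spectrum(0)                  # "Master" instrument
host   <- spectrum(0.5)                # "Host" instrument, shifted by 0.5 nm

# Master subtracted from Host
diff.spec <- host - master

# First derivative of the Master spectrum (central differences, interior points)
idx <- 2:(n - 1)
d1  <- (master[idx + 1] - master[idx - 1]) / (wl[idx + 1] - wl[idx - 1])

# Correlation close to -1: same shape as the derivative, opposite sign
r <- cor(diff.spec[idx], d1)

# Slope of a regression through the origin estimates minus the shift
est.shift <- -sum(diff.spec[idx] * d1) / sum(d1^2)
```

Here the sign of the correlation depends only on the direction of the shift, and est.shift recovers approximately the simulated shift. With real spectra, offsets and bandwidth differences would be mixed in, so this should be read as a diagnostic idea and not as a standardization method.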
 
This test was done by exporting the same sample scanned on the same instrument with and without "compatibility mode", which is similar to scanning the same sample on two different instruments.

26 Jan 2015

Calculating the Regression Coefficients in PCR

This is a simple exercise: instead of PLS regression, this time we use "pcr" (principal components regression). In "R" we can use this script:
mod3.pcr<-pcr(Y~X,data=nir.tr1.2dmsc,ncomp=10,validation="LOO")
One of the outputs we get from this calculation is mod3.pcr$coefficients, which contains the regression coefficients, one for every wavelength. The coefficients change depending on the number of principal components we use. In this case we have chosen a maximum of 10, and with the cross-validation we will decide how many PCs to use.
When we calculate the PCs, we get two matrices from the original spectra matrix: the score matrix (T) and the loading matrix (P). They are related by the formula X = T.Pt + E, where Pt is the transpose of P and E is the residual matrix.
We use the T matrix to calculate the regression coefficients in PCR.
Here I go through the steps to calculate the regression coefficients in R, and at the end I check whether they match the coefficients calculated by the pcr function in the pls package. For this exercise I use the shootout 2002 data, as in previous posts.
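As a reminder of what X = T.Pt + E means in practice, this is a small base R sketch that builds T and P from the SVD of a centered matrix. The matrix is random data, not the shootout spectra, and the names (Xc, sv, T.a, P.a) and sizes are assumptions chosen only for the example.

```r
set.seed(1)
n <- 20; k <- 8; a <- 3                       # samples, variables, PCs kept (assumed)
X  <- matrix(rnorm(n * k), nrow = n, ncol = k)
Xc <- scale(X, center = TRUE, scale = FALSE)  # column-centered matrix

sv    <- svd(Xc)
T.all <- sv$u %*% diag(sv$d)                  # all scores
P.all <- sv$v                                 # all loadings

T.a <- T.all[, 1:a]                           # (n x a) score matrix
P.a <- P.all[, 1:a]                           # (k x a) loading matrix

# Residual matrix after keeping only 'a' components
E <- Xc - T.a %*% t(P.a)

# With all the components the reconstruction is exact (E is zero)
max(abs(Xc - T.all %*% t(P.all)))
```

The scores can also be obtained by projecting the centered data onto the loadings (Xc %*% P.a), and the sum of squares of E equals the sum of the squared singular values that were left out.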

Another output from the pcr function is mod3.pcr$scores, which is the T matrix.
This matrix was also calculated in another post:
T.pca<-X1.sg2dmsc_svd_T[,1:10]

It is easy to check that T.pcr==T.pca gives TRUE for all elements, where T.pcr is mod3.pcr$scores.
Now we develop this formula in R (with the T matrix):

xhat = (Tt.T)^-1 . Tt.Y

# n = samples, a = principal components
T.pca<-X1.sg2dmsc_svd_T[,1:10]     # (n x a) matrix
T.pca.t<-t(T.pca)                  # (a x n) matrix
Tt.T<-T.pca.t%*%T.pca              # (a x a) matrix
Tt.T.inv<-solve(Tt.T)              # (a x a) matrix
Tt.T.inv.Tt<-Tt.T.inv%*%T.pca.t    # (a x n) matrix
x.hat<-Tt.T.inv.Tt%*%Y             # (a x 1) matrix

and finally
reg.coef = P.xhat

reg.coef<-P.pca%*%x.hat            # (k x 1) matrix, k = wavelengths

We check if they match:
pcr.coef.10<-as.matrix(mod3.pcr$coefficients[,,10])
pcr.coef.10[1:10,]

> pcr.coef.10[1:10,]
-1.010183  8.223292 14.806965 26.005365 33.214725 20.101344 14.456345 
 9.498212  4.154216 -5.061061 
 
> reg.coef[1:10,]
-1.010183  8.223292 14.806965 26.005365 33.214725 20.101344 14.456345
 9.498212  4.154216 -5.061061
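The same steps can be run from start to finish on synthetic data using only base R. This is a sketch under assumptions: X and Y are random, the names T.mat and P are mine, and instead of comparing with the pls package (which may not be installed) the result is compared with an ordinary lm fit on the scores.

```r
set.seed(42)
n <- 30; k <- 12; a <- 4                      # samples, variables, PCs (assumed)
X <- matrix(rnorm(n * k), nrow = n, ncol = k)
Y <- as.vector(X %*% rnorm(k) + rnorm(n, sd = 0.1))

Xc <- scale(X, center = TRUE, scale = FALSE)  # PCR works on centered X
sv <- svd(Xc)
P     <- sv$v[, 1:a]                          # (k x a) loadings
T.mat <- Xc %*% P                             # (n x a) scores

# xhat = (Tt.T)^-1 . Tt.Y
x.hat <- solve(t(T.mat) %*% T.mat) %*% t(T.mat) %*% Y   # (a x 1)

# reg.coef = P.xhat, one coefficient per original variable
reg.coef <- P %*% x.hat                       # (k x 1)

# Check: regressing Y on the scores with lm gives the same fitted values
fit.lm   <- lm(Y ~ T.mat)
pred.pcr <- mean(Y) + as.vector(Xc %*% reg.coef)
max(abs(pred.pcr - fitted(fit.lm)))
```

Y does not need to be centered to get x.hat, because the score columns already have zero mean, so the mean of Y only affects the intercept. Since Tt.T is diagonal for PCA scores, the inverse is cheap, but solve() keeps the code close to the formula.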

7 Jan 2015

Algebra Lessons: Projections Problem 4_2_1a

 
This is an important algebra lesson for understanding projections; chemometrics uses them frequently in its calculations.
 
Gilbert Strang has a good book: "Introduction to Linear Algebra", with several exercises. From time to time I like to try to solve some of them.
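As a small illustration of what a projection is, here is a base R sketch that projects a vector b onto the line through a vector a. The two vectors are just illustrative values that I chose and should not be taken as the data of the exercise in the book.

```r
a <- c(1, 1, 1)                     # direction of the line (illustrative values)
b <- c(1, 2, 2)                     # vector to project (illustrative values)

# Coefficient of the projection: xhat = (at.b) / (at.a)
x.hat <- sum(a * b) / sum(a * a)

p <- x.hat * a                      # projection of b onto the line through a
e <- b - p                          # error vector

# The error is perpendicular to a, so this is zero up to rounding
sum(e * a)

# Projection matrix P = a.at / (at.a); P %*% b gives the same p
P <- (a %*% t(a)) / sum(a * a)
P %*% b
```

This is the one-column case of the formula (Tt.T)^-1 . Tt.Y: replacing a by a matrix with several columns gives the least squares solution used in regression.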