R & Chemometrics: PLS2 with "R"

22 September 2012

I've been working these days on PLS2 calibrations in a chemometrics program called “Unscrambler”, using a data set called “jam”. I asked myself, “Can I develop PLS2 models with R?” I looked in the book “Introduction to Multivariate Statistical Analysis in Chemometrics” and got the answer: “Yes, we can”.

I have other posts on PLS regression, but they cover PLS1, where we have an X matrix (spectra) and fit a regression for one constituent of the Y matrix at a time. What about fitting the regression for all the constituents at the same time, using the whole Y matrix? That is the purpose of PLS2.

PLS2 is recommended when there is a high correlation between the constituents.

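library(chemometrics)

library(pls)  # mvr() and RMSEP() come from "pls"; load it explicitly in case "chemometrics" does not attach it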

data(cereal)

These data are part of a set used by Varmuza et al. (2008) in other papers.

You can find a description of the data set on the R help page:

Description

For 15 cereals an X and Y data set, measured on the same objects, is available. The X data are 145 infrared spectra, and the Y data are 6 chemical/technical properties (Heating value, C, H, N, Starch, Ash). Also the scaled Y data are included (mean 0, variance 1 for each column). The cereals come from 5 groups B=Barley, M=Maize, R=Rye, T=Triticale, W=Wheat.

Once loaded, take a look at the data:

dim(cereal$X)    # 15 samples x 145 spectral variables

dim(cereal$Ysc)  # 15 samples x 6 scaled properties
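Since PLS2 is mainly of interest when the constituents are correlated, it can be worth checking this before modelling. The lines below are only a minimal sketch: they assume the unscaled properties are stored in cereal$Y with column names (as the help page suggests), and the 0.7 threshold is an arbitrary illustrative value, not a rule from the book.

```r
library(chemometrics)
data(cereal)

# Pairwise correlations between the 6 properties
ycor <- cor(cereal$Y)
round(ycor, 2)

# List the pairs with |r| above an arbitrary, illustrative threshold
high <- which(abs(ycor) > 0.7 & upper.tri(ycor), arr.ind = TRUE)
data.frame(prop1 = rownames(ycor)[high[, "row"]],
           prop2 = colnames(ycor)[high[, "col"]],
           r     = round(ycor[high], 2))
```

If several pairs show up in the table, modelling the constituents together is at least worth trying.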

We can have a look at the spectra (they have already been treated with Savitzky-Golay smoothing and a first derivative).

wavelengths<-seq(1126,2278,by=8)

matplot(wavelengths,t(cereal$X),type="l",lty=1,xlab="wavelength (nm)",ylab="log(1/R)")  # type="l" draws lines; the default would plot points

Now let's run PLS2, using “mvr” (from the “pls” package), with “LOO” (leave-one-out) cross-validation.

cerpls2<-mvr(Ysc~X,data=cereal,method="simpls",validation="LOO")

We can see a summary of the results:

summary(cerpls2)
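To make the PLS1 versus PLS2 distinction concrete, the sketch below fits one PLS1 model per column of Ysc and compares its leave-one-out RMSEP with that of the PLS2 model at the same number of components. It assumes the “pls” package is installed; the value of 7 components and the names ncomp_use, rmsep_pls1 and rmsep_pls2 are only illustrative choices.

```r
library(chemometrics)  # cereal data
library(pls)           # mvr(), RMSEP()
data(cereal)

ncomp_use <- 7  # illustrative choice

# PLS2: all responses in a single model
pls2 <- mvr(Ysc ~ X, data = cereal, method = "simpls", validation = "LOO")
# RMSEP values start at 0 components (intercept only), hence the "+ 1"
rmsep_pls2 <- RMSEP(pls2, estimate = "CV")$val[1, , ncomp_use + 1]

# PLS1: one model per response column
rmsep_pls1 <- sapply(seq_len(ncol(cereal$Ysc)), function(j) {
  d <- data.frame(y = cereal$Ysc[, j], X = I(cereal$X))
  fit <- mvr(y ~ X, data = d, method = "simpls", validation = "LOO")
  RMSEP(fit, estimate = "CV")$val[1, 1, ncomp_use + 1]
})

round(cbind(PLS2 = rmsep_pls2, PLS1 = rmsep_pls1), 3)
```

Neither approach is guaranteed to win; the table simply shows, constituent by constituent, what is gained or lost by modelling everything together.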

Now we have to make an important decision: how many terms should we choose?

Plots can help us with it:

plot(RMSEP(cerpls2), legendpos = "topright")
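The curves can be complemented with the numbers behind them. This is a small sketch, assuming the same model as above, that tabulates the cross-validated RMSEP per constituent and reports where each curve reaches its minimum. The minimum is only one possible criterion; a smaller model with a similar error is often preferred.

```r
library(chemometrics)
library(pls)
data(cereal)

cerpls2 <- mvr(Ysc ~ X, data = cereal, method = "simpls", validation = "LOO")

# CV estimates: rows = constituents,
# columns = 0 components (intercept only), 1, 2, ... components
cv <- RMSEP(cerpls2, estimate = "CV")$val[1, , ]
round(cv, 2)

# Number of components at the minimum CV error for each constituent
best <- apply(cv, 1, which.min) - 1
best
```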

A single number of terms has to serve all the constituents, so we need a compromise. Looking at the plots, 7 seems fine: “starch” would do with fewer terms, but for the rest 6 or 7 looks like the right number.
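Once a number of terms is chosen, the cross-validated predictions can be compared with the reference values. The following is a hedged sketch using the 7 terms suggested above; predplot() and R2() come from the “pls” package, and the object name r2 is just for illustration.

```r
library(chemometrics)
library(pls)
data(cereal)

cerpls2 <- mvr(Ysc ~ X, data = cereal, method = "simpls", validation = "LOO")

# Cross-validated predicted vs. measured values, one panel per constituent
predplot(cerpls2, ncomp = 7, which = "validation", asp = 1, line = TRUE)

# Cross-validated R2 per constituent at 7 components
# (values start at 0 components, hence the "+ 1")
r2 <- R2(cerpls2, estimate = "CV")$val[1, , 7 + 1]
round(r2, 2)
```

Constituents whose points scatter widely around the diagonal, or whose R2 is low, are the ones for which a different number of terms (or a separate PLS1 model) may be worth considering.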

1 comment:

Isabel, 21 January 2021 at 15:49
I would like help standardizing my data with SNV and running them in a PLS analysis (note: it is a very large data set).
Thanks in advance!