I also observed that sample 219 has a high residual in the regressions for all the constituents, so I decided to remove these two samples (57 and 219) from the sample set before developing the models.
I am just starting with R, so I would appreciate comments suggesting a simpler way to do this task.
I created two subsets in order to leave out these two samples (57 and 219):
> fattyac1<-fattyac_msc[1:56,]
> fattyac2<-fattyac_msc[58:218,]
and I combined these two sets again:
> fattyac_msc1<-rbind(fattyac1,fattyac2)
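Since I asked for a simpler way: as far as I can tell, a negative index drops rows directly, so the two subsets and `rbind` should not be needed. The sketch below uses a small stand-in data frame (`toy`, hypothetical, 219 rows with made-up columns) in place of `fattyac_msc`, and checks that both approaches keep the same samples.

```r
# Stand-in for fattyac_msc: 219 samples, two made-up columns (hypothetical data)
toy <- data.frame(id = 1:219, value = (1:219) / 10)

# Two-step approach from the post: keep rows 1-56 and 58-218, then bind them
two_step <- rbind(toy[1:56, ], toy[58:218, ])

# One-step alternative: a negative index drops rows 57 and 219
one_step <- toy[-c(57, 219), ]

nrow(one_step)                        # number of samples left
identical(two_step$id, one_step$id)   # do both approaches keep the same samples?
```

With the real data this would read `fattyac_msc1<-fattyac_msc[-c(57,219),]`.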
Well, I can develop my regression now:
Now we have to decide how many terms to choose. Let's look at the validation plots with 7 and 12 components (terms).
plot(C16_0,ncomp=7,which="validation")
plot(C16_0,ncomp=12,which="validation")
Clearly, choosing one model or the other will have a great influence on the predictions. We need a validation set to make a better decision, but I think the model will work better with 12 terms.
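A minimal sketch of how a validation set could settle the 7 versus 12 question. It assumes the models come from the `pls` package (the `which="validation"` argument in the plot call suggests so), and since the fatty acid file is not included here it uses the `gasoline` NIR data shipped with `pls` as a stand-in. The 50/10 split and the names `cal`, `val` and `fit` are only for illustration.

```r
library(pls)   # assumption: the models in this post come from the pls package

# Stand-in data: the gasoline NIR set shipped with pls (60 samples),
# used only because the fatty acid file is not part of the post
data(gasoline)
cal <- gasoline[1:50, ]    # calibration set
val <- gasoline[51:60, ]   # held-out validation set

# Fit once with the largest number of terms under consideration
fit <- plsr(octane ~ NIR, ncomp = 12, data = cal)

# Predict the validation samples with 7 and with 12 terms
# (drop() turns the samples x responses x models array into a samples x 2 matrix)
pred <- drop(predict(fit, ncomp = c(7, 12), newdata = val))

# Root mean squared error of prediction for each number of terms
rmsep <- sqrt(colMeans((pred - val$octane)^2))
rmsep
```

The number of terms with the lower error on the held-out samples would be the safer choice. With the fatty acid data, `cal` and `val` would be two disjoint subsets of `fattyac_msc1`.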
It will be important, if possible, to find samples with C16:0 values below 18 and add them to our database in order to develop a better model.
Another option would be to leave this extreme sample out until we find more like it, but we can also decide to keep it in order to extrapolate better in this zone.
It is important not to have unique samples in the model. In this case we have one, and we have to take this into account.
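A quick way to see how isolated such a sample is: count the reference values that fall below the threshold. The vector `c16_0_ref` below is made-up illustration data, not the values from this database.

```r
# Hypothetical C16:0 reference values (illustration only, not the real data)
c16_0_ref <- c(16.5, 19.8, 20.4, 21.1, 21.7, 22.3, 22.9, 23.5, 24.2, 25.0)

low <- which(c16_0_ref < 18)   # positions of the samples below 18
length(low)                    # how many samples cover that zone
c16_0_ref[low]                 # their reference values
```

If the count is one, that zone of the calibration depends on a single sample.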
If you want to follow this tutorial, please send me an e-mail and I'll send you the "txt" file as an attachment.