I have 159 spectra of soya meal and need to select 60 of them to send to the laboratory (imagine that I can only afford to analyze this number because of the high cost of the reference methods). Obviously, I would like to select a set that represents the whole population of these 159 samples as well as possible. I used the "duplex" algorithm from the "prospectr" package, which selects a "model set" and a "test set", so I will have 30 samples in each.
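To give an idea of how a duplex-style selection works, here is a minimal sketch in plain Python. It is not the prospectr implementation: the function names (`duplex_sketch`, `farthest_pair`, `euclidean`) are my own, the distance is plain Euclidean, and the data are random numbers standing in for the spectra (or their scores). The idea is that each set starts with the two most distant samples still available, and then the two sets take turns picking the remaining sample that is farthest from its own nearest member.

```python
import math
import random
from itertools import combinations


def euclidean(a, b):
    """Euclidean distance between two equal-length sequences of numbers."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def farthest_pair(X, candidates):
    """Return the two candidate indices whose rows of X are farthest apart."""
    return max(combinations(sorted(candidates), 2),
               key=lambda pair: euclidean(X[pair[0]], X[pair[1]]))


def duplex_sketch(X, k):
    """Pick k rows for a model set and k rows for a test set (illustrative only)."""
    if k < 2 or 2 * k > len(X):
        raise ValueError("need 2 <= k and 2 * k <= number of samples")
    remaining = set(range(len(X)))
    model, test = [], []
    # Seed each set with the farthest-apart pair still available.
    for subset in (model, test):
        i, j = farthest_pair(X, remaining)
        subset.extend([i, j])
        remaining -= {i, j}
    # Alternate: each set takes the remaining sample that lies farthest
    # from its own nearest already-selected member.
    while len(test) < k:
        for subset in (model, test):
            pick = max(sorted(remaining),
                       key=lambda c: min(euclidean(X[c], X[s]) for s in subset))
            subset.append(pick)
            remaining.remove(pick)
    return model, test


# Hypothetical stand-in data: 159 samples described by 5 made-up scores each.
random.seed(0)
X = [[random.gauss(0, 1) for _ in range(5)] for _ in range(159)]
model, test = duplex_sketch(X, 30)
print(len(model), len(test), len(set(model) & set(test)))
```

The two index lists never overlap, and because the sets are filled in turns they tend to cover the same region of the data, which is what makes the second one useful as a test set.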
Some days later the results from the "Lab" are here, so I am anxious to check the "summary" and the "histograms". Normally you have an idea of the protein range of the soya meal you received, and this can help you check whether the sample sets are representative. But in this case I have the advantage of having the lab values for all 159 samples, so let's see (green: histogram of protein for the 159 samples; blue: histogram of the 30 training samples; violet: histogram of the 30 validation samples).
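The same kind of check can be sketched in a few lines of plain Python. Everything here is an assumption for illustration: the protein values are randomly generated (they are not my lab results), the two subsets are random picks standing in for the selected sets, and `summary` and `bin_counts` are helper names I made up. The point is only to compare the summary statistics and to count all three groups over the same bins, so that the histograms are directly comparable.

```python
import random
import statistics


def summary(values):
    """Min, quartiles, mean and max, similar in spirit to R's summary()."""
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return {"min": min(values), "q1": q1, "median": median,
            "mean": statistics.fmean(values), "q3": q3, "max": max(values)}


def bin_counts(values, lo, hi, nbins):
    """Count values in nbins equal-width bins over [lo, hi]."""
    width = (hi - lo) / nbins
    counts = [0] * nbins
    for v in values:
        # The maximum value falls on the upper edge, so clamp it into the last bin.
        counts[min(int((v - lo) / width), nbins - 1)] += 1
    return counts


# Hypothetical protein values (%), NOT real lab results.
random.seed(1)
protein = [random.gauss(47.0, 1.5) for _ in range(159)]
# Hypothetical index sets standing in for the two selected subsets.
chosen = random.sample(range(159), 60)
train = [protein[i] for i in chosen[:30]]
valid = [protein[i] for i in chosen[30:]]

# Use the range of the whole population for every histogram.
lo, hi = min(protein), max(protein)
for name, values in (("all", protein), ("train", train), ("valid", valid)):
    stats = summary(values)
    print(name, {key: round(val, 2) for key, val in stats.items()})
    for count in bin_counts(values, lo, hi, 8):
        # Bar length is the relative frequency, so groups of different size compare.
        print("  " + "#" * round(40 * count / len(values)))
```

If the subsets are representative, their minimum, maximum and quartiles should sit close to those of the full population, and the three bar patterns should have a similar shape.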