25 jun 2019

More about Mahalanobis distance in R

There are several Mahalanobis distance posts in this blog, and this post shows a new way to find outliers with an R package called "mvoutlier".

As we have seen, Mahalanobis ellipses with a cutoff value can only be drawn in 2 dimensions, so we show the score maps two PCs at a time for the different combinations of PCs. In this case the map is for PC1 and PC2, and we can mark the outliers in the plot with the identify function:

In this case I mark some of the samples beyond the Mahalanobis distance cutoff. However, the Mahalanobis distance is a single value per sample, and when we have a certain number of PCs it is not enough to look at a map of two of them, or at all the maps at the same time: we need one Mahalanobis distance value per sample, and to check whether that value is above or below the cutoff we assign.

For that reason we use the Moutlier function of the "chemometrics" package, which shows a true Mahalanobis outlier plot that can be robust or classical:

We can see the classical plot and identify the samples over the cutoff:

The list returned by the function contains all the distances. I will continue with more options to check the Mahalanobis distances in the next post.
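The idea of one distance per sample compared against a cutoff can be sketched in base R. This is only an illustration and not the internals of Moutlier: the data are simulated, the number of PCs (k = 3) and the 97.5% chi-square quantile are assumptions, and only the classical (non-robust) distance is computed.

```r
# Simulated data standing in for a spectra matrix (an assumption, not the post's data)
set.seed(1)
X <- matrix(rnorm(100 * 10), nrow = 100, ncol = 10)
X[1, ] <- X[1, ] + 6   # plant one obvious outlier in row 1

# PCA scores for the first k components (k = 3 is an arbitrary choice)
k <- 3
scores <- prcomp(X, center = TRUE, scale. = FALSE)$x[, 1:k]

# Classical Mahalanobis distance of each sample to the centroid of the scores
d2 <- mahalanobis(scores, center = colMeans(scores), cov = cov(scores))
md <- sqrt(d2)

# Cutoff: square root of the 97.5% chi-square quantile with k degrees of freedom
cutoff <- sqrt(qchisq(0.975, df = k))

# Samples whose single distance value is above the cutoff
outliers <- which(md > cutoff)
print(outliers)

# One value per sample, plotted against the cutoff
plot(md, xlab = "Sample index", ylab = "Mahalanobis distance")
abline(h = cutoff, lty = 2)
```

A robust version would replace the classical centre and covariance with robust estimates, which is not shown here.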

24 jun 2019

Validation problem (extrapolation)

Sometimes when validating a product for a certain constituent (in this case dry matter) we can see this type of X-Y plot:

This is not a nice validation at all, but first we have to notice that there are two clusters of lab values, one for lower and one for higher dry matter. So the first question is:
What is the range of the calibration samples in the model I am validating?

I check and see that the range for dry matter in the model is from 78.700 to 86.800, so I am validating with samples drier than the ones in the calibration.
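A quick way to do this check is to flag which validation samples fall outside the calibration range. In this minimal sketch only the range limits come from the post (read as 78.7 and 86.8, with a decimal comma); the lab values and the variable names are made up for illustration.

```r
# Calibration range for dry matter, taken from the model
cal_range <- c(78.7, 86.8)

# Hypothetical lab values for the validation samples (not the post's data)
lab_dm <- c(80.2, 84.5, 86.1, 88.3, 89.0, 90.4)

# TRUE when the sample is inside the calibration range
in_range <- lab_dm >= cal_range[1] & lab_dm <= cal_range[2]

print(table(in_range))    # how many samples are in and out of range
print(lab_dm[!in_range])  # the extrapolated samples
```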

There seems to be a bias effect for those samples. Let's remove the samples in range and check the statistics for the samples out of range:

We see that we have a bias effect, and some slope caused by one of the samples. So this is a new source of variation with which to expand the calibration. Merge the validation samples into the database and recalibrate. Try to make the new model robust to extrapolation.
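The statistics for the out-of-range samples can be computed along these lines. The lab and predicted values below are invented for illustration, and taking the residual as lab minus predicted is a convention chosen here, not necessarily the one used by the validation software.

```r
# Made-up lab and predicted values for the out-of-range samples (illustration only)
lab  <- c(88.3, 89.0, 90.4, 91.2, 92.5)
pred <- c(87.1, 87.9, 89.0, 90.1, 90.6)

res   <- lab - pred          # residuals, lab minus predicted
bias  <- mean(res)           # systematic offset
rmsep <- sqrt(mean(res^2))   # total error, bias included
sep   <- sd(res)             # bias-corrected standard error of prediction

# Slope of lab values regressed on predictions
fit   <- lm(lab ~ pred)
slope <- unname(coef(fit)["pred"])

print(c(bias = bias, rmsep = rmsep, sep = sep, slope = slope))
```

A bias far from zero together with a small SEP points to an offset, while a slope far from 1 suggests that the error changes along the range.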