15 Sept. 2019


This video shows how to calculate PCA and PLS with the "logical sets". It also serves as an introduction to developing calibration models, either with the recommended options or with the configuration that we consider appropriate.

12 Sept. 2019


We continue from where we left off in video 1, where we had found eight spectral outliers (four of them easily found by visual inspection and the other four by principal component analysis). Now the task is to check whether we have samples suspected of being outliers with respect to the laboratory reference value. So, without building the models yet, we look at the regression lines for each of the parameters and also at the PLS scores and loadings. We will mark the suspicious samples so that we can examine them and, if necessary, exclude them from the calibration model that we will develop in the next videos.

24 Aug. 2019

Using "Sweave" and "LaTeX" with the Monitor function

I have finally found the best configuration for "Sweave" in "R" to generate the Validation Reports with the Monitor function. Some more improvements are still needed, but I am quite happy with the results.

7 Aug. 2019

Looking at the Residual plot taking into account the Boxplot

I have improved the Monitor function to give more importance to the box plots, so that we can see the box plot and the residual plots simultaneously and get a clearer idea of the performance of the model. I have more ideas coming, so I hope to complete it in the future.

As we know, the box plot gives us the median and the limits of the quartiles. It also defines the limits beyond which a sample is considered an outlier. So I divide the samples into 5 groups (Q1 to MIN, Q1, Q2, Q2 to MAX and BPOUT) depending on their value in the box plot.
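The grouping can be sketched in Python as follows. This is a minimal sketch, not the Monitor function itself: it assumes the usual Tukey fences (1.5 × IQR beyond Q1 and Q3) for BPOUT, and it assumes that "Q1" means between Q1 and the median and "Q2" between the median and Q3. The helper name `boxplot_groups` is hypothetical. Quartiles come from `statistics.quantiles(..., method="inclusive")`, which matches R's default `quantile()` (type 7); R's `boxplot()` uses Tukey's hinges from `fivenum()`, so the limits may differ slightly for small sets.

```python
from statistics import quantiles


def boxplot_groups(values, coef=1.5):
    """Assign each value to a box-plot group (hypothetical helper).

    Assumed meaning of the groups: "Q1 to MIN" = below Q1 but inside the lower
    fence, "Q1" = from Q1 to the median, "Q2" = from the median to Q3,
    "Q2 to MAX" = above Q3 but inside the upper fence, "BPOUT" = beyond a fence.
    """
    # Quartiles; method="inclusive" matches R's default quantile() (type 7)
    q1, median, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    # Tukey fences: samples beyond them are box-plot outliers
    low_fence, high_fence = q1 - coef * iqr, q3 + coef * iqr
    groups = []
    for v in values:
        if v < low_fence or v > high_fence:
            groups.append("BPOUT")
        elif v < q1:
            groups.append("Q1 to MIN")
        elif v < median:
            groups.append("Q1")
        elif v <= q3:
            groups.append("Q2")
        else:
            groups.append("Q2 to MAX")
    return groups


# Example: reference values kept in their original (e.g. date) order
print(boxplot_groups([1, 2, 3, 4, 5, 6, 7, 8, 9, 100]))
```

The labels keep the original sample order, so they can be used directly to colour the points of a residual plot sorted by reference value or by date.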

If the samples are ordered in the residual plot by their reference value we get:

Ordering the data by date is also important, and it lets us draw other conclusions:

Spending time looking at these plots, we can draw conclusions that help us improve the model and make it more robust.

24 Jul. 2019


When we create a Good Product Model, we want to test it with new samples whose status (good or bad) we already know. There is, of course, an uncertain area, which we can estimate from the experience gained over several evaluations.

In this Excel plot we see the values of "Max Peak T" for the training set (samples before March 2019) and for the new batches from March to June that we consider fine. There is also a set of bad samples, prepared with mixtures that are out of tolerance for one or more of their components.

As we can see, the model works in some cases, but other samples are misclassified, so we have to try other treatments or models to check whether we can classify them better. In any case, there is always an uncertain zone, and we have to check the confidence levels of the prediction.

22 Jul. 2019

Looking for problems in the Residual plots

Control charts or residual plots are very helpful to detect problems, and we should always look at them to understand how well our model performs. To interpret them successfully, the X axis must follow a meaningful order. In this case the samples are ordered by their reference value, but the order can also be by date, by GH, ...
There are different rules, and we have to check them. One rule is that there must not be nine or more points in a row on the same side of the zero line, and this is what happens in this case, for a model where the Monitor function shows that there is a problem with the slope.
Looking from left to right, we see more than nine points (red) in a row above the zero line. Once the model is corrected (yellow points), the distribution improves.
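The nine-in-a-row rule is easy to automate. Below is a minimal Python sketch, assuming the residuals are already ordered along the X axis (by reference value, date, GH, ...) and that a residual exactly on the zero line breaks the run. The function name `same_side_runs` is hypothetical, and the run length is a parameter because some sets of control-chart rules use other values.

```python
def same_side_runs(residuals, run_length=9):
    """Return (first, last) index pairs of runs of at least `run_length`
    consecutive residuals on the same side of the zero line."""
    runs = []
    start, side = 0, 0  # side: +1 above zero, -1 below, 0 = no open run
    for i, r in enumerate(residuals):
        s = (r > 0) - (r < 0)  # sign of the residual; 0 exactly on the line
        if s != 0 and s == side:
            continue  # the current run goes on
        # The previous run (if any) ended at i - 1: keep it if it is long enough
        if side != 0 and i - start >= run_length:
            runs.append((start, i - 1))
        start, side = i, s
    # Close a run that reaches the last point
    if side != 0 and len(residuals) - start >= run_length:
        runs.append((start, len(residuals) - 1))
    return runs


# Residuals ordered along the X axis of the residual plot
res = [0.1] * 10 + [-0.2] * 3 + [0.3] * 9 + [-0.1] * 8
print(same_side_runs(res))  # → [(0, 9), (13, 21)]
```

The returned index ranges point at the samples to highlight (like the red points in the plot) before and after correcting the model.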

25 Jun. 2019

More about Mahalanobis distance in R

There are several posts about the Mahalanobis distance in this blog, and this post shows a new way to find outliers with an R library called "mvoutlier".
As we have seen, Mahalanobis ellipses with a cutoff value can only be shown in 2 dimensions, so we show the score maps 2 by 2 for the different combinations of PCs, as in this case for PC1 and PC2, and we can mark the outliers in the plot with the identify function:

In this case I mark some of the samples outside the Mahalanobis distance cutoff. However, the Mahalanobis distance is a single value per sample: when the model has a certain number of PCs, looking at maps of two of them at a time (or at all of them together) is not enough. We need a unique Mahalanobis distance value computed over all the PCs, and we must check whether that value is above or below the cutoff that we assign.
For that reason we use the Moutlier function of the "chemometrics" package, which shows a real Mahalanobis outlier plot that can be robust or classical:
We can see the classical plot and identify the samples above the cutoff:
We can see the list of all the distances in the output list of the function. I will continue with more options to check the Mahalanobis distances in the next post.
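To show what "a unique Mahalanobis distance value" means, here is a minimal Python sketch of the classical distance computed over all the PCs at once and compared with a chi-square cutoff. It is not the Moutlier code: it assumes a cutoff of sqrt(χ²(0.975, p)), where p is the number of PCs, approximates that quantile with the Wilson-Hilferty formula, and uses hypothetical helper names. The robust version (based on a robust covariance estimate such as MCD) is not reproduced here.

```python
from statistics import NormalDist, fmean


def covariance(X, m):
    """Sample covariance matrix (n - 1 denominator) of the rows of X."""
    n, p = len(X), len(m)
    S = [[0.0] * p for _ in range(p)]
    for row in X:
        d = [row[j] - m[j] for j in range(p)]
        for a in range(p):
            for b in range(p):
                S[a][b] += d[a] * d[b]
    return [[v / (n - 1) for v in r] for r in S]


def invert(A):
    """Gauss-Jordan inversion with partial pivoting."""
    p = len(A)
    M = [[float(v) for v in A[i]] + [1.0 if i == j else 0.0 for j in range(p)]
         for i in range(p)]
    for c in range(p):
        piv = max(range(c, p), key=lambda r: abs(M[r][c]))
        if abs(M[piv][c]) < 1e-12:
            raise ValueError("covariance matrix is singular")
        M[c], M[piv] = M[piv], M[c]
        pv = M[c][c]
        M[c] = [v / pv for v in M[c]]
        for r in range(p):
            if r != c:
                f = M[r][c]
                M[r] = [vr - f * vc for vr, vc in zip(M[r], M[c])]
    return [row[p:] for row in M]


def mahalanobis(X):
    """One classical Mahalanobis distance per sample, using all columns (PCs)."""
    m = [fmean(col) for col in zip(*X)]
    S_inv = invert(covariance(X, m))
    p = len(m)
    dist = []
    for row in X:
        d = [x - mu for x, mu in zip(row, m)]
        d2 = sum(d[a] * S_inv[a][b] * d[b] for a in range(p) for b in range(p))
        dist.append(max(d2, 0.0) ** 0.5)
    return dist


def chi2_quantile(prob, dof):
    """Wilson-Hilferty approximation to the chi-square quantile."""
    z = NormalDist().inv_cdf(prob)
    h = 2.0 / (9.0 * dof)
    return dof * (1.0 - h + z * h ** 0.5) ** 3


def flag_outliers(scores, prob=0.975):
    """Distances, cutoff and indices of samples above the cutoff."""
    dist = mahalanobis(scores)
    cutoff = chi2_quantile(prob, len(scores[0])) ** 0.5
    return dist, cutoff, [i for i, d in enumerate(dist) if d > cutoff]


# Example: 20 well-behaved score vectors (PC1, PC2) plus one far sample
scores = [[1, 0]] * 5 + [[-1, 0]] * 5 + [[0, 1]] * 5 + [[0, -1]] * 5 + [[10, 0]]
dist, cutoff, outliers = flag_outliers(scores)
print(round(cutoff, 2), outliers)
```

Since PCA scores are uncorrelated, their covariance matrix is (nearly) diagonal and the distance reduces to a sum of squared scores divided by their variances; the general inversion is kept so the sketch also works for other variables.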