The confusion matrix is a useful tool in R for evaluating a predictive classification model. We know the expected (reference) values and the predicted ones, and from those we can build the confusion matrix and compute useful statistics based on formulas derived from it.
I reproduce here the code from the post "How To Estimate Model Accuracy in R Using The Caret Package", from the Machine Learning Mastery blog:
# load the libraries
library(caret)
library(klaR)
# load the iris dataset
data(iris)
# define an 80%/20% train/test split of the dataset
split=0.80
trainIndex <- createDataPartition(iris$Species, p=split, list=FALSE)
data_train <- iris[ trainIndex,]
data_test <- iris[-trainIndex,]
# train a naive bayes model
model <- NaiveBayes(Species~., data=data_train)
# make predictions
x_test <- data_test[,1:4]
y_test <- data_test[,5]
predictions <- predict(model, x_test)
# summarize results
confusionMatrix(predictions$class, y_test)
Try to understand the results: some samples are well classified and others are not, so we must look for the model that gives the best classification statistics. This is a simple example, but why not apply these machine learning algorithms to spectra for classification and use the confusion matrix to pick the best model? A sketch of that idea follows the output below.
Running the last line of code, we get these statistics:
> confusionMatrix(predictions$class, y_test)
Confusion Matrix and Statistics

            Reference
Prediction   setosa versicolor virginica
  setosa         10          0         0
  versicolor      0          9         1
  virginica       0          1         9

Overall Statistics

               Accuracy : 0.9333
                 95% CI : (0.7793, 0.9918)
    No Information Rate : 0.3333
    P-Value [Acc > NIR] : 8.747e-12

                  Kappa : 0.9
 Mcnemar's Test P-Value : NA

Statistics by Class:

                     Class: setosa Class: versicolor Class: virginica
Sensitivity                 1.0000            0.9000           0.9000
Specificity                 1.0000            0.9500           0.9500
Pos Pred Value              1.0000            0.9000           0.9000
Neg Pred Value              1.0000            0.9500           0.9500
Prevalence                  0.3333            0.3333           0.3333
Detection Rate              0.3333            0.3000           0.3000
Detection Prevalence        0.3333            0.3333           0.3333
Balanced Accuracy           1.0000            0.9250           0.9250
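Here is a minimal sketch of the spectra idea, not from the original post: it assumes a hypothetical data frame called spectra whose columns are the absorbances at each wavelength plus a factor column class with the sample type. PLS-DA (method "pls" in caret) is one reasonable choice for the many collinear wavelength variables; any other caret method could be swapped in and compared through its confusion matrix.
# hypothetical sketch: same workflow applied to spectral data
# assumes a data frame "spectra": absorbance columns + factor column "class"
library(caret)
split <- 0.80
trainIndex <- createDataPartition(spectra$class, p=split, list=FALSE)
spectra_train <- spectra[ trainIndex,]
spectra_test <- spectra[-trainIndex,]
# train a PLS-DA model on the wavelength variables
model <- train(class~., data=spectra_train, method="pls",
               preProcess=c("center", "scale"))
# predict the test set and evaluate with the confusion matrix
predictions <- predict(model, spectra_test)
confusionMatrix(predictions, spectra_test$class)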
An easier example for understanding the confusion matrix is this binary case:
library(caret)
# ten samples: known classes (reference) and the model's predictions
expected <- factor(c(1, 1, 0, 1, 0, 0, 1, 0, 0, 0))
predicted <- factor(c(1, 0, 0, 1, 0, 0, 1, 1, 1, 0))
results <- confusionMatrix(data=predicted, reference=expected)
print(results)
Where you get:
> print(results)
Confusion Matrix and Statistics

          Reference
Prediction 0 1
         0 4 1
         1 2 3

               Accuracy : 0.7
                 95% CI : (0.3475, 0.9333)
    No Information Rate : 0.6
    P-Value [Acc > NIR] : 0.3823

                  Kappa : 0.4
 Mcnemar's Test P-Value : 1.0000

            Sensitivity : 0.6667
            Specificity : 0.7500
         Pos Pred Value : 0.8000
         Neg Pred Value : 0.6000
             Prevalence : 0.6000
         Detection Rate : 0.4000
   Detection Prevalence : 0.5000
      Balanced Accuracy : 0.7083
The caret documentation gives the formulas for these statistics, with the first factor level (here "0") taken as the positive class.
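As a check, here is a hand computation for the binary example following those formulas; the counts are read straight from the 2x2 table above, and every value matches the printed output. The simple TP/FP/FN/TN forms are used here; they coincide with caret's prevalence-based formulas when the prevalence is estimated from the same sample.
# counts from the table, with "0" as the positive class
tp <- 4; fp <- 1   # predicted 0 when the reference is 0 / 1
fn <- 2; tn <- 3   # predicted 1 when the reference is 0 / 1
n <- tp + fp + fn + tn                                   # 10 samples
accuracy             <- (tp + tn) / n                    # 0.7
sensitivity          <- tp / (tp + fn)                   # 0.6667
specificity          <- tn / (tn + fp)                   # 0.7500
pos_pred_value       <- tp / (tp + fp)                   # 0.8000
neg_pred_value       <- tn / (tn + fn)                   # 0.6000
prevalence           <- (tp + fn) / n                    # 0.6000
detection_rate       <- tp / n                           # 0.4000
detection_prevalence <- (tp + fp) / n                    # 0.5000
balanced_accuracy    <- (sensitivity + specificity) / 2  # 0.7083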
Could you please explain how caret computes the kappa statistic?
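caret reports the standard unweighted Cohen's kappa: the observed accuracy corrected for the agreement expected by chance, given the row and column totals of the confusion matrix. A quick hand computation for the binary example reproduces the 0.4 in the output:
# Cohen's kappa = (p_observed - p_expected) / (1 - p_expected)
# row totals (predictions): 5 and 5; column totals (reference): 6 and 4
p_obs <- (4 + 3) / 10                    # observed accuracy = 0.7
p_exp <- (5/10)*(6/10) + (5/10)*(4/10)   # chance agreement = 0.5
kappa <- (p_obs - p_exp) / (1 - p_exp)   # (0.7 - 0.5)/(1 - 0.5) = 0.4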