Model Evaluation in Machine Learning
Introduction
After building a model in Machine Learning, the next important step is to evaluate its performance. Model evaluation helps you understand how well your model is working and whether it can be trusted in real-world scenarios.
In this lesson, you will learn key evaluation metrics such as accuracy, precision, recall, and F1 score.
Why Model Evaluation is Important
- Measures model performance
- Helps compare different models
- Detects errors and weaknesses
- Improves decision-making
Without evaluation, you cannot determine if your model is good or not.
Confusion Matrix
A confusion matrix is used to evaluate classification models.
It contains four important components:
- True Positive (TP): Correctly predicted positive
- True Negative (TN): Correctly predicted negative
- False Positive (FP): Incorrectly predicted positive
- False Negative (FN): Incorrectly predicted negative
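The four counts can be read off directly with scikit-learn's confusion_matrix. A minimal sketch with made-up labels (the label arrays here are purely illustrative):

```python
from sklearn.metrics import confusion_matrix

# Hypothetical labels: 1 = positive class, 0 = negative class
y_true = [1, 0, 1, 1, 0, 1]
y_pred = [1, 0, 0, 1, 1, 1]

# For binary labels, sklearn lays the matrix out as [[TN, FP], [FN, TP]]
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
print(tn, fp, fn, tp)  # 1 1 1 3
```

Note the row/column ordering: rows are true labels, columns are predicted labels, so ravel() yields TN, FP, FN, TP in that order.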
Accuracy
Accuracy = \frac{TP + TN}{TP + TN + FP + FN}
Accuracy measures the overall correctness of the model.
When to Use
When the dataset is balanced
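A quick worked example with hypothetical confusion-matrix counts (the numbers are invented for illustration):

```python
# Hypothetical counts: 50 true positives, 40 true negatives,
# 5 false positives, 5 false negatives (100 samples total)
tp, tn, fp, fn = 50, 40, 5, 5

accuracy = (tp + tn) / (tp + tn + fp + fn)
print(accuracy)  # 0.9
```

Here 90 of 100 predictions are correct, so accuracy is 0.9.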
Precision
Precision = \frac{TP}{TP + FP}
Precision measures how many predicted positives are actually correct.
Example
Out of all predicted spam emails, how many are truly spam?
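The spam example as a quick calculation, using hypothetical counts:

```python
# Hypothetical spam filter: 80 emails flagged as spam, of which 72 truly are
tp, fp = 72, 8

precision = tp / (tp + fp)
print(precision)  # 0.9
```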
Recall
Recall = \frac{TP}{TP + FN}
Recall measures how many actual positives are correctly identified.
Example
Out of all actual spam emails, how many were detected?
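And the same style of calculation for recall, again with hypothetical counts:

```python
# Hypothetical: 90 actual spam emails, of which 72 were detected (18 missed)
tp, fn = 72, 18

recall = tp / (tp + fn)
print(recall)  # 0.8
```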
F1 Score
F1 = 2 \cdot \frac{Precision \cdot Recall}{Precision + Recall}
F1 Score balances precision and recall.
When to Use
When there is an imbalance in the dataset
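Given a hypothetical precision of 0.9 and recall of 0.8, the formula works out as:

```python
# Hypothetical precision and recall values
precision, recall = 0.9, 0.8

# Harmonic mean of precision and recall
f1 = 2 * precision * recall / (precision + recall)
print(round(f1, 4))  # 0.8471
```

Because F1 is a harmonic mean, it sits below the arithmetic mean (0.85) and is pulled toward the weaker of the two metrics.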
Accuracy vs Precision vs Recall
- Accuracy: Overall performance
- Precision: Quality of positive predictions
- Recall: Coverage of actual positives
Choosing the right metric depends on the problem.
Example Scenario
In medical diagnosis:
- High recall is important (a false negative means a sick patient goes undetected)
In spam detection:
- High precision is important (a false positive means a legitimate email is marked as spam)
Implementation in Python
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score

# Ground-truth labels and model predictions
y_true = [1, 0, 1, 1]
y_pred = [1, 0, 0, 1]

print(accuracy_score(y_true, y_pred))   # 0.75
print(precision_score(y_true, y_pred))  # 1.0
print(recall_score(y_true, y_pred))     # 0.6666666666666666
print(f1_score(y_true, y_pred))         # 0.8
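When you want all of these metrics at once, scikit-learn's classification_report prints precision, recall, and F1 per class in one table:

```python
from sklearn.metrics import classification_report

# Same labels as above
y_true = [1, 0, 1, 1]
y_pred = [1, 0, 0, 1]

print(classification_report(y_true, y_pred))
```

This is often more convenient than calling each metric function separately, especially for multi-class problems.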
Common Mistakes
- Relying only on accuracy
- Ignoring class imbalance
- Not using proper metrics for the problem
Conclusion
Model evaluation is essential for building reliable Machine Learning systems. Understanding these metrics helps you choose the right model and improve performance.
In the next lesson, you will learn about ROC Curve and AUC, which are advanced evaluation techniques.
FAQs
What is accuracy in Machine Learning?
It measures how many predictions are correct overall.
What is precision?
It measures how many predicted positives are correct.
What is recall?
It measures how many actual positives are detected.
What is F1 score?
It is the balance between precision and recall.
Which metric is best?
It depends on the problem and dataset.