Cross Validation in Machine Learning
Introduction
Evaluating a model correctly is critical in Machine Learning. A single train-test split can give unreliable results, because the score depends on which points happen to land in the test set. This is where cross validation comes in.
In this lesson, you will learn how cross validation works, why it is important, and how to implement it in Python.
What is Cross Validation?
Cross validation is a technique used to evaluate Machine Learning models by dividing the dataset into multiple parts and testing the model multiple times.
It ensures that every data point gets a chance to be in both training and testing sets.
Why Cross Validation is Important
- Provides more reliable evaluation
- Helps detect overfitting
- Uses data efficiently
- Helps in model selection
What is K-Fold Cross Validation?
K-Fold Cross Validation splits the dataset into K equal parts (folds).
Process
- Divide data into K folds
- Use one fold for testing and remaining for training
- Repeat K times
- Calculate average performance
This gives a more accurate estimate of model performance.
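The four steps above can be sketched directly with scikit-learn's KFold splitter. The dataset, the choice of K = 5, and the linear model are illustrative assumptions for the example:

```python
import numpy as np
from sklearn.model_selection import KFold
from sklearn.linear_model import LinearRegression

# Toy data: 10 samples on a perfect line (an illustrative assumption)
X = np.arange(10).reshape(-1, 1)
y = 2 * X.ravel() + 1

kf = KFold(n_splits=5, shuffle=True, random_state=0)
fold_scores = []
for train_idx, test_idx in kf.split(X):
    model = LinearRegression()
    model.fit(X[train_idx], y[train_idx])                      # train on K-1 folds
    fold_scores.append(model.score(X[test_idx], y[test_idx]))  # test on the held-out fold

print(sum(fold_scores) / len(fold_scores))  # average performance across the K folds
```

Each of the 5 folds serves as the test set exactly once, and the final estimate is the mean of the 5 fold scores.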
K-Fold Formula Representation
$$CV\ Score = \frac{1}{K} \sum_{i=1}^{K} Score_i$$
The final score is the average of all fold results.
Types of Cross Validation
K-Fold Cross Validation
Most commonly used method
Stratified K-Fold
Maintains class distribution in each fold
Leave-One-Out (LOOCV)
Uses a single data point for testing and the rest for training, repeated once per sample
Repeated K-Fold
Repeats K-Fold several times with different random splits for a more stable estimate
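All four variants above are available as scikit-learn splitters. A minimal sketch, with fold counts, repeats, and the toy labels chosen arbitrarily for illustration:

```python
import numpy as np
from sklearn.model_selection import KFold, StratifiedKFold, LeaveOneOut, RepeatedKFold

X = np.arange(12).reshape(-1, 1)
y = np.array([0, 1] * 6)  # balanced binary labels (illustrative)

kfold = KFold(n_splits=3)                        # plain K-Fold
skfold = StratifiedKFold(n_splits=3)             # preserves the class ratio in each fold
loo = LeaveOneOut()                              # one sample held out per split
rkfold = RepeatedKFold(n_splits=3, n_repeats=2)  # K-Fold run twice with fresh shuffles

print(loo.get_n_splits(X))     # 12 splits, one per sample
print(rkfold.get_n_splits(X))  # 3 folds x 2 repeats = 6 splits
# Stratified folds keep the 50/50 class balance in every test fold:
for train_idx, test_idx in skfold.split(X, y):
    print(np.bincount(y[test_idx]))  # [2 2] in each test fold
```

Any of these splitters can be passed to `cross_val_score` via its `cv` parameter in place of a plain fold count.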
Advantages of Cross Validation
- Better model evaluation
- Reduces bias
- Works well with small datasets
- Helps compare models
Limitations of Cross Validation
- Computationally expensive
- Time-consuming for large datasets
Implementation in Python
```python
from sklearn.model_selection import cross_val_score
from sklearn.linear_model import LinearRegression
import numpy as np

# Ten samples so each of the 5 folds holds 2 test points
# (with a single test point per fold, the default R^2 score is undefined)
X = np.array([[i] for i in range(1, 11)])
y = np.array(range(1, 11))

model = LinearRegression()
scores = cross_val_score(model, X, y, cv=5)  # 5-fold CV, default R^2 scoring

print(scores)         # score for each fold
print(scores.mean())  # average score across folds
```
When to Use Cross Validation
- When dataset is small
- When you want reliable evaluation
- When comparing multiple models
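For model comparison, the key point is to score every candidate on the same folds. A hedged sketch, where the two models and the synthetic data are illustrative assumptions:

```python
import numpy as np
from sklearn.model_selection import cross_val_score, KFold
from sklearn.linear_model import LinearRegression
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(60, 1))
y = 3 * X.ravel() + rng.normal(0, 0.5, size=60)  # noisy linear data

cv = KFold(n_splits=5, shuffle=True, random_state=0)  # identical folds for both models
lin_scores = cross_val_score(LinearRegression(), X, y, cv=cv)
tree_scores = cross_val_score(DecisionTreeRegressor(random_state=0), X, y, cv=cv)

print(f"Linear regression mean score: {lin_scores.mean():.3f}")
print(f"Decision tree mean score:     {tree_scores.mean():.3f}")
# Prefer the model with the higher average fold score.
```

Fixing the `cv` splitter (rather than passing a bare integer to each call separately) guarantees both models see exactly the same train/test partitions, so the comparison is fair.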
Cross Validation vs Train-Test Split
Train-Test Split
- Faster
- Less reliable
Cross Validation
- More accurate
- More computationally expensive
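The trade-off above can be made concrete: repeated random train-test splits of the same data produce noticeably different scores, while cross validation averages that variability out. A small sketch with illustrative synthetic data:

```python
import numpy as np
from sklearn.model_selection import train_test_split, cross_val_score
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(1)
X = rng.uniform(0, 10, size=(40, 1))
y = 2 * X.ravel() + rng.normal(0, 1.0, size=40)  # noisy linear data

# Five different random train-test splits: the score moves from split to split
split_scores = []
for seed in range(5):
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=seed)
    split_scores.append(LinearRegression().fit(X_tr, y_tr).score(X_te, y_te))

# 5-fold cross validation: one stable averaged estimate
cv_scores = cross_val_score(LinearRegression(), X, y, cv=5)

print("single-split scores:", np.round(split_scores, 3))
print("cross-validation mean:", round(float(cv_scores.mean()), 3))
```

The single-split scores depend on the random seed; the cross-validation mean summarizes all folds in one number.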
Practical Tip
Always use cross validation before finalizing your model to ensure it performs well on unseen data.
Conclusion
Cross validation is a powerful technique that improves model evaluation and helps you build more reliable Machine Learning models.
In the next lesson, you will learn about Hyperparameter Tuning, which helps optimize model performance.
FAQs
What is cross validation?
It is a technique to evaluate models using multiple data splits.
What is K in K-Fold?
K represents the number of folds.
Why is cross validation better than train-test split?
Because it evaluates the model on several different splits and averages the results, instead of relying on the luck of one split.
What is stratified K-Fold?
It maintains class distribution across folds.
Is cross validation slow?
It can be slower due to repeated training.