Cross Validation in Machine Learning
Introduction
Evaluating a model correctly is critical in Machine Learning. A single train-test split can give unreliable results, because the score depends on which points happen to land in the test set. This is where cross validation comes in.
In this lesson, you will learn how cross validation works, why it is important, and how to implement it in Python.
What is Cross Validation?
Cross validation is a technique used to evaluate Machine Learning models by dividing the dataset into multiple parts and testing the model multiple times.
It ensures that every data point gets a chance to be in both training and testing sets.
Why Cross Validation is Important
- Provides more reliable evaluation
- Helps detect overfitting
- Uses data efficiently
- Helps in model selection
What is K-Fold Cross Validation?
K-Fold Cross Validation splits the dataset into K equal parts (folds).
Process
- Divide data into K folds
- Use one fold for testing and remaining for training
- Repeat K times
- Calculate average performance
This gives a more accurate estimate of model performance.
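The four steps above can be sketched directly with scikit-learn's KFold splitter. The dataset, the choice of K = 5, and the linear model are illustrative assumptions for the example:

```python
import numpy as np
from sklearn.model_selection import KFold
from sklearn.linear_model import LinearRegression

# Toy data: 10 samples on a perfect line (an illustrative assumption)
X = np.arange(10).reshape(-1, 1)
y = 2 * X.ravel() + 1

kf = KFold(n_splits=5, shuffle=True, random_state=0)
fold_scores = []
for train_idx, test_idx in kf.split(X):
    model = LinearRegression()
    model.fit(X[train_idx], y[train_idx])                      # train on K-1 folds
    fold_scores.append(model.score(X[test_idx], y[test_idx]))  # test on the held-out fold

print(sum(fold_scores) / len(fold_scores))  # average performance across the K folds
```

Each of the 5 folds serves as the test set exactly once, and the final estimate is the mean of the 5 fold scores.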
K-Fold Formula Representation
$$CV\ Score = \frac{1}{K} \sum_{i=1}^{K} Score_i$$
The final score is the average of all fold results.
Types of Cross Validation
K-Fold Cross Validation
Most commonly used method
Stratified K-Fold
Maintains class distribution in each fold
Leave-One-Out (LOOCV)
Uses a single data point for testing and the rest for training, repeated once per sample
Repeated K-Fold
Repeats K-Fold several times with different random splits for a more stable estimate
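All four variants above are available as scikit-learn splitters. A minimal sketch, with fold counts, repeats, and the toy labels chosen arbitrarily for illustration:

```python
import numpy as np
from sklearn.model_selection import KFold, StratifiedKFold, LeaveOneOut, RepeatedKFold

X = np.arange(12).reshape(-1, 1)
y = np.array([0, 1] * 6)  # balanced binary labels (illustrative)

kfold = KFold(n_splits=3)                        # plain K-Fold
skfold = StratifiedKFold(n_splits=3)             # preserves the class ratio in each fold
loo = LeaveOneOut()                              # one sample held out per split
rkfold = RepeatedKFold(n_splits=3, n_repeats=2)  # K-Fold run twice with fresh shuffles

print(loo.get_n_splits(X))     # 12 splits, one per sample
print(rkfold.get_n_splits(X))  # 3 folds x 2 repeats = 6 splits
# Stratified folds keep the 50/50 class balance in every test fold:
for train_idx, test_idx in skfold.split(X, y):
    print(np.bincount(y[test_idx]))  # [2 2] in each test fold
```

Any of these splitters can be passed to `cross_val_score` via its `cv` parameter in place of a plain fold count.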
Advantages of Cross Validation
- Better model evaluation
- Reduces bias
- Works well with small datasets
- Helps compare models
Limitations of Cross Validation
- Computationally expensive
- Time-consuming for large datasets
Implementation in Python
```python
from sklearn.model_selection import cross_val_score
from sklearn.linear_model import LinearRegression
import numpy as np

# Ten samples so each of the 5 folds holds 2 test points
# (with a single test point per fold, the default R^2 score is undefined)
X = np.array([[i] for i in range(1, 11)])
y = np.array(range(1, 11))

model = LinearRegression()
scores = cross_val_score(model, X, y, cv=5)  # 5-fold CV, default R^2 scoring

print(scores)         # score for each fold
print(scores.mean())  # average score across folds
```
When to Use Cross Validation
- When dataset is small
- When you want reliable evaluation
- When comparing multiple models
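For model comparison, the key point is to score every candidate on the same folds. A hedged sketch, where the two models and the synthetic data are illustrative assumptions:

```python
import numpy as np
from sklearn.model_selection import cross_val_score, KFold
from sklearn.linear_model import LinearRegression
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(60, 1))
y = 3 * X.ravel() + rng.normal(0, 0.5, size=60)  # noisy linear data

cv = KFold(n_splits=5, shuffle=True, random_state=0)  # identical folds for both models
lin_scores = cross_val_score(LinearRegression(), X, y, cv=cv)
tree_scores = cross_val_score(DecisionTreeRegressor(random_state=0), X, y, cv=cv)

print(f"Linear regression mean score: {lin_scores.mean():.3f}")
print(f"Decision tree mean score:     {tree_scores.mean():.3f}")
# Prefer the model with the higher average fold score.
```

Fixing the `cv` splitter (rather than passing a bare integer to each call separately) guarantees both models see exactly the same train/test partitions, so the comparison is fair.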
Cross Validation vs Train-Test Split
Train-Test Split
- Faster
- Less reliable
Cross Validation
- More accurate
- More computationally expensive
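The trade-off above can be made concrete: repeated random train-test splits of the same data produce noticeably different scores, while cross validation averages that variability out. A small sketch with illustrative synthetic data:

```python
import numpy as np
from sklearn.model_selection import train_test_split, cross_val_score
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(1)
X = rng.uniform(0, 10, size=(40, 1))
y = 2 * X.ravel() + rng.normal(0, 1.0, size=40)  # noisy linear data

# Five different random train-test splits: the score moves from split to split
split_scores = []
for seed in range(5):
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=seed)
    split_scores.append(LinearRegression().fit(X_tr, y_tr).score(X_te, y_te))

# 5-fold cross validation: one stable averaged estimate
cv_scores = cross_val_score(LinearRegression(), X, y, cv=5)

print("single-split scores:", np.round(split_scores, 3))
print("cross-validation mean:", round(float(cv_scores.mean()), 3))
```

The single-split scores depend on the random seed; the cross-validation mean summarizes all folds in one number.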
Practical Tip
Always use cross validation before finalizing your model to ensure it performs well on unseen data.
Conclusion
Cross validation is a powerful technique that improves model evaluation and helps you build more reliable Machine Learning models.
In the next lesson, you will learn about Hyperparameter Tuning, which helps optimize model performance.
FAQs
What is cross validation?
It is a technique to evaluate models using multiple data splits.
What is K in K-Fold?
K represents the number of folds.
Why is cross validation better than train-test split?
Because it evaluates the model on several different splits and averages the results, instead of relying on the luck of one split.
What is stratified K-Fold?
It maintains class distribution across folds.
Is cross validation slow?
It can be slower due to repeated training.