Overfitting and Underfitting in Machine Learning
Introduction
When building models in Machine Learning, one of the biggest challenges is ensuring that the model performs well not only on training data but also on new, unseen data.
Two common problems that affect model performance are overfitting and underfitting.
In this lesson, you will learn what these problems are, why they occur, and how to fix them.
What is Underfitting?
Underfitting occurs when a model is too simple to learn the underlying patterns in the data.
Characteristics
- Poor performance on training data
- Poor performance on test data
- High bias
Example
Fitting a straight line to data that follows a curved (for example, quadratic) relationship.
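As a minimal sketch of this example (the quadratic data here is illustrative, not from the lesson): a straight line fit to y = x² scores poorly even on the data it was trained on, the signature of underfitting.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Quadratic data: y = x^2, a pattern no straight line can capture
X = np.arange(-5, 6).reshape(-1, 1).astype(float)
y = (X ** 2).ravel()

model = LinearRegression()
model.fit(X, y)

# R^2 on the training data itself is ~0.0: the line misses the curve entirely
print(round(model.score(X, y), 3))
```

Note that the low score appears on the *training* data, not just on a test set; that is what distinguishes underfitting from overfitting.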
What is Overfitting?
Overfitting occurs when a model learns the training data too well, including noise and outliers.
Characteristics
- Very high accuracy on training data
- Poor performance on test data
- High variance
Example
A model that memorizes the training examples, including their noise, instead of learning general patterns.
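A quick sketch of memorization (the dataset and polynomial degree are illustrative choices, not from the lesson): with 8 noisy points and a degree-7 polynomial, the model has enough parameters to pass through every training point.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import PolynomialFeatures

rng = np.random.default_rng(0)
X_train = np.linspace(0, 1, 8).reshape(-1, 1)
y_train = X_train.ravel() + rng.normal(0, 0.2, 8)  # linear trend + noise

# Degree-7 polynomial: 8 parameters for 8 points, so it can memorize them
poly = PolynomialFeatures(degree=7)
model = LinearRegression().fit(poly.fit_transform(X_train), y_train)

# Training R^2 is (near-)perfect because the curve memorizes the noise
print(round(model.score(poly.transform(X_train), y_train), 3))
```

On new inputs drawn from the same linear trend, the polynomial typically scores far worse, because between the memorized points it oscillates to chase noise.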
Bias-Variance Tradeoff
Total Error = Bias² + Variance + Irreducible Error
- Bias: Error due to overly simple model
- Variance: Error due to model sensitivity to data
The goal is to find a balance between bias and variance.
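The two error terms can be estimated empirically. The simulation below (an illustrative setup, not part of the lesson) repeatedly fits a deliberately-too-simple straight line to fresh noisy samples from y = x², then measures the squared bias and the variance of its predictions at one point.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
true_f = lambda x: x ** 2          # the true (quadratic) relationship
x0 = np.array([[1.5]])             # the point where we measure error

preds = []
for _ in range(200):
    # Fresh noisy training sample each round
    X = rng.uniform(-2, 2, size=(20, 1))
    y = true_f(X).ravel() + rng.normal(0, 0.5, 20)
    model = LinearRegression().fit(X, y)   # deliberately too simple
    preds.append(model.predict(x0)[0])

preds = np.array(preds)
bias_sq = (preds.mean() - true_f(1.5)) ** 2   # systematic error of the line
variance = preds.var()                        # sensitivity to the sample
print(round(bias_sq, 2), round(variance, 2))
```

For this underfit model, the squared bias dominates the variance, matching the "high bias" description of underfitting above; a very flexible model would show the opposite pattern.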
Key Differences
Underfitting
- Model too simple
- High bias
- Misses patterns
Overfitting
- Model too complex
- High variance
- Captures noise
Causes of Underfitting
- Using simple models
- Insufficient training
- Lack of features
Causes of Overfitting
- Overly complex model
- Too many features
- Small dataset
- Noise in data
How to Fix Underfitting
- Use more complex models
- Add more features
- Train longer
- Reduce regularization
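The first two fixes can be sketched together (the quadratic data and degree choice are illustrative): adding a squared feature turns the underfitting straight line into a model that matches the true pattern.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

X = np.arange(-5, 6).reshape(-1, 1).astype(float)
y = (X ** 2).ravel()                  # quadratic relationship

# A plain line underfits; adding a squared feature matches the true pattern
line = LinearRegression().fit(X, y)
curve = make_pipeline(PolynomialFeatures(degree=2),
                      LinearRegression()).fit(X, y)

print(round(line.score(X, y), 2), round(curve.score(X, y), 2))
```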
How to Fix Overfitting
- Use more data
- Apply regularization
- Reduce model complexity
- Use cross-validation
- Feature selection
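Cross-validation from this list can be sketched as follows (the synthetic data is illustrative): `cross_val_score` splits the data into 5 folds, holds out each fold once, and averages the held-out scores, so memorizing any one fold cannot inflate the result.

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(42)
X = rng.normal(size=(40, 3))
y = X @ np.array([1.0, 2.0, 3.0]) + rng.normal(0, 0.1, 40)

# 5-fold cross-validation: each fold serves once as a held-out test set,
# so the averaged score reflects generalization rather than memorization
scores = cross_val_score(Ridge(alpha=1.0), X, y, cv=5)
print(round(scores.mean(), 3))
```

The next lesson covers cross-validation in depth.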
Regularization
Regularization helps reduce overfitting by penalizing large coefficients.
Types
- L1 Regularization
- L2 Regularization
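A short sketch of the two types in scikit-learn (the synthetic data is illustrative): L2 regularization is `Ridge`, L1 is `Lasso`. Their practical difference shows in the coefficients of irrelevant features.

```python
import numpy as np
from sklearn.linear_model import Lasso, Ridge

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 5))
# Only the first two features actually matter
y = 3 * X[:, 0] + 2 * X[:, 1] + rng.normal(0, 0.1, 50)

# L2 (Ridge) shrinks all coefficients toward zero but keeps them nonzero
ridge = Ridge(alpha=1.0).fit(X, y)
# L1 (Lasso) can drive irrelevant coefficients exactly to zero
lasso = Lasso(alpha=0.1).fit(X, y)

print(np.round(ridge.coef_, 2))
print(np.round(lasso.coef_, 2))
```

Because L1 zeroes out weak coefficients, Lasso doubles as a form of feature selection, one of the overfitting fixes listed above.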
Practical Example
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Tiny dataset with an outlier (y = 100) that a straight line cannot explain
X = [[1], [2], [3], [4], [5]]
y = [1, 2, 3, 4, 100]

# random_state makes the train/test split reproducible
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

model = LinearRegression()
model.fit(X_train, y_train)

# R^2 on the held-out points; depending on where the outlier lands in the
# split, this score swings wildly -- a small, noisy dataset in action
print(model.score(X_test, y_test))
Why This Concept is Important
Understanding overfitting and underfitting helps you:
- Build better models
- Improve accuracy
- Avoid common mistakes
- Generalize well to new data
Conclusion
Overfitting and underfitting are critical concepts in Machine Learning. Balancing bias and variance is key to building robust and reliable models.
In the next lesson, you will learn about Cross Validation, a powerful technique to evaluate models more effectively.
FAQs
What is overfitting?
It is when a model learns the training data too well and fails on new data.
What is underfitting?
It is when a model is too simple to learn patterns in the data.
What is bias?
It is error due to overly simple assumptions.
What is variance?
It is error due to sensitivity to training data.
How can you avoid overfitting?
Use regularization, more data, and cross-validation.