Overfitting and Underfitting in Machine Learning
Introduction
When building models in Machine Learning, one of the biggest challenges is ensuring that the model performs well not only on training data but also on new, unseen data.
Two common problems that affect model performance are overfitting and underfitting.
In this lesson, you will learn what these problems are, why they occur, and how to fix them.
What is Underfitting?
Underfitting occurs when a model is too simple to learn the underlying patterns in the data.
Characteristics
- Poor performance on training data
- Poor performance on test data
- High bias
Example
Fitting a straight line to data that follows a curved (for example, quadratic) relationship.
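As a minimal sketch of this example (the quadratic data here is illustrative, not from the lesson): a straight line fit to y = x² scores poorly even on the data it was trained on, the signature of underfitting.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Quadratic data: y = x^2, a pattern no straight line can capture
X = np.arange(-5, 6).reshape(-1, 1).astype(float)
y = (X ** 2).ravel()

model = LinearRegression()
model.fit(X, y)

# R^2 on the training data itself is ~0.0: the line misses the curve entirely
print(round(model.score(X, y), 3))
```

Note that the low score appears on the *training* data, not just on a test set; that is what distinguishes underfitting from overfitting.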
What is Overfitting?
Overfitting occurs when a model learns the training data too well, including noise and outliers.
Characteristics
- Very high accuracy on training data
- Poor performance on test data
- High variance
Example
A model that memorizes the training examples, including their noise, instead of learning general patterns.
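A quick sketch of memorization (the dataset and polynomial degree are illustrative choices, not from the lesson): with 8 noisy points and a degree-7 polynomial, the model has enough parameters to pass through every training point.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import PolynomialFeatures

rng = np.random.default_rng(0)
X_train = np.linspace(0, 1, 8).reshape(-1, 1)
y_train = X_train.ravel() + rng.normal(0, 0.2, 8)  # linear trend + noise

# Degree-7 polynomial: 8 parameters for 8 points, so it can memorize them
poly = PolynomialFeatures(degree=7)
model = LinearRegression().fit(poly.fit_transform(X_train), y_train)

# Training R^2 is (near-)perfect because the curve memorizes the noise
print(round(model.score(poly.transform(X_train), y_train), 3))
```

On new inputs drawn from the same linear trend, the polynomial typically scores far worse, because between the memorized points it oscillates to chase noise.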
Bias-Variance Tradeoff
Total Error = Bias² + Variance + Irreducible Error
- Bias: Error due to overly simple model
- Variance: Error due to model sensitivity to data
The goal is to find a balance between bias and variance.
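The two error terms can be estimated empirically. The simulation below (an illustrative setup, not part of the lesson) repeatedly fits a deliberately-too-simple straight line to fresh noisy samples from y = x², then measures the squared bias and the variance of its predictions at one point.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
true_f = lambda x: x ** 2          # the true (quadratic) relationship
x0 = np.array([[1.5]])             # the point where we measure error

preds = []
for _ in range(200):
    # Fresh noisy training sample each round
    X = rng.uniform(-2, 2, size=(20, 1))
    y = true_f(X).ravel() + rng.normal(0, 0.5, 20)
    model = LinearRegression().fit(X, y)   # deliberately too simple
    preds.append(model.predict(x0)[0])

preds = np.array(preds)
bias_sq = (preds.mean() - true_f(1.5)) ** 2   # systematic error of the line
variance = preds.var()                        # sensitivity to the sample
print(round(bias_sq, 2), round(variance, 2))
```

For this underfit model, the squared bias dominates the variance, matching the "high bias" description of underfitting above; a very flexible model would show the opposite pattern.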
Key Differences
Underfitting
- Model too simple
- High bias
- Misses patterns
Overfitting
- Model too complex
- High variance
- Captures noise
Causes of Underfitting
- Using simple models
- Insufficient training
- Lack of features
Causes of Overfitting
- Overly complex model
- Too many features
- Small dataset
- Noise in data
How to Fix Underfitting
- Use more complex models
- Add more features
- Train longer
- Reduce regularization
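The first two fixes can be sketched together (the quadratic data and degree choice are illustrative): adding a squared feature turns the underfitting straight line into a model that matches the true pattern.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

X = np.arange(-5, 6).reshape(-1, 1).astype(float)
y = (X ** 2).ravel()                  # quadratic relationship

# A plain line underfits; adding a squared feature matches the true pattern
line = LinearRegression().fit(X, y)
curve = make_pipeline(PolynomialFeatures(degree=2),
                      LinearRegression()).fit(X, y)

print(round(line.score(X, y), 2), round(curve.score(X, y), 2))
```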
How to Fix Overfitting
- Use more data
- Apply regularization
- Reduce model complexity
- Use cross-validation
- Feature selection
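Cross-validation from this list can be sketched as follows (the synthetic data is illustrative): `cross_val_score` splits the data into 5 folds, holds out each fold once, and averages the held-out scores, so memorizing any one fold cannot inflate the result.

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(42)
X = rng.normal(size=(40, 3))
y = X @ np.array([1.0, 2.0, 3.0]) + rng.normal(0, 0.1, 40)

# 5-fold cross-validation: each fold serves once as a held-out test set,
# so the averaged score reflects generalization rather than memorization
scores = cross_val_score(Ridge(alpha=1.0), X, y, cv=5)
print(round(scores.mean(), 3))
```

The next lesson covers cross-validation in depth.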
Regularization
Regularization helps reduce overfitting by penalizing large coefficients.
Types
- L1 Regularization
- L2 Regularization
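A short sketch of the two types in scikit-learn (the synthetic data is illustrative): L2 regularization is `Ridge`, L1 is `Lasso`. Their practical difference shows in the coefficients of irrelevant features.

```python
import numpy as np
from sklearn.linear_model import Lasso, Ridge

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 5))
# Only the first two features actually matter
y = 3 * X[:, 0] + 2 * X[:, 1] + rng.normal(0, 0.1, 50)

# L2 (Ridge) shrinks all coefficients toward zero but keeps them nonzero
ridge = Ridge(alpha=1.0).fit(X, y)
# L1 (Lasso) can drive irrelevant coefficients exactly to zero
lasso = Lasso(alpha=0.1).fit(X, y)

print(np.round(ridge.coef_, 2))
print(np.round(lasso.coef_, 2))
```

Because L1 zeroes out weak coefficients, Lasso doubles as a form of feature selection, one of the overfitting fixes listed above.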
Practical Example
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Tiny dataset with an outlier (y = 100) that a straight line cannot explain
X = [[1], [2], [3], [4], [5]]
y = [1, 2, 3, 4, 100]

# random_state makes the train/test split reproducible
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

model = LinearRegression()
model.fit(X_train, y_train)

# R^2 on the held-out points; depending on where the outlier lands in the
# split, this score swings wildly -- a small, noisy dataset in action
print(model.score(X_test, y_test))
Why This Concept is Important
Understanding overfitting and underfitting helps you:
- Build better models
- Improve accuracy
- Avoid common mistakes
- Generalize well to new data
Conclusion
Overfitting and underfitting are critical concepts in Machine Learning. Balancing bias and variance is key to building robust and reliable models.
In the next lesson, you will learn about Cross Validation, a powerful technique to evaluate models more effectively.
FAQs
What is overfitting?
It is when a model learns the training data too well and fails on new data.
What is underfitting?
It is when a model is too simple to learn patterns in the data.
What is bias?
It is error due to overly simple assumptions.
What is variance?
It is error due to sensitivity to training data.
How can you avoid overfitting?
Use regularization, more data, and cross-validation.