AdaBoost

AdaBoost in ML

Introduction

AdaBoost (Adaptive Boosting) is a sequential ensemble method that combines many weak learners (typically shallow decision trees) into a strong classifier by focusing more and more on the hard-to-classify points.

What makes it Adaptive?

Instead of training all models on the same data, AdaBoost:

Starts with equal weights for all data points.
Trains a weak learner (e.g., a decision stump — a tree with one split).
Increases the weights of misclassified points.
Trains the next learner to focus on those difficult points.
Repeats for a fixed number of iterations or until performance stops improving.

Why "Weak" Learners?

AdaBoost works with models that are only slightly better than random guessing (e.g., error rate < 50% on binary classification). The power comes from the ensemble, not the individual learners. Each tree corrects the mistakes of the previous ones.

The Boosting Process

Imagine a dataset of 10 data points. AdaBoost goes through these steps:

Step 1: Train the first tree. Let's say it correctly classifies 7 points but misclassifies 3.
Step 2: Increase the weights of the 3 misclassified points.
Step 3: Train the second tree, forcing it to focus more on those 3 points. It might correctly classify 2 of them, but misclassify 1.
Step 4: Increase the weight of the remaining misclassified point.
Step 5: Train the third tree to focus on that last point.
Final Model: Combine all 3 trees with different weights (the ones that performed better get higher weights).

AdaBoost Algorithm Steps

Here's the step-by-step breakdown of the AdaBoost algorithm:

Initialize Weights: Start with equal weights for all data points:
Loop for T Iterations:

Train Weak Learner: Train a weak learner (e.g., decision stump) on the current weighted data.
Calculate Weighted Error: Calculate the weighted error rate of the learner.
Calculate Learner Weight: Calculate the weight for this learner (more error = lower weight).
Update Data Weights: Increase weights for misclassified points, decrease for correctly classified points.
Normalize Weights: Rescale weights so they sum to 1.

Final Prediction: Combine all learners using their weights (weighted majority vote for classification, weighted sum for regression).

AdaBoost Mathematics (for the curious mind)

Here are the core formulas used in AdaBoost:

Initializing Weights: $$ w_i = \frac{1}{N} \quad \text{for all } i = 1, \dots, N $$

Learner Weight Calculation: $$ \alpha_t = \frac{1}{2} \ln \left( \frac{1 - \epsilon_t}{\epsilon_t} \right) $$

The larger $\alpha_t$, the more weight is given to the learner.

Updating Data Weights: $$ w_{i, t+1} = \frac{w_{i, t}}{Z_t} \times \begin{cases} e^{-\alpha_t} & \text{if misclassified} \\ e^{\alpha_t} & \text{if correctly classified} \end{cases} $$

where $Z_t$ is the normalization factor to ensure weights sum to 1.

Final Prediction: $$ y_{final}(x) = \text{sign} \left( \sum_{t=1}^{T} \alpha_t h_t(x) \right) $$

Step-by-Step Algorithm

Step 1: Initialize weights: Initially, all data points are equally important: $$w_i^{(1)} = \frac{1}{n}$$
Step 2: Train weak learner: Train a weak classifier $h_t(x)$ using the weighted dataset. The learner focus more on points with higher weights.
Step 3: Compute weighted error: $$\epsilon_t = \sum_{i=1}^n w_i^{(t)} \cdot I(y_i \neq h_t(x_i))$$ This is not simple accuracy, it's weighted error.
Step 4: Compute model weight: $$\alpha_t = \frac{1}{2} \ln \left( \frac{1 - \epsilon_t}{\epsilon_t} \right)$$ Interpretation:
- If error is small $\rightarrow \alpha_t$ is large $\rightarrow$ strong learner
- If error is $\sim 0.5 \rightarrow \alpha_t \approx 0 \rightarrow$ useless learner
- If error is close to 1 $\rightarrow \alpha_t \approx -\infty \rightarrow$ overfitting - giving more weight to the mistakes.
Step 5: Update sample weights: $$w_i^{(t+1)} = w_i^{(t)} \cdot \exp\left(-\alpha_t y_i h_t(x_i)\right)$$ This is the key idea:
- If correctly classfied $\rightarrow y_i h_t(x_i) = +1 \rightarrow $ weight decreases.
- If misclassified $\rightarrow y_i h_t(x_i) = -1 \rightarrow $ weight increases.
Step 6: Normalize weights: $$w_i^{(t+1)} = \frac{w_i^{(t+1)}}{\sum_{j=1}^n w_j^{(t+1)}}$$ so weights remain a probability distribution.
Step 7: Repeat until desired accuracy or max iterations reached.
Step 8: Final prediction: $$y_{final}(x) = \text{sign} \left( \sum_{t=1}^{T} \alpha_t h_t(x) \right)$$ so:
- Each weak learner votes
- Stronger learners (low error) get higher weights
- Final prediction is weighted majority vote
Step 6: Intuition Behind the Math:
Why exponential update?

The term: $\exp(-\alpha_t y_i h_t(x_i))$

creates:
- smooth penalty
- stronger updates for misclassified points
- connections to exponential loss minimization
AdaBoost minimizes this loss:
$$\sum_{i=1}^n \exp\left(-y_i F(x_i)\right)$$ where $F(x)$ is the final predictor and is given by $F(x) =\sum_{t=1}^T \alpha_t h_t(x)$. So AdaBoost is actually doing stage-wise additive modeling.

For a example project, checkout the link: Adaboost - Gradient Boosting comparison project

📘 Next Chapter: Decision Trees — The Building Blocks of Gradient Boosting. We will dive into how trees split data, the impurity measures (Gini, entropy, MSE), and how the split-finding algorithm works — the foundation for understanding XGBoost and LightGBM internals.

References & Further Reading

Friedman, J. H. (2001). Greedy Function Approximation: A Gradient Boosting Machine. The Annals of Statistics, 29(5), 1189–1232.
Friedman, J. H. (1999). Stochastic Gradient Boosting. Computational Statistics & Data Analysis.
Chen, T., & Guestrin, C. (2016). XGBoost: A Scalable Tree Boosting System. Proc. 22nd ACM SIGKDD.
Ke, G., et al. (2017). LightGBM: A Highly Efficient Gradient Boosting Decision Tree. Advances in Neural Information Processing Systems (NeurIPS).
Prokhorenkova, L., et al. (2018). CatBoost: Unbiased Boosting with Categorical Features. NeurIPS.
Freund, Y., & Schapire, R. E. (1997). A Decision-Theoretic Generalization of On-Line Learning. JCSS, 55(1), 119–139.
The Elements of Statistical Learning — Chapter 10: Boosting and Additive Trees.
My GitHub: Machine Learning repository

Arun