Boosting in Machine Learning: Adaboost

Tahera Firdose
Nov 1, 2023


AdaBoost, short for Adaptive Boosting, stands as one of the pioneering ensemble learning algorithms. Its approach is simple yet powerful: adjust the weights of data points based on how they’re classified, and train multiple weak learners sequentially. In this blog, we delve deep into the mechanics of AdaBoost.

The Essence of AdaBoost

Boosting, as a strategy, improves accuracy by training a sequence of weak learners, each one learning from the mistakes of the ones before it. AdaBoost does this by adjusting the weights of the training examples: those that are misclassified gain more weight, prompting the subsequent weak learner to focus more on them.

How AdaBoost Works: Step-by-Step

  1. Initialization: Every data point is assigned an equal weight.
  2. Training the Weak Learner: A weak learner (typically a decision tree with a single split, known as a “stump”) is trained on the data.
  3. Calculate Error: The weighted error rate of the weak learner is calculated. This rate indicates how well (or poorly) the model did.
  4. Determine Amount of Say: Depending on its accuracy, the weak learner is assigned a “say” in the final decision. More accurate learners have more say, and vice-versa.
  5. Adjust Weights: Increase weights for misclassified points and decrease for the correctly classified ones.
  6. Normalize Weights: Ensure that the sum of weights remains consistent.
  7. Repeat: Steps 2 to 6 are repeated for a predetermined number of iterations or until perfect accuracy is achieved.
  8. Final Model: The final model aggregates predictions from individual weak learners based on their respective “says” to produce the final output.
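
Put together, the whole procedure fits in a short loop. The code below is a minimal Python sketch of these eight steps, assuming scikit-learn for the decision stumps and NumPy arrays of ±1 labels; it passes the weights to each stump via `sample_weight` rather than resampling (the resampling view is described later in this post), and names like `adaboost_fit` and `n_rounds` are purely illustrative.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def adaboost_fit(X, y, n_rounds=50):
    """Minimal AdaBoost sketch for labels y in {-1, +1}."""
    n = len(y)
    w = np.full(n, 1.0 / n)                          # Step 1: equal initial weights
    stumps, alphas = [], []

    for _ in range(n_rounds):
        stump = DecisionTreeClassifier(max_depth=1)  # Step 2: weak learner ("stump")
        stump.fit(X, y, sample_weight=w)
        pred = stump.predict(X)

        miss = pred != y
        eps = np.clip(w[miss].sum(), 1e-10, 1 - 1e-10)  # Step 3: weighted error rate
        alpha = 0.5 * np.log((1 - eps) / eps)            # Step 4: amount of say

        w = w * np.exp(np.where(miss, alpha, -alpha))    # Step 5: raise/lower weights
        w = w / w.sum()                                  # Step 6: normalize

        stumps.append(stump)                             # Step 7: repeat for n_rounds
        alphas.append(alpha)

    return stumps, alphas

def adaboost_predict(stumps, alphas, X):
    """Step 8: combine the weak learners, weighted by their 'say'."""
    scores = sum(alpha * stump.predict(X) for stump, alpha in zip(stumps, alphas))
    return np.sign(scores)
```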

Let’s consider a simplified example to explain the AdaBoost algorithm. Assume we have a small dataset of five points, labelled A, B, C, D, and E, and we are performing a binary classification (Yes or No).

1. Initialization: All data points are assigned equal initial weights. Since we have five data points, each point gets a weight of 1/5 = 0.2.

2. Train the Weak Learner: Using these weights, we train our weak learner. Suppose it misclassifies points B and E.

3. Compute the Weighted Error Rate (ε) of the Weak Learner: Using the weights of the misclassified points, we find ε = 0.2 + 0.2 = 0.4.

4. Compute the Learner’s Weight (α): α = 0.5 × ln((1 − ε) / ε)

Plugging in ε = 0.4, we get α = 0.5 × ln(0.6 / 0.4) = 0.5 × ln(1.5) ≈ 0.2027.

This value of α represents the amount of say this weak learner will have in the final decision.

5. Update the Data Weights:

For misclassified points (B and E): new weight = old weight × e^α = 0.2 × e^0.2027 ≈ 0.245

For correctly classified points (A, C, and D): new weight = old weight × e^(−α) = 0.2 × e^(−0.2027) ≈ 0.163

Adjusting each data point accordingly, the updated weights are A ≈ 0.163, B ≈ 0.245, C ≈ 0.163, D ≈ 0.163, and E ≈ 0.245.

Before we can train another weak learner, we need to normalize the updated weights so that they sum up to 1.

From our updated weights, the sum is 0.163 + 0.245 + 0.163 + 0.163 + 0.245 ≈ 0.98.

To normalize, we divide each weight by this sum: A, C, and D become 0.163 / 0.98 ≈ 0.167, while B and E become 0.245 / 0.98 ≈ 0.25. The normalized weights now sum to 1.
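
If you want to check the arithmetic, the whole worked example fits in a few lines of NumPy. This is only a verification of the numbers above, with the misclassified points hard-coded as an assumption of the sketch.

```python
import numpy as np

w = np.full(5, 0.2)                                   # A, B, C, D, E start with equal weights
miss = np.array([False, True, False, False, True])    # B and E were misclassified

eps = w[miss].sum()                                   # weighted error rate: 0.4
alpha = 0.5 * np.log((1 - eps) / eps)                 # amount of say: ~0.2027

w_new = w * np.exp(np.where(miss, alpha, -alpha))     # ~0.245 for B and E, ~0.163 for the rest
print(w_new, w_new.sum())                             # the sum is ~0.98, so we normalize

w_norm = w_new / w_new.sum()                          # ~0.25 for B and E, ~0.167 for the rest
print(w_norm, w_norm.sum())                           # now the weights sum to 1
```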

Training the Next Weak Learner

To train the next weak learner, the normalized weights will act as a sampling distribution.

Sampling Method:

  • The algorithm will sample the training data based on these normalized weights.
  • This is often done using a method known as “weighted random sampling with replacement”. Here’s a simplified way of explaining this method: imagine you have a bag with balls representing each data point. The number of balls in the bag for each data point corresponds to its weight. You will then draw a ball (data point) from this bag, record it, and then put it back in the bag. You repeat this process to generate your sampled dataset.

Building the Sampled Dataset:

  • For our data, points B and E have a higher chance of appearing multiple times in the sampled dataset because of their increased normalized weights.
  • Let’s say we want to draw a sample of 5 points for training. It’s possible to end up with a dataset like [B, B, E, A, C], where points B and E are sampled more than once (see the sketch below).
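
One common way to implement this weighted random sampling with replacement is NumPy’s `random.choice`, as in the rough sketch below; the normalized weights come from our example, and the printed draw is just one possible outcome. The next weak learner would then be trained on the rows selected by this draw.

```python
import numpy as np

points = np.array(["A", "B", "C", "D", "E"])
weights = np.array([0.167, 0.25, 0.167, 0.167, 0.25])
weights = weights / weights.sum()      # probabilities passed to choice() must sum to exactly 1

rng = np.random.default_rng(seed=0)
sample = rng.choice(points, size=5, replace=True, p=weights)
print(sample)   # e.g. ['B' 'E' 'B' 'A' 'C'] -- one possible draw; B and E tend to repeat
```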

Training on Sampled Dataset:

  • The weak learner (it could be a decision tree, logistic regression, or any other classifier) is then trained on this sampled dataset.
  • Due to the repeated instances of the misclassified points, the learner is “forced” to try and classify them correctly. In essence, it places more emphasis on them.

Advantages of AdaBoost

  1. Versatility: AdaBoost can be used with various base learners, not just decision stumps (see the example after this list).
  2. Performance: Often provides accuracy comparable to more complex algorithms, especially when the data is imbalanced.
  3. Low Overfitting Risk: Because each weak learner is kept very simple, AdaBoost is often resistant to overfitting in practice.
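
As a quick illustration of that versatility, scikit-learn’s `AdaBoostClassifier` accepts different base learners. The snippet below is a minimal sketch on a synthetic dataset, not a benchmark; note that recent scikit-learn versions call the parameter `estimator`, while older ones use `base_estimator`.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=500, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Classic AdaBoost: decision stumps as the weak learner
stump_boost = AdaBoostClassifier(estimator=DecisionTreeClassifier(max_depth=1),
                                 n_estimators=100, random_state=0)
stump_boost.fit(X_train, y_train)
print("stumps:", stump_boost.score(X_test, y_test))

# Same boosting loop, but with a different base learner
lr_boost = AdaBoostClassifier(estimator=LogisticRegression(max_iter=1000),
                              n_estimators=100, random_state=0)
lr_boost.fit(X_train, y_train)
print("logistic regression:", lr_boost.score(X_test, y_test))
```

Swapping the stump for logistic regression changes only the `estimator` argument; everything else about the boosting procedure stays the same.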

Limitations of AdaBoost

  1. Noisy Data: AdaBoost can be sensitive to noisy data and outliers since it focuses on misclassifications.
  2. Time Intensive: The sequential nature means it can’t be parallelized as efficiently as some other algorithms.

Conclusion

AdaBoost, with its elegant approach to boosting, has cemented its position as a staple in the machine learning toolkit. By combining the outputs of multiple weak learners, AdaBoost often crafts a model that outperforms each individual component. While it has its limitations, understanding and leveraging AdaBoost can provide significant value in various data scenarios.
