Feature Scaling — Normalization and Standardization

Tahera Firdose
7 min read · May 23, 2023


Machine learning algorithms often rely on the quality and distribution of input features to make accurate predictions. However, not all features are created equal. Feature scaling, a crucial preprocessing step, ensures that features are transformed into a consistent range, allowing machine learning models to perform optimally. In this blog post, we will explore the significance of feature scaling in machine learning, its impact on different algorithms, popular scaling techniques, and best practices to enhance model performance.

Understanding Feature Scaling:

What is feature scaling?

Feature scaling is a preprocessing technique used in machine learning to standardize or normalize the range of input features. It involves transforming the values of features to a specific range or distribution. The goal is to ensure that all features have a similar scale, which can help the machine learning algorithms perform more effectively.

Why is feature scaling important?

Feature scaling is important because many machine learning algorithms are sensitive to the scale of input features. When features have different scales or ranges, it can negatively impact the performance of these algorithms. For example, distance-based algorithms such as K-Nearest Neighbors (KNN) or Support Vector Machines (SVM) calculate distances between data points, and if the features have different scales, the distances may be dominated by the features with larger scales. This can lead to biased results and inaccurate predictions.

Impact of unscaled features on machine learning models.

When features are not scaled, several problems can arise. Distance-based algorithms, such as K-Nearest Neighbors (KNN) and Support Vector Machines (SVM), may be dominated by features with larger scales, leading to biased decisions. Gradient-based algorithms, like Linear Regression, Logistic Regression, and Neural Networks, can converge more slowly if features have very different scales. Tree-based algorithms, such as Decision Trees and Random Forests, are generally robust to feature scaling because they split on thresholds of individual features rather than on distances.

The Impact of Feature Scaling on Different Algorithms:

1. Distance-based algorithms (K-Nearest Neighbors, Support Vector Machines, etc.): Distance-based algorithms rely on distance calculations between data points. Inconsistent feature scales can result in features with larger scales dominating the distances, leading to biased decisions and inaccurate predictions.
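To make this concrete, here is a small made-up example (the ages and incomes are invented for illustration) showing how a feature with a larger scale dominates a Euclidean distance:

```python
import numpy as np

# Two people described by age (years) and income (dollars).
a = np.array([25.0, 50_000.0])
b = np.array([45.0, 52_000.0])

dist = np.linalg.norm(a - b)
# sqrt(20^2 + 2000^2) ≈ 2000.1 — the 20-year age gap contributes
# almost nothing next to the $2,000 income gap.
print(round(dist, 1))
```

After scaling both features to a common range, the age difference would contribute meaningfully to the distance again.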

2. Gradient-based algorithms (Linear Regression, Logistic Regression, Neural Networks, etc.): Gradient-based algorithms optimize model parameters based on gradients. Inconsistent feature scales can slow down convergence and affect the overall performance of these algorithms.

3. Tree-based algorithms (Decision Trees, Random Forests, etc.): Tree-based algorithms are generally robust to feature scaling since they partition the feature space based on thresholds.

Popular Feature Scaling Techniques:

Standardization

Standardization, also known as Z-score normalization, is a feature scaling technique that transforms the data to have zero mean and unit variance. The formula for standardization is as follows:

X_scaled = (X - mean(X)) / std(X)

In this formula, X represents the original feature values, mean(X) is the mean of the feature values, and std(X) is the standard deviation of the feature values.

To standardize your data, you can use the StandardScaler class from the sklearn library. Here’s a demonstration of how to apply it.
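A minimal sketch of applying StandardScaler, using a small made-up dataset (the 'Age' and 'Fare' column names and values are illustrative, echoing the Titanic-style features discussed later):

```python
import pandas as pd
from sklearn.preprocessing import StandardScaler

df = pd.DataFrame({
    'Age':  [22, 38, 26, 35, 54],
    'Fare': [7.25, 71.28, 7.93, 53.10, 51.86],
})

scaler = StandardScaler()
scaled = pd.DataFrame(scaler.fit_transform(df), columns=df.columns)

# Each column now has (approximately) zero mean and unit variance.
print(scaled.mean().round(6))       # ~0 for both columns
print(scaled.std(ddof=0).round(6))  # 1 for both columns (population std)
```

Note that StandardScaler uses the population standard deviation (ddof=0), which is why the check above passes `ddof=0` to pandas.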

Before Applying Standard Scaler

After Applying Standard Scaler

By comparing the original data graph with the standardized data graph, the following changes can be observed:

• The scale of the features has been altered in the standardized data. The values on both the x-axis and y-axis are now expressed in standard deviations from the mean.

• The range of the data is compressed and centered around zero.

• The shape of each feature’s distribution is unchanged, since standardization is a linear transformation; only its location and spread differ.

• The relationships between the features are preserved, but the data is now on a standardized scale.

Effect of Outliers on Standardization: Let’s add some outliers to the dataframe and re-apply the scaler.

When outliers are present in the data, they have extreme values that lie far from the mean. Standardization shifts and rescales every value, outliers included, so an outlier does not keep its original number; what it keeps is its extreme position relative to the rest of the data. Moreover, because the mean and standard deviation are computed over the entire dataset, the outliers inflate both statistics, which squeezes the non-outlier values into a narrower band of z-scores. In short, standardization does not remove or cap outliers.
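A small illustrative sketch of this effect, using invented values with one obvious outlier:

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Five similar values and one extreme outlier (500).
values = np.array([[10.0], [12.0], [11.0], [13.0], [12.0], [500.0]])

z = StandardScaler().fit_transform(values).ravel()

# The outlier is still the extreme point after scaling, while the
# inliers are compressed into a narrow band of z-scores near -0.45.
print(z.round(3))
```

The outlier ends up around z ≈ 2.24 while the inliers cluster tightly together, showing that standardization preserves, rather than removes, the outlier’s extremeness.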

Normalization

Normalization, also known as min-max scaling, is a feature scaling technique that rescales the data to a specific range, typically between 0 and 1. It is particularly useful when the feature values have different ranges and it’s necessary to bring them to a common scale. The formula for normalization is as follows:

X_normalized = (X - min(X)) / (max(X) - min(X))

In this formula, X represents the original feature values, min(X) is the minimum value of the feature, and max(X) is the maximum value of the feature.

Now, let’s apply normalization to a dataset using Python code:

To perform normalization, you can import the MinMaxScaler class from the sklearn library and use it to transform your dataset. Let’s apply the MinMaxScaler to our data.
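A minimal sketch of applying MinMaxScaler, again on a small made-up 'Age'/'Fare' dataset for illustration:

```python
import pandas as pd
from sklearn.preprocessing import MinMaxScaler

df = pd.DataFrame({
    'Age':  [22, 38, 26, 35, 54],
    'Fare': [7.25, 71.28, 7.93, 53.10, 51.86],
})

scaler = MinMaxScaler()  # default feature_range is (0, 1)
normalized = pd.DataFrame(scaler.fit_transform(df), columns=df.columns)

# Each column is now rescaled into [0, 1].
print(normalized.min())  # 0.0 for both columns
print(normalized.max())  # 1.0 for both columns
```

If you need a different target range, MinMaxScaler accepts a `feature_range` argument, e.g. `MinMaxScaler(feature_range=(-1, 1))`.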

Let’s examine the impact of normalization on our dataset. After applying normalization, we can observe that all the features now possess a minimum value of 0 and a maximum value of 1. This normalization process has successfully rescaled the data within the desired range.

The visualization allows us to observe the changes in the data distribution after Min-Max scaling. In the original data plot, we can see the density estimates of the original ‘Age’ and ‘Fare’ features. In the Min-Max scaled data plot, we can observe the density estimates of the corresponding scaled features.

By comparing the original and scaled data plots, we can observe that the Min-Max scaling process transforms the data distribution. The scaled data is compressed within the range of 0 to 1, and the density estimates are adjusted accordingly.

Difference between Standardization and Normalization

• Standardization rescales data to have zero mean and unit variance; normalization rescales data to a fixed range, typically 0 to 1.

• Standardized values are not bounded to any particular range; normalized values are confined to the chosen range.

• Standardization is less affected by outliers; normalization is strongly affected, since it depends directly on the minimum and maximum values.

• Standardization is often preferred when the data is roughly Gaussian or when an algorithm assumes centered inputs; normalization is useful when a bounded range is required.

Conclusion

Feature scaling plays a vital role in preparing input data for machine learning models. By transforming features into a consistent range, we can improve model convergence, enhance prediction accuracy, and mitigate the impact of varying feature scales. Whether it is distance-based algorithms, gradient-based algorithms, or tree-based algorithms, understanding and implementing appropriate scaling techniques can significantly improve the performance of machine learning models. Remember, when it comes to feature scaling, consistency is key!

Refer to Code: https://github.com/taherafirdose/100-days-of-Machine-Learning/tree/master/Feature%20Scaling
