Feature transformation


  • Feature transformation is simply a function that transforms features from one representation to another.

Why would we need feature transformation in the first place?

  • Well, there are many reasons, such as:
    • Data types are not suitable to be fed into a machine learning algorithm, e.g. text, categories
    • Feature values may cause problems during the learning process, e.g. data represented in different scales
    • We want to reduce the number of features to plot and visualize data, speed up training or improve the accuracy of a specific model
    • Linear Regression uses Gradient Descent to find the global minimum, and it needs evenly scaled data for better predictions.
    • Algorithms like KNN, K-Means, and Hierarchical Clustering use Euclidean distance, Manhattan distance, etc. to measure the distance between observations; they also need scaled data for better predictions.
    • Deep learning techniques need standardization and scaling, which are themselves feature transformation techniques.

Feature Scaling

  • In practice, different types of variables are often encountered in the same data set.
  • A significant issue is that the range of the variables may differ a lot.
  • Using the original scale may put more weight on the variables with a large range.
  • To deal with this problem, feature rescaling needs to be applied to the independent variables (features) during the data preprocessing step.
  • The terms "normalization" and "standardization" are sometimes used interchangeably, but they usually refer to different things.
  • The goal of feature scaling is to make sure features are on roughly the same scale, so that each feature is equally important and easier for most machine-learning algorithms to process.

Where is Feature Scaling needed?

  • Some machine learning models are fundamentally based on distance metrics (so-called distance-based models), for example K-Nearest-Neighbours, SVM, and neural networks.
  • Feature scaling is extremely essential to those models, especially when the range of the features is very different.
  • Otherwise, features with a large range will have a large influence in computing the distance.
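
As a quick illustration, here is a minimal sketch (plain NumPy, with made-up age/income values) of how the feature with the larger range dominates the Euclidean distance until the features are rescaled:

```python
import numpy as np

# Two made-up observations: (age in years, income in dollars)
a = np.array([25.0, 50_000.0])
b = np.array([45.0, 52_000.0])

# Unscaled Euclidean distance is dominated by the income feature
print(np.linalg.norm(a - b))  # ~2000.1 -- the 20-year age gap barely matters

# After dividing each feature by an assumed range, both features contribute
ranges = np.array([100.0, 100_000.0])
print(np.linalg.norm(a / ranges - b / ranges))  # ~0.20, now driven mostly by age
```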

Why Standardisation is better than Min-Max Normalisation

  • Standardisation is more robust to outliers, and in many cases it is preferable to Max-Min Normalisation.
  • In contrast to standardisation, Max-Min Normalisation produces smaller standard deviations.
  • Max-Min Normalisation transforms data with varying scales so that no specific dimension dominates the statistics, and it does not require strong assumptions about the distribution of the data, which suits algorithms such as k-nearest neighbours and artificial neural networks.
  • However, Normalisation does not handle outliers very well.
  • On the contrary, standardisation allows users to better handle the outliers and facilitate convergence for some computational algorithms like gradient descent.
  • Therefore, we usually prefer standardisation over Min-Max Normalisation.
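
A small sketch of this comparison, using plain NumPy on a made-up feature with one large outlier; Max-Min Normalisation squeezes the non-outlier values into a narrow band near 0 and yields a smaller standard deviation, while standardisation fixes the standard deviation at 1:

```python
import numpy as np

# Made-up feature containing one large outlier
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 100.0])

# Max-Min Normalisation: (x - min) / (max - min), values end up in [0, 1]
x_minmax = (x - x.min()) / (x.max() - x.min())

# Standardisation (z-score): (x - mean) / std, giving mean 0 and std 1
x_std = (x - x.mean()) / x.std()

print(x_minmax, x_minmax.std())  # non-outliers squeezed into [0, 0.05]; std ~0.37
print(x_std, x_std.std())        # std is exactly 1.0 by construction
```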

Note: If an algorithm is not distance-based, feature scaling is unimportant; this includes Naive Bayes, Linear Discriminant Analysis, and tree-based models (gradient boosting, random forest, etc.).

Code for Feature Scaling

Data Standardization (Z-score normalization)

  • The result of Standardization (or Z-score normalization) is that the features will be rescaled to ensure the mean and the standard deviation are 0 and 1, respectively.
  • Data Standardization is a data processing workflow that converts the structure of disparate datasets into a Common Data Format.
  • Standardization comes into the picture when the features of the input data set have large differences between their ranges, or simply when they are measured in different units (e.g., pounds, meters, miles, etc.).
  • We try to bring all the variables or features to a similar scale. Standardisation means centering the variable at zero and rescaling it to unit variance.
  • Rescaling feature values to a common, narrow range is useful for optimization algorithms, such as gradient descent, that are used within machine-learning algorithms that weight inputs (e.g., regression and neural networks).
  • Rescaling is also used for algorithms that use distance measurements, for example, K-nearest-neighbours (KNN).

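A minimal sketch of Z-score standardization using scikit-learn's StandardScaler (the feature matrix X below is made up for illustration):

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Made-up feature matrix: two features measured on very different scales
X = np.array([[1.0, 200.0],
              [2.0, 400.0],
              [3.0, 600.0],
              [4.0, 800.0]])

# Fit on the training data and transform it:
# each column is rescaled to mean 0 and standard deviation 1
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)

print(X_scaled.mean(axis=0))  # ~[0. 0.]
print(X_scaled.std(axis=0))   # ~[1. 1.]

# Reuse the same fitted scaler on new/test data so that train and test
# features are transformed consistently
X_new_scaled = scaler.transform(np.array([[2.5, 500.0]]))
```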

  • In machine learning and deep learning, some models, such as Linear Regression, Logistic Regression, and Artificial Neural Networks, assume that features are normally distributed and can perform much better if the features provided to them during modeling are normally distributed.

  • A normal distribution is sometimes informally called a bell curve.

  • Samples from a Gaussian Distribution follow a bell-shaped curve and lie around the mean. The mean, median, and mode of a Gaussian Distribution are the same.

  • In probability theory, a normal (or Gaussian) distribution is a type of continuous probability distribution for a real-valued random variable.
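
For reference, the probability density function of a normal distribution with mean μ and standard deviation σ is:

$$
f(x) = \frac{1}{\sigma\sqrt{2\pi}}\,\exp\!\left(-\frac{(x-\mu)^2}{2\sigma^2}\right)
$$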


Various Transformations to change the distribution of features


  • Log Transformation
  • Square root Transformation
  • Reciprocal Transformation
  • Exponential Transformation
  • Box-Cox Transformation
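
A short sketch of these transformations applied to a made-up, right-skewed, strictly positive feature, using NumPy and SciPy (Box-Cox requires strictly positive values, and the exponent used for the "exponential" transformation here is an arbitrary illustrative choice):

```python
import numpy as np
from scipy import stats

# Made-up right-skewed, strictly positive feature
x = np.array([1.0, 2.0, 2.5, 3.0, 4.0, 8.0, 15.0, 40.0, 120.0])

x_log = np.log(x)                 # log transformation
x_sqrt = np.sqrt(x)               # square root transformation
x_recip = 1.0 / x                 # reciprocal transformation
x_exp = x ** (1 / 1.2)            # "exponential" (power) transformation
x_boxcox, lam = stats.boxcox(x)   # Box-Cox estimates the best lambda itself

# The skewness moves towards 0 as the distribution becomes more symmetric
print("skew before:       ", stats.skew(x))
print("skew after log:    ", stats.skew(x_log))
print("skew after Box-Cox:", stats.skew(x_boxcox))
```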

How to check if a variable follows a Normal Distribution?


  • Histogram
  • Q-Q plot
  • KDE plot
  • Skewness and Kurtosis
  • Five Number Summary
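
A sketch of these checks using matplotlib, seaborn, SciPy, and pandas (the sample below is randomly generated for illustration rather than taken from any real dataset):

```python
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
from scipy import stats

# Randomly generated sample for illustration
rng = np.random.default_rng(42)
x = rng.normal(loc=50, scale=10, size=1_000)

fig, axes = plt.subplots(1, 3, figsize=(15, 4))

# Histogram: roughly bell-shaped if the variable is normally distributed
axes[0].hist(x, bins=30)
axes[0].set_title("Histogram")

# Q-Q plot: points lying close to the straight line indicate normality
stats.probplot(x, dist="norm", plot=axes[1])
axes[1].set_title("Q-Q plot")

# KDE plot: a smooth estimate of the distribution's shape
sns.kdeplot(x=x, ax=axes[2])
axes[2].set_title("KDE plot")
plt.show()

# Skewness ~0 and excess kurtosis ~0 are expected for a normal distribution
print("skewness:", stats.skew(x), "kurtosis:", stats.kurtosis(x))

# Five number summary: min, Q1, median, Q3, max
print(pd.Series(x).describe()[["min", "25%", "50%", "75%", "max"]])
```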

Normal Distribution


  • We call this bell-shaped curve a Normal Distribution. Carl Friedrich Gauss discovered it, so it is also called a Gaussian Distribution.

  • Bell Curve (figure)

  • The normal distribution is a core concept in statistics, the backbone of data science.

Properties of Normal Distribution

  • It is bell-shaped and symmetric about the mean.
  • The mean, median, and mode are all equal.
  • It is completely described by two parameters: the mean (μ) and the standard deviation (σ).
  • The total area under the curve is 1.
  • About 68%, 95%, and 99.7% of the values lie within one, two, and three standard deviations of the mean, respectively (the empirical rule).

References