---
pagetitle: "Feature Engineering A-Z | Max Abs Scaling"
---
# Max Abs Scaling {#sec-numeric-maxabs}
::: {style="visibility: hidden; height: 0px;"}
## Max Abs Scaling
:::
The **Max-Abs** scaling method rescales each variable so that the training data lies within the range $[-1, 1]$.
It does this by dividing each value by the largest absolute value seen in the training data:
$$
X_{scaled} = \dfrac{X}{\text{max}(\text{abs}(X))}
$$ {#eq-maxabs-minimal}
Max-Abs scaling is similar to the scaling we saw in @sec-numeric-normalization.
The only difference is whether we are aiming for a statistical property (a standard deviation of 1) or a specific decision (dividing by the largest absolute value seen).
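To make @eq-maxabs-minimal concrete, here is a minimal sketch in Python using NumPy. The vector `x` is made up for illustration and is not taken from any data set used in this chapter.

```python
import numpy as np

# Made-up values for illustration only
x = np.array([-4.0, 1.0, 2.0, 0.0])

# The largest absolute value in x is 4, coming from the entry -4
max_abs = np.max(np.abs(x))

# Dividing by max_abs keeps the sign and maps -4 to -1
x_scaled = x / max_abs
print(x_scaled)
```

Note that zero stays at zero and the value with the largest magnitude lands exactly on $-1$ or $1$.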
This method is a learned transformation.
We use the training data to derive the value of $\text{max}(\text{abs}(X))$, and that same value is then used to perform the transformation when applied to new data.
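The following is a small sketch of what "learned" means in practice, assuming scikit-learn's `MaxAbsScaler` and two made-up arrays, `train` and `new`. The scaler is fit on `train` only, so values in `new` that are larger than anything seen during training can end up outside $[-1, 1]$.

```python
import numpy as np
from sklearn.preprocessing import MaxAbsScaler

# Made-up training data and new data, one column each
train = np.array([[-2.0], [1.0], [4.0]])
new = np.array([[2.0], [8.0]])

# The scaling value is learned from the training data only
scaler = MaxAbsScaler().fit(train)
learned_max = scaler.max_abs_[0]

# The same learned value is reused for the new data
new_scaled = np.asarray(scaler.transform(new)).ravel()
print(learned_max, new_scaled)
```

Here the learned value is 4, so the new observation 8 is mapped to 2, which is outside the range seen in training.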
There is no specific guidance on when to prefer this method over other scaling methods; you need to look at your data and see what works best.
## Pros and Cons
### Pros
- Fast calculations
- The transformation can easily be reversed, making interpretation on the original scale easier
- Doesn't affect sparsity
- Can be used on a zero-variance variable, although this doesn't matter much since you should likely remove such a variable anyway
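Two of the points above, reversibility and preserved sparsity, can be checked with a short sketch. It assumes scikit-learn's `MaxAbsScaler` and a small made-up SciPy sparse matrix; because the method only divides by a constant, zeros stay zeros.

```python
import numpy as np
from scipy import sparse
from sklearn.preprocessing import MaxAbsScaler

# Made-up sparse matrix with 3 non-zero entries
X = sparse.csr_matrix(np.array([
    [0.0, 2.0],
    [0.0, 0.0],
    [-5.0, 0.0],
    [0.0, 4.0],
]))

scaler = MaxAbsScaler().fit(X)
X_scaled = scaler.transform(X)

# Sparsity is preserved: same number of stored non-zero values
same_sparsity = X_scaled.nnz == X.nnz

# Reversing the transformation recovers the original values
X_back = scaler.inverse_transform(X_scaled)
recovered = np.allclose(X_back.toarray(), X.toarray())
print(same_sparsity, recovered)
```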
### Cons
- Is highly affected by outliers
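The sensitivity to outliers is easy to see with a made-up vector where one value is far larger than the rest. This is only an illustrative sketch using NumPy.

```python
import numpy as np

# Made-up values with a single large outlier
x = np.array([1.0, 2.0, 3.0, 4.0, 1000.0])

# The outlier alone determines the scaling value
x_scaled = x / np.max(np.abs(x))
print(x_scaled)
```

The outlier is mapped to 1, while the four remaining values are squeezed into a narrow band just above 0, which leaves little contrast between them.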
## R Examples
We will be using the `ames` data set for these examples.
```{r}
#| label: set-seed
#| echo: false
set.seed(1234)
# To avoid changing recipe ID columns
```
```{r}
#| label: show-data
#| message: false
# remotes::install_github("emilhvitfeldt/extrasteps")
library(recipes)
library(extrasteps)
library(modeldata)
data("ames")
ames |>
  select(Sale_Price, Lot_Area, Wood_Deck_SF, Mas_Vnr_Area)
```
We will be using the `step_maxabs()` step for this, and it can be found in the [extrasteps extension package](https://github.com/EmilHvitfeldt/extrasteps/).
```{r}
#| label: step_maxabs
maxabs_rec <- recipe(Sale_Price ~ ., data = ames) |>
  step_maxabs(all_numeric_predictors()) |>
  prep()

maxabs_rec |>
  bake(new_data = NULL, Sale_Price, Lot_Area, Wood_Deck_SF, Mas_Vnr_Area)
```
We can also pull out the max values for each variable using `tidy()`:
```{r}
#| label: tidy
maxabs_rec |>
  tidy(1)
```
## Python Examples
```{python}
#| label: python-setup
#| echo: false
import pandas as pd
from sklearn import set_config
set_config(transform_output="pandas")
pd.set_option('display.precision', 3)
```
We are using the `ames` data set for examples.
{sklearn} provides the `MaxAbsScaler()` transformer, which we can use.
```{python}
#| label: maxabsscaler
from feazdata import ames
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import MaxAbsScaler
ct = ColumnTransformer(
    [('maxabs', MaxAbsScaler(), ['Sale_Price', 'Lot_Area', 'Wood_Deck_SF', 'Mas_Vnr_Area'])],
    remainder="passthrough")
ct.fit(ames)
ct.transform(ames)
```
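As with `tidy()` in R, the learned max values can be pulled out of a fitted `ColumnTransformer`. The sketch below uses a tiny made-up data frame, `toy`, as a stand-in so it runs on its own; the column names mirror `ames`, but the values are invented. It relies on the `named_transformers_` attribute of the fitted `ColumnTransformer` and the `max_abs_` attribute of the fitted `MaxAbsScaler`.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import MaxAbsScaler

# Made-up stand-in data; not the real ames values
toy = pd.DataFrame({
    "Lot_Area": [8000, 12000, 20000],
    "Wood_Deck_SF": [0, 150, 300],
    "Street": ["Pave", "Pave", "Grvl"],
})
cols = ["Lot_Area", "Wood_Deck_SF"]

toy_ct = ColumnTransformer(
    [("maxabs", MaxAbsScaler(), cols)],
    remainder="passthrough")
toy_ct.fit(toy)

# Look up the fitted scaler by the name given in the ColumnTransformer
fitted_scaler = toy_ct.named_transformers_["maxabs"]

# max_abs_ holds one learned value per scaled column, in the order of cols
max_values = pd.Series(fitted_scaler.max_abs_, index=cols)
print(max_values)
```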