numeric-maxabs.qmd

---
pagetitle: "Feature Engineering A-Z | Max Abs Scaling"
---

# Max Abs Scaling {#sec-numeric-maxabs}

::: {style="visibility: hidden; height: 0px;"}
## Max Abs Scaling
:::

The **Max-Abs** scaling method works by making sure that the training data lies within the range $[-1, 1]$ by applying the following formula

$$
X_{scaled} = \dfrac{X}{\text{max}(\text{abs}(X))}
$$ {#eq-maxabs-minimal}

This is similar to the scaling we saw in @sec-numeric-normalization.
And we see that the only difference is whether we are aiming for the statistical properly (standard deviation of 1) or a specific decision (dividing by the largest quantity seen).
This method is a learned transformation.
So we use the training data to derive the right value of $\text{max}(\text{abs}(X))$ and then this value is used to perform the transformations when applied to new data.
For this, there is no specific guidance as to which method you want to use and you need to look at your data and see what works best.

## Pros and Cons

### Pros

-   Fast calculations
-   Transformation can easily be reversed, making its interpretations easier on the original scale
-   Doesn't affect sparsity
-   Can be used on a zero variance variable. Doesn't matter much since you likely should get rid of it

### Cons

-   Is highly affected by outliers

## R Examples

We will be using the `ames` data set for these examples.

```{r}
#| label: set-seed
#| echo: false
set.seed(1234)
# To avoid changing recipe ID columns
```

```{r}
#| label: show-data
#| message: false
# remotes::install_github("emilhvitfeldt/extrasteps")
library(recipes)
library(extrasteps)
library(modeldata)
data("ames")

ames |>
  select(Sale_Price, Lot_Area, Wood_Deck_SF, Mas_Vnr_Area)
```

We will be using the `step_maxabs()` step for this, and it can be found in the [extrasteps extension package](https://github.com/EmilHvitfeldt/extrasteps/).

```{r}
#| label: step_maxabs
maxabs_rec <- recipe(Sale_Price ~ ., data = ames) |>
  step_maxabs(all_numeric_predictors()) |>
  prep()

maxabs_rec |>
  bake(new_data = NULL, Sale_Price, Lot_Area, Wood_Deck_SF, Mas_Vnr_Area)
```

We can also pull out what the max values were for each variable using `tidy()`

```{r}
#| label: tidy
maxabs_rec |>
  tidy(1)
```

## Python Examples

```{python}
#| label: python-setup
#| echo: false
import pandas as pd
from sklearn import set_config

set_config(transform_output="pandas")
pd.set_option('display.precision', 3)
```

We are using the `ames` data set for examples.
{sklearn} provided the `MaxAbsScaler()` method we can use.

```{python}
#| label: maxabsscaler
from feazdata import ames
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import MaxAbsScaler

ct = ColumnTransformer(
    [('maxabs', MaxAbsScaler(), ['Sale_Price', 'Lot_Area', 'Wood_Deck_SF',  'Mas_Vnr_Area'])], 
    remainder="passthrough")

ct.fit(ames)
ct.transform(ames)
```