filter explication

tryolabs · Jun 5, 2024 · commit 95193f8
1 parent 22d9fae
Showing 1 changed file with 1 addition and 1 deletion.
diff --git a/documentation/user_guide.md b/documentation/user_guide.md
@@ -149,7 +149,7 @@ A list of all steps you can define in the `config.json` can be found in the foll
 | CalculateFeaturesStep  | ✅       | ✅         | Extract date features from columns that contain dates                                                     | `datetime_columns: Optional[List[str]]` List of columns, that contain dates, `features: Optional[List[str]]` List of features, that should be extracted from `datetime_columns`. Choose from "year”, "month", "day", "hour", "minute", "second", "weekday", "dayofyear”                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            |
 | CalculateMetricsStep   | ✅       | ❌         | Calculate a set of regression metrics: MAE, RMSE, R², Mean Error, Max Error, Median Absolute Error        |                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    |
 | CalculateReportsStep   | ✅       | ❌         | Calculate SHAP values and feature importances                                                             | `max_samples: int = 1000` Maximum number of samples used for the calculations of the report. Default value is set to 1000.                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                         |     |
-| CleanStep              | ✅       | ✅         | Clean the data: fill missing values, remove outliers, convert types, drop missing values, drop rows by id | `fill_missing: Optional[dict]` Dictionary, where the key defines the column and the value the value that should be filled in for a missing value, `remove_outliers: Optional[dict]` Dictionary, where the key defines the column and the value, the method that should be used to remove outliers. Choose between: clip: clips outliers by 0.25/0.75 percentile, drop: drops outliers that are lower/higher than 0.25/0.75 percentile, `convert_dtypes: Optional[dict]` Dictionary, where the key defines the column and the value of the type that column should be converted to, `drop_na_columns: Optional[list]` List of columns which missing values define rows to be dropped, `drop_ids: Optional[dict]` Dictionary where the key defines the column and the value the value to be dropped, `filter: Optional[dict]` Dictionary, where the key defines the column and the value a rule based on which the data is filtered. The rule must be an expression interpretable by a Pandas query. |
+| CleanStep              | ✅       | ✅         | Clean the data: fill missing values, remove outliers, convert types, drop missing values, drop rows by id | `fill_missing: Optional[dict]` Dictionary, where the key defines the column and the value the value that should be filled in for a missing value, `remove_outliers: Optional[dict]` Dictionary, where the key defines the column and the value, the method that should be used to remove outliers. Choose between: clip: clips outliers by 0.25/0.75 percentile, drop: drops outliers that are lower/higher than 0.25/0.75 percentile, `convert_dtypes: Optional[dict]` Dictionary, where the key defines the column and the value of the type that column should be converted to, `drop_na_columns: Optional[list]` List of columns which missing values define rows to be dropped, `drop_ids: Optional[dict]` Dictionary where the key defines the column and the value the value to be dropped, `filter: Optional[dict]` Dictionary, where the value a rule based on which the data is filtered. The rule must be an expression interpretable by a Pandas query. |
 | EncodeStep             | ✅       | ✅         | Encode categorical features using ordinal encoding. Numerical features are not affected by this step.     | `cardinality_threshold: int = 5` Threshold to split features in low and high cardinality. Default set to 5, `feature_encoders: Optional[dict]` Dictiionary, where the key contains the feature and the value the encoder that should be used. Choose between “OrdinalEncoder” and “TargetEncoder”. If feature_encoders is not provided default encoders based on cardinality are used. features with low cardinality use the `OrdinalEncoder`, features with high cardinality use the `TargetEndoder`.                                                                                                                                                                                                                                                                                                                                                                                                                                                                                             |
 | ExplainerDashboardStep | ✅       | ❌         | Calculate SHAP values.                                                                                    | `max_samples: int = 1000` Maximum number of samples used for the calculations. Default set to 1000, `X_background_samples: int = 100` length of background dataset, `enable_step: bool = True` If set to False this step is not executed.                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          |
 | ModelStep              | ✅       | ✅         | Fit the model.                                                                                            | `model_class: Type[Model]` The model to be used for fitting, `model_parameters: Optional[dict]` Dictionary containing the model parameters, `optuna_params: Optional[dict]` Dictionary containing parameters for Optuna, `save_path: Optional[str]` Path where results are saved.                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  |
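
The changed `CleanStep` row says each value in the `filter` dictionary is a rule that must be an expression interpretable by a Pandas query. The following is a minimal sketch of what such rules could look like when evaluated with `DataFrame.query`. The helper `apply_filters`, the sample data, and the dictionary keys are hypothetical and are not taken from the library; in particular, treating the keys as plain labels is an assumption.

```python
import pandas as pd


def apply_filters(df: pd.DataFrame, rules: dict) -> pd.DataFrame:
    """Apply every rule in turn using DataFrame.query (hypothetical helper)."""
    for rule in rules.values():
        # Each rule is a string expression that Pandas can evaluate, e.g. "price > 0".
        df = df.query(rule)
    return df


# Illustrative data only.
df = pd.DataFrame(
    {"price": [-5, 10, 250, 40], "city": ["A", "B", "A", "C"]}
)

# Assumption: keys are only labels; the values carry the actual rules.
rules = {"positive_price": "price > 0", "exclude_city_c": "city != 'C'"}

filtered = apply_filters(df, rules)
print(filtered)
```

Rows that fail any rule are dropped, so the rules combine like a logical AND. Whether the library applies them in this way is not stated in the diff.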