Commit

filter explication
Frauke Albrecht authored and froukje committed Jun 5, 2024
1 parent 22d9fae commit 95193f8
Showing 1 changed file with 1 addition and 1 deletion.
2 changes: 1 addition & 1 deletion documentation/user_guide.md
@@ -149,7 +149,7 @@ A list of all steps you can define in the `config.json` can be found in the foll
| CalculateFeaturesStep | ✅ | ✅ | Extract date features from columns that contain dates | `datetime_columns: Optional[List[str]]` List of columns that contain dates, `features: Optional[List[str]]` List of features that should be extracted from `datetime_columns`. Choose from "year", "month", "day", "hour", "minute", "second", "weekday", "dayofyear" |
| CalculateMetricsStep | ✅ | ❌ | Calculate a set of regression metrics: MAE, RMSE, R², Mean Error, Max Error, Median Absolute Error | |
| CalculateReportsStep | ✅ | ❌ | Calculate SHAP values and feature importances | `max_samples: int = 1000` Maximum number of samples used for the calculations of the report. Default value is set to 1000. |
-| CleanStep | ✅ | ✅ | Clean the data: fill missing values, remove outliers, convert types, drop missing values, drop rows by id | `fill_missing: Optional[dict]` Dictionary, where the key defines the column and the value the value that should be filled in for a missing value, `remove_outliers: Optional[dict]` Dictionary, where the key defines the column and the value, the method that should be used to remove outliers. Choose between: clip: clips outliers by 0.25/0.75 percentile, drop: drops outliers that are lower/higher than 0.25/0.75 percentile, `convert_dtypes: Optional[dict]` Dictionary, where the key defines the column and the value of the type that column should be converted to, `drop_na_columns: Optional[list]` List of columns which missing values define rows to be dropped, `drop_ids: Optional[dict]` Dictionary where the key defines the column and the value the value to be dropped, `filter: Optional[dict]` Dictionary, where the key defines the column and the value a rule based on which the data is filtered. The rule must be an expression interpretable by a Pandas query. |
+| CleanStep | ✅ | ✅ | Clean the data: fill missing values, remove outliers, convert types, drop missing values, drop rows by id | `fill_missing: Optional[dict]` Dictionary, where the key defines the column and the value the value that should be filled in for a missing value, `remove_outliers: Optional[dict]` Dictionary, where the key defines the column and the value, the method that should be used to remove outliers. Choose between: clip: clips outliers by 0.25/0.75 percentile, drop: drops outliers that are lower/higher than 0.25/0.75 percentile, `convert_dtypes: Optional[dict]` Dictionary, where the key defines the column and the value of the type that column should be converted to, `drop_na_columns: Optional[list]` List of columns which missing values define rows to be dropped, `drop_ids: Optional[dict]` Dictionary where the key defines the column and the value the value to be dropped, `filter: Optional[dict]` Dictionary, where the value a rule based on which the data is filtered. The rule must be an expression interpretable by a Pandas query. |
| EncodeStep | ✅ | ✅ | Encode categorical features using ordinal encoding. Numerical features are not affected by this step. | `cardinality_threshold: int = 5` Threshold to split features in low and high cardinality. Default set to 5, `feature_encoders: Optional[dict]` Dictionary, where the key contains the feature and the value the encoder that should be used. Choose between “OrdinalEncoder” and “TargetEncoder”. If `feature_encoders` is not provided, default encoders based on cardinality are used: features with low cardinality use the `OrdinalEncoder`, features with high cardinality use the `TargetEncoder`. |
| ExplainerDashboardStep | ✅ | ❌ | Calculate SHAP values. | `max_samples: int = 1000` Maximum number of samples used for the calculations. Default set to 1000, `X_background_samples: int = 100` length of background dataset, `enable_step: bool = True` If set to False this step is not executed. |
| ModelStep | ✅ | ✅ | Fit the model. | `model_class: Type[Model]` The model to be used for fitting, `model_parameters: Optional[dict]` Dictionary containing the model parameters, `optuna_params: Optional[dict]` Dictionary containing parameters for Optuna, `save_path: Optional[str]` Path where results are saved. |
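
The changed `filter` description says each rule must be an expression that a Pandas query can interpret. As a minimal sketch of what that could mean in practice (not the library's actual implementation), the snippet below applies every value of a filter dictionary with `DataFrame.query`. The helper `apply_filters`, the example columns, and the dictionary keys are hypothetical and chosen only for illustration; the sketch assumes the keys serve as labels and only the values are evaluated.

```python
import pandas as pd


def apply_filters(df: pd.DataFrame, filters: dict) -> pd.DataFrame:
    """Keep only the rows that satisfy every rule in `filters`.

    Hypothetical helper: only the dictionary values (query strings)
    are used; the keys are treated as plain labels.
    """
    for rule in filters.values():
        # Each rule is passed unchanged to DataFrame.query.
        df = df.query(rule)
    return df


# Illustrative data and rules (assumed, not taken from the project).
df = pd.DataFrame({"age": [25, 40, 67], "city": ["Kiel", "Bonn", "Kiel"]})
filters = {"age": "age < 65", "city": "city == 'Kiel'"}

filtered = apply_filters(df, filters)
print(filtered)
```

Because the rules are applied one after another, a row survives only if it matches all of them.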