Handling Missing Values and Outliers #137
Labels: enhancement (New feature or request), gssoc-ext (GSSoC'24 Extended Version), hacktoberfest (Hacktober Collaboration), hacktoberfest-accepted (Hacktoberfest 2024), level1 (10 Points 🥇, GSSoC)
Is this a unique feature?
Is your feature request related to a problem/unavailable functionality? Please describe.
Feature Request: Handling Missing Values and Outliers
Problem
Missing values and outliers in the training data make fine-tuning harder: left unhandled, they can degrade model performance and prediction accuracy.
Request
Provide better support for detecting and handling missing values and outliers during data preprocessing for fine-tuning.
Proposed Solution
Add automatic or customizable ways to handle missing values and outliers during preprocessing. This could include:
Missing values: options for filling in missing data (such as the column mean or median), plus a way for users to supply their own strategy.
Outliers: tools to detect outliers (such as Z-score or IQR rules) and methods to either remove or adjust them.
Ease of use: make these features simple to access during fine-tuning, so users don’t have to spend a lot of time on data cleaning.
This would improve model performance and save time for users during fine-tuning.
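The options listed above could look roughly like the following. This is only a sketch under assumptions: the function names `fill_missing`, `iqr_bounds`, and `handle_outliers`, and the parameters `strategy`, `custom`, `action`, and `k`, are hypothetical and not part of any existing API in this project. It assumes the data is a pandas DataFrame and only touches numeric columns.

```python
import numpy as np
import pandas as pd


def fill_missing(df, strategy="median", custom=None):
    """Fill NaNs in numeric columns.

    `custom` is a hypothetical hook: a callable that takes a column
    (pd.Series) and returns the fill value for that column.
    """
    out = df.copy()
    for col in out.select_dtypes(include="number").columns:
        if custom is not None:
            value = custom(out[col])
        elif strategy == "mean":
            value = out[col].mean()
        elif strategy == "median":
            value = out[col].median()
        else:
            raise ValueError(f"unknown strategy: {strategy!r}")
        out[col] = out[col].fillna(value)
    return out


def iqr_bounds(series, k=1.5):
    """Return (low, high) fences: Q1 - k*IQR and Q3 + k*IQR."""
    q1, q3 = series.quantile(0.25), series.quantile(0.75)
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def handle_outliers(df, action="clip", k=1.5):
    """Either clip values to the IQR fences or drop rows outside them."""
    out = df.copy()
    numeric = out.select_dtypes(include="number").columns
    if action == "clip":
        for col in numeric:
            low, high = iqr_bounds(out[col], k)
            out[col] = out[col].clip(lower=low, upper=high)
        return out
    if action == "remove":
        keep = pd.Series(True, index=out.index)
        for col in numeric:
            low, high = iqr_bounds(out[col], k)
            # Rows with NaN in this column are kept; imputation handles them.
            keep &= out[col].between(low, high) | out[col].isna()
        return out[keep]
    raise ValueError(f"unknown action: {action!r}")
```

Clipping keeps every row and only pulls extreme values back to the fences, while removal shrinks the dataset; which one is appropriate depends on how much data is available for fine-tuning.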
Do you want to work on this issue?
Yes
If "yes" to above, please explain how you would technically implement this (issue will not be assigned if this is skipped)
Implementation Plan
# Missing values: replace NaNs with the column mean
from sklearn.impute import SimpleImputer
imputer = SimpleImputer(strategy='mean')
data = imputer.fit_transform(data)
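# Outliers: flag values more than 3 standard deviations from the mean (in either direction)
import numpy as np
from scipy import stats
z_scores = stats.zscore(data)
outliers = np.abs(z_scores) > 3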
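Putting the two steps of the plan together, a minimal sketch might look like this. The function name `impute_and_filter` and its `strategy` and `threshold` parameters are hypothetical, chosen for illustration; the sketch assumes `data` is a 2-D numeric array and that dropping whole rows is an acceptable way to deal with flagged outliers.

```python
import numpy as np
from scipy import stats
from sklearn.impute import SimpleImputer


def impute_and_filter(data, strategy="mean", threshold=3.0):
    """Impute missing values, then drop rows containing a Z-score outlier.

    Returns the cleaned array and a boolean mask marking the dropped rows.
    """
    # Step 1: fill NaNs column by column using the chosen strategy.
    imputer = SimpleImputer(strategy=strategy)
    filled = imputer.fit_transform(data)

    # Step 2: per-column Z-scores; the absolute value catches both
    # unusually high and unusually low values.
    z = np.abs(stats.zscore(filled, axis=0))

    # Constant columns have zero variance and yield NaN Z-scores;
    # treat those entries as "not an outlier".
    z = np.nan_to_num(z, nan=0.0)

    # A row is dropped if any of its features exceeds the threshold.
    outlier_rows = (z > threshold).any(axis=1)
    return filled[~outlier_rows], outlier_rows
```

Imputing before computing Z-scores avoids NaNs propagating into the scores, at the cost that imputed values slightly shrink each column's variance.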