Handling Missing Values and Outliers #137

Closed
1 task done
RB137 opened this issue Oct 19, 2024 · 4 comments · Fixed by #148
Assignees
Labels
enhancement New feature or request gssoc-ext GSSoC'24 Extended Version hacktoberfest Hacktober Collaboration hacktoberfest-accepted Hacktoberfest 2024 level1 10 Points 🥇(GSSoC)

Comments

@RB137
Contributor

RB137 commented Oct 19, 2024

Is this a unique feature?

  • I have checked "open" AND "closed" issues and this is not a duplicate

Is your feature request related to a problem/unavailable functionality? Please describe.

Feature Request: Handling Missing Values and Outliers

Problem
Fine-tuning models is challenging due to missing values and outliers, which affect performance and prediction accuracy.

Request
Provide better support for detecting and handling missing values and outliers during data preprocessing for fine-tuning.

Proposed Solution

Add automatic or customizable ways to handle missing values and outliers during preprocessing. This could include:

  • Missing values: options for filling in missing data (such as mean or median imputation), plus a way for users to supply their own strategy.
  • Outliers: tools to detect outliers (such as Z-score or IQR) and methods to either remove or adjust them.
  • Easy to use: make these features simple to access while fine-tuning, so users don't have to spend a lot of time on data cleaning.

This would improve model performance and save time for users during fine-tuning.
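
To make the "remove or adjust" option concrete, here is a minimal sketch of IQR-based outlier handling with pandas. The function names `iqr_bounds` and `handle_outliers`, the `method` parameter, and the sample values are all hypothetical and only for illustration; the multiplier `k=1.5` is the conventional default for IQR fences, not something this project prescribes.

```python
import pandas as pd


def iqr_bounds(series, k=1.5):
    """Return the (lower, upper) IQR fences for a numeric Series."""
    q1 = series.quantile(0.25)
    q3 = series.quantile(0.75)
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def handle_outliers(series, method="clip", k=1.5):
    """Hypothetical helper: adjust ("clip") or drop ("remove") IQR outliers."""
    lower, upper = iqr_bounds(series, k)
    if method == "clip":
        # Adjust: pull extreme values back to the nearest fence.
        return series.clip(lower=lower, upper=upper)
    if method == "remove":
        # Remove: keep only values inside the fences (NaN rows are dropped too).
        return series[series.between(lower, upper)]
    raise ValueError(f"unknown method: {method!r}")


# Illustrative data with one obvious outlier.
values = pd.Series([10.0, 12.0, 11.0, 13.0, 12.0, 100.0])
clipped = handle_outliers(values, method="clip")
removed = handle_outliers(values, method="remove")
```

Clipping keeps the row count unchanged, which matters when the rows have to stay aligned with labels; removing is simpler but shrinks the dataset.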

Do you want to work on this issue?

Yes

If "yes" to above, please explain how you would technically implement this (issue will not be assigned if this is skipped)

Implementation Plan

  1. Use pandas and scikit-learn for missing values. For example, SimpleImputer can fill missing numeric data with the column mean:
    from sklearn.impute import SimpleImputer
    imputer = SimpleImputer(strategy='mean')
    data = imputer.fit_transform(data)
  2. Detect outliers using Z-score or IQR methods with scipy:
    import numpy as np
    from scipy import stats
    z_scores = stats.zscore(data)
    # Use the absolute value so that unusually low values are flagged as well as high ones.
    outliers = np.abs(z_scores) > 3
  3. Integrate these steps into a preprocessing pipeline with scikit-learn's Pipeline so they are easy to apply before fine-tuning.
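
Step 3 could look roughly like the sketch below. `IQRClipper` is a hypothetical custom transformer, not part of scikit-learn, and the sample array is made up. Median imputation is used here instead of the mean from step 1 because the mean is itself pulled by outliers; that is an assumption of this sketch, not a decision recorded in this issue.

```python
import numpy as np
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline


class IQRClipper(BaseEstimator, TransformerMixin):
    """Hypothetical transformer: clip each column to IQR fences learned in fit()."""

    def __init__(self, k=1.5):
        self.k = k

    def fit(self, X, y=None):
        X = np.asarray(X, dtype=float)
        q1 = np.percentile(X, 25, axis=0)
        q3 = np.percentile(X, 75, axis=0)
        iqr = q3 - q1
        # Per-column fences, reused unchanged at transform time.
        self.lower_ = q1 - self.k * iqr
        self.upper_ = q3 + self.k * iqr
        return self

    def transform(self, X):
        X = np.asarray(X, dtype=float)
        return np.clip(X, self.lower_, self.upper_)


# Impute first so the clipper never sees NaN, then clip outliers.
preprocess = Pipeline(steps=[
    ("impute", SimpleImputer(strategy="median")),
    ("clip_outliers", IQRClipper(k=1.5)),
])

# Illustrative data: one missing value and one extreme value.
X = np.array([
    [1.0, 10.0],
    [2.0, np.nan],
    [3.0, 12.0],
    [4.0, 11.0],
    [100.0, 13.0],
])
X_clean = preprocess.fit_transform(X)
```

Because the fences are learned in `fit`, calling `preprocess.transform` on validation data reuses the training statistics and avoids leaking information from the held-out set.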
@RB137 RB137 added the enhancement New feature or request label Oct 19, 2024
Contributor

Ensure the issue is not similar to one that already exists or is already being worked on. Thanks for your time.

@RB137
Contributor Author

RB137 commented Oct 19, 2024

@rohitinu6
Kindly assign me this task, as I am familiar with handling it.

@RB137
Contributor Author

RB137 commented Oct 19, 2024

Kindly add the gssoc-ext, hacktoberfest, and hacktoberfest-accepted labels to this issue. @rohitinu6

@rohitinu6
Owner

@RB137 all the best
please ensure to star the repo

@rohitinu6 rohitinu6 added gssoc-ext GSSoC'24 Extended Version hacktoberfest-accepted Hacktoberfest 2024 level2 25 Points 🥈(GSSoC) hacktoberfest Hacktober Collaboration labels Oct 20, 2024
@rohitinu6 rohitinu6 added level1 10 Points 🥇(GSSoC) and removed level2 25 Points 🥈(GSSoC) labels Oct 21, 2024
2 participants