Handling Missing Values and Outliers #137

RB137 · 2024-10-19T20:33:41Z

Is this a unique feature?

I have checked "open" AND "closed" issues and this is not a duplicate

Is your feature request related to a problem/unavailable functionality? Please describe.

Feature Request: Handling Missing Values and Outliers

Problem
Missing values and outliers in the training data make fine-tuning harder and reduce model performance and prediction accuracy.

Request
Provide better support for detecting and handling missing values and outliers during data preprocessing for fine-tuning.

Proposed Solution

Add automatic or customizable ways to handle missing values and outliers during preprocessing. This could include:

Missing values: options to fill in missing data (for example, with the mean or median), or to let users supply a custom strategy.
Outliers: tools to detect outliers (for example, Z-score or IQR) and methods to either remove or adjust them.
Ease of use: make these features simple to access during fine-tuning, so users don’t have to spend a lot of time on data cleaning.
This would improve model performance and save users time during fine-tuning.
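
To make the options above concrete, here is a minimal standard-library sketch of what such helpers might look like. The names fill_missing, iqr_bounds, and handle_outliers are hypothetical and not part of this repository; the 1.5 × IQR fence is a common convention and is assumed here, not something the request specifies.

```python
import math
import statistics


def fill_missing(values, strategy="median"):
    """Replace None/NaN entries using a named statistic or a custom callable."""
    def is_missing(v):
        return v is None or (isinstance(v, float) and math.isnan(v))

    observed = [v for v in values if not is_missing(v)]
    if not observed:
        raise ValueError("cannot impute a column with no observed values")
    if callable(strategy):
        fill = strategy(observed)          # user-supplied strategy
    elif strategy == "mean":
        fill = statistics.fmean(observed)
    elif strategy == "median":
        fill = statistics.median(observed)
    else:
        raise ValueError(f"unknown strategy: {strategy!r}")
    return [fill if is_missing(v) else v for v in values]


def iqr_bounds(values, k=1.5):
    """Return (lower, upper) fences at k * IQR beyond the quartiles."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    spread = q3 - q1
    return q1 - k * spread, q3 + k * spread


def handle_outliers(values, action="clip", k=1.5):
    """Either clip values to the IQR fences or drop those outside them."""
    low, high = iqr_bounds(values, k)
    if action == "clip":
        return [min(max(v, low), high) for v in values]
    if action == "remove":
        return [v for v in values if low <= v <= high]
    raise ValueError(f"unknown action: {action!r}")


# Made-up example column with two gaps and one extreme value.
prices = [10.0, 11.0, None, 12.0, 13.0, float("nan"), 100.0]
filled = fill_missing(prices, strategy="median")
cleaned = handle_outliers(filled, action="remove")
```

Imputing before outlier handling is a design choice in this sketch: the fences are computed on a complete column, at the cost of letting the imputed values influence the quartiles.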

Screenshots

Do you want to work on this issue?

Yes

If "yes" to above, please explain how you would technically implement this (issue will not be assigned if this is skipped)

Implementation Plan

Use pandas and scikit-learn to handle missing values. For example, SimpleImputer can fill missing data with the column mean:
    from sklearn.impute import SimpleImputer
    imputer = SimpleImputer(strategy='mean')  # replace NaNs with the column mean
    data = imputer.fit_transform(data)
Detect outliers using the Z-score or IQR method. scipy covers the Z-score case:
    import numpy as np
    from scipy import stats
    z_scores = stats.zscore(data)
    outliers = np.abs(z_scores) > 3  # flag extremes on both sides of the mean
Integrate these steps into a preprocessing pipeline with scikit-learn's Pipeline so they run automatically before fine-tuning.
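
As a rough illustration of the Pipeline step, the sketch below chains SimpleImputer with a small custom transformer. IQRClipper is a hypothetical class written for this example (scikit-learn does not ship one), and clipping to 1.5 × IQR fences is an assumed policy; removing rows instead would not fit a standard transformer, because transformers must keep the number of samples unchanged.

```python
import numpy as np
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline


class IQRClipper(TransformerMixin, BaseEstimator):
    """Hypothetical transformer: clip each column to its IQR fences."""

    def __init__(self, k=1.5):
        self.k = k

    def fit(self, X, y=None):
        X = np.asarray(X, dtype=float)
        # Per-column quartiles learned from the training data only.
        q1, q3 = np.percentile(X, [25, 75], axis=0)
        spread = q3 - q1
        self.lower_ = q1 - self.k * spread
        self.upper_ = q3 + self.k * spread
        return self

    def transform(self, X):
        return np.clip(np.asarray(X, dtype=float), self.lower_, self.upper_)


preprocess = Pipeline([
    ("impute", SimpleImputer(strategy="median")),  # fill NaNs first
    ("clip_outliers", IQRClipper(k=1.5)),          # then tame extremes
])

# Made-up single-feature data with one gap and one extreme value.
X = np.array([[10.0], [11.0], [np.nan], [12.0], [13.0], [100.0]])
X_clean = preprocess.fit_transform(X)
```

Because the fences are learned in fit, calling preprocess.transform on validation or test data reuses the training-set bounds, which avoids leaking information from held-out data.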


github-actions · 2024-10-19T20:34:06Z

Ensure the issue is not similar to an existing one or already being worked on. Thanks for your time.

RB137 · 2024-10-19T20:39:36Z

@rohitinu6
Kindly assign me this task, as I am familiar with handling it.

RB137 · 2024-10-19T20:42:53Z

Kindly add the gssoc-ext, hacktoberfest, and hacktoberfest-accepted labels to this issue. @rohitinu6

rohitinu6 · 2024-10-20T04:38:28Z

@RB137 all the best
please ensure to star the repo

RB137 added the enhancement New feature or request label Oct 19, 2024

rohitinu6 assigned RB137 Oct 20, 2024

rohitinu6 added gssoc-ext GSSoC'24 Extended Version hacktoberfest-accepted Hacktoberfest 2024 level2 25 Points 🥈(GSSoC) hacktoberfest Hacktober Collaboration labels Oct 20, 2024

RB137 mentioned this issue Oct 20, 2024

Performed EDA: Handled Missing Values and Outliers in Stock Price Data #148

Merged

rohitinu6 added level1 10 Points 🥇(GSSoC) and removed level2 25 Points 🥈(GSSoC) labels Oct 21, 2024

Mayureshd-18 closed this as completed in #148 Oct 22, 2024
