added regression algorithm #1752
Merged: ajay-dhangar merged 2 commits into ajay-dhangar:main from KGupta2601:regression-algorithm, Nov 4, 2024
---
id: regression
title: "Regression Algorithm (Supervised learning)"
sidebar_label: Regression Algorithms
description: "In this post, we’ll explore the concept of regression in supervised learning, a fundamental approach used for predicting continuous outcomes based on input features."
tags: [machine learning, algorithms, supervised learning, regression]
---

### Definition:
**Regression** is a type of supervised learning algorithm used to predict continuous outcomes based on one or more input features. The model learns from labeled training data to establish a relationship between the input variables and the target variable.

<AdsComponent />

### Characteristics:
- **Continuous Output**:
  Regression algorithms are used when the output variable is continuous, such as predicting prices, temperatures, or scores.

- **Predictive Modeling**:
  The primary goal is to create a model that can accurately predict numerical values for new, unseen data based on learned relationships.

- **Evaluation Metrics**:
  Regression models are evaluated using metrics such as Mean Squared Error (MSE), R-squared (R²), and Root Mean Squared Error (RMSE).
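
As a minimal sketch of how these metrics are computed (using scikit-learn; the true and predicted values below are made up for illustration):

```python
import numpy as np
from sklearn.metrics import mean_squared_error, r2_score

# Hypothetical true and predicted target values
y_true = np.array([300000, 400000, 250000, 350000])
y_pred = np.array([310000, 390000, 260000, 340000])

mse = mean_squared_error(y_true, y_pred)   # average squared error
rmse = np.sqrt(mse)                        # same units as the target
r2 = r2_score(y_true, y_pred)              # fraction of variance explained

print(f"MSE: {mse:.0f}, RMSE: {rmse:.0f}, R²: {r2:.3f}")
```

Lower MSE/RMSE is better, while R² closer to 1 indicates the model explains more of the target's variance.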

### Types of Regression Algorithms:
1. **Linear Regression**:
   A simple approach that models the relationship between one or more independent variables and a dependent variable by fitting a linear equation.

2. **Polynomial Regression**:
   Extends linear regression by fitting a polynomial equation to the data, allowing for more complex relationships.

3. **Ridge and Lasso Regression**:
   Regularization techniques that add penalties to the loss function to prevent overfitting. Ridge uses L2 regularization, while Lasso uses L1 regularization.

4. **Support Vector Regression (SVR)**:
   An extension of Support Vector Machines (SVM) that can be used for regression tasks by finding a hyperplane that best fits the data.

5. **Decision Tree Regression**:
   Uses decision trees to model relationships between features and target values by splitting data into subsets based on feature values.

6. **Random Forest Regression**:
   An ensemble method that combines multiple decision trees to improve prediction accuracy and control overfitting.
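
To illustrate the effect of the regularized variants, the sketch below (synthetic data, scikit-learn assumed available) fits ordinary least squares, Ridge, and Lasso on data where only the first two features matter:

```python
import numpy as np
from sklearn.linear_model import LinearRegression, Ridge, Lasso

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 5))
# Target depends only on the first two features; the other three are noise
y = 3 * X[:, 0] + 2 * X[:, 1] + rng.normal(scale=0.1, size=50)

ols = LinearRegression().fit(X, y)
ridge = Ridge(alpha=1.0).fit(X, y)   # L2 penalty shrinks coefficients
lasso = Lasso(alpha=0.1).fit(X, y)   # L1 penalty can drive them to exactly zero

print("OLS:  ", ols.coef_.round(2))
print("Ridge:", ridge.coef_.round(2))
print("Lasso:", lasso.coef_.round(2))
```

Typically Ridge shrinks all coefficients slightly, while Lasso zeroes out the irrelevant ones, performing implicit feature selection.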

<Ads />

### Steps Involved:
1. **Input the Data**:
   The algorithm receives labeled training data consisting of features and corresponding target values.

2. **Preprocess the Data**:
   Data cleaning and preprocessing steps may include handling missing values, normalizing or scaling features, and encoding categorical variables.

3. **Split the Dataset**:
   The dataset is typically split into training and testing sets to evaluate model performance.

4. **Select a Model**:
   Choose an appropriate regression algorithm based on the problem type and data characteristics.

5. **Train the Model**:
   Fit the model to the training data using an optimization algorithm to minimize error.

6. **Evaluate Model Performance**:
   Use metrics such as MSE or R² score to assess how well the model performs on unseen data.

7. **Make Predictions**:
   Use the trained model to make predictions on new data points.
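
The preprocessing step deserves a concrete sketch of its own. The snippet below (pandas and scikit-learn assumed; the columns and values are hypothetical) imputes a missing value, one-hot encodes a categorical column, and scales the numeric features:

```python
import pandas as pd
from sklearn.preprocessing import StandardScaler

# Hypothetical raw data with a missing value and a categorical column
df = pd.DataFrame({
    "Size": [1500, 2000, None, 1800],
    "Bedrooms": [3, 4, 2, 3],
    "Location": ["urban", "suburban", "urban", "rural"],
})

# Handle missing values: impute with the column median
df["Size"] = df["Size"].fillna(df["Size"].median())

# Encode the categorical variable as one-hot columns
df = pd.get_dummies(df, columns=["Location"])

# Scale the numeric features to zero mean and unit variance
scaler = StandardScaler()
df[["Size", "Bedrooms"]] = scaler.fit_transform(df[["Size", "Bedrooms"]])

print(df.head())
```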

<AdsComponent />

### Problem Statement:
Given a labeled dataset with multiple features and corresponding continuous target values, the objective is to train a regression model that can accurately predict target values for new, unseen data based on learned patterns.

### Key Concepts:
- **Training Set**:
  The portion of the dataset used to train the model.

- **Test Set**:
  The portion of the dataset used to evaluate model performance after training.

- **Overfitting and Underfitting**:
  Overfitting occurs when a model learns noise in the training data rather than general patterns. Underfitting occurs when a model is too simple to capture underlying trends.

- **Evaluation Metrics**:
  Metrics used to assess model performance include MSE for regression tasks and R² score for measuring explained variance.
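
Overfitting and underfitting can be observed directly by comparing training and test error. The sketch below (synthetic data, scikit-learn assumed) fits polynomials of increasing degree to a noisy sine curve:

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.preprocessing import PolynomialFeatures

rng = np.random.default_rng(42)
X = np.sort(rng.uniform(0, 1, 30)).reshape(-1, 1)
y = np.sin(2 * np.pi * X).ravel() + rng.normal(scale=0.2, size=30)

# Alternate points between training and test sets
X_train, X_test = X[::2], X[1::2]
y_train, y_test = y[::2], y[1::2]

results = {}
for degree in (1, 3, 12):
    poly = PolynomialFeatures(degree)
    model = LinearRegression().fit(poly.fit_transform(X_train), y_train)
    train_mse = mean_squared_error(y_train, model.predict(poly.transform(X_train)))
    test_mse = mean_squared_error(y_test, model.predict(poly.transform(X_test)))
    results[degree] = (train_mse, test_mse)
    print(f"degree={degree:2d}  train MSE={train_mse:.3f}  test MSE={test_mse:.3f}")
```

Degree 1 underfits (high error on both sets), degree 3 captures the trend, and a very high degree drives training error down while tending to generalize worse.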

<Ads />

### Split Criteria:
Tree-based regression algorithms choose splits that minimize prediction error within the resulting subsets, for example by maximizing the reduction in target variance; other regressors instead fit parameters by minimizing a loss such as squared error.
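
For tree-based regressors, a candidate split can be scored by how much it reduces the variance of the target values. A minimal NumPy sketch (the `variance_reduction` helper and the house data below are illustrative, not a library function):

```python
import numpy as np

def variance_reduction(y, mask):
    """Drop in target variance achieved by splitting y with a boolean mask."""
    left, right = y[mask], y[~mask]
    if len(left) == 0 or len(right) == 0:
        return 0.0
    weighted = (len(left) * left.var() + len(right) * right.var()) / len(y)
    return y.var() - weighted

sizes = np.array([1500, 2000, 1200, 1800])
prices = np.array([300000, 400000, 250000, 350000])

# Score two candidate splits on the Size feature
print(variance_reduction(prices, sizes <= 1500))  # split at 1500 sq ft
print(variance_reduction(prices, sizes <= 1800))  # split at 1800 sq ft
```

The split with the larger variance reduction produces more homogeneous price groups and would be preferred by a decision tree.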

### Time Complexity:
- **Training Complexity**:
  Varies by algorithm; it can range from roughly linear in the number of samples for simple models like Linear Regression to polynomial time for more complex models.

- **Prediction Complexity**:
  Also varies by algorithm; some models allow for fast predictions after training (e.g., linear models predict in time proportional to the number of features).

### Space Complexity:
Depends on how much information about the training set needs to be stored; for example, decision trees may require more space than linear models, which store only one coefficient per feature.

### Example:
Consider a scenario where we want to predict house prices based on features such as size, number of bedrooms, and location.

**Dataset Example:**

| Size (sq ft) | Bedrooms | Price ($) |
|--------------|----------|-----------|
| 1500         | 3        | 300000    |
| 2000         | 4        | 400000    |
| 1200         | 2        | 250000    |
| 1800         | 3        | 350000    |

**Step-by-Step Execution:**

1. **Input Data**:
   The model receives training data with features (size and bedrooms) and labels (price).

2. **Preprocess Data**:
   Handle any missing values or outliers if necessary.

3. **Split Dataset**:
   Divide the dataset into training and testing sets (e.g., 80% train, 20% test).

4. **Select Model**:
   Choose an appropriate regression algorithm like Linear Regression.

5. **Train Model**:
   Fit the model using the training set.

6. **Evaluate Performance**:
   Use metrics like R² score or mean squared error on the test set.

7. **Make Predictions**:
   Predict prices for new houses based on their features.

<AdsComponent />

### Python Implementation:
Here’s a basic implementation of Linear Regression using **scikit-learn**:

```python
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error

# Sample dataset
data = {
    'Size': [1500, 2000, 1200, 1800],
    'Bedrooms': [3, 4, 2, 3],
    'Price': [300000, 400000, 250000, 350000]
}

# Convert to DataFrame
df = pd.DataFrame(data)

# Features and target variable
X = df[['Size', 'Bedrooms']]
y = df['Price']

# Split dataset (with only four rows, the test set holds a single sample)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Create and train the Linear Regression model
model = LinearRegression()
model.fit(X_train, y_train)

# Predict on the held-out test set
y_pred = model.predict(X_test)

# Evaluate with Mean Squared Error
mse = mean_squared_error(y_test, y_pred)
print(f"Mean Squared Error: {mse:.2f}")
```
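
Once a model like this is trained, it can price a new listing. A self-contained sketch using the same sample dataset (the 1600 sq ft house below is hypothetical):

```python
import pandas as pd
from sklearn.linear_model import LinearRegression

df = pd.DataFrame({
    "Size": [1500, 2000, 1200, 1800],
    "Bedrooms": [3, 4, 2, 3],
    "Price": [300000, 400000, 250000, 350000],
})
model = LinearRegression().fit(df[["Size", "Bedrooms"]], df["Price"])

# Predict the price of a hypothetical new 1600 sq ft, 3-bedroom house
new_house = pd.DataFrame({"Size": [1600], "Bedrooms": [3]})
predicted = model.predict(new_house)[0]
print(f"Predicted price: ${predicted:,.0f}")
```

The prediction falls between the prices of comparable houses in the training data, as expected from a linear fit.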