Skip to content

Omdena-NIC-Nepal/omdena-nic-nepal-classroom-1f2b87-machine_learning_linear_regression-Machine_Learning_Linear_Regress

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

1 Commit
 
 

Repository files navigation

Machine Learning Beginner Project: Linear Regression

Project Title: Predicting House Prices Using Linear Regression

Objective: To introduce students to supervised learning, focusing on linear regression, by guiding them through a project that predicts house prices based on a variety of features.

Dataset: We'll use the "Boston Housing Dataset" from the UCI Machine Learning Repository. This dataset contains information about housing in Boston, including features such as the number of rooms, age of the house, and crime rate, along with the target variable, which is the median value of owner-occupied homes.

Project Structure:

  • Data Exploration and Preprocessing
  • Model Building and Training
  • Model Evaluation
  • Feature Engineering and Improvement
  • Collaboration and Version Control

├── data

│ └── boston_housing.csv

├── notebooks

│ ├── EDA.ipynb

│ ├── Data_Preprocessing.ipynb

│ ├── Model_Training.ipynb

│ └── Model_Evaluation.ipynb

├── scripts

│ ├── data_preprocessing.py

│ ├── train_model.py

│ └── evaluate_model.py

├── README.md

├── requirements.txt

└── .gitignore

2. Data Exploration and Preprocessing

Task 1: Data Exploration

Notebook: notebooks/EDA.ipynb Steps:

  • Load the dataset.
  • Explore the data structure, types, and summary statistics.
  • Visualize relationships between features and the target variable.
  • Identify missing values and outliers.

Task 2: Data Preprocessing

Notebook: notebooks/Data_Preprocessing.ipynb Steps:

  • Handle missing values and outliers.
  • Encode categorical variables.
  • Normalize/standardize numerical features.
  • Split the data into training and testing sets.
  • Script: scripts/data_preprocessing.py

3. Model Building and Training

Task 3: Model Training

Notebook: notebooks/Model_Training.ipynb Steps:

  • Choose appropriate features for the model.

  • Train a linear regression model.

  • Perform hyperparameter tuning (if applicable).

  • Script: scripts/train_model.py

4. Model Evaluation

Task 4: Model Evaluation

Notebook: notebooks/Model_Evaluation.ipynb Steps:

  • Evaluate the model using metrics such as Mean Squared Error (MSE), R-squared.
  • Plot residuals to check the assumptions of linear regression.
  • Compare model performance with different feature sets or preprocessing steps.
  • Script: scripts/evaluate_model.py

5. Feature Engineering and Improvement

Task 5: Feature Engineering

Notebook: notebooks/Feature_Engineering.ipynb Steps:

  • Create new features that might improve model performance.
  • Test different feature combinations.
  • Evaluate the impact of new features on model performance.

6. Collaboration and Version Control

Task 6: Collaboration and Version Control

Steps:

Ensure each student works on their branch.

  • Regularly merge branches after reviewing and resolving conflicts.
  • Use pull requests to review and discuss changes before merging.
  • Git Commands: Provide a list of essential Git commands for students to use.

Dataset Details

Source: Boston Housing Dataset on UCI Machine Learning Repository

Features:

  • CRIM: per capita crime rate by town.
  • ZN: proportion of residential land zoned for lots over 25,000 sq. ft.
  • INDUS: proportion of non-retail business acres per town.
  • CHAS: Charles River dummy variable (1 if tract bounds river; 0 otherwise).
  • NOX: nitrogen oxides concentration (parts per 10 million).
  • RM: average number of rooms per dwelling.
  • AGE: proportion of owner-occupied units built prior to 1940.
  • DIS: weighted distances to five Boston employment centers.
  • RAD: index of accessibility to radial highways.
  • TAX: full-value property tax rate per $10,000.
  • PTRATIO: pupil-teacher ratio by town.
  • B: 1000(Bk - 0.63)^2 where Bk is the proportion of black residents by town.
  • LSTAT: percentage of lower status of the population.
  • MEDV: median value of owner-occupied homes in $1000s (Target).

About

omdena-nic-nepal-classroom-1f2b87-machine_learning_linear_regression-Machine_Learning_Linear_Regress created by GitHub Classroom

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published