Project | Libraries | Module |
---|---|---|
Profitable App Profiles for the App Store and Google Play Markets | csv, reader | Python for Data Science: Fundamentals |
Exploring Hacker News Posts | csv, datetime | Python for Data Science: Intermediate |
Exploring eBay Car Sales Data | pandas, matplotlib, datetime | Pandas and NumPy Fundamentals |
Finding Heavy Traffic Indicators on I-94 | pandas, matplotlib, datetime | Data Visualization Fundamentals |
Storytelling Data Visualization on Exchange Rates | pandas, matplotlib, datetime, numpy | Storytelling Data Visualization and Information Design |
Analyzing CIA Factbook Data Using SQL | sqlite | SQL Fundamentals |
Clean and Analyze Employee Exit Surveys | pandas, matplotlib, numpy | Data Cleaning and Analysis |
Investigating Fandango Movie Ratings | pandas, matplotlib, numpy | Statistics Fundamentals |
Predicting car prices using attributes using machine learning | pandas, matplotlib, numpy, KNeighborsRegressor, mean_squared_error | Machine Learning Fundamentals |
Data Cleaning
- Remove inaccurate data
- Remove duplicates
- Remove non-English apps
- Isolate free apps
Data analysis
- Most popular apps by genre on the App Store
- Most popular apps by genre on the Google Play Store
Libraries used : csv, reader
The objective is to compare Ask HN and Show HN posts based on how frequently they appear at different hours of the day.
Libraries used : csv, datetime
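
One way to count posts per hour is sketched below. The function name `posts_by_hour`, the `(title, created_at)` row layout, and the date format are assumptions for illustration, and the sample posts are invented.

```python
from collections import Counter
from datetime import datetime


def posts_by_hour(posts, date_format="%m/%d/%Y %H:%M"):
    """Count Ask HN and Show HN posts per hour of creation."""
    counts = {"ask": Counter(), "show": Counter()}
    for title, created_at in posts:
        lowered = title.lower()
        if lowered.startswith("ask hn"):
            kind = "ask"
        elif lowered.startswith("show hn"):
            kind = "show"
        else:
            continue  # ignore posts that are neither Ask HN nor Show HN
        hour = datetime.strptime(created_at, date_format).hour
        counts[kind][hour] += 1
    return counts


# Hypothetical sample posts as (title, created_at) pairs
sample = [
    ("Ask HN: How do you learn SQL?", "8/16/2016 9:55"),
    ("Show HN: My weekend project", "8/16/2016 9:10"),
    ("Ask HN: Best Python books?", "9/1/2016 15:30"),
    ("Some other story", "9/1/2016 15:45"),
]
counts = posts_by_hour(sample)
print(counts["ask"].most_common())
```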
Aim : Clean the data and analyze the included used car listings.
- Explore date values
- Explore registration year
- Explore price by brand
- Explore mileage by brand
Libraries used : pandas, matplotlib, datetime
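
As a rough sketch of the price and mileage comparison by brand, a pandas `groupby` can aggregate both columns at once. The column names (`brand`, `price`, `odometer_km`) and the listings below are assumptions made up for the example, not values from the dataset.

```python
import pandas as pd

# Hypothetical listings; column names are assumptions
autos = pd.DataFrame({
    "brand": ["bmw", "bmw", "audi", "audi", "ford"],
    "price": [8000, 10000, 9000, 11000, 3000],
    "odometer_km": [150000, 125000, 100000, 150000, 125000],
})

# Mean price and mean mileage per brand, most expensive brand first
brand_summary = (
    autos.groupby("brand")[["price", "odometer_km"]]
    .mean()
    .sort_values("price", ascending=False)
)
print(brand_summary)
```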
Aim : Using the I-94 Interstate highway dataset, this notebook looks at the factors that cause slow traffic.
- Split the traffic data into day and night to analyse their influence on traffic further
- Split the data by month and analyse whether traffic slowness differs from month to month
- Analyse traffic slowness under different weather conditions
Libraries used : pandas, matplotlib, datetime
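
The day/night and monthly splits might look something like this. The column names (`date_time`, `traffic_volume`), the 7:00 to 19:00 definition of daytime, and the sample rows are assumptions for illustration only.

```python
import pandas as pd

# Hypothetical rows; column names are assumptions
traffic = pd.DataFrame({
    "date_time": ["2016-07-01 08:00:00", "2016-07-01 23:00:00",
                  "2016-12-01 17:00:00", "2016-12-01 03:00:00"],
    "traffic_volume": [5200, 900, 4800, 400],
})
traffic["date_time"] = pd.to_datetime(traffic["date_time"])

# Assumed definition: daytime runs from 7:00 up to (not including) 19:00
hour = traffic["date_time"].dt.hour
is_day = (hour >= 7) & (hour < 19)
day = traffic[is_day]
night = traffic[~is_day]

# Average daytime volume for each calendar month
monthly_day_volume = day.groupby(day["date_time"].dt.month)["traffic_volume"].mean()
print(monthly_day_volume)
```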
Aim : The dataset describes daily Euro exchange rates between 1999 and 2021. We look at the impact of black swan events on the rates.
- Mark vertical lines for the event days
- Data story visualization based on the price movements
Libraries used : pandas, matplotlib, datetime, numpy
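
Marking event days with vertical lines can be done with matplotlib's `axvline`. The sketch below uses a synthetic series and placeholder event dates ("Event A", "Event B"); none of it is real exchange-rate data or a real event.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Synthetic placeholder series, not real exchange-rate data
dates = pd.date_range("2020-01-01", periods=120, freq="D")
rates = pd.Series(1.10 + 0.01 * np.sin(np.arange(120) / 10), index=dates)

# Hypothetical event dates chosen only for illustration
events = {
    "Event A": pd.Timestamp("2020-02-15"),
    "Event B": pd.Timestamp("2020-03-20"),
}

fig, ax = plt.subplots(figsize=(8, 3))
ax.plot(rates.index, rates.values)
for label, day in events.items():
    # One dashed vertical line per event, with a rotated label next to it
    ax.axvline(day, color="red", linestyle="--", alpha=0.6)
    ax.text(day, rates.max(), label, rotation=90, va="top", ha="right")
ax.set_ylabel("Exchange rate")
# fig.savefig("events.png") or plt.show() would render the figure
```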
- Query the database for insights
Libraries used : sqlite
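
A query of this kind can be tried out with Python's built-in `sqlite3` module. The table name `facts`, its columns, and the rows below are hypothetical stand-ins created in memory, not the Factbook database itself.

```python
import sqlite3

# In-memory database with a hypothetical `facts` table
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE facts (name TEXT, population INTEGER, area INTEGER)")
conn.executemany(
    "INSERT INTO facts VALUES (?, ?, ?)",
    [("Aland", 1000, 100), ("Borduria", 5000, 250), ("Cascadia", 200, 400)],
)

# Rank the made-up countries by population density
query = """
    SELECT name, CAST(population AS REAL) / area AS density
    FROM facts
    ORDER BY density DESC
"""
rows = conn.execute(query).fetchall()
conn.close()
print(rows)
```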
Aim : Understand why employees are resigning. Is some kind of dissatisfaction driving it, and is the trend common to both experienced and new employees?
- Explored the data and figured out how to prepare it for analysis
- Corrected some of the missing values
- Dropped any data not needed for our analysis
- Renamed our columns
- Verified the quality of our data
- Created a new institute_service column
- Cleaned the Contributing Factors columns
- Created a new column indicating if an employee resigned because they were dissatisfied in some way
- Combined the data
- Cleaned the institute_service column
- Handled the missing values in the dissatisfied column
- Aggregated the data
Libraries used : pandas, matplotlib, numpy
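
The step that creates the dissatisfied column could be sketched as below. The column names, the `-` placeholder for "not selected", and the choice to treat missing values as not dissatisfied are assumptions for this example; the notebook may handle them differently.

```python
import numpy as np
import pandas as pd


def to_bool(value):
    """Map '-' to False and any other answer to True.
    Assumption: a missing value counts as not dissatisfied."""
    if pd.isnull(value):
        return False
    return value != "-"


# Hypothetical rows; column names are assumptions
resignations = pd.DataFrame({
    "Contributing Factors. Dissatisfaction": ["-", "Contributing Factors. Dissatisfaction", "-"],
    "Contributing Factors. Job Dissatisfaction": ["-", "-", np.nan],
})

factor_cols = list(resignations.columns)
flags = resignations[factor_cols].apply(lambda col: col.map(to_bool))

# An employee is flagged if any contributing factor indicates dissatisfaction
resignations["dissatisfied"] = flags.any(axis=1)
print(resignations["dissatisfied"])
```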
In October 2015, a data journalist named Walt Hickey analyzed movie ratings data and found strong evidence to suggest that Fandango's rating system was biased and dishonest (Fandango is an online movie ratings aggregator). He published his analysis in this article, a great piece of data journalism that's well worth reading.
Aim : Analyse more recent movie ratings data to determine whether Fandango's rating system has changed since 2015.
- Collect the datasets
- Classify the datasets based on the year
- Plot the distributions for 2015 and 2016
- Calculate the summary metrics and plot them for 2015 and 2016
Libraries used : pandas, matplotlib, numpy
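
The per-year summary metrics might be computed along these lines. The column names (`year`, `stars`) and the ratings are invented for illustration and do not reflect the actual results of the analysis.

```python
import pandas as pd

# Hypothetical ratings; column names and values are assumptions
ratings = pd.DataFrame({
    "year": [2015, 2015, 2015, 2016, 2016, 2016],
    "stars": [4.5, 4.5, 5.0, 4.0, 4.0, 3.5],
})

# Mean, median and mode of the star ratings for each year
summary = ratings.groupby("year")["stars"].agg(
    mean="mean",
    median="median",
    mode=lambda s: s.mode().iloc[0],
)
print(summary)
```

Calling `summary.plot.bar()` on the result is one option for plotting the metrics side by side.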
Using the attributes available in the dataset (from the UCI Machine Learning Repository), we try to predict car prices. In this project I create a model using the k-nearest neighbors (KNN) algorithm.
- Univariate model
- Multivariate model
- Hyperparameter tuning
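
The three steps above can be illustrated with scikit-learn's `KNeighborsRegressor` and `mean_squared_error`. This is only a sketch under stated assumptions: the data is synthetic, the feature names (`horsepower`, `curb_weight`), the helper `knn_rmse`, the 50/50 train/test split, min-max normalization, and the range of `k` values are all choices made for the example rather than the notebook's actual setup.

```python
import numpy as np
import pandas as pd
from sklearn.metrics import mean_squared_error
from sklearn.neighbors import KNeighborsRegressor

# Synthetic stand-in data, not the UCI dataset
rng = np.random.default_rng(0)
cars = pd.DataFrame({
    "horsepower": rng.uniform(50, 250, 200),
    "curb_weight": rng.uniform(1500, 4000, 200),
})
cars["price"] = 100 * cars["horsepower"] + 3 * cars["curb_weight"] + rng.normal(0, 500, 200)

# Min-max normalize the features so no column dominates the distance metric
feature_cols = ["horsepower", "curb_weight"]
cars[feature_cols] = (cars[feature_cols] - cars[feature_cols].min()) / (
    cars[feature_cols].max() - cars[feature_cols].min()
)


def knn_rmse(features, k, data=cars, target="price"):
    """Fit KNN on the first half of the rows and return RMSE on the second half."""
    half = len(data) // 2
    train, test = data.iloc[:half], data.iloc[half:]
    model = KNeighborsRegressor(n_neighbors=k)
    model.fit(train[features], train[target])
    predictions = model.predict(test[features])
    return float(np.sqrt(mean_squared_error(test[target], predictions)))


univariate = knn_rmse(["horsepower"], k=5)                # one feature
multivariate = knn_rmse(feature_cols, k=5)                # several features
rmse_by_k = {k: knn_rmse(feature_cols, k) for k in range(1, 10)}  # vary k
best_k = min(rmse_by_k, key=rmse_by_k.get)
print(univariate, multivariate, best_k)
```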