pravvvv/Dataquest_Data_Scientist_Path

| Project | Libraries | Module |
| --- | --- | --- |
| Profitable App Profiles for the App Store and Google Play Markets | csv, reader | Python for Data Science: Fundamentals |
| Exploring Hacker News Posts | csv, datetime | Python for Data Science: Intermediate |
| Exploring eBay Car Sales Data | pandas, matplotlib, datetime | Pandas and NumPy Fundamentals |
| Finding Heavy Traffic Indicators on I-94 | pandas, matplotlib, datetime | Data Visualization Fundamentals |
| Storytelling Data Visualization on Exchange Rates | pandas, matplotlib, datetime, numpy | Storytelling Data Visualization and Information Design |
| Analyzing CIA Factbook Data Using SQL | sqlite | SQL Fundamentals |
| Clean and Analyze Employee Exit Surveys | pandas, matplotlib, numpy | Data Cleaning and Analysis |
| Investigating Fandango Movie Ratings | pandas, matplotlib, numpy | Statistics Fundamentals |
| Predicting car prices using attributes using machine learning | pandas, matplotlib, numpy, KNeighborsRegressor, mean_squared_error | Machine Learning Fundamentals |

Dataquest Projects

Data Cleaning

  1.   Remove inaccurate data
  2.   Remove duplicates
  3.   Remove non-English apps
  4.   Isolate free apps

Data analysis

  1.   Most popular apps by genre on the App Store
  2.   Most popular apps by genre on the Google Play Store

    Libraries used : csv, reader
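
One way to rank genres by popularity is to average the number of user ratings per genre. The sketch below assumes that definition of "popular"; the `(genre, rating count)` rows are made-up values, not figures from either store.

```python
from collections import defaultdict

# Hypothetical cleaned rows: (genre, number of user ratings).
rows = [
    ("Games", 1000),
    ("Games", 3000),
    ("Navigation", 9000),
    ("Social Networking", 5000),
    ("Social Networking", 7000),
]

totals = defaultdict(int)
counts = defaultdict(int)
for genre, n_ratings in rows:
    totals[genre] += n_ratings
    counts[genre] += 1

# Average rating count per genre, sorted from most to least popular.
avg_ratings = {genre: totals[genre] / counts[genre] for genre in totals}
ranked = sorted(avg_ratings.items(), key=lambda item: item[1], reverse=True)

for genre, avg in ranked:
    print(f"{genre}: {avg:.0f}")
```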

  The objective is to compare Ask HN and Show HN posts by their frequency at different hours of the day.

    Libraries used : csv, datetime
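
A minimal sketch of the hourly comparison, using only the standard library. The sample titles and the `%m/%d/%Y %H:%M` timestamp format are assumptions for illustration; the real dataset would be loaded with `csv.reader`.

```python
import datetime as dt
from collections import Counter

# Hypothetical (title, created_at) pairs standing in for rows of the CSV.
posts = [
    ("Ask HN: How do you learn SQL?", "8/16/2016 9:55"),
    ("Show HN: My weekend project", "11/22/2015 13:43"),
    ("Ask HN: Best Python books?", "5/2/2016 9:14"),
    ("Show HN: A tiny CSV parser", "1/10/2016 13:05"),
    ("Ask HN: Remote work tips?", "3/7/2016 15:30"),
]

ask_by_hour = Counter()
show_by_hour = Counter()
for title, created_at in posts:
    # Parse the timestamp and keep only the zero-padded hour, e.g. "09".
    hour = dt.datetime.strptime(created_at, "%m/%d/%Y %H:%M").strftime("%H")
    if title.lower().startswith("ask hn"):
        ask_by_hour[hour] += 1
    elif title.lower().startswith("show hn"):
        show_by_hour[hour] += 1

print("Ask HN posts by hour:", sorted(ask_by_hour.items()))
print("Show HN posts by hour:", sorted(show_by_hour.items()))
```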

  The aim of this project is to clean the data and analyze the included used car listings.

  1.   Explore date values
  2.   Explore registration year
  3.   Explore price by brand
  4.   Explore mileage by brand

    Libraries used : pandas, matplotlib, datetime
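
The exploration steps above might look like the following in pandas. The column names (`brand`, `price`, `odometer_km`, `date_crawled`, `registration_year`), the sample values, and the 1900 to 2016 plausibility range are assumptions for this sketch rather than details taken from the notebook.

```python
import pandas as pd

# Hypothetical listings; the real dataset has many more rows and columns.
autos = pd.DataFrame({
    "brand": ["volkswagen", "bmw", "volkswagen", "bmw", "opel"],
    "price": [5000, 9000, 7000, 11000, 3000],
    "odometer_km": [150000, 125000, 100000, 150000, 90000],
    "date_crawled": [
        "2016-03-26 17:47:46", "2016-03-26 09:12:01", "2016-04-01 14:38:50",
        "2016-04-01 20:05:33", "2016-04-01 08:00:00",
    ],
    "registration_year": [2004, 1997, 2009, 2014, 9999],
})

# Registration year: drop rows outside a plausible range.
autos = autos[autos["registration_year"].between(1900, 2016)]

# Date values: share of listings crawled on each day (first 10 characters = date).
crawl_share = autos["date_crawled"].str[:10].value_counts(normalize=True).sort_index()

# Price and mileage by brand: mean of each column per brand.
by_brand = autos.groupby("brand")[["price", "odometer_km"]].mean()

print(crawl_share)
print(by_brand)
```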

Aim : Using the I-94 Interstate highway dataset, this notebook looks at the factors that cause slow traffic.

  1.   Split the traffic into day and night data to analyse further what influences traffic
  2.   Split the data by month and analyse whether traffic slowness differs from month to month
  3.   Analyse traffic slowness for different weather conditions

    Libraries used : pandas, matplotlib, datetime
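
The three splits above can be sketched with pandas. The column names (`date_time`, `traffic_volume`, `weather_main`), the 07:00 to 19:00 definition of daytime, and the sample values are assumptions for illustration; traffic volume is used here as a stand-in indicator of slowness.

```python
import pandas as pd

# Hypothetical hourly observations.
traffic = pd.DataFrame({
    "date_time": pd.to_datetime([
        "2016-01-04 08:00", "2016-01-04 23:00", "2016-07-11 09:00",
        "2016-07-11 17:00", "2016-07-12 03:00",
    ]),
    "traffic_volume": [5000, 1000, 4000, 5000, 500],
    "weather_main": ["Clear", "Clear", "Rain", "Clouds", "Rain"],
})

# Day/night split: daytime is assumed to be 07:00 (inclusive) to 19:00 (exclusive).
hour = traffic["date_time"].dt.hour
day = traffic[(hour >= 7) & (hour < 19)]
night = traffic[(hour < 7) | (hour >= 19)]

# Average daytime volume per month and per weather condition.
by_month = day.groupby(day["date_time"].dt.month)["traffic_volume"].mean()
by_weather = day.groupby("weather_main")["traffic_volume"].mean()

print(by_month)
print(by_weather)
```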

Aim : The dataset we'll use describes Euro daily exchange rates between 1999 and 2021. We look at the impact of black swan events.

  1.   Mark vertical lines for the event days
  2.   Data story visualization based on the price movements

    Libraries used : pandas, matplotlib, datetime, numpy
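
Marking event days with vertical lines can be done with matplotlib's `Axes.axvline`. The sketch below uses a synthetic rate series and a placeholder event date, both invented for the example; the 30-day rolling mean is also an assumption, used only to smooth the line.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Synthetic daily exchange rate; not real data.
dates = pd.date_range("2020-01-01", periods=120, freq="D")
rate = pd.Series(1.10 + 0.01 * np.sin(np.arange(120) / 10), index=dates)
rolling = rate.rolling(30).mean()  # first 29 values are NaN

# Placeholder event; replace with the event days of interest.
events = {"Hypothetical event": pd.Timestamp("2020-03-11")}

fig, ax = plt.subplots(figsize=(8, 3))
ax.plot(rolling.index, rolling.values)
for label, event_day in events.items():
    # One dashed vertical line per event day, with a rotated label beside it.
    ax.axvline(event_day, color="red", linestyle="--", alpha=0.7)
    ax.text(event_day, rolling.max(), label, rotation=90, va="top", ha="right")
ax.set_ylabel("Exchange rate (synthetic)")
```

Calling `fig.savefig(...)` or `plt.show()` afterwards renders the figure.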

  1.   Query the database for insights

    Libraries used : sqlite
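
A minimal sketch of querying a database from Python with the standard-library `sqlite3` module. The `facts` table, its columns, and the rows are placeholders created in memory for the example; the query itself (countries with above-average population, ordered by density) is just one illustrative question.

```python
import sqlite3

# In-memory database with a hypothetical facts table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE facts (name TEXT, population INTEGER, area INTEGER)")
conn.executemany(
    "INSERT INTO facts VALUES (?, ?, ?)",
    [
        ("Country A", 1000, 10),
        ("Country B", 5000, 100),
        ("Country C", 9000, 30),
        ("Country D", 200, 50),
    ],
)

# Countries with above-average population, ordered by population density.
query = """
    SELECT name, population, CAST(population AS REAL) / area AS density
      FROM facts
     WHERE population > (SELECT AVG(population) FROM facts)
     ORDER BY density DESC
"""
rows = conn.execute(query).fetchall()
conn.close()

for name, population, density in rows:
    print(name, population, density)
```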

Aim : Understand why employees are resigning. Is there some kind of dissatisfaction, and is the trend common to both experienced and new employees?

  • Explored the data and figured out how to prepare it for analysis

  • Corrected some of the missing values

  • Dropped any data not needed for our analysis

  • Renamed our columns

  • Verified the quality of our data

  • Created a new institute_service column

  • Cleaned the Contributing Factors columns

  • Created a new column indicating if an employee resigned because they were dissatisfied in some way

  • Combined the data

  • Cleaned the institute_service column

  • Handled the missing values in the dissatisfied column

  • Aggregated the data

        Libraries used : pandas, matplotlib, numpy
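
The last few steps (building the dissatisfied column, handling missing values, and aggregating) can be sketched as follows. The factor column names, the sample values, and the two-level split at three years of service are assumptions for the example; only `institute_service` and `dissatisfied` are names mentioned above.

```python
import numpy as np
import pandas as pd

# Hypothetical combined survey data after cleaning.
combined = pd.DataFrame({
    "institute_service": [1.0, 2.0, 5.0, 8.0, 12.0, 25.0, np.nan],
    "job_dissatisfaction": [False, True, False, np.nan, True, True, False],
    "workload": [False, False, False, True, False, True, np.nan],
})

# An employee counts as dissatisfied if any factor is True.
# eq(True) maps missing values to False, which is one way to handle them.
factors = ["job_dissatisfaction", "workload"]
combined["dissatisfied"] = combined[factors].eq(True).any(axis=1)


def career_stage(years):
    """Bucket years of service; the three-year cut-off is an assumption."""
    if pd.isnull(years):
        return np.nan
    return "New" if years < 3 else "Experienced"


combined["service_cat"] = combined["institute_service"].apply(career_stage)

# Share of dissatisfied employees per group (rows with no category are dropped).
share = combined.groupby("service_cat")["dissatisfied"].mean()
print(share)
```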

In October 2015, a data journalist named Walt Hickey analyzed movie ratings data and found strong evidence to suggest that Fandango's rating system was biased and dishonest (Fandango is an online movie ratings aggregator). He published his analysis in this article — a great piece of data journalism that's totally worth reading.

Aim : We analyse more recent movie ratings data to determine whether Fandango's rating system has changed since 2015.

  • Collect the datasets

  • Classify the datasets based on the year

  • Plot the distribution for year 2015 and 2016

  • Calculate the summary metrics and plot the same for the years 2015 and 2016

        Libraries used : pandas, matplotlib, numpy
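
Summary metrics per year can be computed with a pandas group-by. The ratings below are invented values, not results from either dataset, and the column names `year` and `fandango_stars` are assumptions.

```python
import pandas as pd

# Invented star ratings for illustration only.
ratings = pd.DataFrame({
    "year": [2015] * 5 + [2016] * 5,
    "fandango_stars": [4.5, 4.5, 5.0, 4.0, 4.5, 4.0, 4.0, 3.5, 4.5, 4.0],
})

# Mean, median, and mode of the star ratings for each year.
summary = ratings.groupby("year")["fandango_stars"].agg(
    mean="mean",
    median="median",
    mode=lambda s: s.mode().iloc[0],  # first mode if there are ties
)
print(summary)
```

Calling `summary.plot.bar()` would then draw the three metrics side by side for each year.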

Using the attributes available in the dataset (UCI Machine Learning Repository), we try to predict car prices. In this project I create a model using the KNN algorithm.

  • Univariate model
  • Multivariate model
  • Hyperparameter tuning
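
The three stages above can be sketched with scikit-learn's `KNeighborsRegressor` and `mean_squared_error`. The data here is synthetic (random horsepower and curb-weight values with a made-up price formula), and the 50/50 split, the feature choice, and the range of k are assumptions for the example.

```python
import numpy as np
from sklearn.metrics import mean_squared_error
from sklearn.neighbors import KNeighborsRegressor

# Synthetic stand-in for the car dataset.
rng = np.random.default_rng(0)
n = 200
horsepower = rng.uniform(50, 250, n)
curb_weight = rng.uniform(1500, 4000, n)
price = 100 * horsepower + 3 * curb_weight + rng.normal(0, 500, n)

# Min-max normalise features so distances are comparable
# (done on the full sample here for brevity).
X = np.column_stack([horsepower, curb_weight])
X = (X - X.min(axis=0)) / (X.max(axis=0) - X.min(axis=0))

split = n // 2
X_train, X_test = X[:split], X[split:]
y_train, y_test = price[:split], price[split:]


def rmse_for(columns, k):
    """Fit KNN on the given feature columns and return the test RMSE."""
    model = KNeighborsRegressor(n_neighbors=k)
    model.fit(X_train[:, columns], y_train)
    predictions = model.predict(X_test[:, columns])
    return float(np.sqrt(mean_squared_error(y_test, predictions)))


univariate = rmse_for([0], k=5)        # horsepower only
multivariate = rmse_for([0, 1], k=5)   # horsepower and curb weight

# Hyperparameter tuning: try k = 1..10 and keep the lowest RMSE.
rmse_by_k = {k: rmse_for([0, 1], k) for k in range(1, 11)}
best_k = min(rmse_by_k, key=rmse_by_k.get)

print(univariate, multivariate, best_k)
```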

About

Running notes - Dataquest - Data Scientist in Python
