VaishDeshpande234/Data-Portfolio

Data-Portfolio

Welcome to my data portfolio!

About Me

I am a results-oriented data scientist with a strong academic background and practical experience in data analysis and engineering. I am proficient in Python and SQL, and I have hands-on experience with machine learning and deep learning frameworks such as PyTorch and TensorFlow. My goal is to leverage data-driven insights to tackle complex challenges and drive strategic decision-making.

Below are details of the projects I've worked on:

Project 1: Analysis of Electric Vehicle Registrations in Washington State

Project Overview:

In this project, I conducted an exploratory data analysis (EDA) of the electric vehicle (EV) registrations recorded each month by the Washington State Department of Licensing (DOL). The analysis aimed to uncover trends, identify missing or inconsistent data, and provide insights into the adoption of electric vehicles over time.

Key Highlights:

  • Data Overview: The dataset shows the number of vehicles registered each month, segmented by county and vehicle type.
  • Data Quality Checks: Identified missing values and inconsistent counts.
  • Key Findings: Trends in EV registrations over time and by county.
  • Conclusion: Insights into the growth of EV adoption in Washington State despite data challenges.
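
The data quality checks and the monthly trend summary described above could be sketched roughly as follows. This is a dependency-free illustration, not the project's actual code: the field names (`month`, `county`, `ev_count`, `total_count`), the sample rows, and the rule that a row is "inconsistent" when its EV count exceeds its total vehicle count are all assumptions made for the example.

```python
from collections import defaultdict

# Hypothetical sample rows; field names are assumptions, not the DOL schema.
rows = [
    {"month": "2023-01", "county": "King", "ev_count": 120, "total_count": 1000},
    {"month": "2023-01", "county": "Pierce", "ev_count": None, "total_count": 400},
    {"month": "2023-02", "county": "King", "ev_count": 150, "total_count": 140},
    {"month": "2023-02", "county": "Pierce", "ev_count": 40, "total_count": 410},
]


def missing_counts(records):
    """Count missing (None) values per field."""
    counts = defaultdict(int)
    for record in records:
        for field, value in record.items():
            if value is None:
                counts[field] += 1
    return dict(counts)


def inconsistent_rows(records):
    """Return rows whose EV count exceeds the total vehicle count (assumed rule)."""
    return [
        r for r in records
        if r["ev_count"] is not None
        and r["total_count"] is not None
        and r["ev_count"] > r["total_count"]
    ]


def monthly_ev_totals(records):
    """Sum EV registrations per month, skipping rows with a missing EV count."""
    totals = defaultdict(int)
    for r in records:
        if r["ev_count"] is not None:
            totals[r["month"]] += r["ev_count"]
    return dict(sorted(totals.items()))


print(missing_counts(rows))
print(inconsistent_rows(rows))
print(monthly_ev_totals(rows))
```

With Pandas, the same checks would typically be written with `DataFrame.isna().sum()` for missing values and a `groupby` on the month column for the trend.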

Technologies Used:

  • Python for data analysis and scripting.
  • Pandas, Matplotlib, and Seaborn for data manipulation and visualization.

Link to the Project Repository:

https://github.com/VaishDeshpande234/Electric_Vehicle_Registrations_Project

Project 2: Predictive Maintenance for Turbofan Engine using Machine Learning

Project Overview:

The goal of this project is to predict the Remaining Useful Life (RUL) of turbofan engines from sensor data. Predictive maintenance estimates the point at which an engine is likely to fail, so that maintenance can be scheduled in time to prevent failures and optimize maintenance schedules.
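
As an illustration of what an RUL label looks like, the sketch below assumes run-to-failure training data, where each engine is observed until it fails. Under that assumption, a common convention is to set RUL at a given cycle to the engine's last observed cycle minus the current cycle. The `(engine_id, cycle)` record layout and the function name `add_rul` are hypothetical and may differ from the project's code.

```python
from collections import defaultdict


def add_rul(records):
    """Attach RUL = (last observed cycle of the engine) - (current cycle).

    `records` is a list of (engine_id, cycle) pairs; assumes every engine
    in the list was run until failure.
    """
    last_cycle = defaultdict(int)
    for engine_id, cycle in records:
        last_cycle[engine_id] = max(last_cycle[engine_id], cycle)
    return [(e, c, last_cycle[e] - c) for e, c in records]


# Two hypothetical engines: engine 1 fails after 3 cycles, engine 2 after 2.
records = [(1, 1), (1, 2), (1, 3), (2, 1), (2, 2)]
for engine_id, cycle, rul in add_rul(records):
    print(engine_id, cycle, rul)
```

The final cycle of each engine gets an RUL of 0, and earlier cycles count down toward it.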

Key Highlights:

  • Data Preprocessing: Loaded and cleaned the training, test, and RUL datasets.
  • Feature Engineering: Calculated RUL for the training dataset and prepared the test dataset.
  • Model Training and Evaluation: Trained Random Forest, Gradient Boosting, and LSTM models. Evaluated models using RMSE and MAE metrics.
  • Visualizations: Created scatter plots comparing predicted and ground truth RUL, histograms of engine cycle distributions, and correlation matrices of features to analyze model performance and data relationships.
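
The two evaluation metrics named above can be written out directly. This is a minimal sketch of RMSE and MAE in plain Python; the sample RUL values are made up for illustration and are not results from the project.

```python
import math


def rmse(y_true, y_pred):
    """Root mean squared error: penalizes large errors more heavily."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared) / len(squared))


def mae(y_true, y_pred):
    """Mean absolute error: average size of the error, in RUL cycles."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


# Hypothetical ground-truth and predicted RUL values.
true_rul = [100, 50, 20, 10]
pred_rul = [90, 55, 20, 25]

print(rmse(true_rul, pred_rul))
print(mae(true_rul, pred_rul))
```

Because RMSE squares each error before averaging, it is always at least as large as MAE, and a wide gap between the two points to a few large misses.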

Technologies Used:

  • Python for data analysis and scripting.
  • Pandas, Matplotlib, and Seaborn for data manipulation and visualization.
  • Scikit-learn for machine learning models.
  • Keras for the LSTM model.

Link to the Project Repository:

https://github.com/VaishDeshpande234/Predictive-Maintenance

Project 3: Sentiment Analysis on Twitter Dataset Using Machine Learning

Project Overview:

The goal of this project is to perform sentiment analysis on a large dataset of tweets, classifying each tweet as positive or negative. Sentiment analysis shows how users feel about specific topics, brands, or events, which supports better decision-making and strategy formulation.

Key Highlights:

  • Data Preprocessing and EDA: Loaded the dataset, removed unnecessary columns, replaced sentiment values for better understanding, and created word clouds for negative and positive tweets.
  • Feature Engineering: Preprocessed text data by converting text to lowercase, replacing URLs, emojis, and usernames with placeholders, removing non-alphanumeric characters and stopwords, and lemmatizing words. Converted text data into numerical features using TF-IDF.
  • Model Training and Evaluation: Split the data into training and test sets. Trained three models (Bernoulli Naive Bayes, LinearSVC, and Logistic Regression) and evaluated them using precision, recall, F1-score, and a confusion matrix.
  • Results: Logistic Regression performed best of the three models, with an accuracy of 0.83.
  • Visualizations:
    1. Word Cloud for Negative Tweets: Visual representation of the most frequent words in negative tweets.
    2. Word Cloud for Positive Tweets: Visual representation of the most frequent words in positive tweets.
    3. Confusion Matrix: Heatmap of the confusion matrix showing the performance of the models in terms of true positives, false positives, false negatives, and true negatives.
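
The text-cleaning steps and the evaluation metrics listed above could look roughly like the sketch below. It is an illustration under stated assumptions, not the project's code: the placeholder tokens (`url`, `user`), the tiny stopword set, and the function names are hypothetical. Emoji replacement, lemmatization, and TF-IDF are left out to keep the example dependency-free.

```python
import re

# Tiny illustrative stopword set; an assumption, not the NLTK list.
STOPWORDS = {"a", "an", "the", "is", "to", "and", "of", "i", "my"}

URL_RE = re.compile(r"https?://\S+|www\.\S+")
USER_RE = re.compile(r"@\w+")
NON_ALNUM_RE = re.compile(r"[^a-z0-9\s]")


def clean_tweet(text):
    """Lowercase, replace URLs and usernames, drop punctuation and stopwords."""
    text = text.lower()
    text = URL_RE.sub(" url ", text)    # URLs first, so '@' inside a URL is untouched
    text = USER_RE.sub(" user ", text)
    text = NON_ALNUM_RE.sub(" ", text)
    return " ".join(tok for tok in text.split() if tok not in STOPWORDS)


def binary_metrics(y_true, y_pred, positive=1):
    """Confusion-matrix counts plus precision, recall, F1, and accuracy."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = len(pairs) - tp - fp - fn
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    accuracy = (tp + tn) / len(pairs) if pairs else 0.0
    return {"tp": tp, "fp": fp, "fn": fn, "tn": tn,
            "precision": precision, "recall": recall,
            "f1": f1, "accuracy": accuracy}


print(clean_tweet("@Bob I love the new phone!!! https://t.co/abc"))

# Hypothetical labels: 1 = positive tweet, 0 = negative tweet.
print(binary_metrics([1, 1, 1, 0, 0, 0], [1, 1, 0, 1, 0, 0]))
```

The four counts returned by `binary_metrics` are the cells of the confusion matrix, so precision, recall, and F1 can all be read off the same heatmap.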

Technologies Used:

  • Python for data analysis and scripting.
  • Pandas, Matplotlib, and Seaborn for data manipulation and visualization.
  • Scikit-learn for machine learning models.
  • NLTK for natural language processing.

Link to the Project Repository:

https://github.com/VaishDeshpande234/Sentiment-Analysis

Next Steps:

I am currently expanding my portfolio with more projects in data science and machine learning. Stay tuned for updates!

Contact Information

Feel free to reach out to me via LinkedIn (https://www.linkedin.com/in/vaishnavi-deshpande-477392297/) or email (deshpandevaish2310@gmail.com).
