
An exploratory data analysis project I worked on as my capstone for the Hack the Hood Fall 2023 Build Data Science Bootcamp.


francisohara24/HtH-2023-animal-condition-analysis


HtH-2023-animal-condition-analysis

This is an exploratory data analysis project I worked on as my capstone project for the Hack the Hood Fall 2023 Build Data Science bootcamp.

  • I chose to analyze the Animal Conditions dataset by Grace Hephzibah M. and William Oliveira Gibin available on Kaggle at this link.
  • This dataset stood out to me while I was searching for one to use, because I thought it would be interesting to identify the symptoms, or combinations of symptoms, that make a condition dangerous rather than non-dangerous, even though that question falls more within the domain of machine learning.
  • I also dreamed of becoming a vet when I was in grade school, and this project is one way I can get closer to that dream.

Analysis Questions

I sought to answer the following questions about the dataset:

  1. What kinds of animals are present in the dataset and in what quantities?
  2. What is the proportion of domestic animals to wild animals in the dataset?
    I suspect that there will be more observations on domestic animals than on wild animals and would like to test this hypothesis.
  3. What are the different symptoms and in what quantities do they occur?
  4. What symptom was most prevalent across the entire dataset?
  5. What was the rarest symptom across the entire dataset?
  6. What percentage of animals had their symptoms marked as dangerous?
  7. Which symptom was most prevalent in cats?
  8. Which symptoms appear more often in non-dangerous cases than in dangerous cases, and of those, which one has the highest ratio of non-dangerous to dangerous occurrences?
    This will help identify whether any symptoms are generally not dangerous.
  9. What is the percentage of animals that died?
  10. A) Train a machine learning model for classifying the condition of the animals as dangerous or not dangerous based on their symptoms.
    B) Visualize the performance of the trained machine learning model.
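Questions 1 and 3 to 8 mostly reduce to counting. What follows is a minimal standard-library sketch of one way to compute them, not the notebook's actual code. It assumes the Kaggle CSV has an `AnimalName` column, five symptom columns named `symptoms1` to `symptoms5`, and a `Dangerous` column holding `Yes`/`No`. Those column names and all helper functions here are illustrative assumptions and should be checked against the real file. Questions 2 and 9 are left out because they depend on information (a domestic/wild mapping, an outcome column) whose form isn't described here.

```python
import csv
from collections import Counter

# ASSUMED column layout; verify against the actual Kaggle CSV before use.
ANIMAL_COL = "AnimalName"
SYMPTOM_COLS = [f"symptoms{i}" for i in range(1, 6)]
LABEL_COL = "Dangerous"


def load_rows(path):
    """Read the CSV into a list of dicts, one per observation."""
    with open(path, newline="", encoding="utf-8") as f:
        return list(csv.DictReader(f))


def symptoms_of(row):
    """Return the non-empty symptoms recorded for one row, lowercased."""
    return [row[c].strip().lower() for c in SYMPTOM_COLS if row.get(c) and row[c].strip()]


def is_dangerous(row):
    """True when the (assumed) label column says 'Yes'."""
    return (row.get(LABEL_COL) or "").strip().lower() == "yes"


def animal_counts(rows):
    """Question 1: number of observations per animal."""
    return Counter(row[ANIMAL_COL].strip().lower() for row in rows)


def symptom_counts(rows):
    """Questions 3-5: occurrences of each symptom across all symptom columns."""
    return Counter(s for row in rows for s in symptoms_of(row))


def percent_dangerous(rows):
    """Question 6: percentage of rows labeled dangerous."""
    return 100.0 * sum(is_dangerous(r) for r in rows) / len(rows)


def mostly_non_dangerous(rows):
    """Question 8: symptoms seen more often in non-dangerous rows than in
    dangerous ones, mapped to their non-dangerous/dangerous ratio
    (infinity when the symptom never appears in a dangerous row)."""
    safe, danger = Counter(), Counter()
    for row in rows:
        (danger if is_dangerous(row) else safe).update(symptoms_of(row))
    return {
        s: (n / danger[s] if danger[s] else float("inf"))
        for s, n in safe.items()
        if n > danger[s]
    }
```

With `rows = load_rows(path)`, `symptom_counts(rows).most_common(1)` gives the most prevalent symptom (question 4) and `symptom_counts(rows).most_common()[-1]` gives a rarest one (question 5; if several symptoms share the lowest count, only one is returned). Filtering `rows` to those whose animal name is `cat` before calling `symptom_counts` answers question 7, and `max(ratios, key=ratios.get)` on the dictionary returned by `mostly_non_dangerous` picks the symptom asked for in question 8.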
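Question 10 does not say which model the notebook trains. As one possible approach, the sketch below implements a from-scratch Bernoulli naive Bayes classifier, which treats each symptom as a present/absent feature, plus a confusion-count helper for summarizing performance. The model choice, the function names, and the `(symptom_set, is_dangerous)` sample format are all assumptions made for illustration; a library implementation such as scikit-learn's would be the more common choice in practice.

```python
import math
from collections import Counter


def train_bernoulli_nb(samples, alpha=1.0):
    """Fit a Bernoulli naive Bayes model.

    samples: list of (symptom_set, label) pairs, e.g. label True = dangerous.
    alpha: Laplace smoothing, keeps every probability strictly between 0 and 1.
    """
    vocab = sorted({s for symptoms, _ in samples for s in symptoms})
    class_counts = Counter(label for _, label in samples)
    present = {c: Counter() for c in class_counts}
    for symptoms, label in samples:
        present[label].update(set(symptoms))
    log_priors = {c: math.log(n / len(samples)) for c, n in class_counts.items()}
    # P(symptom present | class), smoothed.
    probs = {
        c: {s: (present[c][s] + alpha) / (class_counts[c] + 2 * alpha) for s in vocab}
        for c in class_counts
    }
    return vocab, log_priors, probs


def predict(model, symptoms):
    """Return the class with the highest log-posterior.
    Symptoms never seen during training are ignored."""
    vocab, log_priors, probs = model
    symptoms = set(symptoms)
    best, best_score = None, -math.inf
    for c, prior in log_priors.items():
        score = prior
        for s in vocab:
            p = probs[c][s]
            # Bernoulli likelihood: presence and absence both carry evidence.
            score += math.log(p if s in symptoms else 1 - p)
        if score > best_score:
            best, best_score = c, score
    return best


def confusion_counts(model, samples):
    """Count (actual, predicted) pairs; the raw numbers behind a confusion matrix."""
    return Counter((label, predict(model, symptoms)) for symptoms, label in samples)
```

Each sample would be built from one row: the set of its non-empty symptom values paired with whether its `Dangerous` value is `Yes` (again an assumed column name). Counting on rows held out from training, rather than on the training rows themselves, gives a fairer picture of performance, and the resulting counts could then be plotted as a confusion-matrix heatmap for part B.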

Demo notebook

A Google Colab notebook for the project is available at this link.
