HtH-2023-animal-condition-analysis

This is an exploratory data analysis project that I completed as my capstone for the Hack the Hood Fall 2023 Build Data Science bootcamp.

I chose to analyze the Animal Conditions dataset by Grace Hephzibah M. and William Oliveira Gibin, available on Kaggle at this link.
This dataset stood out to me during my search because I thought it would be interesting to find the symptoms, or combinations of symptoms, that are associated with a condition being dangerous rather than non-dangerous, although that question belongs more to the domain of machine learning.
As a child in grade school, I dreamed of becoming a vet, and I think this is one way I can get close to fulfilling that dream.

Analysis Questions

I sought to answer the following questions about the dataset:

What kinds of animals are present in the dataset and in what quantities?
What is the proportion of domestic animals to wild animals in the dataset?
I suspect that there will be more observations on domestic animals than on wild animals and would like to test this hypothesis.
What are the different symptoms and in what quantities do they occur?
What symptom was most prevalent across the entire dataset?
What was the rarest symptom across the entire dataset?
What percentage of animals had their symptoms marked as dangerous?
Which symptom was most prevalent in cats?
Which symptoms appear more often in non-dangerous cases than in dangerous cases? Of those symptoms, which one has the highest ratio of non-dangerous to dangerous occurrences?
This will help identify whether any symptoms are generally not dangerous.
What is the percentage of animals that died?
A) Train a machine learning model for classifying the condition of the animals as dangerous or not dangerous based on their symptoms.
B) Visualize the performance of the trained machine learning model.
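
As a minimal sketch of how the symptom-ratio question and the dangerous-percentage question above could be approached, the following uses a handful of made-up records in place of the Kaggle data. The record layout and the function names `symptom_counts_by_label` and `non_dangerous_ratios` are hypothetical and are not taken from the notebook, which may compute these values differently (for example, with pandas).

```python
from collections import Counter

# Hypothetical records: (animal, list of symptoms, dangerous flag).
# These rows are invented for illustration and are not from the dataset.
records = [
    ("cat", ["fever", "vomiting"], True),
    ("cat", ["sneezing", "fever"], False),
    ("dog", ["sneezing", "coughing"], False),
    ("dog", ["fever", "sneezing"], True),
    ("cow", ["sneezing", "lethargy"], False),
]


def symptom_counts_by_label(rows):
    """Count how many dangerous and non-dangerous cases mention each symptom."""
    dangerous, non_dangerous = Counter(), Counter()
    for _animal, symptoms, is_dangerous in rows:
        target = dangerous if is_dangerous else non_dangerous
        # set() so a symptom repeated within one case is counted only once
        target.update(set(symptoms))
    return dangerous, non_dangerous


def non_dangerous_ratios(rows):
    """Return ratios for symptoms seen more often in non-dangerous cases."""
    dangerous, non_dangerous = symptom_counts_by_label(rows)
    ratios = {}
    for symptom, nd_count in non_dangerous.items():
        d_count = dangerous.get(symptom, 0)
        if nd_count > d_count:
            # A symptom never seen in a dangerous case has no finite ratio.
            ratios[symptom] = nd_count / d_count if d_count else float("inf")
    return ratios


ratios = non_dangerous_ratios(records)

# Share of cases flagged as dangerous, as a percentage.
percent_dangerous = 100 * sum(flag for _, _, flag in records) / len(records)
```

One design choice to note: symptoms that never occur in a dangerous case are given an infinite ratio here. With real data it may be more useful to report those separately, since a symptom observed only once or twice can top the ranking by chance.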

A Google Colab notebook for the project is available at this link.
