-
Notifications
You must be signed in to change notification settings - Fork 0
/
Copy pathpenguins.rmd
143 lines (100 loc) · 9.03 KB
/
penguins.rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
---
title: "Penguins Sample R Document"
author: "Jared White"
date: "`r Sys.Date()`"
output: github_document
---
© Jared White, 2024. This work is licensed under a [Creative Commons Attribution 4.0 International License](https://creativecommons.org/licenses/by/4.0/).
## A Short Analysis of the Palmer's Penguins Dataset R Library
#### Load Libraries & Dataset
The Palmer Penguins data is one of the many standard datasets that is available within CRAN, and is commonly used to demonstrate the capabilities of analysis and visualization with the R TidyVerse package.
```{r message=FALSE, warning=FALSE}
library(tidyverse)
library(janitor)
library(palmerpenguins)
pallete1 <- c("royalblue1", "forestgreen", "salmon")
pallete2 <- c("#b69574", "#efe0bf")
scale_color_manual(values = pallete1)
# the data is already very clean, but does have 11 NAs, which are dropped.
df <- penguins %>% clean_names() %>% drop_na()
```
#### Creating Some Useful Variables and Exploratory Statistics.
###### First and Last Years of the Data.
```{r message=FALSE, warning=FALSE}
fyr <- min(penguins$year)
lyr <- max(penguins$year)
```
###### Correlation between body mass and flipper length per island and species.
```{r message=FALSE, warning=FALSE}
bm_fl_cor <- df %>% group_by(island, species) %>% summarise(correlation = cor(body_mass_g, flipper_length_mm), avg_bm = mean(body_mass_g), avg_fl = mean(flipper_length_mm))
head(bm_fl_cor)
# standard deviations of summary values
sd_bm_fl_cor <- sd(bm_fl_cor$correlation)
sd_avg_bm <- sd(bm_fl_cor$avg_bm)
sd_avg_fl <- sd(bm_fl_cor$avg_fl)
```
#### The Classic Body Mass to Flipper Length Visualization, with Some Extra Flare
This is one of the most widely created correlation visuals in R. It shows a positive correlation between the total body mass of penguins in grams and the length of their flippers in millimeters. This sppecific series of scatterplots also identifies the difference between the three species' size and native islands.
```{r message=FALSE, warning=FALSE}
# need to create dfs for custom annotations per species...
plot1_ann_gent <- data.frame(island = c("Biscoe"), label = c("Corr = 0.71"))
plot1_ann_chin <- data.frame(island = c("Dream"), label = c("Corr = 0.64"))
plot1_ann_adel <- data.frame(island = c("Biscoe", "Dream", "Torgersen"), label = c("Corr = 0.53", "Corr = 0.43", "Corr = 0.45"))
ggplot(data = df)+
# Gam Smoothing is used
geom_smooth(mapping = aes(x=flipper_length_mm, y=body_mass_g), color = 'black', method = "gam" )+
# Jitter and color+shape is used to help with overplotting
geom_jitter(aes(x=flipper_length_mm, y=body_mass_g,
color = species, shape = species)) +
# Separate the plots by native island.
facet_wrap(~ island) +
# Adding the custom annotation values and labels
geom_text(x = 190, y = 5750, aes(label = label), data = plot1_ann_gent, color = 'royalblue1', fontface = "bold") +
geom_text(x = 190, y = 5000, aes(label = label), data = plot1_ann_chin, color = 'forestgreen', fontface = "bold") +
geom_text(x = 215, y = 3250, aes(label = label), data = plot1_ann_adel, color = 'salmon', fontface = "bold") +
labs(title = "Flipper Length and Body Mass by Island and Species", x = "Flipper Length (mm)", y = "Body Mass (g)", caption = paste0("Palmer's Penguins, ", fyr, " to ", lyr))
```
As these plots illustrate, Adelie penguins inhabit all three islands, whereas Gentoos and Chinstraps occupy only one island each - Biscoe and Dream respectively. Notably, although Adelies exhibit the lowest correlation between body mass and flipper length overall, the correlation within this species is noticeably higher on the island that is shared with the species having the highest correlation (Biscoe, which is also inhabited by Gentoo penguins). This raises interesting questions about environmental adaptation and possible cross-breeding which could be the subject of future research. While cross-breeding between penguin species is relatively rare, there have been recently observed and documented cases of this occurring.
Sources: [Nature.com, Feb 13, 2024](https://www.nature.com/articles/s41598-024-54105-w), [BouldersBeachPenguins.com](https://bouldersbeachpenguins.com/can-penguins-crossbreed/)
#### Visualization of the Number of Observations by Sex and Species
```{r message=FALSE, warning=FALSE}
sp_sex_summary <- df %>% group_by(species, sex) %>% summarise(count = n())
ggplot(data = sp_sex_summary) + geom_col(mapping = aes(x = species, y = count, fill = sex, color = species)) +
scale_fill_grey(start = 0.7, end = 0.4) +
geom_text(aes(label = count, x = species, y = ifelse(sex =="male", count-(.2*count), count+(.8*count))), color = "black", fontface = "bold") +
labs(title = "Number of Observations by Species and Sex", caption = paste0("Palmer's Penguins, ", fyr, " to ", lyr), y = "Observations", fill = "Sex")
```
The representation of male and female Gentoos differs slightly, likely due to a slight sampling bias rather than reflecting the true composition of the entire population.
#### Using the Dataset to Predict the Fastest-Swimming Species
#### (and Why its not what this dataset might suggest)
According to [This Research Article](https://pubs.aip.org/aip/pof/article-abstract/35/6/061907/2897344/Hydrodynamic-performance-of-a-penguin-wing-Effect?redirectedFrom=fulltext) on the hydrodynamic performance of penguin wings, published June of 2023: "Results indicate that feathering is the primary contributor to thrust generation." Feathering is the angling of the wings for optimal hydrodynamic performance. Additionally, the publication states: "Spatial analysis of the thrust shows that the outer three-fifths of the wing are the primary source of thrust, contributing 85.4% of thrust generation at optimal feathering amplitude." Since the flipper width is not included in the dataset, it is not possible to calculate the total wing area. However, the cited publication does detail a full computer simulation of penguin hydrodynamics. The following ratio is a *very* rough approximation of this model given the data available:
$$
SP_{fl} : BodyMass = 10 \times \frac{fl \times .5124}{BodyMass}
$$
Since the outer 60% of the flipper generates 85.4% of the total thrust, 51.24% of the total flipper length is divided by the body mass. This figure is then multiplied by 10 to create a rounded Swim Power from flipper length : Body Mass ratio.
###### Adding the metric to a new df and summarizing.
```{r message=FALSE, warning=FALSE}
df_sr <- df %>% mutate(sp_ratio = round(10*((flipper_length_mm*.5124)/body_mass_g), digits = 2))
df_sr_summary <- df_sr %>% group_by(species) %>% summarise(avg_sp_ratio = round(mean(sp_ratio), digits = 4))
ggplot(data = df_sr_summary) + geom_col(aes(x = species, y = avg_sp_ratio, fill = species)) +
geom_text(aes(x = species, y = avg_sp_ratio-(.2*avg_sp_ratio), label = avg_sp_ratio, fontface = 'bold'), color = "black") +
labs(title = "Swim Power from Flipper Length to Body Mass per Species", caption = paste0("Palmer's Penguins, ", fyr, " to ", lyr), y = "Swim Power : Body Mass", x = "Species")
```
Despite having the lowest Swim Power from Flipper Length : Body Mass Ratio, Gentoo penguins are the fastest average swimmers out of the three species included in the dataset. This demonstrates the effectiveness of optimal feathering on hydrodynamics. Additionally, the formula for this ratio does not account for the size of the penguins' air sacs or many other relevant factors.
#### Citations
###### Core R
R Core Team (2023). _R: A Language and Environment for Statistical
Computing_. R Foundation for Statistical Computing, Vienna, Austria.
<https://www.R-project.org/>.
###### Tidyverse package
Wickham H, Averick M, Bryan J, Chang W, McGowan LD, François R, Grolemund G, Hayes A, Henry L, Hester J, Kuhn M, Pedersen TL, Miller E, Bache SM, Müller K, Ooms J, Robinson D, Seidel DP, Spinu V, Takahashi K, Vaughan D, Wilke C, Woo K, Yutani H (2019). “Welcome to the tidyverse.” _Journal of Open Source Software_, *4*(43), 1686. doi: 10.21105/joss.01686 (URL: <https://doi.org/10.21105/joss.01686>).
###### Janitor Package
Sam Firke (2023). janitor: Simple Tools for Examining and Cleaning Dirty Data. R package version 2.2.0. <https://CRAN.R-project.org/package=janitor>
###### Penguins Dataset
Horst AM, Hill AP, Gorman KB (2020). palmerpenguins: Palmer Archipelago (Antarctica) penguin data. R package version 0.1.0. https://allisonhorst.github.io/palmerpenguins/. doi: 10.5281/zenodo.3960218.
###### Penguin Species Cross-Breeding Articles
Borecki, P., Rosenberger, J., Mucha, A. et al. Breeding behavior analysis in a large captive colony of African penguins (Spheniscus demersus) and its implications for population management and conservation. _Sci Rep_ 14, 3589 (2024). <https://doi.org/10.1038/s41598-024-54105-w>
<https://bouldersbeachpenguins.com/can-penguins-crossbreed/>
© 2024 bouldersbeachpenguins.com - A project by Jan Pretorius
###### Gentoo Feathering Research
Zhanzhou Hao, Bo Yin, Prasert Prapamonthon, Guowei Yang; Hydrodynamic performance of a penguin wing: Effect of feathering and flapping. _Physics of Fluids_ 1 June 2023; 35 (6): 061907. <https://doi.org/10.1063/5.0147776>