-
Notifications
You must be signed in to change notification settings - Fork 0
/
Copy pathp8105_hw1_pw2551.Rmd
94 lines (84 loc) · 4.05 KB
/
p8105_hw1_pw2551.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
---
title: "p8105_hw1_pw2551"
author: "Paula Wu"
date: "9/23/2021"
output: github_document
---
```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)
```
```{r, message = FALSE}
# I set message = FALSE for concision
library(tidyverse)
```
### **Problem 1**:
Create a data frame with variables:
```{r}
set.seed(1)
p1_df = tibble(
vec_numeric = rnorm(10),
vec_logical = vec_numeric > 0,
vec_char = c("will", "you", "get", "a",
"headache", "if", "you", "think", "too", "hard"),
vec_factor = factor(c("high", "low", "medium", "high", "medium",
"medium", "low", "high", "high", "low"))
)
p1_df
```
Take the mean of each variable:
```{r, collapse = TRUE}
# pulled each columns from the data frame for future uses
numeric_col = pull(p1_df, vec_numeric)
logical_col = pull(p1_df, vec_logical)
char_col = pull(p1_df, vec_char)
factor_col = pull(p1_df, vec_factor)
# calculate means
mean(numeric_col)
mean(logical_col)
mean(char_col)
mean(factor_col)
```
* Taking the means for numeric and logical variables works. It's because numeric and logical variables have arithmetic values: numeric variables have their literal values, while logical variables have 1 for TRUE and 0 for FALSE. <br>
* Taking means for character and factor variables doesn't work, because neither of them have arithmetic values that can be added or divided. <br>
To explicitly convert types:
```{r, eval = FALSE}
as.numeric(logical_col)
as.numeric(char_col)
as.numeric(factor_col)
```
* The conversion works for logical and factor variables but **not** for character variables. <br>
* For logical variables, the method `as.numeric()` will convert TRUE to 1 and FALSE to 0. <br>
* For factor variables, according to *rdocumentation.com*, `as.numeric()` will return the underlying numeric representation. Since there are three "levels" for factor variables, each of the element is converted to its arbitrarily corresponding number. <br>
* For character variables, `as.numeric()` gives us a "NAs introduced by coercion" warning. Texts just can't be converted into numbers. <br>
* `as.numeric()` helps explain why logic variables work and character variables don't work when I tried to take the mean. However, it doesn't help explain why factor variables don't work: factor variables can be converted to numeric variables based on their levels, but applying the `mean()` function on factor variables yields an NA result. <br>
### **Problem 2**:
Download and read dataset:
```{r}
# packages installed in the terminal
data("penguins", package = "palmerpenguins")
penguins
```
* The variable names are: `r variable.names(penguins)`. Among these `r ncol(penguins)` attributes, "bill_length_mm", "bill_depth_mm", "flipper_length_mm", and "body_mass_g" look important and meaningful to me. The values, specifically the mean values, for these important variables will be calculated in r code chunks below. <br>
```{r, collapse = TRUE}
# I get rid of the Null values
bill_l = mean(pull(penguins, bill_length_mm), na.rm = TRUE)
bill_d = mean(pull(penguins, bill_depth_mm), na.rm = TRUE)
flip_l = mean(pull(penguins, flipper_length_mm), na.rm = TRUE)
body_m = mean(pull(penguins, body_mass_g), na.rm = TRUE)
c(bill_l, bill_d, flip_l, body_m)
```
* Thus, the means, rounded to 2 decimal places, for "bill_length_mm" is `r round(bill_l, digits = 2)`, for "bill_depth_mm" is `r round(bill_d, digits = 2)`, for "flipper_length_mm" is `r round(flip_l, digits = 2)`, and for "body_mass_g" is `r round(body_m, digits = 2)`. <br>
* For this penguins data set, there are `r nrow(penguins)` rows and `r ncol(penguins)` columns. <br>
* The mean of flipper length (mm) is mentioned above, which is `r round(flip_l, digits =2)`. <br>
To plot the scatterplot using `ggplot`:
```{r}
ggplot(penguins, aes(x = bill_length_mm, y = flipper_length_mm, col = species)) +
geom_point() +
ggtitle("Flipper and Bill Length for Penguins") +
theme(plot.title = element_text(hjust = 0.5)) +
labs(y = "Flipper Length (mm)", x = "Bill Length (mm)")
```
To save the graph using `ggsave`:
```{r}
ggsave("./scatterplot.pdf", height = 4, width = 6)
```