cleanup

jmbuhr · Oct 14, 2023 · 33ad9dc
1 parent a59f910

Showing 18 changed files with 670 additions and 213 deletions.
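Most hunks below make the same change. Chunk options move out of the chunk header (`{r, eval=FALSE}`) and into Quarto's hash-pipe comments (`#| eval: false`) on the first line of the chunk. Both forms should tell knitr to show the code without running it. Hash-pipe options are written in YAML, so the value is lowercase `false` rather than R's `FALSE`. Here is a side-by-side sketch of the two forms, using the `install.packages("tidyverse")` chunk from `01-intro.qmd`:

````markdown
```{r, eval=FALSE}
install.packages("tidyverse")
```

```{r}
#| eval: false
install.packages("tidyverse")
```
````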
diff --git a/01-intro.qmd b/01-intro.qmd
@@ -169,43 +169,49 @@ A good convention is to always use `snake_case`.
 
 First we have numbers (which internally are called `numeric` or `double`)
 
-```{r, eval=FALSE}
+```{r}
+#| eval: false
 12
 12.5
 ```
 
 Then, there are whole numbers (`integer`)
 
-```{r, eval=FALSE}
+```{r}
+#| eval: false
 1L # denoted by L
 ```
 
 as well as the rarely used complex numbers (`complex`)
 
-```{r, eval=FALSE}
+```{r}
+#| eval: false
 1 + 3i # denoted by the small i for the imaginary part
 ```
 
 Text data however will be used more often (`character`, `string`).
 Everything enclosed in quotation marks will be treated as text.
 Double or single quotation marks are both fine.
 
-```{r, eval=FALSE}
+```{r}
+#| eval: false
 "It was night again."
 'This is also text'
 ```
 
 Logical values can only contain yes or no, or rather `TRUE` and `FALSE` in programming terms (`boolean`, `logical`).
 
-```{r, eval=FALSE}
+```{r}
+#| eval: false
 TRUE
 FALSE
 ```
 
 There are some special types that mix with any other type,
 like `NULL` for no value and `NA` for "Not Available" (a missing value).
 
-```{r, eval=FALSE}
+```{r}
+#| eval: false
 NULL
 NA
 ```
@@ -269,7 +275,8 @@ This is actually one of the most important things to learn today, because the he
 We can pass arguments by name or by order of appearance.
 The following two expressions are equivalent.
 
-```{r, eval=FALSE}
+```{r}
+#| eval: false
 sin(x = 12)
 sin(12)
 ```
@@ -365,12 +372,12 @@ Sometimes it can be helpful to write our R's full name when searching (turns out
 
 ## Literate Programming with Quarto (previously Rmarkdown): Code is communication
 
-<aside><a href="https://rmarkdown.rstudio.com/index.html"> ![](images/rmarkdown.png){width="200"} </a></aside>
-
-<!-- TODO -->
+:::aside
+[![](./images/quarto.png){width="200"}](https://quarto.org)
+:::
 
 **Quarto** enables us to combine text with `code` and then produce a range of output formats like PDF, HTML, Word documents, presentations, etc.
-In fact, this whole website, including the slides, was created with Quarto.
+In fact, this whole website was created with Quarto.
 Sounds exciting?
 Let's dive into it!
 
@@ -401,7 +408,8 @@ The lecture video also demonstrates the different output formats (though for the
 
 Go ahead and install the tidyverse packages with
 
-```{r, eval=FALSE}
+```{r}
+#| eval: false
 install.packages("tidyverse")
 ```
 
@@ -415,7 +423,8 @@ It gets executed automatically before any other chunk in the document is run.
 This makes it a good place to load packages.
 The dataset we are working with today actually comes in its own package, so we need to install this as well (Yes, there is a lot of installing today, but you will have to do this only once):
 
-```{r, eval=FALSE}
+```{r}
+#| eval: false
 install.packages("palmerpenguins")
 ```
 
@@ -575,7 +584,8 @@ my_plot
 We can save our plot with the `ggsave` function.
 It also has more arguments to control the dimensions and resolution of the image.
 
-```{r, eval=FALSE}
+```{r}
+#| eval: false
 ggsave("my_plot.png", my_plot)
 ```
 
@@ -607,21 +617,40 @@ Here are today's tasks:
 
 In a fresh Quarto document (without the example template content), load the tidyverse and the palmerpenguins packages.
 
-- Write a section of text about your previous experience with data analysis and/or programming (optional, but I can use this information to customize the course).
-- Produce a scatterplot (meaning a plot with points) of the bill length vs. the bill depth, colorcoded by species.
-    - Imaginary bonus points if you manage to use the same colors as in the [penguin-image](#fig-penguins) (hint: look at the help page for `scale_color_manual()` to find out how. Note, that R can work with it's built-in color names, `rgb()` specifications or as hex-codes such as `#1573c7`). Even more bonus points if you also look into the `theme()` function and it's arguments, or the `theme_<...>()` functions to make the plot prettier.
--   Create a vector of all odd numbers from 1 to 99 and store it in a variable.
+- Write a section of text about your **previous experience** with data analysis and/or programming (optional, but I can use this information to customize the seminars to your needs).
+
+- Create a **vector of all odd numbers from 1 to 99** and store it in a variable.
     - Create a second variable that contains the squares of the first.
     - Have a look at the `tibble` function. Remember that you can always access the help page for a function using the `?` syntax, e.g. `?tibble::tibble` (The two colons `::` specify the package a function is coming from. You only need `tibble(...)` in the code because the `tibble` package is loaded automatically with the tidyverse. Here, I specify it directly to send you to the correct help page).
     - Create a `tibble` where the columns are the vectors `x` and `y`.
     - Create a scatterplot (points) of the two columns using `ggplot`.
     - What `geom_` function do you need to add to the plot to add a line that connects your points?
-- Check the metadata (YAML) of your quarto document and make sure it contains your name as the `author:` .
 
-<!-- TODO -->
+- Load the **penguins** dataset from the `palmerpenguins` package.
+  Produce a scatterplot of the bill length vs. the bill depth, color-coded by species.
+    - Imaginary bonus points if you manage to use the same colors as in the [penguin-image](#fig-penguins) (hint: look at the help page for `scale_color_manual()` to find out how.
+      Note that R can work with its built-in color names, `rgb()` specifications or hex codes such as `#1573c7`).
+      Even more bonus points if you also look into the `theme()` function and its arguments, or the `theme_<...>()` functions to make the plot prettier.
+
+- Check the metadata (YAML) of your Quarto document and make sure it contains your name as the `author:`.
 
 - Make the output document [self contained](https://quarto.org/docs/output-formats/html-basics.html#self-contained) by adding `embed-resources: true` to the YAML header.
     - [Here](https://quarto.org/docs/reference/formats/html.html) are a couple more YAML options you can try if you feel adventurous.
+
+The top of your Quarto document should now look like this between the `---`:
+
+```yaml
+title: "Lecture 1"
+author: <Your Name>
+format:
+  html:
+    embed-resources: true
+    # more html options if you want
+    # like e.g. theme: <...>
+execute:
+  warning: false
+```
+
 - Knit it and ship it! (i.e. press the Render button and send me the rendered HTML document via Discord)
 
 
@@ -634,6 +663,8 @@ In a fresh quarto document (without the example template content), load the tidy
 
 ### Exercise Tips
 
+<!-- TODO -->
+
 {{< youtube Ycl4CMJdneM >}}
 
 ## Learn more:

diff --git a/02-data-wrangling.qmd b/02-data-wrangling.qmd
@@ -80,7 +80,8 @@ We can get the correct link by clicking on the **Raw** button:
 
 Then we can use this to download the file:
 
-```{r, eval=FALSE}
+```{r}
+#| eval: false
 download.file("https://raw.githubusercontent.com/jmbuhr/dataintro/main/data/02/gapminder.csv", "example-download.csv")
 ```
 
@@ -131,7 +132,8 @@ However, in order to have the data nice and safe,
 we might want to save it somewhere, just in case
 (links can change, especially when it is someone else's link).
 
-```{r, eval=FALSE}
+```{r}
+#| eval: false
 our_data <- read_csv("https://raw.githubusercontent.com/jmbuhr/dataintro/main/data/02/gapminder.csv")
 
 write_csv(our_data, "our-data.csv")
@@ -159,7 +161,8 @@ so on windows you will see something like `C:\\User\Jannik\Documents\...`
 while on mac it starts with `/Users/jannik/Documents/...` and on linux with `/home/jannik/Documents/...`.
 For example, I could read the same gapminder dataset by:
 
-```{r, eval=FALSE}
+```{r}
+#| eval: false
 read_csv("/home/jannik/Documents/projects/teaching/dataintro/data/02/gapminder.csv")
 ```
 
@@ -174,7 +177,8 @@ In order for our work to be portable, robust and shareable,
 we need our file paths to be relative to the root
 of our project (which is set by the RStudio project).
 
-```{r, eval=FALSE}
+```{r}
+#| eval: false
 read_csv("./data/02/gapminder.csv")
 ```
 
@@ -216,7 +220,8 @@ It will still call it csv,
 but actually it is separated by semicolons!
 We have a special function for this:
 
-```{r, eval=FALSE}
+```{r}
+#| eval: false
 read_csv2("data/02/gapminder_csv2.csv")
 ```
 
@@ -240,7 +245,8 @@ We notice that the values are separated by "\t", a special sequence that stands
 I am not showing the output here because it is just
 the gapminder dataset once again.
 
-```{r, eval=FALSE}
+```{r}
+#| eval: false
 read_tsv("data/02/gapminder_tsv.txt")
 ```
 
@@ -249,7 +255,8 @@ we can use the general function `read_delim`.
 Say a co-worker misunderstood us and thought tsv stands for "Tilde separated values",
 we can still read his file.
 
-```{r, eval=FALSE}
+```{r}
+#| eval: false
 read_delim("data/02/obscure_file.tsv", "~")
 ```
 
@@ -337,7 +344,8 @@ Here are a couple of examples without the output,
 run them in your R session to confirm that they do what you think they do
 (but do have a look at the help pages yourselves, they are quite well written).
 
-```{r, eval=FALSE}
+```{r}
+#| eval: false
 select(gapminder, year:pop)
 select(gapminder, starts_with("co"))
 select(gapminder, where(is.numeric))
@@ -355,7 +363,8 @@ This is achieved with the function `filter`.
 Here, we select all rows where the year is greater than 2000
 and the country is New Zealand.
 
-```{r, eval=FALSE}
+```{r}
+#| eval: false
 filter(gapminder, year > 2000, country == "New Zealand")
 ```
 
@@ -369,7 +378,8 @@ Functions that deal with text (strings or character in R's language)
 in the tidyverse start with `str_`, so they are easy to find
 with autocompletion.
 
-```{r, eval=FALSE}
+```{r}
+#| eval: false
 filter(gapminder, year > 2000, str_to_lower(country) == "new zealand")
 ```
 
@@ -378,14 +388,16 @@ Instead of combining conditions with `,` (which works the same as
 Here, we get all rows where the country is New Zealand or
 the country is Afghanistan.
 
-```{r, eval=FALSE}
+```{r}
+#| eval: false
 filter(gapminder, country == "New Zealand" | country == "Afghanistan")
 ```
 
 This particular comparison can be written more succinctly
 by asking, for every row, whether the country is `%in%` this vector:
 
-```{r, eval=FALSE}
+```{r}
+#| eval: false
 filter(gapminder, country %in% c("New Zealand", "Afghanistan"))
 ```
 

diff --git a/03-tidy-data.qmd b/03-tidy-data.qmd
@@ -432,7 +432,8 @@ So, what do I mean by nested?
 Remember that `lists` can contain elements of any type, even other lists.
 If we have a list that contains more lists, we call it nested, e.g.:
 
-```{r, eval=FALSE}
+```{r}
+#| eval: false
 list(
   c(1, 2),
   list(

diff --git a/04-functional-programming.qmd b/04-functional-programming.qmd
@@ -69,7 +69,8 @@ You can find them in the `data/04/` folder.
 
 We already know how to read in *one* csv-file:
 
-```{r, eval=FALSE}
+```{r}
+#| eval: false
 read_csv("./data/04/Africa.csv")
 ```
 
@@ -145,7 +146,8 @@ paths <- fs::dir_ls("./data/04/")
 Then we map the `read_csv` function over our vector
 and bind the resulting list of dataframes into one dataframe:
 
-```{r, eval=FALSE}
+```{r}
+#| eval: false
 result <- map(paths, read_csv)
 bind_rows(result)
 ```
@@ -154,7 +156,8 @@ The operation of mapping over a vector and combining the resulting
 list into one dataframe is actually so common that
 there is a variant of `map` that does this step automatically:
 
-```{r, eval=FALSE}
+```{r}
+#| eval: false
 map_df(paths, read_csv, .id = "continent")
 ```
 
@@ -275,7 +278,8 @@ code in a file.
 And when we define functions in this file, those functions
 are then available to us:
 
-```{r, eval=FALSE}
+```{r}
+#| eval: false
 source("R/my_functions.R")
 read_experiment_data(4)
 ```
@@ -495,7 +499,8 @@ from the tidytuesday project
 ([link](https://github.com/rfordatascience/tidytuesday/tree/master/data/2019/2019-08-13)).
 You can import it with:
 
-```{r, eval=FALSE}
+```{r}
+#| eval: false
 emperors <- readr::read_csv("https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2019/2019-08-13/emperors.csv")
 ```
 
@@ -504,7 +509,8 @@ In order to fix this we will be using the [lubridate](https://lubridate.tidyvers
 which is installed with the tidyverse, but not automatically loaded.
 For your convenience here is a function that you can use to fix the dataset:
 
-```{r, eval=FALSE}
+```{r}
+#| eval: false
 library(tidyverse)
 library(lubridate)
 
@@ -544,7 +550,8 @@ Another dataset
 concerns dairy product consumption per person in the US across a number of years.
 Load it with:
 
-```{r, eval=FALSE}
+```{r}
+#| eval: false
 dairy <- readr::read_csv("https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2019/2019-01-29/milk_products_facts.csv")
 ```