cleanup
jmbuhr committed Oct 14, 2023
1 parent a59f910 commit 33ad9dc
Showing 18 changed files with 670 additions and 213 deletions.
71 changes: 51 additions & 20 deletions 01-intro.qmd
Expand Up @@ -169,43 +169,49 @@ A good convention is to always use `snake_case`.

First we have numbers (which internally are called `numeric` or `double`)

```{r, eval=FALSE}
```{r}
#| eval: false
12
12.5
```

Then, there are whole numbers (`integer`)

```{r, eval=FALSE}
```{r}
#| eval: false
1L # denoted by L
```

as well as the rarely used complex numbers (`complex`)

```{r, eval=FALSE}
```{r}
#| eval: false
1 + 3i # denoted by the small i for the imaginary part
```

Text data however will be used more often (`character`, `string`).
Everything enclosed in quotation marks will be treated as text.
Double or single quotation marks are both fine.

```{r, eval=FALSE}
```{r}
#| eval: false
"It was night again."
'This is also text'
```

Logical values can only contain yes or no, or rather `TRUE` and `FALSE` in programming terms (`boolean`, `logical`).

```{r, eval=FALSE}
```{r}
#| eval: false
TRUE
FALSE
```

There are some special types that mix with any other type.
Like `NULL` for no value and `NA` for "Not Available" (a missing value).

```{r, eval=FALSE}
```{r}
#| eval: false
NULL
NA
```
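
To see which type R assigns to each of the values above, base R's `typeof()` can be used. The following is a minimal sketch rather than part of the original chapter; the variable name `value_types` is chosen only for this illustration.

```r
# Ask R for the internal type of each example value from this section.
value_types <- sapply(list(12, 1L, 1 + 3i, "It was night again.", TRUE), typeof)
value_types
# "double" "integer" "complex" "character" "logical"

# NULL and NA differ: NULL is the absence of a value (length 0),
# while NA is a placeholder for one missing value (length 1).
length(NULL)         # 0
length(NA)           # 1
is.na(c(1, NA, 3))   # FALSE TRUE FALSE
```
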
Expand Down Expand Up @@ -269,7 +275,8 @@ This is actually one of the most important things to learn today, because the he
We can pass arguments by name or by order of appearance.
The following two expressions are equivalent.

```{r, eval=FALSE}
```{r}
#| eval: false
sin(x = 12)
sin(12)
```
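
The difference between the two styles becomes more visible with a function that takes several arguments. As a small sketch (using base R's `seq()` as a stand-in example that is not in the original text), naming the arguments lets us pass them in any order, while unnamed arguments are matched by position:

```r
# All three calls produce the odd numbers 1, 3, 5, 7, 9.
by_name      <- seq(from = 1, to = 9, by = 2)
by_order     <- seq(1, 9, 2)
mixed_order  <- seq(by = 2, to = 9, from = 1)  # names make the order irrelevant

identical(by_name, by_order)     # TRUE
identical(by_name, mixed_order)  # TRUE
```
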
Expand Down Expand Up @@ -365,12 +372,12 @@ Sometimes it can be helpful to write our R's full name when searching (turns out

## Literate Programming with Quarto (previously Rmarkdown): Code is communication

<aside><a href="https://rmarkdown.rstudio.com/index.html"> ![](images/rmarkdown.png){width="200"} </a></aside>

<!-- TODO -->
:::aside
[![](./images/quarto.png){width="200"}](https://quarto.org)
:::

**Quarto** enables us to combine text with `code` and then produce a range of output formats like pdf, html, word documents, presentations etc.
In fact, this whole website, including the slides, was created with Quarto.
In fact, this whole website was created with Quarto.
Sounds exciting?
Let's dive into it!

Expand Down Expand Up @@ -401,7 +408,8 @@ The lecture video also demonstrates the different output formats (though for the

Go ahead and install the tidyverse packages with

```{r, eval=FALSE}
```{r}
#| eval: false
install.packages("tidyverse")
```

Expand All @@ -415,7 +423,8 @@ It gets executed automatically before any other chunk in the document is run.
This makes it a good place to load packages.
The dataset we are working with today actually comes in its own package, so we need to install this as well (Yes, there is a lot of installing today, but you will have to do this only once):

```{r, eval=FALSE}
```{r}
#| eval: false
install.packages("palmerpenguins")
```

Expand Down Expand Up @@ -575,7 +584,8 @@ my_plot
We can save our plot with the `ggsave` function.
It also has more arguments to control the dimensions and resolution of the image.

```{r, eval=FALSE}
```{r}
#| eval: false
ggsave("my_plot.png", my_plot)
```
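
As a hedged sketch of those extra arguments: `ggsave()` accepts `width`, `height`, `units` and `dpi`. The plot and the data frame `example_data` below are made up for this illustration, and the file is written to a temporary directory so that nothing in the project is overwritten.

```r
library(ggplot2)

# A small stand-in plot; the data is invented for this example.
example_data <- data.frame(x = 1:10, y = (1:10)^2)
my_plot <- ggplot(example_data, aes(x = x, y = y)) +
  geom_point()

# Save a 12 cm by 8 cm image at 300 dots per inch.
output_file <- file.path(tempdir(), "my_plot.png")
ggsave(output_file, my_plot, width = 12, height = 8, units = "cm", dpi = 300)
```

Specifying the size explicitly makes the saved image independent of the current size of the plot pane.
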

Expand Down Expand Up @@ -607,21 +617,40 @@ Here are today's tasks:

In a fresh quarto document (without the example template content), load the tidyverse and the palmerpenguins packages.

- Write a section of text about your previous experience with data analysis and/or programming (optional, but I can use this information to customize the course).
- Produce a scatterplot (meaning a plot with points) of the bill length vs. the bill depth, color-coded by species.
- Imaginary bonus points if you manage to use the same colors as in the [penguin-image](#fig-penguins) (hint: look at the help page for `scale_color_manual()` to find out how. Note that R can work with its built-in color names, `rgb()` specifications or hex codes such as `#1573c7`). Even more bonus points if you also look into the `theme()` function and its arguments, or the `theme_<...>()` functions to make the plot prettier.
- Create a vector of all odd numbers from 1 to 99 and store it in a variable.
- Write a section of text about your **previous experience** with data analysis and/or programming (optional, but I can use this information to customize the seminars to your needs).

- Create a **vector of all odd numbers from 1 to 99** and store it in a variable.
- Create a second variable that contains the squares of the first.
- Have a look at the `tibble` function. Remember that you can always access the help page for a function using the `?` syntax, e.g. `?tibble::tibble` (The two colons `::` specify the package a function is coming from. You only need `tibble(...)` in the code because the `tibble` package is loaded automatically with the tidyverse. Here, I specify it directly to send you to the correct help page).
- Create a `tibble` where the columns are the vectors `x` and `y`.
- Create a scatterplot (points) of the two columns using `ggplot`.
- What `geom_` function do you need to add to the plot to add a line that connects your points?
- Check the metadata (YAML) of your quarto document and make sure it contains your name as the `author:` .

<!-- TODO -->
- Load the **penguins** dataset from the `palmerpenguins` package.
Produce a scatterplot of the bill length vs. the bill depth, color-coded by species.
- Imaginary bonus points if you manage to use the same colors as in the [penguin-image](#fig-penguins) (hint: look at the help page for `scale_color_manual()` to find out how.
Note that R can work with its built-in color names, `rgb()` specifications or hex codes such as `#1573c7`).
Even more bonus points if you also look into the `theme()` function and its arguments, or the `theme_<...>()` functions to make the plot prettier.

- Check the metadata (YAML) of your quarto document and make sure it contains your name as the `author:` .

- Make the output document [self contained](https://quarto.org/docs/output-formats/html-basics.html#self-contained) by adding `embed-resources: true` to the yaml header.
- [Here](https://quarto.org/docs/reference/formats/html.html) are a couple more YAML options you can try if you feel adventurous.

The top of your Quarto document should now look like this between the `---`:

```yaml
title: "Lecture 1"
author: <Your Name>
format:
  html:
    embed-resources: true
    # more html options if you want
    # like e.g. theme: <...>
execute:
  warning: false
```
- Knit it and ship it! (=press the render button and send me the rendered html document via discord)
Expand All @@ -634,6 +663,8 @@ In a fresh quarto document (without the example template content), load the tidy

### Exercise Tips

<!-- TODO -->

{{< youtube Ycl4CMJdneM >}}

## Learn more:
Expand Down
36 changes: 24 additions & 12 deletions 02-data-wrangling.qmd
Expand Up @@ -80,7 +80,8 @@ We can get the correct link by clicking on the **Raw** button:

Then we can use this to download the file:

```{r, eval=FALSE}
```{r}
#| eval: false
download.file("https://raw.githubusercontent.com/jmbuhr/dataintro/main/data/02/gapminder.csv", "example-download.csv")
```

Expand Down Expand Up @@ -131,7 +132,8 @@ However, in order to have the data nice and safe,
we might want to save it somewhere, just in case
(links can change, especially when it is someone else's link).

```{r, eval=FALSE}
```{r}
#| eval: false
our_data <- read_csv("https://raw.githubusercontent.com/jmbuhr/dataintro/main/data/02/gapminder.csv")
write_csv(our_data, "our-data.csv")
Expand Down Expand Up @@ -159,7 +161,8 @@ so on windows you will see something like `C:\\User\Jannik\Documents\...`
while on mac and linux it starts with `/home/jannik/Documents/...`.
For example, I could read the same gapminder dataset by:

```{r, eval=FALSE}
```{r}
#| eval: false
read_csv("/home/jannik/Documents/projects/teaching/dataintro/data/02/gapminder.csv")
```

Expand All @@ -174,7 +177,8 @@ In order for our work to be portable, robust and shareable,
we need our file paths to be relative to the root
of our project (which is set by the RStudio project).

```{r, eval=FALSE}
```{r}
#| eval: false
read_csv("./data/02/gapminder.csv")
```

Expand Down Expand Up @@ -216,7 +220,8 @@ It will still call it csv,
but actually it is separated by semicolons!
We have a special function for this:

```{r, eval=FALSE}
```{r}
#| eval: false
read_csv2("data/02/gapminder_csv2.csv")
```

Expand All @@ -240,7 +245,8 @@ We notice that the values are separated by "\t", a special sequence that stands
I am not showing the output here because it is just
the gapminder dataset once again.

```{r, eval=FALSE}
```{r}
#| eval: false
read_tsv("data/02/gapminder_tsv.txt")
```

Expand All @@ -249,7 +255,8 @@ we can use the general function `read_delim`.
Say a co-worker misunderstood us and thought tsv stands for "Tilde separated values",
we can still read his file.

```{r, eval=FALSE}
```{r}
#| eval: false
read_delim("data/02/obscure_file.tsv", "~")
```
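
To try `read_delim` without the course files, we can write a tiny tilde-separated file ourselves and read it back. This is only a sketch: the file lives in a temporary location, and the population numbers are invented for illustration and are not taken from gapminder.

```r
library(readr)

# Create a small file where the values are separated by "~".
tilde_file <- tempfile(fileext = ".tsv")
writeLines(
  c("country~year~pop",
    "New Zealand~2007~100",
    "Afghanistan~2007~200"),
  tilde_file
)

# Tell read_delim which delimiter to split on.
tilde_data <- read_delim(tilde_file, delim = "~", show_col_types = FALSE)
tilde_data
```

The same function covers the other cases as well: `delim = ","` behaves like `read_csv` and `delim = "\t"` like `read_tsv`.
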

Expand Down Expand Up @@ -337,7 +344,8 @@ Here are a couple of examples without the output,
run them in your R session to confirm that they do what you think they do
(but do have a look at the help pages yourselves, they are quite well written).

```{r, eval=FALSE}
```{r}
#| eval: false
select(gapminder, year:pop)
select(gapminder, starts_with("co"))
select(gapminder, where(is.numeric))
Expand All @@ -355,7 +363,8 @@ This is achieved with the function `filter`.
Here, we select all rows, where the year is greater than 2000
and the country is New Zealand.

```{r, eval=FALSE}
```{r}
#| eval: false
filter(gapminder, year > 2000, country == "New Zealand")
```

Expand All @@ -369,7 +378,8 @@ Functions that deal with text (strings or character in R's language)
in the tidyverse start with `str_`, so they are easy to find
with autocompletion.

```{r, eval=FALSE}
```{r}
#| eval: false
filter(gapminder, year > 2000, str_to_lower(country) == "new zealand")
```

Expand All @@ -378,14 +388,16 @@ Instead of combining conditions with `,` (which works the same as
Here, we get all rows where the country is New Zealand or
the country is Afghanistan.

```{r, eval=FALSE}
```{r}
#| eval: false
filter(gapminder, country == "New Zealand" | country == "Afghanistan")
```

This particular comparison can be written more succinctly,
by asking (for every row), is the particular country `%in%` this vector?

```{r, eval=FALSE}
```{r}
#| eval: false
filter(gapminder, country %in% c("New Zealand", "Afghanistan"))
```
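
We can convince ourselves that both versions select the same rows. The sketch below uses a small made-up tibble called `toy` in place of the gapminder data, so it runs without any files.

```r
library(dplyr)

# A tiny stand-in for gapminder, invented for this example.
toy <- tibble(
  country = c("New Zealand", "Afghanistan", "Germany"),
  year    = c(2007, 2007, 2007)
)

# Combine two comparisons with "or" ...
with_or <- filter(toy, country == "New Zealand" | country == "Afghanistan")
# ... or ask whether each country is contained in a vector.
with_in <- filter(toy, country %in% c("New Zealand", "Afghanistan"))

identical(with_or$country, with_in$country)  # TRUE
```

The `%in%` version scales better: adding a third country only means adding one more element to the vector.
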

Expand Down
3 changes: 2 additions & 1 deletion 03-tidy-data.qmd
Expand Up @@ -432,7 +432,8 @@ So, what do I mean by nested?
Remember that `lists` can contain elements of any type, even other lists.
If we have a list that contains more lists, we call it nested e.g.

```{r, eval=FALSE}
```{r}
#| eval: false
list(
c(1, 2),
list(
Expand Down
21 changes: 14 additions & 7 deletions 04-functional-programming.qmd
Expand Up @@ -69,7 +69,8 @@ You can find them in the `data/04/` folder.

We already know how to read in *one* csv-file:

```{r, eval=FALSE}
```{r}
#| eval: false
read_csv("./data/04/Africa.csv")
```

Expand Down Expand Up @@ -145,7 +146,8 @@ paths <- fs::dir_ls("./data/04/")
Then we map the `read_csv` function over our vector
and bind the resulting list of dataframes into one dataframe:

```{r, eval=FALSE}
```{r}
#| eval: false
result <- map(paths, read_csv)
bind_rows(result)
```
Expand All @@ -154,7 +156,8 @@ The operation of mapping over a vector and combining the resulting
list into one dataframe is actually so common that
there is a variant of `map` that does this step automatically:

```{r, eval=FALSE}
```{r}
#| eval: false
map_df(paths, read_csv, .id = "continent")
```
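
The `.id` argument takes the names of the vector we map over and stores them in a new column. Here is a self-contained sketch with two invented files in a temporary folder. Note that the names of `paths` are set by hand here, which is an assumption of this example. In recent versions of purrr, `map_df` is marked as superseded in favor of `map()` followed by `list_rbind()`, but it still works.

```r
library(purrr)
library(readr)

# Write two small example files; their contents are made up.
data_dir <- file.path(tempdir(), "map-example")
dir.create(data_dir, showWarnings = FALSE)
write_csv(data.frame(country = "Kenya", pop = 1), file.path(data_dir, "Africa.csv"))
write_csv(data.frame(country = "Japan", pop = 2), file.path(data_dir, "Asia.csv"))

# A named vector of paths: the names end up in the "continent" column.
paths <- c(
  Africa = file.path(data_dir, "Africa.csv"),
  Asia   = file.path(data_dir, "Asia.csv")
)

combined <- map_df(paths, read_csv, show_col_types = FALSE, .id = "continent")
combined
```
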

Expand Down Expand Up @@ -275,7 +278,8 @@ code in a file.
And when we define functions in this file, those functions
are then available to us:

```{r, eval=FALSE}
```{r}
#| eval: false
source("R/my_functions.R")
read_experiment_data(4)
```
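
To see the mechanism in isolation, the following sketch writes a small script to a temporary file and then sources it. The function `square_it` is hypothetical and only serves as an illustration. It is not the `read_experiment_data` function from the lecture.

```r
# Write a script file that defines one function.
script_path <- file.path(tempdir(), "my_functions.R")
writeLines(
  c("square_it <- function(x) {",
    "  x^2",
    "}"),
  script_path
)

# Running the file makes the function available in our session.
source(script_path)
square_it(4)  # 16
```
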
Expand Down Expand Up @@ -495,7 +499,8 @@ from the tidytuesday project
([link](https://github.com/rfordatascience/tidytuesday/tree/master/data/2019/2019-08-13)).
You can import it with:

```{r, eval=FALSE}
```{r}
#| eval: false
emperors <- readr::read_csv("https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2019/2019-08-13/emperors.csv")
```

Expand All @@ -504,7 +509,8 @@ In order to fix this we will be using the [lubridate](https://lubridate.tidyvers
which is installed with the tidyverse, but not automatically loaded.
For your convenience here is a function that you can use to fix the dataset:

```{r, eval=FALSE}
```{r}
#| eval: false
library(tidyverse)
library(lubridate)
Expand Down Expand Up @@ -544,7 +550,8 @@ Another dataset
concerns dairy product consumption per person in the US across a number of years.
Load it with:

```{r, eval=FALSE}
```{r}
#| eval: false
dairy <- readr::read_csv("https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2019/2019-01-29/milk_products_facts.csv")
```

Expand Down