diff --git a/_freeze/index/execute-results/html.json b/_freeze/index/execute-results/html.json
new file mode 100644
index 0000000..14fa41e
--- /dev/null
+++ b/_freeze/index/execute-results/html.json
@@ -0,0 +1,15 @@
+{
+ "hash": "f1681b1f3333464d2082efe471d54dd9",
+ "result": {
+ "engine": "knitr",
+ "markdown": "---\ntitle: \"Overview\"\n---\n\n\n### Welcome!\n\nThis workshop provides an overview of many of the packages in the Tidyverse suite for the R programming language. The Tidyverse is a veritable universe of tools that no single workshop could hope to cover, so **we take an introductory approach here that focuses primarily on some fundamentals of tidying data in R**. We are always happy to improve workshop content, so please don't hesitate to [post an Issue](https://github.com/lter/workshop-tidyverse/issues) on our GitHub repository if you see clear areas for improvement!\n\n
\n\nTo maximize the value of this workshop to you, we recommend that you take the following steps **before the day of the workshop**. If anything is unclear, feel free to reach out to us; our contact information can be found in the \"Content Creators\" tab.\n\n## Programs to Install\n\n### R & RStudio\n\n**Install [R](https://www.r-project.org/) and its more convenient (in our opinion) user-interface: [RStudio](https://www.rstudio.com/products/rstudio/download/)**.\n\nIf you already have R, check that you have at least version 4.0.0 by running the following code:\n\n\n::: {.cell}\n\n```{.r .cell-code}\nversion$version.string\n```\n:::\n\n\nIf your version starts with a 3 (e.g., the above code returns \"R version 3...\"), please update R to make sure all packages behave as expected.\n\n### R Packages\n\n**Install the `tidyverse` and `palmerpenguins` R packages** using the following code:\n\n\n::: {.cell}\n\n```{.r .cell-code}\ninstall.packages(c(\"tidyverse\", \"palmerpenguins\"))\nlibrary(tidyverse)\nlibrary(palmerpenguins)\n```\n:::\n\n\n**Please run the above code even if you already have these packages** to update these packages and ensure that your code aligns with the examples and challenges introduced during the workshop.\n\n## Penguin Data\n\nThe data we'll be using for this workshop comes from the `palmerpenguins` package, maintained by [Allison Horst](mailto:ahorst@ucsb.edu). The \"penguins\" dataset from this package contains size measurements for adult foraging penguins near Palmer Station, Antarctica. Data were collected and made available by Dr. Kristen Gorman and the Palmer Station Long Term Ecological Research (LTER) Program. 
Let's take a look at it!\n\n\n::: {.cell}\n\n:::\n\n::: {.cell}\n\n```{.r .cell-code}\npenguins\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n# A tibble: 344 × 8\n species island bill_length_mm bill_depth_mm flipper_length_mm body_mass_g\n \n 1 Adelie Torgersen 39.1 18.7 181 3750\n 2 Adelie Torgersen 39.5 17.4 186 3800\n 3 Adelie Torgersen 40.3 18 195 3250\n 4 Adelie Torgersen NA NA NA NA\n 5 Adelie Torgersen 36.7 19.3 193 3450\n 6 Adelie Torgersen 39.3 20.6 190 3650\n 7 Adelie Torgersen 38.9 17.8 181 3625\n 8 Adelie Torgersen 39.2 19.6 195 4675\n 9 Adelie Torgersen 34.1 18.1 193 3475\n10 Adelie Torgersen 42 20.2 190 4250\n# ℹ 334 more rows\n# ℹ 2 more variables: sex , year \n```\n\n\n:::\n:::\n\n\nThe \"penguins\" dataset has 344 rows and 8 columns.\n\nThe columns are as follows:\n\n`species`: a factor denoting penguin species (Adélie, Chinstrap and Gentoo)\n\n`island`: a factor denoting island in Palmer Archipelago, Antarctica (Biscoe, Dream or Torgersen)\n\n`bill_length_mm`: a number denoting bill length (millimeters)\n\n`bill_depth_mm`: a number denoting bill depth (millimeters)\n\n`flipper_length_mm`: an integer denoting flipper length (millimeters)\n\n`body_mass_g`: an integer denoting body mass (grams)\n\n`sex`: a factor denoting penguin sex (female, male)\n\n`year`: an integer denoting the study year (2007, 2008, or 2009)\n\nThis dataset is an example of **tidy data**, which means that each **variable** is in its own **column** and each **observation** is in its own **row**. Generally speaking, functions from packages in the Tidyverse expect tidy data though they can be used in some cases to help get data into tidy format! Regardless, the penguins dataset is what we'll use for all examples in this workshop so be sure that you install the `palmerpenguins` R package. 
The examples on this page were adapted from [Allison Horst's `dplyr` tutorial](https://allisonhorst.shinyapps.io/dplyr-learnr/#section-welcome)!\n\n## Websites to Visit\n\n### Supplemental Material\n\nWhile not technically necessary to attend the workshop, if you'd like you can see the content that created the workshop website you are viewing by visiting our [GitHub repository here](https://github.com/lter/workshop-tidyverse).\n\nAlso, check out **NCEAS' [Learning Hub](https://www.nceas.ucsb.edu/learning-hub)** for a complete list of workshops and trainings offered by NCEAS.\n",
+ "supporting": [],
+ "filters": [
+ "rmarkdown/pagebreak.lua"
+ ],
+ "includes": {},
+ "engineDependencies": {},
+ "preserve": {},
+ "postProcess": true
+ }
+}
\ No newline at end of file
diff --git a/_freeze/join/execute-results/html.json b/_freeze/join/execute-results/html.json
new file mode 100644
index 0000000..c1e69fc
--- /dev/null
+++ b/_freeze/join/execute-results/html.json
@@ -0,0 +1,15 @@
+{
+ "hash": "398b73593fbad0d12c963ee5430fe436",
+ "result": {
+ "engine": "knitr",
+ "markdown": "---\ntitle: \"Join\"\n---\n\n\n## Module Learning Objectives\n\nBy the end of this module, you will be able to:\n\n- Differentiate `dplyr`'s various `join` functions from each other\n- Use `dplyr`'s `left_join`, `right_join`, `inner_join`, `full_join`, and `anti_join` functions to manipulate two dataframes\n\n\n::: {.cell}\n\n:::\n\n\n## Combining data\n\nNow that we know how to manipulate a single dataframe, how do we manipulate multiple dataframes? If we have multiple sources of data and we want to combine them together into one dataframe or table, we can **join** them through any shared column(s)! Data you'll be joining can be called \"relational data\", because there is some kind of relationship between the dataframes that you’ll be leveraging. In the `tidyverse`, combining data that has a relationship is called \"joining\". Let's look at some of `dplyr`'s many `join` functions!\n\nIn each of the following `join` functions, you provide two dataframes, the one you arbitrarily provide first is called the \"left\" dataframe while the other is called the \"right\" dataframe. This is important because each of the different `join` functions brings the columns from one of the dataframes into the other depending on (1) which dataframe is left and which is right and (2) what type of `join` you specify.\n\nThis becomes somewhat more intuitive when looking at tangible examples so let's prepare some data to `join` in different ways!\n\n### `join` Data Preparation\n\nFor demonstration purposes, let's add a new column called `record_number` to our penguins data and call the new dataframe `penguins_tidy`. As you can see below, each row is now numbered from 1 to the length of the dataframe. 
\n\n\n::: {.cell}\n\n```{.r .cell-code}\n# Add a column called `record_number` to our penguins dataset\npenguins_tidy <- penguins %>%\n dplyr::mutate(record_number = 1:n(), .before = dplyr::everything())\n\ndplyr::glimpse(penguins_tidy)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\nRows: 344\nColumns: 9\n$ record_number 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 1…\n$ species Adelie, Adelie, Adelie, Adelie, Adelie, Adelie, Adel…\n$ island Torgersen, Torgersen, Torgersen, Torgersen, Torgerse…\n$ bill_length_mm 39.1, 39.5, 40.3, NA, 36.7, 39.3, 38.9, 39.2, 34.1, …\n$ bill_depth_mm 18.7, 17.4, 18.0, NA, 19.3, 20.6, 17.8, 19.6, 18.1, …\n$ flipper_length_mm 181, 186, 195, NA, 193, 190, 181, 195, 193, 190, 186…\n$ body_mass_g 3750, 3800, 3250, NA, 3450, 3650, 3625, 4675, 3475, …\n$ sex male, female, female, NA, female, male, female, male…\n$ year 2007, 2007, 2007, 2007, 2007, 2007, 2007, 2007, 2007…\n```\n\n\n:::\n:::\n\n\nThe `palmerpenguins` package also has a \"penguins_raw\" dataset with additional, raw information on the same penguins, such as their sampling region, unique identifier, and the date when their nest was observed. Again, for demonstration purposes, let's add a new column called `record_number` and call this new dataframe `penguins_extra`. 
\n\n\n::: {.cell}\n\n```{.r .cell-code}\n# Add a column called `record_number` to our raw penguins dataset\npenguins_extra <- penguins_raw %>%\n dplyr::mutate(record_number = 1:n()) %>%\n # Also keep only desired columns to avoid unnecessary complexity\n dplyr::select(record_number, Region, `Individual ID`, `Date Egg`)\n\ndplyr::glimpse(penguins_extra)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\nRows: 344\nColumns: 4\n$ record_number 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16,…\n$ Region \"Anvers\", \"Anvers\", \"Anvers\", \"Anvers\", \"Anvers\", \"Anv…\n$ `Individual ID` \"N1A1\", \"N1A2\", \"N2A1\", \"N2A2\", \"N3A1\", \"N3A2\", \"N4A1\"…\n$ `Date Egg` 2007-11-11, 2007-11-11, 2007-11-16, 2007-11-16, 2007-…\n```\n\n\n:::\n:::\n\n\nNow that we have two dataframes that both have a column called `record_number`, we can `join` them together to combine information in various ways!\n\nAlso, note that if column names include spaces (as in `Individual ID` and `Date Egg`) they need to have a \"backtick\" (\\`) on either side. On your keyboard, a backtick (\\`) is on the left just below the \"escape\" key, and shares a button with the tilde (~).\n\n### `left_join` Example: Prioritize the \"Left\" Dataframe\n\n:::callout-note\n## Example\n\nIn a `left_join`, we bring the columns from the right dataframe that match rows found in the specified column(s) of the left dataframe.\n\n
\n\nWe can specify the column that we want to join based on with `by = ...`. If we don't provide this argument, then `dplyr` will automatically join on **all** matching columns between the left and right dataframes. In our case, we want to `left_join` by `record_number`.\n\nTo better demonstrate that only rows found in the left dataframe will be joined from the right dataframe, we'll use the pipe `%>%` to `filter` the left dataframe before `join`ing. \n\n\n::: {.cell}\n\n```{.r .cell-code}\n# Left-join the two dataframes together on the shared column!\npenguins_left_joined <- penguins_tidy %>%\n dplyr::filter(record_number < 5) %>%\n dplyr::left_join(y = penguins_extra, by = \"record_number\")\n\ndplyr::glimpse(penguins_left_joined)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\nRows: 4\nColumns: 12\n$ record_number 1, 2, 3, 4\n$ species Adelie, Adelie, Adelie, Adelie\n$ island Torgersen, Torgersen, Torgersen, Torgersen\n$ bill_length_mm 39.1, 39.5, 40.3, NA\n$ bill_depth_mm 18.7, 17.4, 18.0, NA\n$ flipper_length_mm 181, 186, 195, NA\n$ body_mass_g 3750, 3800, 3250, NA\n$ sex male, female, female, NA\n$ year 2007, 2007, 2007, 2007\n$ Region \"Anvers\", \"Anvers\", \"Anvers\", \"Anvers\"\n$ `Individual ID` \"N1A1\", \"N1A2\", \"N2A1\", \"N2A2\"\n$ `Date Egg` 2007-11-11, 2007-11-11, 2007-11-16, 2007-11-16\n```\n\n\n:::\n:::\n\n\nWhat we have in the end is `penguins_left_joined`, a dataframe with information from both `penguins_tidy` and `penguins_extra`! *All* of the rows in our `filter`ed `penguins_tidy` are kept but only the rows from `penguins_extra` that have a matching `record_number` in `penguins_tidy` are included.\n:::\n\n### `right_join` Example: Prioritize the \"Right\" Dataframe\n\n:::callout-note\n## Example\n\nIn a `right_join`, we bring rows from the left dataframe into the right dataframe based on the values in the specified column(s) of the right dataframe.\n\n
\n\nAs the names imply, a `right_join` is the opposite of a `left_join`.\n:::\n\n### `inner_join` Example: Keep Rows Found in *Both* Dataframes\n\n:::callout-note\n## Example\n\nIn an `inner_join`, we keep only the rows where the values in the column we are joining `by` are found in both dataframes. \n\n
\n\nThis can be really useful when one of the dataframes includes supplementary data that has incomplete coverage on the other dataframe and you want to simultaneously combine the dataframes and remove the inevitable `NA`s that will be created.\n\nFor example, imagine that you have a dataframe of 100 study sites with information on plant growth and a second dataframe of soil chemistry information. Your grant budget was really tight though so you needed to prioritize sample processing and you only have soil chemistry for 20 of the sites where you have plant growth data.\n\nIf you use `inner_join` on your plant growth and soil chemistry datasets, you will create a single dataframe with both chemistry and plant data that only has the sites (i.e., rows) where you had data for both. This dataframe then would likely be ready for analysis because you'd have complete data for every site in the new `join`ed dataframe!\n\nNote that in an `inner_join` it doesn't matter which dataframe is \"left\" and which is \"right\" because either way you're only keeping the rows that are found in both dataframes.\n:::\n\n### `full_join` Example: Combine *All* Data in Both Dataframes\n\n:::callout-note\n## Example\n\nIn a `full_join`, we keep all values and all rows. \n\n
\n\nA `full_join` is \"smart\" enough to fill in `NA`s for all rows that don't match between the two dataframes. Also, just like an `inner_join`, a `full_join` doesn't care about which dataframe is \"left\" and which is \"right\" because all rows and columns are combined regardless of which is left vs. right.\n:::\n\n### `anti_join` Example: Keep Only Rows that *Aren't* Shared\n\n:::callout-note\n## Example\n\nIn an `anti_join`, we return rows of the left dataframe that do not have a match in the right dataframe. This can be used to see what will **not** be included in a join. \n\n
\n\nOne case where an `anti_join` is particularly useful is that of \"text mining\" where you have one dataframe with a column of individual words that you've split apart from a larger block of free text. If you also have a dataframe of one column that contains words that you want to remove from your \"actual\" data (e.g., \"and\", \"not\", \"I\", \"me\", etc.), you can `anti_join` the two dataframes to quickly remove all of those unwanted words from your text mining dataframe.\n:::\n\n### Additional Notes\n\n- If we want to join by more than one matching column, we can specify multiple columns with a vector like so: `by = c(\"column1\", \"column2\")`.\n\n- We can also use a named vector, `by = c(\"column_a\" = \"COLUMN_A\")` to match on columns that have different names in each dataframe. \n",
+ "supporting": [],
+ "filters": [
+ "rmarkdown/pagebreak.lua"
+ ],
+ "includes": {},
+ "engineDependencies": {},
+ "preserve": {},
+ "postProcess": true
+ }
+}
\ No newline at end of file
diff --git a/_freeze/reshape/execute-results/html.json b/_freeze/reshape/execute-results/html.json
new file mode 100644
index 0000000..ea3cd51
--- /dev/null
+++ b/_freeze/reshape/execute-results/html.json
@@ -0,0 +1,15 @@
+{
+ "hash": "435b3ea98ba2256d1a7113cd0a41c830",
+ "result": {
+ "engine": "knitr",
+ "markdown": "---\ntitle: \"Reshape\"\n---\n\n\n## Module Learning Objectives\n\nBy the end of this module, you will be able to:\n\n- Contrast \"long\" data with \"wide\" data \n- Use `tidyr`'s `pivot_wider` and `pivot_longer` functions to reshape data\n\n\n::: {.cell}\n\n:::\n\n\n## Defining \"Shape\"\n\nBefore talking about *how* to reshape your data between wide and long format, let's talk about *what* \"shape\" means in reference to data. Fundamentally, \"long\" data are data with more rows than columns while \"wide\" data tend to have more columns than rows.\n\nFor example, in community ecology a \"wide\" dataframe could have each row being a site that researchers visited while each column could be a different species where the value in the row is the number of individuals of that species at that site. On the other hand, the `penguins` dataframe we've been working with so far is in \"long\" format because it has one row per penguin and multiple penguins are stacked up.\n\nBoth wide and long format data can be useful in certain contexts and it is sometimes most intuitive to reshape data from one form to the other (and sometimes back again to the original form!).\n\n## Reshaping Data\n\nThe `tidyr` package contains the intuitively-named `pivot_wider` and `pivot_longer` for doing exactly this reshaping.\n\nTo help demonstrate these two functions, let's begin by summarizing our dataframe to make changing the shape of the dataframe more visible than it would be with the full dataframe. 
For example, let's calculate the average bill length of each penguin species on each island.\n\n\n::: {.cell}\n\n```{.r .cell-code}\n# Begin by naming our new data and the data they come from\npenguins_simp <- penguins %>%\n # Now group by species and island\n dplyr::group_by(species, island) %>%\n # Calculate average bill length\n dplyr::summarize(avg_bill_length_mm = mean(bill_length_mm, na.rm = TRUE)) %>%\n # And don't forget to ungroup!\n dplyr::ungroup()\n\n# And this is what we're left with:\npenguins_simp\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n# A tibble: 5 × 3\n species island avg_bill_length_mm\n \n1 Adelie Biscoe 39.0\n2 Adelie Dream 38.5\n3 Adelie Torgersen 39.0\n4 Chinstrap Dream 48.8\n5 Gentoo Biscoe 47.5\n```\n\n\n:::\n:::\n\n\nGreat! We can use this smaller data object to demonstrate reshaping more clearly. Let's begin with an example for `pivot_wider`.\n\n### `pivot_wider` Example: Reshaping to Wide Format\n\n:::callout-note\n## Example\n\n`pivot_wider` takes long format data and reshapes it into wide format.\n\n
\n\nLet's say that we want to take that data object and reshape it into wide format so that each island is a column and each species of penguin is a row. The contents of each cell then are going to be the average bill length values that we just calculated.\n\n\n::: {.cell}\n\n```{.r .cell-code}\n# Begin by naming the objects\npenguins_wide <- penguins_simp %>%\n # And now we can pivot wider with `pivot_wider`!\n tidyr::pivot_wider(names_from = island,\n values_from = avg_bill_length_mm )\n\n# Take a look!\npenguins_wide\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n# A tibble: 3 × 4\n species Biscoe Dream Torgersen\n \n1 Adelie 39.0 38.5 39.0\n2 Chinstrap NA 48.8 NA \n3 Gentoo 47.5 NA NA \n```\n\n\n:::\n:::\n\n\nGreat! We now have each island as a column, each row is a penguin species, and the average bill length we calculated is included in each cell. Note that in this specific case this makes the numbers somewhat ambiguous so we might want to use `dplyr`'s `select` or the more specific `rename` to change the island column names to make it clearer that those values are bill lengths in millimeters.\n:::\n\n### `pivot_longer` Example: Reshaping to Long Format\n\n:::callout-note\n## Example\n\nNow that we have a small wide format data object, we can feed it to `pivot_longer` and reshape our data into long format! `pivot_longer` has very similar syntax *except* that with `pivot_longer` you need to tell the function which columns should be reshaped.\n\n`pivot_wider` on the other hand knows which columns to move around because you manually specify them in the \"names_from\" and \"values_from\" arguments.\n\n
\n\n\n::: {.cell}\n\n```{.r .cell-code}\n# Begin with our wide data\npenguins_wide %>%\n # And reshape back into long format\n pivot_longer(cols = -species,\n names_to = \"island_name\",\n values_to = \"mean_bill_length_mm\" )\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n# A tibble: 9 × 3\n species island_name mean_bill_length_mm\n \n1 Adelie Biscoe 39.0\n2 Adelie Dream 38.5\n3 Adelie Torgersen 39.0\n4 Chinstrap Biscoe NA \n5 Chinstrap Dream 48.8\n6 Chinstrap Torgersen NA \n7 Gentoo Biscoe 47.5\n8 Gentoo Dream NA \n9 Gentoo Torgersen NA \n```\n\n\n:::\n:::\n\n\nTwo quick things to note here:\n\n- First, `pivot_longer` included the cells that were NA in the wide version of the data.\n - This default behavior is really nice so that you don't lose any cells implicitly (though you can always `filter` them out if you don't want them!).\n- Second, you'll note that in the \"cols\" argument I only told `pivot_longer` to *not* include the \"species\" column using the same notation you could use for the `select` function in the `dplyr` package.\n - This is very handy because it lets us write really concise values in the \"cols\" argument and the default becomes \"everything *except* what was specified\".\n - Note that we could have also said `cols = Biscoe, Dream, Torgersen` and achieved the same reshaping of the data.\n:::\n\n### Challenge: Reshaping\n\n:::callout-important\n## Your Turn!\n\nThe code below creates a data object that includes the flipper length of all Adelie penguins; what code would you add to reshape the data so that each sex is a column with flipper lengths in the cells?\n\n\n::: {.cell}\n\n```{.r .cell-code}\npenguins %>%\n # Keep only Adelie penguins of known sex\n dplyr::filter(species == \"Adelie\" & !is.na(sex)) %>%\n # Calculate the average flipper length by island and sex\n dplyr::group_by(island, sex) %>%\n dplyr::summarize(avg_flipper_length_mm = mean(flipper_length_mm, na.rm = TRUE)) %>%\n # Ungroup (good practice to include this 
step!)\n dplyr::ungroup()\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n# A tibble: 6 × 3\n island sex avg_flipper_length_mm\n \n1 Biscoe female 187.\n2 Biscoe male 190.\n3 Dream female 188.\n4 Dream male 192.\n5 Torgersen female 188.\n6 Torgersen male 195.\n```\n\n\n:::\n:::\n\n:::\n",
+ "supporting": [],
+ "filters": [
+ "rmarkdown/pagebreak.lua"
+ ],
+ "includes": {},
+ "engineDependencies": {},
+ "preserve": {},
+ "postProcess": true
+ }
+}
\ No newline at end of file
diff --git a/_freeze/site_libs/clipboard/clipboard.min.js b/_freeze/site_libs/clipboard/clipboard.min.js
new file mode 100644
index 0000000..1103f81
--- /dev/null
+++ b/_freeze/site_libs/clipboard/clipboard.min.js
@@ -0,0 +1,7 @@
+/*!
+ * clipboard.js v2.0.11
+ * https://clipboardjs.com/
+ *
+ * Licensed MIT © Zeno Rocha
+ */
+!function(t,e){"object"==typeof exports&&"object"==typeof module?module.exports=e():"function"==typeof define&&define.amd?define([],e):"object"==typeof exports?exports.ClipboardJS=e():t.ClipboardJS=e()}(this,function(){return n={686:function(t,e,n){"use strict";n.d(e,{default:function(){return b}});var e=n(279),i=n.n(e),e=n(370),u=n.n(e),e=n(817),r=n.n(e);function c(t){try{return document.execCommand(t)}catch(t){return}}var a=function(t){t=r()(t);return c("cut"),t};function o(t,e){var n,o,t=(n=t,o="rtl"===document.documentElement.getAttribute("dir"),(t=document.createElement("textarea")).style.fontSize="12pt",t.style.border="0",t.style.padding="0",t.style.margin="0",t.style.position="absolute",t.style[o?"right":"left"]="-9999px",o=window.pageYOffset||document.documentElement.scrollTop,t.style.top="".concat(o,"px"),t.setAttribute("readonly",""),t.value=n,t);return e.container.appendChild(t),e=r()(t),c("copy"),t.remove(),e}var f=function(t){var e=1Describe the purpose of the pipe operator (`%>%`)\n- Use the pipe operator (`%>%`) to chain multiple functions together\n- Summarize data by using `dplyr`'s `group_by` and `summarize` functions \n\n\n::: {.cell}\n\n:::\n\n\n## Pipe Operator (`%>%`)\n\nBefore diving into the `tidyverse` functions that allow for summarization and group-wise operations, let's talk about the pipe operator (`%>%`). The pipe is from the `magrittr` package and allows chaining together multiple functions without needing to create separate objects at each step as you would have to without the pipe.\n\n### `%>%` Example: Using the Pipe\n\n:::callout-note\n## Example\n\nAs in the other chapters, let's use the \"penguins\" data object found in the `palmerpenguins` package. 
Let's say we want to keep only specimens that have a measurement for both bill length and bill depth and then remove the flipper length and body mass columns.\n\nWithout the pipe--but still using other `tidyverse` functions--we could do it like this:\n\n::: {.cell}\n\n```{.r .cell-code}\n# Filter out the NAs\npenguins_v2 <- dplyr::filter(.data = penguins,\n !is.na(bill_length_mm) & !is.na(bill_depth_mm))\n\n# Now strip away the columns we don't want\npenguins_v3 <- dplyr::select(.data = penguins_v2, \n -flipper_length_mm, -body_mass_g)\n\n# And we can look at our final product with `dplyr::glimpse`\ndplyr::glimpse(penguins_v3)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\nRows: 342\nColumns: 6\n$ species Adelie, Adelie, Adelie, Adelie, Adelie, Adelie, Adelie,…\n$ island Torgersen, Torgersen, Torgersen, Torgersen, Torgersen, …\n$ bill_length_mm 39.1, 39.5, 40.3, 36.7, 39.3, 38.9, 39.2, 34.1, 42.0, 3…\n$ bill_depth_mm 18.7, 17.4, 18.0, 19.3, 20.6, 17.8, 19.6, 18.1, 20.2, 1…\n$ sex male, female, female, female, male, female, male, NA, N…\n$ year 2007, 2007, 2007, 2007, 2007, 2007, 2007, 2007, 2007, 2…\n```\n\n\n:::\n:::\n\n\nUsing the pipe though we can simplify this code dramatically! 
Note that every line of the following chain except the last must end with `%>%` so that R knows there are more lines to consider.\n\n::: {.cell}\n\n```{.r .cell-code}\n# We begin with the name of the data object\npenguins %>%\n # Then we can filter the data\n dplyr::filter(!is.na(bill_length_mm) & !is.na(bill_depth_mm)) %>%\n # And strip away the columns we don't want\n dplyr::select(-flipper_length_mm, -body_mass_g) %>%\n # And we can even include the `glimpse` function to see our progress\n dplyr::glimpse()\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\nRows: 342\nColumns: 6\n$ species Adelie, Adelie, Adelie, Adelie, Adelie, Adelie, Adelie,…\n$ island Torgersen, Torgersen, Torgersen, Torgersen, Torgersen, …\n$ bill_length_mm 39.1, 39.5, 40.3, 36.7, 39.3, 38.9, 39.2, 34.1, 42.0, 3…\n$ bill_depth_mm 18.7, 17.4, 18.0, 19.3, 20.6, 17.8, 19.6, 18.1, 20.2, 1…\n$ sex male, female, female, female, male, female, male, NA, N…\n$ year 2007, 2007, 2007, 2007, 2007, 2007, 2007, 2007, 2007, 2…\n```\n\n\n:::\n:::\n\n\nNote that using the pipe allows each line to inherit the data created by the previous line.\n:::\n\n### Challenge: `%>%`\n\n:::callout-important\n## Your Turn!\n\nUsing pipes, `filter` the data to only include male penguins, `select` only the columns for species, island, and body mass, and `filter` out any rows with NA for body mass.\n:::\n\n### Aside: Fun History of Why `%>%` is a \"Pipe\"\n\n
\n\nThe Belgian painter René Magritte famously created a painting titled \"[The Treachery of Images](https://collections.lacma.org/node/239578)\" featuring a depiction of a smoking pipe above the words \"*Ceci n'est pas une pipe*\" (French for \"This is not a pipe\"). Magritte's point was that the depiction of a thing is not the thing itself. The `magrittr` package takes its name from the painter because it also includes a pipe that functions slightly differently from a command line pipe and uses different characters. Just like Magritte's pipe, `%>%` both is and isn't a pipe!\n\n## Group-Wise Summarizing\n\nNow that we've covered the `%>%` operator we can use it to do group-wise summarization! Technically this summarization does not *require* the pipe but it does inherently have two steps and thus benefits from using the pipe to chain together those technically separate instructions.\n\nTo summarize by groups we first define our groups using `dplyr`'s `group_by` function and then summarize using `summarize` (also from `dplyr`). `summarize` does require you to specify what calculations you want to perform within your groups though it uses similar syntax to `dplyr`'s `mutate` function.\n\n
\n\nDespite the similarity in syntax between `summarize` and `mutate` there are a few crucial differences:\n\n- `summarize` returns only a single row per group while `mutate` returns as many rows as are in the original dataframe\n- `summarize` will automatically remove any columns that aren't either (1) included in `group_by` or (2) created by `summarize`. `mutate`, by default, keeps every existing column and only adds or changes the columns you tell it to.\n\n### `group_by` + `summarize` Example: Summarize within Groups\n\n:::callout-note\n## Example\n\nBy using the `%>%` with `group_by` and `summarize`, we can calculate some summarized metric within our specified groups. To begin, let's find the average bill depth within each species of penguin.\n\n\n::: {.cell}\n\n```{.r .cell-code}\n# Begin with the data and a pipe\npenguins %>%\n # Group by the desired column names\n dplyr::group_by(species) %>%\n # And summarize in the way we desire\n dplyr::summarize(mean_bill_dep_mm = mean(bill_depth_mm, na.rm = TRUE) )\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n# A tibble: 3 × 2\n species mean_bill_dep_mm\n \n1 Adelie 18.3\n2 Chinstrap 18.4\n3 Gentoo 15.0\n```\n\n\n:::\n:::\n\n\nNotice how the resulting dataframe only contains one row per value in the `group_by` call and only includes the grouping column and the column we created (`mean_bill_dep_mm`)? This reduction in dimensions is an inherent property of `summarize` and can be intensely valuable but be careful you don't accidentally remove columns that you want!\n:::\n\n### `group_by` + `summarize` Example: Calculate Multiple Metrics\n\n:::callout-note\n## Example\n\nLet's say we want to find *multiple* summary values for body mass of each species of penguin on each island. 
To accomplish this we can do the following:\n\n\n::: {.cell}\n\n```{.r .cell-code}\n# Begin with the data and a pipe\npenguins %>%\n # Group by the desired column names\n dplyr::group_by(species, island) %>%\n # And summarize in the way we desire\n dplyr::summarize(\n # Get average body mass\n mean_mass_g = mean(body_mass_g, na.rm = TRUE),\n # Get the standard deviation\n sd_mass = sd(body_mass_g, na.rm = TRUE),\n # Count the number of individual penguins of each species at each island\n n_mass = dplyr::n(),\n # Calculate standard error from SD divided by count\n se_mass = sd_mass / sqrt(n_mass) )\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n# A tibble: 5 × 6\n# Groups: species [3]\n species island mean_mass_g sd_mass n_mass se_mass\n \n1 Adelie Biscoe 3710. 488. 44 73.5\n2 Adelie Dream 3688. 455. 56 60.8\n3 Adelie Torgersen 3706. 445. 52 61.7\n4 Chinstrap Dream 3733. 384. 68 46.6\n5 Gentoo Biscoe 5076. 504. 124 45.3\n```\n\n\n:::\n:::\n\n\nYou can see that we also invoked the `n` function from `dplyr` to return the size of each group. This function reads any groups created by `group_by` and returns the count of rows in the dataframe for each group level.\n\nJust like `mutate`, `summarize` will allow you to create as many columns as you want. So, if you want metrics calculated within your groups, you only need to define each of them within the `summarize` function.\n:::\n\n### Challenge: `summarize`\n\n:::callout-important\n## Your Turn!\n\nUsing what we've covered so far, find the average flipper length in each year (regardless of any other grouping variable).\n:::\n\n## Grouping Cautionary Note\n\n`group_by` can be extremely useful in summarizing a dataframe or creating a new column without losing rows but you need to be careful. 
Objects created with `group_by` \"remember\" their groups until you change the groups or use the function `ungroup` from `dplyr`.\n\nLook at how the output of a grouped data object tells you the number of groups in the output (see beneath this code chunk).\n\n\n::: {.cell}\n\n```{.r .cell-code}\npenguins %>%\n dplyr::group_by(species, island) %>%\n dplyr::summarize(penguins_count = dplyr::n())\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n# A tibble: 5 × 3\n# Groups: species [3]\n species island penguins_count\n \n1 Adelie Biscoe 44\n2 Adelie Dream 56\n3 Adelie Torgersen 52\n4 Chinstrap Dream 68\n5 Gentoo Biscoe 124\n```\n\n\n:::\n:::\n\n\nThis means that all future uses of that pipe will continue to use the grouping established to create the \"penguins_count\" column. We can stop this by doing the same pipe, but adding `ungroup` after we're done using the grouping established by `group_by`.\n\n\n::: {.cell}\n\n```{.r .cell-code}\npenguins %>%\n dplyr::group_by(species, island) %>%\n dplyr::summarize(penguins_count = dplyr::n()) %>%\n dplyr::ungroup()\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n# A tibble: 5 × 3\n species island penguins_count\n \n1 Adelie Biscoe 44\n2 Adelie Dream 56\n3 Adelie Torgersen 52\n4 Chinstrap Dream 68\n5 Gentoo Biscoe 124\n```\n\n\n:::\n:::\n\n\nSee? We calculated with our desired groups but then dropped the grouping structure once we were finished with them. Note also that if you use `group_by` and do some calculation then re-group by something else by using `group_by` *again*, the second use of `group_by` **will not be affected** by the first. This means that you only need one `ungroup` per pipe.\n",
+ "supporting": [],
+ "filters": [
+ "rmarkdown/pagebreak.lua"
+ ],
+ "includes": {},
+ "engineDependencies": {},
+ "preserve": {},
+ "postProcess": true
+ }
+}
\ No newline at end of file
diff --git a/_freeze/visualize/execute-results/html.json b/_freeze/visualize/execute-results/html.json
new file mode 100644
index 0000000..3d7d756
--- /dev/null
+++ b/_freeze/visualize/execute-results/html.json
@@ -0,0 +1,17 @@
+{
+ "hash": "7451640f3e8e23cf8b3995b27c02c6b6",
+ "result": {
+ "engine": "knitr",
+ "markdown": "---\ntitle: \"Visualize\"\n---\n\n\n## Module Learning Objectives\n\nBy the end of this module, you will be able to:\n\n- Create a baseline plot with `ggplot2`'s `ggplot` function along with the `aes` helper function.\n- Add desired geometries to your baseline plot with the `geom_...` family of functions\n- Explain the advantage of using the `+` operator to add together multiple plot elements\n- Use `ggplot2`'s `labs` and `scale_fill_manual` helper functions to customize your plot's labels and colors \n- Differentiate `aes`' `fill` and `color` arguments \n- Generate separate plots based on grouping variable(s) with `ggplot2`'s `facet_grid` function\n- Design a custom format for your plots using `ggplot2`'s `theme` function\n\n\n::: {.cell}\n\n:::\n\n\n## `ggplot2` Overview\n\nWhile the bulk of the `tidyverse` is focused on modifying a given data object, `ggplot2` is also a package in the `tidyverse` that is more concerned with--intuitively enough--*plotting* tidy data. `ggplot2` does share some syntax with the functions and packages that we've discussed so far but it also introduces some new elements that we'll discuss as we encounter them.\n\n## Creating a Plot\n\nTo create the foundation for your plot, we'll use the `ggplot` function (note that the package is `ggplot2` while the function name lacks the \"2\"). This function allows you to globally define the data object you're using as well as which variable(s) should be mapped to x and y axes as well as aesthetic parameters--for example, which groups outline or fill color should be inherited from, etc. The `ggplot` function relies on an aesthetics helper function named `aes`. We'll give `ggplot` the data name and `aes`, while we'll pass all relevant variables to `aes` directly. This may be easier to follow once we've covered an example so let's do that now!\n\n:::callout-note\n## Example\n\nJust like the preceding chapters, let's use the `penguins` dataset to demonstrate this plot. 
Let's create the foundation of a graph that has `year` on the x-axis and `body_mass_g` on the y-axis.\n\nThe `ggplot` function wants both a `data` and `mapping` argument that we'll specify here for clarity but will exclude going forward. `mapping` expects the `aes` function that in turn defines all your variable placements.\n\n\n::: {.cell}\n\n```{.r .cell-code}\nggplot(data = penguins, mapping = aes(x = year, y = body_mass_g))\n```\n\n::: {.cell-output-display}\n![](visualize_files/figure-html/ggplot-no-geom-1.png){width=672}\n:::\n:::\n\n\nIt may not look like it, but this was successful! The `ggplot` function creates the first \"layer\" of the plot, including raw axis titles and tick marks. When creating a plot with the `ggplot2` package, however, choosing *which* plot type is done with a separate class of functions called **\"geometries\".**\n:::\n\n## Choosing a Plot Type\n\nNow that we have a baseline plot, we can add desired geometries using the `geom_...` family of functions. Broadly speaking, there is one `geom_...` for every possible way of plotting your data. Want to make a scatter plot? Use `geom_point`. Bar plot? `geom_bar`. Add a best-fit line? `geom_smooth`. When you first begin making plots with `ggplot2` you will likely have to Google which `geom_...` you want (that was certainly what the creators of this workshop did when we started out!) but over time you'll remember them more and more clearly.\n\n### Geometry Aside No. 1 - Adding Plot Elements\n\nYou may have noticed that the core plot is built with `ggplot` and `aes` but each subsequent component is added with one of the `geom_...` functions and realized the gap we haven't talked about yet: how do we combine these separate lines of code? The answer is part of what makes `ggplot` different from the rest of the `tidyverse`. 
In the rest of the `tidyverse` we chain together multiple lines of code with the `%>%` operator; however, **in `ggplot2` we use `+` to combine separate lines of code.**\n\nThis has a distinct advantage that we'll discuss later but we'll use the `+` in the following example to show its use.\n\n### `geom_...` Example: Adding a Geometry\n\n:::callout-note\n## Example\n\nLet's re-create our `year` by `body_mass_g` plot but let's make it a scatter plot by adding `geom_point`.\n\n\n::: {.cell}\n\n```{.r .cell-code}\nggplot(data = penguins, mapping = aes(x = year, y = body_mass_g)) +\n geom_point()\n```\n\n::: {.cell-output-display}\n![](visualize_files/figure-html/geom-single-1.png){width=672}\n:::\n:::\n\n\nBecause `aes` gets all of the variable mapping information, we don't need to give *anything* to `geom_point`! This makes adding multiple geometries much easier than if we had to re-specify the variables separately in every geometry!\n:::\n\n### `geom_...` Example: Adding Multiple Geometries\n\n:::callout-note\n## Example\n\nLet's add a best-fit line to this graph to demonstrate how multiple geometries are added.\n\n\n::: {.cell}\n\n```{.r .cell-code}\nggplot(data = penguins, mapping = aes(x = year, y = body_mass_g)) +\n geom_point() +\n geom_smooth()\n```\n\n::: {.cell-output-display}\n![](visualize_files/figure-html/geom-multiple-1.png){width=672}\n:::\n:::\n\n\nNot a terribly informative graph but the code aptly demonstrates how several geometries can be layered.\n:::\n\n### Geometry Aside No. 2 - Order Matters\n\nThe heading says it all: *order matters!* The order that you add `geom_...`s to your plot (using `+`) determines which geometries are \"above\" or \"in front of\" others. This is a desirable behavior for plots with multiple geometries but is something to keep in mind! 
Let's cover an example to clarify this.\n\n### `geom_...` Example: Order\n\n:::callout-note\n## Example\n\nLet's say we want to make a `ggplot` of bill depth within each penguin species and we want both a boxplot and the points that make up the boxplot. This involves using both the `geom_point` and `geom_boxplot` geometries but order will be crucial! To demonstrate our point better, we'll also add a new argument to our `aes`: `fill`. `fill` defines which variable should be used to fill in open spaces (like bar or box plots). Let's create our plot!\n\n\n::: {.cell}\n\n```{.r .cell-code}\nggplot(penguins, aes(x = species, y = bill_depth_mm, fill = species)) +\n geom_point() +\n geom_boxplot()\n```\n\n::: {.cell-output-display}\n![](visualize_files/figure-html/geom-order-1-1.png){width=672}\n:::\n:::\n\n\nOur plot has all the right elements, but the box plots are covering up the points between the ends of the box (the 25th and 75th percentiles). If we change the geometry order we'll get our points \"in front of\" the box plots.\n\n\n::: {.cell}\n\n```{.r .cell-code}\nggplot(penguins, aes(x = species, y = bill_depth_mm, fill = species)) +\n geom_boxplot() +\n geom_point()\n```\n\n::: {.cell-output-display}\n![](visualize_files/figure-html/geom-order-2-1.png){width=672}\n:::\n:::\n\n\nNow our points can be seen even when they overlap with the box plot. Note that the order of geometries doesn't affect any overlap in either the x or y axis. In our above plot you'll see that our points are so densely stacked that it is difficult to see the whiskers' extent. We must change geometries in order to fix overlap issues like this. 
Fortunately, we can use `geom_jitter` instead of `geom_point` to make a scatter plot where all points are jittered in the x-axis to make them easier to see.\n\n\n::: {.cell}\n\n```{.r .cell-code}\nggplot(penguins, aes(x = species, y = bill_depth_mm, fill = species)) +\n geom_boxplot() +\n # The width argument lets us specify how far apart (in the x-axis) we want points to jitter\n geom_jitter(width = 0.25)\n```\n\n::: {.cell-output-display}\n![](visualize_files/figure-html/geom-order-3-1.png){width=672}\n:::\n:::\n\n\nGreat! Now we can see both the points and the box plots because we've achieved **(1) the order of geometries** and **(2) the type of geometries** that we need in this case.\n:::\n\n### Challenge: Geometries\n\n:::callout-important\n## Your Turn!\n\nUsing `ggplot2`, create a scatter plot of `bill_depth_mm` against `body_mass_g`.\n:::\n\n## Advantage of the `+`\n\nWhen we first introduced the `+` for adding together multiple plot elements we said that it had a key advantage but were vague about what exactly that benefit was. **The `+` allows you to build plots stepwise by adding elements to an object**. This can be extremely useful when you're experimenting with a plot as you could make several different objects, each with increasingly more plot elements, and allow all of them to inherit the same fundamental architecture.\n\nFor users who are writing their own functions, **the `+` allows you to build plots using conditions specified by arguments**. Say you want to write a function that always makes the same type of `ggplot` but you want to allow users to choose whether a best-fit line is added. Because of the `+` you can do exactly this!\n\nLet's cover a quick example of the value of stepwise plot creation.\n\n### Stepwise Plot Addition Example\n\n:::callout-note\n## Example\n\nLet's create a plot of bill length against bill depth and color the points by penguin species. 
First, we create the foundation of our plot without any geometries and assign it to an object using the `<-` operator.\n\n\n::: {.cell}\n\n```{.r .cell-code}\nplot_v1 <- ggplot(penguins, aes(x = bill_length_mm, y = bill_depth_mm, color = species))\nplot_v1\n```\n\n::: {.cell-output-display}\n![](visualize_files/figure-html/stepwise-1-1.png){width=672}\n:::\n:::\n\n\nNow let's add points to this blank plot using `geom_point` (in this case we *do not* want `geom_jitter` because it changes the x-position of points).\n\n\n::: {.cell}\n\n```{.r .cell-code}\nplot_v2 <- plot_v1 + \n geom_point()\nplot_v2\n```\n\n::: {.cell-output-display}\n![](visualize_files/figure-html/stepwise-2-1.png){width=672}\n:::\n:::\n\n\nNow we can further add a linear regression line (`geom_smooth(method = \"lm\")`) for each species' bill depth vs. length relationship. Because we colored by species earlier, `geom_smooth` will automatically create separate regression lines! Pretty cool, right?\n\n\n::: {.cell}\n\n```{.r .cell-code}\nplot_v3 <- plot_v2 + \n geom_smooth(method = \"lm\")\nplot_v3\n```\n\n::: {.cell-output-display}\n![](visualize_files/figure-html/stepwise-3-1.png){width=672}\n:::\n:::\n\n\nGreat! We could have added the `ggplot` and `geom_point` and `geom_smooth` lines together at the same time but the `+` allows us to build graphics step-by-step if we so desire.\n:::\n\n## Customizing Plots\n\nThere are too many helper functions included in `ggplot2` to cover them all here but we can cover two that you may find particularly helpful: `labs` and `scale_fill_manual`.\n\n### Customizing - Labels\n\n`labs` is `ggplot2`'s one-stop shop solution to customizing your axis labels and plot title. Add `labs` in the same way you would add a geometry to add custom axis labels to your plot. 
This is especially useful if your column names are highly abbreviated or otherwise difficult for those outside of your group to interpret.\n\n### `labs` Example: Adding Labels\n\n:::callout-note\n## Example\n\nLet's make a plot of the number of penguins per island per species with more informative axis labels and a plot title. We'll need to calculate that quickly but we can leverage the tools we practiced in the \"Summarize\" chapter to do that.\n\n\n::: {.cell}\n\n```{.r .cell-code}\n# Count penguins\npenguin_count <- penguins %>%\n dplyr::group_by(species, island) %>%\n dplyr::summarize(total = dplyr::n()) %>%\n dplyr::ungroup()\n\n# Take a look at that object\ndplyr::glimpse(penguin_count)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\nRows: 5\nColumns: 3\n$ species Adelie, Adelie, Adelie, Chinstrap, Gentoo\n$ island Biscoe, Dream, Torgersen, Dream, Biscoe\n$ total 44, 56, 52, 68, 124\n```\n\n\n:::\n:::\n\n\nNow that we have that object, let's make a bar plot of penguins on each island and let's color by penguin species.\n\n\n::: {.cell}\n\n```{.r .cell-code}\nggplot(penguin_count, aes(x = island, y = total, fill = species)) +\n # `stat` argument defines whether you've already done the calculation or you want `ggplot` to try to do it itself\n geom_bar(stat = 'identity') +\n labs(x = \"Island Name\",\n y = \"Penguins Measured\",\n title = \"Number Penguins Counted per Island and Species\")\n```\n\n::: {.cell-output-display}\n![](visualize_files/figure-html/labs-actual-1.png){width=672}\n:::\n:::\n\n\nWe now have much more human-readable axis labels and an informative title! `labs` can also be used to add a subtitle, caption, or even alt text (alternative text is special text embedded in an image that is useful for people who require a screen reader to describe visual elements). 
See `?labs` for more information on the other labels it controls.\n:::\n\n### Customizing - Using Custom Colors\n\nAs we've progressed through this training you may have noticed the classic `ggplot2` default colors (the first three are a mild red, blue, and green). It is often the case however that we want to choose our own colors; this is where `scale_fill_manual` becomes needed.\n\nThis function is added in the same way that the `geom_...` and `labs` functions are added and requires a vector of the colors that you want to use instead of the defaults. There are two key conditions to keep in mind when specifying your colors. First, **you must supply at least as many colors as there are groups to color _by_**; `ggplot2` will return an error rather than fill in defaults if you supply too few colors, and if you supply too many unnamed colors it simply uses the first ones in order. Second, **the order that you provide colors in must be correct** OR **you must \"name\" each color to assign it to a specific group**. To \"name\" the color, use the following syntax: `c(groupA = color1, groupB = color2, ...)`. Order can be a simple vector but it is often nice to be explicit about which group should be which color.\n\nLet's return to our bar plot above and change the colors used for our bar plot.\n\n### `scale_fill_manual` Example: Specify Fill\n\n:::callout-note\n## Example\n\nLet's re-create the plot from the previous example and specify the colors manually using `scale_fill_manual`\n\n\n::: {.cell warnings='false'}\n\n```{.r .cell-code}\n# Create the plot\nggplot(penguin_count, aes(x = island, y = total, fill = species)) +\n geom_bar(stat = 'identity') +\n # Specify labels\n labs(x = \"Island Name\", y = \"Penguins Measured\") +\n # Choose colors!\n scale_fill_manual(values = c(\"red\", \"orange\", \"yellow\"))\n```\n\n::: {.cell-output-display}\n![](visualize_files/figure-html/scale-fill-1.png){width=672}\n:::\n:::\n\n\nTwo quick notes before we move on:\n\n1. 
You can use \"hexadecimal codes\" for colors (sort of a machine-readable way of specifying colors) in this function. We find the website [colorbrewer2.org](https://colorbrewer2.org/#type=sequential&scheme=BuGn&n=3) particularly useful because it groups colors into palettes and includes a \"colorblind safe\" check box that limits your options to only those that will be accessible to all viewers. You can also download the `RColorBrewer` package to easily incorporate its many color palettes into your visualizations!\n\n2. Feel free to check out [NCEAS' R color cheatsheet](https://www.nceas.ucsb.edu/sites/default/files/2020-04/colorPaletteCheatsheet.pdf) for more color options.\n:::\n\n### Customizing - `color` versus `fill`\n\nOne nuance of `ggplot2` aesthetics that we should cover before continuing is the difference between \"color\" and \"fill\". If used in the `aes` function, both of these at first glance seem to dictate what we might think of as \"plot color.\" In the example above, we used `scale_fill_manual` to specify the color of the bars in our bar plot.\n\nPut clearly: `ggplot2` defines \"fill\" versus \"color\" as follows:\n\n- `fill` = controls **color inside** of a hollow shape\n- `color` = controls **color on the _outside_** of a hollow shape *or* the **entire color of a solid shape**\n - From `ggplot2`'s point of view, a solid shape is essentially a shape that is all outline and both real outlines and non-hollow shapes fall under `color`'s jurisdiction\n\nLet's make some example plots to demonstrate this distinction.\n\n:::callout-note\n## Example\n\nLet's make a boxplot of `bill_length_mm` across penguin `species` and set `fill` to `species`.\n\n\n::: {.cell}\n\n```{.r .cell-code}\nggplot(penguins, aes(x = species, y = bill_length_mm,\n fill = species)) +\n geom_boxplot()\n```\n\n::: {.cell-output-display}\n![](visualize_files/figure-html/color-vs-fill-1-1.png){width=672}\n:::\n:::\n\n\nNotice how the *interior* of the boxes is colored by species? 
Watch what happens when we make the same plot again but change `fill = ` to `color = ` in the `aes` function:\n\n\n::: {.cell}\n\n```{.r .cell-code}\nggplot(penguins, aes(x = species, y = bill_length_mm,\n color = species)) +\n geom_boxplot()\n```\n\n::: {.cell-output-display}\n![](visualize_files/figure-html/color-vs-fill-2-1.png){width=672}\n:::\n:::\n\n\nSee how the interior of the box plots now defaults to white but the outline edges become colored by `species`? Let's demonstrate this again with a solid object to show `fill` versus `color` in another context.\n\nLet's create a scatterplot of `bill_length_mm` by `bill_depth_mm` and `color` by `species`\n\n\n::: {.cell}\n\n```{.r .cell-code}\nggplot(penguins, aes(x = bill_depth_mm, y = bill_length_mm,\n color = species)) +\n geom_point()\n```\n\n::: {.cell-output-display}\n![](visualize_files/figure-html/color-vs-fill-3-1.png){width=672}\n:::\n:::\n\n\nThe default point shape for a `ggplot2` scatterplot is a solid dot so the points are now fully colored by `species` (see `?pch` for other point options). What if we changed `color = ` to `fill = ` though?\n\n\n::: {.cell}\n\n```{.r .cell-code}\nggplot(penguins, aes(x = bill_depth_mm, y = bill_length_mm,\n fill = species)) +\n geom_point()\n```\n\n::: {.cell-output-display}\n![](visualize_files/figure-html/color-vs-fill-4-1.png){width=672}\n:::\n:::\n\n\nInterestingly, `ggplot2` knows that you're trying to differentiate by `species` so it still returns a legend but unfortunately all three species' points defaulted to black so we can't see the difference! 
If we change the points to be hollow however...\n\n\n::: {.cell}\n\n```{.r .cell-code}\nggplot(penguins, aes(x = bill_depth_mm, y = bill_length_mm,\n fill = species)) +\n geom_point(shape = 21)\n```\n\n::: {.cell-output-display}\n![](visualize_files/figure-html/color-vs-fill-5-1.png){width=672}\n:::\n:::\n\n\nWe now get our points *filled* with the correct color with a black outline to each point!\n\nIf you ever try to manually specify one or the other and your plot looks the same, chances are you're mis-specifying `color` for `fill` or vice versa. Double check that and you will often resolve the issue.\n:::\n\n### Challenge: Customizing\n\n:::callout-important\n## Your Turn!\n\nUsing `ggplot2`, create a scatter plot of `bill_depth_mm` against `body_mass_g` **and** where the points are colored based on the `sex` of the penguin. Make female penguins' points **red** and male penguins' points **blue**. Also, give both axes manually-specified titles (i.e., not the raw column names!).\n:::\n\n## Separating Plots by a Variable\n\nSometimes it is useful to create a plot but separate out the data by one of the columns. `ggplot2` includes the `facet_...` family of functions to accomplish this. `facet`ing a plot creates several panels that have the same labels and data but separated by whatever variable(s) you give to the `facet_...` function in question.\n\n### `facet_grid` Example\n\n:::callout-note\n## Example\n\nLet's return to our multi-colored bill length vs. bill depth plot to demonstrate `facet`ing in action. We'll create the same plot but this time we'll `facet` by island so that the penguins of each island are included in separate panes.\n\nThe syntax of `facet_grid` is as follows: `[column for rows of plots] ~ [column for columns of plots]`. 
If you only want rows *or* columns of your plot panes, replace that side of the `~` with a period.\n\n\n::: {.cell}\n\n```{.r .cell-code}\nggplot(penguins, aes(x = bill_length_mm, y = bill_depth_mm, color = species)) +\n geom_point() +\n geom_smooth(method = \"lm\") +\n # Let's also add better axis labels\n labs(x = \"Bill Length (mm)\", y = \"Bill Depth (mm)\") +\n # Now facet by island\n facet_grid(. ~ island)\n```\n\n::: {.cell-output-display}\n![](visualize_files/figure-html/facet-1.png){width=672}\n:::\n:::\n\n\nNote that the values within `island` that make up each pane are included in the dark gray bar at the top of each pane. This behavior is automatic and makes your separated plots much more interpretable.\n:::\n\n### Challenge: `facet_grid`\n\n:::callout-important\n## Your Turn!\n\nUsing the plot you created for the previous challenge (`bill_depth_mm` vs. `body_mass_g` and `color = sex`, etc.), facet by `species` of penguin. Note that you can choose whether you want your plot panels stacked vertically or horizontally.\n:::\n\n## Changing Thematic Properties\n\nIn all of our preceding examples the plots have a characteristic `ggplot2` \"feel\" where they all have gray backgrounds with white grid lines of varying thickness depending on whether they are minor or major. If `color` or `fill` is specified in `aes` the legend is on the right and in all plots the axis labels and tick marks are (in our opinion) a very small font size.\n\n`ggplot2` placed all of the power to modify any of these parameters into a single, comprehensive function: `theme`. `theme` allows users to specify nearly any formatting component and modify it as they desire. 
Just like the `geom_...` functions, Google will very much be your friend as you try to remind yourself about the breadth of possibility within `theme` but over time you will become more comfortable and confident!\n\nLet's cover an example to demonstrate a few components of `theme`.\n\n### `theme` Example: Manual Specification\n\n:::callout-note\n## Example\n\nLet's create a plot of bill length versus flipper length and colored by penguin species.\n\n\n::: {.cell}\n\n```{.r .cell-code}\ncore_plot <- ggplot(penguins, aes(x = flipper_length_mm, y = bill_length_mm,\n color = species)) +\n geom_point() +\n geom_smooth(method = \"lm\") +\n # Let's also add better axis labels\n labs(x = \"Flipper Length (mm)\", y = \"Bill Length (mm)\")\n\ncore_plot\n```\n\n::: {.cell-output-display}\n![](visualize_files/figure-html/theme-1-1.png){width=672}\n:::\n:::\n\n\nNow that we have this plot, let's change three things using theme:\n1. Increase axis text size\n2. Increase axis *title* text size\n3. Move the legend into the bottom right of the plot\n\nTo accomplish this, we'll add a `theme` to the `core_plot` object that we created above and specify each of those changes within that function.\n\n\n::: {.cell}\n\n```{.r .cell-code}\ncore_plot +\n # Add `theme` to the plot\n theme(\n # Tell it the X/Y coordinates of the legend\n legend.position = c(0.9, 0.15),\n # Now change the text size of the axis ticks and labels\n axis.title = element_text(size = 18),\n axis.text = element_text(size = 14) )\n```\n\n::: {.cell-output-display}\n![](visualize_files/figure-html/theme-2-1.png){width=672}\n:::\n:::\n\n\nNote that for certain parameters, you must also specify one of the `element_...` family of functions. `element_text` governs all things related to text (i.e., size, font, etc.) while `element_rect` changes things related to background boxes of plot components. 
There are other `element_...` functions that you'll encounter as you use Google and Stack Overflow (a coding help website Google will frequently identify when you search for error explanations) so rest assured that you'll be able to find support documentation for these when you need it.\n:::\n\n### `theme` Example: Pre-Packaged Themes\n\nIf that all seemed pretty involved, don't worry! `ggplot2` comes with several `theme_...` functions that are pre-built and modify a lot of these parameters in desirable ways without necessitating you doing a deep dive into Google. For instance, `theme_bw` or `theme_classic` are both great options that we'll demonstrate below.\n\n:::callout-note\n## Example\n\nLet's re-use the `core_plot` object we built above to quickly demonstrate `theme_bw` and `theme_classic`. You've likely seen the default `ggplot2` theme *ad nauseam* at this point but let's show it one more time for comparative purposes.\n\n\n::: {.cell}\n\n```{.r .cell-code}\ncore_plot\n```\n\n::: {.cell-output-display}\n![](visualize_files/figure-html/theme-default-1.png){width=672}\n:::\n:::\n\n\nNow let's call the same plot but add `theme_bw` to it.\n\n\n::: {.cell}\n\n```{.r .cell-code}\ncore_plot +\n theme_bw()\n```\n\n::: {.cell-output-display}\n![](visualize_files/figure-html/theme-bw-1.png){width=672}\n:::\n:::\n\n\nNot bad! Finally, let's check out `theme_classic`.\n\n\n::: {.cell}\n\n```{.r .cell-code}\ncore_plot + \n theme_classic()\n```\n\n::: {.cell-output-display}\n![](visualize_files/figure-html/theme-classic-1.png){width=672}\n:::\n:::\n\n\nEven better! If you'd like, you can use a pre-built `theme` *and* specify additional elements yourself. 
Let's add the `theme` content we came up with in the previous example to the `theme_classic` plot.\n\n\n::: {.cell}\n\n```{.r .cell-code}\ncore_plot + \n # Add the 'classic' theme\n theme_classic() +\n # As before, change the legend position and axis text size\n theme(legend.position = c(0.9, 0.15),\n axis.title = element_text(size = 18),\n axis.text = element_text(size = 14) )\n```\n\n::: {.cell-output-display}\n![](visualize_files/figure-html/theme-classic-manual-1.png){width=672}\n:::\n:::\n\n\nNow that is looking publication-quality! One quick note though: **whichever theme specification is last \"wins\".** This means that if you have a `theme` call that sets font size to 14 and then add a second `theme` beneath it that sets font size to 30, the font size of your plot will be 30. This is part of why you can add pre-built themes and your own custom modifications together but is good to keep in mind if you're sure your custom theme is specified correctly but isn't showing up in your plot.\n:::\n\n### Challenge: `theme`\n\n:::callout-important\n## Your Turn!\n\nUsing the plot you created for the previous challenge (`bill_depth_mm` vs. `body_mass_g` and `color = sex`, etc.), (1) add the `theme_bw` pre-built theme, (2) increase the axis title size to 16, and (3) move the legend into the top left of the plot.\n:::\n\n### Combining `theme` and `+`\n\nWe discussed the strengths of `ggplot2`'s `+` operator earlier but it bears repeating here now that we've covered `theme`. An extremely helpful combination of these principles allows you to **save all of your theme editing into an object and then add that object to all the different plots you make for the project!** This would guarantee that all of your plots have the same format without you needing to re-write that formatting for every plot. We'll demonstrate briefly below.\n\n### Example: `theme` & `+`\n\n:::callout-note\n## Example\n\nLet's begin by defining our theme that we want all plots to use. 
For simplicity's sake, let's use the `theme` code outlined in the previous example.\n\n\n::: {.cell}\n\n```{.r .cell-code}\n# Begin with `theme_classic` and then add on our custom modifications\nmy_theme <- theme_classic() +\n # As before, we're moving the legend and increasing axis font size\n theme(legend.position = c(0.7, 0.8),\n axis.title = element_text(size = 18),\n axis.text = element_text(size = 14) )\n```\n:::\n\n\nNow we can make a handful of plots and add the `my_theme` object to each of them!\n\n\n::: {.cell}\n\n```{.r .cell-code}\nggplot(penguins, aes(x = year, y = bill_depth_mm, color = sex)) +\n geom_jitter(width = 0.1) +\n labs(x = \"Year\", y = \"Bill Depth (mm)\",\n title = \"Bill Depth x Year x Sex\") +\n my_theme\n```\n\n::: {.cell-output-display}\n![](visualize_files/figure-html/theme-plus-1-1.png){width=672}\n:::\n:::\n\n::: {.cell}\n\n```{.r .cell-code}\nggplot(penguins, aes(x = year, y = bill_length_mm, color = island)) +\n geom_jitter(width = 0.1) +\n labs(x = \"Year\", y = \"Bill Length (mm)\",\n title = \"Bill Length x Year x Island\") +\n my_theme\n```\n\n::: {.cell-output-display}\n![](visualize_files/figure-html/theme-plus-2-1.png){width=672}\n:::\n:::\n\n::: {.cell}\n\n```{.r .cell-code}\nggplot(penguins, aes(x = year, y = flipper_length_mm, color = species)) +\n geom_jitter(width = 0.1) +\n labs(x = \"Year\", y = \"Flipper Length (mm)\",\n title = \"Flipper Length x Year x Species\") +\n my_theme\n```\n\n::: {.cell-output-display}\n![](visualize_files/figure-html/theme-plus-3-1.png){width=672}\n:::\n:::\n\n\nThis can be a great way of making your figure creation process dramatically more efficient when it becomes time to shift from exploratory plots to publication-quality figures!\n:::\n",
+ "supporting": [
+ "visualize_files"
+ ],
+ "filters": [
+ "rmarkdown/pagebreak.lua"
+ ],
+ "includes": {},
+ "engineDependencies": {},
+ "preserve": {},
+ "postProcess": true
+ }
+}
\ No newline at end of file
diff --git a/_freeze/visualize/figure-html/color-vs-fill-1-1.png b/_freeze/visualize/figure-html/color-vs-fill-1-1.png
new file mode 100644
index 0000000..ca815a8
Binary files /dev/null and b/_freeze/visualize/figure-html/color-vs-fill-1-1.png differ
diff --git a/_freeze/visualize/figure-html/color-vs-fill-2-1.png b/_freeze/visualize/figure-html/color-vs-fill-2-1.png
new file mode 100644
index 0000000..5507952
Binary files /dev/null and b/_freeze/visualize/figure-html/color-vs-fill-2-1.png differ
diff --git a/_freeze/visualize/figure-html/color-vs-fill-3-1.png b/_freeze/visualize/figure-html/color-vs-fill-3-1.png
new file mode 100644
index 0000000..763c390
Binary files /dev/null and b/_freeze/visualize/figure-html/color-vs-fill-3-1.png differ
diff --git a/_freeze/visualize/figure-html/color-vs-fill-4-1.png b/_freeze/visualize/figure-html/color-vs-fill-4-1.png
new file mode 100644
index 0000000..1b95ece
Binary files /dev/null and b/_freeze/visualize/figure-html/color-vs-fill-4-1.png differ
diff --git a/_freeze/visualize/figure-html/color-vs-fill-5-1.png b/_freeze/visualize/figure-html/color-vs-fill-5-1.png
new file mode 100644
index 0000000..d829b9e
Binary files /dev/null and b/_freeze/visualize/figure-html/color-vs-fill-5-1.png differ
diff --git a/_freeze/visualize/figure-html/facet-1.png b/_freeze/visualize/figure-html/facet-1.png
new file mode 100644
index 0000000..255e18f
Binary files /dev/null and b/_freeze/visualize/figure-html/facet-1.png differ
diff --git a/_freeze/visualize/figure-html/geom-multiple-1.png b/_freeze/visualize/figure-html/geom-multiple-1.png
new file mode 100644
index 0000000..d87277a
Binary files /dev/null and b/_freeze/visualize/figure-html/geom-multiple-1.png differ
diff --git a/_freeze/visualize/figure-html/geom-order-1-1.png b/_freeze/visualize/figure-html/geom-order-1-1.png
new file mode 100644
index 0000000..beef585
Binary files /dev/null and b/_freeze/visualize/figure-html/geom-order-1-1.png differ
diff --git a/_freeze/visualize/figure-html/geom-order-2-1.png b/_freeze/visualize/figure-html/geom-order-2-1.png
new file mode 100644
index 0000000..b60bb76
Binary files /dev/null and b/_freeze/visualize/figure-html/geom-order-2-1.png differ
diff --git a/_freeze/visualize/figure-html/geom-order-3-1.png b/_freeze/visualize/figure-html/geom-order-3-1.png
new file mode 100644
index 0000000..0048447
Binary files /dev/null and b/_freeze/visualize/figure-html/geom-order-3-1.png differ
diff --git a/_freeze/visualize/figure-html/geom-single-1.png b/_freeze/visualize/figure-html/geom-single-1.png
new file mode 100644
index 0000000..33bc50c
Binary files /dev/null and b/_freeze/visualize/figure-html/geom-single-1.png differ
diff --git a/_freeze/visualize/figure-html/ggplot-no-geom-1.png b/_freeze/visualize/figure-html/ggplot-no-geom-1.png
new file mode 100644
index 0000000..8ee7e05
Binary files /dev/null and b/_freeze/visualize/figure-html/ggplot-no-geom-1.png differ
diff --git a/_freeze/visualize/figure-html/labs-actual-1.png b/_freeze/visualize/figure-html/labs-actual-1.png
new file mode 100644
index 0000000..68bcfbd
Binary files /dev/null and b/_freeze/visualize/figure-html/labs-actual-1.png differ
diff --git a/_freeze/visualize/figure-html/scale-fill-1.png b/_freeze/visualize/figure-html/scale-fill-1.png
new file mode 100644
index 0000000..36670b4
Binary files /dev/null and b/_freeze/visualize/figure-html/scale-fill-1.png differ
diff --git a/_freeze/visualize/figure-html/stepwise-1-1.png b/_freeze/visualize/figure-html/stepwise-1-1.png
new file mode 100644
index 0000000..abf2635
Binary files /dev/null and b/_freeze/visualize/figure-html/stepwise-1-1.png differ
diff --git a/_freeze/visualize/figure-html/stepwise-2-1.png b/_freeze/visualize/figure-html/stepwise-2-1.png
new file mode 100644
index 0000000..ac93dcf
Binary files /dev/null and b/_freeze/visualize/figure-html/stepwise-2-1.png differ
diff --git a/_freeze/visualize/figure-html/stepwise-3-1.png b/_freeze/visualize/figure-html/stepwise-3-1.png
new file mode 100644
index 0000000..9cf75a9
Binary files /dev/null and b/_freeze/visualize/figure-html/stepwise-3-1.png differ
diff --git a/_freeze/visualize/figure-html/theme-1-1.png b/_freeze/visualize/figure-html/theme-1-1.png
new file mode 100644
index 0000000..4929475
Binary files /dev/null and b/_freeze/visualize/figure-html/theme-1-1.png differ
diff --git a/_freeze/visualize/figure-html/theme-2-1.png b/_freeze/visualize/figure-html/theme-2-1.png
new file mode 100644
index 0000000..42ae6e9
Binary files /dev/null and b/_freeze/visualize/figure-html/theme-2-1.png differ
diff --git a/_freeze/visualize/figure-html/theme-bw-1.png b/_freeze/visualize/figure-html/theme-bw-1.png
new file mode 100644
index 0000000..ed125b6
Binary files /dev/null and b/_freeze/visualize/figure-html/theme-bw-1.png differ
diff --git a/_freeze/visualize/figure-html/theme-classic-1.png b/_freeze/visualize/figure-html/theme-classic-1.png
new file mode 100644
index 0000000..61976d1
Binary files /dev/null and b/_freeze/visualize/figure-html/theme-classic-1.png differ
diff --git a/_freeze/visualize/figure-html/theme-classic-manual-1.png b/_freeze/visualize/figure-html/theme-classic-manual-1.png
new file mode 100644
index 0000000..8bcdc95
Binary files /dev/null and b/_freeze/visualize/figure-html/theme-classic-manual-1.png differ
diff --git a/_freeze/visualize/figure-html/theme-default-1.png b/_freeze/visualize/figure-html/theme-default-1.png
new file mode 100644
index 0000000..4929475
Binary files /dev/null and b/_freeze/visualize/figure-html/theme-default-1.png differ
diff --git a/_freeze/visualize/figure-html/theme-plus-1-1.png b/_freeze/visualize/figure-html/theme-plus-1-1.png
new file mode 100644
index 0000000..4cfa03b
Binary files /dev/null and b/_freeze/visualize/figure-html/theme-plus-1-1.png differ
diff --git a/_freeze/visualize/figure-html/theme-plus-2-1.png b/_freeze/visualize/figure-html/theme-plus-2-1.png
new file mode 100644
index 0000000..3aa0ecd
Binary files /dev/null and b/_freeze/visualize/figure-html/theme-plus-2-1.png differ
diff --git a/_freeze/visualize/figure-html/theme-plus-3-1.png b/_freeze/visualize/figure-html/theme-plus-3-1.png
new file mode 100644
index 0000000..f740966
Binary files /dev/null and b/_freeze/visualize/figure-html/theme-plus-3-1.png differ
diff --git a/_freeze/wrangle/execute-results/html.json b/_freeze/wrangle/execute-results/html.json
new file mode 100644
index 0000000..bf770a0
--- /dev/null
+++ b/_freeze/wrangle/execute-results/html.json
@@ -0,0 +1,15 @@
+{
+ "hash": "a355af1b2946b7cf053b2d71e5ac0292",
+ "result": {
+ "engine": "knitr",
+ "markdown": "---\ntitle: \"Wrangle\"\n---\n\n\n## Module Learning Objectives\n\nBy the end of this module, you will be able to:\n\n- Manipulate rows and columns with `dplyr`'s `select` and `filter` functions\n- Create new columns with `dplyr`'s `mutate` function and fill them conditionally with `case_when` (also from `dplyr`) \n- Use `tidyr`'s `separate_wider_delim` function to split a column into two\n\n## What are Tidy Data?\n\nWhat are some common things you like to do with your data? Maybe remove rows or columns, do calculations and add the results as new columns? These operations (and others) are called \"data wrangling\". The data we get to work with are rarely, if ever, in the format we need to do our analyses, and data wrangling can help bridge that gap. `dplyr` and `tidyr` are two R packages from the `tidyverse` that provide a fairly complete and extremely powerful set of functions for us to do virtually all needed wrangling quickly. Here we introduce some commonly used functions from these two packages.\n\nWe can use `glimpse` from the `dplyr` package to look at part of the data while also getting some relevant structural information (e.g., what type of data are in each column).\n\n\n::: {.cell}\n\n```{.r .cell-code}\n# install.packages(c(\"tidyverse\", \"palmerpenguins\"))\nlibrary(tidyverse)\nlibrary(palmerpenguins)\n\ndplyr::glimpse(penguins)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\nRows: 344\nColumns: 8\n$ species Adelie, Adelie, Adelie, Adelie, Adelie, Adelie, Adel…\n$ island Torgersen, Torgersen, Torgersen, Torgersen, Torgerse…\n$ bill_length_mm 39.1, 39.5, 40.3, NA, 36.7, 39.3, 38.9, 39.2, 34.1, …\n$ bill_depth_mm 18.7, 17.4, 18.0, NA, 19.3, 20.6, 17.8, 19.6, 18.1, …\n$ flipper_length_mm 181, 186, 195, NA, 193, 190, 181, 195, 193, 190, 186…\n$ body_mass_g 3750, 3800, 3250, NA, 3450, 3650, 3625, 4675, 3475, …\n$ sex male, female, female, NA, female, male, female, male…\n$ year 2007, 2007, 2007, 2007, 2007, 2007, 2007, 2007, 
2007…\n```\n\n\n:::\n:::\n\n\n## Selecting Columns\n\nTo start off, how do we do a fundamental action like selecting the columns we want? `dplyr`'s `select` function provides us with a straightforward way to do just that. We only need to provide the column names!\n\n
\n\nNote that even if you `select` just one column, a dataframe will be returned. Whereas if you use the `$` operator you get a vector (e.g., `data$column` returns a vector, not a dataframe).\n\n### `select` Example: Including & Excluding\n\n::: callout-note \n## Example\n\nTo select only the `species`, `island`, and `body_mass_g` columns, we can use the following code:\n\n\n::: {.cell}\n\n```{.r .cell-code}\n# Provide the name of the data and then the columns that you want!\npenguins_selected <- dplyr::select(.data = penguins, species, island, body_mass_g)\n\n# Look at the product\ndplyr::glimpse(penguins_selected)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\nRows: 344\nColumns: 3\n$ species Adelie, Adelie, Adelie, Adelie, Adelie, Adelie, Adelie, Ad…\n$ island Torgersen, Torgersen, Torgersen, Torgersen, Torgersen, Tor…\n$ body_mass_g 3750, 3800, 3250, NA, 3450, 3650, 3625, 4675, 3475, 4250, …\n```\n\n\n:::\n:::\n\n\nIf we want to remove specific columns, we can use the `-` operator.\n\n\n::: {.cell}\n\n```{.r .cell-code}\n# Put a \"-\" in front of each column you would like to remove from your dataframe\npenguins_selected <- dplyr::select(.data = penguins, -flipper_length_mm, -sex)\n\n# Look at the product\ndplyr::glimpse(penguins_selected)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\nRows: 344\nColumns: 6\n$ species Adelie, Adelie, Adelie, Adelie, Adelie, Adelie, Adelie,…\n$ island Torgersen, Torgersen, Torgersen, Torgersen, Torgersen, …\n$ bill_length_mm 39.1, 39.5, 40.3, NA, 36.7, 39.3, 38.9, 39.2, 34.1, 42.…\n$ bill_depth_mm 18.7, 17.4, 18.0, NA, 19.3, 20.6, 17.8, 19.6, 18.1, 20.…\n$ body_mass_g 3750, 3800, 3250, NA, 3450, 3650, 3625, 4675, 3475, 425…\n$ year 2007, 2007, 2007, 2007, 2007, 2007, 2007, 2007, 2007, 2…\n```\n\n\n:::\n:::\n\n:::\n\n### `select` Example: Selecting with Helper Functions\n\n::: callout-note\n## Example\n\nIf we want to select the columns that contain length measurements, we can manually type `bill_length_mm` and 
`flipper_length_mm`, but there's actually an easier way using the `contains` function, also from the `dplyr` package. Enter a string that matches what you're looking for among the column names.\n\n\n::: {.cell}\n\n```{.r .cell-code}\n# Enter a matching string inside of a `select` call\npenguins_selected <- dplyr::select(.data = penguins, dplyr::contains(\"length\"))\n\n# Look at the product\ndplyr::glimpse(penguins_selected)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\nRows: 344\nColumns: 2\n$ bill_length_mm 39.1, 39.5, 40.3, NA, 36.7, 39.3, 38.9, 39.2, 34.1, …\n$ flipper_length_mm 181, 186, 195, NA, 193, 190, 181, 195, 193, 190, 186…\n```\n\n\n:::\n:::\n\n:::\n\n### `select` Example: Selecting a Range of Columns\n\n:::callout-note\n## Example\n\nNow what if we wanted all the columns from the first column `species` to the sixth column `body_mass_g`? We can use a colon, `:`, between the first (leftmost) and last (rightmost) columns in the range that we want to include.\n\n\n::: {.cell}\n\n```{.r .cell-code}\n# Use a colon to indicate a range of columns you want to select\npenguins_selected <- dplyr::select(.data = penguins, species:body_mass_g)\n\n# Look at the product\ndplyr::glimpse(penguins_selected)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\nRows: 344\nColumns: 6\n$ species Adelie, Adelie, Adelie, Adelie, Adelie, Adelie, Adel…\n$ island Torgersen, Torgersen, Torgersen, Torgersen, Torgerse…\n$ bill_length_mm 39.1, 39.5, 40.3, NA, 36.7, 39.3, 38.9, 39.2, 34.1, …\n$ bill_depth_mm 18.7, 17.4, 18.0, NA, 19.3, 20.6, 17.8, 19.6, 18.1, …\n$ flipper_length_mm 181, 186, 195, NA, 193, 190, 181, 195, 193, 190, 186…\n$ body_mass_g 3750, 3800, 3250, NA, 3450, 3650, 3625, 4675, 3475, …\n```\n\n\n:::\n:::\n\n\nConveniently selecting a range of columns can be especially useful when you have a big dataframe and don't want to exhaustively list every column by name.\n:::\n\n## Subsetting Rows\n\nInstead of selecting certain columns, how can we get a subset of 
rows that meet certain conditions? For example, in the diagram below, how can we filter for rows that contain a diamond shape? We can use `dplyr`'s handy `filter` function along with logical and boolean operators!\n\n
\n\nFor reference, here are the operators we can use to specify our conditions with `filter`.\n\n
\n\nYou may have noticed that `filter` accepts the same operators that base R's `subset` function does. This is no accident: `filter` is one of the more accessible `tidyverse` functions because of the syntax it shares with its base R equivalent.\n\nTo get familiar with these operators, let's see some examples!\n\n### `filter` Example: Exactly Equal\n\n:::callout-note\n## Example\n\nTo make a subset of our data that only contains information on Chinstrap penguins, we would use the `==` operator for \"exactly equal to\".\n\n\n::: {.cell}\n\n```{.r .cell-code}\n# Get all the rows where the species is \"Chinstrap\" \npenguins_filtered <- dplyr::filter(.data = penguins, species == \"Chinstrap\")\n\nhead(penguins_filtered)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n# A tibble: 6 × 8\n species island bill_length_mm bill_depth_mm flipper_length_mm body_mass_g\n <fct> <fct> <dbl> <dbl> <int> <int>\n1 Chinstrap Dream 46.5 17.9 192 3500\n2 Chinstrap Dream 50 19.5 196 3900\n3 Chinstrap Dream 51.3 19.2 193 3650\n4 Chinstrap Dream 45.4 18.7 188 3525\n5 Chinstrap Dream 52.7 19.8 197 3725\n6 Chinstrap Dream 45.2 17.8 198 3950\n# ℹ 2 more variables: sex <fct>, year <int>\n```\n\n\n:::\n:::\n\n\nNote that we need to write the value we're looking for as a character string bookended by quotation marks.\n:::\n\n### `filter` Example: Either / Or\n\n:::callout-note\n## Example\n\nWhat if we wanted to get all the rows where the penguin species is \"Chinstrap\" **or** \"Gentoo\"? In other words, we want all the rows where *either* condition is true. There are two options to do this. 
The first option is to use the \"or\" operator (`|`) between each of the conditions.\n\n\n::: {.cell}\n\n```{.r .cell-code}\n# Get all the rows where the species is \"Chinstrap\" or \"Gentoo\"\npenguins_filtered <- dplyr::filter(.data = penguins,\n species == \"Chinstrap\" | species == \"Gentoo\")\n\nunique(penguins_filtered$species)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n[1] Gentoo Chinstrap\nLevels: Adelie Chinstrap Gentoo\n```\n\n\n:::\n:::\n\n\nThis method works fine for a few options but begins to get cumbersome when you have many possible conditions that you'd like to retain. In these cases you can use the `%in%` operator followed by a vector of values that you want to include in your filter.\n\n\n::: {.cell}\n\n```{.r .cell-code}\n# Get all the rows where the species is \"Chinstrap\" or \"Gentoo\"\npenguins_filtered <- dplyr::filter(.data = penguins,\n species %in% c(\"Chinstrap\", \"Gentoo\"))\n\nunique(penguins_filtered$species)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n[1] Gentoo Chinstrap\nLevels: Adelie Chinstrap Gentoo\n```\n\n\n:::\n:::\n\n:::\n\n### `filter` Example: Multiple Conditions\n\n:::callout-note\n## Example\n\nWe can also keep rows where both conditions are met by using the `&` operator to specify multiple conditions that must *all* be true. 
To keep only the rows where the species is \"Adelie\" **and** the island is \"Dream\", we can use the following code:\n\n\n::: {.cell}\n\n```{.r .cell-code}\n# Get all the rows where the species is \"Adelie\" and the island is \"Dream\"\npenguins_filtered <- dplyr::filter(.data = penguins,\n species == \"Adelie\" & island == \"Dream\")\n\ndplyr::glimpse(penguins_filtered)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\nRows: 56\nColumns: 8\n$ species Adelie, Adelie, Adelie, Adelie, Adelie, Adelie, Adel…\n$ island Dream, Dream, Dream, Dream, Dream, Dream, Dream, Dre…\n$ bill_length_mm 39.5, 37.2, 39.5, 40.9, 36.4, 39.2, 38.8, 42.2, 37.6…\n$ bill_depth_mm 16.7, 18.1, 17.8, 18.9, 17.0, 21.1, 20.0, 18.5, 19.3…\n$ flipper_length_mm 178, 178, 188, 184, 195, 196, 190, 180, 181, 184, 18…\n$ body_mass_g 3250, 3900, 3300, 3900, 3325, 4150, 3950, 3550, 3300…\n$ sex female, male, female, male, female, male, male, fema…\n$ year 2007, 2007, 2007, 2007, 2007, 2007, 2007, 2007, 2007…\n```\n\n\n:::\n:::\n\n:::\n\n### `filter` Example: Greater Than / Less Than\n\n:::callout-note\n## Example\n\nWhen subsetting by numeric columns, we can use greater than (`>`) and less than (`<`) to capture the range of possible values that meet those criteria. If you want to include an \"or equal to\" clause, just add an equal sign to the right of the greater/less than sign (e.g., `>=` or `<=`). 
\n\nFor instance, we can subset the data for only penguins whose bills are longer than 50 millimeters.\n\n\n::: {.cell}\n\n```{.r .cell-code}\n# Filter based on bill length\npenguins_filtered <- dplyr::filter(.data = penguins, bill_length_mm > 50)\n\nsort(penguins_filtered$bill_length_mm)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n [1] 50.1 50.1 50.2 50.2 50.2 50.3 50.4 50.4 50.5 50.5 50.5 50.5 50.5 50.6 50.7\n[16] 50.7 50.8 50.8 50.8 50.8 50.9 50.9 51.0 51.1 51.1 51.3 51.3 51.3 51.3 51.4\n[31] 51.5 51.5 51.7 51.9 52.0 52.0 52.0 52.1 52.2 52.2 52.5 52.7 52.8 53.4 53.5\n[46] 54.2 54.3 55.1 55.8 55.9 58.0 59.6\n```\n\n\n:::\n:::\n\n\nNote that when filtering for numeric columns we do not need the quotation marks around the number(s) we use to filter.\n:::\n\n### `filter` Example: Exclusion Criteria\n\n:::callout-note\n## Example\n\nSometimes it's faster to subset the rows that **do not** meet a condition, rather than listing everything that we do want to keep. This is where the `!=` operator (or \"not equal to\") becomes useful. 
More generally, the exclamation mark (`!`) negates a logical condition in R.\n\n\n::: {.cell}\n\n```{.r .cell-code}\n# Get all the rows where the species is NOT \"Chinstrap\"\npenguins_filtered <- dplyr::filter(.data = penguins, species != \"Chinstrap\")\n\ndplyr::glimpse(penguins_filtered)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\nRows: 276\nColumns: 8\n$ species Adelie, Adelie, Adelie, Adelie, Adelie, Adelie, Adel…\n$ island Torgersen, Torgersen, Torgersen, Torgersen, Torgerse…\n$ bill_length_mm 39.1, 39.5, 40.3, NA, 36.7, 39.3, 38.9, 39.2, 34.1, …\n$ bill_depth_mm 18.7, 17.4, 18.0, NA, 19.3, 20.6, 17.8, 19.6, 18.1, …\n$ flipper_length_mm 181, 186, 195, NA, 193, 190, 181, 195, 193, 190, 186…\n$ body_mass_g 3750, 3800, 3250, NA, 3450, 3650, 3625, 4675, 3475, …\n$ sex male, female, female, NA, female, male, female, male…\n$ year 2007, 2007, 2007, 2007, 2007, 2007, 2007, 2007, 2007…\n```\n\n\n:::\n:::\n\n:::\n\n### Challenge: `filter`\n\n:::callout-important\n## Your Turn!\n\nUsing `filter`, how would you get all of the rows that **do not** have any `NA` values in the `sex` column?\n:::\n\n## Making and Modifying Columns\n\nAside from selecting columns and subsetting rows, we may want to create new columns in our data. For instance, in the diagram below, we have a dataframe that only contains column A, and then we add new columns B and C. We can use `dplyr`'s `mutate` function to add a new column, while keeping the existing columns.\n\n
\n\nThe general syntax to add a new column to your dataframe is as follows:\n\n\n::: {.cell}\n\n```{.r .cell-code}\nyour_data_v2 <- dplyr::mutate(.data = your_data, new_column_name = what_it_contains)\n```\n:::\n\n\n### `mutate` Example: Making New Columns\n\n:::callout-note\n## Example\n\nIf we wanted to add a new column that has the penguin's body mass in kilograms, we can do some arithmetic on the `body_mass_g` column and store the result in a new column.\n\n\n::: {.cell}\n\n```{.r .cell-code}\n# Create a new column with the penguins' body mass in kilograms\npenguins_mutated <- dplyr::mutate(.data = penguins, body_mass_kg = body_mass_g / 1000)\n\ndplyr::glimpse(penguins_mutated)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\nRows: 344\nColumns: 9\n$ species Adelie, Adelie, Adelie, Adelie, Adelie, Adelie, Adel…\n$ island Torgersen, Torgersen, Torgersen, Torgersen, Torgerse…\n$ bill_length_mm 39.1, 39.5, 40.3, NA, 36.7, 39.3, 38.9, 39.2, 34.1, …\n$ bill_depth_mm 18.7, 17.4, 18.0, NA, 19.3, 20.6, 17.8, 19.6, 18.1, …\n$ flipper_length_mm 181, 186, 195, NA, 193, 190, 181, 195, 193, 190, 186…\n$ body_mass_g 3750, 3800, 3250, NA, 3450, 3650, 3625, 4675, 3475, …\n$ sex male, female, female, NA, female, male, female, male…\n$ year 2007, 2007, 2007, 2007, 2007, 2007, 2007, 2007, 2007…\n$ body_mass_kg 3.750, 3.800, 3.250, NA, 3.450, 3.650, 3.625, 4.675,…\n```\n\n\n:::\n:::\n\n:::\n\n### `mutate` Example: Overwriting Existing Columns\n\n:::callout-note\n## Example\n\nAdditionally, `mutate` can be used to overwrite an existing column. If we give the new column the same name as an existing column, the existing column will be **replaced**. As you can see, `island` is currently a factor. 
To change its class to a character, we would need to overwrite the column.\n\n\n::: {.cell}\n\n```{.r .cell-code}\n# Check current format of the `island` column\nclass(penguins$island)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n[1] \"factor\"\n```\n\n\n:::\n\n```{.r .cell-code}\n# Modify the existing island column\npenguins_mutated <- dplyr::mutate(.data = penguins, island = as.character(island))\n\n# the `island` column is now a character!\nclass(penguins_mutated$island)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n[1] \"character\"\n```\n\n\n:::\n:::\n\n\nNow `island` is a character column!\n:::\n\n### Conditional Operations\n\nSometimes in data wrangling we'll want to generate a new column where the contents of the column are dependent upon an existing column but we have many separate \"if X then Y\" type statements. Such statements are called \"conditional\" statements in programming. You may already be familiar with base R's `ifelse` function for handling cases where you have an either/or condition.\n\nIn the `tidyverse`--specifically `dplyr`--we have `case_when` for handling multiple conditions in an efficient and relatively straightforward way! Why are we talking about `case_when` here? 
Because you can use `case_when` inside of a `mutate` to create a new column based on the conditions that you specify.\n\nHere is what the general syntax for this operation looks like:\n\n\n::: {.cell}\n\n```{.r .cell-code}\nyour_data_v2 <- dplyr::mutate(.data = your_data,\n new_column_name = dplyr::case_when(\n condition1 ~ value_for_condition1,\n condition2 ~ value_for_condition2,\n condition3 ~ value_for_condition3,\n ...\n TRUE ~ value_if_no_conditions_are_met))\n```\n:::\n\n\nLet's look at an example to make this somewhat more tangible.\n\n### `mutate` + `case_when` Example: Creating a New Column Conditionally\n\n:::callout-note\n## Example\n\nSuppose we want to add a new column called `flipper_rank` that contains the following:\n\n- \"short\" if `flipper_length_mm` is \\< 190 mm\n- \"long\" if `flipper_length_mm` is \\>= 190 mm\n\n\n::: {.cell}\n\n```{.r .cell-code}\n# Enter your conditions on the left side and the values on the right side of the tilde\npenguins_mutated <- dplyr::mutate(.data = penguins,\n flipper_rank = dplyr::case_when(\n flipper_length_mm < 190 ~ \"short\",\n flipper_length_mm >= 190 ~ \"long\")\n)\n\ndplyr::glimpse(penguins_mutated)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\nRows: 344\nColumns: 9\n$ species Adelie, Adelie, Adelie, Adelie, Adelie, Adelie, Adel…\n$ island Torgersen, Torgersen, Torgersen, Torgersen, Torgerse…\n$ bill_length_mm 39.1, 39.5, 40.3, NA, 36.7, 39.3, 38.9, 39.2, 34.1, …\n$ bill_depth_mm 18.7, 17.4, 18.0, NA, 19.3, 20.6, 17.8, 19.6, 18.1, …\n$ flipper_length_mm 181, 186, 195, NA, 193, 190, 181, 195, 193, 190, 186…\n$ body_mass_g 3750, 3800, 3250, NA, 3450, 3650, 3625, 4675, 3475, …\n$ sex male, female, female, NA, female, male, female, male…\n$ year 2007, 2007, 2007, 2007, 2007, 2007, 2007, 2007, 2007…\n$ flipper_rank \"short\", \"short\", \"long\", NA, \"long\", \"long\", \"short…\n```\n\n\n:::\n:::\n\n:::\n\n### Challenge: `mutate` + `case_when`\n\n:::callout-important\n## Your Turn!\n\nUsing `mutate` 
and `case_when`, create a new column called `size_bin` that contains the following:\n\n- \"large\" if body mass is greater than 4500 grams\n- \"medium\" if body mass is greater than 3000 grams, and less than or equal to 4500 grams\n- \"small\" if body mass is less than or equal to 3000 grams\n:::\n\n### Splitting a Column into Multiple Columns\n\nAnother relatively common task in data wrangling involves splitting the contents of one column into several columns. To demonstrate, let's first make a new column that contains the full scientific names for these penguins using `mutate` and `case_when`.\n\n\n::: {.cell}\n\n```{.r .cell-code}\n# Remember that conditions are on the left side and the values are on the right side of the tilde\npenguins_v1 <- dplyr::mutate(.data = penguins, \n scientific_name = dplyr::case_when(\n species == \"Adelie\" ~ \"Pygoscelis_adeliae\",\n species == \"Chinstrap\" ~ \"Pygoscelis_antarcticus\",\n species == \"Gentoo\" ~ \"Pygoscelis_papua\"))\n\ndplyr::glimpse(penguins_v1)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\nRows: 344\nColumns: 9\n$ species Adelie, Adelie, Adelie, Adelie, Adelie, Adelie, Adel…\n$ island Torgersen, Torgersen, Torgersen, Torgersen, Torgerse…\n$ bill_length_mm 39.1, 39.5, 40.3, NA, 36.7, 39.3, 38.9, 39.2, 34.1, …\n$ bill_depth_mm 18.7, 17.4, 18.0, NA, 19.3, 20.6, 17.8, 19.6, 18.1, …\n$ flipper_length_mm 181, 186, 195, NA, 193, 190, 181, 195, 193, 190, 186…\n$ body_mass_g 3750, 3800, 3250, NA, 3450, 3650, 3625, 4675, 3475, …\n$ sex male, female, female, NA, female, male, female, male…\n$ year 2007, 2007, 2007, 2007, 2007, 2007, 2007, 2007, 2007…\n$ scientific_name \"Pygoscelis_adeliae\", \"Pygoscelis_adeliae\", \"Pygosce…\n```\n\n\n:::\n:::\n\n\nIf we want to split the scientific name into genus and specific epithet, we can use the `separate_wider_delim` function from the `tidyr` package.\n\n### `separate_wider_delim` Example: Splitting a Column Apart\n\n:::callout-note\n## Example\n\nUsing our new 
scientific name column, suppose we want to split it so that `scientific_name` becomes two columns: `genus` and `epithet`. Using `tidyr`'s `separate_wider_delim` function we can do this in a single step!\n\n\n::: {.cell}\n\n```{.r .cell-code}\n# Indicate the column you want to split, the separator, and the new column names!\npenguins_separated <- tidyr::separate_wider_delim(data = penguins_v1,\n cols = scientific_name,\n delim = \"_\",\n names = c(\"genus\", \"epithet\"))\n\ndplyr::glimpse(penguins_separated)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\nRows: 344\nColumns: 10\n$ species Adelie, Adelie, Adelie, Adelie, Adelie, Adelie, Adel…\n$ island Torgersen, Torgersen, Torgersen, Torgersen, Torgerse…\n$ bill_length_mm 39.1, 39.5, 40.3, NA, 36.7, 39.3, 38.9, 39.2, 34.1, …\n$ bill_depth_mm 18.7, 17.4, 18.0, NA, 19.3, 20.6, 17.8, 19.6, 18.1, …\n$ flipper_length_mm 181, 186, 195, NA, 193, 190, 181, 195, 193, 190, 186…\n$ body_mass_g 3750, 3800, 3250, NA, 3450, 3650, 3625, 4675, 3475, …\n$ sex male, female, female, NA, female, male, female, male…\n$ year 2007, 2007, 2007, 2007, 2007, 2007, 2007, 2007, 2007…\n$ genus \"Pygoscelis\", \"Pygoscelis\", \"Pygoscelis\", \"Pygosceli…\n$ epithet \"adeliae\", \"adeliae\", \"adeliae\", \"adeliae\", \"adeliae…\n```\n\n\n:::\n:::\n\n:::\n",
+ "supporting": [],
+ "filters": [
+ "rmarkdown/pagebreak.lua"
+ ],
+ "includes": {},
+ "engineDependencies": {},
+ "preserve": {},
+ "postProcess": true
+ }
+}
\ No newline at end of file