Commit

Built site for gh-pages
Quarto GHA Workflow Runner committed Aug 22, 2024
1 parent dcfcc88 commit 4ca37a3
Showing 12 changed files with 43 additions and 27 deletions.
2 changes: 1 addition & 1 deletion .nojekyll
@@ -1 +1 @@
-c544bed4
+3b912f51
Binary file added images/logos/hex_magrittr.png
22 changes: 16 additions & 6 deletions join.html
@@ -288,7 +288,7 @@ <h2 class="anchored" data-anchor-id="module-learning-objectives">Module Learning
</section>
<section id="combining-data" class="level2">
<h2 class="anchored" data-anchor-id="combining-data">Combining data</h2>
-<p>Now that we know how to manipulate a single dataframe, how do we manipulate multiple dataframes? If we have multiple sources of data and we want to combine them together into one dataframe or table, we can <strong>join</strong> them through any shared column(s)! Data you’ll be joining can be called “relational data”, because there is some kind of relationship between the dataframes that you’ll be leveraging. In the <code>tidyverse</code>, combining data that has a relationship is called “joining”. Let’s look at some of <code>dplyr</code>’s many <code>join</code> functions!</p>
+<p>Now that we know how to manipulate a single dataframe, how do we manipulate multiple dataframes? If we have multiple sources of data and we want to combine them together into one dataframe or table, we can <strong>join</strong> them through any shared column(s)! Data you’ll be joining can be called “relational data”, because there is some kind of relationship between the dataframes that you’ll be leveraging. In the Tidyverse, combining data that has a relationship is called “joining”. Let’s look at some of <code>dplyr</code>’s many <code>join</code> functions!</p>
<p>In each of the following <code>join</code> functions, you provide two dataframes, the one you arbitrarily provide first is called the “left” dataframe while the other is called the “right” dataframe. This is important because each of the different <code>join</code> functions brings the columns from one of the dataframes into the other depending on (1) which dataframe is left and which is right and (2) what type of <code>join</code> you specify.</p>
<p>This becomes somewhat more intuitive when looking at tangible examples so let’s prepare some data to <code>join</code> in different ways!</p>
<section id="join-data-preparation" class="level3">
@@ -348,7 +348,9 @@ <h3 class="anchored" data-anchor-id="left_join-example-prioritize-the-left-dataf
</div>
<div class="callout-body-container callout-body">
<p>In a <code>left_join</code>, we bring the columns from the right dataframe that match rows found in the specified column(s) of the left dataframe.</p>
-<p><img src="images/join-left.png" align="center" width="50%"></p>
+<p align="center">
+<img src="images/join-left.png" alt="Graphic showing a left join" width="50%">
+</p>
<p>We can specify the column that we want to join based on with <code>by = ...</code>. If we don’t provide this argument, then <code>dplyr</code> will automatically join on <strong>all</strong> matching columns between the left and right dataframes. In our case, we want to <code>left_join</code> by <code>record_number</code>.</p>
<p>To better demonstrate that only rows found in the left dataframe will be joined from the right dataframe, we’ll use the pipe <code>%&gt;%</code> to <code>filter</code> the left dataframe before <code>join</code>ing.</p>
<div class="cell">
@@ -392,7 +394,9 @@ <h3 class="anchored" data-anchor-id="right_join-example-prioritize-the-right-dat
</div>
<div class="callout-body-container callout-body">
<p>In a <code>right_join</code>, we bring rows from the left dataframe into the right dataframe based on the values in the specified column(s) of the right dataframe.</p>
-<p><img src="images/join-right.png" align="center" width="50%"></p>
+<p align="center">
+<img src="images/join-right.png" alt="Graphic showing a right join" width="50%">
+</p>
<p>As the names imply, a <code>right_join</code> is the opposite of a <code>left_join</code>.</p>
</div>
</div>
@@ -410,7 +414,9 @@ <h3 class="anchored" data-anchor-id="inner_join-example-keep-rows-found-in-both-
</div>
<div class="callout-body-container callout-body">
<p>In an <code>inner_join</code>, we keep only the rows where the values in the column we are joining <code>by</code> are found in both dataframes.</p>
-<p><img src="images/join-inner.png" align="center" width="50%"></p>
+<p align="center">
+<img src="images/join-inner.png" alt="Graphic showing an inner join" width="50%">
+</p>
<p>This can be really useful when one of the dataframes includes supplementary data that has incomplete coverage on the other dataframe and you want to simultaneously combine the dataframes and remove the inevitable <code>NA</code>s that will be created.</p>
<p>For example, imagine that you have a dataframe of 100 study sites with information on plant growth and a second dataframe of soil chemistry information. Your grant budget was really tight though so you needed to prioritize sample processing and you only have soil chemistry for 20 of the sites where you have plant growth data.</p>
<p>If you use <code>inner_join</code> on your plant growth and soil chemistry datasets, you will create a single dataframe with both chemistry and plant data that only has the sites (i.e., rows) where you had data for both. This dataframe then would likely be ready for analysis because you’d have complete data for every site in the new <code>join</code>ed dataframe!</p>
@@ -431,7 +437,9 @@ <h3 class="anchored" data-anchor-id="full_join-example-combine-all-data-in-both-
</div>
<div class="callout-body-container callout-body">
<p>In a <code>full_join</code>, we keep all values and all rows.</p>
-<p><img src="images/join-full.png" align="center" width="50%"></p>
+<p align="center">
+<img src="images/join-full.png" alt="Graphic showing a full join" width="50%">
+</p>
<p>A <code>full_join</code> is “smart” enough to fill with <code>NA</code>s in all rows that don’t match between the two dataframes. Also, just like an <code>inner_join</code>, a <code>full_join</code> doesn’t care about which dataframe is “left” and which is “right” because all columns are getting combined regardless of which is left vs.&nbsp;right.</p>
</div>
</div>
@@ -449,7 +457,9 @@ <h3 class="anchored" data-anchor-id="anti_join-example-keep-only-columns-that-ar
</div>
<div class="callout-body-container callout-body">
<p>In an <code>anti_join</code>, we return rows of the left dataframe that do not have a match in the right dataframe. This can be used to see what will <strong>not</strong> be included in a join.</p>
-<p><img src="images/join-anti.png" align="center" width="50%"></p>
+<p align="center">
+<img src="images/join-anti.png" alt="Graphic showing an anti join" width="50%">
+</p>
<p>One case where an <code>anti_join</code> is particularly useful is that of “text mining” where you have one dataframe with a column of individual words that you’ve split apart from a larger block of free text. If you also have a dataframe of one column that contains words that you want to remove from your “actual” data (e.g., “and”, “not”, “I”, “me”, etc.), you can <code>anti_join</code> the two dataframes to quickly remove all of those unwanted words from your text mining dataframe.</p>
</div>
</div>
8 changes: 6 additions & 2 deletions reshape.html
@@ -331,7 +331,9 @@ <h3 class="anchored" data-anchor-id="pivot_wider-example-reshaping-to-wide-forma
</div>
<div class="callout-body-container callout-body">
<p><code>pivot_wider</code> takes long format data and reshapes it into wide format.</p>
-<p><img src="images/reshape-pivot-wide.png" width="50%"></p>
+<p align="center">
+<img src="images/reshape-pivot-wide.png" alt="Graphic of a table with 'A' and 'B' columns being pivoted to a table with 'A', 'C', and 'D' columns and a 'B' row" width="50%">
+</p>
<p>Let’s say that we want to take that data object and reshape it into wide format so that each island is a column and each species of penguin is a row. The contents of each cell then are going to be the average bill length values that we just calculated.</p>
<div class="cell">
<div class="sourceCode cell-code" id="cb3"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb3-1"><a href="#cb3-1" aria-hidden="true" tabindex="-1"></a><span class="co"># Begin by naming the objects</span></span>
@@ -369,7 +371,9 @@ <h3 class="anchored" data-anchor-id="pivot_longer-example-reshaping-to-long-form
<div class="callout-body-container callout-body">
<p>Now that we have a small wide format data object, we can feed it to <code>pivot_longer</code> and reshape our data into long format! <code>pivot_longer</code> has very similar syntax <em>except</em> that with <code>pivot_longer</code> you need to tell the function which columns should be reshaped.</p>
<p><code>pivot_wider</code> on the other hand knows which columns to move around because you manually specify them in the “names_from” and “values_from” arguments.</p>
-<p><img src="images/reshape-pivot-long.png" width="50%"></p>
+<p align="center">
+<img src="images/reshape-pivot-long.png" alt="Graphic of a table with 'A' and 'B' columns being pivoted to a table with 'C' and 'D' columns and 'A' and 'B' rows" width="50%">
+</p>
<div class="cell">
<div class="sourceCode cell-code" id="cb5"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb5-1"><a href="#cb5-1" aria-hidden="true" tabindex="-1"></a><span class="co"># Begin with our wide data</span></span>
<span id="cb5-2"><a href="#cb5-2" aria-hidden="true" tabindex="-1"></a>penguins_wide <span class="sc">%&gt;%</span></span>
12 changes: 6 additions & 6 deletions search.json

Large diffs are not rendered by default.

12 changes: 6 additions & 6 deletions sitemap.xml
@@ -2,26 +2,26 @@
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
<url>
<loc>https://lter.github.io/workshop-tidyverse/join.html</loc>
-<lastmod>2024-08-22T15:14:04.379Z</lastmod>
+<lastmod>2024-08-22T15:26:00.445Z</lastmod>
</url>
<url>
<loc>https://lter.github.io/workshop-tidyverse/wrangle.html</loc>
-<lastmod>2024-08-22T15:14:04.379Z</lastmod>
+<lastmod>2024-08-22T15:26:00.445Z</lastmod>
</url>
<url>
<loc>https://lter.github.io/workshop-tidyverse/reshape.html</loc>
-<lastmod>2024-08-22T15:14:04.379Z</lastmod>
+<lastmod>2024-08-22T15:26:00.445Z</lastmod>
</url>
<url>
<loc>https://lter.github.io/workshop-tidyverse/summarize.html</loc>
-<lastmod>2024-08-22T15:14:04.379Z</lastmod>
+<lastmod>2024-08-22T15:26:00.445Z</lastmod>
</url>
<url>
<loc>https://lter.github.io/workshop-tidyverse/visualize.html</loc>
-<lastmod>2024-08-22T15:14:04.379Z</lastmod>
+<lastmod>2024-08-22T15:26:00.445Z</lastmod>
</url>
<url>
<loc>https://lter.github.io/workshop-tidyverse/index.html</loc>
-<lastmod>2024-08-22T15:14:04.379Z</lastmod>
+<lastmod>2024-08-22T15:26:00.445Z</lastmod>
</url>
</urlset>
10 changes: 6 additions & 4 deletions summarize.html
@@ -292,7 +292,7 @@ <h2 class="anchored" data-anchor-id="module-learning-objectives">Module Learning
</section>
<section id="pipe-operator" class="level2">
<h2 class="anchored" data-anchor-id="pipe-operator">Pipe Operator (<code>%&gt;%</code>)</h2>
-<p>Before diving into the <code>tidyverse</code> functions that allow for summarization and group-wise operations, let’s talk about the pipe operator (<code>%&gt;%</code>). The pipe is from the <code>magrittr</code> package and allows chaining together multiple functions without needing to create separate objects at each step as you would have to without the pipe.</p>
+<p>Before diving into the Tidyverse functions that allow for summarization and group-wise operations, let’s talk about the pipe operator (<code>%&gt;%</code>). The pipe is from the <code>magrittr</code> package and allows chaining together multiple functions without needing to create separate objects at each step as you would have to without the pipe.</p>
<section id="example-using-the-pipe" class="level3">
<h3 class="anchored" data-anchor-id="example-using-the-pipe"><code>%&gt;%</code> Example: Using the Pipe</h3>
<div class="callout callout-style-default callout-note callout-titled">
@@ -306,7 +306,7 @@ <h3 class="anchored" data-anchor-id="example-using-the-pipe"><code>%&gt;%</code>
</div>
<div class="callout-body-container callout-body">
<p>As in the other chapters, let’s use the “penguins” data object found in the <code>palmerpenguins</code> package. Let’s say we want to keep only specimens that have a measurement for both bill length and bill depth and then remove the flipper and body mass columns.</p>
-<p>Without the pipe–but still using other <code>tidyverse</code> functions–we could go about this like this:</p>
+<p>Without the pipe–but still using other Tidyverse functions–we could go about this like this:</p>
<div class="cell">
<div class="sourceCode cell-code" id="cb1"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb1-1"><a href="#cb1-1" aria-hidden="true" tabindex="-1"></a><span class="co"># Filter out the NAs</span></span>
<span id="cb1-2"><a href="#cb1-2" aria-hidden="true" tabindex="-1"></a>penguins_v2 <span class="ot">&lt;-</span> dplyr<span class="sc">::</span><span class="fu">filter</span>(<span class="at">.data =</span> penguins,</span>
@@ -372,15 +372,17 @@ <h3 class="anchored" data-anchor-id="challenge">Challenge: <code>%&gt;%</code></
</section>
<section id="aside-fun-history-of-why-is-a-pipe" class="level3">
<h3 class="anchored" data-anchor-id="aside-fun-history-of-why-is-a-pipe">Aside: Fun History of Why <code>%&gt;%</code> is a “Pipe”</h3>
-<p><img src="images/magrittr_hex.png" align="right" width="15%"></p>
+<p><img src="images/logos/hex_magrittr.png" alt="Hex logo for the 'magrittr' R package" align="right" width="15%"></p>
<p>The Belgian painter René Magritte famously created a painting titled “<a href="https://collections.lacma.org/node/239578">The Treachery of Images</a>” featuring a depiction of a smoking pipe above the words “<em>Cest ci n’est pas une pipe</em>” (French for “This is not a pipe”). Magritte’s point was about how the depiction of a thing is not equal to thing itself. The <code>magrittr</code> package takes its name from the painter because it also includes a pipe that functions slightly differently from a command line pipe and uses different characters. Just like Magritte’s pipe, <code>%&gt;%</code> both is and isn’t a pipe!</p>
</section>
</section>
<section id="group-wise-summarizing" class="level2">
<h2 class="anchored" data-anchor-id="group-wise-summarizing">Group-Wise Summarizing</h2>
<p>Now that we’ve covered the <code>%&gt;%</code> operator we can use it to do group-wise summarization! Technically this summarization does not <em>require</em> the pipe but it does inherently have two steps and thus benefits from using the pipe to chain together those technically separate instructions.</p>
<p>To summarize by groups we first define our groups using <code>dplyr</code>’s <code>group_by</code> function and then summarize using <code>summarize</code> (also from <code>dplyr</code>). <code>summarize</code> does require you to specify what calculations you want to perform within your groups though it uses similar syntax to <code>dplyr</code>’s <code>mutate</code> function.</p>
-<p><img src="images/summarize-group-by.png" align="center" width="50%"></p>
+<p align="center">
+<img src="images/summarize-group-by.png" alt="Graphic of a table with an 'A' and 'B' column where the 'A' column contains one of two shapes becoming a smaller table with one row per type of shape and an 'A' and 'C' column" width="50%">
+</p>
<p>Despite the similarity in syntax between <code>summarize</code> and <code>mutate</code> there are a few crucial differences:</p>
<ul>
<li><code>summarize</code> returns only a single row per group while <code>mutate</code> returns as many rows as are in the original dataframe</li>
4 changes: 2 additions & 2 deletions visualize.html
@@ -320,7 +320,7 @@ <h2 class="anchored" data-anchor-id="module-learning-objectives">Module Learning
</section>
<section id="ggplot2-overview" class="level2">
<h2 class="anchored" data-anchor-id="ggplot2-overview"><code>ggplot2</code> Overview</h2>
-<p>While the bulk of the <code>tidyverse</code> is focused on modifying a given data object, <code>ggplot2</code> is also a package in the <code>tidyverse</code> that is more concerned with–intuitively enough–<em>plotting</em> tidy data. <code>ggplot2</code> does share some syntax with the functions and packages that we’ve discussed so far but it also introduces some new elements that we’ll discuss as we encounter them.</p>
+<p>While the bulk of the Tidyverse is focused on modifying a given data object, <code>ggplot2</code> is also a package in the Tidyverse that is more concerned with–intuitively enough–<em>plotting</em> tidy data. <code>ggplot2</code> does share some syntax with the functions and packages that we’ve discussed so far but it also introduces some new elements that we’ll discuss as we encounter them.</p>
</section>
<section id="creating-a-plot" class="level2">
<h2 class="anchored" data-anchor-id="creating-a-plot">Creating a Plot</h2>
@@ -356,7 +356,7 @@ <h2 class="anchored" data-anchor-id="choosing-a-plot-type">Choosing a Plot Type<
<p>Now that we have a baseline plot, we can add desired geometries using the <code>geom_...</code> family of functions. Broadly speaking, there is one <code>geom_...</code> for every possible way of plotting your data. Want to make a scatter plot? Use <code>geom_point</code>. Bar plot? <code>geom_bar</code>. Add a best-fit line? <code>geom_smooth</code>. When you first begin making plots with <code>ggplot2</code> you will likely have to Google which <code>geom_...</code> you want (that was certainly what the creators of this workshop did when we started out!) but over time you’ll remember them more and more clearly.</p>
<section id="geometry-aside-no.-1---adding-plot-elements" class="level3">
<h3 class="anchored" data-anchor-id="geometry-aside-no.-1---adding-plot-elements">Geometry Aside No.&nbsp;1 - Adding Plot Elements</h3>
-<p>You may have noticed that the core plot is built with <code>ggplot</code> and <code>aes</code> but each subsequent component is added with one of the <code>geom_...</code> functions and realized the gap we haven’t talked about yet: how do we combine these separate lines of code? The answer is part of what makes <code>ggplot</code> different from the rest of the <code>tidyverse</code>. In the rest of the <code>tidyverse</code> we chain together multiple lines of code with the <code>%&gt;%</code> operator, however, <strong>in <code>ggplot2</code> we use <code>+</code> to combine separate lines of code.</strong></p>
+<p>You may have noticed that the core plot is built with <code>ggplot</code> and <code>aes</code> but each subsequent component is added with one of the <code>geom_...</code> functions and realized the gap we haven’t talked about yet: how do we combine these separate lines of code? The answer is part of what makes <code>ggplot</code> different from the rest of the Tidyverse. In the rest of the Tidyverse we chain together multiple lines of code with the <code>%&gt;%</code> operator, however, <strong>in <code>ggplot2</code> we use <code>+</code> to combine separate lines of code.</strong></p>
<p>This has a distinct advantage that we’ll discuss later but we’ll use the <code>+</code> in the following example to show its use.</p>
</section>
<section id="geom_...-example-adding-a-geometry" class="level3">
Binary file modified visualize_files/figure-html/geom-order-3-1.png
Binary file modified visualize_files/figure-html/theme-plus-1-1.png
Binary file modified visualize_files/figure-html/theme-plus-2-1.png
Binary file modified visualize_files/figure-html/theme-plus-3-1.png