From 5027055425ad7ef8c41ef75c3f0a893b403a03f9 Mon Sep 17 00:00:00 2001 From: "phillip.buckham-bonnett@justice.gov.uk" Date: Mon, 19 Feb 2024 21:13:08 +0000 Subject: [PATCH 1/5] Update README - closes #41 --- README.md | 10 ++++++++++ 1 file changed, 10 insertions(+) diff --git a/README.md b/README.md index 5cb3ef7..caeb309 100644 --- a/README.md +++ b/README.md @@ -4,6 +4,8 @@ This introduction to R charting using ggplot2 is suitable for those who have com [Intro R training](https://github.com/moj-analytical-services/IntroRTraining) or have at least reached an equivalent standard to having done this. +
+ ## For attendees You will find the following documents useful: @@ -19,6 +21,14 @@ through the material by yourself please leave feedback about the material **Please contact [Aidan Mews](aidan.mews@justice.gov.uk) if you have any questions.** +### Set up instructions +* [Ensure you have access to the Analytical Platform](https://user-guidance.analytical-platform.service.justice.gov.uk/get-started.html#quickstart-guide). +* [Ensure you have linked RStudio and GitHub with an SSH key](https://user-guidance.analytical-platform.service.justice.gov.uk/github/set-up-github.html). +* [Clone](https://user-guidance.analytical-platform.service.justice.gov.uk/github/rstudio-git.html#work-with-git-in-rstudio) the [R Charting repo](https://github.com/moj-analytical-services/ggplotTraining) as an RStudio project. +* Run `renv::restore()` to install the required R packages. + +
+ ## For presenters * `index.Rmd` is the markdown for the presentation slides (knitted to `index.html`). From c6a38207d42806f50d6187f46cb5d3a995b434ac Mon Sep 17 00:00:00 2001 From: "phillip.buckham-bonnett@justice.gov.uk" Date: Mon, 19 Feb 2024 21:24:21 +0000 Subject: [PATCH 2/5] Added note about R_X11 warning - closes #38 --- index.Rmd | 12 ++++++++++++ index.html | 9 +++++++++ 2 files changed, 21 insertions(+) diff --git a/index.Rmd b/index.Rmd index 92b6e86..ca11811 100644 --- a/index.Rmd +++ b/index.Rmd @@ -160,6 +160,18 @@ The other functions are optional, with default values used if they are not speci To fix the problem with exercise 4, make a scatter plot of `class` vs `cty` and colour the points by `drv` (more on this in the next section) +### Note + +The first time you plot something in each R session on the Analytical Platform you may get the +following warning. You can ignore it. + +``` +Warning message: +In grSoftVersion() : + unable to load shared object '/usr/local/lib/R/modules//R_X11.so': + libXt.so.6: cannot open shared object file: No such file or directory +``` + ## Aesthetics Typical aesthetics: x axis position, y axis position, colour, fill, size, alpha (transparency). diff --git a/index.html b/index.html index 2fb7a69..a9e5592 100644 --- a/index.html +++ b/index.html @@ -3362,6 +3362,15 @@

Extension

To fix the problem with exercise 4, make a scatter plot of class vs cty and colour the points by drv (more on this in the next section)

+

Note

+ +

The first time you plot something in each R session on the Analytical Platform you may get the following warning. You can ignore it.

+ +
Warning message:
+In grSoftVersion() :
+  unable to load shared object '/usr/local/lib/R/modules//R_X11.so':
+  libXt.so.6: cannot open shared object file: No such file or directory
+

Aesthetics

Typical aesthetics: x axis position, y axis position, colour, fill, size, alpha (transparency).

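The next patch edits the sentence explaining how ggplot2 picks a default colour scale from the data type: categorical variables get discrete colours, continuous variables get a colour gradient. The sketch below shows that behaviour and assumes only that ggplot2 is installed. `year` in `mpg` is stored as an integer, so mapping it directly gives a continuous scale. Wrapping it in `factor()` gives a discrete palette; that conversion is an illustrative choice, not something the slides do. `ggplot2::layer_data()` is used to inspect the colours ggplot2 actually resolves.

```r
# `year` is an integer column, so ggplot2 maps it to a continuous colour gradient
p_continuous <- ggplot2::ggplot(data = ggplot2::mpg,
                                ggplot2::aes(x = displ, y = hwy, colour = year)) +
  ggplot2::geom_point()

# factor() makes the same variable categorical, so a discrete palette is used instead
p_discrete <- ggplot2::ggplot(data = ggplot2::mpg,
                              ggplot2::aes(x = displ, y = hwy, colour = factor(year))) +
  ggplot2::geom_point()

# layer_data() returns the data ggplot2 draws, including the resolved colour of each point
continuous_colours <- unique(ggplot2::layer_data(p_continuous)$colour)
discrete_colours <- unique(ggplot2::layer_data(p_discrete)$colour)

continuous_colours
discrete_colours
```

Both plots end up with only two distinct colours because `year` takes just two values (1999 and 2008). The difference lies in which scale produces them.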
From ef9ff9c63fe3181a27053052570088887d1bd548 Mon Sep 17 00:00:00 2001 From: "phillip.buckham-bonnett@justice.gov.uk" Date: Mon, 19 Feb 2024 22:16:06 +0000 Subject: [PATCH 3/5] Various section 2 updates - closes #40 --- index.Rmd | 119 +++++++++++++++++++++++++-------------------------- index.html | 123 ++++++++++++++++++++++++++--------------------------- 2 files changed, 119 insertions(+), 123 deletions(-) diff --git a/index.Rmd b/index.Rmd index ca11811..b74297f 100644 --- a/index.Rmd +++ b/index.Rmd @@ -185,7 +185,7 @@ ggplot2::ggplot(data = mpg, aes(x = displ, y = hwy, colour = class)) + --- -ggplot2 looks at the data type to set a default colour scale - categorical variables get discrete colours, continuous variables get a continous colour scale. These can be customised using `scale_` functions (shown later). +ggplot2 looks at the data type to set a default colour scale - categorical variables get discrete colours, continuous variables get a continuous colour scale. These can be customised using `scale_` functions (shown later). ```{r, fig.height=4, fig.width=8} # ggplot2 looks at the data type to set a default colour scale @@ -295,6 +295,50 @@ The topics for the second part will include: - Chart Positioning - Themes, titles and Multiple Plots +## Avoiding overplotting + +What is wrong with this plot? + +```{r, fig.height=3} +# What is wrong with this plot? +ggplot2::ggplot(data = mpg, aes(x = cty, y = hwy)) + + ggplot2::geom_point() +``` + +--- + +Because there are multiple observations, some are **overplotted**. To correct this, you can add some random noise to the data with `position = "jitter"` or `ggplot2::geom_jitter()`. This is a bit of a compromise - you either have a chart that is accurate but suffers from over plotting, or one that contains some random noise but reveals the size of the data. + +```{r, fig.height=3} +# Because there are multiple observations, some are **overplotted**. 
To correct this, you can add +# some random noise to the data with `position="jitter"` or `geom_jitter`. +ggplot2::ggplot(data = mpg, aes(x = cty, y = hwy)) + + ggplot2::geom_jitter(colour = 'red') + + ggplot2::geom_point() +``` + +## Fitted lines + +`ggplot2::geom_smooth()` takes data points and returns a regression model with confidence intervals. For example, if we just replace `ggplot2::geom_point()` with `ggplot2::geom_smooth()`, we get a loess curve. + +```{r, fig.height=3} +ggplot2::ggplot(data = mpg, aes(x = displ, y = hwy)) + + ggplot2::geom_smooth() +``` + +--- + +We can add the points as well, using the + operator; and have added a line break using `ggplot2::geom_line()` with a constant x value (this can be used for example to show a significant event in a time series). Note that the order of layers determines their order on the plot. As points are after the smooth line, they will be draw on top (and obscure) the line. + +```{r, fig.height=3} +# We can add the points as well, using the + operator +ggplot2::ggplot(data = mpg, aes(x = displ, y = hwy)) + + ggplot2::geom_smooth() + + ggplot2::geom_point() + + ggplot2::geom_line(aes(x = 4.5)) +``` + + ## Bar charts Counts of a single discrete variable. @@ -319,8 +363,6 @@ mpg %>% ``` -# Positioning - ## Bar chart positioning How could this chart be made easier to use? @@ -404,49 +446,6 @@ ggplot2::economics %>% ``` ---- - -What is wrong with this plot? - -```{r} -# What is wrong with this plot? -ggplot2::ggplot(data = mpg, aes(x = cty, y = hwy)) + - ggplot2::geom_point() -``` - ---- - -Because there are multiple observations, some are **overplotted**. To correct this, you can add some random noise to the data with `position = "jitter"` or `ggplot2::geom_jitter()`. This is a bit of a compromise - you either have a chart that is accurate but suffers from over plotting, or one that contains some random noise but reveals the size of the data. 
- ```{r, fig.height=3} -# Because there are multiple observations, some are **overplotted**. To correct this, you can add -# some random noise to the data with `position="jitter"` or `geom_jitter`. -ggplot2::ggplot(data = mpg, aes(x = cty, y = hwy)) + - ggplot2::geom_jitter(colour = 'red') + - ggplot2::geom_point() -``` - -## geom_smooth() - -`ggplot2::geom_smooth()` takes data points and returns a regression model with confidence intervals. For example, if we just replace `ggplot2::geom_point()` with `ggplot2::geom_smooth()`, we get a loess curve. - -```{r, fig.height=4} -ggplot2::ggplot(data = mpg, aes(x = displ, y = hwy)) + - ggplot2::geom_smooth() -``` - ---- - -We can add the points as well, using the + operator; and have added a line break using `ggplot2::geom_line()` with a constant x value (this can be used for example to show a significant event in a time series). Note that the order of layers determines their order on the plot. As points are after the smooth line, they will be draw on top (and obscure) the line. - -```{r, fig.height=3} -# We can add the points as well, using the + operator -ggplot2::ggplot(data = mpg, aes(x = displ, y = hwy)) + - ggplot2::geom_smooth() + - ggplot2::geom_point() + - ggplot2::geom_line(aes(x = 4.5)) -``` - ## Exercises: Section 3 1. Using the `mpg` dataset create a histogram of `cty`. What impact do different values for the `bins` argument have? @@ -457,10 +456,10 @@ ggplot2::ggplot(data = mpg, aes(x = displ, y = hwy)) + *Hint* You will need to supply a new geom layer with a new aesthetic mapping. -# Themes, titles, and multiple plots +# Labels, Themes, and multiple plots -## Titles -Labels and titles can be added using the labs command. +## Labels +Labels and titles can be added using the `labs()` layer.
```{r, fig.height=3} # Themes, titles, and multiple plots @@ -474,21 +473,20 @@ ggplot2::ggplot(data = mpg, aes(x = class, y = ..prop.., group = 1)) + Plots can be arranged using the `grid.arrange` command from the `gridExtra` package. First we store the plots in a variable using the `<-` operator. ```{r} # Multiple plots -plot1<- +plot1 <- ggplot2::ggplot(data = mpg, aes(x = cyl, y = ..prop.., group = 1)) + ggplot2::geom_bar(fill = "red") + - ggplot2::labs(title = "Proportion of sample by cylinder", - x = "Cylinder", - y = "Proportion", - subtitle = " ") + ggplot2::labs(title = "Proportion of sample by engine type", + x = "Number of cylinders", + y = "Proportion") -plot2<- +plot2 <- ggplot2::ggplot(data = mpg, aes(x = cty, y = hwy)) + ggplot2::geom_jitter(colour = 'red') + - ggplot2::labs(title = "Highway fuel efficiency\number of cylinders", - subtitle = "Note, the points are jittered", - x = "Number of cylinders", - y = "Fuel efficiency") + ggplot2::labs(title = "Fuel efficiency comparison", + subtitle = "Note: the points are jittered", + x = "City fuel efficiency", + y = "Highway fuel efficienty") ``` @@ -537,7 +535,8 @@ plot2 + ugly_theme --- -Pre-built MoJ and Government Analysis Function themes and colour schemes are available in the `mojchart` package. +Pre-built MoJ and Government Analysis Function themes and colour schemes are available in the +`mojchart` package (https://github.com/moj-analytical-services/mojchart). ```{r, fig.height=3} # MoJ colour scheme diff --git a/index.html b/index.html index a9e5592..4b3dc4f 100644 --- a/index.html +++ b/index.html @@ -3385,7 +3385,7 @@

Note

-

ggplot2 looks at the data type to set a default colour scale - categorical variables get discrete colours, continuous variables get a continous colour scale. These can be customised using scale_ functions (shown later).

+

ggplot2 looks at the data type to set a default colour scale - categorical variables get discrete colours, continuous variables get a continuous colour scale. These can be customised using scale_ functions (shown later).

# ggplot2 looks at the data type to set a default colour scale
 ggplot2::ggplot(data = mpg, aes(x = displ, y = hwy, colour = year)) +
@@ -3508,6 +3508,49 @@ 

  • Themes, titles and Multiple Plots
  • +

    Avoiding overplotting

    + +

    What is wrong with this plot?

    + +
    # What is wrong with this plot?
    +ggplot2::ggplot(data = mpg, aes(x = cty, y = hwy)) +
    +  ggplot2::geom_point()
    + +

    + +

    + +

    Because there are multiple observations, some are overplotted. To correct this, you can add some random noise to the data with position = "jitter" or ggplot2::geom_jitter(). This is a bit of a compromise - you either have a chart that is accurate but suffers from over plotting, or one that contains some random noise but reveals the size of the data.

    + +
    # Because there are multiple observations, some are **overplotted**. To correct this, you can add 
    +# some random noise to the data with `position="jitter"` or `geom_jitter`.
    +ggplot2::ggplot(data = mpg, aes(x = cty, y = hwy)) +
    +  ggplot2::geom_jitter(colour = 'red') +
    +  ggplot2::geom_point()
    + +

    + +

    Fitted lines

    + +

    ggplot2::geom_smooth() takes data points and returns a regression model with confidence intervals. For example, if we just replace ggplot2::geom_point() with ggplot2::geom_smooth(), we get a loess curve.

    + +
    ggplot2::ggplot(data = mpg, aes(x = displ, y = hwy)) +
    +  ggplot2::geom_smooth()
    + +

    + +

    + +

    We can add the points as well, using the + operator; and have added a line break using ggplot2::geom_line() with a constant x value (this can be used for example to show a significant event in a time series). Note that the order of layers determines their order on the plot. As points are after the smooth line, they will be draw on top (and obscure) the line.

    + +
    # We can add the points as well, using the + operator
    +ggplot2::ggplot(data = mpg, aes(x = displ, y = hwy)) +
    +  ggplot2::geom_smooth() +
    +  ggplot2::geom_point() +
    +  ggplot2::geom_line(aes(x = 4.5))
    + +

    +

    Bar charts

    Counts of a single discrete variable.

    @@ -3531,8 +3574,6 @@

    -

    Positioning

    -

    Bar chart positioning

    How could this chart be made easier to use?

    @@ -3615,49 +3656,6 @@

    -

    - -

    What is wrong with this plot?

    - -
    # What is wrong with this plot?
    -ggplot2::ggplot(data = mpg, aes(x = cty, y = hwy)) +
    -  ggplot2::geom_point()
    - -

    - -

    - -

    Because there are multiple observations, some are overplotted. To correct this, you can add some random noise to the data with position = "jitter" or ggplot2::geom_jitter(). This is a bit of a compromise - you either have a chart that is accurate but suffers from over plotting, or one that contains some random noise but reveals the size of the data.

    - -
    # Because there are multiple observations, some are **overplotted**. To correct this, you can add 
    -# some random noise to the data with `position="jitter"` or `geom_jitter`.
    -ggplot2::ggplot(data = mpg, aes(x = cty, y = hwy)) +
    -  ggplot2::geom_jitter(colour = 'red') +
    -  ggplot2::geom_point()
    - -

    - -

    geom_smooth()

    - -

    ggplot2::geom_smooth() takes data points and returns a regression model with confidence intervals. For example, if we just replace ggplot2::geom_point() with ggplot2::geom_smooth(), we get a loess curve.

    - -
    ggplot2::ggplot(data = mpg, aes(x = displ, y = hwy)) +
    -  ggplot2::geom_smooth()
    - -

    - -

    - -

    We can add the points as well, using the + operator; and have added a line break using ggplot2::geom_line() with a constant x value (this can be used for example to show a significant event in a time series). Note that the order of layers determines their order on the plot. As points are after the smooth line, they will be draw on top (and obscure) the line.

    - -
    # We can add the points as well, using the + operator
    -ggplot2::ggplot(data = mpg, aes(x = displ, y = hwy)) +
    -  ggplot2::geom_smooth() +
    -  ggplot2::geom_point() +
    -  ggplot2::geom_line(aes(x = 4.5))
    - -

    -

    Exercises: Section 3

      @@ -3668,11 +3666,11 @@

      Hint You will need to supply a new geom layer with a new aesthetic mapping.

      -

    Themes, titles, and multiple plots

    +

    Labels, Themes, and multiple plots

    -

    Titles

    +

    Labels

    -

    Labels and titles can be added using the labs command.

    +

    Labels and titles can be added using the labs() layer.

    # Themes, titles, and multiple plots
     ggplot2::ggplot(data = mpg, aes(x = class, y = ..prop.., group = 1)) +
    @@ -3686,28 +3684,27 @@ 

    Plots can be arranged using the grid.arrange command from the gridExtra package. First we store the plots in a variable using the <- operator.

    # Multiple plots
    -plot1<-
    +plot1 <-
       ggplot2::ggplot(data = mpg, aes(x = cyl, y = ..prop.., group = 1)) +
       ggplot2::geom_bar(fill = "red") +
    -  ggplot2::labs(title = "Proportion of sample by cylinder", 
    -                x = "Cylinder", 
    -                y = "Proportion", 
    -                subtitle = " ")
    +  ggplot2::labs(title = "Proportion of sample by engine type", 
    +                x = "Number of cylinders", 
    +                y = "Proportion")
     
    -plot2<-
    +plot2 <-
       ggplot2::ggplot(data = mpg, aes(x = cty, y = hwy)) +
       ggplot2::geom_jitter(colour = 'red') +
    -  ggplot2::labs(title = "Highway fuel efficiency\number of cylinders", 
    -                subtitle = "Note, the points are jittered", 
    -                x = "Number of cylinders", 
    -                y = "Fuel efficiency")
    +  ggplot2::labs(title = "Fuel efficiency comparison", 
    +                subtitle = "Note: the points are jittered", 
    +                x = "City fuel efficiency", 
    +                y = "Highway fuel efficienty")

    # gridExtra
     gridExtra::grid.arrange(plot1, plot2, nrow = 1)
    -

    +

    Themes

    @@ -3720,7 +3717,7 @@

    plot2 + ggplot2::theme_bw(), plot2 + ggplot2::theme_classic(), plot2 + ggplot2::theme_dark(), plot2 + ggplot2::theme_light()) -

    +

    @@ -3739,11 +3736,11 @@

    plot2 + ugly_theme
    -

    +

    -

    Pre-built MoJ and Government Analysis Function themes and colour schemes are available in the mojchart package.

    +

    Pre-built MoJ and Government Analysis Function themes and colour schemes are available in the mojchart package (https://github.com/moj-analytical-services/mojchart).

    # MoJ colour scheme
     ggplot2::ggplot(data = mpg, aes(x = class, fill = drv)) +
    
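Patch 3 moves the overplotting example forward. That example layers red jittered points over the unjittered ones. The sketch below shows a more controlled variant and assumes ggplot2 3.0.0 or later, where `position_jitter()` accepts a `seed` argument:

* `width` and `height` cap the random offset, so points stay close to their true values.
* `seed` makes the noise identical every time the plot is rendered.
* `alpha` (transparency) is another common way to reveal stacked observations.

The values 0.3, 123 and 0.5 are arbitrary choices, not taken from the slides.

```r
# Jitter each point by at most 0.3 units horizontally and vertically.
# Fixing `seed` means every render of the plot uses the same random offsets.
jitter_pos <- ggplot2::position_jitter(width = 0.3, height = 0.3, seed = 123)

p_jitter <- ggplot2::ggplot(data = ggplot2::mpg, ggplot2::aes(x = cty, y = hwy)) +
  # Semi-transparent points: overlapping observations show up as darker areas
  ggplot2::geom_point(position = jitter_pos, alpha = 0.5)

# Building the plot twice gives identical jittered coordinates because of the fixed seed
first_build <- ggplot2::layer_data(p_jitter)
second_build <- ggplot2::layer_data(p_jitter)
```

The trade-off described in the slides still applies: the plotted positions are no longer the exact data values. A fixed seed at least makes the chart reproducible between knits.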
    From 2ac4061db887abfccada36b8801ac9dcd12316fe Mon Sep 17 00:00:00 2001
    From: "phillip.buckham-bonnett@justice.gov.uk"
    Date: Mon, 4 Mar 2024 11:55:26 +0000
    Subject: [PATCH 4/5] Fix typo
    
    ---
     index.Rmd  | 2 +-
     index.html | 2 +-
     2 files changed, 2 insertions(+), 2 deletions(-)
    
    diff --git a/index.Rmd b/index.Rmd
    index b74297f..a3c2492 100644
    --- a/index.Rmd
    +++ b/index.Rmd
    @@ -328,7 +328,7 @@ ggplot2::ggplot(data = mpg, aes(x = displ, y = hwy)) +
     
     ---
     
    -We can add the points as well, using the + operator; and have added a line break using `ggplot2::geom_line()` with a constant x value (this can be used for example to show a significant event in a time series). Note that the order of layers determines their order on the plot. As points are after the smooth line, they will be draw on top (and obscure) the line. 
    +We can add the points as well, using the + operator; and have added a line break using `ggplot2::geom_line()` with a constant x value (this can be used for example to show a significant event in a time series). Note that the order of layers determines their order on the plot. As points are after the smooth line, they will be drawn on top (and obscure) the line. 
     
     ```{r, fig.height=3}
     # We can add the points as well, using the + operator
    diff --git a/index.html b/index.html
    index 4b3dc4f..cec5f1f 100644
    --- a/index.html
    +++ b/index.html
    @@ -3541,7 +3541,7 @@ 

    -

    We can add the points as well, using the + operator; and have added a line break using ggplot2::geom_line() with a constant x value (this can be used for example to show a significant event in a time series). Note that the order of layers determines their order on the plot. As points are after the smooth line, they will be draw on top (and obscure) the line.

    +

    We can add the points as well, using the + operator; and have added a line break using ggplot2::geom_line() with a constant x value (this can be used for example to show a significant event in a time series). Note that the order of layers determines their order on the plot. As points are after the smooth line, they will be drawn on top (and obscure) the line.

    # We can add the points as well, using the + operator
     ggplot2::ggplot(data = mpg, aes(x = displ, y = hwy)) +
    
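Patch 4 corrects the sentence about layer order ("drawn on top"). The sketch below makes that order checkable. `ggplot2::layer_data(plot, i)` returns the computed data for the i-th layer, in the order the layers were added and drawn.

The sketch also swaps the slides' `ggplot2::geom_line(aes(x = 4.5))` for `ggplot2::geom_vline(xintercept = 4.5)`, which draws a vertical reference line across the full height of the panel. That swap is a suggestion, not what the training material uses.

`method = "loess"` and `formula = y ~ x` match what `geom_smooth()` picks by default for a dataset of this size. They are spelled out only to silence its informational message.

```r
p_layers <- ggplot2::ggplot(data = ggplot2::mpg, ggplot2::aes(x = displ, y = hwy)) +
  # Layer 1: loess smooth with confidence band, drawn first (bottom)
  ggplot2::geom_smooth(method = "loess", formula = y ~ x) +
  # Layer 2: raw points, drawn on top of (and partly obscuring) the smooth
  ggplot2::geom_point() +
  # Layer 3: dashed vertical reference line at displ = 4.5, drawn last
  ggplot2::geom_vline(xintercept = 4.5, linetype = "dashed")

# Computed data for each layer, indexed in drawing order
smooth_data <- ggplot2::layer_data(p_layers, 1)
point_data <- ggplot2::layer_data(p_layers, 2)
vline_data <- ggplot2::layer_data(p_layers, 3)
```

Moving `ggplot2::geom_point()` above `ggplot2::geom_smooth()` in the chain would reverse the overlap, with the smooth line and its band drawn over the points.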
    From aead642264e420820157708fa84a22dc3605076d Mon Sep 17 00:00:00 2001
    From: "phillip.buckham-bonnett@justice.gov.uk"
    Date: Tue, 5 Mar 2024 15:24:47 +0000
    Subject: [PATCH 5/5] Fix typo
    
    ---
     index.Rmd  | 2 +-
     index.html | 8 ++++----
     2 files changed, 5 insertions(+), 5 deletions(-)
    
    diff --git a/index.Rmd b/index.Rmd
    index a3c2492..b74d8ad 100644
    --- a/index.Rmd
    +++ b/index.Rmd
    @@ -486,7 +486,7 @@ plot2 <-
       ggplot2::labs(title = "Fuel efficiency comparison", 
                     subtitle = "Note: the points are jittered", 
                     x = "City fuel efficiency", 
    -                y = "Highway fuel efficienty")
    +                y = "Highway fuel efficiency")
     
     ```
     
    diff --git a/index.html b/index.html
    index cec5f1f..a324fba 100644
    --- a/index.html
    +++ b/index.html
    @@ -3697,14 +3697,14 @@ 

       ggplot2::labs(title = "Fuel efficiency comparison", 
                     subtitle = "Note: the points are jittered", 
                     x = "City fuel efficiency", 
    -                y = "Highway fuel efficienty")
    +                y = "Highway fuel efficiency")

    # gridExtra
     gridExtra::grid.arrange(plot1, plot2, nrow = 1)
    -

    +

    Themes

    @@ -3717,7 +3717,7 @@

    plot2 + ggplot2::theme_bw(), plot2 + ggplot2::theme_classic(), plot2 + ggplot2::theme_dark(), plot2 + ggplot2::theme_light()) -

    +

    @@ -3736,7 +3736,7 @@

    plot2 + ugly_theme
    -

    +
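The `plot1` chunk in patch 3 maps `y = ..prop..`. ggplot2 3.4.0 deprecated the dot-dot notation in favour of `after_stat()`, which has been available since 3.3.0. Recent versions therefore print a deprecation warning for that line.

The sketch below builds the same chart in the newer notation and assumes ggplot2 3.3.0 or later. The titles are taken from the patch. `bar_heights` is just an illustrative name.

```r
plot1 <- ggplot2::ggplot(data = ggplot2::mpg,
                         ggplot2::aes(x = cyl, y = ggplot2::after_stat(prop), group = 1)) +
  ggplot2::geom_bar(fill = "red") +
  ggplot2::labs(title = "Proportion of sample by engine type",
                x = "Number of cylinders",
                y = "Proportion")

# after_stat(prop) uses the proportion computed by stat_count() as the bar height
bar_heights <- ggplot2::layer_data(plot1)$y
```

With `group = 1` all bars belong to a single group. The proportions are therefore computed over the whole sample and sum to 1, which is what lets the y axis read as a proportion.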