Skip to content

Commit

Permalink
docs(logbook): update logbooks
Browse files Browse the repository at this point in the history
  • Loading branch information
amyheather committed Jul 9, 2024
1 parent 605434c commit eb462d8
Show file tree
Hide file tree
Showing 8 changed files with 165 additions and 2 deletions.
2 changes: 1 addition & 1 deletion logbook/posts/2024_07_04/index.qmd
Original file line number Diff line number Diff line change
Expand Up @@ -198,7 +198,7 @@ No dependency management, so will create renv based on the imports and the dates

* Simmer (version 4.1.0)

The article dates areL
The article dates are:

* Received - 31 March 2019
* Accepted - 4 June 2019
Expand Down
Binary file added logbook/posts/2024_07_08/fig2a_example4.png
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
Binary file added logbook/posts/2024_07_08/fig2a_example5.png
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
Binary file added logbook/posts/2024_07_08/fig2a_example6.png
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
142 changes: 141 additions & 1 deletion logbook/posts/2024_07_08/index.qmd
Original file line number Diff line number Diff line change
Expand Up @@ -74,7 +74,7 @@ Hence, I feel we can mark in-text result 1 as reproduced at this time (11.14), w
# Add timings for this result
```

### 11.15-12.30, 13:15-13.50, 13.55-X: Figure 2
### 11.15-12.30, 13:15-13.50, 13.55-14.55: Working on Figure 2

Figure 2 uses the results from the scenarios above but creates plots where:

Expand Down Expand Up @@ -160,3 +160,143 @@ ggsave(path_fig2a)
```

![Figure 2A scaled to 0 to 1](fig2a_example3.png)

I then tried creating a dataframe of counts for each wait time, then calculated probability based on number of people with no wait time. However, many were tiny (as count e.g. 1 of wait time 0.00000000000002842171). Tried it with rounding first. However, it is still then the same, as most are just 0, and then e.g. 1 wait time 0.2, 3 wait time 0.5.

```
# Filter to just AngioINR for ED and round wait times to 2dp
base_angio <- res_base %>%
filter(category == "ed", resource == "angio_inr")
# Set negative wait times to 0
base_angio$wait_time[base_angio$wait_time < 0] <- 0
# Round everything to 1dp
base_angio$wait_time <- round(base_angio$wait_time, 1)
# Get probability of no wait time
n_zero = length(which(base_angio$wait_time == 0))
prob_zero = n_zero / nrow(base_angio)
# Convert dataframe to counts of each wait time
wait_df = base_angio %>%
group_by(wait_time) %>%
summarise(count=n())
```

I tried transforming by the density of 0 (`density_data$y[which.min(abs(density_data$x - 0))]`) but that worked out to just be the same as `max(density_data$y)`, since 0 has the max density.

I tried transforming the x axis, which also appears to be a sqrt transformation, although this has an issue of introducing Inf values and losing where x=0 and density=1. I explored a few different ways of doing this transformation to see if anything helps

## 15.10-15.30: Research into transformations

As I'm struggling with these transformations - to the x axis, and to the probability density function. As such, it seems a good idea to do a bit more research into these and what exactly they are doing, to see if that helps.

### Square root axis transformation

I read a few articles and looked at the documentation for the square root transformation, and understand that this simply applying the `sqrt()` function.

You get the same graph if you do this:

```
density_df %>%
mutate(x_sqrt = sqrt(x)) %>%
ggplot(aes(x=x_sqrt, y=y)) + geom_line() + xlim(0, sqrt(200)) + scale_y_continuous(transform="sqrt")
```

The only difference is the x axis labels - when we use the ggplot axis transformation, it keeps the old labels to maintain interpretation of the original data.

### Density functions

A probability density function is used to describe a continuous distribution. It can be used to find the likelihood of values of a continuous random variable.

`ggplot::geom_density()` is described as plotting a smoothed version of the histogram.

## 15.31-16.55: Returning to Figure 2

I add the sqrt x axis transformation to the basic density plot, and suddenly got a result that looked alot like the article! The only differences are the range of each axis, and the min/max values for y (ranges from 0 to 0.2...)

```
# Filter to just AngioINR for ED and round wait times to 2dp
base_angio <- res_base %>%
filter(category == "ed", resource == "angio_inr")
# Set negative wait times to 0
base_angio$wait_time[base_angio$wait_time < 0] <- 0
ggplot(base_angio, aes(x = wait_time)) +
geom_density() +
scale_y_continuous(transform="sqrt") +
scale_x_continuous(transform="sqrt")
```

![Figure 2A... getting closer!](fig2a_example4).png

I tried out using previous transforms but they didn't look right. Then I came across [this stack Overflow post](https://stackoverflow.com/questions/51385455/geom-density-y-axis-goes-above-1) which suggested you can scale the density estimate to a maximum of one by inputting `..scaled..`. This is the computed `..scaled..` value from `geom_density()` which provides the density estimate scaled to a maximum of 1. From the [documentation](https://ggplot2.tidyverse.org/reference/aes_eval.html), can see that `..scaled..` has been replaced with `after_stat(scaled)`.

This is however assuming that scaling to 1 is the same as scaling by probability of 0 wait time (which is at least true in this case, as we saw above).

```
# Filter to just AngioINR for ED and round wait times to 2dp
base_angio <- res_base %>%
filter(category == "ed", resource == "angio_inr")
# Set negative wait times to 0
base_angio$wait_time[base_angio$wait_time < 0] <- 0
# Create the plot, scaling the density estimate to a maximum of 1
ggplot(base_angio, aes(x=wait_time, y=after_stat(scaled))) +
geom_density() +
scale_y_continuous(transform="sqrt") +
scale_x_continuous(transform="sqrt")
```

![Figure 2A example 5](fig2a_example5.png)

I tried adding all the resources in to the plots, and converting it into a function so I can apply it to all three dataframes. To easily show the plots side-by-side with a shared legend, I installed the package ggpubr.

Installation of ggpubr failed with message `ERROR: configuration failed for package ‘nloptr’`. It suggested I install cmake so, as prompted, I ran `sudo apt install cmake`. This then installed fine.

Creating the plots and making various tweaks to the plotting and appearance, we're getting a bit closer to the paper.

```
create_plot <- function(df, title, xlim=c(0, 200)) {
#' Create sub-plots for Figure 2A
#'
#' @param df Dataframe with wait times across replications
#' @param xlim Tuple with limits for x axis
# Filter to just ED
base_angio <- df %>%
filter(category == "ed")
# Set negative wait times to 0
base_angio$wait_time[base_angio$wait_time < 0] <- 0
# Create the plot, scaling the density estimate to a maximum of 1
ggplot(base_angio, aes(x = wait_time,
colour = resource,
y = after_stat(scaled))) +
geom_density() +
# Apply square transformation to each axis, removing x points beyond limits
scale_y_continuous(transform = "sqrt") +
scale_x_continuous(transform = "sqrt",
breaks = scales::breaks_width(50),
limits = xlim,
oob = scales::censor,
guide = guide_axis(angle=45)) +
# Titles and styling
ggtitle(title) +
xlab("") +
ylab("") +
theme_bw(base_size=10)
}
p1 <- create_plot(res_base, title="Baseline")
p2 <- create_plot(res_exc, title="Exclusive-use", xlim=c(0, 250))
p3 <- create_plot(res_two, title="Double angio INRs")
ggarrange(p1, p2, p3, nrow=1, common.legend=TRUE, legend="bottom", labels=c("A", "B", "C"))
ggsave(path_fig2a)
```

![Figure 2A example 6](fig2a_example6.png)
Binary file added logbook/posts/2024_07_09/fig2a_example1.png
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
Binary file added logbook/posts/2024_07_09/fig2a_example2.png
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
23 changes: 23 additions & 0 deletions logbook/posts/2024_07_09/index.qmd
Original file line number Diff line number Diff line change
@@ -0,0 +1,23 @@
---
title: "Day 5"
author: "Amy Heather"
date: "2024-07-09"
categories: [reproduce]
bibliography: ../../../quarto_site/references.bib
---

## 09.04-09.06, 09.14-09.15: Continuing on Figure 2

I was curious to see how a different seed would impact the appearance of the figures, and so tried changing the seed from 200 to 300. However, it looks fairly similar.

![Figure 2A Example 1](fig2a_example1.png)

And with seed 500 too:

![Figure 2A Example 2](fig2a_example2.png)

## Untimed: Fixing GitHub commits and action for Quarto book

Had commit the produced .csv files without recognising they were too large. Undid commits with `git reset HEAD^`, and switched reproduction to only save relevant rows and columns, and to save as a compressed `.csv.gz` file. This required adding `R.utils` to the environment.

Also, modified `quarto_publish.yaml` to use `setup-r-dependencies`.

0 comments on commit eb462d8

Please sign in to comment.