Function census #133

athowes · 2024-06-27T09:02:54Z

athowes
Jun 27, 2024
Maintainer

This is a placeholder discussion where I will write about all the existing functions in the package. For each function I'll note whether I think we should keep it, and whether I see a better way to implement the functionality given the package redesign.

I will ping people when I am finished and ready to discuss this.

Status:

Simulation

I do find simulation functions to be useful as a developer (including e.g. to simulate data for a vignette or test), but am unsure that "running a simulation study" is a part of the core functionality of the package. Perhaps it is for more advanced users who might want to know what kind of biases they can expect for their observation process.

simulate_double_censored_pmf()

Simulate a censored PMF. I'm unsure exactly what this is doing. I like that you can put in any distribution for rprimary and rdelay. I don't know why alpha and beta have to be specified: does that suggest a restriction to gamma distributions? Can we generalise here? Also, how does it fit in with the functions below? Is it directly simulating the delay distribution rather than the case data?

Simulation of case data

I think these three functions all simulate case data in the same output format (a data.table with columns for case and ptime). Can we merge them into one function with an argument for the type of simulated case data? That seems worthwhile and probably not hard. The simplest option is an argument checked with distribution %in% c("exponential", "uniform", "gillespie"). (Maybe "Gillespie" isn't really a distribution like the others, and we'd need a name for whatever distribution it produces.)

simulate_exponential_cases(): Simulate exponential cases
simulate_uniform_cases(): Simulate cases from a uniform distribution
simulate_gillespie(): Simulate cases from a stochastic SIR model
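To make the merging idea concrete, here is a minimal sketch in base R. Everything in it is an assumption for illustration: the name simulate_cases(), the arguments n, t_max and r, and the reading of "exponential" as exponentially growing incidence. It returns a plain data.frame rather than a data.table to stay dependency-free, and it leaves out the Gillespie case, which would need its own arguments.

```r
# Hypothetical unified simulator; not the package's implementation.
simulate_cases <- function(distribution = c("exponential", "uniform"),
                           n = 100, t_max = 30, r = 0.2) {
  # match.arg() rejects anything other than the listed options
  distribution <- match.arg(distribution)
  ptime <- switch(distribution,
    # Uniform primary event times on [0, t_max]
    uniform = runif(n, min = 0, max = t_max),
    # Incidence growing at rate r on [0, t_max], sampled by inverting
    # the CDF F(t) = (exp(r * t) - 1) / (exp(r * t_max) - 1)
    exponential = log(1 + runif(n) * (exp(r * t_max) - 1)) / r
  )
  # Same two columns as the existing functions: case and ptime
  data.frame(case = seq_len(n), ptime = ptime)
}

set.seed(1)
sim_uniform <- simulate_cases("uniform", n = 10)
sim_exponential <- simulate_cases("exponential", n = 10)
```

Using match.arg() plus switch() keeps the list of supported options in one place, so adding "gillespie" later would mean one new branch rather than a new exported function.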

simulate_secondary()

Simulate secondary events based on a delay distribution. So I think this is a good complement to the simulation of case data and makes sense. Perhaps in that vein we should have simulate_primary above.

Preprocess

calculate_censor_delay()

Calculate the mean difference between continuous and discrete event time.

What's continuous event time? What's discrete event time?
This takes truncated_obs as input. What's that? Observations with right truncation?

combine_obs()

Combine truncated and fully observed observations

I think this is a helper function for running the paper
It binds together truncated_obs and obs (appending a new column as obs_at)

construct_cases_by_obs_window()

Construct case counts by observation window based on secondary observations

Unsure what this is doing
It's outputting case counts

event_to_incidence()

Convert from event based to incidence based data.

Unsure what this is doing

linelist_to_counts()

For a target variable convert from individual data to counts. I found that linelist_to_counts(sim_obs) works. What's target_time? The time that the counts should correspond to? pad_zeros is whether there should be a row with zero cases if there are no cases? Need to try additional_by argument.

Needs documentation and unit tests. Possible unit tests:

contains time and cases columns
time increasing monotonically
total cases sum to correct number
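The three checks above could look roughly like the following. Since I haven't pinned down the real signature of linelist_to_counts(), this sketch uses a hypothetical stand-in, toy_linelist_to_counts(), which bins by whole days and pads empty days with zeros. The column names time and cases are taken from the list above, and everything else is an assumption.

```r
# Hypothetical stand-in for linelist_to_counts(); not the package function.
toy_linelist_to_counts <- function(linelist, target_time = "ptime") {
  day <- floor(linelist[[target_time]])
  days <- seq(min(day), max(day))
  # Using a factor with every day as a level pads empty days with zero counts
  counts <- table(factor(day, levels = days))
  data.frame(time = days, cases = as.integer(counts))
}

linelist <- data.frame(case = 1:6, ptime = c(0.2, 0.7, 1.5, 3.1, 3.4, 3.9))
counts <- toy_linelist_to_counts(linelist)

# 1. contains time and cases columns
stopifnot(all(c("time", "cases") %in% names(counts)))
# 2. time increasing monotonically
stopifnot(all(diff(counts$time) > 0))
# 3. total cases sum to correct number
stopifnot(sum(counts$cases) == nrow(linelist))
```

In the package itself these would presumably be testthat expectations run against the real function rather than stopifnot() calls against a toy one.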

linelist_to_cases()

I found that linelist_to_counts(sim_obs) works.

reverse_obs_at()

Reverse the factor ordering of the observed-at variable (obs_at).

Observe

drop_zero(): Drop zero observations as unstable in a lognormal distribution
filter_obs_by_obs_time(): Filter observations based on an observation time of secondary events
filter_obs_by_ptime(): Filter observations based on the observation time of primary events
observe_process(): Observation process for primary and secondary events
pad_zero(): Pad zero observations as unstable in a lognormal distribution

S3 generics

Should be kept.

Method default

Should be kept.

Latent individual model

Should be kept. (Perhaps reorganise these 3 sections [S3 generics, Method default, Latent individual model]; see issue #162.)

Postprocess

add_natural_scale_mean_sd()

Add natural scale summary parameters for a lognormal distribution
So essentially this takes a dataframe of draws and will transform the meanlog and sdlog parameters to "natural scale" mean and sd parameters
Ways to generalise this beyond the lognormal? We have many possible families now. Do we need the classes?
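For reference, the lognormal transformation in question is the standard one: mean = exp(meanlog + sdlog^2 / 2) and sd = mean * sqrt(exp(sdlog^2) - 1). A minimal base R sketch follows, together with the inverse mapping. The function names are hypothetical and this is not the package's code.

```r
# Hypothetical helpers; the formulas are the standard lognormal moments.
lognormal_to_natural <- function(meanlog, sdlog) {
  mean <- exp(meanlog + sdlog^2 / 2)
  sd <- mean * sqrt(exp(sdlog^2) - 1)
  list(mean = mean, sd = sd)
}

# Inverse mapping: recover meanlog and sdlog from natural scale mean and sd
natural_to_lognormal <- function(mean, sd) {
  sdlog <- sqrt(log(1 + (sd / mean)^2))
  meanlog <- log(mean) - sdlog^2 / 2
  list(meanlog = meanlog, sdlog = sdlog)
}

# Both functions are vectorised, so they work on whole columns of draws
natural <- lognormal_to_natural(meanlog = c(1.6, 1.8), sdlog = c(0.5, 0.4))
round_trip <- natural_to_lognormal(natural$mean, natural$sd)
```

Generalising would mean one such pair of mappings per family, which is part of why classes (or a lookup keyed on the family) might be needed.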

correct_primary_censoring_bias()

Primary event bias correction
This takes draws and alters the mean by subtracting a random number between zero and one (I guess this supposes that the censoring interval has length 1, but what about the two censoring windows?)
It then adds meanlog and sdlog parameters (the inverse of add_natural_scale_mean_sd)
Created an issue to delete this: "Issue 167: Remove correct_primary_censoring_bias" (#168)

draws_to_long()

Convert posterior lognormal samples to long format
I'd say something like this doesn't need to be in the package
If we do want to support functionality like this, then it shouldn't be so hard-coded as to require the columns meanlog, sdlog, mean, and sd
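As a sketch of what a version that isn't hard-coded could look like, the parameter columns can be passed in as an argument. The function name draws_to_long_generic() and the output column names parameter and value are hypothetical, and this is base R only.

```r
# Hypothetical generic reshaping helper; not the package's implementation.
draws_to_long_generic <- function(draws, params) {
  stopifnot(all(params %in% names(draws)))
  # Every column that is not a parameter is kept as an identifier
  id_cols <- setdiff(names(draws), params)
  long <- lapply(params, function(p) {
    data.frame(draws[id_cols], parameter = p, value = draws[[p]])
  })
  # Stack one block of rows per parameter
  do.call(rbind, long)
}

draws <- data.frame(
  draw = 1:2,
  meanlog = c(1.5, 1.6),
  sdlog = c(0.5, 0.4)
)
long_draws <- draws_to_long_generic(draws, params = c("meanlog", "sdlog"))
```

In practice data.table::melt() or tidyr::pivot_longer() already do this, which supports the point that the function may not need to live in the package.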

extract_lognormal_draws()

Extract posterior samples for a lognormal brms model
So this takes the linear predictors and transforms them out to predictors, and makes some name changes
Want to think about if this can be generalised to work with any epidist_family
Again maybe we'd need classes here. Or to give special support to e.g. lognormal and gamma

make_relative_to_truth()

Make posterior lognormal samples relative to true values
Joins some truth onto the draws
Creates some relative measure of truth recovery
Not sure we should be supporting this

sample_model()

Sample from the posterior of a model with additional diagnostics
If we do keep any of this functionality it should be integrated into epidist. Added an issue for this: Transfer functionality of sample_model to epidist::epidist #163.

summarise_draws() and summarise_variable()

Summarise posterior draws or a variable
I can see why this is useful functionality to have but I'm unsure we should be supporting it

Plot

calculate_cohort_mean(): Calculate the cohort-based or cumulative mean
calculate_truncated_means(): Calculate the truncated mean by observation horizon
plot_cases_by_obs_window(): Plot cases by observation window
plot_censor_delay(): Plot the mean difference between continuous and discrete event time
plot_cohort_mean(): Plot empirical cohort-based or cumulative mean
plot_empirical_delay(): Plot the empirical delay distribution
plot_mean_posterior_pred(): Plot empirical cohort-based or cumulative mean vs posterior mean
plot_recovery(): Plot the posterior estimates as densities
plot_relative_recovery(): Plot the relative difference between true values and posterior estimates

Data

We should keep this.

Utility functions

epidist_stan_chunk(): Read in an epidist Stan code chunk
epidist_version_stanvar(): Label an epidist Stan model with a version indicator
