Application of denovo package on simulated dataset

This section replicates the analyses in Lee et al. (2021) (the paper) using synthetic data. It includes two sets of files:

Application
Simulation

Application

This part provides scripts that use the simulated dataset ("simulated_dataset.csv") to conduct all the analyses in the paper. Figures 2 and 3 and Table 4 can be replicated in a similar fashion.

Simulation

This part provides guidelines for replicating the simulation results, such as Table 1 in the manuscript and Table 5 in the online supplementary materials.

Files description

analysis_conti.R: application of the denovo method with continuous outcomes (including sensitivity analyses)
analysis_binary.R: application of the denovo method with binary outcomes
analysis_binary_sensi.R: sensitivity analysis when outcomes are binary
simulated_dataset.csv: contains 110,091 matched pairs with 4 individual-level covariates (age, sex, race, Medicaid eligibility) and 11 zip code-level covariates. The zip code-level covariates are within-pair averages. The outcomes are simulated by the authors.
simulated_dataset_continuous.csv: similar to the previous dataset, but the outcomes are generated as continuous outcomes. This dataset is used with "analysis_conti.R".
discovery_set_index.csv: an index vector indicating which subjects were used as the discovery sub-sample. The same index vector was used in analyzing the actual Medicare data in the paper.
pvalue-discovery_continues.R: simulations with continuous outcomes
pvalue_discovery_binary.R: simulations with binary outcomes
test_all.R: This script runs all mentioned tests and can be used by developers as a functional test.

Notes

Since handling continuous outcomes is much simpler than handling binary outcomes, we recommend starting with analysis_conti.R. This script demonstrates how to implement the denovo method to discover a tree structure and conduct hypothesis tests. For binary outcomes, use analysis_binary.R and analysis_binary_sensi.R. These two R scripts demonstrate the denovo method discussed in Section 3.5 of the paper.

The data generating process is discussed in Section 4 of the paper. We considered five covariates, two of which are effect modifiers. Three splitting ratios were considered: (10%, 90%), (25%, 75%), and (50%, 50%).
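As an illustration of what a splitting ratio means in practice, the following is a minimal sketch that partitions matched pairs into two disjoint sub-samples. It assumes that the first percentage in each ratio is the discovery fraction and the second is the inference fraction. The names `n_pairs`, `ratios`, and `splits` are hypothetical and do not come from the repository scripts, and the toy sample size is not the one used in the paper.

```r
# Hypothetical sketch: split matched pairs into discovery and inference sub-samples.
set.seed(1)
n_pairs <- 1000                  # toy number of matched pairs, for illustration only
ratios  <- c(0.10, 0.25, 0.50)   # assumed discovery fractions for the three ratios

splits <- lapply(ratios, function(r) {
  # Draw the discovery pairs at random; the remaining pairs form the inference set
  discovery <- sort(sample.int(n_pairs, size = round(r * n_pairs)))
  inference <- setdiff(seq_len(n_pairs), discovery)
  list(discovery = discovery, inference = inference)
})
names(splits) <- paste0("disc_", ratios * 100)

# Number of pairs in each discovery sub-sample
sapply(splits, function(s) length(s$discovery))
```

Each element of `splits` holds two index vectors that together cover every pair exactly once, so a tree can be discovered on one part and tested on the other.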

The outputs of both simulation R scripts consist of two matrices: (1) pval.matrix and (2) check.matrix. pval.matrix contains the p-values used for power computation, as in Table 1. check.matrix contains the discovery ratio of each covariate, which was used for Table 5 in the supplementary materials.

First, pval.matrix has 12 columns. Each splitting ratio takes four columns: the first two are from CART and the next two are from CT. Within each pair, the columns hold the p-values from the truncated product method and the denovo method, in that order. Therefore, for each splitting ratio, the columns are (1) p-value from truncated product with CART, (2) p-value from denovo with CART, (3) p-value from truncated product with CT, and (4) p-value from denovo with CT.
Second, check.matrix has 30 columns. Each splitting ratio takes 10 columns: the first five are from CART and the next five are from CT. The first five columns indicate whether each covariate x_i is used for a split in CART. The next five columns are defined in the same way for CT.
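The column layout above can be made explicit by attaching names to both matrices before summarizing them. The sketch below uses randomly generated toy matrices in place of the real script outputs. It assumes that each row is one simulation replicate, that check.matrix holds 0/1 indicators, and that power is the rejection rate at a 0.05 level; the column labels and the level are illustrative choices, not taken from the scripts.

```r
# Hypothetical sketch: label the columns of pval.matrix and check.matrix,
# then summarize them. Toy random data stands in for the real outputs.
set.seed(2)
n_sim   <- 200                            # toy number of simulation replicates
ratios  <- c("10_90", "25_75", "50_50")   # splitting ratios
trees   <- c("CART", "CT")
methods <- c("truncprod", "denovo")       # truncated product, denovo

# expand.grid varies its first argument fastest, which matches the layout:
# method within tree within splitting ratio
g_p <- expand.grid(method = methods, tree = trees, ratio = ratios,
                   stringsAsFactors = FALSE)
pval_names <- paste(g_p$ratio, g_p$tree, g_p$method, sep = ".")

# covariate within tree within splitting ratio
g_c <- expand.grid(covariate = paste0("x", 1:5), tree = trees, ratio = ratios,
                   stringsAsFactors = FALSE)
check_names <- paste(g_c$ratio, g_c$tree, g_c$covariate, sep = ".")

pval.matrix  <- matrix(runif(n_sim * 12), nrow = n_sim,
                       dimnames = list(NULL, pval_names))
check.matrix <- matrix(rbinom(n_sim * 30, size = 1, prob = 0.5), nrow = n_sim,
                       dimnames = list(NULL, check_names))

# Power: share of replicates rejecting at the assumed 0.05 level
power <- colMeans(pval.matrix < 0.05)

# Discovery ratio: share of replicates in which each covariate was used for a split
discovery_ratio <- colMeans(check.matrix)
```

With named columns, a single entry can be read off directly, for example `power["25_75.CT.denovo"]` or `discovery_ratio["50_50.CART.x2"]`.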

Both functions_binary.R and functions_binary_sensi.R are modified from the R scripts provided in the supplementary materials of Fogarty et al. (2016) and Fogarty et al. (2017).
