Skip to content

vandenman/EqualitySampler.jl

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

EqualitySampler.jl

Docs Build Status

EqualitySampler.jl is a Julia library for considering all possible equality constraints across parameters and sampling fromt the posterior distribution over equality constraints.

Installation

EqualitySampler is a registered package, so it can be installed with the usual incantations:

julia> ]add EqualitySampler

or alternatively

julia> import Pkg; Pkg.add("EqualitySampler")

Functionality

EqualitySampler defines four distributions over partitions of a set:

  • UniformPartitionDistribution
  • BetaBinomialPartitionDistribution
  • CustomInclusionPartitionDistribution
  • RandomProcessPartitionDistribution

Each of these is a subtype of the abstract type AbstractPartitionDistribution, which is a subtype of Distributions.DiscreteMultivariateDistribution.

Thus, each of these types can be called with e.g., rand and logpdf.

While a partition is usually defined without duplicates, the methods here do assume duplicates are present. For example, given 3 parameters (θ₁, θ₂, θ₃) there exist 5 unique partitions:

partition constraints representation
{{θ₁, θ₂, θ₃}} θ₁ == θ₂ == θ₃ [1, 1, 1]
{{θ₁, θ₂}, {θ₃}} θ₁ == θ₂ != θ₃ [1, 1, 2]
{{θ₁, θ₃}, {θ₂}} θ₁ == θ₃ != θ₂ [1, 2, 1]
{{θ₁}, {θ₂, θ₃}} θ₁ != θ₂ == θ₃ [1, 2, 2]
{{θ₁}, {θ₂}, {θ₃}} θ₁ != θ₂ != θ₃ [1, 2, 3]

However, we also consider [2, 2, 2] and [3, 3, 3] to be valid and identical to [1, 1, 1]. The main reason for this is that in a Gibbs sampling scheme, a transition from [1, 2, 2] to [1, 1, 1] by updating only the first element would be a natural but impossible without duplicated partitions. The default logpdf accounts for duplicated partitions, use logpdf_model_distinct to evaluate the logpdf without duplicated partitions.

Built-in tests

The package contains two functions to explore equality constraints in specific models. Both use Turing.jl under the hood and return a Chains object with posterior samples from MCMCChains.jl.

Independent Binomials

proportion_test can be used to explore equality constraints across a series of independent Binomials.

using EqualitySampler, Distributions
import Random

Random.seed!(42)
n_groups = 5 # no. binomials.
true_partition     = rand(UniformPartitionDistribution(n_groups))
# [1, 2, 2, 1, 1]
temp_probabilities = rand(n_groups)
true_probabilities = temp_probabilities[true_partition]
# [0.165894, 0.613478, 0.613478, 0.165894, 0.165894]

# total no. trials
observations = rand(100:200, n_groups)
# [166, 164, 134, 127, 152]

# no. successes
successes = rand(product_distribution(Binomial.(observations, true_probabilities)))
# [27, 101, 90, 16, 23]

obs_proportions = successes ./ observations
[true_probabilities obs_proportions]
# 5×2 Matrix{Float64}:
#  0.165894  0.162651
#  0.613478  0.615854
#  0.613478  0.671642
#  0.165894  0.125984
#  0.165894  0.151316

partition_prior = BetaBinomialPartitionDistribution(n_groups, 1, n_groups)

no_iter = 100_000

alg_full = EqualitySampler.SampleRJMCMC(;iter = no_iter, fullmodel_only = true)
alg_eq   = EqualitySampler.EnumerateThenSample(;iter = no_iter)

prop_samples_full   = proportion_test(observations, successes, alg_full, partition_prior)
prop_samples_eqs    = proportion_test(observations, successes, alg_eq,   partition_prior)


estimated_probabilities_full = vec(mean(prop_samples_full.parameter_samples.θ_p_samples, dims = 2))
estimated_probabilities_eqs  = vec(mean(prop_samples_eqs.parameter_samples.θ_p_samples, dims = 2))
[true_probabilities obs_proportions estimated_probabilities_full estimated_probabilities_eqs]
# 5×4 Matrix{Float64}:
#  0.165894  0.162651  0.166634  0.150795
#  0.613478  0.615854  0.614287  0.638377
#  0.613478  0.671642  0.669006  0.641879
#  0.165894  0.125984  0.131644  0.149105
#  0.165894  0.151316  0.155732  0.150185

# note the shrinkage towards the group mean. Row 4 has a small observed proportion.
# Since rows 1 and 5 have a high posterior probability of being equal to row 4,

compute_post_prob_eq(prop_samples_eqs)
# 5×5 Matrix{Float64}:
#  0.0          0.0          0.0          0.0     0.0
#  3.33618e-18  0.0          0.0          0.0     0.0
#  3.09553e-20  0.93484      0.0          0.0     0.0
#  0.934021     1.05727e-18  1.0493e-20   0.0     0.0
#  0.944021     1.88964e-18  1.78492e-20  0.9391  0.0

# true equalities
[true_partition[i] == true_partition[j] && i < j for j in eachindex(true_partition), i in eachindex(true_partition)]
# 5×5 Matrix{Bool}:
#  0  0  0  0  0
#  0  0  0  0  0
#  0  1  0  0  0
#  1  0  0  0  0
#  1  0  0  1  0

compute_model_counts(prop_samples_eqs, false)
# OrderedCollections.OrderedDict{Vector{Int8}, Int64} with 10 entries:
#   [1, 2, 2, 1, 1] => 86102
#   [1, 2, 3, 1, 1] => 4938
#   [1, 2, 2, 3, 1] => 2777
#   [1, 2, 2, 3, 3] => 2466
#   [1, 2, 2, 1, 3] => 1993
#   [1, 2, 3, 4, 1] => 542
#   [1, 2, 3, 4, 4] => 442
#   [1, 2, 3, 1, 4] => 370
#   [1, 2, 2, 3, 4] => 264
#   [1, 2, 3, 4, 5] => 106

# and it so happens to be that the most visited model is also the true model
true_partition
# 5-element Vector{Int64}:
#  1
#  2
#  2
#  1
#  1

Post hoc tests in One-Way ANOVA

anova_test can be used to explore equality constraints across the levels of a single categorical predictor. The setup uses a grand mean $\mu$ and offsets $\theta_i$ for every level of the categorical predictor. To identify the model, the constraint $\sum_i\theta_i = 1$ is imposed.

using EqualitySampler, Distributions
import Random

# Simulate some data
Random.seed!(31415)
n_groups = 7
n_obs_per_group = 50
true_partition = rand(UniformPartitionDistribution(n_groups))

temp_θ = randn(n_groups)
true_θ = temp_θ[true_partition]
true_θ .-= mean(true_θ) # center temp_θ to avoid identification constraints

data_obj = simulate_data_one_way_anova(n_groups, n_obs_per_group, true_θ, true_partition)
data = data_obj.data

obs_offset = ([mean(data.y[data.g[i]]) for i in eachindex(data.g)] .- mean(data.y)) / std(data.y)
[true_θ obs_offset]
# 7×2 Matrix{Float64}:
#   0.253441   0.289433
#   0.253441   0.231883
#   0.253441   0.0709005
#  -0.197943  -0.297703
#   0.253441   0.339342
#   0.204135   0.277509
#  -1.01996   -0.911364

iter = 100_000

alg_full = EqualitySampler.SampleRJMCMC(;iter = no_iter, fullmodel_only = true)
alg_eq   = EqualitySampler.EnumerateThenSample(;iter = no_iter)

partition_prior = BetaBinomialPartitionDistribution(n_groups, 1, n_groups)
anova_samples_full   = anova_test(data.y, data.g, alg_full, partition_prior)
anova_samples_eqs    = anova_test(data.y, data.g, alg_eq,   partition_prior)

estimated_θ_full = vec(mean(anova_samples_full.parameter_samples.θ_cp, dims = 2))
estimated_θ_eqs  = vec(mean( anova_samples_eqs.parameter_samples.θ_cp, dims = 2))
[true_θ obs_offset estimated_θ_full estimated_θ_eqs]
# 7×4 Matrix{Float64}:
#   0.253441   0.289433    0.287095    0.250173
#   0.253441   0.231883    0.23036     0.237068
#   0.253441   0.0709005   0.0710826   0.155485
#  -0.197943  -0.297703   -0.296033   -0.280944
#   0.253441   0.339342    0.337171    0.258582
#   0.204135   0.277509    0.275538    0.247956
#  -1.01996   -0.911364   -0.905213   -0.86832


# posterior probabilities of equalities among the probabilities
compute_post_prob_eq(anova_samples_eqs)
# 7×7 Matrix{Float64}:
#  0.0         0.0         0.0         0.0        0.0         0.0         0.0
#  0.73923     0.0         0.0         0.0        0.0         0.0         0.0
#  0.591839    0.618516    0.0         0.0        0.0         0.0         0.0
#  0.0366705   0.0481045   0.177259    0.0        0.0         0.0         0.0
#  0.762704    0.736683    0.57209     0.0316826  0.0         0.0         0.0
#  0.753677    0.739104    0.597044    0.0384047  0.75827     0.0         0.0
#  5.46569e-8  2.93418e-7  5.64801e-5  0.129007   1.40949e-8  7.66761e-8  0.0

# true matrix
[i > j && p == q for (i, p) in enumerate(true_partition), (j, q) in enumerate(true_partition)]
# 7×7 Matrix{Bool}:
#  0  0  0  0  0  0  0
#  1  0  0  0  0  0  0
#  1  1  0  0  0  0  0
#  0  0  0  0  0  0  0
#  1  1  1  0  0  0  0
#  0  0  0  0  0  0  0
#  0  0  0  0  0  0  0

# note that the model mistakenly identifies groups 5 and 6 as equal
# these had 0.253441 and 0.339342 as their sample values and 0.204135 and 0.277509 as their true means.

# list the visited models (use compute_model_probs to obtain their posterior probability)
compute_model_counts(anova_samples_eqs, false)
# OrderedCollections.OrderedDict{Vector{Int8}, Int64} with 242 entries:
#   [1, 1, 1, 2, 1, 1, 3] => 32242
#   [1, 1, 1, 2, 1, 1, 2] => 10256
#   [1, 1, 2, 2, 1, 1, 3] => 9223
#   [1, 1, 2, 3, 1, 1, 4] => 4421
#   [1, 2, 2, 3, 1, 1, 4] => 2511
#   [1, 1, 1, 1, 1, 1, 2] => 2469
#   [1, 2, 2, 3, 1, 2, 4] => 1807
#   [1, 1, 2, 3, 1, 2, 4] => 1784
#   [1, 1, 1, 2, 3, 1, 4] => 1772
#   [1, 1, 1, 2, 3, 3, 4] => 1666
#   [1, 2, 1, 3, 2, 2, 4] => 1637
#   [1, 2, 2, 3, 2, 2, 4] => 1419
#   ⋮                     => ⋮

Supplementary Analyses

The simulations and analyses for the manuscript 'Flexible Bayesian Multiple Comparison Adjustment Using Dirichlet Process and Beta-Binomial Model Priors' are in the folder "simulations". Note that this folder is a Julia project, so in order to rerun the simulations it is necessary to first activate and instantiate the project.