diff --git a/06-naomi-aghq.Rmd b/06-naomi-aghq.Rmd index 57898a2..d330055 100755 --- a/06-naomi-aghq.Rmd +++ b/06-naomi-aghq.Rmd @@ -1555,7 +1555,7 @@ knitr::include_graphics(paste0("resources/naomi-aghq/", resource_version, "/depe Latent field point estimates obtained from GPCA-AGHQ were closer to the gold-standard results from NUTS than those obtained from GEB (Figure \@ref(fig:mean-sd-alt-latent)). The root mean square error (RMSE) between posterior mean estimates from GPCA-AGHQ and NUTS (`r rmse_aghq_mean`) was `r abs(rmse_diff_mean)`% lower than that between GEB and NUTS (`r rmse_tmb_mean`). -For the posterior standard deviation estimates, there was a substantial `r abs(rmse_diff_sd)`% reduction in RMSE: from `r abs(rmse_tmb_sd)` (TMB) to `r abs(rmse_aghq_sd)` (PCA-AGHQ). +For the posterior standard deviation estimates, there was a substantial `r abs(rmse_diff_sd)`% reduction in RMSE: from `r abs(rmse_tmb_sd)` (GEB) to `r abs(rmse_aghq_sd)` (GPCA-AGHQ). However, puzzlingly, improvements in latent field estimate accuracy only transferred to model outputs to a limited extent (Figures \@ref(fig:mean-alt-output) and \@ref(fig:sd-alt-output)). #### Distributional quantities {#distributional-quantities} @@ -1682,7 +1682,7 @@ GEB and GPCA-AGHQ were substantially faster than NUTS, which took over two days Inaccuracies in model outputs from GEB and GPCA-AGHQ do have potential to meaningfully mislead policy (Sections \@ref(second90) and \@ref(one-inc)). As such, where possible, gold-standard NUTS results should be computed. -Though NUTS is to slow too run during a workshop, it could be run afterwards. +Though NUTS is too slow to run during a workshop, it could be run afterwards. As the UNAIDS HIV estimates process occurs annually, requiring days to compute more accurate estimates is viable. That said, Malawi is one of the countries with the fewest number of districts. 
As NUTS took days to run in Malawi, for larger countries, with hundreds of districts, it may be impossible to run NUTS to convergence, and approximate methods may be required. @@ -1721,7 +1721,7 @@ However, as `inlabru`, like `R-INLA`, is based on a formula interface, it may no #### Better quadrature grids -PCA-AGHQ is a sensible approach to allocating more computational to dimensions which contribute more to the integral in question. +PCA-AGHQ is a sensible approach to allocating more computational effort to dimensions which contribute more to the integral in question. However, its application to Naomi surfaced instances where it overlooked potential benefits, or otherwise did not behave as one might wish: 1. The amount of variation explained in the Hessian matrix may not be of direct interest. diff --git a/92-appendixC.Rmd b/92-appendixC.Rmd index f0b7ba6..6cfea37 100755 --- a/92-appendixC.Rmd +++ b/92-appendixC.Rmd @@ -419,7 +419,6 @@ knitr::include_graphics(paste0("resources/naomi-aghq/", resource_version, "/depe knitr::include_graphics(paste0("resources/naomi-aghq/", resource_version, "/depends/lognormconst.png")) ``` - ## Inference comparison ### Point estimates diff --git a/docs/404.html b/docs/404.html index c94a6c6..6abede6 100644 --- a/docs/404.html +++ b/docs/404.html @@ -75,7 +75,7 @@ } @media print { pre > code.sourceCode { white-space: pre-wrap; } -pre > code.sourceCode > span { text-indent: -5em; padding-left: 5em; } +pre > code.sourceCode > span { display: inline-block; text-indent: -5em; padding-left: 5em; } } pre.numberSource code { counter-reset: source-line 0; } diff --git a/docs/a-model-for-risk-group-proportions.html b/docs/a-model-for-risk-group-proportions.html index ef80e09..2fc1bf0 100644 --- a/docs/a-model-for-risk-group-proportions.html +++ b/docs/a-model-for-risk-group-proportions.html @@ -75,7 +75,7 @@ } @media print { pre > code.sourceCode { white-space: pre-wrap; } -pre > code.sourceCode > span { text-indent: -5em; padding-left: 5em; 
} +pre > code.sourceCode > span { display: inline-block; text-indent: -5em; padding-left: 5em; } } pre.numberSource code { counter-reset: source-line 0; } diff --git a/docs/bayes-st.html b/docs/bayes-st.html index 9c0a496..0a8037a 100644 --- a/docs/bayes-st.html +++ b/docs/bayes-st.html @@ -75,7 +75,7 @@ } @media print { pre > code.sourceCode { white-space: pre-wrap; } -pre > code.sourceCode > span { text-indent: -5em; padding-left: 5em; } +pre > code.sourceCode > span { display: inline-block; text-indent: -5em; padding-left: 5em; } } pre.numberSource code { counter-reset: source-line 0; } diff --git a/docs/beyond-borders.html b/docs/beyond-borders.html index 02417ee..47ae72b 100644 --- a/docs/beyond-borders.html +++ b/docs/beyond-borders.html @@ -75,7 +75,7 @@ } @media print { pre > code.sourceCode { white-space: pre-wrap; } -pre > code.sourceCode > span { text-indent: -5em; padding-left: 5em; } +pre > code.sourceCode > span { display: inline-block; text-indent: -5em; padding-left: 5em; } } pre.numberSource code { counter-reset: source-line 0; } diff --git a/docs/conclusions.html b/docs/conclusions.html index 11ff591..f5d6824 100644 --- a/docs/conclusions.html +++ b/docs/conclusions.html @@ -75,7 +75,7 @@ } @media print { pre > code.sourceCode { white-space: pre-wrap; } -pre > code.sourceCode > span { text-indent: -5em; padding-left: 5em; } +pre > code.sourceCode > span { display: inline-block; text-indent: -5em; padding-left: 5em; } } pre.numberSource code { counter-reset: source-line 0; } diff --git a/docs/hiv-aids.html b/docs/hiv-aids.html index f41c1a7..17e642d 100644 --- a/docs/hiv-aids.html +++ b/docs/hiv-aids.html @@ -75,7 +75,7 @@ } @media print { pre > code.sourceCode { white-space: pre-wrap; } -pre > code.sourceCode > span { text-indent: -5em; padding-left: 5em; } +pre > code.sourceCode > span { display: inline-block; text-indent: -5em; padding-left: 5em; } } pre.numberSource code { counter-reset: source-line 0; } diff --git a/docs/index.html 
b/docs/index.html index 1e0af51..c612345 100644 --- a/docs/index.html +++ b/docs/index.html @@ -75,7 +75,7 @@ } @media print { pre > code.sourceCode { white-space: pre-wrap; } -pre > code.sourceCode > span { text-indent: -5em; padding-left: 5em; } +pre > code.sourceCode > span { display: inline-block; text-indent: -5em; padding-left: 5em; } } pre.numberSource code { counter-reset: source-line 0; } diff --git a/docs/introduction.html b/docs/introduction.html index e7ca36d..5703d37 100644 --- a/docs/introduction.html +++ b/docs/introduction.html @@ -75,7 +75,7 @@ } @media print { pre > code.sourceCode { white-space: pre-wrap; } -pre > code.sourceCode > span { text-indent: -5em; padding-left: 5em; } +pre > code.sourceCode > span { display: inline-block; text-indent: -5em; padding-left: 5em; } } pre.numberSource code { counter-reset: source-line 0; } diff --git a/docs/main.pdf b/docs/main.pdf index 507d9a1..86d11dc 100644 Binary files a/docs/main.pdf and b/docs/main.pdf differ diff --git a/docs/models-for-areal-spatial-structure.html b/docs/models-for-areal-spatial-structure.html index 0cda943..352fd38 100644 --- a/docs/models-for-areal-spatial-structure.html +++ b/docs/models-for-areal-spatial-structure.html @@ -75,7 +75,7 @@ } @media print { pre > code.sourceCode { white-space: pre-wrap; } -pre > code.sourceCode > span { text-indent: -5em; padding-left: 5em; } +pre > code.sourceCode > span { display: inline-block; text-indent: -5em; padding-left: 5em; } } pre.numberSource code { counter-reset: source-line 0; } diff --git a/docs/multi-agyw.html b/docs/multi-agyw.html index ae10443..54b3688 100644 --- a/docs/multi-agyw.html +++ b/docs/multi-agyw.html @@ -75,7 +75,7 @@ } @media print { pre > code.sourceCode { white-space: pre-wrap; } -pre > code.sourceCode > span { text-indent: -5em; padding-left: 5em; } +pre > code.sourceCode > span { display: inline-block; text-indent: -5em; padding-left: 5em; } } pre.numberSource code { counter-reset: source-line 0; } diff --git 
a/docs/naomi-aghq-appendix.html b/docs/naomi-aghq-appendix.html index 8716e93..773d725 100644 --- a/docs/naomi-aghq-appendix.html +++ b/docs/naomi-aghq-appendix.html @@ -75,7 +75,7 @@ } @media print { pre > code.sourceCode { white-space: pre-wrap; } -pre > code.sourceCode > span { text-indent: -5em; padding-left: 5em; } +pre > code.sourceCode > span { display: inline-block; text-indent: -5em; padding-left: 5em; } } pre.numberSource code { counter-reset: source-line 0; } diff --git a/docs/naomi-aghq.html b/docs/naomi-aghq.html index b72da0f..9b25ed5 100644 --- a/docs/naomi-aghq.html +++ b/docs/naomi-aghq.html @@ -75,7 +75,7 @@ } @media print { pre > code.sourceCode { white-space: pre-wrap; } -pre > code.sourceCode > span { text-indent: -5em; padding-left: 5em; } +pre > code.sourceCode > span { display: inline-block; text-indent: -5em; padding-left: 5em; } } pre.numberSource code { counter-reset: source-line 0; } @@ -1785,7 +1785,7 @@

6.5.4.1 Point estimates6.24). The root mean square error (RMSE) between posterior mean estimates from GPCA-AGHQ and NUTS (0.063) was 20% lower than that between GEB and NUTS (0.078). -For the posterior standard deviation estimates, there was a substantial 60% reduction in RMSE: from 0.14 (TMB) to 0.05 (PCA-AGHQ). +For the posterior standard deviation estimates, there was a substantial 60% reduction in RMSE: from 0.14 (GEB) to 0.05 (GPCA-AGHQ). However, puzzlingly, improvements in latent field estimate accuracy only transferred to model outputs to a limited extent (Figures C.15 and C.16).

@@ -1888,7 +1888,7 @@

6.6.2 PCA-AGHQ with application t GEB and GPCA-AGHQ were substantially faster than NUTS, which took over two days to reach convergence.

Inaccuracies in model outputs from GEB and GPCA-AGHQ do have potential to meaningfully mislead policy (Sections 6.5.5.1 and 6.5.5.2). As such, where possible, gold-standard NUTS results should be computed. -Though NUTS is to slow too run during a workshop, it could be run afterwards. +Though NUTS is too slow to run during a workshop, it could be run afterwards. As the UNAIDS HIV estimates process occurs annually, requiring days to compute more accurate estimates is viable. That said, Malawi is one of the countries with the fewest number of districts. As NUTS took days to run in Malawi, for larger countries, with hundreds of districts, it may be impossible to run NUTS to convergence, and approximate methods may be required.

@@ -1925,7 +1925,7 @@

6.6.3.1 Further comparisons

6.6.3.2 Better quadrature grids

-

PCA-AGHQ is a sensible approach to allocating more computational to dimensions which contribute more to the integral in question. +

PCA-AGHQ is a sensible approach to allocating more computational effort to dimensions which contribute more to the integral in question. However, its application to Naomi surfaced instances where it overlooked potential benefits, or otherwise did not behave as one might wish:

  1. The amount of variation explained in the Hessian matrix may not be of direct interest. diff --git a/docs/search_index.json b/docs/search_index.json index d1ec1a8..69a2504 100644 --- a/docs/search_index.json +++ b/docs/search_index.json @@ -1 +1 @@ -[["index.html", "Bayesian spatio-temporal methods for small-area estimation of HIV indicators Welcome Acknowledgements Abbreviations Notations", " Bayesian spatio-temporal methods for small-area estimation of HIV indicators Adam Thomas Howes Abstract Progress towards ending AIDS as a public health threat by 2030 is not being made fast enough. Effective public health response requires accurate, timely, high-resolution estimates of epidemic and demographic indicators. Limitations of available data and statistical methodology make obtaining these estimates difficult. I developed and applied Bayesian spatio-temporal methods to meet this challenge. First, I used scoring rules to compare models for area-level spatial structure with both simulated and real data. Second, I estimated district-level HIV risk group proportions, enabling behavioural prioritisation of prevention services, as put forward in the UNAIDS Global AIDS Strategy. Third, I developed a novel deterministic Bayesian inference method, combining adaptive Gauss-Hermite quadrature with principal component analysis, motivated by the Naomi district-level model of HIV indicators. In developing this method, I implemented integrated nested Laplace approximations using automatic differentiation, enabling use of this algorithm for a wider class of models. Together, the contributions in this thesis help to guide precision HIV policy in sub-Saharan Africa, as well as advancing Bayesian methods for spatio-temporal data. Welcome This is the e-book version of my PhD thesis, submitted to Imperial College London in accordance with the requirements of the degree of Doctor of Philosophy in Modern Statistics and Statistical Machine Learning. 
If you would prefer, you can view the PDF version. The associated GitHub repository for this thesis is athowes/thesis. A concise introduction to the work is available via my thesis defense slides, or (slightly less concise) longer slides for a lab group meeting. The corrections for this thesis are also available online. If you notice any typos or other issues with the work, feel free to open an issue on GitHub, or submit a pull request. Acknowledgements I would first like to express my gratitude to Seth Flaxman and Jeff Imai-Eaton for their mentorship. Their guidance has been crucial in shaping this thesis, and my development as a scientist. Thanks to the HIV Inference Group at Imperial for exposing me to impact driven research, helping me to learn to present my work, and tolerating a statistician. I am grateful to have been a part of the Modern Statistics and Statistical Machine Learning Centre for Doctoral Training at Imperial and Oxford, and the Machine Learning and Global Health Network. Thanks to Antoine, Chris, Enrico, Phil, Yanni, Tim, Liza, and Theo for conversations, some of which were about research. This work was made possible by funding provided by the EPSRC and Bill & Melinda Gates Foundation. There are many worse ways to spend billions of dollars than fighting poverty and disease. Thanks to Mike McLaren, Kevin Esvelt, the Nucleic Acid Observatory team, and the Sculpting Evolution lab for hosting my visit to the MIT Media Lab. I left Cambridge with appropriately raised aspirations, Google document templates, and only a little terrified about the future. Thanks to Trenton, Lenni, Lenny, Geetha, Janika, Simon, Phil, Frances, Leilani and Tammy. Thanks to Alex Stringer, and the Department of Statistics and Actuarial Science, for hosting my visit to the University of Waterloo. Without Alex, Chapter 6 would not have been possible, and I’d still be waiting Markov chains began in Chapter 4 to converge. 
Tim Lucas and Patrick Brown put me in touch with Alex, and Håvard Rue and Finn Lindgren gave helpful answers on the R-INLA discussion group. Thanks also to Kate, my tour guide in Waterloo, and Midtown Yoga for helping me stay balanced. My sense for what matters has been shaped, and arguably improved, by the Effective Altruism community. Thank you to the Meridian, Trajan, and LEAH offices for hosting me this final year. Thanks to my housemates in Hackney: August, Dewi, Henry, Jerome, Johnny, and Tamara. Not to be all Bay area, but I’m proud of the community we’ve built. Pınar believed in me and my research at times when I didn’t. Thanks to Mr Sam, and attendees of the Manshead grit salt, for conferring upon me the status of stats man. No thanks to Simon Marshall, he didn’t help, if anything he held me back. I extend my deepest thanks to my parents, Deborah and Karl, and my grandparents, Kath and Tony, whose love and support have granted me the privilege to pursue my interests. Abbreviations Abbreviation Definition AIDS Acquired ImmunoDeficiency Syndrome AIS AIDS Indicator Survey ANC Antenatal Clinic AGHQ Adaptive Gauss-Hermite Quadrature ART Antiretroviral Therapy BIC Bayesian Information Criterion BF Bayes Factor CAR Conditionally Auto-regressive CCD Central Composite Design CDC Centers for Disease Control and Prevention CPO Conditional Predictive Ordinate CRPS Continuous Ranked Probability Score DALY Disability Adjusted Life Year DDC Data Defect Correlation DHS Demographic and Health Surveys DIC Deviance Information Criterion EB Empirical Bayes ECDF Empirical Cumulative Difference Function ELGM Extended Latent Gaussian Model ESS Effective Sample Size FSW Female Sex Worker(s) GA Gaussian Process GLM Generalised Linear Model GLMM Generalised Linear Mixed effects Model GMRF Gaussian Markov Random Field Global Fund Global Fund to Fight AIDS, Tuberculosis, and Malaria HMC Hamiltonian Monte Carlo HIV Human Immunodeficiency Virus ICAR Intrinsic Conditionally 
Auto-regressive IID Independent and Identically Distributed INLA Integrated Nested Laplace Approximation LM Linear Model LGM Latent Gaussian Model LS Log Score MCMC Markov Chain Monte Carlo MSM Men who have Sex with Men NUTS No-U-Turn Sampler PEP Post-Exposure Prophylaxis PEPFAR President’s Emergency Plan for AIDS Relief PHIA Population-based HIV Impact Assessment PIT Probability Integral Transform PLHIV People Living with HIV PPL Probabilistic Programming Language PrEP Pre-Exposure Prophylaxis PMTCT Prevention of Mother-to-Child Transmission PWID People Who Inject Drugs SAE Small-Area Estimation SR Scoring Rule SPSR Strictly Proper Scoring Rule SSA Sub-Saharan Africa STI Sexually Transmitted Infection TGP Transgender People TaSP Treatment as Prevention UNAIDS The Joint United Nations Programme on HIV/AIDS VI Variational Inference VMMC Voluntary Medical Male Circumcision WAIC Watanabe-Akaike Information Criterion Notations Notation Definition \\(\\propto\\) Proportional to. \\(\\mathbb{R}\\) The set of real numbers. \\(\\mathbb{Z}\\) The set of integers. \\(\\mathbb{Z}^+\\) The set of positive integers. \\(\\rho\\) HIV prevalence. \\(\\lambda\\) HIV incidence. \\(\\alpha\\) ART coverage. \\(\\mathcal{S}\\) Spatial study region \\(\\mathcal{S} \\subseteq \\mathbb{R}^2\\). \\(s \\in \\mathcal{S}\\) Point location. \\(\\mathcal{T}\\) Temporal study period \\(\\mathcal{T} \\subseteq \\mathbb{R}\\). \\(t \\in \\mathcal{T}\\) Time. \\(\\mathbf{y}\\) Data, a \\(n\\)-vector \\((y_1, \\ldots, y_n)\\). \\(\\boldsymbol{\\phi}\\) Parameters, a \\(d\\)-vector \\((\\phi_1, \\ldots, \\phi_d)\\). \\(\\mathbf{x}\\) Latent field, a \\(N\\)-vector \\((x_1, \\ldots, x_N)\\). \\(\\boldsymbol{\\theta}\\) Hyperparameters, a \\(m\\)-vector \\((\\theta_1, \\ldots, \\theta_m)\\). \\(x \\sim p(x)\\) \\(x\\) has the probability distribution \\(p(x)\\). \\(A_i\\) Areal unit. \\(A_i \\sim A_j\\) Adjacency between areal units. \\(\\mathbf{u}\\) Random effects, often spatial. 
\\(\\mathbf{H}\\) Hessian matrix. \\(\\mathbf{R}\\) Structure matrix. \\(\\mathbf{Q}\\) Precision matrix. \\(\\boldsymbol{\\mathbf{\\Sigma}}\\) Covariance matrix. \\(\\mathbf{M}^{-}\\) The generalised inverse of a (potentially rank-deficient) matrix \\(\\mathbf{M}\\). \\(\\mathcal{N}\\) Gaussian distribution. \\(k: \\mathcal{X} \\times \\mathcal{X} \\to \\mathbb{R}\\) Kernel function on the space \\(\\mathcal{X}\\). \\(A_i \\sim A_j\\) Adjacency between areal units. \\(\\mathcal{Q}\\) A set of quadrature nodes. \\(\\omega: \\mathcal{Q} \\to \\mathbb{R}\\) A quadrature weighting function. \\(\\mathcal{Q}(m, k)\\) Gauss-Hermite quadrature points in \\(m\\) dimensions with \\(k\\) nodes per dimension, constructed according to a product rule. \\(\\varphi\\) A standard (multivariate) Gaussian density. "],["introduction.html", "1 Introduction 1.1 Chapter overview", " 1 Introduction This thesis is about applied and methodological Bayesian statistics. It is applied and methodological in that the primary concern is real-world questions and the means to answer them. The statistical approach is Bayesian because probability theory is used to arrive at conclusions based on models for observed data. The applied focus of this thesis is in obtaining the strategic information needed to plan the response to the HIV (human immunodeficiency virus) epidemic in sub-Saharan Africa (SSA). Over 40 years since the beginning of the epidemic, HIV is the largest annual cause of disability adjusted life years (DALYs) among non-infants in SSA [Global Burden of Disease Collaborative Network (2019); Figure 1.1]. Quantification of the epidemic using statistics is a crucial part of the public health response. Effective implementation of HIV prevention and treatment requires strategic information. However, producing suitable estimates of relevant indicators is complicated by a range of statistical challenges. 
Figure 1.1: HIV is the largest cause of annual DALYs among individuals aged >1 year in SSA (Global Burden of Disease Collaborative Network 2019). One DALY represents the loss of the equivalent of one year of full health, and is calculated by the sum of years of life lost and years lost due to disability. Weights used to account for disability vary between 0 (full health) and 1 (death) depending on the severity of the condition. The data used were gathered in national household surveys or routinely collected from healthcare facilities providing HIV services. An important feature of these data are the location and time at which observations were recorded. Spatio-temporal data have important recurring commonalities across a diverse range of application settings. The work conducted in this thesis uses and aspires to contribute to techniques from spatio-temporal statistics. Computation is an essential part of modern statistical practice. Each project in this thesis, and the thesis itself, is accompanied by R (R Core Team 2022) code, hosted on GitHub at https://github.com/athowes. To facilitate reproducible research, the R package orderly (FitzJohn et al. 2023) was used to structure code repositories. 1.1 Chapter overview This thesis is structured as follows: Chapter 2 provides an overview of the HIV/AIDS epidemic and describes the challenges faced by surveillance efforts. Chapter 3 introduces the statistical concepts and notation used throughout the thesis, focusing on Bayesian modelling and computation, spatio-temporal statistics, and survey methods. Chapter 4: The prevailing model for spatial structure used in small-area estimation (Besag, York, and Mollié 1991) was intended to analyse a grid of pixels. In disease mapping, areas correspond to the administrative divisions of a country, which are typically not a grid. I used simulation and survey data studies to evaluate the practical consequences of this concern. 
Chapter 5: Adolescent girls and young women are a demographic group at disproportionate risk of HIV infection. The Global AIDS Strategy recommends prioritising interventions on the basis of behaviour to prevent the most new infections using the limited available resources. I estimated the size of behavioural risk groups across priority countries to enable implementation of this strategy. Additionally, I assessed the potential benefits of the strategy in terms of numbers of new infections prevented. This work (Howes et al. 2023) was included in the UNAIDS (Joint United Nations Programme on HIV/AIDS) Global AIDS Update 2022 and 2023. Chapter 6: The Naomi small-area estimation model (Eaton et al. 2021) is used by countries to estimate district-level HIV indicators. First, to allow for compatibility with Naomi, I implemented the integrated nested Laplace approximations using automatic differentiation, opening the door to a new class of fast, flexible, and accurate Bayesian inference algorithms. The implementation was using models for a clinical trial of an epilepsy drug, and for the prevalence of the parasitic worm Loa loa. Second, I developed an approximate Bayesian inference method combining adaptive Gauss-Hermite quadrature with principal components analysis. I applied these methods to data from Malawi, and analysed the consequences of the inference method choice for policy relevant outcomes. Chapter 7: Finally, I discuss contributions of the research, avenues for future work, and some broader reflections. Though chronological order is recommended, Chapters 4, 5 and 6 may be read in any order, or as stand-alone studies, if preferred. References Besag, Julian, Jeremy York, and Annie Mollié. 1991. “Bayesian image restoration, with two applications in spatial statistics.” Annals of the Institute of Statistical Mathematics 43 (1): 1–20. Eaton, Jeffrey W, Laura Dwyer-Lindgren, Steve Gutreuter, Megan O’Driscoll, Oliver Stevens, Sumali Bajaj, Rob Ashton, et al. 2021. 
“Naomi: a new modelling tool for estimating HIV epidemic indicators at the district level in sub-Saharan Africa.” Journal of the International AIDS Society 24: e25788. FitzJohn, Rich, Robert Ashton, Alex Hill, Martin Eden, Wes Hinsley, Emma Russell, and James Thompson. 2023. Orderly: Lightweight Reproducible Reporting. Global Burden of Disease Collaborative Network. 2019. “Global Burden of Disease Study 2019 (GBD 2019) Results.” Institute for Health Metrics and Evaluation (IHME). https://vizhub.healthdata.org/gbd-results/. Howes, Adam, Kathryn A. Risher, Van Kính Nguyen, Oliver Stevens, Katherine M. Jia, Timothy M. Wolock, Rachel T. Esra, et al. 2023. “Spatio-temporal estimates of HIV risk group proportions for adolescent girls and young women across 13 priority countries in sub-Saharan Africa.” PLOS Global Public Health 3 (4): 1–14. https://doi.org/10.1371/journal.pgph.0001731. R Core Team. 2022. R: A Language and Environment for Statistical Computing. Vienna, Austria: R Foundation for Statistical Computing. https://www.R-project.org. "],["hiv-aids.html", "2 The HIV/AIDS epidemic 2.1 Background 2.2 HIV surveillance", " 2 The HIV/AIDS epidemic 2.1 Background HIV is a retrovirus which infects humans. If untreated, infection with HIV can develop into a more advanced stage known as acquired immunodeficiency syndrome (AIDS). HIV primarily attacks a type of white blood cell vital for proper function of the immune system. As a result, AIDS is characterised by increased risk of developing opportunistic infections such as tuberculosis or Pneumocystis pneumonias, which can result in death. The first AIDS cases were reported in Los Angeles in the early 1980s (Gottlieb et al. 1981; Barré-Sinoussi et al. 1983). Since then, HIV has spread globally. Transmission occurs by exposure to specific bodily fluids of an infected person. The most common mode of transmission is via unprotected anal or vaginal sex. 
Transmission can also occur from a mother to her baby, or when drug injection equipment is shared. Approximately 86 million people have become infected with HIV, and of those 40 million have died of AIDS-related causes (UNAIDS 2023a). An ongoing global effort has been made to respond to the epidemic. The multifaceted response has been shaped by local communities, civil society organisations, national governments, research institutions, pharmaceutical companies, international agencies like the Joint United Nations Programme on HIV/AIDS (UNAIDS), and global health initiatives such like the President’s Emergency Plan for AIDS Relief (PEPFAR) and the Global Fund to Fight AIDS, Tuberculosis, and Malaria (the Global Fund). As an indication of the scale of the response, the investment of $100 billion by PEPFAR constitutes the “largest commitment by a single nation to address a single disease in history” (U.S. Department of State 2022). Figure 2.1: Globally, yearly new HIV infections peaked in 1995, and have since decreased by 59%. Yearly AIDS-related deaths peaked in 2004, and have since decreased by 68% (UNAIDS 2023a). Much of the global disease burden is concentrated in eastern and southern Africa, as well as western and central Africa. The unit “M” refers to millions. The colour palette used in this figure, and throughout the thesis, is that of Okabe and Ito (2008). It is designed to be colour-blind friendly, and the default used by Wilke (2019). Implementation of HIV prevention and treatment has significantly reduced the number of new HIV infections and AIDS-related deaths per year since their respective peaks (Figure 2.1). The most significant evidence-based interventions, in more or less chronological order of introduction, are described below: Condoms are an inexpensive and effective method for prevention of HIV and other sexually transmitted infections (STIs) such as Chlamydia trachomatis, Neisseria gonorrhoeae, syphilis, and Trichomonas vaginalis. 
Condom usage has increased significantly since 1990, which is estimated to have averted 117 million new HIV infections (Stover and Teng 2021). However, there remain significant but difficult to close gaps in condom usage. Antiretroviral therapy (ART) is a combination of drugs which stop the virus from replicating in the body. A person living with HIV who takes ART daily can live a full and healthy life, transforming what was once a terminal illness to a treatable chronic condition. Of the 39 million people living with HIV (PLHIV) in 2022, around 76% were accessing ART. The number of AIDS-related deaths, 21 million, estimated to have been averted by ART is staggering (UNAIDS 2023b). ART reduces the amount of virus in the blood and genital secretions. If the virus is undetectable then there is significant evidence that it cannot be transmitted sexually (Cohen et al. 2011; Broyles et al. 2023). For this reason, in addition to providing life saving treatment, ART also operates as prevention. Approaches to lowering risk of HIV transmission in this way are referred to as treatment as prevention (TaSP). Particular efforts have been made to provide pregnant women with ART to reduce the chance of mother-to-child transmission (MTCT) (Siegfried et al. 2011). Voluntary medical male circumcision (VMMC) partially protects against female-to-male HIV acquisition. Three landmark randomised control trials (RCTs) (Auvert et al. 2005; Gray et al. 2007; R. C. Bailey et al. 2007) found complete surgical removal of the foreskin to result in a reduction of HIV acquisition in men by 50-60%. Based on this evidence, VMMC has been recommended since 2007 by the World Health Organization (WHO) and UNAIDS as a key HIV intervention in high-prevalence settings (WHO and UNAIDS 2007). 
Scale up of VMMC across 15 priority countries between 2008 and 2019 is estimated to have already averted 340 thousand new HIV infections, though the future number of new HIV infections averted is likely to be much higher (McGillen et al. 2018; UNAIDS and WHO 2021). Pre-exposure prophylaxis (PrEP) and post-exposure prophylaxis (PEP) are antiretroviral drugs which can be taken before and after exposure to prevent transmission. PrEP has been shown to be effective at an individual level across a number of RCTs (Baeten et al. 2012; Thigpen et al. 2012), but there are few population-level studies. Though PEP cannot be studied with RCTs, observational studies indicate it is highly effective (Dominguez et al. 2016). These medical interventions are more costly than other prevention options, so are primarily useful in high risk settings. Though implementation of these interventions has enabled important progress, there remains much more to do. In 2022, 1.3 million people were newly infected with HIV and there were 630 thousand AIDS-related deaths, more than one death every minute (UNAIDS 2022). Bold fast-track targets have been set to accelerate the end of AIDS as a global public health threat by 2030 (UN General Assembly 2016). To meet these targets in the context of disruption to HIV services caused by the COVID-19 pandemic and a potential shortfall in HIV funding, renewed commitments are required (Economist Impact 2023). For available resources to have the greatest impact, it is important that the right HIV interventions are prioritised to the right populations, in the right place, and at the right time. By analogy to precision medicine, this paradigm has been termed precision public health (Khoury, Iademarco, and Riley 2016). While precision medicine tailors treatments to individuals, precision public health tailors treatments to populations. 
The importance of precision public health is underscored by the vast potential differences in the cost-effectiveness of any given intervention, with some interventions orders of magnitude more impactful than others (Ord 2013). Figure 2.2: Adult (15-49) HIV prevalence varies substantially both within and between countries in SSA. The estimates from 2023 were generated by country teams using the Naomi small-area estimation model in a process supported by UNAIDS, and are available from UNAIDS (2023a). White filled points are country-level estimates, and coloured points are district-level estimates. Results from Nigeria were not published. Data collection in the Cabo Delgado province of Mozambique was disrupted by conflict. Obtaining results for the Democratic Republic of the Congo required removing some districts from the model. Disease burden varies substantially across multiple spatial scales. In some countries, the epidemic is concentrated in small populations, and national HIV prevalence is low. In others, the epidemic is sustained by heterosexual transmission, and national HIV prevalence is higher (typically >1%). These two epidemic settings are sometimes described as concentrated and generalised, respectively. Most countries severely affected by HIV are in sub-Saharan Africa (SSA). It is estimated that 66% of the 39 million PLHIV worldwide live in SSA. HIV prevalence in adults aged 15-49 is above 10% in some countries in southern Africa. Some districts even exceed 20% (Figure 2.2). Indeed, just as there is variation between countries, there is variation within countries. As an illustration, adult HIV prevalence at the district municipality level in South Africa ranges from 6% in Namakwa to 30% in uMkhanyakude. Accordingly, the work in this thesis is centred on measurement of HIV at the district level in SSA. In all countries and contexts, some groups of people are at much higher risk than others. 
Groups of people at increased risk of HIV infection are known as key populations (KPs). Examples include men who have sex with men (MSM), female sex workers (FSW), people who inject drugs (PWID), and transgender people (TGP) (Stevens et al. 2023). KPs are often marginalised, and face legal and social barriers. Concentrated settings are defined by the majority of new HIV infections occurring in KPs and their sexual partners. In generalised settings like SSA, though concentrated subepidemics do occur (Tanser et al. 2014), risk is more diffuse across the population. In SSA adolescent girls and young women (AGYW) are a large demographic group at increased risk of HIV infection (Risher et al. 2021; Monod et al. 2023) but not typically considered a KP. Chapter 5 focuses on measurement of HIV for AGYW and FSW. There are a number of ways to practically implement differentiated HIV treatment and prevention services (Godfrey-Faussett et al. 2022). These include geographic and demographic prioritisation (Meyer-Rath et al. 2018), key population services (Organization et al. 2022), and risk screening based on individual-level risk characteristics (Jia et al. 2022). Each approach requires strategic information about HIV disease burden. This thesis focuses on using HIV surveillance to inform geographic and demographic prioritisation. 2.2 HIV surveillance HIV surveillance refers to the collection, analysis, interpretation and dissemination of data relating to HIV (Pisani et al. 2003). Surveillance can be used to track epidemic indicators, identify at-risk populations, uncover drivers of transmission, implement prevention and treatment programs, and assess their impact. Important indicators to measure include: HIV prevalence is the proportion \\(\\rho \\in [0, 1]\\) of a population who have HIV. The number of PLHIV is given by \\(N\\rho\\), where \\(N\\) is the (living) population size. 
Increases in HIV prevalence, and the number of PLHIV, can be caused either by new HIV infections or more PLHIV remaining alive by taking treatment. For this reason, caution should be taken in directly interpreting changes in HIV prevalence. Nonetheless, as a primary measure of population disease burden, HIV prevalence is vital in calculating all of the other indicators given below. HIV incidence is the rate \\(\\lambda > 0\\) of new HIV infections. In writing, HIV incidence is often given as a number of new infections per 1000 person-years. The number of new HIV infections that occur during a given time is the integral of the rate of HIV incidence over time \\(\\lambda_t\\) multiplied by the size of the susceptible population. Let \\(\\rho_t\\) be the HIV prevalence, and \\(N_t\\) be the population size, at time \\(t\\). Then the number of new HIV infections which occur during a given period of time is given by \\[ I = \\int \\lambda_t \\cdot (1 - \\rho_t) \\cdot N_t \\text{d}t. \\] Planning, delivery, and evaluation of prevention programming relies on estimates of HIV incidence and the number of new HIV infections. Knowing whether the rate of new infections is rising or declining within specific populations is crucial. ART coverage is the proportion \\(\\alpha \\in [0, 1]\\) of PLHIV who are on ART. The number of people taking ART is given by \\(N \\cdot \\rho \\cdot \\alpha\\). Estimates of ART coverage play a direct role in planning provision of treatment services, and finding unmet treatment need. Recent infection is the proportion \\(\\kappa \\in [0, 1]\\) of PLHIV who have been recently infected. Recency assays use biomarkers to distinguish between recent and longstanding infection, with varying sensitivity and specificity. Estimates of recent infection are primarily used to help estimate HIV incidence (Kassanjee et al. 2012; UNAIDS, WHO, et al. 2022). Awareness of status is the proportion \\(\\xi \\in [0, 1]\\) of PLHIV who have been diagnosed with HIV. 
Programming of HIV testing and diagnosis is informed by estimates of awareness of HIV status. HIV diagnosis allows for linkage to care and progression along the HIV treatment cascade and care continuum (CDC 2014). 2.2.1 Data Measuring the HIV indicators above requires data. To give the most complete picture of the epidemic, it is important to use multiple sources of data. The most prominent categories are: Household surveys are large, national, cross-sectional studies. The surveys conducted in the most countries are Demographic and Health Surveys [DHS; USAID (2012)], which include a wide range of health-related questions, and more HIV-specific Population-based HIV Impact Assessment [PHIA; ICAP (2023)] and AIDS Indicator Surveys (AIS). Some countries also implement their own survey series, such as the South Africa Behavioural, Sero-status and Media Impact Survey (SABSSM). Household surveys provide high quality standardised data about HIV, typically designed to furnish nationally-representative estimates. Both DHS and PHIA surveys collect demographic, behavioural, and clinical information. Additionally, HIV testing is conducted via home-based testing, with results returned immediately, or anonymous dried blood spot testing. Programmatic data refer to data routinely collected during delivery of health services. Examples include data from antenatal care (ANC), HIV testing, and ART service delivery. Due to their integration with regular service delivery, programmatic data are available at higher frequency than other data sources. However, in comparison with designed studies, less control can be exercised over collection of programmatic data. It is common to encounter issues of data quality and reliability, as well as bias, in working with programmatic data. Cohort studies follow a group of people over time. Outcomes may be measured more systematically in a cohort study than in other study designs. 
The data from cohort studies have particular use in informing otherwise difficult to estimate epidemiological parameters. Such parameters include disease progression and mortality rates, transmission dynamics, and treatment outcomes. Examples of population-based cohort studies in SSA include the Manicaland Project Open Cohort Study in Zimbabwe (Gregson et al. 2006), the Rakai Community Cohort Study in Uganda (Grabowski et al. 2017), and the Karonga Demographic Surveillance Site in Malawi (Crampin et al. 2012). 2.2.2 Challenges Obtaining reliable, timely estimates of the HIV indicators at an appropriate spatial resolution using the available data sources is challenging. The most significant difficulties faced are enumerated below, providing important context for the work in this thesis: Data sparsity: Collection of data is costly and time consuming. As a result, limited direct data might be available for the particular time, location, or population of interest. For example, in many countries the last conducted household survey is several years out of date. Furthermore, the sample sizes in household surveys are typically designed to be representative at the national level. As a result, data for subpopulations are usually sparse. Missing data: The sampling frame of a survey may not correspond to the target population. For example, some KPs are difficult to reach, and may be omitted from sampling frames (Jin, Restar, and Beyrer 2021). Additionally, individuals included on the sampling frame may choose not to respond. Each of these issues can be characterised as a problem of missing data. Response and measurement biases: Individuals may be hesitant to disclose their HIV status, or report higher risk behaviours, due to social desirability bias or a fear of discrimination or stigma. Furthermore, individuals may be unaware of their HIV status. 
When available, biomarker data can be used to overcome under-reporting of HIV status, but still may be subject to measurement errors. Biases in behavioural data can be more difficult to disentangle. Denominators and demography: Many indicators are rates or proportions, which rely on estimates of the population at risk in the denominator. For example, HIV prevalence is a proportion of the population, and HIV incidence is a rate per person-years at risk. Accurately estimating population denominators over space, time, and demographies is itself a challenging task (Tatem 2017). Taking a ratio of uncertain quantities amplifies uncertainty, but is rarely properly accounted for. Inconsistent data collection and reporting: The sources of data that are collected might vary across space and time. Additionally, reporting protocols or definitions for the same data source can also change. Though household surveys tend to be more consistent than programmatic data, the questions included and design of the surveys do change. Reliance on epidemiological parameters: Indicators rely on estimates of epidemiological parameters such as rates of disease progression. These parameters may not generalise to the setting of interest. Further, they are typically applied coarsely, and without proper accounting for uncertainty. 2.2.3 Statistical approaches The challenges above make direct interpretation of the data often misleading or impossible. Careful statistical modelling is required to mitigate these limitations as effectively as possible. The most important statistical approaches for estimating HIV indicators used in this thesis are: Borrowing information: When little direct data are available, data judged to be indirectly related can be used to help improve estimation. For example, if limited data are available for individuals of a certain age, it is likely reasonable to make use of data for individuals of a similar age. 
As well as over age groups, information can be borrowed between and within countries, and across times. Chapter 4 discusses models for borrowing information over space. These models, along with others for borrowing information in other dimensions, are applied in Chapters 5 and 6. Evidence synthesis: Multiple sources of evidence can be combined to overcome the limitations of any one data source. For example, infrequently run household surveys can be complemented by more up-to-date programmatic data. Chapter 6 develops methods suitable for the complex statistical models required to integrate data sources. Multiple data sources are used in Chapter 5 to overcome the limitations of household surveys for measuring KP population sizes. Expert guidance: Expert epidemiological, demographic, and local stakeholder guidance can be used to improve estimates. Ensuring the quality of any data used in the estimation process is essential. Indeed, careful validation of data by country teams is a crucial part of the yearly UNAIDS HIV estimates process. Uncertainty quantification: Conclusions drawn by synthesising multiple incomplete data sources are unlikely to be firm and unanimous. It is therefore especially important that the uncertainties inherent to any statistical analysis are accurately and transparently presented. The Bayesian statistical paradigm introduced in Chapter 3 and used throughout this thesis is particularly well suited to handling of uncertainty. References Auvert, Bertran, Dirk Taljaard, Emmanuel Lagarde, Joelle Sobngwi-Tambekou, Rémi Sitta, and Adrian Puren. 2005. “Randomized, controlled intervention trial of male circumcision for reduction of HIV infection risk: the ANRS 1265 Trial.” PLOS Medicine 2 (11): e298. Baeten, Jared M, Deborah Donnell, Patrick Ndase, Nelly R Mugo, James D Campbell, Jonathan Wangisi, Jordan W Tappero, et al. 2012. 
“Antiretroviral Prophylaxis for HIV Prevention in Heterosexual Men and Women.” New England Journal of Medicine 367 (5): 399–410. Bailey, Robert C, Stephen Moses, Corette B Parker, Kawango Agot, Ian Maclean, John N Krieger, Carolyn FM Williams, Richard T Campbell, and Jeckoniah O Ndinya-Achola. 2007. “Male circumcision for HIV prevention in young men in Kisumu, Kenya: a randomised controlled trial.” The Lancet 369 (9562): 643–56. Barré-Sinoussi, Françoise, Jean-Claude Chermann, Fran Rey, Marie Therese Nugeyre, Sophie Chamaret, Jacqueline Gruest, Charles Dauguet, et al. 1983. “Isolation of a T-lymphotropic retrovirus from a patient at risk for acquired immune deficiency syndrome (AIDS).” Science 220 (4599): 868–71. Broyles, Laura N, Robert Luo, Debi Boeras, and Lara Vojnov. 2023. “The risk of sexual transmission of HIV in individuals with low-level HIV viraemia: a systematic review.” The Lancet. CDC. 2014. “Understanding the HIV Care Continuum.” CDC. http://www.cdc.gov/hiv/pdf/dhap_continuum.pdf. Cohen, Myron S, Ying Q Chen, Marybeth McCauley, Theresa Gamble, Mina C Hosseinipour, Nagalingeswaran Kumarasamy, James G Hakim, et al. 2011. “Prevention of HIV-1 infection with early antiretroviral therapy.” New England Journal of Medicine 365 (6): 493–505. Crampin, Amelia C, Albert Dube, Sebastian Mboma, Alison Price, Menard Chihana, Andreas Jahn, Angela Baschieri, et al. 2012. “Profile: the Karonga health and demographic surveillance system.” International Journal of Epidemiology 41 (3): 676–85. Dominguez, Kenneth L., Dawn K. Smith, Vasavi Thomas, Nicole Crepaz, Karen Lang, Walid Heneine, Janet M. McNicholl, et al. 2016. “Updated Guidelines for Antiretroviral Postexposure Prophylaxis After Sexual, Injection Drug Use, or Other Nonoccupational Exposure to HIV—United States, 2016.” https://stacks.cdc.gov/view/cdc/38856. Economist Impact. 2023. 
“A triple dividend: the health, social and economic gains from financing the HIV response in Africa.” Godfrey-Faussett, Peter, Luisa Frescura, Quarraisha Abdool Karim, Michaela Clayton, Peter D Ghys, and the 2025 prevention targets working group. 2022. “HIV Prevention for the Next Decade: Appropriate, Person-Centred, Prioritised, Effective, Combination Prevention.” PLOS Medicine 19 (9): e1004102. Gottlieb, Michael S, Howard M Schanker, Peng Thim Fan, Andrew Saxon, Joel D Weisman, Irving Pozalski, et al. 1981. “Pneumocystis pneumonia—Los Angeles.” Morbidity and Mortality Weekly Report 30 (21): 1–3. Grabowski, M Kate, David M Serwadda, Ronald H Gray, Gertrude Nakigozi, Godfrey Kigozi, Joseph Kagaayi, Robert Ssekubugu, et al. 2017. “HIV prevention efforts and incidence of HIV in Uganda.” New England Journal of Medicine 377 (22): 2154–66. Gray, Ronald H, Godfrey Kigozi, David Serwadda, Frederick Makumbi, Stephen Watya, Fred Nalugoda, Noah Kiwanuka, et al. 2007. “Male circumcision for HIV prevention in men in Rakai, Uganda: a randomised trial.” The Lancet 369 (9562): 657–66. Gregson, Simon, Geoffrey P Garnett, Constance A Nyamukapa, Timothy B Hallett, James JC Lewis, Peter R Mason, Stephen K Chandiwana, and Roy M Anderson. 2006. “HIV decline associated with behavior change in eastern Zimbabwe.” Science 311 (5761): 664–66. ICAP. 2023. “Population-based HIV impact assessment: guiding the global HIV response.” https://phia.icap.columbia.edu. Jia, Katherine M, Hallie Eilerts, Olanrewaju Edun, Kevin Lam, Adam Howes, Matthew L Thomas, and Jeffrey W Eaton. 2022. “Risk scores for predicting HIV incidence among adult heterosexual populations in sub-Saharan Africa: a systematic review and meta-analysis.” Journal of the International AIDS Society 25 (1): e25861. Jin, Harry, Arjee Restar, and Chris Beyrer. 2021. “Overview of the Epidemiological Conditions of HIV Among Key Populations in Africa.” Journal of the International AIDS Society 24: e25716. Kassanjee, Reshma, Thomas A. 
McWalter, Till Bärnighausen, and Alex Welte. 2012. “A New General Biomarker-Based Incidence Estimator.” Epidemiology 23 (5). Khoury, Muin J, Michael F Iademarco, and William T Riley. 2016. “Precision public health for the era of precision medicine.” American Journal of Preventive Medicine 50 (3): 398–401. McGillen, Jessica B, John Stover, Daniel J Klein, Sinokuthemba Xaba, Getrude Ncube, Mutsa Mhangara, Geraldine N Chipendo, et al. 2018. “The Emerging Health Impact of Voluntary Medical Male Circumcision in Zimbabwe: An Evaluation Using Three Epidemiological Models.” PLOS One 13 (7): e0199453. Meyer-Rath, Gesine, Jessica B McGillen, Diego F Cuadros, Timothy B Hallett, Samir Bhatt, Njeri Wabiri, Frank Tanser, and Thomas Rehle. 2018. “Targeting the Right Interventions to the Right People and Places: The Role of Geospatial Analysis in HIV Program Planning.” AIDS (London, England) 32 (8): 957. Monod, Mélodie, Andrea Brizzi, Ronald M. Galiwango, Robert Ssekubugu, Yu Chen, Xiaoyue Xi, Edward Nelson Kankaka, et al. 2023. “Longitudinal Population-Level HIV Epidemiologic and Genomic Surveillance Highlights Growing Gender Disparity of HIV Transmission in Uganda.” Nature Microbiology. Okabe, Masataka, and Kei Ito. 2008. “Color Universal Design (CUD): How to Make Figures and Presentations That Are Friendly to Colorblind People.” 2008. http://jfly.iam.u-tokyo.ac.jp/color/. Ord, Toby. 2013. “The moral imperative toward cost-effectiveness in global health.” Center for Global Development 12. Organization, World Health et al. 2022. Consolidated Guidelines on HIV, Viral Hepatitis and STI Prevention, Diagnosis, Treatment and Care for Key Populations. World Health Organization. Pisani, Elizabeth, Stefano Lazzari, Neff Walker, and Bernhard Schwartländer. 2003. “HIV surveillance: a global perspective.” Journal of Acquired Immune Deficiency Syndromes 32: S3–11. Risher, Kathryn A, Anne Cori, Georges Reniers, Milly Marston, Clara Calvert, Amelia Crampin, Tawanda Dadirai, et al. 2021. 
“Age patterns of HIV incidence in eastern and southern Africa: a modelling analysis of observational population-based cohort studies.” The Lancet HIV 8 (7): e429–39. Siegfried, Nandi, Lize van der Merwe, Peter Brocklehurst, and Tin Tin Sint. 2011. “Antiretrovirals for reducing the risk of mother-to-child transmission of HIV infection.” Cochrane Database of Systematic Reviews, no. 7. Stevens, Oliver, Keith Sabin, Rebecca Anderson, Sonia Arias Garcia, Kalai Willis, Amrita Rao, Anne F. McIntyre, et al. 2023. “Population size, HIV prevalence, and antiretroviral therapy coverage among key populations in sub-Saharan Africa: collation and synthesis of survey data 2010-2023.” medRxiv. https://www.medrxiv.org/content/early/2023/11/22/2022.07.27.22278071. Stover, John, and Yu Teng. 2021. “The impact of condom use on the HIV epidemic.” Gates Open Research 5. Tanser, Frank, Tulio de Oliveira, Mathieu Maheu-Giroux, and Till Bärnighausen. 2014. “Concentrated HIV sub-epidemics in generalized epidemic settings.” Current Opinion in HIV and AIDS 9 (2): 115. Tatem, Andrew J. 2017. “WorldPop, open data for spatial demography.” Scientific Data 4 (1): 1–4. Thigpen, Michael C, Poloko M Kebaabetswe, Lynn A Paxton, Dawn K Smith, Charles E Rose, Tebogo M Segolodi, Faith L Henderson, et al. 2012. “Antiretroviral Preexposure Prophylaxis for Heterosexual HIV Transmission in Botswana.” New England Journal of Medicine 367 (5): 423–34. UN General Assembly. 2016. “Political Declaration on HIV and AIDS: On the Fast Track to Accelerate the Fight Against HIV and to End the AIDS Epidemic by 2030.” In. UNAIDS. 2022. “In Danger: UNAIDS Global AIDS Update 2022.” https://www.unaids.org/en/resources/documents/2022/in-danger-global-aids-update. ———. 2023a. “AIDSinfo: Global data on HIV epidemiology and response.” https://aidsinfo.unaids.org/. ———. 2023b. “The path that ends AIDS: UNAIDS Global AIDS Update 2023.” https://www.unaids.org/en/resources/documents/2023/global-aids-update-2023. UNAIDS and WHO. 
2021. “Voluntary Medical Male Circumcision Progress Brief.” UNAIDS. https://hivpreventioncoalition.unaids.org/wp-content/uploads/2021/04/JC3022_VMMC_4-pager_En_v3.pdf. UNAIDS, WHO, et al. 2022. Using Recency Assays for HIV Surveillance: 2022 Technical Guidance. World Health Organization. U.S. Department of State. 2022. “Latest Global Program Results.” https://www.state.gov/wp-content/uploads/2022/11/PEPFAR-Latest-Global-Results_December-2022.pdf. USAID. 2012. “Sampling and Household Listing Manual: Demographic and Health Surveys Methodology.” https://dhsprogram.com/pubs/pdf/DHSM4/DHS6_Sampling_Manual_Sept2012_DHSM4.pdf. WHO and UNAIDS. 2007. “New Data on Male Circumcision and HIV Prevention: Policy and Programme Implications.” Geneva: World Health Organization. Wilke, Claus O. 2019. Fundamentals of Data Visualization: A Primer on Making Informative and Compelling Figures. O’Reilly Media. "],["bayes-st.html", "3 Bayesian spatio-temporal statistics 3.1 Bayesian statistics 3.2 Spatio-temporal statistics 3.3 Model structure 3.4 Model comparison 3.5 Survey methods", " 3 Bayesian spatio-temporal statistics 3.1 Bayesian statistics Bayesian statistics is a mathematical paradigm for learning from data. Two reasons stand out as to why it is especially well suited to facing the challenges presented in Section 2.2. First, it allows for principled and flexible integration of prior domain knowledge. Second, uncertainty over all unknown quantities is handled as an integral part of the Bayesian paradigm. This section provides a brief and at times opinionated overview of Bayesian statistics. For a more complete introduction, I recommend Gelman et al. (2013), McElreath (2020) or Gelman et al. (2020). 3.1.1 Bayesian modelling The Bayesian approach to data analysis is based on construction of a probability model for the observed data \\(\\mathbf{y} = (y_1, \\ldots, y_n)\\). 
Parameters \\(\\boldsymbol{\\mathbf{\\phi}} = (\\phi_1, \\ldots, \\phi_d)\\) are used to describe features of the data. Both the data and parameters are assumed to be random variables, and their joint probability distribution is written as \\(p(\\mathbf{y}, \\boldsymbol{\\mathbf{\\phi}})\\). Subsequent calculations, and the conclusions which follow from them, are made based on manipulating the model using probability theory. Models are most naturally constructed from two parts, known as the likelihood \\(p(\\mathbf{y} \\, | \\, \\boldsymbol{\\mathbf{\\phi}})\\) and the prior distribution \\(p(\\boldsymbol{\\mathbf{\\phi}})\\). The joint distribution is obtained by the product of these two parts \\[\\begin{equation} p(\\mathbf{y}, \\boldsymbol{\\mathbf{\\phi}}) = p(\\mathbf{y} \\, | \\, \\boldsymbol{\\mathbf{\\phi}}) p(\\boldsymbol{\\mathbf{\\phi}}). \\tag{3.1} \\end{equation}\\] The likelihood, as a function of \\(\\boldsymbol{\\mathbf{\\phi}}\\) with \\(\\mathbf{y}\\) fixed, reflects the probability of observing the data when the value of the parameters is \\(\\boldsymbol{\\mathbf{\\phi}}\\). The prior distribution encapsulates beliefs about the parameters \\(\\boldsymbol{\\mathbf{\\phi}}\\) before the data are observed. Recommendations for specifying prior distributions vary. The extent to which subjective information should be incorporated into the prior distribution is a central issue. Proponents of the objective Bayesian paradigm (Berger 2006) put forward that the prior distribution should be non-informative, so as not to introduce subjectivity into the analysis. Others see subjectivity as fundamental to scientific inquiry, with no viable alternative (Goldstein 2006). Though subjectivity is typically discussed with regard to the prior distribution, as we will see in Section 3.3, the distinction between prior distribution and likelihood is not always clear. 
As such, it may be argued that issues of subjectivity are not unique to prior distribution specification, and ultimately that the challenge of specifying the data generating process – that is, \\(p(\\mathbf{y}, \\boldsymbol{\\mathbf{\\phi}})\\) – is better thought of more holistically (Gelman, Simpson, and Betancourt 2017). The probability model can be simulated from to obtain samples \\((\\mathbf{y}, \\boldsymbol{\\mathbf{\\phi}}) \\sim p(\\mathbf{y}, \\boldsymbol{\\mathbf{\\phi}})\\). If samples of the data \\(\\mathbf{y}\\) differ too greatly from what the analyst would expect to see in reality, then the model fails to capture their prior scientific understanding. Models that do not produce plausible data samples can be refined. Checks of this kind [Gelman et al. (2013); Chapter 6] can be used to help iteratively build models, gradually adding complexity as required. 3.1.2 Bayesian computation Figure 3.1: An example of Bayesian modelling and computation for a simple one parameter model. Here the likelihood is \\(y_i \\sim \\text{Poisson}(\\phi)\\) for \\(i = 1, 2, 3\\) and the prior distribution on the rate parameter \\(\\phi > 0\\) is \\(\\phi \\sim \\text{Gamma}(3, 1)\\). Observed data \\(\\mathbf{y} = (1, 2, 3)\\) were simulated from the distribution \\(\\text{Poisson}(2.5)\\). As such, the true data generating process is within the space of models being considered. This situation is sometimes known (Bernardo and Smith 2001) as the \\(\\mathcal{M}\\)-closed world, in contrast to the \\(\\mathcal{M}\\)-open world where the model is said to be misspecified. Further, the posterior distribution is available in closed form as \\(\\text{Gamma}(9, 4)\\). This is because the posterior distribution is in the same family of probability distributions as the prior distribution. Models of this kind are described as being conjugate. Conjugate models are often used because of their convenience. 
Though other models may be more suitable, Bayesian inference will typically be more computationally demanding than for conjugate models. The posterior distribution here is more tightly peaked than the prior distribution. Contraction of this kind is typical, but not always the case. Having constructed a model (Equation (3.1)), the primary goal in a Bayesian analysis is to obtain the posterior distribution \\(p(\\boldsymbol{\\mathbf{\\phi}} \\, | \\, \\mathbf{y})\\). This distribution encapsulates probabilistic beliefs about the parameters given the observed data. As such, the posterior distribution has a central role in use of the statistical analysis for decision making. Using the eponymous Bayes’ theorem, the posterior distribution is obtained by \\[\\begin{equation} p(\\boldsymbol{\\mathbf{\\phi}} \\, | \\, \\mathbf{y}) = \\frac{p(\\mathbf{y}, \\boldsymbol{\\mathbf{\\phi}})}{p(\\mathbf{y})} = \\frac{p(\\mathbf{y} \\, | \\, \\boldsymbol{\\mathbf{\\phi}}) p(\\boldsymbol{\\mathbf{\\phi}})}{p(\\mathbf{y})}. \\tag{3.2} \\end{equation}\\] Unfortunately, most of the time it is intractable to calculate the posterior distribution analytically. This is because of the potentially high-dimensional integral \\[\\begin{equation} p(\\mathbf{y}) = \\int p(\\mathbf{y}, \\boldsymbol{\\mathbf{\\phi}}) \\text{d}\\boldsymbol{\\mathbf{\\phi}} \\tag{3.3} \\end{equation}\\] in the denominator of Equation (3.2). The result of this integral is known as the evidence \\(p(\\mathbf{y})\\), and quantifies the probability of obtaining the data under the model. Hence, although it is easy to evaluate a quantity proportional to the posterior distribution \\[\\begin{equation} p(\\boldsymbol{\\mathbf{\\phi}} \\, | \\, \\mathbf{y}) \\propto p(\\mathbf{y} \\, | \\, \\boldsymbol{\\mathbf{\\phi}}) p(\\boldsymbol{\\mathbf{\\phi}}), \\end{equation}\\] it is typically difficult to evaluate the posterior distribution itself. 
Further, even given a closed form expression for the posterior distribution, if \\(\\boldsymbol{\\mathbf{\\phi}}\\) is of moderate to high dimension, then it is not obvious how to evaluate expressions of interest, which usually themselves are integrals, or expectations, with respect to the posterior distribution. The difficulty in performing Bayesian inference may be thought of as analogous to the difficulty in calculating integrals. As with integration, in specific cases closed form analytic solutions are available. Figure 3.1 illustrates one such case, where the prior distribution and posterior distribution are in the same family of probability distributions. In the more general case, no analytic solution is available, and computational methods must be relied on. Broadly, computational strategies for approximating the posterior distribution (Martin, Frazier, and Robert 2023) may be divided into Monte Carlo algorithms and deterministic approximations. 3.1.2.1 Monte Carlo algorithms Monte Carlo algorithms (Robert and Casella 2005) aim to generate samples from the posterior distribution \\[\\begin{equation} \\boldsymbol{\\mathbf{\\phi}}_s \\sim p(\\boldsymbol{\\mathbf{\\phi}} \\, | \\, \\mathbf{y}), \\quad s \\in 1, \\ldots S. \\tag{3.4} \\end{equation}\\] These samples may be used in any future computations involving the posterior distribution or functions of it. For example, if \\(G = G(\\boldsymbol{\\mathbf{\\phi}})\\) is a function, then the expectation of \\(G\\) with respect to the posterior distribution can be approximated by \\[\\begin{equation} \\mathbb{E}(G \\, | \\, \\mathbf{y}) = \\int G(\\boldsymbol{\\mathbf{\\phi}}) p(\\boldsymbol{\\mathbf{\\phi}} \\, | \\, \\mathbf{y}) \\text{d} \\boldsymbol{\\mathbf{\\phi}} \\approx \\frac{1}{S} \\sum_{s = 1}^S G(\\boldsymbol{\\mathbf{\\phi}}_s), \\end{equation}\\] using the samples from the posterior distribution in Equation (3.4). 
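As a minimal sketch of this Monte Carlo estimate, the following Python snippet approximates two posterior expectations for the conjugate model of Figure 3.1, whose posterior is Gamma(9, 4). Python is used purely for illustration, and the function names, seed, and number of samples are assumptions made for this example rather than anything used elsewhere in the thesis. Independent samples can be drawn directly here only because the posterior is known in closed form; in general an algorithm such as MCMC would be needed to produce them.

```python
import random


def sample_posterior(num_samples, shape=9.0, rate=4.0, seed=1):
    # Draw independent samples from a Gamma(shape, rate) posterior.
    # random.gammavariate takes a shape and a SCALE, so we pass 1 / rate.
    rng = random.Random(seed)
    return [rng.gammavariate(shape, 1.0 / rate) for _ in range(num_samples)]


def monte_carlo_expectation(g, samples):
    # Approximate E(G | y) by the average of G over the posterior samples.
    return sum(g(phi) for phi in samples) / len(samples)


samples = sample_posterior(100_000)

# G(phi) = phi gives the posterior mean, whose exact value is 9 / 4 = 2.25.
posterior_mean = monte_carlo_expectation(lambda phi: phi, samples)

# An indicator function gives a posterior probability, here P(phi > 3 | y).
prob_above_three = monte_carlo_expectation(lambda phi: float(phi > 3.0), samples)

print('Monte Carlo posterior mean:', round(posterior_mean, 3))
print('Monte Carlo P(phi > 3 | y):', round(prob_above_three, 3))
```

The same averaging function is used for both quantities, which reflects the point that posterior means, variances, and probabilities are all expectations of some function of the parameters.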
Most quantities of interest can be cast as posterior expectations, which may then be approximated empirically using samples in this way. Of course, it remains to discuss how the samples are obtained. Markov chain Monte Carlo (MCMC) methods (Roberts and Rosenthal 2004) are the most popular class of sampling algorithms. Using MCMC, posterior samples are generated by simulating from an ergodic Markov chain with the posterior distribution as its stationary distribution. The Metropolis-Hastings [MH; Metropolis et al. (1953); Hastings (1970)] algorithm uses a proposal distribution \\(q(\\boldsymbol{\\mathbf{\\phi}}_{s + 1} \\, | \\, \\boldsymbol{\\mathbf{\\phi}}_s)\\) to generate candidate parameters for the next step in the Markov chain. These candidate parameters are then accepted or rejected with a probability determined by their log-posterior evaluation. Many MCMC algorithms, including the Gibbs sampler (Geman and Geman 1984), can be thought of as special cases of MH. Other notable classes of sampling algorithms include importance sampling [IS; Tokdar and Kass (2010)] methods, which use weighted samples, sequential Monte Carlo [SMC; Chopin, Papaspiliopoulos, et al. (2020)] methods, which are based on sampling from a sequence of distributions, and approximate Bayesian computation [ABC; Sisson, Fan, and Beaumont (2018)], which works by comparing simulated data to observed data, and does not require evaluation of the log-posterior. Though these methods have found applications in specific domains, MCMC is currently more widely used. The most important benefits of MCMC are its generality, theoretical reliability, and implementation in accessible software packages. Illustrating the use of MCMC being supported by software, this thesis uses the No-U-Turn sampler [NUTS; Hoffman, Gelman, et al. (2014)], a Hamiltonian Monte Carlo [HMC; Duane et al. (1987); Neal et al. (2011)] algorithm, as implemented in the Stan (Carpenter et al. 
2017) probabilistic programming language (PPL). HMC uses derivatives of the posterior distribution to generate efficient MH proposal distributions based on Hamiltonian dynamics. Three tuning parameters control the behaviour of the HMC algorithm [Section 15.2; Stan Development Team (2023)]. NUTS automatically adapts these parameters based on local properties of the posterior distribution. Though not a one-size-fits-all solution, NUTS has been shown empirically to be a good choice for sampling from a range of posterior distributions. Figure 3.2 shows an example of using the NUTS MCMC algorithm to sample from a posterior distribution. After running an MCMC sampler, it is important that diagnostic checks are used to evaluate whether the Markov chain has reached its stationary distribution. If so, the Markov chain is said to have converged, and its samples may be used to compute posterior quantities. Though it is possible to detect poor convergence in some cases, we may never be sure that a Markov chain has converged, and thus that results computed from MCMC will be accurate. Panel 3.2B shows the traceplot for a Markov chain which appears to have converged, and moves freely through the range of plausible parameter values. A range of convergence diagnostics have been developed for MCMC (Roy 2020; C. C. Margossian and Gelman 2023). Two widely used examples are the potential scale reduction factor \\(\\hat R\\) (Gelman and Rubin 1992), which compares the variance between and within parallel Markov chains, and the effective sample size (ESS), which measures the efficiency of samples drawn from MCMC. Figure 3.2: NUTS can be used to sample from the posterior distribution described in Figure 3.1. Panel A shows a histogram of the NUTS samples as compared to the true posterior. The visual appearance of a histogram depends strongly on the number of bins chosen, though, unlike kernel density estimation, it does not depend on further tuning parameters.
Other visualisations, such as empirical cumulative difference function plots, though less initially intuitive, are preferred for accurate distributional sample comparisons. Panel B is a traceplot showing the path of the Markov chain \\(\\{\\phi_s\\}_{s = 1}^{1000}\\) as it explores the posterior distribution. In this case, the Markov chain moves freely throughout the posterior distribution, without getting stuck in any one location for long, indicating good performance of the sampler. Panel C shows convergence of the empirical posterior mean \\(\\frac{1}{s} \\sum_{l \\leq s} \\phi_l\\) to the true value of \\(\\mathbb{E}(\\phi)\\) as more iterations of the Markov chain are included in the sum. In this case, the samples from NUTS are highly accurate in estimating this posterior expectation. 3.1.2.2 Deterministic approximations The Monte Carlo methods discussed in Section 3.1.2.1 make use of stochasticity to generate samples from the posterior distribution. Deterministic approximations offer an alternative approach, often focused more directly on approximating the posterior distribution or posterior normalising constant. These approaches can be faster than Monte Carlo methods, especially for large datasets or models. That said, they lack strong theoretical guarantees of accuracy. One prominent deterministic approximation is the Laplace approximation. It involves approximating the posterior normalising constant using Laplace’s method of integration. This is equivalent to approximating the posterior distribution by a Gaussian distribution. Numerical integration, or quadrature, is another deterministic approach in which the posterior normalising constant is approximated using a weighted sum of evaluations of the unnormalised posterior distribution. The integrated nested Laplace approximation [INLA; Håvard Rue, Martino, and Chopin (2009)] combines quadrature with the Laplace approximation. These methods are used throughout this thesis. 
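As a minimal sketch of the Laplace approximation described above, consider approximating the normalising constant of a one-dimensional unnormalised density for which the exact answer is known. The example assumes a hypothetical unnormalised Beta density \\(f(\\phi) = \\phi^{a - 1} (1 - \\phi)^{b - 1}\\) with illustrative shape parameters \\(a = 30\\) and \\(b = 70\\), whose exact normalising constant is the beta function \\(B(a, b)\\). The function name `laplace_log_normaliser` is an assumption made for this illustration.

```python
import math


def laplace_log_normaliser(log_f, mode, neg_hessian):
    """Laplace's method in one dimension.

    Approximates log of the integral of f by log f(mode) + 0.5 * log(2 * pi / H),
    where H is the negative second derivative of log f at the mode.
    """
    return log_f(mode) + 0.5 * math.log(2 * math.pi) - 0.5 * math.log(neg_hessian)


# Hypothetical shape parameters, assumed for illustration only
a, b = 30.0, 70.0


def log_f(phi):
    """Log of the unnormalised Beta(a, b) density."""
    return (a - 1) * math.log(phi) + (b - 1) * math.log(1 - phi)


# The mode and curvature are available analytically for this density
mode = (a - 1) / (a + b - 2)
neg_hessian = (a - 1) / mode**2 + (b - 1) / (1 - mode) ** 2

log_z_laplace = laplace_log_normaliser(log_f, mode, neg_hessian)

# Exact log normalising constant: log B(a, b)
log_z_exact = math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)

relative_error = abs(math.exp(log_z_laplace - log_z_exact) - 1)
print(f"Relative error of the Laplace approximation: {relative_error:.4f}")
```

Here the density is close to Gaussian, so the approximation is accurate. For skewed or multimodal posteriors the Gaussian assumption is poorer, and in higher dimensions the mode and Hessian must usually be found numerically.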
In-depth discussion is left to Chapter 6. Variational inference [VI; Blei, Kucukelbir, and McAuliffe (2017)] is another important deterministic approximation. The well-known expectation maximisation [EM; Dempster, Laird, and Rubin (1977)] and expectation propagation [EP; Minka (2001)] algorithms are closely related to VI. In VI, the approximate posterior distribution is assumed to belong to a particular family of functions. Optimisation algorithms are then used to choose the best member of that family, typically by minimising the Kullback-Leibler divergence to the posterior distribution. VI lacks theoretical guarantees and is known to often inaccurately estimate posterior variances (Giordano, Broderick, and Jordan 2018). As such, statisticians tend to approach VI with caution, despite its relatively widespread acceptance within the machine learning community. Developing diagnostics to evaluate the accuracy of VI is an important area of ongoing research (Yao et al. 2018). 3.1.3 Interplay between modelling and computation Modern computational techniques and software like PPLs have succeeded in abstracting away calculation of the posterior distribution from the analyst for many models. However, depending on the measure used, computation arguably remains intractable in the majority of cases. The analyst therefore needs to be concerned not only with choosing a model suitable for the data, but also with choosing a model for which the posterior distribution may tractably be calculated in reasonable time. As such, there is an important interplay between modelling and computation, wherein models are bound by the limits of computation. As computational techniques and tools improve, the space of models available to the analyst expands. Expanding the space of models practically available to analysts is exactly the focus of Chapter 6. 3.2 Spatio-temporal statistics Space and time are important features of infectious disease data, including those related to HIV.
The field of spatio-temporal statistics (Cressie and Wikle 2015) is concerned with such observations, indexed by spatial and temporal location. It unifies the fields of spatial statistics (Bivand et al. 2008), concerned with observations indexed by space, and time series analysis (Shumway and Stoffer 2017), concerned with observations indexed by time. First, Section 3.2.1 characterises the shared properties of spatio-temporal data. Then, Section 3.2.2 describes how these properties facilitate the class of small-area estimation methods used in this thesis. 3.2.1 Properties of spatio-temporal data Three important properties are discussed in this section: scale, correlation structure, and size. 3.2.1.1 Scale Figure 3.3: In Panel A, the spatial location of Cape Town in South Africa can be considered a point, and the ZF Mgcawu District Municipality (DM) can be considered as an area. In Panel B, World AIDS Day, designated on the 1st of December every year, can be considered a point in time, whereas the second fiscal quarter, running through April, May and June, and denoted by Q2, represents a period of time. In reality, both Cape Town and World AIDS Day are areas, rather than true point locations. Instances of infinitesimal point locations in everyday life, outside of mathematical abstraction, are rare. The scale of spatio-temporal data refers to its extent and resolution. Its extent is the size of the spatial study region and length of time over which data were collected. Its resolution is how fine-grained those observations were. In this thesis, the spatial study region \\(\\mathcal{S} \\subseteq \\mathbb{R}^2\\) used is typically a country or collection of countries. It is assumed to have two dimensions, corresponding to latitude and longitude. Observations may be associated to a point \\(s \\in \\mathcal{S}\\) or area \\(A \\subseteq \\mathcal{S}\\) in the spatial study region, illustrated in Panel 3.3A.
The temporal study period \\(\\mathcal{T} \\subseteq \\mathbb{R}\\) can more generally be assumed to be one-dimensional. This feature, together with the fact that time only moves forward, is what distinguishes space and time. As with space, observations may be associated to a point \\(t \\in \\mathcal{T}\\) or period of time \\(T \\subseteq \\mathcal{T}\\), illustrated in Panel 3.3B. The change-of-support problem (Gelfand, Zhu, and Carlin 2001) occurs when data are modelled at a scale different to the one they were observed at. For example, in this thesis, particularly Chapter 4, point data are modelled at an area-level. Special cases of the change-of-support problem include downscaling, upscaling, and dealing with so-called misaligned data. It is also possible that spatio-temporal observations of the same process are made at multiple scales. Jointly modelling data at different scales is another challenge closely related to the change-of-support problem. 3.2.1.2 Correlation structure In “The Design of Experiments” Fisher (1936) observed that neighbouring crops were more likely to have similar yields than those far apart. This observation was later termed Tobler’s first law of geography: “everything is related to everything else, but near things are more related than distant things” (Tobler 1970). As well as space, Tobler’s first law applies to time, in that observations made close together in time tend to be similar. This law can be formalised using space-time covariance functions, measuring the dependence of observations across their spatial and temporal dimensions. A space-time covariance structure (Porcu, Furrer, and Nychka 2021) is said to be separable when it can be factorised as a product of individual spatial and temporal covariances, and nonseparable when it cannot. A separable space-time covariance could have spatial and temporal components which are either independent and identically distributed (IID) or structured (Knorr-Held 2000).
Spatial covariance functions are called isotropic when they apply equally in all directions, and stationary when they are invariant over space. Temporal covariance structures are often periodic, corresponding to daily, weekly, monthly, quarterly, or yearly cycles. That spatio-temporal data are rarely IID is a statistically important point. The consequence is that it is rare to have true replicates available. Typically, only a single instance of a spatio-temporal process can ever be realised. 3.2.1.3 Size Data with both spatial and temporal dimensions are often large. For example, observations collected every week across a number of sites in a country can easily number in the thousands. Storage and mathematical operations with large spatio-temporal data can be challenging. Further, models for spatio-temporal data typically require many parameters. Whereas large IID data can be modelled using a small number of parameters, each observation in a spatio-temporal dataset may need to be characterised by its own parameters. In combination, large data (big \\(n\\)) and models with a large number of parameters (big \\(d\\)) make Bayesian inference, and other complex mathematical operations, challenging for spatio-temporal data. 3.2.2 Small-area estimation Figure 3.4: Simulation of a simple random sample \\(y_i \\sim \\text{Bin}(m, p_i)\\) with varying sample size \\(m = 5, 25, 125\\) in each of the \\(i = 1, \\ldots, 156\\) constituencies of Zambia. Direct estimates were obtained by the empirical ratio of data to sample size. Modelled estimates were obtained using a logistic regression with linear predictor given by an intercept and a spatial random effect. Estimates of HIV indicators for Zambia have previously been generated at the district-level, comprising 116 spatial units. Moving forward, there is interest in generating estimates at the higher-resolution constituency level, as program planning is devolved locally.
The viridis colour palette, as implemented by the viridis R package (Garnier et al. 2023), was used in this figure. It is used often throughout this thesis because it is perceptually uniform and accessible to colourblind viewers (Smith and Walt 2015). This figure was adapted from a presentation given for the Zambia HIV Estimates Technical Working Group, available from https://github.com/athowes/zambia-unaids. Figure 3.5: The setting of this figure matches that of Figure 3.4. Estimates from surveys with higher sample size have higher sample Pearson correlation coefficient \\(R\\) with the underlying truth, illustrating the benefit of collecting more data. For a fixed sample size however, correlation can be improved by using modelled estimates to borrow information across spatial units, rather than using the higher variance direct estimates. Points along the dashed diagonal line correspond to agreement between the estimate obtained from the survey and the underlying truth used to generate the data. For each sample size, using a spatial model increases the correlation between the estimates and underlying truth. The effect is more pronounced for lower sample sizes. Data always have some cost to collect. This cost can be significant and prohibitive, especially for data relating to people, where collection is difficult to automate. In spatio-temporal statistics, there are a large number of possible locations in space and time. Given the cost of data collection, often no or limited direct observations may be available for any given space-time location. Direct estimates of indicators of interest are either impossible or inaccurate in this setting. Small-area estimation [SAE; Pfeffermann et al. (2013)] methods aim to overcome the limitations of small data by sharing information. In the spatio-temporal setting, sharing of information occurs across space and time.
Prior knowledge that observations in one spatio-temporal location are correlated with those at another (Section 3.2.1.2) can be used to improve estimates. Figures 3.4 and 3.5 illustrate the unreliability of direct estimates from small sample sizes, and the benefit of using a spatial model to overcome this limitation. The effect is most pronounced for the sample size of 5, where the only possible direct estimates are 0, 0.2, 0.4, 0.6, 0.8 and 1. Using a spatial model to borrow information across space in this case results in improvement of the Pearson correlation coefficient between the estimates and the true underlying values from 0.34 to 0.53. SAE methods are not only useful in the spatio-temporal setting. More generally, they apply in any situation where data are limited for subpopulations of interest. Just as these subpopulations can be generated by spatio-temporal variables, they can be generated by other variables. One such example is demographic variables. Analogous to spatio-temporal correlation structure, we also can often expect there to be demographic correlation structure. For example, those of the same sex are more likely to be similar, as are those of similar ages or socio-economic strata. 3.3 Model structure The spatio-temporal data used in this thesis are not IID (Section 3.2.1.2). This section discusses ways to use statistical models to encode more complex relations between observations mathematically. Simple structures are discussed first, beginning with the linear model. Extensions are introduced one at a time, culminating in the model structures used throughout the thesis. 3.3.1 Linear model In a linear model, each observation \\(y_i\\) with \\(i \\in [n]\\) is modelled using a Gaussian distribution \\[\\begin{equation} y_i \\sim \\mathcal{N}(\\mu_i, \\sigma). 
\\end{equation}\\] The conditional mean \\(\\mu_i\\) is assumed to be linearly related to a collection of \\(p\\) covariates \\(z_{1i}, \\ldots, z_{pi}\\) \\[\\begin{align} \\mu_i &= \\eta_i \\\\ \\eta_i &= \\beta_0 + \\sum_{l = 1}^{p} \\beta_l z_{li}. \\tag{3.5} \\end{align}\\] Priors may be placed on the regression coefficients, as well as the observation standard deviation \\[\\begin{align} \\beta_l &\\sim p(\\beta_l), \\quad l = 0, \\ldots, p, \\\\ \\sigma &\\sim p(\\sigma). \\end{align}\\] While the linear model provides a useful foundation, its strong assumptions and limited flexibility call for careful use. 3.3.2 Generalised linear model Generalised linear models (GLMs) extend the linear model by allowing the conditional mean \\(\\mu_i\\) to be connected to the linear predictor \\(\\eta_i\\) via an inverse link function \\(g\\) as follows \\[\\begin{align} y_i &\\sim p(y_i \\, | \\, \\eta_i), \\\\ \\mu_i &= \\mathbb{E}(y_i \\, | \\, \\eta_i) = g(\\eta_i). \\end{align}\\] The logistic function \\(g(\\eta) = \\exp(\\eta) / (1 + \\exp(\\eta))\\) is commonly used as an inverse link function to ensure that the conditional mean is in the range \\([0, 1]\\). Similarly, the exponential function \\(g(\\eta) = \\exp(\\eta)\\) can be used to ensure the conditional mean is positive. The linear model is a special case of a GLM where the inverse link function \\(g\\) is the identity. GLMs also admit a wider range of likelihoods \\(p(y_i \\, | \\, \\eta_i)\\) than linear models, typically restricted to the so-called exponential family of distributions. The equation for the linear predictor is the same as the linear model case in Equation (3.5). 3.3.3 Generalised linear mixed effects model In a generalised linear mixed effects model (GLMM) the linear predictor of the GLM is extended as follows \\[\\begin{equation} \\eta_i = \\beta_0 + \\sum_{l = 1}^{p} \\beta_l z_{li} + \\sum_{k = 1}^{r} u_k(w_{ki}). \\tag{3.6} \\end{equation}\\] The terms \\(\\beta_l\\) are referred to as fixed effects.
The terms \\(u_k\\) are called random effects, of additional covariates \\(w_{ki}\\). The terms fixed and random effects notoriously have many different and incompatible definitions, which can cause confusion (Gelman 2005). Random effects allow for more complex sharing of information between observations. To demonstrate this fact, first consider the model \\[\\begin{equation} \\eta_i = \\beta_0. \\end{equation}\\] In this model all observations are assumed to be equivalent, and as such information is said to be completely pooled together. Second, consider the so-called no pooling model \\[\\begin{equation} \\eta_i = \\beta_0 + \\beta_1 z_i, \\end{equation}\\] with \\(z_i \\in \\{0, 1\\}\\) a binary covariate. Now, there are two groups of observations, each with its own mean: \\(\\beta_0\\) for the first group and \\(\\beta_0 + \\beta_1\\) for the second. No information is shared between the two groups. Finally, consider an intermediate between these two extremes, known as the partial pooling model. In the partial pooling model, the extent to which information is shared between groups is learnt rather than fixed to either extreme at the outset, as with the complete or no pooling models. The parameter \\(\\beta_0\\) applies to all groups, and each group is differentiated by a specific value of the random effects \\(u_i\\). Random effects can be structured to share information between some observations more than others. In spatio-temporal statistics, structured spatial and temporal random effects are often used to encode smoothness in space or time. In contrast, unstructured random effects treat groups of observations as being exchangeable. Generalised additive models [GAMs; Wood (2017); Hastie and Tibshirani (1987)] are another class of models which extend GLMs.
Though GAMs place more of a focus on using \\(u_k\\) to model non-linear relationships between covariates and the response variable, they can also be cast to fit into the GLMM framework. 3.3.4 Latent Gaussian model Latent Gaussian models [LGMs; Håvard Rue, Martino, and Chopin (2009)] are a type of GLMM in which Gaussian priors are used for many of the model’s parameters. More specifically, these parameters are \\(\\beta_0\\), \\(\\{\\beta_l\\}\\), \\(\\{u_k(\\cdot)\\}\\), and can be collected into a vector \\(\\mathbf{x} \\in \\mathbb{R}^N\\) called the latent field. The Gaussian prior distribution is then \\[\\begin{equation} \\mathbf{x} \\sim \\mathcal{N}(\\mathbf{0}, \\mathbf{Q}(\\boldsymbol{\\mathbf{\\theta}}_2)^{-1}), \\end{equation}\\] where \\(\\boldsymbol{\\mathbf{\\theta}}_2 \\in \\mathbb{R}^{s_2}\\) are hyperparameters, with \\(s_2\\) assumed small. The vector \\(\\boldsymbol{\\mathbf{\\theta}}_1 \\in \\mathbb{R}^{s_1}\\), with \\(s_1\\) assumed small, contains additional parameters of the likelihood. Let \\(\\boldsymbol{\\mathbf{\\theta}} = (\\boldsymbol{\\mathbf{\\theta}}_1, \\boldsymbol{\\mathbf{\\theta}}_2) \\in \\mathbb{R}^m\\) with \\(m = s_1 + s_2\\) be all hyperparameters, with prior distribution \\(p(\\boldsymbol{\\mathbf{\\theta}})\\). The posterior distribution under an LGM is then \\[\\begin{equation} p(\\mathbf{x}, \\boldsymbol{\\mathbf{\\theta}} \\, | \\, \\mathbf{y}) \\propto p(\\mathbf{y} \\, | \\, \\mathbf{x}, \\boldsymbol{\\mathbf{\\theta}}) p(\\mathbf{x} \\, | \\, \\boldsymbol{\\mathbf{\\theta}}) p(\\boldsymbol{\\mathbf{\\theta}}), \\end{equation}\\] with the complete set of parameters \\(\\boldsymbol{\\mathbf{\\phi}} = (\\mathbf{x}, \\boldsymbol{\\mathbf{\\theta}})\\), and \\(N + m = d\\). In an LGM, like the more general GLMM case as given in Equation (3.6), there is a one-to-one correspondence between observations \\(y_i\\) and elements of the linear predictor \\(\\eta_i\\).
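To make the three-part factorisation of the LGM posterior concrete, the following sketch evaluates an unnormalised log-posterior for a deliberately small model. All specifics are assumptions made for illustration and are not models used in this thesis: a Poisson likelihood with \\(\\eta_i = \\beta_0 + u_i\\), a latent field \\(\\mathbf{x} = (\\beta_0, u_1, \\ldots, u_n)\\) with diagonal precision matrix, a single hyperparameter \\(\\theta = \\log \\tau\\), and an exponential prior on the precision \\(\\tau\\). The name `log_joint` is hypothetical.

```python
import math


def log_joint(x, log_tau, y, prior_rate=1.0):
    """Unnormalised log-posterior: log p(y | x) + log p(x | theta) + log p(theta)."""
    tau = math.exp(log_tau)
    beta0, u = x[0], x[1:]

    # Likelihood: y_i ~ Poisson(exp(eta_i)) with eta_i = beta0 + u_i,
    # so each observation corresponds to one element of the linear predictor
    log_lik = 0.0
    for y_i, u_i in zip(y, u):
        eta_i = beta0 + u_i
        log_lik += y_i * eta_i - math.exp(eta_i) - math.lgamma(y_i + 1)

    # Latent field prior: x ~ N(0, Q(theta)^{-1}) with Q = diag(0.01, tau, ..., tau),
    # a vague prior on the intercept and IID random effects with precision tau
    precisions = [0.01] + [tau] * len(u)
    log_prior_x = sum(
        0.5 * math.log(q) - 0.5 * math.log(2 * math.pi) - 0.5 * q * v * v
        for q, v in zip(precisions, x)
    )

    # Hyperprior: tau ~ Exponential(prior_rate), including the Jacobian
    # term for working on the log_tau scale
    log_prior_theta = math.log(prior_rate) - prior_rate * tau + log_tau

    return log_lik + log_prior_x + log_prior_theta


y = [3, 5, 4, 6]                       # n = 4 hypothetical counts
x_near = [math.log(4.5), 0, 0, 0, 0]   # latent field with N = n + 1 = 5 elements
x_far = [5.0, 0, 0, 0, 0]

print(log_joint(x_near, 0.0, y), log_joint(x_far, 0.0, y))
```

Here \\(N = 5\\) and \\(m = 1\\), so \\(d = 6\\). A function of this form is the basic input required by both MCMC and the deterministic approximations of Section 3.1.2.2.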
3.3.5 Extended latent Gaussian model Extended latent Gaussian models [ELGMs; Stringer, Brown, and Stafford (2022)] facilitate modelling of data with greater non-linearities than an LGM. In an ELGM, the structured additive predictor is redefined as \\[\\begin{equation} \\boldsymbol{\\mathbf{\\eta}} = (\\eta_1, \\ldots, \\eta_{N_n}), \\end{equation}\\] where \\(N_n \\in \\mathbb{N}\\) is a function of \\(n\\). Unlike in the LGM case, it is possible that \\(N_n \\neq n\\). Each mean response \\(\\mu_i\\) now depends on some subset \\(\\mathcal{J}_i \\subseteq [N_n]\\) of indices of \\(\\boldsymbol{\\mathbf{\\eta}}\\), with \\(\\cup_{i = 1}^n \\mathcal{J}_i = [N_n]\\) and \\(1 \\leq |\\mathcal{J}_i| \\leq N_n\\), where \\([N_n] = \\{1, \\ldots, N_n\\}\\). The inverse link function \\(g(\\cdot)\\) is redefined for each observation to be a possibly many-to-one mapping \\(g_i: \\mathbb{R}^{|\\mathcal{J}_i|} \\to \\mathbb{R}\\), such that \\(\\mu_i = g_i(\\boldsymbol{\\mathbf{\\eta}}_{\\mathcal{J}_i})\\). Put together, ELGMs are of the form \\[\\begin{align*} y_i &\\sim p(y_i \\, | \\, \\boldsymbol{\\mathbf{\\eta}}_{\\mathcal{J}_i}, \\boldsymbol{\\mathbf{\\theta}}_1), \\quad i = 1, \\ldots, n, \\\\ \\mu_i &= \\mathbb{E}(y_i \\, | \\, \\boldsymbol{\\mathbf{\\eta}}_{\\mathcal{J}_i}) = g_i(\\boldsymbol{\\mathbf{\\eta}}_{\\mathcal{J}_i}), \\\\ \\eta_j &= \\beta_0 + \\sum_{l = 1}^{p} \\beta_l z_{lj} + \\sum_{k = 1}^{r} u_k(w_{kj}), \\quad j \\in [N_n]. \\end{align*}\\] The latent field and hyperparameter prior distributions are equivalent to the LGM case. Though the ELGM model class was only introduced recently, it connects much of the work done in this thesis.
While it can be transformed to an LGM using the Poisson-multinomial transformation (Baker 1994), the multinomial logistic regression model used in Chapter 5 is most naturally written as an ELGM, where each observation depends on the set of structured additive predictors corresponding to the set of multinomial observations. In Chapter 6, the Naomi small-area estimation model used to produce estimates of HIV indicators is shown to have ELGM-like features. 3.4 Model comparison Many models can be fit to the same data during the course of an analysis. Model comparison methods are used to determine which is the most suitable for use. This section focuses on measuring suitability via the model’s predictive performance (Vehtari and Ojanen 2012). Ideally, new data \\(\\tilde{\\mathbf{y}} = (\\tilde{y}_1, \\ldots, \\tilde{y}_n)\\) drawn from the true data generating process would be available to test predictive performance. The log predictive density for new data (LPD) (Gelman, Hwang, and Vehtari 2014) is one measure of out-of-sample predictive performance given by \\[\\begin{equation} \\text{lpd} = \\sum_{i = 1}^n \\log p(\\tilde y_i \\, | \\, \\mathbf{y}) = \\sum_{i = 1}^n \\log \\int p(\\tilde y_i \\, | \\, \\boldsymbol{\\mathbf{\\phi}}) p(\\boldsymbol{\\mathbf{\\phi}} \\, | \\, \\mathbf{y}) \\text{d}\\boldsymbol{\\mathbf{\\phi}}. \\tag{3.7} \\end{equation}\\] The expected LPD (ELPD) integrates the LPD over the data generating process \\(p(\\tilde y_i)\\) to give a measure of expected performance \\[\\begin{equation} \\text{elpd} = \\sum_{i = 1}^n \\int \\log p(\\tilde y_i \\, | \\, \\mathbf{y}) p(\\tilde y_i) \\text{d} \\tilde y_i. \\tag{3.8} \\end{equation}\\] In reality, such data are not usually available, and instead the ELPD must be approximated using the available data. 3.4.1 Information criteria Information criteria can be constructed to approximate the ELPD using adjusted within-sample predictive performance. The Akaike [AIC; Akaike (1973)] and deviance [DIC; D. J. Spiegelhalter et al.
(2002)] information criteria estimate ELPD by \\[\\begin{equation} \\text{elpd}_\\texttt{IC} = \\log p(\\mathbf{y} \\, | \\, \\hat{\\boldsymbol{\\mathbf{\\phi}}}), \\tag{3.9} \\end{equation}\\] where \\(\\hat{\\boldsymbol{\\mathbf{\\phi}}}\\) is a maximum likelihood estimate (AIC) or Bayesian point estimate (DIC). The widely applicable information criterion [WAIC; Watanabe (2013)] improves upon Equation (3.9) by instead using the predictive density of the data \\[\\begin{align} \\text{elpd}_\\texttt{WAIC} = \\sum_{i = 1}^n \\log p(y_i \\, | \\, \\mathbf{y}). \\tag{3.10} \\end{align}\\] As both Equations (3.9) and (3.10) are based on within-sample measures, they overestimate the ELPD. As such, they are adjusted downward by a complexity penalty \\(p_{\\texttt{IC}}\\). The particular penalty varies depending on the information criterion used. 3.4.2 Cross-validation Cross-validation is an alternative way to estimate the ELPD. Rather than use a complexity penalty, as in Section 3.4.1, to adjust a within-sample estimate, cross-validation (CV) partitions the data into training and held-out sets of data. For example, in a leave-one-out (LOO) CV there are \\(n\\) partitions, where each held-out set is a single observation. The LOO-CV estimate of ELPD is \\[\\begin{align} \\text{elpd}_\\texttt{LOO-CV} = \\sum_{i = 1}^n \\log p(y_i \\, | \\, \\mathbf{y}_{-i}), \\tag{3.11} \\end{align}\\] where the subscript \\(-i\\) refers to all elements of the vector excluding \\(i\\). Naively, computing \\(\\text{elpd}_\\texttt{LOO-CV}\\) requires refitting the model \\(n\\) times. This can be computationally costly, and so approximation strategies have been developed. Importance sampling methods using the full posterior as a proposal are a notable example, including Pareto-smoothed importance sampling [PSIS; Vehtari, Gelman, and Gabry (2017)]. Equation (3.11) is additive, and treats each observation as an independent unit of information.
Special care is therefore required in applying cross-validation techniques to dependent (Section 3.2.1.2) spatio-temporal data. For example, Bürkner, Gabry, and Vehtari (2020) and Cooper et al. (2024) use “leave-future-out” (LFO) cross-validation in the time-series context. Similarly, in Chapter 4 I apply a spatial-leave-one-out (SLOO) cross-validation scheme. 3.4.3 Scoring rules Scoring rules [SR; Gneiting and Raftery (2007)] measure the quality of probabilistic forecasts. The log score, used above in the ELPD, is one example of a scoring rule. However, it is by no means the only possibility. Any information criterion (Section 3.4.1) or cross-validation strategy (Section 3.4.2) can be redefined using a different scoring rule, or utility function more broadly. Possible examples include the root mean square error (RMSE), variance explained (\\(R^2\\)) or classification accuracy. The log score (LS) is popular, in part because it is an example of a strictly proper scoring rule (SPSR). A scoring rule is strictly proper when the forecaster gains maximum expected reward by reporting their true probability distribution. Any scoring rule which does not admit this property is susceptible to manipulation, in some sense. The continuous ranked probability score [CRPS; Matheson and Winkler (1976)], which generalises the Brier score (Brier 1950) beyond binary classification, is another example of a SPSR. Ideally, the correct scoring rule to use in an analysis should be determined based upon the application setting. 3.4.4 Bayes factors Finally, the evidence \\(p(\\mathbf{y})\\), given in Equation (3.3), can also be used as a measure of model performance. 
If \\(\\mathcal{M}_0\\) and \\(\\mathcal{M}_1\\) are two competing models, then the Bayes factor comparing \\(\\mathcal{M}_0\\) to \\(\\mathcal{M}_1\\) is \\[\\begin{equation} B_{01} = \\frac{p(\\mathbf{y} \\, | \\, \\mathcal{M}_0)}{p(\\mathbf{y} \\, | \\, \\mathcal{M}_1)}, \\end{equation}\\] where \\(p(\\mathbf{y} \\, | \\, \\mathcal{M})\\) denotes the evidence under model \\(\\mathcal{M}\\). The Bayes factor can be interpreted as supporting the maximum a posteriori model. If \\(B_{01} > 1\\) then support is provided for \\(\\mathcal{M}_0\\) and if \\(B_{01} < 1\\) then support is provided for \\(\\mathcal{M}_1\\). Bayes factors can also be framed as predictive criteria according to the decomposition \\[\\begin{equation} p(\\mathbf{y}) = p(y_1) p(y_2 \\, | \\, y_1) \\cdots p(y_n \\, | \\, y_{n - 1}, \\ldots, y_1). \\end{equation}\\] 3.5 Survey methods Large national household surveys (Section 2.2.1) provide the highest quality population-level information about HIV indicators in SSA. Demographic and Health Surveys [DHS; USAID (2012)] are funded by the United States Agency for International Development (USAID) and run every three to five years in most countries. Population-based HIV Impact Assessment (PHIA) surveys are funded by PEPFAR and run every four to five years in high HIV burden countries. Analysis of responses from surveys can require specific methods. This section provides required background, before describing the survey design approach used by household surveys in SSA, and the methods used to analyse these data in this thesis. 3.5.1 Background Consider a population of \\(N\\) individuals, indexed by \\(i\\), with outcomes of interest \\(y_i\\). If a census were run, with all responses recorded, then any population quantities of interest could be directly calculated. However, running a census is usually too expensive or otherwise impractical.
As such, in a survey only a subset of individuals are sampled: let \\(S_i\\) be an indicator for whether or not individual \\(i\\) is sampled. Furthermore, only a subset of those sampled have their outcome recorded, due to nonresponse or otherwise: let \\(R_i\\) be an indicator for whether or not \\(y_i\\) is recorded. If \\(S_i = 0\\) then \\(R_i = 0\\), and if \\(S_i = 1\\) then individual \\(i\\) may not respond such that \\(R_i = 0\\). Consider a function \\(G_i = G(y_i)\\). The population mean of \\(G\\) is \\[\\begin{equation} \\bar G = \\frac{1}{N} \\sum_{i = 1}^N G(y_i), \\end{equation}\\] and a direct estimate of \\(\\bar G\\) based on the recorded subset of the population is \\[\\begin{equation} \\bar G_R = \\frac{\\sum_{i = 1}^N R_i G(y_i)}{\\sum_{i = 1}^N R_i}, \\tag{3.12} \\end{equation}\\] where \\(m_R = \\sum_{i = 1}^N R_i\\) is the recorded sample size. In a probability sample, individuals are selected to be included in the survey at random. On the other hand, in a non-probability sample, inclusion or exclusion from the survey is deterministic. A simple random sample (SRS) is a probability sample where the sampling probability for each individual is equal, \\(\\mathbb{P}(S_i = 1) = 1 / N\\). A survey design is called complex when the sampling probabilities for each individual vary, such that \\(\\mathbb{P}(S_i = 1) = \\pi_i\\), with \\(\\sum_{i = 1}^N \\pi_i = 1\\) and \\(\\pi_i > 0\\). Complex survey designs can offer both greater practicality and statistical efficiency than a SRS. However, care is required in analysing data collected using complex survey designs. Under a complex design, not accounting for unequal sampling probabilities will result in bias. That said, even under SRS, analogous bias can be caused by nonresponse. 3.5.2 Survey design The DHS employs a two-stage sampling procedure, outlined here following USAID (2012).
In the first stage, enumeration areas (EAs) from a recently conducted census are typically used as the primary sampling unit, or cluster. Each cluster is assigned to a stratum \\(h\\) by region, as well as by urban-rural status. After appropriate stratum sample sizes \\(n_h\\) are determined, EAs are sampled with probability proportional to number of households \\[\\begin{equation} \\pi_{1hj} = n_h \\times \\frac{N_{hj}}{\\sum_j N_{hj}}, \\end{equation}\\] where \\(N_{hj}\\) is the number of households in stratum \\(h\\) and cluster \\(j\\). In the second stage, the secondary sampling units are households. All households in the selected cluster are listed, before being sampled systematically at a regular interval, with equal probability \\[\\begin{equation} \\pi_{2hj} = \\frac{n_{hj}}{N_{hj}}, \\end{equation}\\] where \\(n_{hj}\\) is the number of households selected in cluster \\(j\\) and stratum \\(h\\). All adults are interviewed in each selected household. As a result, the probability an individual is sampled is equal to the probability their household is sampled \\(\\pi_{hj} = \\pi_{1hj} \\times \\pi_{2hj}\\). 3.5.3 Survey analysis Suppose a survey is run with complex design, and sampling probabilities \\(\\pi_i\\). Some individuals are more likely to be included in the survey than others. By over-weighting the responses of those unlikely to be included, and under-weighting the responses of those likely to be included, this feature can be taken into account. Design weights \\(\\delta_i = 1 / \\pi_i\\) can be thought of as the number of individuals in the population represented by the \\(i\\)th sampled individual. Let \\[\\begin{equation} \\mathbb{P}(R_i = 1 \\, | \\, S_i = 1) = \\upsilon_i \\end{equation}\\] be the probability of response for sampled individual \\(i\\).
Nonresponse can be handled using nonresponse weights \\(\\gamma_i = 1 / \\upsilon_i\\), which analogously can be thought of as the number of sampled individuals represented by the \\(i\\)th recorded individual. Multiplying the design and nonresponse weights gives survey weights \\(\\omega_i = \\delta_i \\times \\gamma_i\\). Extending Equation (3.12), a weighted estimate (Hájek 1971) of the population mean using the survey weights \\(\\omega_i\\) is \\[\\begin{equation} \\bar G_\\omega = \\frac{\\sum_{i = 1}^N \\omega_i R_i G(y_i)}{\\sum_{i = 1}^N \\omega_i R_i}. \\tag{3.13} \\end{equation}\\] Following Meng (2018) and Bradley et al. (2021), decomposing the additive error \\(\\bar G_\\omega - \\bar G\\) of Equation (3.13) provides useful intuition as to the benefits of survey weighting (M. A. Bailey 2023). Under SRS, the error is then a product of three terms \\[\\begin{align} \\bar G_\\omega - \\bar G &= \\frac{\\mathbb{E}(\\omega_i R_i G_i)}{\\mathbb{E}(\\omega_i R_i)} - \\mathbb{E}(G_i) = \\frac{\\mathbb{C}(\\omega_i R_i, G_i)}{\\mathbb{E}(\\omega_i R_i)} \\\\ &= \\rho_{R_\\omega, G} \\times \\sqrt{\\frac{N - m_{R_\\omega}}{m_{R_\\omega}}} \\times \\sigma_G, \\end{align}\\] where \\(R_\\omega = \\omega R\\), and \\(\\mathbb{E}\\) and \\(\\mathbb{C}\\) denote the expectation and covariance taken over the finite population index \\(i\\). The first term is called the data defect correlation (DDC), and measures the correlation between the weighted recording mechanism and the given function of the outcome of interest. The DDC is minimised when \\(G \\perp \\!\\!\\! \\perp R_\\omega\\). The second term is the data scarcity, and measures the effective proportion of the population who have been recorded. Finally, the third term is the problem difficulty, and measures the intrinsic difficulty of the estimation problem. This term is independent of the sampling or analysis method used. This thesis uses hierarchical Bayesian models defined using weighted direct survey estimates (Fay and Herriot 1979).
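As an illustration of Equations (3.12) and (3.13), the following minimal sketch computes the unweighted and survey-weighted direct estimates for a toy data set. All values, and the function name hajek_mean, are hypothetical and chosen only for illustration; the sampling probabilities are not normalised, which does not matter here because the weighted estimate is unchanged when every weight is multiplied by the same constant.

```python
# Toy illustration of the direct estimates in Equations (3.12) and (3.13).
# All numbers below are made up; none come from a real survey.

def hajek_mean(g, weights, recorded):
    # Weighted mean of G over the recorded individuals
    numerator = sum(w * r * gi for gi, w, r in zip(g, weights, recorded))
    denominator = sum(w * r for w, r in zip(weights, recorded))
    return numerator / denominator

g = [1, 0, 1, 1, 0]                    # G(y_i), for example a binary indicator
pi = [0.5, 0.5, 0.25, 0.25, 0.1]       # sampling probabilities (hypothetical)
upsilon = [1.0, 0.5, 1.0, 0.5, 1.0]    # response probabilities (hypothetical)
recorded = [1, 1, 1, 0, 1]             # R_i, whether the outcome was recorded

delta = [1 / p for p in pi]            # design weights
gamma = [1 / u for u in upsilon]       # nonresponse weights
omega = [d * c for d, c in zip(delta, gamma)]  # survey weights

unweighted = hajek_mean(g, [1] * len(g), recorded)  # Equation (3.12)
weighted = hajek_mean(g, omega, recorded)           # Equation (3.13)
print(unweighted, weighted)
```

In this toy example the individual with the smallest sampling probability has \\(G(y_i) = 0\\), so up-weighting them pulls the weighted estimate below the unweighted one.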
Following Chen, Wakefield, and Lumely (2014), the sampling distribution of these direct estimates is arrived at by estimating the variance of Equation (3.13). Although this approach acknowledges the complex survey design, it has some important limitations. Importantly, it ignores clustering structure within the observations \\(i\\). Furthermore, as a two-step procedure, it fails to fully propagate uncertainty from a Bayesian perspective. While progress has been made in dealing with survey data, the Gelman (2007) claim that “survey weighting is a mess” still holds some weight. References Akaike, Hirotugu. 1973. “Information theory as an extension of the maximum likelihood principle–In: Second International Symposium on Information Theory (Eds) BN Petrov, F.” Csaki. BNPBF Csaki Budapest: Academiai Kiado. Bailey, Michael A. 2023. “A New Paradigm for Polling.” Harvard Data Science Review 5 (3). Baker, Stuart G. 1994. “The multinomial-Poisson transformation.” Journal of the Royal Statistical Society: Series D (The Statistician) 43 (4): 495–504. Berger, James. 2006. “The Case for objective Bayesian analysis.” Bayesian Analysis 1 (3): 385–402. Bernardo, José M, and Adrian FM Smith. 2001. Bayesian theory. John Wiley & Sons. Bivand, Roger S, Edzer J Pebesma, Virgilio Gómez-Rubio, and Edzer Jan Pebesma. 2008. Applied spatial data analysis with R. Springer. Blei, David M, Alp Kucukelbir, and Jon D McAuliffe. 2017. “Variational inference: A review for statisticians.” Journal of the American Statistical Association 112 (518): 859–77. Bradley, Valerie C, Shiro Kuriwaki, Michael Isakov, Dino Sejdinovic, Xiao-Li Meng, and Seth Flaxman. 2021. “Unrepresentative Big Surveys Significantly Overestimated US Vaccine Uptake.” Nature 600 (7890): 695–700. Brier, Glenn W. 1950. “Verification of forecasts expressed in terms of probability.” Monthly Weather Review 78 (1): 1–3. Bürkner, Paul-Christian, Jonah Gabry, and Aki Vehtari. 2020. 
“Approximate Leave-Future-Out Cross-Validation for Bayesian Time Series Models.” Journal of Statistical Computation and Simulation 90 (14): 2499–2523. Carpenter, Bob, Andrew Gelman, Matthew D Hoffman, Daniel Lee, Ben Goodrich, Michael Betancourt, Marcus Brubaker, Jiqiang Guo, Peter Li, and Allen Riddell. 2017. “Stan: A probabilistic programming language.” Journal of Statistical Software 76 (1). Chen, Cici, Jon Wakefield, and Thomas Lumely. 2014. “The use of sampling weights in Bayesian hierarchical models for small area estimation.” Spatial and Spatio-Temporal Epidemiology 11: 33–43. Chopin, Nicolas, Omiros Papaspiliopoulos, et al. 2020. An introduction to sequential Monte Carlo. Vol. 4. Springer. Cooper, Alex, Dan Simpson, Lauren Kennedy, Catherine Forbes, and Aki Vehtari. 2024. “Cross-Validatory Model Selection for Bayesian Autoregressions with Exogenous Regressors.” Bayesian Analysis 1 (1): 1–25. Cressie, Noel, and Christopher K Wikle. 2015. Statistics for spatio-temporal data. John Wiley & Sons. Dempster, Arthur P, Nan M Laird, and Donald B Rubin. 1977. “Maximum likelihood from incomplete data via the EM algorithm.” Journal of the Royal Statistical Society: Series B (Methodological) 39 (1): 1–22. Duane, Simon, Anthony D Kennedy, Brian J Pendleton, and Duncan Roweth. 1987. “Hybrid Monte Carlo.” Physics Letters B 195 (2): 216–22. Fay, Robert E, and Roger A Herriot. 1979. “Estimates of income for small places: an application of James-Stein procedures to census data.” Journal of the American Statistical Association 74 (366a): 269–77. Fisher, Ronald Aylmer. 1936. “Design of experiments.” British Medical Journal 1 (3923): 554. Garnier, Simon, Ross, Noam, Rudis, Robert, Camargo, et al. 2023. viridis(Lite) - Colorblind-Friendly Color Maps for R. https://doi.org/10.5281/zenodo.4679423. Gelfand, Alan E, Li Zhu, and Bradley P Carlin. 2001. “On the change of support problem for spatio-temporal data.” Biostatistics 2 (1): 31–45. Gelman, Andrew. 2005. 
“Analysis of variance—why it is more important than ever.” ———. 2007. “Struggles with survey weighting and regression modeling.” Gelman, Andrew, John B Carlin, Hal S Stern, David B Dunson, Aki Vehtari, and Donald B Rubin. 2013. Bayesian data analysis. CRC press. Gelman, Andrew, Jessica Hwang, and Aki Vehtari. 2014. “Understanding predictive information criteria for Bayesian models.” Statistics and Computing 24 (6): 997–1016. Gelman, Andrew, and Donald B Rubin. 1992. “Inference from iterative simulation using multiple sequences.” Statistical Science, 457–72. Gelman, Andrew, Daniel Simpson, and Michael Betancourt. 2017. “The prior can often only be understood in the context of the likelihood.” Entropy 19 (10): 555. Gelman, Andrew, Aki Vehtari, Daniel Simpson, Charles C Margossian, Bob Carpenter, Yuling Yao, Lauren Kennedy, Jonah Gabry, Paul-Christian Bürkner, and Martin Modrák. 2020. “Bayesian workflow.” arXiv Preprint arXiv:2011.01808. Geman, Stuart, and Donald Geman. 1984. “Stochastic relaxation, Gibbs distributions, and the Bayesian restoration of images.” IEEE Transactions on Pattern Analysis and Machine Intelligence, no. 6: 721–41. Giordano, Ryan, Tamara Broderick, and Michael I. Jordan. 2018. “Covariances, Robustness, and Variational Bayes.” Journal of Machine Learning Research 19 (51): 1–49. http://jmlr.org/papers/v19/17-670.html. Gneiting, Tilmann, and Adrian E Raftery. 2007. “Strictly proper scoring rules, prediction, and estimation.” Journal of the American Statistical Association 102 (477): 359–78. Goldstein, Michael. 2006. “Subjective Bayesian analysis: principles and practice.” Hájek, Jaroslav. 1971. “Discussion of ‘An essay on the logical foundations of survey sampling, part I’.” Foundations of Statistical Inference (Proc. Sympos., Univ. Waterloo, Ontario, 1970), 236. Hastie, Trevor, and Robert Tibshirani. 1987. “Generalized additive models: some applications.” Journal of the American Statistical Association 82 (398): 371–86. Hastings, W. K. 1970. 
“Monte Carlo Sampling Methods Using Markov Chains and Their Applications.” Biometrika 57 (1): 97–109. http://www.jstor.org/stable/2334940. Hoffman, Matthew D, Andrew Gelman, et al. 2014. “The No-U-Turn sampler: adaptively setting path lengths in Hamiltonian Monte Carlo.” J. Mach. Learn. Res. 15 (1): 1593–623. Knorr-Held, Leonhard. 2000. “Bayesian modelling of inseparable space-time variation in disease risk.” Statistics in Medicine 19 (17-18): 2555–67. Margossian, Charles C, and Andrew Gelman. 2023. “For How Many Iterations Should We Run Markov Chain Monte Carlo?” arXiv Preprint arXiv:2311.02726. Martin, Gael M, David T Frazier, and Christian P Robert. 2023. “Computing Bayes: From then ‘til now.” Statistical Science 1 (1): 1–17. Matheson, James E, and Robert L Winkler. 1976. “Scoring rules for continuous probability distributions.” Management Science 22 (10): 1087–96. McElreath, Richard. 2020. Statistical rethinking: A Bayesian course with examples in R and Stan. CRC press. Meng, Xiao-Li. 2018. “Statistical paradises and paradoxes in big data (i) law of large populations, big data paradox, and the 2016 US presidential election.” The Annals of Applied Statistics 12 (2): 685–726. Metropolis, Nicholas, Arianna W Rosenbluth, Marshall N Rosenbluth, Augusta H Teller, and Edward Teller. 1953. “Equation of State Calculations by Fast Computing Machines.” J. Chem. Phys 21: 1087. Minka, Thomas P. 2001. “Expectation Propagation for approximate Bayesian inference.” In Proceedings of the 17th Conference in Uncertainty in Artificial Intelligence, 362–69. Neal, Radford M et al. 2011. “MCMC using Hamiltonian dynamics.” Handbook of Markov Chain Monte Carlo 2 (11): 2. Pfeffermann, Danny et al. 2013. “New Important Developments in Small Area Estimation.” Statistical Science 28 (1): 40–68. Porcu, Emilio, Reinhard Furrer, and Douglas Nychka. 2021. “30 Years of space–time covariance functions.” Wiley Interdisciplinary Reviews: Computational Statistics 13 (2): e1512. 
Robert, Christian P, and George Casella. 2005. “Monte Carlo Statistical Methods (Springer Texts in Statistics).” Springer. Roberts, Gareth O., and Jeffrey S. Rosenthal. 2004. “General state space Markov chains and MCMC algorithms.” Probability Surveys 1 (none): 20–71. https://doi.org/10.1214/154957804100000024. Roy, Vivekananda. 2020. “Convergence diagnostics for Markov chain Monte Carlo.” Annual Review of Statistics and Its Application 7: 387–412. Rue, Håvard, Sara Martino, and Nicolas Chopin. 2009. “Approximate Bayesian inference for latent Gaussian models by using integrated nested Laplace approximations.” Journal of the Royal Statistical Society: Series B (Statistical Methodology) 71 (2): 319–92. Shumway, Robert H, and David S Stoffer. 2017. Time Series Analysis and Its Applications With R Examples. Springer. Sisson, Scott A, Yanan Fan, and Mark Beaumont. 2018. Handbook of approximate Bayesian computation. CRC Press. Smith, Nathaniel, and Stéfan van der Walt. 2015. “A Better Default Colormap for Matplotlib.” In Proceedings of the 14th Python in Science Conference (SciPy). Spiegelhalter, David J, Nicola G Best, Bradley P Carlin, and Angelika Van Der Linde. 2002. “Bayesian measures of model complexity and fit.” Journal of the Royal Statistical Society: Series B (Statistical Methodology) 64 (4): 583–639. Stan Development Team. 2023. Stan Reference Manual. https://mc-stan.org/docs/reference-manual/index.html. Stringer, Alex, Patrick Brown, and Jamie Stafford. 2022. “Fast, scalable approximations to posterior distributions in extended latent Gaussian models.” Journal of Computational and Graphical Statistics, 1–15. Tobler, Waldo R. 1970. “A computer movie simulating urban growth in the Detroit region.” Economic Geography 46 (sup1): 234–40. Tokdar, Surya T, and Robert E Kass. 2010. “Importance sampling: a review.” Wiley Interdisciplinary Reviews: Computational Statistics 2 (1): 54–60. USAID. 2012. 
“Sampling and Household Listing Manual: Demographic and Health Surveys Methodology.” https://dhsprogram.com/pubs/pdf/DHSM4/DHS6_Sampling_Manual_Sept2012_DHSM4.pdf. Vehtari, Aki, Andrew Gelman, and Jonah Gabry. 2017. “Practical Bayesian Model Evaluation Using Leave-One-Out Cross-Validation and WAIC.” Statistics and Computing 27: 1413–32. Vehtari, Aki, and Janne Ojanen. 2012. “A survey of Bayesian predictive methods for model assessment, selection and comparison.” Statistics Surveys 6 (none): 142–228. https://doi.org/10.1214/12-SS102. Watanabe, Sumio. 2013. “A widely applicable Bayesian information criterion.” Journal of Machine Learning Research 14 (Mar): 867–97. Wood, Simon N. 2017. Generalized additive models: an introduction with R. CRC press. Yao, Yuling, Aki Vehtari, Daniel Simpson, and Andrew Gelman. 2018. “Yes, but did it work?: Evaluating variational inference.” In International Conference on Machine Learning, 5581–90. PMLR. "],["beyond-borders.html", "4 Models for areal spatial structure 4.1 Models based on adjacency 4.2 Models using kernels 4.3 Simulation study 4.4 HIV prevalence study 4.5 Discussion", " 4 Models for areal spatial structure This chapter is about spatial random effect model specifications for areal data. A simple model based on the adjacency structure between areas is popular in HIV small-area estimation and beyond. The analysis aimed to determine if using a more complex model would result in more accurate predictions. Modelling spatial correlation is particularly important for small-area estimation of HIV because the covariates most strongly associated with HIV, such as sexual behaviour and STI status (Mayala, Bhatt, and Gething 2020), are difficult to measure. As a result, previous small-area models of HIV have found that including covariates only modestly improves predictive performance (Supplementary Figure 20, Dwyer-Lindgren et al. 2019). The lack of predictive covariates emphasises the role of modelling spatial variation.
For mapping of other infectious diseases, such as malaria where transmission is driven by more predictive and easily measurable environmental factors, explanatory covariates are more easily available and directly modelling spatial correlation is less important (Weiss et al. 2015; Bhatt et al. 2015). Spatial variation in areal data is often modelled using spatial random effects (Haining 2003; Cramb et al. 2018). The most common class of models used to specify spatial random effects are Gaussian Markov random fields [GMRFs; Havard Rue and Held (2005)]. These models combine a Gaussian distribution with Markov conditional independence assumptions between areas. Observations in areas close together are assumed to be related, with more distant relationships not directly accounted for. Perhaps the simplest GMRF model is that of Besag, York, and Mollié (1991) in which information is borrowed equally from each adjacent area, based on a binary relationship. The Besag model is attractive as it requires minimal additional modelling choices and is accessibly implemented in software such as R-INLA (Blangiardo et al. 2013), rstan (Morris et al. 2019; Donegan 2022), NIMBLE [Chapter 9; de Valpine et al. (2023)] and PyMC (Saunders 2023), among others. As a result, it has been widely used, including to model bird population dynamics from capture-recapture data (Saracco et al. 2010); for the analysis of magnetic resonance images (Gössl, Auer, and Fahrmeir 2001; Schmid et al. 2006); to map mortality from cancers (Rashid et al. 2023), injuries (Parks et al. 2020), and air pollution (Bennett et al. 2019); and to model alcohol use patterns (Dwyer-Lindgren et al. 2015). The Besag model was designed for image analysis, on a regular grid. However, for more irregular geometries, the assumptions made are unrealistic and appear to be violated. The administrative divisions of a country used in small-area estimation are one example of a more irregular geometry.
This chapter tests the hypothesis that using more realistic assumptions about spatial structure improves the performance of small-area estimation models. Performance in this context refers to accurate forecasts of parameters as measured by scoring rules. In doing so, practical recommendations for modelling areal spatial structure are offered. Code for the analysis in this chapter is available from https://github.com/athowes/beyond-borders, and supported by the arealutils R package (Howes 2023a). 4.1 Models based on adjacency This section discusses spatial random effect models based on a symmetric adjacency relation \\(i \\sim j\\) between areas \\(A_i\\) and \\(A_j\\). Adjacency is typically defined by a shared border, though other choices are possible (Christopher J. Paciorek et al. 2013). 4.1.1 The Besag model Figure 4.1: Panel A shows the districts of Zimbabwe. Panel B shows the corresponding adjacency graph \\(\\mathcal{G}\\) with vertices positioned at the centre of the area they correspond to, and edges between adjacent areas. The Besag model (Besag, York, and Mollié 1991) is an improper conditional auto-regressive (ICAR) model where the conditional mean of the random effect \\(u_i\\) is the average of its neighbours \\(\\{u_j\\}_{j \\sim i}\\) and the precision is proportional to the number of neighbours. The full conditional distribution of the \\(i\\)th spatial random effect is given by \\[\\begin{equation} u_i \\, | \\, \\mathbf{u}_{-i} \\sim \\mathcal{N} \\left(\\frac{1}{n_{\\delta i}} \\sum_{j: j \\sim i} u_j, \\frac{1}{n_{\\delta i}\\tau_u}\\right), \\tag{4.1} \\end{equation}\\] where \\(\\delta i\\) is the set of neighbours of \\(A_i\\) with cardinality \\(n_{\\delta i} = |\\delta i|\\) and \\(\\mathbf{u}_{-i}\\) is the vector of spatial random effects with the \\(i\\)th entry removed. 
By Brook’s lemma (Havard Rue and Held 2005) the set of full conditionals of the Besag model is equivalent to the Gaussian Markov random field (GMRF) given by \\[\\begin{equation} \\mathbf{u} \\sim \\mathcal{N}(\\mathbf{0}, \\tau_u^{-1} \\mathbf{R}^{-}). \\tag{4.2} \\end{equation}\\] The matrix \\(\\mathbf{R}^{-}\\) is the generalised inverse of the rank-deficient structure matrix \\(\\mathbf{R}\\) with entries \\[\\begin{equation} R_{ij} = \\begin{cases} n_{\\delta i}, & i = j \\\\ -1, & i \\sim j \\\\ 0, & \\text{otherwise.} \\end{cases} \\end{equation}\\] The Markov property arises due to the conditional independence structure \\(p(u_i \\, | \\, \\mathbf{u}_{-i}) = p(u_i \\, | \\, \\mathbf{u}_{\\delta i})\\) whereby each area only depends on its neighbours. This is reflected in the sparsity of \\(\\mathbf{R}\\) such that \\(u_i \\perp u_j \\, | \\, \\mathbf{u}_{-ij}\\) if and only if \\(R_{ij} = 0\\). The structure matrix \\(\\mathbf{R}\\) may also be expressed as the Laplacian matrix of the adjacency graph \\(\\mathcal{G} = (\\mathcal{V}, \\mathcal{E})\\) with vertices \\(v \\in \\mathcal{V}\\) corresponding to each area and edges \\(e \\in \\mathcal{E}\\) between vertices \\(i\\) and \\(j\\) when \\(i \\sim j\\). Figure 4.1 shows the districts of Zimbabwe with corresponding adjacency graph. Rewriting Equation (4.2), the probability density function of \\(\\mathbf{u}\\) is \\[\\begin{equation} p(\\mathbf{u}) \\propto \\tau_u^{\\frac{n - n_c}{2}} \\times \\exp \\left( -\\frac{\\tau_u}{2} \\mathbf{u}^\\top \\mathbf{R} \\mathbf{u} \\right) \\propto \\exp \\left( -\\frac{\\tau_u}{2} \\sum_{i \\sim j} (u_i - u_j)^2 \\right). \\tag{4.3} \\end{equation}\\] This density is a function of the pairwise differences \\(u_i - u_j\\) and so is invariant to the addition of a constant \\(c\\) to each entry \\(p(\\mathbf{u}) = p(\\mathbf{u} + c\\mathbf{1})\\). As a result, there is an improper uniform distribution on the average of the \\(u_i\\). 
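The relationship between the structure matrix, the sum of squared pairwise differences in Equation (4.3), and the invariance to adding a constant can be checked numerically. The sketch below assumes a hypothetical four-area geometry given as a list of adjacent pairs; the function names are illustrative and do not come from any package.

```python
# Build the structure matrix R as the graph Laplacian of an adjacency graph
# and check the identity u' R u = sum over adjacent pairs of (u_i - u_j)^2.

def structure_matrix(n, edges):
    # Number of neighbours on the diagonal, -1 for each adjacent pair
    R = [[0] * n for _ in range(n)]
    for i, j in edges:
        R[i][j] -= 1
        R[j][i] -= 1
        R[i][i] += 1
        R[j][j] += 1
    return R

def quadratic_form(R, u):
    n = len(u)
    return sum(u[i] * R[i][j] * u[j] for i in range(n) for j in range(n))

def sum_squared_differences(edges, u):
    # Each unordered adjacent pair is counted once
    return sum((u[i] - u[j]) ** 2 for i, j in edges)

edges = [(0, 1), (1, 2), (2, 3), (1, 3)]   # hypothetical adjacency relation
R = structure_matrix(4, edges)
u = [0.5, -1.0, 2.0, 0.25]
shifted = [ui + 3.0 for ui in u]           # add the same constant to every entry

print(quadratic_form(R, u), sum_squared_differences(edges, u))
print(quadratic_form(R, shifted))
```

Because every row of \\(\\mathbf{R}\\) sums to zero, the quadratic form is unchanged by the shift, which is the impropriety described above.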
If \\(\\mathcal{G}\\) is connected, in that any vertex can be reached from any other vertex by traversing the edges, then there is only one impropriety in the model and \\(\\text{rank}(\\mathbf{R}) = n - 1\\). If instead \\(\\mathcal{G}\\) is disconnected, and composed of \\(n_c \\geq 2\\) connected components with index sets \\(I_1, \\ldots, I_{n_c}\\), then the corresponding structure matrix \\(\\mathbf{R}\\) has rank \\(n - n_c\\) and the density is invariant to the addition of a constant to each of the connected components, \\(p(\\mathbf{u}_{I}) = p(\\mathbf{u}_{I} + c\\mathbf{1})\\) for each \\(I \\in \\{I_1, \\ldots, I_{n_c}\\}\\). 4.1.2 Best practices for the Besag model Directly using the Besag model as described in Section 4.1.1 has several practical limitations in applied settings. To overcome these limitations, Freni-Sterrantino, Ventrucci, and Rue (2018) recommend three best practices: The structure matrix \\(\\mathbf{R}\\) should be rescaled to have generalised variance equal to one. The generalised variance of a matrix is defined by the geometric mean of the diagonal elements of its generalised inverse. For the structure matrix that is \\[\\begin{equation} \\sigma^2_{\\text{GV}}(\\mathbf{R}) = \\prod_{i = 1}^n (R^-_{ii})^{1/n} = \\exp \\left( \\frac{1}{n} \\sum_{i = 1}^n \\log (R^-_{ii}) \\right). \\end{equation}\\] The scaled structure matrix \\(\\mathbf{R}^\\star\\) is given by \\[\\begin{equation} \\mathbf{R}^\\star = \\sigma^2_{\\text{GV}}(\\mathbf{R}) \\, \\mathbf{R}. \\end{equation}\\] The structure matrix is multiplied, rather than divided, by its generalised variance because the generalised inverse of \\(c \\mathbf{R}\\) is \\(\\mathbf{R}^- / c\\) for \\(c > 0\\), so that \\(\\sigma^2_{\\text{GV}}(c \\mathbf{R}) = \\sigma^2_{\\text{GV}}(\\mathbf{R}) / c\\). As the diagonal elements \\(R^-_{ii}\\) correspond to marginal variances, the generalised variance gives a measure of the average marginal variance. This measure, introduced by Sørbye and Rue (2014), ignores off-diagonal entries. More broadly, other measures of typical variance could be used. Scaling mitigates the influence of the adjacency graph on the variance of \\(\\mathbf{u}\\).
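The scaling step can be sketched numerically for a small connected graph. The example below makes two assumptions that are not part of the text: the generalised inverse is taken to be the Moore-Penrose inverse, computed for a connected graph as \\((\\mathbf{R} + \\mathbf{J}/n)^{-1} - \\mathbf{J}/n\\) with \\(\\mathbf{J}\\) the all-ones matrix, and the matrix inverse is found by a plain Gauss-Jordan routine written for illustration. Since the generalised inverse of \\(c\\mathbf{R}\\) is \\(\\mathbf{R}^-/c\\), multiplying the structure matrix by its generalised variance gives a matrix whose generalised variance is one.

```python
import math

def invert(A):
    # Gauss-Jordan elimination with partial pivoting (illustrative only)
    n = len(A)
    M = [[float(x) for x in row] + [float(i == j) for j in range(n)]
         for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda k: abs(M[k][col]))
        M[col], M[pivot] = M[pivot], M[col]
        p = M[col][col]
        M[col] = [x / p for x in M[col]]
        for k in range(n):
            if k != col:
                f = M[k][col]
                M[k] = [x - f * y for x, y in zip(M[k], M[col])]
    return [row[n:] for row in M]

def generalised_inverse(R):
    # Moore-Penrose inverse of the Laplacian of a connected graph
    n = len(R)
    shifted = [[R[i][j] + 1 / n for j in range(n)] for i in range(n)]
    inv = invert(shifted)
    return [[inv[i][j] - 1 / n for j in range(n)] for i in range(n)]

def generalised_variance(R):
    # Geometric mean of the diagonal of the generalised inverse
    R_inv = generalised_inverse(R)
    n = len(R)
    return math.exp(sum(math.log(R_inv[i][i]) for i in range(n)) / n)

# Structure matrix of three areas in a row (hypothetical geometry)
R = [[1, -1, 0], [-1, 2, -1], [0, -1, 1]]
gv = generalised_variance(R)
R_star = [[gv * x for x in row] for row in R]
print(gv, generalised_variance(R_star))
```

For a disconnected graph the same calculation would be applied to each connected component separately.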
For consistent and interpretable prior distribution selection, it is important to allow the variance to be controlled by \\(\\tau_u\\) alone. When the adjacency graph is disconnected it is not appropriate to scale the structure matrix \\(\\mathbf{R}\\) uniformly. This is because, given the precision \\(\\tau_u\\), local smoothing operates on each connected component independently. As such, each connected component \\(I \\in \\{I_1, \\ldots, I_{n_c}\\}\\) should be scaled independently to have generalised variance one \\[\\begin{equation} \\mathbf{R}^\\star_I = \\sigma^2_{\\text{GV}}(\\mathbf{R}_I) \\, \\mathbf{R}_I \\end{equation}\\] where \\(\\mathbf{R}_I\\) is the sub-matrix of the structure matrix corresponding to index set \\(I\\). As the generalised inverse of \\(c \\mathbf{R}_I\\) is \\(\\mathbf{R}_I^- / c\\), multiplying by the generalised variance is what sets the generalised variance of \\(\\mathbf{R}^\\star_I\\) to one. When one of the connected components is a single area, known either as a singleton or an island, the probability density \\[\\begin{equation} p(u_i) \\propto \\exp \\left( -\\frac{\\tau_u}{2} \\sum_{i \\sim j} (u_i - u_j)^2 \\right) \\propto 1 \\end{equation}\\] has no dependence on \\(u_i\\). This is equivalent to using an improper prior. To avoid this, each singleton should be set to have independent Gaussian noise \\(u_i \\sim \\mathcal{N}(0, \\tau_u^{-1})\\). To avoid confounding of the spatial random effects with the intercept, it is recommended to place a sum-to-zero constraint on each non-singleton connected component. In other words, \\[\\begin{equation} \\sum_{i \\in I} u_i = 0, \\quad |I| > 1. \\end{equation}\\] As such, in total the number of sum-to-zero constraints is equal to the number of non-singleton connected components. 4.1.3 Concerns about the Besag model The Besag model was originally proposed by Besag, York, and Mollié (1991) for use in image analysis. In this setting, areas correspond to pixels arranged in a regular lattice structure. In an image, the data point at each pixel can be thought of as an average of the intensity or colour over the space represented by the pixel.
Since its original proposal, the Besag model has seen wider use. However, for small-area estimation of HIV, the spatial structure corresponds to administrative units. These administrative units may have a more irregular spatial structure than a lattice. Furthermore, data points may not come about by uniform averaging over a space. For example, population density may vary across the area. These considerations raise concerns about the Besag model’s applicability to the small-area estimation setting, which we explore in this section. The discussion is closely linked to the modifiable areal unit problem (Openshaw and Taylor 1979), whereby statistical conclusions change as a result of seemingly arbitrary changes in data aggregation, and the challenge of ecological inference and the ecological fallacy (Jonathan Wakefield and Lyons 2010). 4.1.3.1 Compression to adjacency Figure 4.2: Though they are quite different, the geometries shown in panels A, B, C, and D each have the same adjacency graph. Therefore, each geometry would have the same distribution under the Besag model. A fundamental objection is that summarising a geometry by an adjacency graph represents a loss of information. Many geometries share the same adjacency graph, and as such are isomorphic under the Besag model (Figure 4.2). Though not in itself a problem, this fact prompts consideration of whether the geometries in a class sharing the same adjacency graph are sufficiently similar to merit identical models. Intuitively, the more regular the spatial structure, the less information is lost in compression to an adjacency graph. In image analysis, very little spatial information is lost in compression of a lattice structure to an adjacency graph. On the other hand, the regions of a country, determined by political and geographic forces, tend to display greater irregularity. The appropriateness of adjacency compression therefore varies by the type of geometry common to the application setting.
The regularity of realistic geometries may help to constrain each class to be similar. In other words, although pathological geometries can be constructed, they might be implausible in statistical practice and so of limited concern. 4.1.3.2 Mean structure In the Besag model all adjacent areas count equally in the equation for the conditional mean. This assumption is unsatisfying, as for most geometries we expect different amounts of correlation between neighbouring areas. Figure 4.2 illustrates a number of heuristic features for neighbour importance. In Panel 4.2C, the area with a longer shared border would be expected to be more highly correlated. In Panel 4.2D, the area with a closer centre would be expected to be more highly correlated. 4.1.3.3 Variance structure Figure 4.3: A sequence of geometries where the number of neighbours of area one grows by one at each iteration, as the shaded area is split into more areas. In the limit, the precision of the spatial random effect in the first area tends to infinity. This is not reasonable behaviour if the amount of information being shared is not also increasing. In Equation (4.1) the precision of \\(u_i\\) is proportional to its number of neighbours \\(n_{\\delta i}\\). It follows that as \\(n_{\\delta i} \\to \\infty\\) then \\(\\text{Var}(u_i) \\to 0\\). This is illustrated by Figure 4.3 where the area on the right is repeatedly divided such that its number of neighbours increases. This property is a consequence of averaging the conditional mean over a greater number of areas, which, in certain situations, can correspond to a greater amount of information. However, if the amount of information in the shaded area remains fixed, it is inappropriate that \\(\\text{Var}(u_1)\\) should tend to zero as a result of drawing additional, arbitrary, boundaries. 
In the image analysis setting this modelling assumption is reasonable: each pixel represents a fixed amount of information and a higher pixel density represents a greater amount of information. On the other hand, in public health and epidemiology, drawing boundaries to create additional areas is not expected to correspond to a greater amount of information. Figure 4.4: Each of the shaded areas in the geometry in Panel A is split into two in Panel B. As a second example of undesirable behaviour, suppose we fit a Besag model to identical data using each of the two geometries in Figure 4.4. If the spatial variation is relatively smooth, dividing the shaded areas into two will result in a lower estimated variance \\(\\sigma^2_u\\) in Panel 4.4B as compared with Panel 4.4A because there will appear to be less variation between neighbouring areas. This problem does not only apply locally: since the effect of \\(\\sigma^2_u\\) applies everywhere, the smoothing will change even in unaltered parts of the study region. 4.1.4 Weighted ICAR models The Besag model is a special case of a more general class of (zero-mean) weighted ICAR models. These models can be specified in terms of scaled weights \\(\\{b_{ij}\\}_{j \\sim i}\\) and a precision vector \\(\\boldsymbol{\\kappa} = (\\kappa_i)_{i \\in [n]}\\). The full conditionals are then \\[\\begin{equation} u_i \\, | \\, \\mathbf{u}_{-i} \\sim \\mathcal{N} \\left( \\sum_{j: j \\sim i} b_{ij} u_j, \\frac{1}{\\kappa_i \\tau_u} \\right). \\tag{4.4} \\end{equation}\\] Setting \\(b_{ij} = 1 / n_{\\delta i}\\) and \\(\\kappa_i = n_{\\delta i}\\) recovers the Besag model in Equation (4.1).
The structure matrix \\(\\mathbf{R}\\) corresponding to the more general full conditionals in Equation (4.4) is \\[\\begin{equation} \\mathbf{R} = \\mathbf{D}_\\kappa(\\mathbf{I} - \\mathbf{B}), \\end{equation}\\] where the scaled weights matrix \\(\\mathbf{B}\\) has elements \\[\\begin{equation} B_{ij} = \\begin{cases} b_{ij}, & \\text{for } i \\sim j, \\\\ 0, & \\text{for } i = j, i \\nsim j, \\end{cases} \\end{equation}\\] and the matrix \\(\\mathbf{D}_\\kappa\\) is given by \\(\\text{diag}(\\kappa_1, \\ldots, \\kappa_n)\\). Ensuring that the structure matrix is symmetric requires that for all \\(i, j \\in [n]\\) \\[\\begin{equation} - b_{ij} \\kappa_i = - b_{ji} \\kappa_j. \\end{equation}\\] To meet this condition, it can be simpler to directly consider symmetry of the unscaled weights matrix \\[\\begin{equation} \\mathbf{W} = \\mathbf{D}_\\kappa \\mathbf{B}, \\end{equation}\\] such that \\(\\mathbf{R} = \\mathbf{D}_\\kappa - \\mathbf{W}\\). For the Besag model the unscaled weights matrix \\(\\mathbf{W}\\) corresponds to the adjacency matrix. Scaled weights can be recovered by \\(b_{ij} = w_{ij} / \\kappa_i\\) where \\(\\kappa_i = \\sum_{k: k \\sim i} w_{ik}\\). Duncan, White, and Mengersen (2017) provide discussion of methods for specifying \\(\\mathbf{W}\\), including \\[\\begin{align} w_{ij} &= \\frac{1}{d_{ij}}, \\\\ w_{ij} &= \\exp (-d_{ij}), \\end{align}\\] where \\(d_{ij}\\) is a measure of distance between areas \\(A_i\\) and \\(A_j\\). Weighted ICAR models appear to overcome some of the limitations discussed in Section 4.1.3. 4.1.5 The reparameterised Besag-York-Mollié model Often, as well as spatial correlation, there exists IID over-dispersion in the residuals and it is inappropriate to use purely spatially structured random effects in the model.
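Returning briefly to the weighted ICAR construction of Section 4.1.4, the following minimal sketch builds the structure matrix from a symmetric unscaled weights matrix and recovers the scaled weights. The distances between adjacent areas are hypothetical, the exponential weighting is only one of the options mentioned above, and the function name is illustrative.

```python
import math

def weighted_structure_matrix(W):
    # kappa_i is the i-th row sum of the unscaled weights, and R = D_kappa - W
    n = len(W)
    kappa = [sum(row) for row in W]
    R = [[(kappa[i] if i == j else 0.0) - W[i][j] for j in range(n)]
         for i in range(n)]
    # Scaled weights b_ij = w_ij / kappa_i
    B = [[W[i][j] / kappa[i] for j in range(n)] for i in range(n)]
    return R, B, kappa

# Hypothetical distances between adjacent areas; None marks pairs that are
# not adjacent (including each area with itself)
d = [[None, 1.0, 2.0],
     [1.0, None, None],
     [2.0, None, None]]
W = [[0.0 if x is None else math.exp(-x) for x in row] for row in d]

R, B, kappa = weighted_structure_matrix(W)
print(R)
print(B)
```

Starting from a symmetric \\(\\mathbf{W}\\) guarantees that \\(b_{ij} \\kappa_i = b_{ji} \\kappa_j\\), so the resulting structure matrix is symmetric with rows summing to zero.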
The Besag-York-Mollié (BYM) model of Besag, York, and Mollié (1991) accounts for this in a natural way by decomposing the spatial random effect \\(\\mathbf{u} = \\mathbf{v} + \\mathbf{w}\\) into a sum of an unstructured IID component \\(\\mathbf{v}\\) and a spatially structured Besag component \\(\\mathbf{w}\\). Each component has its own respective precision parameter \\(\\tau_v\\) and \\(\\tau_w\\). The resulting distribution is \\[\\begin{equation} \\mathbf{u} \\sim \\mathcal{N}(0, \\tau_v^{-1} \\mathbf{I} + \\tau_w^{-1} \\mathbf{R}^{-}) \\tag{4.5}. \\end{equation}\\] Including both \\(\\mathbf{v}\\) and \\(\\mathbf{w}\\) is intended to enable the model to learn the relative extent of the unstructured and structured components via \\(\\tau_v\\) and \\(\\tau_w\\). However, in the BYM model, scaling of the Besag precision matrix \\(\\mathbf{Q}\\) is not taken into account despite this issue being particularly pertinent when dealing with multiple sources of noise. In particular, placing a joint prior distribution \\[\\begin{equation} (\\tau_v, \\tau_w) \\sim p(\\tau_v, \\tau_w) \\end{equation}\\] which does not privilege either component is more easily accomplished if \\(\\mathbf{Q}\\) and \\(\\mathbf{I}\\) have the same scale. Additionally, supposing one has a prior belief that the over-dispersion is primarily IID and \\(\\mathbf{v}\\) accounts for the majority of the dispersion, then it is not immediately obvious how to represent this belief, without inadvertently altering the prior distribution on the amount of overall variation. This highlights identifiability issues of the parameters \\((\\tau_v, \\tau_w)\\) resulting from their non-orthogonality. Building on the models of Leroux, Lei, and Breslow (2000) and Dean, Ugarte, and Militino (2001) which tackle this identifiability problem, but do not scale the spatially structured noise, Simpson et al. (2017) propose a reparameterisation \\((\\tau_v, \\tau_w) \\mapsto (\\tau_u, \\phi)\\) of the BYM model. 
This is known as the BYM2 model and given by \\[\\begin{align} \\mathbf{u} = \\frac{1}{\\sqrt{\\tau_u}} \\left( \\sqrt{1- \\phi} \\, \\mathbf{v} + \\sqrt{\\phi} \\, \\mathbf{w}^\\star \\right), \\tag{4.6} \\end{align}\\] where \\(\\tau_u\\) is the marginal precision of \\(\\mathbf{u}\\), \\(\\phi \\in [0, 1]\\) gives the proportion of the marginal variance explained by the spatially structured component, and \\(\\mathbf{w}^\\star\\) is a scaled version of \\(\\mathbf{w}\\) with precision matrix given by the scaled structure matrix \\(\\mathbf{R}^\\star\\). When \\(\\phi = 0\\) the random effects are IID, and when \\(\\phi = 1\\) the random effects follow the Besag model. To borrow an analogy (Håvard Rue 2020), the parameterisation \\((\\tau_v, \\tau_w)\\) is like having one hot water and one cold water tap, whereas the parameterisation \\((\\tau_u, \\phi)\\) is like a mixer tap where the amount of water and its temperature can be adjusted separately. Although the BYM and BYM2 models were originally proposed using the Besag model as the spatially structured component, this need not be the case. Indeed, more broadly it is reasonable to consider convolved random effects (of a form analogous to that in Equation (4.5) or (4.6)) with any model for spatially structured noise. Any limitations of the model for spatially structured random effects are inherited by the convolved random effects.

4.2 Models using kernels

Section 4.1 reviewed ways to construct spatial random effect precision matrices using an adjacency relation. An alternative approach is to define the covariance matrix using an areal kernel function which gives a measure of similarity between two areas. Such a function may be specified as \\[\\begin{equation} K: \\mathcal{P}(\\mathcal{S}) \\times \\mathcal{P}(\\mathcal{S}) \\to \\mathbb{R}, \\tag{4.7} \\end{equation}\\] where \\(\\mathcal{P}\\) denotes the power set such that \\(\\mathcal{P}(\\mathcal{S})\\) is the space of subsets of the study region.
If the function \\(K\\) is positive semi-definite, then define areal kernel spatial random effects by \\[\\begin{equation} \\mathbf{u} \\sim \\mathcal{N} \\left( 0, \\frac{1}{\\tau_u} \\mathbf{K} \\right), \\tag{4.8} \\end{equation}\\] where the \\(n \\times n\\) Gram matrix \\(\\mathbf{K}\\) with entries \\(K_{ij} = K(A_i, A_j)\\) is a valid covariance matrix. The precision parameter \\(\\tau_u\\) is placed outside of the Gram matrix, analogous to the relation of the precision and structure matrices, but could be omitted. Areal kernels may be thought of as a type of kernels on sets (Gärtner et al. 2002). It is challenging to think directly about the correlation structure between areas. Instead, most well-known spatial process models define the correlation structure between points using a kernel function \\[\\begin{equation} k: \\mathcal{S} \\times \\mathcal{S} \\to \\mathbb{R}. \\tag{4.9} \\end{equation}\\] A simple method, and the one considered here henceforth, is to construct \\(K\\) (Equation (4.7)) from \\(k\\) (Equation (4.9)) by averaging the kernel \\(k\\) computed on some number of points representing each area. In Section 4.2.1 one point is used, and in Section 4.2.2 multiple points are used. 4.2.1 Centroid kernel The simplest approach is to use a single point to represent each area such that \\[\\begin{equation} K(A_i, A_j) = k(p_i, p_j). \\end{equation}\\] A natural choice is the centroid \\(p_i = c_i\\), given by the arithmetic mean of the latitude and longitude. (Note that it is not guaranteed for the centroid to lie within the area i.e. it is possible \\(c_i \\notin A_i\\), and more generally points representing an area may not be contained by that area.) This choice results in the centroid kernel \\[\\begin{equation} K(A_i, A_j) = k(c_i, c_j). \\tag{4.10} \\end{equation}\\] The centroid kernel has been used in environmental epidemiology (J. 
Wakefield and Morris 1999), for US election modelling (Flaxman, Wang, and Smola 2015), and to model the reproduction number of COVID-19 (Teh et al. 2022). In a model comparison study Nicky Best, Richardson, and Thomson (2005) (Section 3) simulated data representing heterogeneous exposure to air pollution, including elevated rates of exposure near two hypothetical point source locations, and found that the centroid kernel tended to over-smooth the high-risk areas. That said, it is unsurprising that a stationary covariance function would struggle to recover non-stationary structure.

4.2.2 Integrated kernel

Rather than choosing a single representative point, an alternative is to more completely represent the area by integrating the kernel over the areas of interest (Kelsall and Wakefield 2002; Follestad and Rue 2003). This results in the integrated kernel \\[\\begin{equation} K(A_i, A_j) = \\frac{1}{|A_i||A_j|} \\int_{A_i} \\int_{A_j} k(s, s') \\text{d} s' \\text{d} s. \\tag{4.11} \\end{equation}\\] Unlike for the centroid kernel where \\(K_{ii} = 1\\) for all \\(i\\), the marginal variance of the \\(i\\)th spatial random effect \\(K_{ii} = K(A_i, A_i)\\) varies depending on the area. For a kernel \\(k\\) which decays with distance, \\(K_{ii}\\) is larger for more compact areas and smaller for areas which are of greater extent or more spread out, because pairs of points within such areas are on average further apart. This covariance structure is equivalent to that obtained by aggregating a spatially continuous Gaussian process with kernel \\(k\\) over the areal partition. In the machine learning literature, models of this kind have been studied under the name aggregated Gaussian processes (Law et al. 2018; Tanaka et al. 2019; Yousefi, Smith, and Alvarez 2019; Hamelijnck et al. 2019; Chau, Bouabid, and Sejdinovic 2021). Examples of use of this model in statistical practice are rare.

4.2.2.1 Accounting for heterogeneity

Additional information accounting for heterogeneity over \\(A_i\\) may be incorporated into the integrated kernel.
This can be accomplished using weighting distributions \\(\\{w_i\\}\\) which represent an unequal contribution of each point to the similarity measure. The weighted integrated kernel is given by \\[\\begin{equation} K(A_i, A_j) = \\frac{1}{|A_i||A_j|}\\int_{A_i} \\int_{A_j} w_i(s) w_j(s') k(s, s') \\text{d} s' \\text{d} s. \\tag{4.12} \\end{equation}\\] This areal kernel may be useful in disease mapping. For example, areas with populations who live close to a shared border are likely to be more strongly correlated than areas whose populations live far apart. This detail could be accounted for by weighting according to a high resolution measure of population density. Though weighted centroids, for example, may also be used in Equation (4.10), accounting for heterogeneity over an area is more natural within the integrated kernel than the centroid kernel.

4.2.2.2 Computation

Figure 4.5: The \\(n = 33\\) districts of Malawi. Panel A shows the centroids as in Section 4.2.1. Panel B shows \\(L_i = 10\\) randomly chosen points, Panel C hexagonal points, and Panel D grid points in each area, each generated using the sf::st_sample function (E. Pebesma 2018).

Most of the time it is not possible to calculate Equation (4.12) analytically. Instead, consider \\(n\\) collections of \\(L_i\\) samples \\(\\{s^{(i)}_l \\}_{l = 1}^{L_i} \\sim \\mathcal{U}(A_i)\\) drawn uniformly from each area. Then the integral may be approximated using Monte Carlo by the double sum \\[\\begin{equation} K(A_i, A_j) \\approx \\frac{1}{L_i L_j} \\sum_{l = 1}^{L_i} \\sum_{m = 1}^{L_j} w_i \\left( s^{(i)}_l \\right) w_j \\left( s^{(j)}_m \\right) k \\left( s^{(i)}_l, s^{(j)}_m \\right). \\tag{4.13} \\end{equation}\\] Equivalently, samples drawn from the weighting distribution \\(w_i\\) may be used without weighting by \\(w_i(s)\\). Nodes may also be selected deterministically to give a numerical quadrature estimate of the kernel. Figure 4.5 shows three possible ways of choosing points \\(s^{(i)}_l\\), together with the centroids approach.
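The Monte Carlo approximation in Equation (4.13) can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the arealutils implementation: the areas are hypothetical axis-aligned rectangles so that uniform sampling is trivial, the point kernel is a Matérn kernel with smoothness \\(3/2\\) and unit length-scale, and all function names are invented for the example.

```python
import math
import random


def matern32(s, t, l=1.0):
    """Matérn kernel with smoothness 3/2 between points s and t (assumed choice of k)."""
    a = math.sqrt(3.0) * math.dist(s, t) / l
    return (1.0 + a) * math.exp(-a)


def sample_rectangle(rect, n, rng):
    """Draw n points uniformly from a rectangle (xmin, ymin, xmax, ymax)."""
    xmin, ymin, xmax, ymax = rect
    return [(rng.uniform(xmin, xmax), rng.uniform(ymin, ymax)) for _ in range(n)]


def integrated_kernel(points_i, points_j, k, w_i=None, w_j=None):
    """Double-sum estimate of K(A_i, A_j); weights default to w(s) = 1."""
    if w_i is None:
        w_i = lambda s: 1.0
    if w_j is None:
        w_j = lambda s: 1.0
    total = sum(w_i(s) * w_j(t) * k(s, t) for s in points_i for t in points_j)
    return total / (len(points_i) * len(points_j))


def gram_matrix(samples, k):
    """Symmetric n x n Gram matrix from one collection of sample points per area."""
    n = len(samples)
    K = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i, n):
            K[i][j] = K[j][i] = integrated_kernel(samples[i], samples[j], k)
    return K


rng = random.Random(1)
areas = [(0, 0, 1, 1), (1, 0, 2, 1), (2, 0, 5, 1)]  # hypothetical areas
samples = [sample_rectangle(rect, 50, rng) for rect in areas]
K = gram_matrix(samples, matern32)
for row in K:
    print([round(x, 3) for x in row])
```

Each entry costs \\(L_i L_j\\) kernel evaluations, which is why the double loop dominates when the Gram matrix must be rebuilt for every new value of the kernel hyperparameters.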
Computing the \\(n \\times n\\) Gram matrix \\(K\\) requires \\[\\begin{equation} \\mathcal{O}(\\sum_{i = 1}^n \\sum_{j = 1}^n L_i L_j) \\end{equation}\\] evaluations of the kernel \\(k\\). This imposes a significant computational cost if the Gram matrix is often recomputed during inference. For example, during MCMC when the kernel has hyperparameters which are learnt then the Gram matrix is recomputed for each proposed set of hyperparameters. As such, there is a limit on the size of \\(L_i\\) which it is feasible to use. Kelsall and Wakefield (2002) encounter this challenge, and take the approach of using a discrete hyperparameter prior to reduce the number of Gram matrix constructions and inversions required. 4.2.2.3 Connection to log-Gaussian Cox processes The log-Gaussian Cox Process framework (Diggle et al. 2013) arrives naturally at the integrated kernel formulation (Li et al. 2012). A Cox process is an inhomogeneous Poisson process with a continuous stochastic intensity function \\(\\{ x(s), s \\in \\mathcal{S} \\}\\) such that conditional on the realisation of \\(x(s)\\) the number of points in any area \\(A_i\\) follows a Poisson distribution. The rate parameter of this Poisson distribution is explicitly aggregated as follows \\[\\begin{equation} y_i \\, | \\, x(s) \\sim \\text{Poisson} \\left(\\int_{s \\in A_i} x(s) \\text{d}s \\right). \\end{equation}\\] In a LGCP the log intensity \\(\\log x(s) = \\eta(s)\\) is modelled using a Gaussian process prior \\(\\eta(s) \\sim \\mathcal{GP}(\\mu(s), k(s, s'))\\). O. Johnson, Diggle, and Giorgi (2019) obtain Equation (4.12) by considering a discrete Poisson log-linear mixed model approximation to a continuous LGCP, whereby \\(\\eta(s)\\) is approximated by a piecewise constant \\(\\eta_i = \\mu_i + u_i\\) in each area \\(A_i\\). 
The \\(i\\)th discrete spatial random effect is then \\(u_i = \\int_{A_i} w_i(s) u(s) \\text{d}s\\), with covariance structure \\[\\begin{equation} \\text{Cov} \\left( \\int_{A_i} w_i(s) u(s) \\text{d}s, \\int_{A_j} w_j(s') u(s') \\text{d}s' \\right) = \\int_{A_i} \\int_{A_j} w_i(s) w_j(s') k(s, s') \\text{d}s\\text{d}s', \\end{equation}\\] corresponding to an areal integrated kernel with a logarithmic link function and Poisson likelihood. 4.2.2.4 Connection to disaggregation regression Disaggregation regression, also known as downscaling or interpolation, is another closely related approach. Rather than focusing on the aggregate nature of areal observations as a route towards better area-level estimates, disaggregation regression aims to produce high-resolution or point-level estimates from areal observations (Utazi et al. 2019; Arambepola et al. 2022; Nandi et al. 2023). These two tasks are similar, and indeed it could be argued that accurate point-level estimates are a necessary intermediate step towards accurate area-level estimates. However, disaggregation regression is challenging without auxiliary covariate information, and therefore unlikely to be applicable to small-area estimation of HIV. 4.3 Simulation study This simulation study tests the ability of inferential models with varying spatial random effect specifications to accurately recover small-area quantities. The data and modelling choices were designed with a spatial epidemiology application in mind. 4.3.1 Synthetic data Table 4.1: The three spatial random effect models used to generate synthetic data in the simulation study (Section 4.3). 

| Model | Details |
| --- | --- |
| IID | \\(\\mathbf{u} \\sim \\mathcal{N}(0, \\mathbf{I}_n)\\) |
| Besag | \\(\\mathbf{u} \\sim \\mathcal{N}(0, {\\mathbf{R}^\\star}^{-})\\) as in Section 4.1.1 |
| Integrated kernel (IK) | \\(\\mathbf{u} \\sim \\mathcal{N}(0, \\mathbf{K}^\\star)\\) as in Section 4.2.2 with Matérn kernel, \\(\\nu = 3/2, l = 2.5\\) and \\(L_i = 100\\) points per area |

Data \\(\\mathbf{y} = (y_i)_{i \\in [n]}\\) were simulated from a binomial likelihood \\(y_i \\sim \\text{Bin}(m_i, \\rho_i)\\). The probabilities \\(\\rho_i \\in [0, 1]\\) were linked to linear predictors \\(\\eta_i \\in \\mathbb{R}\\) via \\[\\begin{equation} \\log \\left( \\frac{\\rho_i}{1 - \\rho_i} \\right) = \\eta_i = \\beta_0 + u_i, \\quad i \\in [n]. \\end{equation}\\] Spatial random effects were generated according to three different models (Table 4.1). Sample sizes were fixed as \\(m_i = 25\\) for all \\(i \\in [n]\\), the intercept parameter as \\(\\beta_0 = -2\\) and the spatial random effect precision parameter as \\(\\tau_u = 1\\).

Figure 4.6: Seven geometries were considered in the simulation study. These were the four geometries from Figure 4.2 shown in Panels A, B, C and D, and three more realistic geometries shown in Panels E, F and G.

Seven geometries were considered (Figure 4.6). These included the four vignette geometries from Figure 4.2 which share an adjacency graph. Three more realistic geometries were included to represent plausible variation over spatial regularity for the small-area estimation setting. From the most to the least spatially regular, these geometries were: a \\(6 \\times 6\\) lattice grid; the 33 districts of Côte d’Ivoire; and the 36 congressional districts of Texas. For each of the three spatial random effect models and seven geometries, 250 synthetic datasets were generated, resulting in a total of \\(3 \\times 7 \\times 250 = 5250\\) synthetic datasets.

4.3.2 Inferential models

Table 4.2: The spatial random effect models used for inference. Each model is implemented in the arealutils R package (Howes 2023a).
The BYM2 model was implemented using the sparsity preserving parameterisation described in Section 3.2 of Riebler et al. (2016).

| Model | Details |
| --- | --- |
| IID | \\(\\mathbf{u} \\sim \\mathcal{N}(0, \\tau_u^{-1} \\mathbf{I}_n)\\) |
| Besag | \\(\\mathbf{u} \\sim \\mathcal{N}(0, \\tau_u^{-1} {\\mathbf{R}^\\star}^{-})\\) as in Section 4.1.1 |
| BYM2 | \\(\\mathbf{u} = \\tau_u^{-1/2} ( \\sqrt{1 - \\pi} \\, \\mathbf{v} + \\sqrt{\\pi} \\, \\mathbf{w}^\\star)\\) as in Section 4.1.5 with \\(\\pi \\sim \\text{Beta}(0.5, 0.5)\\) |
| FCK | \\(\\mathbf{u} \\sim \\mathcal{N}(0, \\tau_u^{-1} \\mathbf{K})\\) with \\(K_{ij} = k(c_i, c_j)\\) as in Section 4.2.1 with fixed length-scale \\(l\\) |
| CK | \\(\\mathbf{u} \\sim \\mathcal{N}(0, \\tau_u^{-1} \\mathbf{K})\\) with \\(K_{ij} = k(c_i, c_j)\\) as in Section 4.2.1 with length-scale prior distribution \\(l \\sim \\mathrm{Inv{\\text-}Gamma}(a, b)\\) with \\(a, b\\) set based on the geometry |
| FIK | \\(\\mathbf{u} \\sim \\mathcal{N}(0, \\tau_u^{-1} \\mathbf{K})\\) with \\(K_{ij} = K(A_i, A_j)\\) as in Section 4.2.2 with hexagonal points (Panel 4.5C), \\(L_i = 10\\), and fixed length-scale \\(l\\) |
| IK | \\(\\mathbf{u} \\sim \\mathcal{N}(0, \\tau_u^{-1} \\mathbf{K})\\) with \\(K_{ij} = K(A_i, A_j)\\) as in Section 4.2.2 with hexagonal points (Panel 4.5C), \\(L_i = 10\\), and length-scale prior distribution \\(l \\sim \\mathrm{Inv{\\text-}Gamma}(a, b)\\) with \\(a, b\\) set based on the geometry |

Eight inferential models were fit to the synthetic data (Table 4.2). Apart from the spatial random effect specification, each inferential model corresponded exactly to the simulation model.

4.3.2.1 Kernels

Gram matrices were computed using the Matérn kernel \\(k: \\mathcal{S} \\times \\mathcal{S} \\to \\mathbb{R}\\) (Stein 1999) given by \\[\\begin{equation} k(s, s') = \\frac{1}{2^{\\nu - 1}\\Gamma(\\nu)} \\left(\\frac{\\sqrt{2\\nu}\\lvert s - s' \\rvert}{l}\\right)^\\nu B_\\nu\\left(\\frac{\\sqrt{2\\nu}\\lvert s - s' \\rvert}{l}\\right).
\\tag{4.14} \\end{equation}\\] In Equation (4.14): \\(B_\\nu\\) is the modified Bessel function of the second kind; \\(|s - s'|\\) is the Euclidean distance between the point locations \\(s\\) and \\(s'\\); \\(\\nu\\) is the smoothness hyperparameter; \\(l\\) is the length-scale hyperparameter on the latitude-longitude scale. We fixed the smoothness hyperparameter \\(\\nu\\) to \\(3/2\\) to avoid concerns regarding the joint identifiability of the smoothness and lengthscale hyperparameters. This value matches that used to simulate data, and simplifies Equation (4.14) as follows \\[\\begin{equation} k(s, s') = \\left(1 + \\frac{\\sqrt{3} \\lvert s - s' \\rvert}{l} \\right) \\exp \\left(- \\frac{\\sqrt{3} \\lvert s - s' \\rvert}{l} \\right). \\end{equation}\\] The number of points per area \\(L_i\\) was set to 10 with a hexagonal spacing structure (Panel 4.5C). The actual values of \\(L_i\\) sometimes differed from 10 because sf::st_sample with type = \"hexagonal\" does not guarantee exactly the specified number of samples are returned (E. Pebesma 2018). 4.3.2.2 Prior distributions A weakly informative half-Gaussian prior was placed on the standard deviation such that \\(\\sigma_u \\sim \\mathcal{N}_+(0, 2.5^2)\\) (Gelman 2006). The value 2.5 avoids placing significant prior density on the region \\(\\sigma_u > 5\\), which after logistic transformation would facilitate undesirable variation on the probability scale very close to zero or one. A weakly informative \\(\\mathcal{N}(-2, 1)\\) prior was placed on \\(\\beta_0\\), setting most of the prior probability density for \\(\\text{logit}^{-1}(\\beta_0)\\) within a range typical for a disease prevalence. In cases where the length-scale \\(l\\) was fixed, it was set based on the geometry such that points an average distance apart had 1% correlation (N. Best et al. 1999). 
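To make the fixed length-scale heuristic concrete, the sketch below evaluates the \\(\\nu = 3/2\\) Matérn correlation and solves numerically for the length-scale at which two points a given distance apart have 1% correlation. It is an illustration only: the function names, the bisection solver and the example distance are assumptions, and the text does not specify how this calculation was actually carried out.

```python
import math


def matern32(d, l):
    """Matérn correlation with smoothness 3/2 at distance d and length-scale l."""
    a = math.sqrt(3.0) * d / l
    return (1.0 + a) * math.exp(-a)


def lengthscale_for_correlation(d, target=0.01):
    """Length-scale l such that matern32(d, l) equals target, found by bisection."""
    if d <= 0 or not 0.0 < target < 1.0:
        raise ValueError("need d > 0 and 0 < target < 1")
    # (1 + a) * exp(-a) decreases from 1 towards 0 as a = sqrt(3) * d / l grows,
    # so first bracket the root in a, then bisect.
    lo, hi = 0.0, 1.0
    while (1.0 + hi) * math.exp(-hi) > target:
        hi *= 2.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if (1.0 + mid) * math.exp(-mid) > target:
            lo = mid
        else:
            hi = mid
    a = 0.5 * (lo + hi)
    return math.sqrt(3.0) * d / a


average_distance = 2.0  # hypothetical average distance between points
l = lengthscale_for_correlation(average_distance)
print(l, matern32(average_distance, l))
```

Because the correlation depends on \\(d\\) and \\(l\\) only through their ratio, the resulting length-scale scales linearly with the chosen distance.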
In cases where a prior distribution was set on the length-scale it was \\(l \\sim \\mathrm{Inv{\\text-}Gamma}(a, b)\\), with \\(a\\) and \\(b\\) chosen for each geometry such that 5% of the prior mass was below the 5% quantile for distance between points and 5% of the prior mass was above the 95% quantile (Betancourt 2017). The sensitivity analysis in Appendix A.2 illustrates the extent to which six possible lengthscale prior distributions (Figure A.9) affect the lengthscale posterior distribution (Figure A.10). 4.3.2.3 Inference Approximate Bayesian inference was conducted using adaptive Gauss-Hermite quadrature [AGHQ; Stringer, Brown, and Stafford (2022)] with \\(k = 3\\) quadrature points over a marginal Laplace approximation via the aghq package (Stringer 2021). Models were implemented using a Template Model Builder C++ template for the log-posterior via the TMB package (Kristensen et al. 2016). Appendix A.1 compares posterior mean and standard deviations from AGHQ to those obtained using the No-U-Turn Sampler (NUTS) Hamiltonian Monte Carlo (HMC) algorithm run using Stan (Carpenter et al. 2017) via the tmbstan package (Monnahan and Kristensen 2018). 4.3.3 Model assessment Let the parameter \\(\\phi\\) have posterior marginal \\(f(\\phi) = p(\\phi \\, | \\, \\mathbf{y})\\) with cumulative distribution function \\(F\\). Let \\(\\phi_s\\) be samples \\(s \\in [S]\\) from \\(f\\). Here, the number of samples per posterior marginal was \\(S = 200\\). Let \\(\\omega\\) be the true value of \\(\\phi\\) used in the simulation. The accuracy of latent field parameter and hyperparameter posterior marginals from each model were assessed using three methods. These were the mean squared error (MSE), the continuous ranked probability score [CRPS; Matheson and Winkler (1976)], and the probability integral transform (PIT; Dawid (1984)) values. 
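As a minimal sketch, the three assessment measures can be estimated from posterior samples \\(\\phi_s\\) and the true value \\(\\omega\\) as follows. The function names and the toy samples are illustrative assumptions. The CRPS uses the standard sample-based estimator, and the PIT value is computed as the proportion of samples at or below the true value, which is the empirical cumulative distribution function evaluated at \\(\\omega\\).

```python
def mse(samples, omega):
    """Mean squared error of the samples about the true value omega."""
    return sum((x - omega) ** 2 for x in samples) / len(samples)


def crps(samples, omega):
    """Sample-based CRPS estimate: E|X - omega| - 0.5 * E|X - X'|."""
    S = len(samples)
    accuracy = sum(abs(x - omega) for x in samples) / S
    spread = sum(abs(x - y) for x in samples for y in samples) / (2 * S * S)
    return accuracy - spread


def pit(samples, omega):
    """Empirical CDF of the samples evaluated at omega."""
    return sum(x <= omega for x in samples) / len(samples)


# Toy posterior samples and true value, for illustration only
samples = [0.0, 1.0, 2.0, 3.0]
omega = 1.5
print(mse(samples, omega), crps(samples, omega), pit(samples, omega))
```

The double sum in the CRPS estimator is quadratic in the number of samples, which is inexpensive at the sample sizes used here.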
The MSE is a simple and popular measure, calculated using samples as \\[\\begin{equation} \\text{MSE}(f, \\omega) \\approx \\frac{1}{S} \\sum_{s = 1}^S (\\phi_s - \\omega)^2. \\end{equation}\\] The CRPS is a strictly proper scoring rule (SPSR) which has favourable properties and is often regarded as a default choice (Gneiting and Raftery 2007). A scoring rule which is not strictly proper may fail to penalise, or may even reward, a misrepresentation of beliefs. The CRPS is \\[\\begin{equation} \\text{CRPS}(f, \\omega) = \\int_{-\\infty}^{\\infty} (F(\\phi) - \\mathbb{I} \\{\\phi \\geq \\omega \\} )^2 \\text{d}\\phi. \\tag{4.15} \\end{equation}\\] The CRPS may be estimated using samples by \\[\\begin{equation} \\text{CRPS}(f, \\omega) \\approx \\frac{1}{S} \\sum_{s = 1}^S | \\phi_s - \\omega | - \\frac{1}{2S^2} \\sum_{s = 1}^S \\sum_{l = 1}^S | \\phi_s - \\phi_l |. \\tag{4.16} \\end{equation}\\] A posterior marginal is calibrated if over repeated simulations the quantile of the true value, known as the PIT value, is uniformly distributed such that \\[\\begin{equation} F(\\omega) \\approx \\frac{1}{S} \\sum_{s = 1}^S \\mathbb{I} \\{\\phi_s \\leq \\omega \\} = q \\sim \\mathcal{U}[0, 1]. \\tag{4.17} \\end{equation}\\] If Equation (4.17) holds then at any given nominal coverage \\(1 - \\alpha\\) the proportion of quantile-based credible intervals containing \\(\\omega\\) is also \\(1 - \\alpha\\). Uniformity was assessed using PIT histograms (Dawid 1984) and empirical cumulative distribution function (ECDF) difference plots (Aldor-Noiman et al. 2013) with simultaneous confidence bands as described in Säilynoja, Bürkner, and Vehtari (2022).

4.3.4 Results

4.3.4.1 Vignette geometries

As each geometry only had three areas, the sample size of 250 synthetic datasets was insufficient to distinguish between inferential models for the vignette geometries. Figures A.13, A.14, A.15 and A.16 show that almost all 95% credible intervals for the mean CRPS in estimating \\(\\rho_i\\) overlap.
Additionally, for the vignette geometries, both the heuristic method for fixing a lengthscale and the lengthscale prior distribution were misspecified. Three points were insufficient to learn the lengthscale, and as such misspecification of the prior distribution propagated to the posterior distribution (Figure A.11). To produce higher resolution and more meaningful results, the simulation study for the vignette geometries should be rerun with two changes: first, a larger sample size; second, more careful specification of the study with regard to the lengthscale.

4.3.4.2 Realistic geometries

Figure 4.7: The mean CRPS in estimating \\(\\rho_i\\) and its standard error for each inferential model and simulation model on the grid geometry (Panel 4.6E). The mean value averages over both areas and simulation runs.

Figure 4.8: The mean CRPS in estimating \\(\\rho_i\\) and its standard error for each inferential model and simulation model on the Côte d’Ivoire geometry (Panel 4.6F). The mean value averages over both areas and simulation runs.

Figure 4.9: The mean CRPS in estimating \\(\\rho_i\\) and its standard error for each inferential model and simulation model on the Texas geometry (Panel 4.6G). The mean value averages over both areas and simulation runs.

The two problems with the vignette geometry study did not apply to the more realistic geometries. Figures 4.7, 4.8 and 4.9 show mean CRPS values in estimating \\(\\rho_i\\) with 95% credible intervals which rarely overlap, and hence provide meaningful findings. Mean MSE and CRPS values are provided in Tables A.2 and A.3. The mean values are an average over both the number of areas in each geometry and the number of simulations run. The mean CRPS varied substantially between the three models (Table 4.1) used to simulate synthetic data. IID structure is harder to predict than spatial structure, and to a lesser extent, Besag structure is harder to predict than IK.
This observation is explained by correlation structure making forecasting easier. For IID synthetic data, the IID and BYM2 models performed well. The BYM2 model also performed almost as well as the Besag model on the spatially structured synthetic data. Appendix A.3.2 shows that the BYM2 proportion parameter successfully recovers either IID or spatial structure. Meanwhile, the IID model performed poorly on spatially structured synthetic data. The performance of kernel models on IID and Besag synthetic data diminished with increasingly spatially irregular geometry. For the most part, differences between the centroid and integrated kernel models were small, even for synthetic data generated from the IK model. Only for the IK simulated data there was a significant difference between the kernel models with a fixed lengthscale and prior distribution set on the lengthscale. Interpretation of CRPS choropleths (Figures 4.7, 4.8 and 4.9) was challenging primarily due to two factors: varying scores by simulation model, and limited sample size at the area-level. It would be relatively simple to remedy these challenges, such that figures of this kind could help to uncover precise findings about spatial random effect models. For IID synthetic data, spatial models tend to produce “U”-shaped ECDF difference plots (Figures A.28, A.29 and A.30). In other words, the quantile of the true value is too often near zero or one. This pattern corresponds to over-smoothing. 4.4 HIV prevalence study Simulation studies are a valuable tool for experimenting on models in controlled environments. However, it is difficult to capture the complexity of a realistic applied scenario using simulation. Therefore, it is important to complement simulation studies with studies conducted on real data. To this end, model performance was compared in estimating district-level HIV prevalence \\(\\rho_i \\in [0, 1]\\) in adults aged 15-49. 
Household survey data was used from across four countries in sub-Saharan Africa (Table 4.3, Figure 4.10).

Table 4.3: The four PHIA household surveys included in the HIV prevalence study (Section 4.4).

| Country | Survey | Number of areas | Analysis level |
| --- | --- | --- | --- |
| Côte d’Ivoire | PHIA 2017 | 33 | Regions |
| Malawi | PHIA 2016 | 31 | Health districts and cities, with islands removed |
| Tanzania | PHIA 2017 | 26 | Regions, with islands removed |
| Zimbabwe | PHIA 2016 | 60 | Districts |

4.4.1 Household survey data

Figure 4.10: Adult (15-49) HIV prevalence from the most recent PHIA survey conducted in Côte d’Ivoire (Panel A), Malawi (Panel B), Tanzania (Panel C), and Zimbabwe (Panel D). These estimates are survey weighted according to Equation (4.18).

Data from the most recent publicly available Population-based HIV Impact Assessment (PHIA) survey were used in each country. Let \\(y_{ij} \\in \\{0, 1\\}\\) be the survey response for individual \\(j\\) in area \\(i\\). The survey designs used were complex in that each individual had a potentially unequal probability \\(\\pi_{ij}\\) of being included in the survey. Sampling weights \\[\\begin{equation} w_{ij} = \\frac{1}{\\pi_{ij}} \\end{equation}\\] were used to account for the complex survey design. The survey weighted prevalence in area \\(i\\) is \\[\\begin{equation} \\rho_i^\\star = \\frac{\\sum_{j} w_{ij} y_{ij}}{\\sum_{j} w_{ij}}. \\tag{4.18} \\end{equation}\\] The effective number of cases \\(y_i^\\star = \\rho_i^\\star \\cdot m_i^\\star\\) is given by the product of the weighted prevalence and the Kish effective sample size (Kish 1965) \\[\\begin{equation} m_i^\\star = \\frac{(\\sum_j w_{ij})^2}{\\sum_j w_{ij}^2}, \\end{equation}\\] and may be intuitively thought of as what would have been observed had the survey been a simple random sample.

4.4.2 Inferential models

The inferential models used correspond to those in Section 4.3 with a small modification.
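The survey-weighted quantities of Section 4.4.1 are simple to compute for a single area. The following sketch uses hypothetical function names and toy responses and weights; it illustrates Equation (4.18) and the Kish effective sample size rather than reproducing the analysis code.

```python
def weighted_prevalence(y, w):
    """Survey-weighted prevalence: sum_j w_j y_j / sum_j w_j."""
    return sum(w_j * y_j for w_j, y_j in zip(w, y)) / sum(w)


def kish_effective_sample_size(w):
    """Kish effective sample size: (sum_j w_j)^2 / sum_j w_j^2."""
    return sum(w) ** 2 / sum(w_j ** 2 for w_j in w)


# Toy data for one area: binary responses and sampling weights
y = [1, 0, 0, 1]
w = [1.0, 1.0, 2.0, 4.0]

rho_star = weighted_prevalence(y, w)
m_star = kish_effective_sample_size(w)
y_star = rho_star * m_star  # effective number of cases
print(rho_star, m_star, y_star)
```

With equal weights the effective sample size equals the number of respondents, and unequal weights can only reduce it. In general \\(y_i^\\star\\) and \\(m_i^\\star\\) are not integers.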
As before, prevalences \\(\\rho_i\\) were modelled via \\(\\text{logit}(\\rho_i) = \\beta_0 + u_i\\) with spatial random effect specification varied according to Table 4.2. Due to survey weighting, the effective number of cases \\(y_i^\\star \\in \\mathbb{R}\\) and effective sample size \\(m_i^\\star \\in \\mathbb{R}\\) may not be integers. Following Chen, Wakefield, and Lumley (2014) a generalised binomial distribution \\(y_i^\\star \\sim \\text{xBin}(m_i^\\star, \\rho_i)\\) was used, with working likelihood for \\(m^\\star_i \\geq y^\\star_i\\) given by \\[\\begin{equation} p(y_i^\\star \\, | \\, m_i^\\star, \\rho_i) = \\frac{\\Gamma(m_i^\\star + 1)}{\\Gamma(y_i^\\star + 1) \\Gamma(m_i^\\star - y_i^\\star + 1)} \\rho_i ^{y_i^\\star} (1 - \\rho_i)^{(m_i^\\star - y_i^\\star)}. \\tag{4.19} \\end{equation}\\]

4.4.3 Model comparison

Figure 4.11: In leave-one-out (LOO) cross-validation, one observation is left out of the training data and predicted upon in each fold. The spatial-leave-one-out (SLOO) cross-validation scheme considered here is similar, only differing in that observations corresponding to adjacent areas are also left out of the training data.

Each model was assessed using two schemes (Figure 4.11): a regular leave-one-out cross-validation (LOO-CV), and a spatial leave-one-out cross-validation (SLOO-CV). At each fold the CRPS, MSE and quantile (as in Section 4.3.3) of posterior predictive samples as compared with the observed data were computed. In this section, the number of samples per posterior marginal was \\(S = 1000\\).

4.4.4 Results

Figure 4.12: The mean pointwise leave-one-out and spatial leave-one-out CRPS in estimating \\(\\rho_i\\) using each inferential model for the four PHIA surveys described in Table 4.3. The 95% credible intervals shown are generated using 1.96 times the standard error.
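The working likelihood in Equation (4.19) is most safely evaluated on the log scale using the log-gamma function. The sketch below is an illustration with a hypothetical function name, not the TMB template used for inference.

```python
import math


def xbin_logpmf(y, m, rho):
    """Log of the generalised binomial working likelihood in Equation (4.19).

    y and m may be non-integer, with 0 <= y <= m and 0 < rho < 1.
    """
    if not (0.0 <= y <= m) or not (0.0 < rho < 1.0):
        raise ValueError("need 0 <= y <= m and 0 < rho < 1")
    log_coefficient = (
        math.lgamma(m + 1.0) - math.lgamma(y + 1.0) - math.lgamma(m - y + 1.0)
    )
    return log_coefficient + y * math.log(rho) + (m - y) * math.log(1.0 - rho)


# Non-integer effective cases and sample size (toy values)
print(xbin_logpmf(1.8, 2.9, 0.1))

# For integer arguments the usual binomial probability is recovered
print(math.exp(xbin_logpmf(3, 10, 0.2)))
```

Working with `math.lgamma` avoids overflow of the gamma function for large effective sample sizes.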
padding-top: 8px; padding-bottom: 8px; padding-left: 5px; padding-right: 5px; } #jhmfoggtba .gt_first_grand_summary_row { padding-top: 8px; padding-bottom: 8px; padding-left: 5px; padding-right: 5px; border-top-style: double; border-top-width: 6px; border-top-color: #D3D3D3; } #jhmfoggtba .gt_last_grand_summary_row_top { padding-top: 8px; padding-bottom: 8px; padding-left: 5px; padding-right: 5px; border-bottom-style: double; border-bottom-width: 6px; border-bottom-color: #D3D3D3; } #jhmfoggtba .gt_striped { background-color: rgba(128, 128, 128, 0.05); } #jhmfoggtba .gt_table_body { border-top-style: solid; border-top-width: 2px; border-top-color: #D3D3D3; border-bottom-style: solid; border-bottom-width: 2px; border-bottom-color: #D3D3D3; } #jhmfoggtba .gt_footnotes { color: #333333; background-color: #FFFFFF; border-bottom-style: none; border-bottom-width: 2px; border-bottom-color: #D3D3D3; border-left-style: none; border-left-width: 2px; border-left-color: #D3D3D3; border-right-style: none; border-right-width: 2px; border-right-color: #D3D3D3; } #jhmfoggtba .gt_footnote { margin: 0px; font-size: 90%; padding-top: 4px; padding-bottom: 4px; padding-left: 5px; padding-right: 5px; } #jhmfoggtba .gt_sourcenotes { color: #333333; background-color: #FFFFFF; border-bottom-style: none; border-bottom-width: 2px; border-bottom-color: #D3D3D3; border-left-style: none; border-left-width: 2px; border-left-color: #D3D3D3; border-right-style: none; border-right-width: 2px; border-right-color: #D3D3D3; } #jhmfoggtba .gt_sourcenote { font-size: 90%; padding-top: 4px; padding-bottom: 4px; padding-left: 5px; padding-right: 5px; } #jhmfoggtba .gt_left { text-align: left; } #jhmfoggtba .gt_center { text-align: center; } #jhmfoggtba .gt_right { text-align: right; font-variant-numeric: tabular-nums; } #jhmfoggtba .gt_font_normal { font-weight: normal; } #jhmfoggtba .gt_font_bold { font-weight: bold; } #jhmfoggtba .gt_font_italic { font-style: italic; } #jhmfoggtba .gt_super { font-size: 
65%; } #jhmfoggtba .gt_footnote_marks { font-size: 75%; vertical-align: 0.4em; position: initial; } #jhmfoggtba .gt_asterisk { font-size: 100%; vertical-align: 0; } #jhmfoggtba .gt_indent_1 { text-indent: 5px; } #jhmfoggtba .gt_indent_2 { text-indent: 10px; } #jhmfoggtba .gt_indent_3 { text-indent: 15px; } #jhmfoggtba .gt_indent_4 { text-indent: 20px; } #jhmfoggtba .gt_indent_5 { text-indent: 25px; } Table 4.4: The mean pointwise leave-one-out and spatial leave-one-out CRPS in estimating \\(\\rho_i\\) for each inferential model across the four considered PHIA surveys. The units used in this table are thousandths. For standard errors, see Figure 4.12. PHIA survey Continuous ranked probability score (units: 1/1000) IID Besag BYM2 FCK CK FIK IK LOO Côte d’Ivoire, 2017 6.6 6.6 6.7 6.7 6.9 6.9 6.9 Malawi, 2016 31.7 19.5 19.6 22.7 22.8 21.4 21.0 Tanzania, 2017 14.9 12.1 13.4 10.7 9.5 10.3 10.6 Zimbabwe, 2016 28.9 20.8 20.9 21.7 21.6 21.4 22.0 SLOO Côte d’Ivoire, 2017 6.5 6.6 6.6 6.4 6.9 6.4 6.8 Malawi, 2016 31.6 19.3 19.9 26.5 29.0 25.1 28.3 Tanzania, 2017 14.9 12.1 18.1 16.0 17.6 15.4 16.9 Zimbabwe, 2016 29.1 20.8 25.2 26.7 26.2 26.1 26.3 The results (Figure 4.12, Table 4.4, Table A.4) for each survey were as follows: For the 2017 PHIA survey in Côte d’Ivoire, all of the models performed similarly, using both LOO- and SLOO-CV (Figure A.37). The pointwise CRPS for all models was high at one outlying district in the survey, Grand-Ponts. It is difficult to see how any spatial random model would perform well in this situation, without additional covariates or using a distribution with heavier tails than the Gaussian. The CK and IK models had lengthscale posterior distributions largely unchanged from their prior distribution (Figure A.31). This uncertainty in lengthscale resulted in wide prevalence 95% credible intervals for the CK and IK models in Figure A.33. 
This example shows the importance of being careful when using kernel models, and when setting the prior distributions on their hyperparameters. It is surprising that this behaviour appears not to have resulted in poor LOO or SLOO performance. For this survey the BYM2 proportion posterior distribution was also similar to its prior distribution, in contrast to each of the other surveys which had BYM2 proportion posteriors peaked at one, corresponding to spatially structured noise (Figure A.32). For the 2016 PHIA survey in Malawi the Besag and BYM2 models performed the best, followed by the kernel models, and then the IID model (Figure A.38). While the LOO and SLOO CRPS values for IID, Besag and BYM2 models were similar, for the kernel models forecasting performance was substantially reduced by leaving out adjacent districts. This finding is surprising, as the kernel models make use of more distant correlations, and it is the adjacency-based models that one would intuitively expect to be hampered more by the SLOO-CV. For the IID model, that LOO and SLOO performance are similar is no surprise as in all cases the IID model should be predicting the mean. Though less data are available in the SLOO case, this should be of little consequence. For the 2017 PHIA survey in Tanzania (Figure A.39), under LOO-CV the kernel models performed better, but under SLOO-CV there was a significant drop in performance. Finally, for the 2016 PHIA survey in Zimbabwe, performance for each of the spatially structured models was similar (Figure A.40). Again, under SLOO-CV, performance of the BYM2 and kernel-based models dropped. Differences within the kernel-based models for this survey, and indeed across all four surveys, were limited. 4.5 Discussion 4.5.1 Modelling Though there are situations where other models perform better, on the whole this study supports the use of adjacency-based spatial random effect models.
For the study on HIV survey data, adjacency-based models performed well, if not the best, in all cases. That is not to say that under data truly generated from a kernel model, there isn’t significant benefit to using the corresponding kernel model for inference. However, the transferability of this finding to applied settings is limited by the following factors. First and foremost, it is usually impossible to know that real data were generated from any particular process. Second, the synthetic data study used the same kernel, Matérn with \\(\\nu = 3/2\\) (Equation (4.14)), for both simulation and inference, and as such represents a best-case. Third, specification of the lengthscale prior distribution is challenging, and easy to do badly. Finally, aggregation via the integrated kernel occurred at the level of the latent field, despite the fact that most of the time we expect aggregation to occur at the level of the data. If the link function \\(g\\) is the identity or linear then the two are equivalent, but non-linear link functions create a discrepancy, which this study did not address. This chapter did not consider use of the stochastic partial differential equation (SPDE) approximation of Lindgren, Rue, and Lindström (2011) as a potentially more computationally efficient way to implement integrated kernel models (Wilson and Wakefield 2018). As the underlying models are ultimately similar, that is a continuous Matérn random field over space aggregated at an area-level, the findings from this work are likely to apply to use of the SPDE approximation. Nonetheless, it would be of value to confirm this empirically. This chapter used area-level models for data which arise by aggregation of point-level data. However, Konstantinoudis et al. (2020) found that using a point-level LGCP model rather than an area-level BYM model may have significant benefits.
The work in this chapter does not address the broader question of under which circumstances use of an area or point-level model is sensible. The adjacency-based models considered in this study were limited to the Besag and BYM2 models. Although these are perhaps the most widely used adjacency-based models, others could have been considered. Examples include the more general weighted ICAR model discussed in Section 4.1.4. Additionally, it would be of interest to implement the integrated kernel model with population-based weighting (Section 4.2.2.1). The models used for spatial structure in this chapter were all stationary. Although stationarity assumptions may be violated by HIV survey data, it remains challenging to estimate non-stationary spatial structure (Christopher J. Paciorek and Schervish 2006). 4.5.2 Model comparison Previous spatial random effect comparison studies (Nicky Best, Richardson, and Thomson 2005; Lee 2011) were limited to the DIC measure of model performance. Use of the DIC is strongly discouraged by Vehtari, Gelman, and Gabry (2017). This study used less flawed measures of model performance, such as the cross-validated CRPS. It would be beneficial to compute the DIC and WAIC in Section 4.4 as a comparison. Additionally, the measures used in this study were computed and presented by individual area. With refinements to the sample sizes used, these area-specific measures of performance could enable more nuanced conclusions about the use of spatial random effect models. Cross-validation was performed using \\(\\rho\\) as the forecasting target, rather than \\(y\\) as is typical. This decision was made because applied interest is in forecasting HIV prevalence at a district level, not forecasting the outcome of a household survey. It could be argued that a district does not become more important to forecast well by virtue of surveying a larger sample size in that district.
That said, an alternative viewpoint is that forecast accuracy should be incentivised in proportion to district population size, such that PLHIV is accurately estimated. If sample size is proportional to population size, then forecasting \\(y\\) could be a useful proxy. Choice of the particular parameter, or transformation of that parameter (Nikos I. Bosse et al. 2023), to score is an ongoing topic of research. The CRPS was used in preference to the log-score. Whereas the log-score requires a kernel density estimate of the posterior distribution, and is therefore sensitive to tuning parameters, the CRPS can be estimated from samples alone. A downside of use of the CRPS and MSE is their relative lack of interpretability. For example, it is difficult to determine whether a forecast is good, or suitable for practical use, on the basis of its CRPS or MSE. Measures such as the skill score have been used to contrast forecast performance with some baseline. A constant model, with no random effects, could be used as such a baseline. 4.5.3 Inference A strength of this work is that all of the inferential models (Table 4.2) in this chapter were implemented in TMB. Inference was then conducted using AGHQ over the marginal Laplace approximation using the aghq package. The accuracy of inferences was compared to gold-standard results from NUTS obtained using the tmbstan package. An earlier version of this study used R-INLA. Not all of the inferential models were compatible with R-INLA, so rstan was used in some cases. However, due to the difference in inference algorithm, this study design conflated statistical models with inference algorithms. Consistent use of TMB, a fast and flexible tool for spatial modelling (Osgood-Zimmerman and Wakefield 2023), overcame this limitation. Chapter 6 extends TMB to implement the INLA algorithm of R-INLA. References Aldor-Noiman, Sivan, Lawrence D Brown, Andreas Buja, Wolfgang Rolke, and Robert A Stine. 2013. 
“The power to see: A new graphical test of normality.” The American Statistician 67 (4): 249–60. Arambepola, Rohan, Tim CD Lucas, Anita K Nandi, Peter W Gething, and Ewan Cameron. 2022. “A simulation study of disaggregation regression for spatial disease mapping.” Statistics in Medicine 41 (1): 1–16. Bennett, James E, Helen Tamura-Wicks, Robbie M Parks, Richard T Burnett, C Arden Pope III, Matthew J Bechle, Julian D Marshall, Goodarz Danaei, and Majid Ezzati. 2019. “Particulate matter air pollution and national and county life expectancy loss in the USA: A spatiotemporal analysis.” PLOS Medicine 16 (7): e1002856. Besag, Julian, Jeremy York, and Annie Mollié. 1991. “Bayesian image restoration, with two applications in spatial statistics.” Annals of the Institute of Statistical Mathematics 43 (1): 1–20. Best, N, N Arnold, A Thomas, L Waller, and E Conlon. 1999. “Bayesian models for spatially correlated disease and exposure data.” In Bayesian Statistics 6: Proceedings of the Sixth Valencia International Meeting, 6:131. Oxford University Press. Best, Nicky, Sylvia Richardson, and Andrew Thomson. 2005. “A comparison of Bayesian spatial models for disease mapping.” Statistical Methods in Medical Research 14 (1): 35–59. Betancourt, Michael. 2017. “Robust Gaussian processes in Stan.” https://betanalpha.github.io/assets/case_studies/gp_part3/part3.html. Bhatt, Samir, DJ Weiss, E Cameron, D Bisanzio, B Mappin, U Dalrymple, KE Battle, et al. 2015. “The effect of malaria control on Plasmodium falciparum in Africa between 2000 and 2015.” Nature 526 (7572): 207–11. Blangiardo, Marta, Michela Cameletti, Gianluca Baio, and Håvard Rue. 2013. “Spatial and spatio-temporal models with R-INLA.” Spatial and Spatio-Temporal Epidemiology 4: 33–49. Bosse, Nikos I, Sam Abbott, Anne Cori, Edwin van Leeuwen, Johannes Bracher, and Sebastian Funk. 2023. “Scoring epidemiological forecasts on transformed scales.” PLOS Computational Biology 19 (8): e1011393.
Carpenter, Bob, Andrew Gelman, Matthew D Hoffman, Daniel Lee, Ben Goodrich, Michael Betancourt, Marcus Brubaker, Jiqiang Guo, Peter Li, and Allen Riddell. 2017. “Stan: A probabilistic programming language.” Journal of Statistical Software 76 (1). Chau, Siu Lun, Shahine Bouabid, and Dino Sejdinovic. 2021. “Deconditional downscaling with Gaussian processes.” Advances in Neural Information Processing Systems 34: 17813–25. Chen, Cici, Jon Wakefield, and Thomas Lumley. 2014. “The use of sampling weights in Bayesian hierarchical models for small area estimation.” Spatial and Spatio-Temporal Epidemiology 11: 33–43. Cramb, SM, EW Duncan, PD Baade, and KL Mengersen. 2018. “Investigation of Bayesian spatial models.” Cancer Council Queensland; Queensland University of Technology (QUT). Dawid, A Philip. 1984. “Present position and potential developments: Some personal views statistical theory the prequential approach.” Journal of the Royal Statistical Society: Series A (General) 147 (2): 278–90. de Valpine, Perry, Christopher Paciorek, Daniel Turek, Nick Michaud, Cliff Anderson-Bergman, Fritz Obermeyer, Claudia Wehrhahn Cortes, Abel Rodrìguez, Duncan Temple Lang, and Sally Paganin. 2023. NIMBLE User Manual (version 1.0.1). https://doi.org/10.5281/zenodo.1211190. Dean, CB, MD Ugarte, and AF Militino. 2001. “Detecting interaction between random region and fixed age effects in disease mapping.” Biometrics 57 (1): 197–202. Diggle, Peter J, Paula Moraga, Barry Rowlingson, Benjamin M Taylor, et al. 2013. “Spatial and spatio-temporal log-Gaussian Cox processes: extending the geostatistical paradigm.” Statistical Science 28 (4): 542–63. Donegan, Connor. 2022. “geostan: An R package for Bayesian spatial analysis.” The Journal of Open Source Software 7 (79): 4716. https://doi.org/10.21105/joss.04716. Duncan, Earl W, Nicole M White, and Kerrie Mengersen. 2017.
“Spatial smoothing in Bayesian models: a comparison of weights matrix specifications and their impact on inference.” International Journal of Health Geographics 16 (1): 1–16. Dwyer-Lindgren, Laura, Michael A Cork, Amber Sligar, Krista M Steuben, Kate F Wilson, Naomi R Provost, Benjamin K Mayala, et al. 2019. “Mapping HIV prevalence in sub-Saharan Africa between 2000 and 2017.” Nature 570 (7760): 189–93. Dwyer-Lindgren, Laura, Abraham D Flaxman, Marie Ng, Gillian M Hansen, Christopher JL Murray, and Ali H Mokdad. 2015. “Drinking patterns in US counties from 2002 to 2012.” American Journal of Public Health 105 (6): 1120–27. Flaxman, Seth R, Yu-Xiang Wang, and Alexander J Smola. 2015. “Who supported Obama in 2012? Ecological inference through distribution regression.” In Proceedings of the 21th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 289–98. Follestad, Turid, and Håvard Rue. 2003. “Modelling spatial variation in disease risk using Gaussian Markov random field proxies for Gaussian random fields.” Freni-Sterrantino, Anna, Massimo Ventrucci, and Håvard Rue. 2018. “A note on intrinsic conditional autoregressive models for disconnected graphs.” Spatial and Spatio-Temporal Epidemiology 26: 25–34. Gärtner, Thomas, Peter A Flach, Adam Kowalczyk, and Alexander J Smola. 2002. “Multi-instance kernels.” In ICML, 2:7. 3. Gelman, Andrew. 2006. “Prior distributions for variance parameters in hierarchical models (comment on article by Browne and Draper).” Bayesian Analysis 1 (3): 515–34. Gneiting, Tilmann, and Adrian E Raftery. 2007. “Strictly proper scoring rules, prediction, and estimation.” Journal of the American Statistical Association 102 (477): 359–78. Gössl, Christoff, Dorothee P Auer, and Ludwig Fahrmeir. 2001. “Bayesian spatiotemporal inference in functional magnetic resonance imaging.” Biometrics 57 (2): 554–62. Haining, Robert P. 2003. Spatial data analysis: theory and practice. Cambridge University Press.
Hamelijnck, O, T Damoulas, K Wang, and MA Girolami. 2019. “Multi-resolution multi-task Gaussian processes.” Advances in Neural Information Processing Systems 32. Howes, Adam. 2023a. arealutils: Utility functions for beyond-borders. Johnson, Olatunji, Peter Diggle, and Emanuele Giorgi. 2019. “A spatially discrete approximation to log-Gaussian Cox processes for modelling aggregated disease count data.” Statistics in Medicine 38 (24): 4871–87. Kelsall, Julia, and Jonathan Wakefield. 2002. “Modeling spatial variation in disease risk: a geostatistical approach.” Journal of the American Statistical Association 97 (459): 692–701. Kish, Leslie. 1965. Survey sampling. 04; HN29, K5. Konstantinoudis, Garyfallos, Dominic Schuhmacher, Håvard Rue, and Ben D Spycher. 2020. “Discrete versus continuous domain models for disease mapping.” Spatial and Spatio-Temporal Epidemiology 32: 100319. Kristensen, Kasper, Anders Nielsen, Casper W Berg, Hans Skaug, Bradley M Bell, et al. 2016. “TMB: Automatic Differentiation and Laplace Approximation.” Journal of Statistical Software 70 (i05). Law, Ho Chung, Dino Sejdinovic, Ewan Cameron, Tim Lucas, Seth Flaxman, Katherine Battle, and Kenji Fukumizu. 2018. “Variational learning on aggregate outputs with Gaussian processes.” Advances in Neural Information Processing Systems 31. Lee, Duncan. 2011. “A comparison of conditional autoregressive models used in Bayesian disease mapping.” Spatial and Spatio-Temporal Epidemiology 2 (2): 79–89. Leroux, Brian G, Xingye Lei, and Norman Breslow. 2000. “Estimation of disease rates in small areas: a new mixed model for spatial dependence.” In Statistical Models in Epidemiology, the Environment, and Clinical Trials, 179–91. Springer. Li, Ye, Patrick Brown, Dionne C Gesink, and Håvard Rue. 2012. “Log Gaussian Cox processes and spatially aggregated disease incidence data.” Statistical Methods in Medical Research 21 (5): 479–507. https://doi.org/10.1177/0962280212446326. 
Lindgren, Finn, Håvard Rue, and Johan Lindström. 2011. “An explicit link between Gaussian fields and Gaussian Markov random fields: the stochastic partial differential equation approach.” Journal of the Royal Statistical Society Series B: Statistical Methodology 73 (4): 423–98. Matheson, James E, and Robert L Winkler. 1976. “Scoring rules for continuous probability distributions.” Management Science 22 (10): 1087–96. Mayala, Benjamin K., Samir Bhatt, and Peter Gething. 2020. “Predicting HIV/AIDS at Subnational Levels using DHS Covariates related to HIV.” DHS Spatial Analysis Reports 18. Rockville, Maryland, USA: ICF. Monnahan, Cole C, and Kasper Kristensen. 2018. “No-U-turn sampling for fast Bayesian inference in ADMB and TMB: Introducing the adnuts and tmbstan R packages.” PLOS One 13 (5): e0197954. Morris, Mitzi, Katherine Wheeler-Martin, Dan Simpson, Stephen J. Mooney, Andrew Gelman, and Charles DiMaggio. 2019. “Bayesian hierarchical spatial models: Implementing the Besag York Mollié model in stan.” Spatial and Spatio-Temporal Epidemiology 31: 100301. https://doi.org/10.1016/j.sste.2019.100301. Nandi, Anita K, Tim CD Lucas, Rohan Arambepola, Peter Gething, and Daniel J Weiss. 2023. “disaggregation: An R Package for Bayesian Spatial Disaggregation Modeling.” Journal of Statistical Software 106: 1–19. Openshaw, S, and P. J. Taylor. 1979. “A million or so correlation coefficients, three experiments on the modifiable areal unit problem.” Statistical Applications in the Spatial Science, 127–44. Osgood-Zimmerman, Aaron, and Jon Wakefield. 2023. “A Statistical Review of Template Model Builder: A Flexible Tool for Spatial Modelling.” International Statistical Review 91 (2): 318–42. Paciorek, Christopher J et al. 2013. “Spatial models for point and areal data using Markov random fields on a fine grid.” Electronic Journal of Statistics 7: 946–72. Paciorek, Christopher J., and Mark J. Schervish. 2006.
“Spatial modelling using a new class of nonstationary covariance functions.” Environmetrics 17 (5): 483–506. https://doi.org/10.1002/env.785. Parks, Robbie M, James E Bennett, Helen Tamura-Wicks, Vasilis Kontis, Ralf Toumi, Goodarz Danaei, and Majid Ezzati. 2020. “Anomalously warm temperatures are associated with increased injury deaths.” Nature Medicine 26 (1): 65–70. Pebesma, Edzer. 2018. “Simple Features for R: Standardized Support for Spatial Vector Data.” The R Journal 10 (1): 439–46. https://doi.org/10.32614/RJ-2018-009. Rashid, T, JE Bennett, D Muller, A Cross, J Pearson-Stuttard, H Daby, D Fecht, B Davies, and M Ezzati. 2023. “Inequalities in mortality from leading cancers in districts of England from 2002 to 2019: population-based high-resolution spatiotemporal analysis of vital registration data.” The Lancet Oncology. http://hdl.handle.net/10044/1/107364. Riebler, Andrea, Sigrunn H Sørbye, Daniel Simpson, and Håvard Rue. 2016. “An intuitive Bayesian spatial model for disease mapping that accounts for scaling.” Statistical Methods in Medical Research 25 (4): 1145–65. ———. 2020. “Comment on R-INLA Discussion Group thread.” Rue, Håvard, and Leonhard Held. 2005. Gaussian Markov random fields: theory and applications. CRC press. Säilynoja, Teemu, Paul-Christian Bürkner, and Aki Vehtari. 2022. “Graphical test for discrete uniformity and its applications in goodness-of-fit evaluation and multiple sample comparison.” Statistics and Computing 32 (2): 32. Saracco, James F, J Andrew Royle, David F DeSante, and Beth Gardner. 2010. “Modeling spatial variation in avian survival and residency probabilities.” Ecology 91 (7): 1885–91. Saunders, Daniel. 2023. “The Besag-York-Mollie Model for Spatial Data.” In PyMC Examples, edited by PyMC Team. https://doi.org/10.5281/zenodo.5654871. Schmid, Volker J, Brandon Whitcher, Anwar R Padhani, N Jane Taylor, and Guang-Zhong Yang. 2006.
“Bayesian methods for pharmacokinetic models in dynamic contrast-enhanced magnetic resonance imaging.” IEEE Transactions on Medical Imaging 25 (12): 1627–36. Simpson, Daniel, Håvard Rue, Andrea Riebler, Thiago G Martins, Sigrunn H Sørbye, et al. 2017. “Penalising model component complexity: A principled, practical approach to constructing priors.” Statistical Science 32 (1): 1–28. Sørbye, Sigrunn Holbek, and Håvard Rue. 2014. “Scaling intrinsic Gaussian Markov random field priors in spatial modelling.” Spatial Statistics 8: 39–51. Stein, Michael L. 1999. “Interpolation of spatial data: some theory for kriging.” Stringer, Alex. 2021. “Implementing Approximate Bayesian Inference using Adaptive Quadrature: the aghq Package.” arXiv Preprint arXiv:2101.04468. Stringer, Alex, Patrick Brown, and Jamie Stafford. 2022. “Fast, scalable approximations to posterior distributions in extended latent Gaussian models.” Journal of Computational and Graphical Statistics, 1–15. Tanaka, Yusuke, Toshiyuki Tanaka, Tomoharu Iwata, Takeshi Kurashima, Maya Okawa, Yasunori Akagi, and Hiroyuki Toda. 2019. “Spatially aggregated Gaussian processes with multivariate areal outputs.” In Advances in Neural Information Processing Systems, 3005–15. Teh, Yee Whye, Bryn Elesedy, Bobby He, Michael Hutchinson, Sheheryar Zaidi, Avishkar Bhoopchand, Ulrich Paquet, Nenad Tomasev, Jonathan Read, and Peter J. Diggle. 2022. “Efficient Bayesian inference of Instantaneous Reproduction Numbers at Fine Spatial Scales, with an Application to Mapping and Nowcasting the Covid-19 Epidemic in British Local Authorities.” Journal of the Royal Statistical Society Series A: Statistics in Society 185 (1): S65–85. https://doi.org/10.1111/rssa.12971. Utazi, C Edson, Julia Thorley, VA Alegana, MJ Ferrari, Kristine Nilsen, Saki Takahashi, CJE Metcalf, Justin Lessler, and AJ Tatem. 2019. 
“A spatial regression model for the disaggregation of areal unit based data to high-resolution grids with application to vaccination coverage mapping.” Statistical Methods in Medical Research 28 (10-11): 3226–41. Vehtari, Aki, Andrew Gelman, and Jonah Gabry. 2017. “Practical Bayesian Model Evaluation Using Leave-One-Out Cross-Validation and WAIC.” Statistics and Computing 27: 1413–32. Wakefield, J, and S Morris. 1999. “Spatial dependence and errors-in-variables in environmental epidemiology.” Bayesian Statistics 6: 657–84. Wakefield, Jonathan, and Hilary Lyons. 2010. “Spatial Aggregation and the Ecological Fallacy.” In Chapman & Hall/CRC Handbooks of Modern Statistical Methods, 2010:541–58. https://doi.org/10.1201/9781420072884-c30. Weiss, Daniel J, Bonnie Mappin, Ursula Dalrymple, Samir Bhatt, Ewan Cameron, Simon I Hay, and Peter W Gething. 2015. “Re-examining environmental correlates of Plasmodium falciparum malaria endemicity: a data-intensive variable selection approach.” Malaria Journal 14 (1): 1–18. Wilson, Katie, and Jon Wakefield. 2018. “Pointless spatial modeling.” Biostatistics 21 (2): e17–32. https://doi.org/10.1093/biostatistics/kxy041. Yousefi, Fariba, Michael T Smith, and Mauricio Alvarez. 2019. “Multi-task learning for aggregated data using Gaussian processes.” Advances in Neural Information Processing Systems 32. "],["multi-agyw.html", "5 A model for risk group proportions 5.1 Background 5.2 Data 5.3 Model for risk group proportions 5.4 Prevalence and incidence by risk group 5.5 Results 5.6 Discussion", " 5 A model for risk group proportions This chapter describes an application of Bayesian spatio-temporal statistics to small-area estimation of HIV risk group proportions. This work was conducted in collaboration with colleagues from the MRC Centre for Global Infectious Disease Analysis and UNAIDS. I developed the statistical model, building upon an earlier version of the analysis conducted by Dr. Kathryn Risher. 
The model and results for 13 countries are presented in Howes et al. (2023). Outputs are implemented in a spreadsheet tool (https://hivtools.unaids.org/shipp/) for use in national HIV response planning. The tool is being updated to include more countries in the analysis and to extend the methodology, including to additional risk groups. Code for the analysis in this chapter is available from https://github.com/athowes/multi-agyw and supported by the multi.utils R package (Howes 2023b). 5.1 Background Figure 5.1: Risk of acquiring HIV depends on both individual-level risk behaviour and population-level HIV incidence. It is assumed here that with no individual-level risk behaviour, there is no risk of acquiring HIV, independent of the population-level HIV incidence. The risk scale is intended to be illustrative, rather than interpreted quantitatively. In SSA, adolescent girls and young women (AGYW) aged 15-29 are at increased risk of HIV infection. AGYW account for only 28% of the population, but comprise 44% of new infections (UNAIDS 2021a). HIV incidence for AGYW is 2.4 times higher than for similarly aged (15-29) males. The social and biological reasons for this disparity include structural vulnerabilities and power imbalances, age patterns of sexual mixing, a younger age at first sex, and increased susceptibility to HIV infection. On this basis, AGYW have been identified as a priority population for HIV prevention services. Significant investments, including by the Global Fund (The Global Fund 2018) and the DREAMS (Determined, Resilient, Empowered, AIDS-free, Mentored, and Safe) partnership (Saul et al. 2018), have been made to support prevention programming. The Global AIDS Strategy 2021-2026 (UNAIDS 2021b) was adopted by the United Nations (UN) General Assembly in June 2021, and “outlines the strategic priorities and actions to be implemented by global, regional, country and community partners to get on-track to ending AIDS”.
It proposed stratifying HIV prevention packages to AGYW based on two factors: local population-level HIV incidence, and individual-level sexual risk behaviour. Risk of acquiring HIV depends importantly on both factors. As such, prioritisation of prevention services is more efficient if both are taken into account. Figure 5.1 illustrates this fact stylistically. The strategy encourages programmes to define targets for the proportion of AGYW to be reached with a range of interventions (Table B.2) based on prioritisation strata which incorporate behavioural risk (Table B.1). Implementation of the strategy by national HIV programmes and stakeholders requires estimates of the population size and HIV incidence in each risk group by location. In this chapter, I used a Bayesian spatio-temporal model (Section 5.3) of behavioural data from household surveys (Section 5.2) to estimate HIV risk group proportions. To then estimate risk group specific HIV prevalence and HIV incidences (Section 5.4), I combined the proportion estimates with population size, HIV prevalence and HIV incidence estimates, as well as risk group specific HIV incidence rate ratios, and HIV prevalence rate ratios. Finally, by ordering district, age, risk group strata by HIV incidence, I estimated an upper bound for the number of new HIV infections that could be averted under different risk prioritisation strategies (Section 5.4.3). 5.2 Data 5.2.1 Behavioural data from household surveys I used household survey data from 13 countries identified by the Global Fund (The Global Fund 2018) as priority countries for implementation of AGYW HIV prevention. These countries were Botswana, Cameroon, Kenya, Lesotho, Malawi, Mozambique, Namibia, South Africa, Eswatini, Tanzania, Uganda, Zambia and Zimbabwe. 
Surveys conducted in these countries between 1999 and 2018 were included if women were interviewed about their sexual behaviour and sufficient geographic data were available to locate survey clusters to health districts. There were 46 suitable surveys (Figure 5.2), with a total sample size of 274,970 women aged 15-29 years. Of the respondents, 103,063 were aged 15-19 years, 92,173 were aged 20-24 years, and 79,734 were aged 25-29 years. The median number of surveys per country was four, ranging from one in both Botswana and South Africa to six in Uganda. Figure 5.2: Surveys conducted 1999-2018 that were used in the analysis by year, survey type, sample size, and whether the survey included a specific question about transactional sex. That is, whether the respondent had “had sex in return for gifts, cash or anything else in the past 12 months”. Survey type included AIDS Indicator Surveys (AIS), Demographic and Health Surveys (DHS), the Botswana AIDS Impact Survey 2013 (BAIS), and Population-based HIV Impact Assessment (PHIA) surveys. For each survey, respondents were classified into one of four behavioural risk groups according to reported sexual risk behaviour in the past 12 months (Figure 5.3), which I index by \\(k\\). In increasing order of HIV acquisition risk, these risk groups were: \\(k = 1\\): Not sexually active \\(k = 2\\): One cohabiting sexual partner \\(k = 3\\): Non-regular or multiple sexual partner(s), and \\(k = 4\\): Reporting transactional sex. Table 5.1: HIV risk groups and HIV incidence rate ratios relative to AGYW with one cohabiting sexual partner. The incidence rate ratio for women with non-regular or multiple sexual partner(s) was derived from analysis of longitudinal data by Slaymaker et al. (2020).
Among female sex workers (FSW), the incidence rate ratio (25.0, 13.0, 9.0, 6.0, 3.0) depended on the level of HIV incidence among the general population (<0.1%, 0.1-0.3%, 0.3-1.0%, 1.0-3.0%, >3.0%), such that higher local HIV incidence in the general population corresponded to a lower incidence rate ratio for FSW. Estimates of HIV incidence rate ratios for FSW were derived by UNAIDS based on patterns of relative HIV prevalence among FSW compared to general population prevalence. Risk group Description Incidence rate ratio None Not sexually active 0.0 Low One cohabiting sexual partner 1.0 (baseline) High Non-regular or multiple partner(s) 1.72 Very High Reporting transactional sex (later adjusted to correspond to FSW) 3.0-25.0 (varied depending on local HIV incidence) The HIV incidence rate ratio \\(\\text{RR}_k\\), used to calculate HIV incidence, was assumed to vary by risk group (Table 5.1). The one cohabiting partner risk group was set as baseline such that \\(\\text{RR}_2 = 1\\). For the \\(k = 4\\) risk group, the HIV incidence rate ratio was further assumed to vary by local HIV incidence among the general population. Exact survey questions varied slightly across survey types and between survey phases. Questions captured information about whether the respondent had been sexually active in the past twelve months, and if so with how many partners. For their three most recent partners, respondents were also asked about the type of partnership. Possible partnership types included spouse, cohabiting partner, partner not cohabiting with respondent, friend, sex worker, sex work client, and other. The survey questions used are in Appendix B.4. In the case of inconsistent responses, women were categorised according to the highest risk group they fell into, ensuring that the categories were mutually exclusive. Some surveys included a specific question asking if the respondent had received or given money or gifts for sex in the past twelve months.
In these surveys, 2.64% of women reported transactional sex. In surveys without such a question, women almost never (0.01%) answered that one of their three most recent partners was a sex work client. This incomparability made it inappropriate to include surveys without a specific transactional sex question when estimating the proportion of the population who engaged in transactional sex. Of the total 46 surveys included in the analysis, 12 had a specific transactional sex question, with a total sample size of 62,853 (28,753 aged 15-19 years, 26,324 aged 20-24 years, and 7,776 aged 25-29 years). The sample size for women aged 25-29 is smaller because there were 6 DHS surveys which excluded women 25-29 from the transactional sex survey question. Table B.3 gives the sample size by age group for every survey included in the analysis. Figure 5.3: Flowchart describing how respondents were classified to HIV risk groups based on their survey responses. 5.2.2 Other data In addition to the household survey behavioural data, I used estimates of population, PLHIV and new HIV infections stratified according to district and age group from HIV estimates published by UNAIDS that were developed using the Naomi model (Eaton et al. 2021). I used the most recent 2022 estimates for all countries, apart from Mozambique where, due to data accuracy concerns, I used the 2021 estimates (in which the Cabo Delgado province is excluded due to disruption by conflict). I used administrative area hierarchy and geographic boundaries corresponding to those used for health service planning by countries (Table B.5). Exceptions were Cameroon and Kenya, where I conducted analyses one level higher at the department and county levels, respectively. 5.3 Model for risk group proportions Owing to the incomparability in estimating the \\(k = 4\\) risk group across surveys, I took a two-stage modelling approach to estimate the four risk group proportions. 
Denote being in either the third or fourth risk group as \\(k = 3^{+}\\). First, using all the surveys, I used a spatio-temporal multinomial logistic regression model to estimate the proportion of AGYW in the risk groups \\(k \\in \\{1, 2, 3^{+}\\}\\). This model is described in Section 5.3.1. Then, using only those surveys with a specific transactional sex question, I fit a spatial logistic regression model to estimate the proportion of those in the \\(k = 3^{+}\\) risk group that were in the \\(k = 3\\) and \\(k = 4\\) risk groups respectively. This model is described in Section 5.3.2. 5.3.1 Spatio-temporal multinomial logistic regression Let \\(i \\in \\{1, \\ldots, n\\}\\) denote districts partitioning the 13 studied AGYW priority countries \\(c[i] \\in \\{1, \\ldots, 13\\}\\). Consider the years 1999-2018 denoted as \\(t \\in \\{1, \\ldots, T\\}\\), and age groups \\(a \\in \\{\\text{15-19}, \\text{20-24}, \\text{25-29}\\}\\). Let \\(p_{itak} > 0\\) with \\(\\sum_{k = 1}^{3^{+}} p_{itak} = 1\\), be the probabilities of membership of risk group \\(k\\). 5.3.1.1 Multinomial logistic regression A standard multinomial logistic regression model (e.g. Gelman et al. 2013) is specified by \\[\\begin{align} \\mathbf{y}_{ita} &= (y_{ita1}, \\ldots, y_{ita3^{+}})^\\top \\sim \\text{Multinomial}(m_{ita}; \\, p_{ita1}, \\ldots, p_{ita3^{+}}), \\tag{5.1} \\\\ \\log \\left( \\frac{p_{itak}}{p_{ita1}} \\right) &= \\eta_{itak}, \\quad k = 2, 3^{+}, \\tag{5.2} \\end{align}\\] where the number in risk group \\(k\\) is \\(y_{itak}\\), the fixed sample size is \\(m_{ita} = \\sum_{k = 1}^{3^{+}} y_{itak}\\), and \\(k = 1\\) is chosen as the baseline category. This model is not a latent Gaussian model [LGM; Håvard Rue, Martino, and Chopin (2009)] because each observation \\(y_{itak}\\) for \\(k \\in \\{1, 2, 3^{+}\\}\\) depends non-linearly on multiple structured additive predictors \\(\\{\\eta_{itak}, k = 1, 2, 3^{+}\\}\\). 
The model, defined over 940 districts, 20 years, 3 age groups, and 3 risk groups, is too large for MCMC to be tractable in reasonable time. To recast this model as an LGM, I used the multinomial-Poisson transformation (detailed in Section 5.3.1.2). This modification allowed inference to be performed using the INLA (Håvard Rue, Martino, and Chopin 2009) algorithm via the R-INLA package (Martins et al. 2013). 5.3.1.2 The multinomial-Poisson transformation The multinomial-Poisson transformation (Baker 1994) reframes a given multinomial logistic regression model, like that described in Equations (5.1) and (5.2), as an equivalent Poisson log-linear model. The equivalent model is of the form \\[\\begin{align} y_{itak} &\\sim \\text{Poisson}(\\kappa_{itak}), \\tag{5.3} \\\\ \\log(\\kappa_{itak}) &= \\eta_{itak}. \\tag{5.4} \\end{align}\\] The basis of the transformation is that conditional on their sum Poisson counts are jointly multinomially distributed (McCullagh and Nelder 1989) as follows \\[\\begin{equation} \\mathbf{y}_{ita} \\, | \\, m_{ita} \\sim \\text{Multinomial} \\left( m_{ita}; \\frac{\\kappa_{ita1}}{\\kappa_{ita}}, \\ldots, \\frac{\\kappa_{ita3^{+}}}{\\kappa_{ita}} \\right), \\tag{5.5} \\end{equation}\\] where \\(\\kappa_{ita} = \\sum_{k = 1}^{3^{+}} \\kappa_{itak}\\). The probabilities \\(p_{itak}\\) may then be obtained using the softmax function \\[\\begin{equation} p_{itak} = \\frac{\\exp(\\eta_{itak})}{\\sum_{k = 1}^{3^{+}} \\exp(\\eta_{itak})} = \\frac{\\kappa_{itak}}{\\sum_{k = 1}^{3^{+}} \\kappa_{itak}} = \\frac{\\kappa_{itak}}{\\kappa_{ita}}. \\end{equation}\\] Under the equivalent model, in Equation (5.3) the sample sizes \\(m_{ita}\\) are treated as random rather than fixed such that \\[\\begin{equation} m_{ita} = \\sum_k y_{itak} \\sim \\text{Poisson} \\left( \\sum_k \\kappa_{itak} \\right) = \\text{Poisson} \\left( \\kappa_{ita} \\right). 
\\tag{5.6} \\end{equation}\\] Using Equation (5.5) for \\(p(\\mathbf{y}_{ita} \\, | \\, m_{ita})\\) and Equation (5.6) for \\(p(m_{ita})\\), the joint distribution is given by \\[\\begin{align} p(\\mathbf{y}_{ita}, m_{ita}) &= \\exp(-\\kappa_{ita}) \\frac{(\\kappa_{ita})^{m_{ita}}}{m_{ita}!} \\times \\frac{m_{ita}!}{\\prod_k y_{itak}!} \\prod_k \\left( \\frac{\\kappa_{itak}}{\\kappa_{ita}} \\right)^{y_{itak}} \\\\ &= \\prod_k \\left( \\frac{\\exp(-\\kappa_{itak}) \\left( \\kappa_{itak} \\right)^{y_{itak}}}{y_{itak}!} \\right) \\\\ &= \\prod_k \\text{Poisson} \\left( y_{itak} \\, | \\, \\kappa_{itak} \\right). \\tag{5.7} \\end{align}\\] As expected, Equation (5.7) corresponds to the product of independent Poisson likelihoods defined in Equation (5.3). This exercise demonstrates that the Poisson log-linear model contains within it a multinomial likelihood, with a Poisson prior on the sample size. For this model to be equivalent to a multinomial logistic regression model, the normalisation constants \\(m_{ita}\\) must be recovered exactly. That is to say, their posterior distributions should be as close as possible to a Dirac delta distribution with value zero everywhere but the known value of the sample size. To ensure that this is the case, observation-specific random effects \\(\\theta_{ita}\\) can be included in the equation for the linear predictor. Multiplying each of \\(\\{\\kappa_{itak}\\}_{k = 1}^{3^+}\\) by \\(\\exp(\\theta_{ita})\\) has no effect on the category probabilities, but does provide the necessary flexibility for \\(\\kappa_{ita}\\) to recover \\(m_{ita}\\) exactly. Although in theory an improper prior distribution \\(\\theta_{ita} \\propto 1\\) should be used, I found that in practice, by keeping \\(\\eta_{ita}\\) otherwise small using appropriate constraints, so that arbitrarily large values of \\(\\theta_{ita}\\) are not required, it is sufficient (and practically preferable for inference) to instead use a vague prior distribution.
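The factorisation in Equation (5.7) can be checked numerically. The following is a minimal base R sketch, not part of the analysis code; the rates `kappa` and counts `y` are hypothetical values chosen only for illustration.

```r
# Hypothetical Poisson rates for the three risk groups k = 1, 2, 3+
kappa <- c(12.0, 7.5, 3.2)

# Hypothetical observed counts in a single district, year and age group
y <- c(10, 9, 2)
m <- sum(y)

# Poisson prior on the sample size multiplied by the multinomial
# likelihood conditional on the sample size (first line of Equation 5.7)
joint <- dpois(m, lambda = sum(kappa)) *
  dmultinom(y, size = m, prob = kappa / sum(kappa))

# Product of independent Poisson likelihoods (last line of Equation 5.7)
indep <- prod(dpois(y, lambda = kappa))

# The two expressions agree up to floating point error
print(all.equal(joint, indep))
```

Any positive rates and non-negative integer counts could be substituted; the equality does not depend on the particular values used here.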
5.3.1.3 Model specifications I considered four models (Table 5.2) for \\(\\eta_{itak}\\) in the equivalent Poisson log-linear model of the form \\[\\begin{equation} \\eta_{itak} = \\theta_{ita} + \\beta_k + \\zeta_{c[i]k} + \\alpha_{ac[i]k} + u_{ik} + \\gamma_{tk}. \\end{equation}\\] Observation random effects \\(\\theta_{ita} \\sim \\mathcal{N}(0, 1000^2)\\) with a vague prior distribution were included in all models to ensure the multinomial-Poisson transformation was valid. To capture country-specific proportion estimates for each category, I included category random effects \\(\\beta_k \\sim \\mathcal{N}(0, \\tau_\\beta^{-1})\\) and country-category random effects \\(\\zeta_{ck} \\sim \\mathcal{N}(0, \\tau_\\zeta^{-1})\\). Heterogeneity in risk group proportions by age was allowed by including age-country-category random effects \\(\\alpha_{ack} \\sim \\mathcal{N}(0, \\tau_\\alpha^{-1})\\). Several specifications were considered for the space-category \\(u_{ik}\\) and time-category effects \\(\\gamma_{tk}\\), described in Sections 5.3.1.3.1 and 5.3.1.3.2. Table 5.2: Four multinomial regression models were considered. Observation random effects \\(\\theta_{ita}\\), included in all models, are omitted from this table. Category \\(\\beta_k\\) Country \\(\\zeta_{ck}\\) Age \\(\\alpha_{ack}\\) Spatial \\(u_{ik}\\) Temporal \\(\\gamma_{tk}\\) M1 IID IID IID IID IID M2 IID IID IID Besag IID M3 IID IID IID IID AR1 M4 IID IID IID Besag AR1 Use of the multinomial-Poisson transformation required all random effects to include interaction with category \\(k\\), because any random effects which did not include interaction with category would give no change in category probabilities. The only exceptions were the observation random effects, which were included as a device to ensure the transformation is valid, rather than to model the data.
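The reason that effects without a category interaction cannot change the category probabilities can be seen in a small base R sketch. The linear predictor values `eta` and the shared effect `theta` are hypothetical, and the helper `softmax` is written here for illustration rather than taken from the analysis code.

```r
# Softmax function mapping linear predictors to category probabilities
softmax <- function(eta) {
  z <- exp(eta - max(eta))  # subtract the maximum for numerical stability
  z / sum(z)
}

# Hypothetical linear predictors for risk groups k = 1, 2, 3+
eta <- c(0.4, -0.3, -1.1)

# Hypothetical effect shared by all categories, such as an observation
# random effect
theta <- 5.7

p_without <- softmax(eta)
p_with <- softmax(eta + theta)

# Category probabilities are unchanged by the shared effect
print(all.equal(p_without, p_with))

# The implied Poisson total is multiplied by exp(theta)
total_ratio <- sum(exp(eta + theta)) / sum(exp(eta))
print(all.equal(total_ratio, exp(theta)))
```

The shared effect rescales the expected total count while leaving the proportions fixed, which is the role played by the observation random effects.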
5.3.1.3.1 Spatial random effects For the space-category random effects \\(u_{ik}\\) I considered two specifications: Independent and identically distributed (IID) \\(u_{ik} \\sim \\mathcal{N}(0, \\tau_u^{-1})\\), The Besag improper conditional autoregressive (ICAR) model (Besag, York, and Mollié 1991) grouped by category \\[ \\mathbf{u} = (u_{11}, \\ldots, u_{n1}, \\ldots, u_{1{3^{+}}}, \\ldots u_{n3^{+}})^\\top \\sim \\mathcal{N}(\\mathbf{0}, (\\tau_u \\mathbf{R}^\\star_u)^{-}). \\] The scaled structure matrix \\(\\mathbf{R}^\\star_u = \\mathbf{R}^\\star_b \\otimes \\mathbf{I}\\) is given by the Kronecker product of the scaled Besag structure matrix \\(\\mathbf{R}^\\star_b\\) and the identity matrix \\(\\mathbf{I}\\), and \\(\\mathbf{A}^{-}\\) denotes the generalised matrix inverse of \\(\\mathbf{A}\\). I followed best practices for the Besag model as described in Chapter 4. To implement the Kronecker product I used the group option in R-INLA [Section 3.5.5; Gómez-Rubio (2020)] setting the random effect to be f(area_idx, model = \"besag\", group = cat_idx, control.group = list(model = \"iid\"), ...). Though the Kronecker product is symmetric, performance is better in R-INLA when the more complicated effect is written as the first variable rather than the grouping variable. In preliminary testing I used the BYM2 model (Simpson et al. 2017) in place of the Besag. I found that the proportion parameter posteriors tended to be highly peaked at the value one. For simplicity and to avoid numerical issues, by using Besag random effects I effectively decided to fix this proportion to one.
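The Kronecker structure of the grouped Besag effect can be illustrated in base R. This is a sketch under simplifying assumptions: the four-district adjacency is hypothetical, the structure matrix is left unscaled, and nothing here reproduces R-INLA internals. It shows that writing the area effect first or the category effect first gives the same structure matrix up to a reordering of the random effects.

```r
# Hypothetical adjacency for four districts arranged in a line: 1 - 2 - 3 - 4
adj <- matrix(0, 4, 4)
adj[cbind(1:3, 2:4)] <- 1
adj <- adj + t(adj)

# Unscaled Besag structure matrix: number of neighbours on the diagonal
# and minus one for each pair of neighbouring districts
R_b <- diag(rowSums(adj)) - adj

n <- nrow(R_b)  # number of districts
K <- 3          # number of categories k = 1, 2, 3+
I_K <- diag(K)

# Two orderings of the Kronecker product
R_area_first <- kronecker(R_b, I_K)
R_cat_first <- kronecker(I_K, R_b)

# Permutation taking the area-first ordering to the category-first ordering
perm <- as.vector(t(matrix(seq_len(n * K), nrow = K)))

# The two structure matrices agree after reordering rows and columns
print(all.equal(R_area_first[perm, perm], R_cat_first))

# Rows sum to zero, reflecting the impropriety of the Besag model
print(range(rowSums(R_area_first)))
```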
5.3.1.3.2 Temporal random effects For the time-category random effects \\(\\gamma_{tk}\\) I considered two specifications: IID \\(\\gamma_{tk} \\sim \\mathcal{N}(0, \\tau_\\gamma^{-1})\\), First order autoregressive (AR1) grouped by category \\[ \\boldsymbol{\\mathbf{\\gamma}} = (\\gamma_{11}, \\ldots, \\gamma_{13^{+}}, \\ldots, \\gamma_{T1}, \\ldots, \\gamma_{T3^{+}})^\\top \\sim \\mathcal{N}(\\mathbf{0}, (\\tau_\\gamma \\mathbf{R}^\\star_\\gamma)^{-}). \\] The scaled structure matrix \\(\\mathbf{R}^\\star_\\gamma = \\mathbf{R}^\\star_r \\otimes \\mathbf{I}\\) is given by the Kronecker product of a scaled AR1 structure matrix \\(\\mathbf{R}^\\star_r\\) and the identity matrix \\(\\mathbf{I}\\). The AR1 structure matrix \\(\\mathbf{R}_r\\) is obtained as the precision matrix of the random effects \\(\\mathbf{r} = (r_1, \\ldots, r_T)^\\top\\) specified by \\[\\begin{align} r_1 &\\sim \\mathcal{N} \\left( 0, \\frac{1}{1 - \\rho^2} \\right), \\\\ r_t &= \\rho r_{t - 1} + \\epsilon_t, \\quad t = 2, \\ldots, T, \\end{align}\\] where \\(\\epsilon_t \\sim \\mathcal{N}(0, 1)\\) and \\(|\\rho| < 1\\). As with the structured spatial random effects, I implemented this Kronecker product using the group option via f(year_idx, model = \"ar1\", group = cat_idx, control.group = list(model = \"iid\"), ...). Again, the variable with the more complicated model was written first. 5.3.1.3.3 Note on spatio-temporal interaction random effects I also considered including separable space-time-category random effects \\(\\delta_{itk}\\) in the model, using the specification \\[\\begin{equation} \\boldsymbol{\\mathbf{\\delta}} = (\\delta_{111}, \\ldots, \\delta_{nT3^{+}})^\\top \\sim \\mathcal{N}(\\mathbf{0}, (\\tau_\\delta \\mathbf{R}^\\star_\\delta)^{-}), \\end{equation}\\] where \\(\\mathbf{R}^\\star_\\delta\\) is a Kronecker product of the relevant space, time and category structure matrices.
These specifications were: IID spatial and IID temporal (Type I) \\(\\mathbf{R}^\\star_\\delta = \\mathbf{I} \\otimes \\mathbf{I} \\otimes \\mathbf{I}\\), Besag spatial and IID temporal (Type II) \\(\\mathbf{R}^\\star_\\delta = \\mathbf{R}^\\star_b \\otimes \\mathbf{I} \\otimes \\mathbf{I}\\), IID spatial and AR1 temporal (Type III) \\(\\mathbf{R}^\\star_\\delta = \\mathbf{I} \\otimes \\mathbf{R}^\\star_r \\otimes \\mathbf{I}\\), Besag spatial and AR1 temporal (Type IV) \\(\\mathbf{R}^\\star_\\delta = \\mathbf{R}^\\star_b \\otimes \\mathbf{R}^\\star_r \\otimes \\mathbf{I}\\), where the first, second and third elements of the Kronecker product represent space, time and category (always IID) structure matrices respectively. The interaction type in brackets (e.g. Type I) is given according to the Knorr-Held (2000) framework. Though three-way Kronecker products are not directly supported in R-INLA, I implemented each specification using a combination of the group and replicate options [Section 6.5.2; Gómez-Rubio (2020)]. For example, for the Type IV effects the random effects were specified by f(area_idx_copy, model = \"besag\", group = year_idx, replicate = cat_idx, control.group = list(model = \"ar1\")). I was able to run these models for single countries, keeping only years at which surveys occurred in those countries. However, when fitting all countries jointly I found inclusion of the space-time-category random effects to be intractable, and as such decided not to include them in the model. 5.3.1.3.4 Prior distributions All random effect precision parameters \\[\\begin{equation} \\tau \\in \\{\\tau_\\beta, \\tau_\\zeta, \\tau_\\alpha, \\tau_u, \\tau_\\gamma, \\tau_\\delta\\} \\end{equation}\\] were given independent penalised complexity (PC) prior distributions (Simpson et al.
2017) with base model \\(\\sigma = 0\\) given by \\[\\begin{equation} p(\\tau) = 0.5 \\nu \\tau^{-3/2} \\exp \\left( - \\nu \\tau^{-1/2} \\right), \\end{equation}\\] where \\(\\nu = - \\ln(0.01) / 2.5\\) such that \\(\\mathbb{P}(\\sigma > 2.5) = 0.01\\). For the lag-one correlation parameter \\(\\rho\\), I used the PC prior distribution, as derived by Sørbye and Rue (2017), with base model \\(\\rho = 1\\) and condition \\(\\mathbb{P}(\\rho > 0) = 0.75\\). I chose the base model \\(\\rho = 1\\) corresponding to no change in behaviour over time, rather than the alternative \\(\\rho = 0\\) corresponding to no correlation in behaviour over time, as I judged the former to be more plausible a priori. 5.3.1.4 Identifiability constraints To facilitate interpretability of the posterior inferences, I applied sum-to-zero constraints (Table 5.3) such that none of the category interaction random effects altered overall category probabilities. In testing of the space-time-category random effects, I applied analogous sum-to-zero constraints to maintain roles of the space-category and time-category random effects. In some cases it was not possible to implement all three sets of constraints for the three-way interactions in R-INLA. Table 5.3: Applying sum-to-zero constraints to interaction effects ensured that the main effect was not interfered with.
Random effects Constraints Category \\(\\sum_k \\beta_k = 0\\) Country \\(\\sum_c \\zeta_{ck} = 0, \\, \\forall \\, k\\) Age-country \\(\\sum_a \\alpha_{ack} = 0, \\, \\forall \\, c, k\\) Spatial \\(\\sum_i u_{ik} = 0, \\, \\forall \\, k\\) Temporal \\(\\sum_t \\gamma_{tk} = 0, \\, \\forall \\, k\\) Spatio-temporal \\(\\sum_i \\delta_{itk} = 0, \\, \\forall \\, t, k; \\sum_t \\delta_{itk} = 0, \\, \\forall \\, i, k; \\sum_k \\delta_{itk} = 0, \\, \\forall \\, i, t\\) 5.3.1.5 Survey weighted likelihood I accounted for the survey design using a weighted pseudo-likelihood where the observed counts \\(y\\) are replaced by effective counts \\(y^\\star\\), as described in Section 3.5. These counts may not be integers, and as such the Poisson likelihood given in Equation (5.3) is not appropriate. Instead, I used a generalised Poisson pseudo-likelihood \\(y^\\star \\sim \\text{xPoisson}(\\kappa)\\) given by \\[\\begin{equation} p(y^\\star) = \\frac{\\kappa^{y^\\star}}{\\left \\lfloor{y^\\star!}\\right \\rfloor } \\exp \\left(- \\kappa \\right), \\end{equation}\\] to extend the Poisson distribution to non-integer weighted counts. This working likelihood is implemented by family = \"xPoisson\" in R-INLA. 5.3.1.6 Model selection I selected the model including Besag spatial random effects and IID temporal random effects based on the conditional predictive ordinate (CPO) criterion (Pettit 1990). For comparison, I also computed the deviance information criterion (DIC) (D. J. Spiegelhalter et al. 2002) and widely applicable information criterion (WAIC) (Watanabe 2013). Each of these criteria can be calculated in R-INLA without requiring model refitting. The results are presented in Table 5.4 and Figure 5.4. Figure 5.4: For the multinomial logistic regression model, under the conditional predictive ordinate (CPO) criterion, including Besag spatial random effects rather than IID spatial random effects improved model performance.
On the other hand, under the deviance information criterion (DIC) and widely applicable information criterion (WAIC), where smaller values are preferred, the opposite was true. The relatively poor DIC and WAIC performance of Besag random effects was due to outlying values of these criteria for three of four surveys in Tanzania, and as such may be erroneous. Though IID temporal random effects are preferred by all criteria, AR1 temporal random effects performed very similarly, likely as there is a limited amount of temporal variation in the data to describe. Table 5.4: Conditional predictive ordinate (CPO), deviance information criterion (DIC), and widely applicable information criterion (WAIC) values for the multinomial logistic regression model specifications with corresponding standard errors. M1 M2 M3 M4 CPO 5573 (36) 5772 (36) 5574 (36) 5771 (36) DIC 100780 (300) 101588 (317) 100781 (300) 101589 (317) WAIC 103763 (358) 105008 (383) 103763 (358) 105009 (383) 5.3.2 Spatial logistic regression To estimate the proportion of those in the \\(k = 3^{+}\\) risk group that were in the \\(k = 3\\) and \\(k = 4\\) risk groups respectively, I fit logistic regression models of the form \\[\\begin{align} y_{ia4} &\\sim \\text{Binomial} \\left( y_{ia3} + y_{ia4}, q_{ia} \\right), \\tag{5.8} \\\\ q_{ia} &= \\text{logit}^{-1} \\left( \\eta_{ia} \\right), \\end{align}\\] where \\[\\begin{equation} q_{ia} = \\frac{p_{ia4}}{p_{ia3} + p_{ia4}} = \\frac{p_{ia4}}{p_{ia{3^+}}}. \\end{equation}\\] This two-step approach allowed all surveys to be included in the multinomial regression model, but only those surveys with a specific transactional sex question to be included in the logistic regression model. As all such surveys occurred in the years 2013-2018 (Figure 5.2) I assumed no dependence on time, hence omission of the index \\(t\\). Model specification for the linear predictor \\(\\eta_{ia}\\) is discussed in Section 5.3.2.1 to follow. 
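The way the two stages combine into four risk group proportions can be sketched in base R. The stage-one proportions and the stage-two proportion `q` below are hypothetical values for a single district and age group, used only to illustrate the arithmetic implied by the definition of the stage-two proportion.

```r
# Hypothetical stage-one estimates from the multinomial model
p1 <- 0.35      # not sexually active
p2 <- 0.40      # one cohabiting sexual partner
p3plus <- 0.25  # either the third or fourth risk group

# Hypothetical stage-two estimate from the logistic model: the proportion
# of the 3+ risk group reporting transactional sex
q <- 0.08

# Split the combined 3+ proportion into its two components
p3 <- (1 - q) * p3plus
p4 <- q * p3plus

p <- c(p1, p2, p3, p4)

# The four proportions sum to one and recover q by construction
print(sum(p))
print(p4 / (p3 + p4))
```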
5.3.2.1 Model specifications Table 5.5: Six logistic regression models were considered. The covariate cfswever denotes the proportion of men who have ever paid for sex and cfswrecent denotes the proportion of men who have paid for sex in the past 12 months. Intercept \\(\\beta_0\\) Country \\(\\zeta_{c}\\) Age \\(\\alpha_{ac}\\) Spatial \\(u_{i}\\) Covariates L1 Constant IID IID IID None L2 Constant IID IID Besag None L3 Constant IID IID IID cfswever L4 Constant IID IID Besag cfswever L5 Constant IID IID IID cfswrecent L6 Constant IID IID Besag cfswrecent I considered six logistic regression models (Table 5.5). Each included a constant intercept \\(\\beta_0 \\sim \\mathcal{N}(-2, 1^2)\\), country random effects \\(\\zeta_{c} \\sim \\mathcal{N}(0, \\tau_\\zeta^{-1})\\), and age-country random effects \\(\\alpha_{ac} \\sim \\mathcal{N}(0, \\tau_\\alpha^{-1})\\). The Gaussian prior distribution on \\(\\beta_0\\) placed 95% prior probability on the range 2-50% for the percentage of those with non-regular or multiple partners who report transactional sex. I considered two specifications (IID, Besag) for the spatial random effects \\(u_i\\). To aid estimation with sparse data, I also considered national-level covariates for the proportion of men who have paid for sex ever or in the last twelve months (Hodgins et al. 2022). For both random effect precision parameters \\(\\tau \\in \\{\\tau_\\alpha, \\tau_\\zeta\\}\\) I used the PC prior distribution with base model \\(\\sigma = 0\\) and \\(\\mathbb{P}(\\sigma > 2.5) = 0.01\\). For both regression parameters \\(\\beta \\in \\{\\beta_\\texttt{cfswever}, \\beta_\\texttt{cfswrecent}\\}\\) I used the prior distribution \\(\\beta \\sim \\mathcal{N}(0, 2.5^2)\\). 5.3.2.2 Survey weighted likelihood As with the multinomial regression model, I used survey weighted counts \\(y^\\star\\) and sample sizes \\(m^\\star\\).
I used a generalised binomial pseudo-likelihood \\(y^\\star \\sim \\text{xBinomial}(m^\\star, q)\\) given by \\[\\begin{equation} p(y^\\star \\, | \\, m^\\star, q) = \\binom{\\lfloor m^\\star \\rfloor}{\\lfloor y^\\star \\rfloor} q^{y^\\star} (1 - q)^{m^\\star - y^\\star} \\end{equation}\\] to extend the binomial distribution to non-integer weighted counts and sample sizes. This working likelihood is implemented by family = \"xBinomial\" in R-INLA. 5.3.2.3 Model selection I selected the model including Besag spatial effects and cfswrecent covariates according to the CPO criterion. All results, including DIC and WAIC, are presented in Table 5.6 and Figure 5.5. Inclusion of Besag spatial random effects, rather than IID, consistently improved performance. Benefits from inclusion of covariates were more marginal. As some countries had no suitable surveys, I nonetheless preferred to include covariate information so that estimates in these countries would be based on some country-specific data. Figure 5.5: For the logistic regression model, the CPO, DIC, and WAIC each agreed that the model containing Besag spatial random effects and the cfswrecent covariates was best. Inclusion of Besag spatial random effects consistently improved each criterion, whereas improvements from inclusion of any covariates were marginal. Table 5.6: CPO, DIC, and WAIC values for the logistic regression model specifications with corresponding standard errors. L1 L2 L3 L4 L5 L6 CPO 950 (15) 969 (15) 951 (15) 970 (15) 950 (15) 970 (15) DIC 4662 (110) 4605 (111) 4662 (110) 4605 (111) 4662 (110) 4605 (111) WAIC 4692 (115) 4624 (115) 4692 (115) 4624 (115) 4692 (115) 4624 (115) 5.3.3 Female sex worker population size adjustment Domain experts do not consider having had sex “in return for gifts, cash or anything else in the past 12 months” sufficient to constitute sex work. 
For this reason, I adjusted the estimates obtained based on the transactional sex survey question to match alternatively obtained age-country FSW population size estimates. Taking this approach retained subnational variation informed by the transactional sex survey question. I used the estimates of adult (15-49) FSW population size by country from a Bayesian meta-analysis of key population specific data sources (Stevens et al. 2023). To disaggregate these estimates by age, I took the following steps. First, I calculated the total sexually debuted population in each age group, by country. To describe the distribution of age at first sex, I used skew logistic distributions (Nguyen and Eaton 2022) with cumulative distribution function given by \\[\\begin{equation} F(x) = \\left(1 + \\exp(\\kappa_c (\\mu_c - x)) \\right)^{- \\gamma_c}, \\end{equation}\\] where \\(\\kappa_c, \\mu_c, \\gamma_c > 0\\) are country-specific scale, location and skewness parameters respectively. Next, I used the assumed \\(\\text{Gamma}(\\alpha = 10.4, \\beta = 0.36)\\) FSW age distribution in South Africa from the Thembisa model (L. Johnson and Dorrington 2020) to calculate the implied ratio between the number of FSW and the sexually debuted population in each age group. I assumed the South African ratios were applicable to every country, allowing calculation of the number of FSW by age group in all 13 countries. The resulting age trends obtained (Figure 5.6) reflect country-level variation in demographics and age-at-first-sex. Altering the FSW population size estimates requires that other risk group population size estimates are also altered such that the corresponding risk group proportion estimates sum to one. Here, estimates of the non-regular or multiple sexual partner(s) population size were altered to facilitate changing of the FSW population size.
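The age disaggregation steps above can be sketched in base R. This is an illustrative reading of the procedure rather than the analysis code: all populations, totals and skew logistic parameters are hypothetical, the Gamma distribution is assumed to use the shape and rate parameterisation, the cumulative distribution function is evaluated at age group mid-points as a simplification, and the final rescaling to the country total is an assumption about how the ratios were applied.

```r
# Skew logistic cumulative distribution function for age at first sex
skew_logistic_cdf <- function(x, kappa, mu, gamma) {
  (1 + exp(kappa * (mu - x)))^(-gamma)
}

# Five-year age groups 15-19, ..., 45-49
age_lower <- seq(15, 45, by = 5)
age_mid <- age_lower + 2.5

# Reference country (South Africa in the text): hypothetical inputs
pop_ref <- c(2.4, 2.5, 2.6, 2.3, 1.9, 1.6, 1.4) * 1e6
debuted_ref <- pop_ref *
  skew_logistic_cdf(age_mid, kappa = 0.6, mu = 17, gamma = 1.2)
fsw_total_ref <- 100000

# Share of FSW in each age group implied by the Gamma age distribution,
# renormalised to ages 15-49 (rate parameterisation assumed)
fsw_share_ref <- diff(pgamma(c(age_lower, 50), shape = 10.4, rate = 0.36))
fsw_share_ref <- fsw_share_ref / sum(fsw_share_ref)

# Ratio of FSW to the sexually debuted population by age group
ratio_ref <- fsw_total_ref * fsw_share_ref / debuted_ref

# Target country: hypothetical inputs
pop_target <- c(1.1, 0.9, 0.8, 0.6, 0.5, 0.4, 0.3) * 1e6
debuted_target <- pop_target *
  skew_logistic_cdf(age_mid, kappa = 0.5, mu = 18, gamma = 1.0)
fsw_total_target <- 30000

# Apply the reference ratios, then rescale to the country total
unscaled <- ratio_ref * debuted_target
fsw_target <- fsw_total_target * unscaled / sum(unscaled)

print(round(fsw_target))
```

Under this sketch the age pattern in the target country differs from the reference country only through its population structure and its age at first sex distribution.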
Figure 5.6: The disaggregation procedure I used produces an age distribution for FSW peaking in the 20-24 and 25-29 age groups, and declining for older age groups. 5.4 Prevalence and incidence by risk group Using the most recent risk group proportion estimates, I calculated the following indicators, stratified by district, age group and risk group: HIV prevalence \\(\\rho_{iak}\\), the number of people living with HIV (PLHIV) \\(H_{iak}\\), HIV incidence \\(\\lambda_{iak}\\), and the number of new HIV infections \\(I_{iak}\\). To do so, I disaggregated district, age group specific Naomi estimates by risk group. 5.4.1 Disaggregation of Naomi prevalence estimates To disaggregate HIV prevalence, I first estimated HIV prevalence log odds ratios \\(\\log(\\text{OR}_k)\\) relative to the general population. To do so, I calculated age, country, and risk group specific (as well as general population) HIV prevalence \\(\\rho_{cak}\\) using bio-marker survey data from all 46 surveys included in the risk group model (Section 5.2.1). I then fit a logistic regression model, with indicator functions for each risk group, and an indicator for being in the general population. The fitted regression coefficients in this model \\(\\beta_k\\) correspond to log odds \\(\\log \\rho_k - \\log(1 - \\rho_k)\\). The required log odds ratios may then be obtained by taking differences between these log odds. To allow the log odds ratio for the highest risk group to vary based on general population prevalence I fit a linear regression of the FSW log odds against the general population log odds. I ensured that log odds ratios for the FSW risk group were at least as large as those for the multiple or non-regular partner(s) risk group. Given the fitted log odds ratios, I disaggregated Naomi estimates of PLHIV \\(H_{ia}\\) on the logit scale using numerical optimisation.
To do so, I found the values of \\(\\theta_{ia}\\) which brought the function \\[\\begin{equation} f(\\theta_{ia}) = \\sum_{k = 1}^4 \\left( \\text{logistic}(\\theta_{ia} + \\log(\\text{OR}_k)) \\cdot N_{iak} \\right) - H_{ia}, \\end{equation}\\] as close as possible to zero, where \\(\\text{logistic}(x) = \\exp(x) / (1 + \\exp(x))\\) such that \\(\\text{logistic}(\\hat \\theta_{ia} + \\log(\\text{OR}_k)) = \\rho_{iak}\\). These values were given by \\[\\begin{equation} \\hat \\theta_{ia} = \\arg\\min_{\\theta_{ia} \\in [-10, 10]} f(\\theta_{ia})^2. \\end{equation}\\] The number of PLHIV was obtained by \\(H_{iak} = \\rho_{iak} N_{iak}\\), where \\(N_{iak}\\) is the risk group population size. 5.4.2 Disaggregation of Naomi incidence estimates I used linear disaggregation to calculate the number of new HIV infections by risk group \\[\\begin{align} I_{ia} &= \\sum_k I_{iak} = \\sum_k \\lambda_{iak} (1 - \\rho_{iak}) N_{iak} \\\\ &= 0 + \\lambda_{ia2} (1 - \\rho_{ia2}) N_{ia2} + \\lambda_{ia3} (1 - \\rho_{ia3}) N_{ia3} + \\lambda_{ia4} (1 - \\rho_{ia4}) N_{ia4} \\\\ &= \\lambda_{ia2} \\left((1 - \\rho_{ia2}) N_{ia2} + \\text{RR}_{3} (1 - \\rho_{ia3}) N_{ia3} + \\text{RR}_4(\\lambda_{ia}) (1 - \\rho_{ia4}) N_{ia4} \\right), \\end{align}\\] where \\(\\text{RR}_{2}\\), \\(\\text{RR}_{3}\\) and \\(\\text{RR}_{4}(\\cdot)\\) are the HIV risk ratios given in Table 5.1, and \\((1 - \\rho_{iak}) N_{iak}\\) are the susceptible population sizes in each risk group. The risk ratio for FSW was defined as a function of district-level incidence in the general population \\(\\lambda_{ia}\\). Risk group specific HIV incidence estimates were then given by \\[\\begin{align} \\lambda_{ia1} &= 0, \\\\ \\lambda_{ia2} &= \\frac{I_{ia}}{(1 - \\rho_{ia2}) N_{ia2} + \\text{RR}_{3} (1 - \\rho_{ia3}) N_{ia3} + \\text{RR}_4(\\lambda_{ia}) (1 - \\rho_{ia4}) N_{ia4}}, \\\\ \\lambda_{ia3} &= \\text{RR}_{3} \\lambda_{ia2}, \\\\ \\lambda_{ia4} &= \\text{RR}_4(\\lambda_{ia}) \\lambda_{ia2}. 
\\end{align}\\] These equations were evaluated using Naomi model estimates of the number of new HIV infections \\(I_{ia} = \\lambda_{ia} N_{ia}\\). The number of new HIV infections by risk group was \\(I_{iak} = \\lambda_{iak} N_{iak}\\). 5.4.3 Expected new infections reached To quantify the number of new infections that could be reached by prioritising according to each possible stratification of the population, I took the following approach, which I illustrate for stratification by age. First, I aggregated the number of new HIV infections and HIV incidence (calculated above in Section 5.4.2) such that \\[\\begin{align} I_a &= \\sum_{ik} I_{iak}, \\\\ \\lambda_a &= I_a / \\sum_{ik} (1 - \\rho_{iak}) N_{iak}. \\end{align}\\] I then considered prioritising individuals by age group \\(a\\), in order of decreasing HIV incidence \\(\\lambda_a\\). By cumulatively summing the expected infections, for each fraction of the total population reached (0-100%) I calculated the fraction of total expected new infections that would be reached. In this instance, as there are three age groups, the resulting function was piecewise linear with three segments. This analysis was repeated for all \\(2^3 = 8\\) possible combinations of stratification by location, age, and risk group. 5.5 Results 5.5.1 Model for risk group proportions 5.5.1.1 Estimates Figure 5.7: The posterior mean of the AGYW risk group proportions over space in 2018. Estimates are stratified by risk group (columns) and five-year age group (rows). Countries in grey were not included in the analysis. A limitation of this figure is that using a common colour scale, though desirable for other reasons, makes it challenging to see spatial variation in the FSW risk group. Figure 5.8: National (in white) and subnational (in color) posterior means of the risk group proportions. Estimates are stratified by risk group (columns) and five-year age group (rows). 
Though the information presented is similar to that of Figure 5.7, this figure presents a clear view of within- and between-country variation in risk group proportions. Figure B.1 and Figure 5.8 show posterior mean estimates for the proportion in each risk group for the final model in 2018, the most recent year included in our analysis. I focused on the most recent estimates because they are the most relevant to inform ongoing HIV policy. In subsequent results, all estimates refer to 2018, unless otherwise indicated. The median national FSW proportion was 1.1% (95% CI 0.4–1.9) for the 15-19 age group, 1.6% (95% CI 0.6–2.8) for the 20-24 age group and 1.9% (95% CI 0.5–3.5) for the 25-29 age group, in line with the results displayed in Figure 5.6. In the 20-24 and 25-29 year age groups, the majority of women were either cohabiting or had non-regular or multiple partner(s). Countries in eastern and central Africa (Cameroon, Kenya, Malawi, Mozambique, Tanzania, Uganda, Zambia and Zimbabwe) had a higher proportion of women in these age groups cohabiting (63.1% [95% CI 35–78.7%] compared with 21.3% [95% CI 10.1–48.8%] with non-regular partner[s]). In contrast, countries in southern Africa (Botswana, Eswatini, Lesotho, Namibia and South Africa) had a higher proportion with non-regular or multiple partner(s) (58.9% [95% CI 43.2–70.5%], compared with 23.4% [95% CI 9.7–39.1%] cohabiting). This finding is the most notable feature of between-country variation shown in Figure 5.8. Figure 5.7 shows the geographic delineation to pass along the border of Mozambique, through the interior of Zimbabwe and along the border of Zambia. The bimodality of the 20-24 and 25-29 year age groups is shown in Figure B.2. In the median district, 57.9% of adolescent girls 15-19 were not sexually active (95% credible interval [CI] at the district-level 27.7–79.7). 
The country of Mozambique was an exception, where the majority of adolescent girls 15-19 (64.23%) were sexually active in the past year and close to a third (34.17%) were cohabiting with a partner. 5.5.1.2 Coverage assessment Figure 5.9: Probability integral transform (PIT) histograms (top row) and empirical cumulative distribution function (ECDF) difference plots (bottom row) for the final selected model. To assess the calibration of the fitted model, I calculated the quantile \\(q\\) of each observation within the posterior predictive distribution. For calibrated models, these quantiles, known as probability integral transform (PIT) values (Dawid 1984; Nikos I. Bosse et al. 2022), should follow a uniform distribution \\(q \\sim \\mathcal{U}[0, 1]\\). To generate samples from the posterior predictive distribution, I applied the multinomial likelihood to samples from the latent field, setting the sample size to be the floor of the Kish effective sample size. Using the PIT values, it is possible to calculate the empirical coverage of all \\((1 - \\alpha)100\\)% equal-tailed posterior predictive credible intervals. These empirical coverages can be compared to the nominal coverage \\((1 - \\alpha)\\) for each value of \\(\\alpha \\in [0, 1]\\) to give empirical cumulative distribution function (ECDF) difference values. This approach has the advantage of considering all possible confidence values at once. To test for uniformity, I used the binomial distribution based simultaneous confidence bands for ECDF difference values developed by Säilynoja, Bürkner, and Vehtari (2022). I found the only significant deviation from uniformity occurred in the right-hand tail of the one cohabiting partner risk group. That is to say, the proportion of the PIT values which were greater than 0.95 was significantly more than would be expected under a calibrated model. 
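The calibration quantities described above can be illustrated with a short sketch. It is not the code used for Figure 5.9: the simulated observations and posterior predictive draws are hypothetical, the PIT value is computed as the simple proportion of draws at or below the observation, and the simultaneous confidence bands of Säilynoja, Bürkner, and Vehtari (2022) are not reproduced.

```python
import random

def pit_value(observed, predictive_draws):
    # Quantile of the observation within its posterior predictive draws
    return sum(d <= observed for d in predictive_draws) / len(predictive_draws)

def empirical_coverage(pit_values, alpha):
    # Share of observations inside the (1 - alpha) equal-tailed predictive interval
    lower, upper = alpha / 2, 1 - alpha / 2
    return sum(lower <= q <= upper for q in pit_values) / len(pit_values)

def ecdf_difference(pit_values, grid_size=101):
    # ECDF of the PIT values minus the uniform CDF, evaluated on a grid over [0, 1]
    n = len(pit_values)
    grid = [i / (grid_size - 1) for i in range(grid_size)]
    return [(z, sum(q <= z for q in pit_values) / n - z) for z in grid]

# Hypothetical calibrated example: data and predictive draws share one distribution
rng = random.Random(1)
pits = []
for _ in range(500):
    observed = rng.gauss(0, 1)
    draws = [rng.gauss(0, 1) for _ in range(200)]
    pits.append(pit_value(observed, draws))

differences = ecdf_difference(pits)
print('coverage of 95% intervals:', empirical_coverage(pits, alpha=0.05))
print('largest absolute ECDF difference:', max(abs(d) for _, d in differences))
```

For a calibrated model the ECDF differences fluctuate around zero. A systematic excess of PIT values near one, as found for the one cohabiting partner risk group, would show up as a departure from zero towards the right-hand end of the grid.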
5.5.1.3 Variance decomposition Age group was the most important factor explaining variation in risk group proportions, accounting for 65.9% (95% CI 54.1–74.9%) of total variation. The primary change in risk group proportions by age group occurs between the 15-19 age group and 20-29 age group (Figure 5.7). The next most important factor was location. Country-level differences explained 20.9% (95% CI 11.9–34.5%) of variation, while district-level variation within countries explained 11.3% (95% CI 8.2–15.3%). Temporal changes only explained 0.9% (95% CI 0.6–1.4%) of variation, indicating very little change in risk group proportions over time. I found similar variance decomposition results fitting each country individually (Figure B.1) and using other model specifications. 5.5.2 Prevalence and incidence by risk group Figure 5.10: Percentage of new infections reached across all 13 countries, taking a variety of risk stratification approaches, against the percentage of at risk population required to be reached. For any given fraction of AGYW prioritised, substantially more new infections were reached by strategies that included behavioural risk stratification. Reaching half of all expected new infections required reaching 19.4% of the population when stratifying by subnational area and age, but only 10.6% when behavioural stratification was included (Figure 5.10). The majority of this benefit came from reaching FSW, who were 1.3% of the population but 10.6% of all new infections. Considering each country separately, on average, reaching half of new infections in each country required reaching 14.6% (range 8.7-21.8%) of the population when stratifying by area and age, reducing to 5.1% (range 2.1-13.2%) when behaviour was included. The relative importance of stratifying by age, location and behaviour varied between countries, analogous to the varying contribution of each to the total variance (Section 5.5.1.3). 
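The curves summarised in Figure 5.10 follow the procedure of Section 5.4.3. As a rough illustration of that procedure, the sketch below orders strata by incidence, accumulates population and expected infections, and interpolates linearly to find the share of the population needed to reach a target share of infections. The three strata and their values are hypothetical, and the function names are introduced here for illustration only.

```python
def infections_reached_curve(strata):
    # strata: list of (susceptible population, incidence rate) pairs
    ordered = sorted(strata, key=lambda s: s[1], reverse=True)
    total_population = sum(n for n, _ in ordered)
    total_infections = sum(n * rate for n, rate in ordered)
    points = [(0.0, 0.0)]
    cumulative_population = 0.0
    cumulative_infections = 0.0
    for n, rate in ordered:
        cumulative_population += n
        cumulative_infections += n * rate
        points.append((cumulative_population / total_population,
                       cumulative_infections / total_infections))
    return points

def population_needed(points, target):
    # Linear interpolation along the piecewise linear curve
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        if y1 >= target:
            return x0 + (x1 - x0) * (target - y0) / (y1 - y0)
    return 1.0

# Three hypothetical strata: (susceptible population, incidence rate)
strata = [(600, 0.001), (100, 0.02), (300, 0.005)]
curve = infections_reached_curve(strata)
share = population_needed(curve, target=0.5)
print('population share needed to reach half of new infections:', share)
```

With three strata the curve has three linear segments, as described for stratification by age. Finer stratifications add more segments and make the curve steeper at the start, which is why fewer people need to be reached to cover a given share of infections.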
5.6 Discussion In this chapter, I estimated the proportion of AGYW who fall into different risk groups at a district level in 13 sub-Saharan African countries. These estimates support consideration of differentiated prevention programming according to geographic locations and risk behaviour, as outlined in the Global AIDS Strategy. Systematic differences in risk by age groups, and variation within and between countries, explained the large majority of variation in risk group proportions. Changes over time contributed negligibly to the overall variation in risk group proportions. Two features varied especially across districts and countries: the proportion of 15-19 year olds who are sexually active and, among women aged 20-29 years, norms around cohabitation. This variation underscores the need for these granular data to implement HIV prevention options aligned to local norms and risk behaviours. I considered four risk groups based on sexual behaviour, the most proximal determinant of risk. Other factors, such as condom usage or type of sexual act, may account for additional heterogeneity in risk from sexual behaviour. However, I did not include these factors in view of measurement difficulties, concerns about consistency across contexts, and the operational benefits of describing risk parsimoniously. Sexual behaviour confers risk only when AGYW reside in geographic locations where there is unsuppressed viral load among their potential partners. I did not include more distal determinants, such as school attendance, orphanhood, or gender empowerment, as I expect their effects on risk to largely be mediated by more proximal determinants. However, to effectively implement programming, it is crucial to understand these factors, as well as the broader structural barriers and limits to personal agency faced by AGYW. Importantly, programs must ensure that intervention prioritisation occurs without stigmatising or blaming AGYW. 
By considering a range of possible risk stratification strategies, I showed that successful implementation of a risk-stratified approach would allow substantially more of those at risk for infections to be identified before infection occurs. A considerable proportion of estimated new infections were among FSW, supporting the case for HIV programming efforts focused on key population groups (Baral et al. 2012). There is substantial variation in the importance of prioritisation by age, location and behaviour within each country. This highlights the importance of understanding and tailoring HIV prevention efforts to country-specific contexts. By standardising the analysis across all 13 countries, I showed the additional efficiency benefits of resource allocation between countries. I found a geographic delineation in the proportion of women cohabiting between southern and eastern Africa, calling attention to a divide attributable to many cultural, social, and economic factors. The delineation does not represent a boundary between predominately Christian and Muslim populations, which is further north. I also note that the high numbers of adolescent girls aged 15-19 cohabiting in Mozambique is markedly different from the other countries (UNICEF 2019). Brugh et al. (2021) previously geographically mapped AGYW HIV risk groups using biomarker and behavioural data from the most recent surveys in Eswatini, Haiti and Mozambique to define and subsequently map risk groups with a range of machine learning techniques. My work builds on Brugh et al. (2021) by including more countries, integrating a greater number of surveys, and connecting risk group proportions with HIV epidemic indicators to help inform programming. My modelled estimates of risk group proportions improve upon direct survey results for three reasons. 
First, by taking a modular modelling approach, I integrated all relevant survey information from multiple years, allowing estimation of the FSW proportion for surveys without a specific transactional sex question. Second, whereas direct estimates exhibit large sampling variability at a district level, I alleviated this issue using spatio-temporal smoothing (Figure 5.11). Third, I provided estimates in all district-years, including those not directly sampled by surveys, allowing estimates to be consistently fed into further analysis and planning pipelines such as my analysis of risk group specific prevalence and incidence. Figure 5.11: The modelled estimates display more plausible spatial smoothness than the direct estimates. In addition, missing values in the direct estimates are appropriately infilled by the model. The final surveys included in the risk group model were conducted in 2018. The analysis may be updated with more surveys as they become available. I do not anticipate that the risk group proportions will change substantially, as I found that they did not change significantly over time. My analysis focused on females aged 15-29 years, and could be extended to consider optimisation of prevention more broadly, accounting for new infections among adults 15-49 which occur in women 30-49 and men 15-49. Estimating sexual risk behaviour in adults 15-49 would be a crucial step toward greater understanding of the dynamics of the HIV epidemic in sub-Saharan Africa, and would allow incidence models to include stratification of individuals by sexual risk. 5.6.1 Limitations This analysis was subject to challenges shared by most approaches to monitoring sexual behaviour in the general population (Cleland et al. 2004). In particular, under-reporting of higher risk sexual behaviours among AGYW could affect the validity of my risk group proportion estimates. Due to social stigma or disapproval, respondents may be reluctant to report non-marital partners (Nnko et al. 
2004; Helleringer et al. 2011) or may bias their reporting of sexual debut (Zaba et al. 2004; Wringe et al. 2009; Nguyen and Eaton 2022). For guidance of resource allocation, differing rates of under-reporting by country, district, year or age group are particularly concerning to the applicability of my results; and, while it may be reasonable to assume a constant rate over space-time, the same cannot be said for age, where aspects of under-reporting have been shown to decline as respondents age (Glynn et al. 2011), suggesting that the elevated risks I found faced by younger women are likely a conservative estimate. If present, these reporting biases will also have distorted the estimates of infection risk ratios and prevalence ratios I used in my analysis, likely over-attributing risk to higher risk groups. I have the least confidence in my estimates for the FSW risk group. As well as having the smallest sample sizes, my transactional sex estimates do not overcome the difficulties of sampling hard-to-reach groups. I inherit any limitations of the national FSW estimates (Stevens et al. 2023), to which I adjusted my estimates of transactional sex. Furthermore, I do not consider seasonal migration patterns, which may particularly affect FSW population size. More generally, I did not consider covariates potentially predictive of risk group proportions (such as sociodemographic characteristics, education, local economic activity, cultural and religious norms and attitudes), which are typically difficult to measure spatially. Identifying measurable correlates of risk, or particular settings in which time-concentrated HIV risk occurs, is an important area for further research to improve risk prioritisation and precision HIV programme delivery. The efficiency of each stratified prevention strategy depends on the ability of programmes to identify and effectively reach those in each strata. 
My analysis of new infections potentially averted assumed a “best-case” scenario where AGYW of every strata can be reached perfectly, and should therefore be interpreted as illustrating the potentially obtainable benefits rather than benefits which would be obtained from any specific intervention strategy. In practice, stratified prevention strategies are likely to be substantially less efficient than this best-case scenario. Factors I did not consider include the greater administrative burden of more complex strategies, variation in difficulty or feasibility of reaching individuals in each strata, variation in the range or effectiveness of interventions by strata, and changes in strata membership that may occur during the course of a year. Identifying and reaching behavioural strata may be particularly challenging. Empirical evaluations of behavioural risk screening tools have found only moderate discriminatory ability (Jia et al. 2022), and risk behaviour may change rapidly among young populations, increasing the challenge to effectively deliver appropriately timed prevention packages. This consideration may motivate selecting risk groups based on easily observable attributes, such as attendance of a particular service or facility, rather than sexual behaviour. In conducting this work, there was insufficient engagement with country experts or civil society organisations. As a result, in early use of the risk group tool the FSW population size estimates were met with some disagreement in Malawi. In that instance, the cause of the disagreement was external model inputs used. In future, estimates should be generated and reviewed by country teams. 5.6.2 Conclusion I estimated HIV risk group proportions, HIV prevalences and HIV incidences for AGYW aged 15-19, 20-24 and 25-29 years at a district-level in 13 priority countries. Using these estimates, I analysed the number of infections that could be reached by prioritisation based upon location, age and behaviour. 
Though subject to limitations, these estimates provide data that national HIV programmes can use to set targets and implement differentiated HIV prevention strategies as outlined in the Global AIDS Strategy. Successfully implementing this approach would result in more efficiently reaching a greater number of those at risk of infection. Among AGYW, there was systematic variation in sexual behaviour by age and location, but not over time. Age group variation was primarily attributable to age of sexual debut (ages 15-24). Spatial variation was particularly present between those who reported one cohabiting partner versus non-regular or multiple partners. Risk group proportions did not change substantially over time, indicating that norms relating to sexual behaviour are relatively static. These findings underscore the importance of providing effective HIV prevention options tailored to the needs of particular age groups, as well as local norms around sexual partnerships. References Baker, Stuart G. 1994. “The multinomial-Poisson transformation.” Journal of the Royal Statistical Society: Series D (The Statistician) 43 (4): 495–504. Baral, Stefan, Chris Beyrer, Kathryn Muessig, Tonia Poteat, Andrea L Wirtz, Michele R Decker, Susan G Sherman, and Deanna Kerrigan. 2012. “Burden of HIV among female sex workers in low-income and middle-income countries: a systematic review and meta-analysis.” The Lancet Infectious Diseases 12 (7): 538–49. Besag, Julian, Jeremy York, and Annie Mollié. 1991. “Bayesian image restoration, with two applications in spatial statistics.” Annals of the Institute of Statistical Mathematics 43 (1): 1–20. Bosse, Nikos I., Hugo Gruson, Anne Cori, Edwin van Leeuwen, Sebastian Funk, and Sam Abbott. 2022. “Evaluating Forecasts with scoringutils in R.” arXiv. https://arxiv.org/abs/2205.07090. Brugh, Kristen N, Quinn Lewis, Cameron Haddad, Jon Kumaresan, Timothy Essam, and Michelle S Li. 2021. 
“Characterizing and mapping the spatial variability of HIV risk among adolescent girls and young women: A cross-county analysis of population-based surveys in Eswatini, Haiti, and Mozambique.” PLOS One 16 (12): e0261520. Cleland, John, J Ties Boerma, Michel Caraël, and Sharon S Weir. 2004. “Monitoring sexual behaviour in general populations: a synthesis of lessons of the past decade.” Sexually Transmitted Infections 80 (suppl 2): ii1–7. Dawid, A Philip. 1984. “Present position and potential developments: Some personal views statistical theory the prequential approach.” Journal of the Royal Statistical Society: Series A (General) 147 (2): 278–90. Eaton, Jeffrey W, Laura Dwyer-Lindgren, Steve Gutreuter, Megan O’Driscoll, Oliver Stevens, Sumali Bajaj, Rob Ashton, et al. 2021. “Naomi: a new modelling tool for estimating HIV epidemic indicators at the district level in sub-Saharan Africa.” Journal of the International AIDS Society 24: e25788. Gelman, Andrew, John B Carlin, Hal S Stern, David B Dunson, Aki Vehtari, and Donald B Rubin. 2013. Bayesian data analysis. CRC press. Glynn, Judith R, Ndoliwe Kayuni, Emmanuel Banda, Fiona Parrott, Sian Floyd, Monica Francis-Chizororo, Misheck Nkhata, et al. 2011. “Assessing the validity of sexual behaviour reports in a whole population survey in rural Malawi.” PLOS One 6 (7): e22840. Gómez-Rubio, Virgilio. 2020. Bayesian inference with INLA. CRC Press. Helleringer, Stéphane, Hans-Peter Kohler, Linda Kalilani-Phiri, James Mkandawire, and Benjamin Armbruster. 2011. “The reliability of sexual partnership histories: implications for the measurement of partnership concurrency during surveys.” AIDS (London, England) 25 (4): 503. Hodgins, Caroline, James Stannah, Salome Kuchukhidze, Lycias Zembe, Jeffrey W Eaton, Marie-Claude Boily, and Mathieu Maheu-Giroux. 2022. 
“Population sizes, HIV prevalence, and HIV prevention among men who paid for sex in sub-Saharan Africa (2000–2020): A meta-analysis of 87 population-based surveys.” PLOS Medicine 19 (1): e1003861. ———. 2023b. multi.utils: Utility functions for multi-agyw. Howes, Adam, Kathryn A. Risher, Van Kính Nguyen, Oliver Stevens, Katherine M. Jia, Timothy M. Wolock, Rachel T. Esra, et al. 2023. “Spatio-temporal estimates of HIV risk group proportions for adolescent girls and young women across 13 priority countries in sub-Saharan Africa.” PLOS Global Public Health 3 (4): 1–14. https://doi.org/10.1371/journal.pgph.0001731. Jia, Katherine M, Hallie Eilerts, Olanrewaju Edun, Kevin Lam, Adam Howes, Matthew L Thomas, and Jeffrey W Eaton. 2022. “Risk scores for predicting HIV incidence among adult heterosexual populations in sub-Saharan Africa: a systematic review and meta-analysis.” Journal of the International AIDS Society 25 (1): e25861. Johnson, L, and RE Dorrington. 2020. “Thembisa version 4.3: A model for evaluating the impact of HIV/AIDS in South Africa.” View Article. Knorr-Held, Leonhard. 2000. “Bayesian modelling of inseparable space-time variation in disease risk.” Statistics in Medicine 19 (17-18): 2555–67. Martins, Thiago G, Daniel Simpson, Finn Lindgren, and Håvard Rue. 2013. “Bayesian computing with INLA: new features.” Computational Statistics & Data Analysis 67: 68–83. McCullagh, Peter, and John A Nelder. 1989. Generalized linear models. Routledge. Nguyen, Van Kính, and Jeffrey W. Eaton. 2022. “Trends and country-level variation in age at first sex in sub-Saharan Africa among birth cohorts entering adulthood between 1985 and 2020.” BMC Public Health 22 (1): 1120. https://doi.org/10.1186/s12889-022-13451-y. Nnko, Soori, J Ties Boerma, Mark Urassa, Gabriel Mwaluko, and Basia Zaba. 2004. “Secretive females or swaggering males?: An assessment of the quality of sexual partnership reporting in rural Tanzania.” Social Science & Medicine 59 (2): 299–310. Pettit, LI. 1990. 
“The conditional predictive ordinate for the normal distribution.” Journal of the Royal Statistical Society: Series B (Methodological) 52 (1): 175–84. Rue, Håvard, Sara Martino, and Nicolas Chopin. 2009. “Approximate Bayesian inference for latent Gaussian models by using integrated nested Laplace approximations.” Journal of the Royal Statistical Society: Series B (Statistical Methodology) 71 (2): 319–92. Säilynoja, Teemu, Paul-Christian Bürkner, and Aki Vehtari. 2022. “Graphical test for discrete uniformity and its applications in goodness-of-fit evaluation and multiple sample comparison.” Statistics and Computing 32 (2): 32. Saul, Janet, Gretchen Bachman, Shannon Allen, Nora F Toiv, Caroline Cooney, and Ta’Adhmeeka Beamon. 2018. “The DREAMS core package of interventions: a comprehensive approach to preventing HIV among adolescent girls and young women.” PLOS One 13 (12): e0208167. Simpson, Daniel, Håvard Rue, Andrea Riebler, Thiago G Martins, Sigrunn H Sørbye, et al. 2017. “Penalising model component complexity: A principled, practical approach to constructing priors.” Statistical Science 32 (1): 1–28. Slaymaker, Emma, Kathryn A. Risher, Ramadhani Abdul, Milly Marston, Keith Tomlin, Robert Newton, Anthony Ndyanabo, et al. 2020. “Risk factors for new HIV infections in the general population in sub-Saharan Africa.” ———. 2017. “Penalised complexity priors for stationary autoregressive processes.” Journal of Time Series Analysis 38 (6): 923–35. Spiegelhalter, David J, Nicola G Best, Bradley P Carlin, and Angelika Van Der Linde. 2002. “Bayesian measures of model complexity and fit.” Journal of the Royal Statistical Society: Series B (Statistical Methodology) 64 (4): 583–639. Stevens, Oliver, Keith Sabin, Rebecca Anderson, Sonia Arias Garcia, Kalai Willis, Amrita Rao, Anne F. McIntyre, et al. 2023. “Population size, HIV prevalence, and antiretroviral therapy coverage among key populations in sub-Saharan Africa: collation and synthesis of survey data 2010-2023.” medRxiv. 
https://www.medrxiv.org/content/early/2023/11/22/2022.07.27.22278071. The Global Fund. 2018. The Global Fund Measurement Framework for Adolescent Girls and Young Women Programs. https://www.theglobalfund.org/media/8076/me\\%5Fadolescentsgirlsandyoungwomenprograms\\%5Fframeworkmeasurement\\%5Fen.pdf. UNAIDS. 2021a. “2021 UNAIDS Global AIDS Update - Confronting Inequalities - Lessons for pandemic responses from 40 Years of AIDS.” Geneva, Switzerland. UNAIDS. 2021b. “Global AIDS strategy 2021–2026. End inequalities. End AIDS.” UNICEF. 2019. “Adolescent & social norms situation in Mozambique.” https://www.unicef.org/mozambique/en/adolescent-social-norms. Watanabe, Sumio. 2013. “A widely applicable Bayesian information criterion.” Journal of Machine Learning Research 14 (Mar): 867–97. Wringe, A, I Cremin, J Todd, N McGrath, I Kasamba, K Herbst, P Mushore, B Żaba, and E Slaymaker. 2009. “Comparative assessment of the quality of age-at-event reporting in three HIV cohort studies in sub-Saharan Africa.” Sexually Transmitted Infections 85 (Suppl 1): i56–63. Zaba, Basia, Elizabeth Pisani, Emma Slaymaker, and J Ties Boerma. 2004. “Age at first sex: understanding recent trends in African demographic surveys.” Sexually Transmitted Infections 80 (suppl 2): ii28–35. "],["naomi-aghq.html", "6 Fast approximate Bayesian inference 6.1 Inference methods and software 6.2 A universal INLA implementation 6.3 The Naomi model 6.4 AGHQ in moderate dimensions 6.5 Malawi case-study 6.6 Discussion", " 6 Fast approximate Bayesian inference This chapter describes the development of a novel deterministic Bayesian inference approach, motivated by the Naomi small-area estimation model (Eaton et al. 2021). Over 35 countries (UNAIDS 2023b) have used the Naomi model web interface (https://naomi.unaids.org) to produce subnational estimates of HIV indicators. 
In Naomi, evidence is synthesised from household surveys and routinely collected health data to generate estimates of HIV indicators by district, age, and sex. The complexity and size of the model makes obtaining fast and accurate Bayesian inferences challenging. As such, development of the approach required meeting both methodological challenges and implementation difficulties. The methods in this chapter combine Laplace approximations with adaptive quadrature, and are descended from the integrated nested Laplace approximation (INLA) method pioneered by Håvard Rue, Martino, and Chopin (2009). The INLA method has enabled fast and accurate Bayesian inferences for a vast array of models, across a large number of scientific fields (Håvard Rue et al. 2017). The success of INLA is in large part due to its accessible implementation in the R-INLA software. Use of the INLA method and the R-INLA software are nearly ubiquitous in applied settings. However, the Naomi model is not compatible with R-INLA. The foremost reason is that Naomi is too complex to be expressed using a formula interface of the form y ~ .... Additionally, Naomi has more hyperparameters (moderate-dimensional, >20) than can typically be handled using INLA (low-dimensional, certainly below 10). As a result, inferences for the Naomi model have previously been obtained using an empirical Bayes [EB; Casella (1985)] approximation to full Bayesian inference, with the Laplace approximation implemented by the more flexible Template Model Builder [TMB; Kristensen et al. (2016)] R package. Under the EB approximation, the hyperparameters are fixed by optimising an approximation to the marginal posterior. This is undesirable as fixing the hyperparameters underestimates their uncertainty. Ultimately, the resulting overconfidence may lead to worse HIV prevention policy decisions. Most methodological work relating to INLA has taken place using the R-INLA software package. There are two notable exceptions. 
First, the simplified INLA approach of Wood (2020), implemented in the mgcv R package, proposed a fast Laplace approximation approach which does not rely on Markov structure of the latent field in the same way as Håvard Rue, Martino, and Chopin (2009). Second, Stringer, Brown, and Stafford (2022) extended the scope and scalability of INLA by avoiding augmenting the latent field with the noisy structured additive predictors. This enables the application of INLA to a wider class of extended latent Gaussian models, which includes Naomi. Van Niekerk et al. (2023) refer to this as the “modern” formulation of the INLA method, as opposed to the “classic” formulation of Håvard Rue, Martino, and Chopin (2009), and it is now included in R-INLA using inla.mode = \"experimental\". Stringer, Brown, and Stafford (2022) also propose use of the adaptive Gauss-Hermite quadrature [AGHQ; Naylor and Smith (1982)] rule to perform integration with respect to the hyperparameters. The methodological contributions of this chapter extend Stringer, Brown, and Stafford (2022) in two directions: First, a universally applicable implementation of INLA with Laplace marginals, where automatic differentiation via TMB is used to obtain the derivatives required for the Laplace approximation. For users of R-INLA, the Stringer, Brown, and Stafford (2022) approach is analogous to method = \"gaussian\", while the approach newly implemented in this chapter is analogous to method = \"laplace\". Section 6.2 demonstrates the implementation using two examples, one compatible with R-INLA and one incompatible. Second, a quadrature rule which combines AGHQ with principal components analysis to enable integration over moderate-dimensional spaces, described in Section 6.4. This quadrature rule is used to perform inference for the Naomi model by integrating the marginal Laplace approximation with respect to the moderate-dimensional hyperparameters within an INLA algorithm implemented in TMB in Section 6.5. 
This work was conducted in collaboration with Prof. Alex Stringer, whom I visited at the University of Waterloo during the fall term of 2022. Code for the analysis in this chapter is available from https://github.com/athowes/naomi-aghq. 6.1 Inference methods and software This section reviews existing deterministic Bayesian inference methods (Sections 6.1.1, 6.1.2, 6.1.3) and the software implementing them (Section 6.1.4). Recall that inference comprises obtaining the posterior distribution \\[\\begin{equation} p(\\boldsymbol{\\mathbf{\\phi}} \\, | \\, \\mathbf{y}) = \\frac{p(\\boldsymbol{\\mathbf{\\phi}}, \\mathbf{y})}{p(\\mathbf{y})}, \\tag{6.1} \\end{equation}\\] or some way to compute relevant functions of it. The posterior distribution encapsulates beliefs about the parameters \\(\\boldsymbol{\\mathbf{\\phi}} = (\\phi_1, \\ldots, \\phi_d)\\) having observed data \\(\\mathbf{y} = (y_1, \\ldots, y_n)\\). Here I assume these quantities are expressible as vectors. Inference is a sensible goal because (under Bayesian decision theory) the posterior distribution is sufficient for use in decision making. More specifically, given a loss function \\(l(a, \\boldsymbol{\\mathbf{\\phi}})\\), the expected posterior loss of a decision \\(a\\) depends on the data only via the posterior distribution \\[\\begin{equation} \\mathbb{E}(l(a, \\boldsymbol{\\mathbf{\\phi}}) \\, | \\, \\mathbf{y}) = \\int_{\\mathbb{R}^d} l(a, \\boldsymbol{\\mathbf{\\phi}}) p(\\boldsymbol{\\mathbf{\\phi}} \\, | \\, \\mathbf{y}) \\text{d}\\boldsymbol{\\mathbf{\\phi}}. \\end{equation}\\] For example, historic data about treatment demand are only required for planning of HIV treatment service provision in so far as they alter the posterior distribution of current demand. The information provided for strategic response to the HIV epidemic may therefore be thought of as functions of some posterior distribution. It is usually intractable to obtain the posterior distribution. 
This is because the denominator in Equation (6.1) contains a potentially high-dimensional integral over the \\(d \\in \\mathbb{Z}^+\\) -dimensional parameters \\[\\begin{equation} p(\\mathbf{y}) = \\int_{\\mathbb{R}^d} p(\\mathbf{y}, \\boldsymbol{\\mathbf{\\phi}}) \\text{d}\\boldsymbol{\\mathbf{\\phi}}. \\tag{6.2} \\end{equation}\\] This quantity is sometimes called the evidence or posterior normalising constant. As a result, approximations to the posterior distribution \\(\\tilde p(\\boldsymbol{\\mathbf{\\phi}} \\, | \\, \\mathbf{y})\\) are typically used in place of the exact posterior distribution. Some approximate Bayesian inference methods, like Markov chain Monte Carlo (MCMC), avoid directly calculating the posterior normalising constant. Instead they find ways to work with the unnormalised posterior distribution \\[\\begin{equation} p(\\boldsymbol{\\mathbf{\\phi}} \\, | \\, \\mathbf{y}) \\propto p(\\boldsymbol{\\mathbf{\\phi}}, \\mathbf{y}), \\end{equation}\\] where \\(p(\\mathbf{y})\\) is not a function of \\(\\boldsymbol{\\mathbf{\\phi}}\\) and so can be removed as a constant. Other approximate Bayesian inference methods can more directly be thought of as ways to estimate the posterior normalising constant (Equation (6.2)). The methods in this chapter fall into this latter category, and are sometimes described as deterministic Bayesian inference methods because they do not make fundamental use of randomness. 6.1.1 The Laplace approximation Laplace’s method (Laplace 1774) is a technique used to approximate integrals of the form \\[\\begin{equation} \\int \\exp(C h(\\mathbf{z})) \\text{d}\\mathbf{z}, \\end{equation}\\] where \\(C > 0\\) is a constant, \\(h\\) is a function which is twice-differentiable, and \\(\\mathbf{z}\\) are generic variables. The Laplace approximation (Tierney and Kadane 1986) is obtained by application of Laplace’s method to calculate the posterior normalising constant (Equation (6.2)). 
Let \\(h(\\boldsymbol{\\mathbf{\\phi}}) = \\log p(\\boldsymbol{\\mathbf{\\phi}}, \\mathbf{y})\\) such that \\[\\begin{equation} p(\\mathbf{y}) = \\int_{\\mathbb{R}^d} p(\\mathbf{y}, \\boldsymbol{\\mathbf{\\phi}}) \\text{d}\\boldsymbol{\\mathbf{\\phi}} = \\int_{\\mathbb{R}^d} \\exp(h(\\boldsymbol{\\mathbf{\\phi}})) \\text{d}\\boldsymbol{\\mathbf{\\phi}}. \\end{equation}\\] Laplace’s method involves approximating the function \\(h\\) by its second order Taylor expansion. The expansion is taken about a maximum of \\(h\\), so that the first order term vanishes. Let \\[\\begin{equation} \\hat{\\boldsymbol{\\mathbf{\\phi}}} = \\arg\\max_{\\boldsymbol{\\mathbf{\\phi}}} h(\\boldsymbol{\\mathbf{\\phi}}) \\tag{6.3} \\end{equation}\\] be the posterior mode, and \\[\\begin{equation} \\hat {\\mathbf{H}} = - \\frac{\\partial^2}{\\partial \\boldsymbol{\\mathbf{\\phi}} \\partial \\boldsymbol{\\mathbf{\\phi}}^\\top} h(\\boldsymbol{\\mathbf{\\phi}}) \\rvert_{\\boldsymbol{\\mathbf{\\phi}} = \\hat{\\boldsymbol{\\mathbf{\\phi}}}} \\tag{6.4} \\end{equation}\\] be the Hessian matrix evaluated at the posterior mode. The Laplace approximation to the posterior normalising constant (Equation (6.2)) is then \\[\\begin{align} \\tilde p_{\\texttt{LA}}(\\mathbf{y}) &= \\int_{\\mathbb{R}^d} \\exp \\left( h(\\hat{\\boldsymbol{\\mathbf{\\phi}}}) - \\frac{1}{2} (\\boldsymbol{\\mathbf{\\phi}} - \\hat{\\boldsymbol{\\mathbf{\\phi}}})^\\top \\hat {\\mathbf{H}} (\\boldsymbol{\\mathbf{\\phi}} - \\hat{\\boldsymbol{\\mathbf{\\phi}}}) \\right) \\text{d}\\boldsymbol{\\mathbf{\\phi}} \\tag{6.5} \\\\ &= p(\\hat{\\boldsymbol{\\mathbf{\\phi}}}, \\mathbf{y}) \\cdot \\frac{(2 \\pi)^{d/2}}{| \\hat {\\mathbf{H}} |^{1/2}}.
\\tag{6.6} \\end{align}\\] The result above is calculated using the known normalising constant of the Gaussian distribution \\[\\begin{equation} p_\\texttt{G}(\\boldsymbol{\\mathbf{\\phi}} \\, | \\, \\mathbf{y}) = \\mathcal{N} \\left( \\boldsymbol{\\mathbf{\\phi}} \\, | \\, \\hat{\\boldsymbol{\\mathbf{\\phi}}}, \\hat {\\mathbf{H}}^{-1} \\right) = \\frac{| \\hat {\\mathbf{H}} |^{1/2}}{(2 \\pi)^{d/2}} \\exp \\left( - \\frac{1}{2} (\\boldsymbol{\\mathbf{\\phi}} - \\hat{\\boldsymbol{\\mathbf{\\phi}}})^\\top \\hat {\\mathbf{H}} (\\boldsymbol{\\mathbf{\\phi}} - \\hat{\\boldsymbol{\\mathbf{\\phi}}}) \\right). \\end{equation}\\] The Laplace approximation may be thought of as approximating the posterior distribution by a Gaussian distribution \\(p(\\boldsymbol{\\mathbf{\\phi}} \\, | \\, \\mathbf{y}) \\approx p_\\texttt{G}(\\boldsymbol{\\mathbf{\\phi}} \\, | \\, \\mathbf{y})\\) such that \\[\\begin{equation} \\tilde p_{\\texttt{LA}}(\\mathbf{y}) = \\frac{p(\\boldsymbol{\\mathbf{\\phi}}, \\mathbf{y})}{p_\\texttt{G}(\\boldsymbol{\\mathbf{\\phi}} \\, | \\, \\mathbf{y})} \\Big\\rvert_{\\boldsymbol{\\mathbf{\\phi}} = \\hat{\\boldsymbol{\\mathbf{\\phi}}}}. \\end{equation}\\] Calculation of the Laplace approximation requires obtaining the second derivative of \\(h\\) with respect to \\(\\boldsymbol{\\mathbf{\\phi}}\\) (Equation (6.4)). Derivatives may also be used to improve the performance of the optimisation algorithm used to obtain the maxima of \\(h\\) (Equation (6.3)) by providing access to the gradient of \\(h\\) with respect to \\(\\boldsymbol{\\mathbf{\\phi}}\\). Figure 6.1: Demonstration of the Laplace approximation for the simple Bayesian inference example of Figure 3.1. The unnormalised posterior is \\(p(\\phi, \\mathbf{y}) = \\phi^8 \\exp(-4 \\phi)\\), and can be recognised as the unnormalised gamma distribution \\(\\text{Gamma}(9, 4)\\). 
The true log normalising constant is \\(\\log p(\\mathbf{y}) = \\log\\Gamma(9) - 9 \\log(4) = -1.872046\\), whereas the Laplace approximate log normalising constant is \\(\\log \\tilde p_{\\texttt{LA}}(\\mathbf{y}) = -1.882458\\), resulting from the Gaussian approximation \\(p_\\texttt{G}(\\phi \\, | \\, \\mathbf{y}) = \\mathcal{N}(\\phi \\, | \\,\\mu = 2, \\tau = 2)\\). 6.1.1.1 The marginal Laplace approximation Approximating the full joint posterior distribution using a Gaussian distribution may be inaccurate. An alternative is to approximate the marginal posterior distribution of some subset of the parameters, referred to as the marginal Laplace approximation. It remains to integrate out the remaining parameters, using another more suitable method. This approach is the basis of the INLA method. Let \\(\\boldsymbol{\\mathbf{\\phi}} = (\\mathbf{x}, \\boldsymbol{\\mathbf{\\theta}})\\) and consider a three-stage hierarchical model \\[\\begin{equation} p(\\mathbf{y}, \\mathbf{x}, \\boldsymbol{\\mathbf{\\theta}}) = p(\\mathbf{y} \\, | \\, \\mathbf{x}, \\boldsymbol{\\mathbf{\\theta}}) p(\\mathbf{x} \\, | \\, \\boldsymbol{\\mathbf{\\theta}}) p(\\boldsymbol{\\mathbf{\\theta}}), \\end{equation}\\] where \\(\\mathbf{x} = (x_1, \\ldots, x_N)\\) is the latent field, and \\(\\boldsymbol{\\mathbf{\\theta}} = (\\theta_1, \\ldots, \\theta_m)\\) are the hyperparameters. 
Applying a Gaussian approximation to the latent field, we have \\(h(\\mathbf{x}, \\boldsymbol{\\mathbf{\\theta}}) = \\log p(\\mathbf{y}, \\mathbf{x}, \\boldsymbol{\\mathbf{\\theta}})\\) with \\(N\\)-dimensional posterior mode \\[\\begin{equation} \\hat{\\mathbf{x}}(\\boldsymbol{\\mathbf{\\theta}}) = \\arg\\max_{\\mathbf{x}} h(\\mathbf{x}, \\boldsymbol{\\mathbf{\\theta}}) \\tag{6.7} \\end{equation}\\] and \\((N \\times N)\\)-dimensional Hessian matrix evaluated at the posterior mode \\[\\begin{equation} \\hat {\\mathbf{H}}(\\boldsymbol{\\mathbf{\\theta}}) = - \\frac{\\partial^2}{\\partial \\mathbf{x} \\partial \\mathbf{x}^\\top} h(\\mathbf{x}, \\boldsymbol{\\mathbf{\\theta}}) \\rvert_{\\mathbf{x} = \\hat{\\mathbf{x}}(\\boldsymbol{\\mathbf{\\theta}})}. \\tag{6.8} \\end{equation}\\] Dependence on the hyperparameters \\(\\boldsymbol{\\mathbf{\\theta}}\\) is made explicit in both Equation (6.7) and (6.8) such that there is a Gaussian approximation to the marginal posterior of the latent field \\(\\tilde p_\\texttt{G}(\\mathbf{x} \\, | \\, \\boldsymbol{\\mathbf{\\theta}}, \\mathbf{y}) = \\mathcal{N}(\\mathbf{x} \\, | \\, \\hat{\\mathbf{x}}(\\boldsymbol{\\mathbf{\\theta}}), \\hat{\\mathbf{H}}(\\boldsymbol{\\mathbf{\\theta}})^{-1})\\) at each value \\(\\boldsymbol{\\mathbf{\\theta}}\\) in the space \\(\\mathbb{R}^m\\). 
The resulting marginal Laplace approximation, for a particular value of the hyperparameters, is then \\[\\begin{align} \\tilde p_{\\texttt{LA}}(\\boldsymbol{\\mathbf{\\theta}}, \\mathbf{y}) &= \\int_{\\mathbb{R}^N} \\exp \\left( h(\\hat{\\mathbf{x}}(\\boldsymbol{\\mathbf{\\theta}}), \\boldsymbol{\\mathbf{\\theta}}) - \\frac{1}{2} (\\mathbf{x} - \\hat{\\mathbf{x}}(\\boldsymbol{\\mathbf{\\theta}}))^\\top \\hat {\\mathbf{H}}(\\boldsymbol{\\mathbf{\\theta}}) (\\mathbf{x} - \\hat{\\mathbf{x}}(\\boldsymbol{\\mathbf{\\theta}})) \\right) \\text{d}\\mathbf{x} \\tag{6.9} \\\\ &= \\exp(h(\\hat{\\mathbf{x}}(\\boldsymbol{\\mathbf{\\theta}}), \\boldsymbol{\\mathbf{\\theta}})) \\cdot \\frac{(2 \\pi)^{N/2}}{| \\hat {\\mathbf{H}}(\\boldsymbol{\\mathbf{\\theta}}) |^{1/2}} \\\\ &= \\frac{p(\\mathbf{y}, \\mathbf{x}, \\boldsymbol{\\mathbf{\\theta}})}{\\tilde p_\\texttt{G}(\\mathbf{x} \\, | \\, \\boldsymbol{\\mathbf{\\theta}}, \\mathbf{y})} \\Big\\rvert_{\\mathbf{x} = \\hat{\\mathbf{x}}(\\boldsymbol{\\mathbf{\\theta}})}. \\end{align}\\] The marginal Laplace approximation is most accurate when the marginal posterior \\(p(\\mathbf{x} \\, | \\, \\boldsymbol{\\mathbf{\\theta}}, \\mathbf{y})\\) is accurately approximated by a Gaussian distribution. For the class of latent Gaussian models (Håvard Rue, Martino, and Chopin 2009) the prior distribution on the latent field is Gaussian \\[\\begin{equation} \\mathbf{x} \\sim \\mathcal{N}(\\mathbf{x} \\, | \\, \\boldsymbol{\\mathbf{\\theta}}) = \\mathcal{N}(\\mathbf{x} \\, | \\, \\mathbf{0}, \\mathbf{Q}(\\boldsymbol{\\mathbf{\\theta}})^{-1}), \\end{equation}\\] with assumed zero mean \\(\\mathbf{0}\\), and precision matrix \\(\\mathbf{Q}(\\boldsymbol{\\mathbf{\\theta}})\\).
The resulting marginal posterior distribution \\[\\begin{align} p(\\mathbf{x} \\, | \\, \\boldsymbol{\\mathbf{\\theta}}, \\mathbf{y}) &\\propto \\mathcal{N}(\\mathbf{x} \\, | \\, \\boldsymbol{\\mathbf{\\theta}}) p(\\mathbf{y} \\, | \\, \\mathbf{x}, \\boldsymbol{\\mathbf{\\theta}}) \\\\ &\\propto \\exp \\left( - \\frac{1}{2} \\mathbf{x}^\\top \\mathbf{Q}(\\boldsymbol{\\mathbf{\\theta}}) \\mathbf{x} + \\log p(\\mathbf{y} \\, | \\, \\mathbf{x}, \\boldsymbol{\\mathbf{\\theta}}) \\right) \\end{align}\\] is not exactly Gaussian. However, its deviation can be expected to be small if \\(\\log p(\\mathbf{y} \\, | \\, \\mathbf{x}, \\boldsymbol{\\mathbf{\\theta}})\\) is small (Blangiardo et al. 2013). 6.1.2 Quadrature Quadrature is a method used to approximate integrals using a weighted sum of function evaluations. As with the Laplace approximation, it is deterministic in that the computational procedure is not intrinsically random. Let \\(\\mathcal{Q}\\) be a set of quadrature nodes \\(\\mathbf{z} \\in \\mathcal{Q}\\) and \\(\\omega: \\mathbb{R}^d \\to \\mathbb{R}\\) be a weighting function. Then, quadrature can be used to estimate the posterior normalising constant (Equation (6.2)) by \\[\\begin{equation} \\tilde p_{\\mathcal{Q}}(\\mathbf{y}) = \\sum_{\\mathbf{z} \\in \\mathcal{Q}} p(\\mathbf{y}, \\mathbf{z}) \\omega(\\mathbf{z}). \\end{equation}\\] To illustrate quadrature for a simple example, consider integrating the univariate function \\(f(z) = z \\sin(z)\\) between \\(z = 0\\) and \\(z = \\pi\\). This integral can be calculated analytically using integration by parts and evaluates to \\(\\pi\\). 
A quadrature approximation of this integral is \\[\\begin{equation} \\pi = \\sin(z) - z \\cos(z) \\bigg|_0^\\pi = \\int_{0}^\\pi z \\sin(z) \\text{d} z \\approx \\sum_{z \\in \\mathcal{Q}} z \\sin(z) \\omega(z), \\tag{6.10} \\end{equation}\\] where \\(\\mathcal{Q} = \\{z_1, \\ldots, z_k\\}\\) is a set of \\(k\\) quadrature nodes and \\(\\omega: \\mathbb{R} \\to \\mathbb{R}\\) is a weighting function. The trapezoid rule is an example of a quadrature rule, in which quadrature nodes are spaced throughout the domain with \\(\\epsilon_i = z_i - z_{i - 1} > 0\\) for \\(1 < i \\leq k\\). The weighting function is \\[\\begin{equation} \\omega(z_i) = \\begin{cases} \\epsilon_2 / 2 & i = 1, \\\\ (\\epsilon_i + \\epsilon_{i + 1}) / 2 & 1 < i < k, \\\\ \\epsilon_k / 2 & i = k. \\end{cases} \\end{equation}\\] When the nodes are equally spaced, with \\(\\epsilon_i = \\epsilon\\), the interior nodes have weight \\(\\epsilon\\) and the two end nodes have weight \\(\\epsilon / 2\\). Figure 6.2 shows application of the trapezoid rule to integration of \\(z \\sin(z)\\) as described in Equation (6.10). The more quadrature nodes are used, the more accurate the estimate of the integral is. Under some regularity conditions on \\(f\\), as the spacing between quadrature nodes \\(\\epsilon \\to 0\\) the estimate obtained using the trapezoid rule converges to the true value of the integral. Indeed, this approach was used by Riemann to provide the first rigorous definition of the integral. Figure 6.2: The trapezoid rule with \\(k = 5, 10, 20\\) equally-spaced (\\(\\epsilon_i = \\epsilon > 0\\)) quadrature nodes can be used to integrate the function \\(f(z) = z \\sin(z)\\), shown in green, in the domain \\([0, \\pi]\\). Here, the exact solution is \\(\\pi \\approx 3.1416\\). As \\(k\\) increases and more nodes are used in the computation, the quadrature estimate becomes closer to the exact solution. The trapezoid rule estimate is given by the sum of the areas of the grey trapezoids. Quadrature methods are most effective when integrating over low dimensions, say three or fewer.
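The trapezoid rule estimates described for Figure 6.2 can be reproduced with a short sketch. The code below is illustrative only: the function names are hypothetical, Python is used purely for convenience, and each node is given a weight equal to half the total width of the intervals it touches, which for equally spaced nodes gives the spacing for interior nodes and half the spacing at the two ends.

```python
import math


def trapezoid_weights(nodes):
    # Each node receives half the width of every interval adjacent to it
    k = len(nodes)
    gaps = [nodes[i] - nodes[i - 1] for i in range(1, k)]
    weights = []
    for i in range(k):
        left = gaps[i - 1] if i > 0 else 0.0
        right = gaps[i] if i < k - 1 else 0.0
        weights.append((left + right) / 2.0)
    return weights


def trapezoid(f, lower, upper, k):
    # k equally spaced nodes, including both end points of the domain
    step = (upper - lower) / (k - 1)
    nodes = [lower + i * step for i in range(k)]
    weights = trapezoid_weights(nodes)
    return sum(f(z) * w for z, w in zip(nodes, weights))


def f(z):
    # Integrand of Equation (6.10), whose exact integral on [0, pi] is pi
    return z * math.sin(z)


estimates = {k: trapezoid(f, 0.0, math.pi, k) for k in (5, 10, 20)}
errors = {k: abs(estimate - math.pi) for k, estimate in estimates.items()}

for k in (5, 10, 20):
    print(k, round(estimates[k], 4), round(errors[k], 4))
```

The absolute error shrinks as the number of nodes grows, in line with the behaviour described in the caption of Figure 6.2.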
This is because the number of quadrature nodes at which the function is required to be evaluated in the computation grows exponentially with the dimension. For even moderate dimension, this quickly makes computation intractable. For example, using 5, 10, or 20 quadrature nodes per dimension, as in Figure 6.2, in five dimensions (rather than one, as shown) would require 3125, 100000 or 3200000 quadrature nodes respectively. Though quadrature is easily parallelisable, in that function evaluations at each node are entirely independent, solutions requiring the evaluation of millions of quadrature nodes are unlikely to be tractable. 6.1.2.1 Gauss-Hermite quadrature It is possible to construct quadrature rules which use relatively few nodes and are highly accurate when the integrand adheres to certain assumptions [Chapter 4; Press et al. (2007)]. Gauss-Hermite quadrature [GHQ; Davis and Rabinowitz (1975)] is a quadrature rule designed to integrate functions of the form \\(f(\\mathbf{z}) = \\varphi(\\mathbf{z}) P_\\alpha(\\mathbf{z})\\) exactly, that is with no error, such that \\[\\begin{equation} \\int \\varphi(\\mathbf{z}) P_\\alpha(\\mathbf{z}) \\text{d} \\mathbf{z} = \\sum_{\\mathbf{z} \\in \\mathcal{Q}} \\varphi(\\mathbf{z}) P_\\alpha(\\mathbf{z}) \\omega(\\mathbf{z}). \\tag{6.11} \\end{equation}\\] In this equation, the term \\(\\varphi(\\cdot)\\) is a standard multivariate normal density \\(\\mathcal{N}(\\cdot \\, | \\, \\mathbf{0}, \\mathbf{I})\\), where \\(\\mathbf{0}\\) and \\(\\mathbf{I}\\) are the zero-vector and identity matrix of relevant dimension, and the term \\(P_\\alpha(\\cdot)\\) is a polynomial with highest degree monomial \\(\\alpha \\leq 2k - 1\\), where \\(k\\) is the number of quadrature nodes per dimension. GHQ is attractive for Bayesian inference problems because posterior distributions are typically well approximated by functions of this form.
Support for this statement is provided by the Bernstein–von Mises theorem, which states that, under some regularity conditions, as the number of data points increases the posterior distribution converges to a Gaussian. I follow the notation for GHQ established by Bilodeau, Stringer, and Tang (2022). First, to construct the univariate GHQ rule for \\(z \\in \\mathbb{R}\\), let \\(H_k(z)\\) be the \\(k\\)th (probabilist’s) Hermite polynomial \\[\\begin{equation} H_k(z) = (-1)^k \\exp(z^2 / 2) \\frac{\\text{d}^k}{\\text{d}z^k} \\exp(-z^2 / 2). \\end{equation}\\] The Hermite polynomials are orthogonal with respect to the standard Gaussian probability density function \\[\\begin{equation} \\int H_k(z) H_l(z) \\varphi(z) \\text{d} z = k! \\, \\delta_{kl}, \\end{equation}\\] where \\(\\delta_{kl} = 1\\) if \\(k = l\\) and \\(\\delta_{kl} = 0\\) otherwise. The GHQ nodes \\(z \\in \\mathcal{Q}(1, k)\\) are given by the \\(k\\) zeros of the \\(k\\)th Hermite polynomial. For \\(k = 1, 2, 3\\) these zeros, up to three decimal places, are \\[\\begin{align} H_1(z) = z = 0 \\implies \\mathcal{Q}(1, 1) &= \\{0\\}, \\\\ H_2(z) = z^2 - 1 = 0 \\implies \\mathcal{Q}(1, 2) &= \\{-1, 1\\}, \\\\ H_3(z) = z^3 - 3z = 0 \\implies \\mathcal{Q}(1, 3) &= \\{-1.732, 0, 1.732\\}. \\end{align}\\] The quadrature nodes are symmetric about zero, and include zero when \\(k\\) is odd. The corresponding weighting function \\(\\omega: \\mathcal{Q}(1, k) \\to \\mathbb{R}\\) chosen to satisfy Equation (6.11) is given by \\[\\begin{equation} \\omega(z) = \\frac{k!}{\\varphi(z) [H_{k + 1}(z)]^2}. \\end{equation}\\] Multivariate GHQ rules are usually constructed using the product rule with identical univariate GHQ rules in each dimension. As such, in \\(d\\) dimensions, the multivariate GHQ nodes \\(\\mathbf{z} \\in \\mathcal{Q}(d, k)\\) are defined by \\[\\begin{equation} \\mathcal{Q}(d, k) = \\mathcal{Q}(1, k)^d = \\mathcal{Q}(1, k) \\times \\cdots \\times \\mathcal{Q}(1, k).
\\end{equation}\\] The corresponding weighting function \\(\\omega: \\mathcal{Q}(d, k) \\to \\mathbb{R}\\) is given by a product of the univariate weighting functions \\(\\omega(\\mathbf{z}) = \\prod_{j = 1}^d \\omega(z_j)\\). 6.1.2.2 Adaptive quadrature In adaptive quadrature, the quadrature nodes and weights selected depend on the specific integrand being considered. For example, adaptive use of the trapezoid rule requires specifying a rule for the start point, end point, and spacing between quadrature nodes. It is particularly important to use an adaptive quadrature rule for Bayesian inference problems because the posterior normalising constant \\(p(\\mathbf{y})\\) is a function of the data. No fixed quadrature rule can be expected to effectively integrate all possible posterior distributions. In adaptive GHQ [AGHQ; Naylor and Smith (1982)] the quadrature nodes are shifted by the mode of the integrand, and rotated based on a matrix decomposition of the inverse curvature at the mode. To demonstrate AGHQ, consider its application to calculation of the posterior normalising constant. The relevant transformation of the GHQ nodes \\(\\mathcal{Q}(d, k)\\) is \\[\\begin{equation} \\boldsymbol{\\mathbf{\\phi}}(\\mathbf{z}) = \\hat{\\mathbf{P}} \\mathbf{z} + \\hat{\\boldsymbol{\\mathbf{\\phi}}}, \\end{equation}\\] where \\(\\hat{\\mathbf{P}}\\) is a matrix decomposition of \\(\\hat{\\boldsymbol{\\mathbf{H}}}^{-1} = \\hat{\\mathbf{P}} \\hat{\\mathbf{P}}^\\top\\). To account for the transformation, the weighting function may be redefined to include a matrix determinant, analogous to the Jacobian determinant, or more simply the matrix determinant may be written outside the integral. 
Taking the latter approach, the resulting adaptive quadrature estimate of the posterior normalising constant is \\[\\begin{align} \\tilde p_{\\texttt{AQ}}(\\mathbf{y}) &= | \\hat{\\mathbf{P}} | \\sum_{\\mathbf{z} \\in \\mathcal{Q}(d, k)} p(\\mathbf{y}, \\boldsymbol{\\mathbf{\\phi}}(\\mathbf{z})) \\omega(\\mathbf{z}) \\\\ &= | \\hat{\\mathbf{P}} | \\sum_{\\mathbf{z} \\in \\mathcal{Q}(d, k)} p(\\mathbf{y}, \\hat{\\mathbf{P}} \\mathbf{z} + \\hat{\\boldsymbol{\\mathbf{\\phi}}}) \\omega(\\mathbf{z}). \\end{align}\\] The quantities \\(\\hat{\\boldsymbol{\\mathbf{\\phi}}}\\) and \\(\\hat{\\boldsymbol{\\mathbf{H}}}\\) are exactly those given in Equations (6.3) and (6.4) and used in the Laplace approximation. Indeed, when \\(k = 1\\), AGHQ corresponds to the Laplace approximation. To see this, note that \\(H_1(z) = z\\) has the single zero \\(z = 0\\), such that the only adapted node is the mode \\(\\boldsymbol{\\mathbf{\\phi}}(\\mathbf{z} = \\mathbf{0}) = \\hat{\\boldsymbol{\\mathbf{\\phi}}}\\). The weighting function is given by \\[\\begin{equation} \\omega(0)^d = \\left( \\frac{1!}{\\varphi(0) H_{2}(0)^2} \\right)^d = \\left( \\frac{1}{\\varphi(0)} \\right)^d = \\left(2 \\pi\\right)^{d / 2}. \\end{equation}\\] The AGHQ estimate of the normalising constant for \\(k = 1\\) is then given by \\[\\begin{equation} \\tilde p_{\\texttt{AQ}}(\\mathbf{y}) = p(\\mathbf{y}, \\hat{\\boldsymbol{\\mathbf{\\phi}}}) \\cdot | \\hat{\\mathbf{P}} | \\cdot (2 \\pi)^{d / 2} = p(\\mathbf{y}, \\hat{\\boldsymbol{\\mathbf{\\phi}}}) \\cdot \\frac{(2 \\pi)^{d / 2}}{| \\hat{\\mathbf{H}} | ^{1/2}}, \\end{equation}\\] where the second equality uses \\(| \\hat{\\mathbf{P}} | = | \\hat{\\mathbf{H}} |^{-1/2}\\). This corresponds to the Laplace approximation \\(\\tilde p_{\\texttt{LA}}(\\mathbf{y})\\) given in Equation (6.6). This connection supports AGHQ being a natural extension of the Laplace approximation when greater accuracy than \\(k = 1\\) is required.
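The correspondence between AGHQ with one node and the Laplace approximation can be checked numerically. The sketch below is a minimal illustration, assuming the univariate gamma example of Figure 6.1 with unnormalised posterior equal to phi to the power 8 times exp(-4 phi), and using only the Python standard library. The helper names are hypothetical, the Hermite polynomials and their zeros are written out by hand for up to three nodes, and the weights follow the formula for the weighting function given in Section 6.1.2.1.

```python
import math


def std_normal_pdf(z):
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


# Probabilist's Hermite polynomials H_1 to H_4, written out explicitly
HERMITE = {
    1: lambda z: z,
    2: lambda z: z**2 - 1.0,
    3: lambda z: z**3 - 3.0 * z,
    4: lambda z: z**4 - 6.0 * z**2 + 3.0,
}

# Zeros of H_k for k = 1, 2, 3 (solved by hand from the polynomials above)
NODES = {
    1: [0.0],
    2: [-1.0, 1.0],
    3: [-math.sqrt(3.0), 0.0, math.sqrt(3.0)],
}


def ghq_rule(k):
    # Weights omega(z) = k! / (phi(z) * H_{k+1}(z)^2) at the zeros of H_k
    nodes = NODES[k]
    weights = [
        math.factorial(k) / (std_normal_pdf(z) * HERMITE[k + 1](z) ** 2)
        for z in nodes
    ]
    return nodes, weights


def aghq(f, mode, hessian, k):
    # Univariate AGHQ: |P| * sum of f(P z + mode) * omega(z), with P = hessian^(-1/2)
    p = 1.0 / math.sqrt(hessian)
    nodes, weights = ghq_rule(k)
    return p * sum(f(p * z + mode) * w for z, w in zip(nodes, weights))


def joint(phi):
    # Unnormalised Gamma(9, 4) posterior, zero outside its support
    return phi**8 * math.exp(-4.0 * phi) if phi > 0.0 else 0.0


mode = 2.0               # maximiser of 8 log(phi) - 4 phi
hessian = 8.0 / mode**2  # negative second derivative of the log joint at the mode

log_laplace = math.log(joint(mode)) + 0.5 * math.log(2.0 * math.pi) - 0.5 * math.log(hessian)
log_aghq_1 = math.log(aghq(joint, mode, hessian, k=1))
log_truth = math.lgamma(9) - 9.0 * math.log(4.0)

# Exactness check: with k = 3 the rule integrates phi(z) * z^4 exactly (degree 4 <= 2k - 1)
nodes_3, weights_3 = ghq_rule(3)
fourth_moment = sum(std_normal_pdf(z) * z**4 * w for z, w in zip(nodes_3, weights_3))

print(log_truth, log_laplace, log_aghq_1, fourth_moment)
```

With one node the AGHQ estimate and the closed-form Laplace approximation agree up to floating point error, and both differ from the true log normalising constant by about 0.01, matching the values reported for Figure 6.1.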
Figure 6.3: The Gauss-Hermite quadrature nodes \\(\\mathbf{z} \\in \\mathcal{Q}(2, 3)\\) for a two-dimensional integral with three nodes per dimension (Panel A). Adaption occurs based on the mode (Panel B) and covariance of the integrand via either the Cholesky (Panel C) or spectral (Panel D) decomposition of the inverse curvature at the mode. Here, the integrand is \\(f(z_1, z_2) = \\text{sn}(0.5 z_1, \\alpha = 2) \\cdot \\text{sn}(0.8 z_1 - 0.5 z_2, \\alpha = -2)\\), where \\(\\text{sn}(\\cdot)\\) is the standard skew-normal probability density function with shape parameter \\(\\alpha \\in \\mathbb{R}\\). Two alternatives for the matrix decomposition \\(\\hat{\\boldsymbol{\\mathbf{H}}}^{-1} = \\hat{\\mathbf{P}} \\hat{\\mathbf{P}}^\\top\\) are the Cholesky and spectral decomposition (Jäckel 2005). For the Cholesky decomposition \\(\\hat{\\mathbf{P}} = \\hat{\\mathbf{L}}\\), where \\[\\begin{equation} \\hat{\\mathbf{L}} = \\begin{pmatrix} \\hat{L}_{11} & 0 & \\cdots & 0 \\\\ \\hat{L}_{21} & \\hat{L}_{22} & \\ddots & \\vdots \\\\ \\vdots & \\ddots& \\ddots& 0 \\\\ \\hat{L}_{d1} & \\ldots& \\hat{L}_{d(d-1)} & \\hat{L}_{dd}\\\\ \\end{pmatrix} \\end{equation}\\] is a lower triangular matrix. For the spectral decomposition \\(\\hat{\\mathbf{P}} = \\hat{\\mathbf{E}} \\hat{\\mathbf{\\Lambda}}^{1/2}\\), where \\(\\hat{\\mathbf{E}} = (\\hat{\\mathbf{e}}_{1}, \\ldots, \\hat{\\mathbf{e}}_{d})\\) contains the eigenvectors of \\(\\hat{\\mathbf{H}}^{-1}\\) and \\(\\hat{\\mathbf{\\Lambda}}\\) is a diagonal matrix containing its eigenvalues \\((\\hat \\lambda_{1}, \\ldots, \\hat \\lambda_{d})\\). Figure 6.3 demonstrates GHQ and AGHQ for a two-dimensional example, using both decomposition approaches. Using the Cholesky decomposition results in adapted quadrature nodes which collapse along one of the dimensions, as a result of the matrix \\(\\hat{\\mathbf{L}}\\) being lower triangular.
On the other hand, using the spectral decomposition results in adapted quadrature nodes which lie along the orthogonal eigenvectors of \\(\\hat{\\mathbf{H}}^{-1}\\). Using AGHQ, Bilodeau, Stringer, and Tang (2022) provide the first stochastic convergence rate for adaptive quadrature applied to Bayesian inference. 6.1.3 Integrated nested Laplace approximation The integrated nested Laplace approximation (INLA) method (Håvard Rue, Martino, and Chopin 2009) combines marginal Laplace approximations with quadrature to enable approximation of posterior marginal distributions. Consider the marginal Laplace approximation (Section 6.1.1.1) for a three-stage hierarchical model given by \\[\\begin{equation} \\tilde p_{\\texttt{LA}}(\\boldsymbol{\\mathbf{\\theta}}, \\mathbf{y}) = \\frac{p(\\mathbf{y}, \\mathbf{x}, \\boldsymbol{\\mathbf{\\theta}})}{\\tilde p_\\texttt{G}(\\mathbf{x} \\, | \\, \\boldsymbol{\\mathbf{\\theta}}, \\mathbf{y})} \\Big\\rvert_{\\mathbf{x} = \\hat{\\mathbf{x}}(\\boldsymbol{\\mathbf{\\theta}})}. \\end{equation}\\] To complete approximation of the posterior normalising constant, the marginal Laplace approximation can be integrated over the hyperparameters using a quadrature rule (Section 6.1.2) \\[\\begin{equation} \\tilde p(\\mathbf{y}) = \\sum_{\\mathbf{z} \\in \\mathcal{Q}} \\tilde p_\\texttt{LA}(\\mathbf{z}, \\mathbf{y}) \\omega(\\mathbf{z}). \\tag{6.12} \\end{equation}\\] Though any choice of quadrature rule is possible, following Stringer, Brown, and Stafford (2022) here I consider use of AGHQ. Let \\(\\mathbf{z} \\in \\mathcal{Q}(m, k)\\) be the \\(m\\)-dimensional GHQ nodes constructed using the product rule with \\(k\\) nodes per dimension, and \\(\\omega: \\mathbb{R}^m \\to \\mathbb{R}\\) the corresponding weighting function. 
These nodes are adapted by \\(\\boldsymbol{\\mathbf{\\theta}}(\\mathbf{z}) = \\hat{\\mathbf{P}}_\\texttt{LA} \\mathbf{z} + \\hat{\\boldsymbol{\\mathbf{\\theta}}}_\\texttt{LA}\\) where \\[\\begin{align} \\hat{\\boldsymbol{\\mathbf{\\theta}}}_\\texttt{LA} &= \\arg\\max_{\\boldsymbol{\\mathbf{\\theta}}} \\log \\tilde p_{\\texttt{LA}}(\\boldsymbol{\\mathbf{\\theta}}, \\mathbf{y}), \\\\ \\hat{\\boldsymbol{\\mathbf{H}}}_\\texttt{LA} &= - \\frac{\\partial^2}{\\partial \\boldsymbol{\\mathbf{\\theta}} \\partial \\boldsymbol{\\mathbf{\\theta}}^\\top} \\log \\tilde p_{\\texttt{LA}}(\\boldsymbol{\\mathbf{\\theta}}, \\mathbf{y}) \\rvert_{\\boldsymbol{\\mathbf{\\theta}} = \\hat{\\boldsymbol{\\mathbf{\\theta}}}_\\texttt{LA}}, \\tag{6.13} \\\\ \\hat{\\boldsymbol{\\mathbf{H}}}_\\texttt{LA}^{-1} &= \\hat{\\mathbf{P}}_\\texttt{LA} \\hat{\\mathbf{P}}_\\texttt{LA}^\\top. \\end{align}\\] The nested AGHQ estimate of the posterior normalising constant is then \\[\\begin{equation} \\tilde p_{\\texttt{AQ}}(\\mathbf{y}) = | \\hat{\\mathbf{P}}_\\texttt{LA} | \\sum_{\\mathbf{z} \\in \\mathcal{Q}(m, k)} \\tilde p_\\texttt{LA}(\\boldsymbol{\\mathbf{\\theta}}(\\mathbf{z}), \\mathbf{y}) \\omega(\\mathbf{z}). \\tag{6.14} \\end{equation}\\] This estimate can be used to normalise the marginal Laplace approximation as follows \\[\\begin{equation} \\tilde p_{\\texttt{LA}}(\\boldsymbol{\\mathbf{\\theta}} \\, | \\, \\mathbf{y}) = \\frac{\\tilde p_{\\texttt{LA}}(\\boldsymbol{\\mathbf{\\theta}}, \\mathbf{y})}{\\tilde p_{\\texttt{AQ}}(\\mathbf{y})}. \\end{equation}\\] The posterior marginals \\(\\tilde p(\\theta_j \\, | \\, \\mathbf{y})\\) may be obtained by integrating the normalised marginal Laplace approximation over the remaining hyperparameters \\[\\begin{align} \\tilde p(\\theta_j \\, | \\, \\mathbf{y}) = \\int \\tilde p_{\\texttt{LA}}(\\boldsymbol{\\mathbf{\\theta}} \\, | \\, \\mathbf{y}) \\text{d} \\boldsymbol{\\mathbf{\\theta}}_{-j}. \\end{align}\\] These integrals may be computed by reusing the AGHQ rule. More recent methods are discussed in Section 3.2 of Martins et al. (2013).
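The steps above (marginal Laplace approximation, adaption of the nodes, and normalisation via Equation (6.14)) can be traced through on a toy problem. The sketch below is an assumption-laden illustration rather than the implementation used in this chapter: the data, the model (Gaussian observations with a Gaussian latent field and a single log-precision hyperparameter), and all function names are hypothetical. The toy model is chosen because its inner mode and Hessian are available in closed form and the marginal Laplace approximation is exact, which gives something to check against. The outer mode and curvature are found by Newton iterations with finite differences, standing in for the automatic differentiation a tool such as TMB would provide.

```python
import math

# Hypothetical toy model (not Naomi):
#   y_i | x_i ~ N(x_i, 1),  x_i | theta ~ N(0, 1 / tau),  tau = exp(theta),  theta ~ N(0, 1)
Y = [1.2, -0.7, 2.1, -1.5, 0.4]


def log_normal_pdf(value, mean, variance):
    return -0.5 * math.log(2.0 * math.pi * variance) - 0.5 * (value - mean) ** 2 / variance


def log_marginal_laplace(theta):
    # log joint at the inner mode plus log of (2 pi)^(N/2) / |H(theta)|^(1/2)
    tau = math.exp(theta)
    n = len(Y)
    x_hat = [y / (1.0 + tau) for y in Y]       # inner mode of x given theta
    log_det_hessian = n * math.log(1.0 + tau)  # Hessian is (1 + tau) times the identity
    log_joint = log_normal_pdf(theta, 0.0, 1.0)
    for y, x in zip(Y, x_hat):
        log_joint += log_normal_pdf(y, x, 1.0) + log_normal_pdf(x, 0.0, 1.0 / tau)
    return log_joint + 0.5 * n * math.log(2.0 * math.pi) - 0.5 * log_det_hessian


def log_exact(theta):
    # Closed form for this toy model: marginally y_i | theta ~ N(0, 1 + 1 / tau)
    variance = 1.0 + math.exp(-theta)
    return log_normal_pdf(theta, 0.0, 1.0) + sum(log_normal_pdf(y, 0.0, variance) for y in Y)


def first_derivative(f, t, eps=1e-4):
    return (f(t + eps) - f(t - eps)) / (2.0 * eps)


def second_derivative(f, t, eps=1e-4):
    return (f(t + eps) - 2.0 * f(t) + f(t - eps)) / eps**2


# Outer mode and curvature of the log marginal Laplace approximation (Equation (6.13))
theta_hat = 0.0
for _ in range(25):
    theta_hat -= first_derivative(log_marginal_laplace, theta_hat) / second_derivative(
        log_marginal_laplace, theta_hat
    )
hessian_la = -second_derivative(log_marginal_laplace, theta_hat)
p_la = 1.0 / math.sqrt(hessian_la)


def std_normal_pdf(z):
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


# GHQ rule with k = 3: zeros of H_3(z) = z^3 - 3z, weights 3! / (phi(z) H_4(z)^2)
nodes = [-math.sqrt(3.0), 0.0, math.sqrt(3.0)]
omega = [6.0 / (std_normal_pdf(z) * (z**4 - 6.0 * z**2 + 3.0) ** 2) for z in nodes]

# Adapted nodes and the nested AGHQ estimate of the evidence (Equation (6.14))
theta_nodes = [p_la * z + theta_hat for z in nodes]
terms = [p_la * math.exp(log_marginal_laplace(t)) * w for t, w in zip(theta_nodes, omega)]
evidence = sum(terms)

# Weight carried by each node after normalisation, and a reuse of the rule for a posterior mean
lam = [term / evidence for term in terms]
posterior_mean_theta = sum(l * t for l, t in zip(lam, theta_nodes))

print(theta_hat, hessian_la, math.log(evidence), posterior_mean_theta)
```

Because the normalising constant is computed with the same rule used to integrate, the node weights sum to one by construction. For models with non-Gaussian likelihoods the inner mode would itself require numerical optimisation at every hyperparameter node.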
Multiple methods have been proposed for obtaining the joint posterior \\(\\tilde p(\\mathbf{x} \\, | \\, \\mathbf{y})\\) of the latent field or the individual marginals \\(\\tilde p(x_i \\, | \\, \\mathbf{y})\\). Four methods are presented below, trading off accuracy against computational expense. 6.1.3.1 Gaussian marginals Most easily, inferences for the latent field can be obtained by approximation of \\(p(\\mathbf{x} \\, | \\, \\mathbf{y})\\) using another application of the quadrature rule (Håvard Rue and Martino 2007) \\[\\begin{align} p(\\mathbf{x} \\, | \\, \\mathbf{y}) &= \\int p(\\mathbf{x}, \\boldsymbol{\\mathbf{\\theta}} \\, | \\, \\mathbf{y}) \\text{d} \\boldsymbol{\\mathbf{\\theta}} = \\int p(\\mathbf{x} \\, | \\, \\boldsymbol{\\mathbf{\\theta}}, \\mathbf{y}) p(\\boldsymbol{\\mathbf{\\theta}} \\, | \\, \\mathbf{y}) \\text{d} \\boldsymbol{\\mathbf{\\theta}} \\\\ &\\approx |\\hat{\\mathbf{P}}_\\texttt{LA}| \\sum_{\\mathbf{z} \\in \\mathcal{Q}(m, k)} \\tilde p_\\texttt{G}(\\mathbf{x} \\, | \\, \\boldsymbol{\\mathbf{\\theta}}(\\mathbf{z}), \\mathbf{y}) \\tilde p_\\texttt{LA}(\\boldsymbol{\\mathbf{\\theta}}(\\mathbf{z}) \\, | \\, \\mathbf{y}) \\omega(\\mathbf{z}). \\tag{6.15} \\end{align}\\] The quadrature rule \\(\\mathbf{z} \\in \\mathcal{Q}(m, k)\\) is used both internally to normalise the marginal Laplace approximation, and externally to perform integration with respect to the hyperparameters. Equation (6.15) is a mixture of Gaussian distributions \\[\\begin{equation} \\tilde p_\\texttt{G}(\\mathbf{x} \\, | \\, \\boldsymbol{\\mathbf{\\theta}}(\\mathbf{z}), \\mathbf{y}), \\tag{6.16} \\end{equation}\\] each with multinomial probabilities \\[\\begin{equation} \\lambda(\\mathbf{z}) = |\\hat{\\mathbf{P}}_\\texttt{LA}| \\tilde p_\\texttt{LA}(\\boldsymbol{\\mathbf{\\theta}}(\\mathbf{z}) \\, | \\, \\mathbf{y}) \\omega(\\mathbf{z}), \\end{equation}\\] where \\(\\sum \\lambda(\\mathbf{z}) = 1\\) and \\(\\lambda(\\mathbf{z}) > 0\\).
Samples may therefore be naturally obtained for the complete vector \\(\\mathbf{x}\\) jointly by first drawing a node \\(\\mathbf{z} \\in \\mathcal{Q}(m, k)\\) with multinomial probabilities \\(\\lambda(\\mathbf{z})\\) then drawing a sample from the corresponding Gaussian distribution in Equation (6.16). Algorithms for fast and exact simulation from a Gaussian distribution have been developed, including by Håvard Rue (2001). The posterior marginals for any subset of the complete vector can simply be obtained by keeping the relevant entries of \\(\\mathbf{x}\\). 6.1.3.2 Laplace marginals An alternative higher accuracy, but more computationally expensive, approach is to calculate a Laplace approximation to the marginal posterior \\[\\begin{equation} \\tilde p_\\texttt{LA}(x_i, \\boldsymbol{\\mathbf{\\theta}}, \\mathbf{y}) = \\frac{p(x_i, \\mathbf{x}_{-i}, \\boldsymbol{\\mathbf{\\theta}}, \\mathbf{y})}{\\tilde p_\\texttt{G}(\\mathbf{x}_{-i} \\, | \\, x_i, \\boldsymbol{\\mathbf{\\theta}}, \\mathbf{y})} \\Big\\rvert_{\\mathbf{x}_{-i} = \\hat{\\mathbf{x}}_{-i}(x_i, \\boldsymbol{\\mathbf{\\theta}})}. 
\\tag{6.17} \\end{equation}\\] Here, the variable \\(x_i\\) is excluded from the Gaussian approximation such that \\[\\begin{equation} \\tilde p_\\texttt{G}(\\mathbf{x}_{-i} \\, | \\, x_i, \\boldsymbol{\\mathbf{\\theta}}, \\mathbf{y}) = \\mathcal{N}(\\mathbf{x}_{-i} \\, | \\, \\hat{\\mathbf{x}}_{-i}(x_i, \\boldsymbol{\\mathbf{\\theta}}), \\hat{\\mathbf{H}}_{-i, -i}(x_i, \\boldsymbol{\\mathbf{\\theta}})^{-1}), \\end{equation}\\] with \\((N - 1)\\)-dimensional posterior mode \\[\\begin{equation} \\hat{\\mathbf{x}}_{-i}(x_i, \\boldsymbol{\\mathbf{\\theta}}) = \\arg\\max_{\\mathbf{x}_{-i}} \\log p(\\mathbf{y}, x_i, \\mathbf{x}_{-i}, \\boldsymbol{\\mathbf{\\theta}}), \\end{equation}\\] and \\([(N - 1) \\times (N - 1)]\\)-dimensional Hessian matrix evaluated at the posterior mode \\[\\begin{equation} \\hat{\\mathbf{H}}_{-i, -i}(x_i, \\boldsymbol{\\mathbf{\\theta}}) = - \\frac{\\partial^2}{\\partial \\mathbf{x}_{-i} \\partial \\mathbf{x}_{-i}^\\top} \\log p(\\mathbf{y}, x_i, \\mathbf{x}_{-i}, \\boldsymbol{\\mathbf{\\theta}}) \\rvert_{\\mathbf{x}_{-i} = \\hat{\\mathbf{x}}_{-i}(x_i, \\boldsymbol{\\mathbf{\\theta}})}. \\end{equation}\\] The approximate posterior marginal \\(\\tilde p(x_i \\, | \\, \\mathbf{y})\\) may be obtained by normalising the marginal Laplace approximation (Equation (6.17)) before performing integration with respect to the hyperparameters (as in Equation (6.15)). The normalised Laplace approximation is \\[\\begin{equation} \\tilde p_\\texttt{LA}(x_i, \\boldsymbol{\\mathbf{\\theta}} \\, | \\, \\mathbf{y}) = \\frac{\\tilde p_\\texttt{LA}(x_i, \\boldsymbol{\\mathbf{\\theta}}, \\mathbf{y})}{\\tilde p(\\mathbf{y})}, \\end{equation}\\] where either the estimate of the evidence in Equation (6.14) may be reused or a de novo estimate can be computed.
Integration with respect to the hyperparameters is performed via \\[\\begin{align} p(x_i \\, | \\, \\mathbf{y}) &= \\int p(x_i, \\boldsymbol{\\mathbf{\\theta}} \\, | \\, \\mathbf{y}) \\text{d} \\boldsymbol{\\mathbf{\\theta}} \\\\ &\\approx |\\hat{\\mathbf{P}}_\\texttt{LA}| \\sum_{\\mathbf{z} \\in \\mathcal{Q}(m, k)} \\tilde p_\\texttt{LA}(x_i, \\boldsymbol{\\mathbf{\\theta}}(\\mathbf{z}) \\, | \\, \\mathbf{y}) \\tilde \\omega(\\mathbf{z}). \\tag{6.18} \\end{align}\\] Equation (6.18) is a mixture of the normalised Laplace approximations \\(\\tilde p_\\texttt{LA}(x_i, \\boldsymbol{\\mathbf{\\theta}} \\, | \\, \\mathbf{y})\\) over the hyperparameter quadrature nodes. However, unlike the Gaussian case (Section 6.1.3.1) it is not easy to directly sample each Laplace approximation. As such, Equation (6.18) may instead be represented by its evaluation at a number of nodes. One approach is to choose these nodes based on a one-dimensional AGHQ rule, using the mode and standard deviation of the Gaussian approximation to avoid unnecessary computation of the Laplace marginal mode and standard deviation. The probability density function of the marginal posterior may then be recovered using a Lagrange polynomial or spline interpolant to the log probabilities. An important downside of the Laplace approach is that posterior dependences between posterior marginal draws are not preserved, unlike in the mixture of Gaussians case (Equation (6.15)). Recent work using Gaussian copulas (Chiuchiolo, Niekerk, and Rue 2023) aims to retain the accuracy of the Laplace marginals strategy while obtaining a joint approximation. 6.1.3.3 Simplified Laplace marginals When the latent field \\(\\mathbf{x}\\) is a Gauss-Markov random field [GMRF; Håvard Rue and Held (2005)] it is possible to efficiently approximate the Laplace marginals in Section 6.1.3.2. The simplified approximation is achieved by a Taylor expansion of the numerator and denominator of Equation (6.17) up to third order.
The approach is analogous to correcting the Gaussian approximation in Section 6.1.3.1 for location and skewness. Details are left to Section 3.2.3 of Håvard Rue, Martino, and Chopin (2009). 6.1.3.4 Simplified INLA Wood (2020) describes a method for approximating the Laplace marginals without depending on the Markov structure, while still achieving equivalent efficiency. This work was motivated by a setting in which, similar to extended latent Gaussian models [ELGMs; Stringer, Brown, and Stafford (2022)], precision matrices are not typically as sparse as GMRFs. Details are left to Wood (2020). 6.1.3.5 Augmenting a noisy structured additive predictor to the latent field Discussion of INLA is concluded by briefly mentioning a difference in implementation between Håvard Rue, Martino, and Chopin (2009) and Stringer, Brown, and Stafford (2022). Specifically, Håvard Rue, Martino, and Chopin (2009) augment the latent field to include a noisy structured additive predictor as follows \\[\\begin{align} \\boldsymbol{\\mathbf{\\eta}}^\\star &= \\boldsymbol{\\mathbf{\\eta}} + \\boldsymbol{\\mathbf{\\epsilon}}, \\\\ \\boldsymbol{\\mathbf{\\epsilon}} &\\sim \\mathcal{N}(\\mathbf{0}, \\tau^{-1} \\mathbf{I}_n), \\\\ \\mathbf{x}^\\star &= (\\boldsymbol{\\mathbf{\\eta}}^\\star, \\mathbf{x}). \\end{align}\\] Stringer, Brown, and Stafford (2022) (Section 3.2) omit this augmentation, highlighting several drawbacks of it, including for fitting ELGMs, fitting LGMs to large datasets, and theoretical study of the approximation error. Similarly, in what Van Niekerk et al. (2023) (Section 2.1) refer to as the “modern” formulation of INLA, the latent field is not augmented. The crux of the issue regards the dimensions and sparsity structure of the Hessian matrix \\(\\hat {\\mathbf{H}}(\\boldsymbol{\\mathbf{\\theta}})\\). Details are left to Stringer, Brown, and Stafford (2022). Based on these findings, this thesis does not augment the latent field.
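The node-based representation of a Laplace marginal described in Section 6.1.3.2 can be illustrated with a minimal base R sketch. It assumes a stand-in unnormalised log-density log_f (hypothetical, used here in place of a Laplace approximation), adapts Gauss-Hermite nodes using a supplied mode and standard deviation, fits a spline to the log evaluations, and normalises numerically. The function names gh_nodes and interpolate_marginal are illustrative and are not part of aghq or TMB, and the use of the probabilists' node convention is an assumption.

```r
# Probabilists' Gauss-Hermite nodes via the Golub-Welsch eigenvalue method
gh_nodes <- function(l) {
  J <- matrix(0, l, l)
  idx <- seq_len(l - 1)
  J[cbind(idx, idx + 1)] <- sqrt(idx)
  J[cbind(idx + 1, idx)] <- sqrt(idx)
  sort(eigen(J, symmetric = TRUE)$values)
}

# Evaluate log_f at l adapted nodes, interpolate on the log scale, normalise
interpolate_marginal <- function(log_f, mode, sd, l = 5) {
  x <- mode + sd * gh_nodes(l)                # adapt nodes z to x(z)
  log_y <- vapply(x, log_f, numeric(1))
  # Subtract the maximum for numerical stability before exponentiating
  log_spline <- splinefun(x, log_y - max(log_y), method = "fmm")
  unnorm <- function(x) exp(log_spline(x))
  # Assumption: mass beyond 8 standard deviations is negligible, and the
  # spline extrapolates sensibly (may fail for strongly skewed targets)
  Z <- integrate(unnorm, mode - 8 * sd, mode + 8 * sd)$value
  function(x) unnorm(x) / Z
}

# Usage with a Gaussian stand-in, known only up to an additive constant
dens <- interpolate_marginal(
  log_f = function(x) dnorm(x, mean = 2, sd = 0.5, log = TRUE) + 3,
  mode = 2, sd = 0.5
)
```

For a Gaussian stand-in the log-density is quadratic, so the interpolant should recover the density essentially exactly. For a skewed target the accuracy depends on the number of nodes and on the tail behaviour of the spline.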
6.1.4 Software 6.1.4.1 R-INLA The R-INLA software (Martins et al. 2013) implements the INLA method, as well as the stochastic partial differential equation (SPDE) approach of Lindgren, Rue, and Lindström (2011). R-INLA is the R interface to the core inla program, which is written in C (Martino and Rue 2009). Algorithms for sampling from GMRFs are used from the GMRFLib C library (Håvard Rue and Follestad 2001). First and second derivatives are either hard coded, or computed numerically using central finite differences (Fattah, Niekerk, and Rue 2022). For a review of recent computational features of R-INLA, including parallelism via OpenMP (Diaz et al. 2018) and use of the PARDISO sparse linear equation solver (Bollhöfer et al. 2020), see Gaedke-Merzhäuser et al. (2023). Further information about R-INLA, including recent developments, can be found at https://r-inla.org. The connection between the latent field \\(\\mathbf{x}\\) and structured additive predictor \\(\\boldsymbol{\\mathbf{\\eta}}\\) is specified in R-INLA using a formula interface of the form y ~ .... The interface is similar to that used in the lm function in the core stats R package. For example, a model with one fixed effect a and one IID random effect b, has the formula y ~ a + f(b, model = \"iid\"). This interface is easy to engage with for new users, but can be limiting for more advanced users. The approach used to compute the marginals \\(\\tilde p(x_i \\, | \\, \\mathbf{y})\\) can be chosen by setting strategy to \"gaussian\" (Section 6.1.3.1), \"laplace\" (Section 6.1.3.2) or \"simplified.laplace\" (Section 6.1.3.3). The quadrature grid used can be chosen by setting int.strategy to \"eb\" (empirical Bayes, one quadrature node), \"grid\" (a dense grid), or \"ccd\" [Box-Wilson central composite design; Box and Wilson (1992)]. Figure 6.4 demonstrates the latter two integration strategies. By default, the \"grid\" strategy is used for \\(m \\leq 2\\) and the \"ccd\" strategy is used for \\(m > 2\\).
Various software packages have been built using R-INLA. Perhaps the most substantial is the inlabru R package (Bachl et al. 2019). As well as a simplified syntax, inlabru provides capabilities for fitting more general non-linear structured additive predictor expressions via linearisation and repeat use of R-INLA. These complex model components are specified in inlabru using the bru_mapper system. See the inlabru package vignettes for additional details. Further inference procedures which leverage R-INLA include INLA within MCMC (Gómez-Rubio and Rue 2018) and importance sampling with INLA (Berild et al. 2022). Figure 6.4: Consider the function \\(f(z_1, z_2) = \\text{sn}(0.5 z_1, \\alpha = 2) \\cdot \\text{sn}(0.8 z_1 - 0.5 z_2, \\alpha = -2)\\) as described in Figure 6.3. Panel A shows the grid method as used in R-INLA and detailed in Section 3.1 of Håvard Rue, Martino, and Chopin (2009). Briefly, equally-weighted quadrature points are generated by starting at the mode and taking steps of size \\(\\delta_z\\) along each eigenvector of the inverse curvature at the mode, scaled by the eigenvalues, until the difference in log-scale function evaluations (compared to the mode) is below a threshold \\(\\delta_\\pi\\). Intermediate values are included if they have sufficient log-scale function evaluation. Here, I set \\(\\delta_z = 0.75\\) and \\(\\delta_\\pi = 2\\). Panel B shows a CCD as used in R-INLA and detailed in Section 6.5 of Håvard Rue, Martino, and Chopin (2009). The CCD was generated using the rsm R package (Lenth 2009), and is comprised of: one centre point; four factorial points, used to help estimate linear effects; and four star points, used to help estimate the curvature. 6.1.4.2 TMB Template Model Builder [TMB; Kristensen et al. (2016)] is an R package which implements the Laplace approximation. In TMB, derivatives are obtained using automatic differentiation, also known as algorithmic differentiation [AD; Baydin et al. (2017)]. 
The approach of AD is to decompose any function into a sequence of elementary operations with known derivatives. The known derivatives of the elementary operations may then be composed by repeat use of the chain rule to obtain the function’s derivative. A review of AD and how it can be efficiently implemented is provided by C. C. Margossian (2019). TMB uses the C++ package CppAD (Bell 2023) for AD [Section 3; Kristensen et al. (2016)]. The development of TMB was strongly inspired by the Automatic Differentiation Model Builder [ADMB; Fournier et al. (2012); Bolker et al. (2013)] project. An algorithm is used in TMB to automatically determine matrix sparsity structure [Section 4.2; Kristensen et al. (2016)]. The R package Matrix and C++ package Eigen are then used for sparse and dense matrix calculations. Kristensen et al. (2016) highlight the modular design philosophy of TMB. Models are specified in TMB using a C++ template file which evaluates \\(\\log p(\\mathbf{y}, \\mathbf{x}, \\boldsymbol{\\mathbf{\\theta}})\\) in a Bayesian context or \\(\\log p(\\mathbf{y} \\, | \\, \\mathbf{x}, \\boldsymbol{\\mathbf{\\theta}})\\) in a frequentist setting. Other software packages have been developed which also use TMB C++ templates. The tmbstan R package (Monnahan and Kristensen 2018) allows running the Hamiltonian Monte Carlo (HMC) algorithm via Stan. The aghq R package (Stringer 2021) allows use of AGHQ, and AGHQ over the marginal Laplace approximation, via the mvQuad R package (Weiser 2016). The glmmTMB R package (Brooks et al. 2017) allows specification of common GLMM models via a formula interface. It is also possible to extract the TMB objective function used by glmmTMB, which may then be passed into aghq or tmbstan. A review of the use of TMB for spatial modelling, including comparison to R-INLA, is provided by Osgood-Zimmerman and Wakefield (2023). 
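The decomposition into elementary operations that underlies AD, described earlier in this section, can be illustrated with a minimal base R sketch using dual numbers. This is a forward-mode illustration only, written for exposition: it is not how CppAD or TMB are implemented, and the helper names (dual, d_add, d_mul, d_exp, d_log) are hypothetical.

```r
# A dual number carries a value and the derivative of that value
dual <- function(value, deriv = 0) list(value = value, deriv = deriv)

# Elementary operations, each paired with its known derivative rule
d_add <- function(a, b) dual(a$value + b$value, a$deriv + b$deriv)
d_mul <- function(a, b) {
  dual(a$value * b$value, a$deriv * b$value + a$value * b$deriv)
}
d_exp <- function(a) dual(exp(a$value), exp(a$value) * a$deriv)
d_log <- function(a) dual(log(a$value), a$deriv / a$value)

# f(x) = x * exp(x) + log(x), written as a sequence of elementary operations
f <- function(x) d_add(d_mul(x, d_exp(x)), d_log(x))

# Seeding deriv = 1 propagates df/dx through the chain rule
out <- f(dual(2, deriv = 1))

# Analytic derivative for comparison: exp(x) + x * exp(x) + 1 / x
analytic <- exp(2) + 2 * exp(2) + 1 / 2
all.equal(out$deriv, analytic)
```

The derivative is obtained without symbolic manipulation or finite differences, which is the property that makes AD attractive for computing the gradients and Hessians required by the Laplace approximation.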
6.1.4.3 Other software The mgcv [Mixed GAM computation vehicle; Wood (2017)] R package estimates generalised additive models (GAMs) specified using a formula interface. This package is briefly mentioned so as to note that the function mgcv::ginla implements the simplified INLA approach of Wood (2020) (Section 6.1.3.4). 6.2 A universal INLA implementation This section is about implementation of the INLA method using AD via the TMB package. Both the Gaussian and Laplace latent field marginal approximations are implemented. The implementation is universal in that it is compatible with any model with a TMB C++ template, rather than based on a restrictive formula interface. The TMB probabilistic programming language is described as “universal” in that it is an extension of the Turing-complete general purpose language C++. Martino and Riebler (2020) note that “implementing INLA from scratch is a complex task” and as a result “applications of INLA are limited to the (large class of) models implemented [in R-INLA]”. A universal INLA implementation facilitates application of the method to models which are not compatible with R-INLA. The Naomi model is one among many examples. Section 5 of Osgood-Zimmerman and Wakefield (2023) notes that “R-INLA is capable of using higher-quality approximations than TMB” (hyperparameter integration and latent field Laplace marginals) and “in return TMB is applicable to a wider class of models”. Yet there is no inherent reason for these capabilities to be in conflict: it is possible to have both high-quality approximations and flexibility. The potential benefits of a more flexible INLA implementation based on AD were noted by Skaug (2009) (a coauthor of TMB) in discussion of Håvard Rue, Martino, and Chopin (2009), who noted that such a system would be “fast, flexible, and easy-to-use”, as well as “automatic from a user’s perspective”. As this suggestion was made close to 15 years ago, it is surprising that its potential remains unrealised. 
I demonstrate the universal implementation with two examples: Section 6.2.1 considers a generalised linear mixed model (GLMM) of an epilepsy drug. The model was used in Section 5.2 of Håvard Rue, Martino, and Chopin (2009), and is compatible with R-INLA. For some parameters there is a notable difference in approximation error depending on use of Gaussian or Laplace marginals. This example demonstrates the correspondence between the Laplace marginal implementation developed in TMB, and that of R-INLA with strategy set to \"laplace\". Section 6.2.2 considers an extended latent Gaussian model (ELGM) of a tropical parasitic infection. The model was used in Section 5.2 of Bilodeau, Stringer, and Tang (2022), and is not compatible with R-INLA. This example demonstrates the benefit of a more widely applicable INLA implementation. 6.2.1 Epilepsy GLMM Thall and Vail (1990) considered a GLMM for an epilepsy drug double-blind clinical trial (Leppik et al. 1985). This model was modified by Breslow and Clayton (1993) and widely disseminated as a part of the BUGS [Bayesian inference using Gibbs sampling; D. Spiegelhalter et al. (1996)] manual. Patients \\(i = 1, \\ldots, 59\\) were each assigned either a new drug \\(\\texttt{Trt}_i = 1\\) or a placebo \\(\\texttt{Trt}_i = 0\\). Each patient made four visits to the clinic \\(j = 1, \\ldots, 4\\), and the observations \\(y_{ij}\\) are the number of seizures of the \\(i\\)th person in the two weeks preceding their \\(j\\)th clinic visit (Figure 6.5). The covariates used in the model were baseline seizure counts \\(\\texttt{Base}_i\\), treatment \\(\\texttt{Trt}_i\\), age \\(\\texttt{Age}_i\\), and an indicator for the final clinic visit \\({\\texttt{V}_4}_j\\). Each of the covariates was centred.
The observations were modelled using a Poisson distribution \\[\\begin{equation} y_{ij} \\sim \\text{Poisson}(e^{\\eta_{ij}}), \\end{equation}\\] with structured additive predictor \\[\\begin{align} \\eta_{ij} &= \\beta_0 + \\beta_\\texttt{Base} \\log(\\texttt{Base}_i / 4) + \\beta_\\texttt{Trt} \\texttt{Trt}_i + \\beta_{\\texttt{Trt} \\times \\texttt{Base}} (\\texttt{Trt}_i \\times \\log(\\texttt{Base}_i / 4)) \\\\ &+ \\beta_\\texttt{Age} \\log(\\texttt{Age}_i) + \\beta_{\\texttt{V}_4} {\\texttt{V}_4}_j + \\epsilon_i + \\nu_{ij}, \\quad i \\in [59], \\quad j \\in [4]. \\tag{6.19} \\end{align}\\] The prior distribution on each of the regression parameters, including the intercept \\(\\beta_0\\), was \\(\\mathcal{N}(0, 100^2)\\). The patient \\(\\epsilon_i \\sim \\mathcal{N}(0, 1/\\tau_\\epsilon)\\) and patient-visit \\(\\nu_{ij} \\sim \\mathcal{N}(0, 1/\\tau_\\nu)\\) random effects were IID with gamma precision prior distributions \\(\\tau_\\epsilon, \\tau_\\nu \\sim \\Gamma(0.001, 0.001)\\). Figure 6.5: The number of seizures in the treatment group was fewer, on average, than the number of seizures in the control group. This is not sufficient to conclude that the treatment was effective. The GLMM accounts for differences between the treatment and control group, including in baseline seizures and age, and so can be used to help estimate a causal treatment effect. Table 6.1: The inference methods and software considered to fit the epilepsy GLMM in Section 6.2.1. 
| Section | Method | Software |
|---|---|---|
| Section 6.2.1.1 | Gaussian latent field marginals, EB over hyperparameters | R-INLA |
| Section 6.2.1.1 | Gaussian latent field marginals, grid over hyperparameters | R-INLA |
| Section 6.2.1.1 | Laplace latent field marginals, EB over hyperparameters | R-INLA |
| Section 6.2.1.1 | Laplace latent field marginals, grid over hyperparameters | R-INLA |
| Section 6.2.1.2 | Gaussian latent field marginals, EB over hyperparameters | TMB |
| Section 6.2.1.3 | Gaussian latent field marginals, AGHQ over hyperparameters | TMB and aghq |
| Section 6.2.1.4 | Laplace latent field marginals, EB over hyperparameters | TMB |
| Section 6.2.1.5 | Laplace latent field marginals, AGHQ over hyperparameters | TMB and aghq |
| Section 6.2.1.6 | NUTS | tmbstan |
| Section 6.2.1.7 | NUTS | rstan |

Inference for the epilepsy GLMM was conducted using a range of approaches (Table 6.1). Section 6.2.1.8 compares the results. The foremost objective of this exercise is to demonstrate correspondence between inferences obtained from R-INLA and those from TMB. Furthermore, illustrative code is used throughout this section to enhance understanding of the methods and software used. As such, this section is more verbose than future sections. 6.2.1.1 INLA with R-INLA The epilepsy data are available from the R-INLA package.
The covariates may be obtained and their transformations centred by: centre <- function(x) (x - mean(x)) Epil <- Epil %>% mutate(CTrt = centre(Trt), ClBase4 = centre(log(Base/4)), CV4 = centre(V4), ClAge = centre(log(Age)), CBT = centre(Trt * log(Base/4))) The structured additive predictor in Equation (6.19) is then specified by: formula <- y ~ 1 + CTrt + ClBase4 + CV4 + ClAge + CBT + f(rand, model = "iid", hyper = tau_prior) + f(Ind, model = "iid", hyper = tau_prior) The object tau_prior specifies the \\(\\Gamma(0.001, 0.001)\\) precision prior: tau_prior <- list(prec = list( prior = "loggamma", param = c(0.001, 0.001), initial = 1, fixed = FALSE) ) The prior is specified as loggamma because R-INLA represents the precision internally on the log scale, to avoid any \\(\\tau > 0\\) constraints. Inference may then be performed, specifying the latent field posterior marginals approach strat and quadrature approach int_strat: beta_prior <- list(mean = 0, prec = 1 / 100^2) epil_inla <- function(strat, int_strat) { inla( formula, control.fixed = beta_prior, family = "poisson", data = Epil, control.inla = list(strategy = strat, int.strategy = int_strat), control.predictor = list(compute = TRUE), control.compute = list(config = TRUE) ) } The object beta_prior specifies the \\(\\mathcal{N}(0, 100^2)\\) regression coefficient prior. The Poisson likelihood is specified via the family argument. Inferences may then be obtained via the fit object: fit <- epil_inla(strat = "gaussian", int_strat = "grid") As described in Section 6.1.4.1, strat may be set to one of \"gaussian\", \"laplace\", or \"simplified.laplace\" and int_strat may be set to one of \"eb\", \"grid\", or \"ccd\". 6.2.1.2 Gaussian marginals and EB with TMB With TMB, the log-posterior of the model is specified using a C++ template. For simple models, writing this template is usually a more involved task than specifying the formula object required for R-INLA.
The TMB C++ template epil.cpp for the epilepsy GLMM is in Appendix C.1.1. This template specifies exactly the same model as R-INLA in Section 6.2.1.1. It is not trivial to do this, because each detail of the model must match. Lines with a DATA prefix specify the fixed data inputs to be passed to TMB. For example, the data \\(\\mathbf{y}\\) are passed via: DATA_VECTOR(y); Lines with a PARAMETER prefix specify the parameters \\(\\boldsymbol{\\mathbf{\\phi}} = (\\mathbf{x}, \\boldsymbol{\\mathbf{\\theta}})\\) to be estimated. For example, the regression coefficients \\(\\boldsymbol{\\mathbf{\\beta}}\\) are specified by: PARAMETER_VECTOR(beta); It is recommended to specify all parameters on the real scale to help performance of the optimisation procedure. More familiar versions of parameters, such as the precision rather than log precision, may be created outside the PARAMETER section. Lines of the form nll -= ddist(...) increment the negative log-posterior, where dist is the name of a distribution. For example, the Gaussian prior distributions on \\(\\boldsymbol{\\mathbf{\\beta}}\\) are implemented by: nll -= dnorm(beta, Type(0), Type(100), true).sum(); In R, the TMB user template may now be compiled and linked: compile("epil.cpp") dyn.load(dynlib("epil")) An objective function obj implementing \\(\\tilde p_{\\texttt{LA}}(\\boldsymbol{\\mathbf{\\theta}}, \\mathbf{y})\\) and its first and second derivatives may then be created: obj <- TMB::MakeADFun( data = dat, parameters = param, random = c("beta", "epsilon", "nu"), DLL = "epil" ) The object dat is a list of data inputs passed to TMB. The object param is a list of parameter starting values passed to TMB. The argument random determines which parameters are to be integrated out with a Gaussian approximation, here set to c(\"beta\", \"epsilon\", \"nu\"). 
Mathematically, these parameters correspond to the latent field \\[\\begin{equation} (\\beta_0, \\beta_\\texttt{Base}, \\beta_\\texttt{Trt}, \\beta_{\\texttt{Trt} \\times \\texttt{Base}}, \\beta_\\texttt{Age}, \\beta_{\\texttt{V}_4}, \\epsilon_1, \\ldots, \\epsilon_{59}, \\nu_{1,1}, \\ldots, \\nu_{59,4}) = (\\boldsymbol{\\mathbf{\\beta}}, \\boldsymbol{\\mathbf{\\epsilon}}, \\boldsymbol{\\mathbf{\\nu}}) = \\mathbf{x}. \\end{equation}\\] The objective function obj may then be optimised using a gradient based optimiser to obtain \\(\\hat{\\boldsymbol{\\mathbf{\\theta}}}_\\texttt{LA}\\). Here I use a quasi-Newton method (Dennis Jr, Gay, and Walsh 1981) as implemented by nlminb from the stats R package, making use of the first derivative obj$gr of the objective function: opt <- nlminb( start = obj$par, objective = obj$fn, gradient = obj$gr, control = list(iter.max = 1000, trace = 0) ) The sdreport function is used to evaluate the Hessian matrix of the parameters at a particular value. Typically, these Hessian matrices are for the hyperparameters, and based on the marginal Laplace approximation. Setting par.fixed to the previously obtained opt$par returns \\(\\hat{\\boldsymbol{\\mathbf{H}}}_\\texttt{LA}\\). However, by setting getJointPrecision = TRUE the full Hessian matrix for the hyperparameters and latent field together is returned: sd_out <- TMB::sdreport( obj, par.fixed = opt$par, getJointPrecision = TRUE ) Figure 6.6: A submatrix of the full parameter Hessian obtained from TMB::sdreport with getJointPrecision = TRUE on the log scale. Entries for the latent field parameters \\(\\boldsymbol{\\mathbf{\\epsilon}}\\) and \\(\\boldsymbol{\\mathbf{\\nu}}\\) are omitted due to their respective lengths of 59 and 236. Light grey entries correspond to zeros on the real scale, which cannot be log transformed.
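The two-level structure used in this section (an inner optimisation over the latent field to form the Laplace approximation, and an outer optimisation over the hyperparameters with nlminb) can be sketched in base R for a toy model. This is an illustration under stated assumptions, not the TMB implementation: the toy model has a single Gaussian latent variable with Gaussian observations (so the Laplace approximation happens to be exact), the Hessian is obtained by finite differences rather than AD, and the names log_joint and log_laplace are hypothetical.

```r
# Toy joint log-density log p(y, x, theta): y_i ~ N(x, 1), x ~ N(0, 1 / tau),
# with theta = log(tau) on the real scale and a flat prior on theta
log_joint <- function(x, theta, y) {
  tau <- exp(theta)
  sum(dnorm(y, mean = x, sd = 1, log = TRUE)) +
    dnorm(x, mean = 0, sd = 1 / sqrt(tau), log = TRUE)
}

# Marginal Laplace approximation log p_LA(theta, y) for scalar x (N = 1)
log_laplace <- function(theta, y) {
  neg <- function(x) -log_joint(x, theta, y)
  inner <- nlminb(start = 0, objective = neg)   # inner optimisation for x_hat
  x_hat <- inner$par
  h <- 1e-3
  # Central finite-difference Hessian (TMB would instead use AD)
  H <- (neg(x_hat + h) - 2 * neg(x_hat) + neg(x_hat - h)) / h^2
  -inner$objective + 0.5 * log(2 * pi) - 0.5 * log(H)
}

y <- c(1.2, 0.8, 2.1, 1.6, 1.8)

# Outer optimisation over the hyperparameter, analogous to optimising obj$fn
opt <- nlminb(start = 0, objective = function(theta) -log_laplace(theta, y))
tau_hat <- exp(opt$par)
```

In TMB the inner optimisation, the Hessian, and the gradient with respect to the hyperparameters are all handled automatically once random is specified in MakeADFun, which is what makes the approach practical for latent fields with hundreds of entries.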
Note that the epilepsy GLMM may also be succinctly fit in a frequentist setting (that is, using improper hyperparameter priors \\(p(\\boldsymbol{\\mathbf{\\theta}}) \\propto 1\\)) using the formula interface provided by glmmTMB: fit <- glmmTMB( y ~ 1 + CTrt + ClBase4 + CV4 + ClAge + CBT + (1 | rand) + (1 | Ind), data = Epil, family = poisson(link = "log") ) 6.2.1.3 Gaussian marginals and AGHQ with TMB The objective function obj created in Section 6.2.1.2 may be directly passed to aghq to perform inference by integrating the marginal Laplace approximation over the hyperparameters using AGHQ. The argument k specifies the number of quadrature nodes to be used per hyperparameter dimension. Here there are two hyperparameters \\(\\boldsymbol{\\mathbf{\\theta}} = (\\tau_\\epsilon, \\tau_\\nu)\\), and k is set to three, such that in total there are \\(3^2 = 9\\) quadrature nodes: init <- c(param$l_tau_epsilon, param$l_tau_nu) fit <- aghq::marginal_laplace_tmb(obj, k = 3, startingvalue = init) Draws from the mixture of Gaussians approximating the latent field posterior distribution (Equation (6.15)) can be obtained by: samples <- aghq::sample_marginal(fit, M = 1000)$samps For a more complete aghq vignette, see Stringer (2021). 6.2.1.4 Laplace marginals and EB with TMB The Laplace latent field marginal \\(\\tilde p_\\texttt{LA}(x_i, \\boldsymbol{\\mathbf{\\theta}}, \\mathbf{y})\\) may be obtained using TMB by setting random to \\(\\mathbf{x}_{-i}\\) in the MakeADFun function call to approximate \\(p(\\mathbf{x}_{-i} \\, | \\,x_i, \\boldsymbol{\\mathbf{\\theta}}, \\mathbf{y})\\) with a Gaussian distribution. However, it is not directly possible to do this, because the random argument takes a vector of strings as input (e.g. c(\"beta\", \"epsilon\", \"nu\")) and does not have a native method for indexing.
Instead, I took the following steps to modify the TMB C++ template and enable the desired indexing: Include DATA_INTEGER(i) to pass the index \\(i\\) to TMB via the data argument of MakeADFun. Concatenate the latent field to PARAMETER_VECTOR(x_minus_i) and PARAMETER(x_i) such that random can be set to x_minus_i in the call to MakeADFun. Include DATA_IVECTOR(x_lengths) and DATA_IVECTOR(x_starts) to pass the (integer) start points and lengths of each subvector of \\(\\mathbf{x}\\) via the data argument of MakeADFun. The \\(j\\)th subvector may then be obtained from within the TMB template via x.segment(x_starts(j), x_lengths(j)). The modified TMB C++ template epil_modified.cpp for the epilepsy GLMM is in Appendix C.1.2, and may be compared to the unmodified version to provide an example of implementing the above steps. After suitable alterations are made to dat and param, it is then possible to obtain the desired objective function in TMB via: compile("epil_modified.cpp") dyn.load(dynlib("epil_modified")) obj_i <- MakeADFun( data = dat, parameters = param, random = "x_minus_i", DLL = "epil_modified", silent = TRUE ) This section takes an EB approach, fixing the hyperparameters to their modal value \\(\\boldsymbol{\\mathbf{\\theta}} = \\hat{\\boldsymbol{\\mathbf{\\theta}}}_\\texttt{LA}\\) obtained previously in opt. The latent field marginals approximation is then directly proportional to the unnormalised Laplace approximation obtained above as obj_i, evaluated at \\((x_i, \\hat{\\boldsymbol{\\mathbf{\\theta}}}_\\texttt{LA})\\) \\[\\begin{align} \\tilde p(x_i \\, | \\, \\mathbf{y}) &\\approx \\tilde p_\\texttt{LA}(x_i \\, | \\, \\hat{\\boldsymbol{\\mathbf{\\theta}}}_\\texttt{LA}, \\mathbf{y}) \\tilde p_\\texttt{LA}(\\hat{\\boldsymbol{\\mathbf{\\theta}}}_\\texttt{LA} \\, | \\, \\mathbf{y}) \\\\ &\\propto \\tilde p_\\texttt{LA}(x_i, \\hat{\\boldsymbol{\\mathbf{\\theta}}}_\\texttt{LA}, \\mathbf{y}).
\\end{align}\\] This expression may be evaluated at a set of GHQ nodes \\(z \\in \\mathcal{Q}(1, l)\\) adapted \\(z \\mapsto x_i(z)\\) based on the mode and standard deviation of the Gaussian marginal. Here, \\(l = 5\\) quadrature nodes were chosen to allow spline interpolation of the resulting log-posterior. Each evaluation of obj_i, which involves an inner optimisation loop to compute the Laplace approximation, can be initialised by \\(\\mathbf{x}_{-i}\\) set to the mode of the full \\(N\\)-dimensional Gaussian approximation \\(p_\\texttt{G}(\\mathbf{x} \\, | \\, \\hat{\\boldsymbol{\\mathbf{\\theta}}}_\\texttt{LA}, \\mathbf{y})\\) with the \\(i\\)th entry removed \\(\\hat{\\mathbf{x}}(\\boldsymbol{\\mathbf{\\theta}})_{-i}\\). This is an efficient approach because the \\((N - 1)\\)-dimensional posterior mode, with \\(x_i\\) fixed, is likely to be similar to the \\(N\\)-dimensional posterior mode with the \\(i\\)th entry removed. A normalised posterior can be obtained by computing a de novo posterior normalising constant based on the set of evaluated \\(l\\) quadrature nodes. This approach requires creation of the objective function obj_i for \\(i = 1, \\ldots, N\\). Each of these functions is then evaluated at a set of \\(l\\) quadrature nodes. It is inefficient to run MakeADFun from scratch for each \\(i\\), when only one data input i is changing. TMB does have a DATA_UPDATE macro, which would allow changing of data “on the R side” without retaping via: obj_i$env$data$i <- i Although this approach would be more efficient, if-else statements on data items which can be updated (as used in epil_modified.cpp) are not supported, so this is not yet possible. 6.2.1.5 Laplace marginals and AGHQ with TMB The approach taken in Section 6.2.1.4 may be extended by integrating the marginal Laplace approximation with respect to the hyperparameters.
To perform this integration, the quadrature nodes used to integrate \\(p_\\texttt{LA}({\\boldsymbol{\\mathbf{\\theta}}}, \\mathbf{y})\\) may be reused. The latent field marginal approximation is then \\[\\begin{equation} \\tilde p(x_i \\, | \\, \\mathbf{y}) \\propto \\sum_{\\mathbf{z} \\in \\mathcal{Q}(m, k)} \\tilde p_\\texttt{LA}(x_i, \\boldsymbol{\\mathbf{\\theta}}(\\mathbf{z}), \\mathbf{y}) \\omega(\\mathbf{z}). \\end{equation}\\] As in Section 6.2.1.4 this expression may be evaluated at a set of \\(l\\) quadrature nodes, and normalised de novo. Each objective function inner optimisation can be initialised using the mode \\(\\hat{\\mathbf{x}}(\\boldsymbol{\\mathbf{\\theta}}(\\mathbf{z}))_{-i}\\) of \\(p_\\texttt{G}(\\mathbf{x} \\, | \\, \\boldsymbol{\\mathbf{\\theta}}(\\mathbf{z}), \\mathbf{y})\\). Integration over the hyperparameters requires each of the \\(N\\) objective functions to be evaluated at \\(k^m \\times l\\) points (here \\(3^2 \\times 5 = 45\\)), rather than the \\(1 \\times l\\) points required in the EB approach. The complete algorithm is given in Appendix C.3. 6.2.1.6 NUTS with tmbstan Running NUTS with tmbstan using the objective function obj is easy to do: fit <- tmbstan::tmbstan(obj = obj, chains = 4, laplace = FALSE) As specified above, the objective function with no marginal Laplace approximation is used. To instead use the marginal Laplace approximation, set laplace = TRUE. Four chains of 2000 iterations, with the first 1000 iterations from each chain discarded as warm-up, were run. Convergence diagnostics are in Appendix C.1.4.1. 6.2.1.7 NUTS with rstan To assess the efficiency of tmbstan relative to rstan, the epilepsy model was also implemented directly in Stan. The Stan program epil.stan for the epilepsy GLMM is in Appendix C.1.3. This may be of interest to users familiar with Stan syntax, to help provide context for TMB. The Stan program was validated to be equivalent to the TMB template up to a constant of proportionality.
Inferences from Stan may be obtained by fit <- rstan::stan(file = "epil.stan", data = dat, chains = 4) As for tmbstan, four chains of 2000 iterations, with 1000 iterations of burn-in, were run. Convergence diagnostics are in Appendix C.1.4.2. 6.2.1.8 Comparison Figure 6.7: Percentage difference in posterior summary estimate obtained from NUTS as compared to that obtained from a Gaussian (Section 6.2.1.3) or Laplace marginal (Section 6.2.1.5) with AGHQ over the hyperparameters. NUTS results were obtained with tmbstan. Results from R-INLA and TMB are similar, especially for the posterior mean, but do differ in places. Differences could be attributable to bias corrections used in R-INLA. Figure 6.8: The ECDF and ECDF difference for the \\(\\beta_0\\) latent field parameter. For this parameter, the Gaussian marginal results are inaccurate, and are corrected almost entirely by the Laplace marginal. An ECDF difference of zero corresponds to obtaining exactly the same results as NUTS, taken to be the gold-standard. Crucially, results obtained using R-INLA and TMB implementations are similar. Posterior means and standard deviations for the six regression parameters \\(\\boldsymbol{\\mathbf{\\beta}}\\) from the inference methods implemented in TMB (Sections 6.2.1.2, 6.2.1.3, 6.2.1.4, and 6.2.1.5) were highly similar to their R-INLA analogues in Section 6.2.1.1 (Figure 6.7). Posterior distributions obtained were also similar. Figure 6.8 shows ECDF difference plots for Gaussian or Laplace marginals from TMB and R-INLA (as compared with results from NUTS implemented in tmbstan) for \\(\\beta_0\\). These results provide evidence that the implementation of INLA in TMB is correct. Figure 6.9: The number of seconds taken to perform inference for the epilepsy GLMM using each method and software implementation given in Table 6.1. Figure 6.9 shows the number of seconds taken to fit the epilepsy GLMM for each approach.
Gaussian marginals with either EB or AGHQ via TMB were the fastest approach. All of the approaches using R-INLA took a similar amount of time. The approaches using TMB to implement Laplace marginals were slower than their equivalent in R-INLA. The TMB implementation is relatively naive, based on a simple for loop, and does not use the more advanced approximations of R-INLA. Laplace marginals in TMB with AGHQ (\\(k^2 = 3^2 = 9\\) quadrature nodes) took 3.4 times as long as Laplace marginals in TMB with EB (\\(k^2 = 1^2 = 1\\) quadrature node). For this problem, the tmbstan implementation of NUTS took 38.9% of the time taken by the rstan implementation. Diagnostics (Figures C.1 and C.2) show that both implementations converged. Monnahan and Kristensen (2018) (Supporting information) found runtime with rstan and tmbstan to be comparable, so the relatively large difference in this case is surprising. 6.2.2 Loa loa ELGM Figure 6.10: Empirical prevalence of Loa loa in 190 sampled villages in Cameroon and Nigeria. The map in Panel A shows the village locations, empirical prevalences, presence of zeros, and sample sizes. The zeros are typically located in close proximity to each other. The histogram in Panel B shows the empirical prevalences, and high number of zeros. Bilodeau, Stringer, and Tang (2022) considered an ELGM for the prevalence of the parasitic worm Loa loa. Counts of cases \\(y_i \\in \\mathbb{N}^{+}\\) from a sample of size \\(n_i \\in \\mathbb{N}^{+}\\) were obtained from field studies in \\(n = 190\\) villages in Cameroon and Nigeria [Schlüter et al. (2016); Figure 6.10]. Some areas are thought to be unsuitable for disease transmission, and possibly as a result there is a relatively high number of villages with zero prevalence.
To account for the possibility of structural zeros, following Diggle and Giorgi (2016), a zero-inflated binomial likelihood was used \\[\\begin{equation} p(y_i) = (1 - \\phi(s_i)) \\mathbb{I}(y_i = 0) + \\phi(s_i) \\text{Bin}(y_i \\, | \\, n_i, \\rho(s_i)) \\tag{6.20} \\end{equation}\\] where \\(s_i \\in \\mathbb{R}^2\\) is the village location, \\(\\phi(s_i) \\in [0, 1]\\) is the suitability probability, and \\(\\rho(s_i) \\in [0, 1]\\) is the disease prevalence. The prevalence and suitability were modelled jointly using logistic regressions \\[\\begin{align} \\text{logit}[\\phi(s)] &= \\beta_\\phi + u(s), \\\\ \\text{logit}[\\rho(s)] &= \\beta_\\rho + v(s). \\end{align}\\] The two regression coefficients \\(\\beta_\\phi\\) and \\(\\beta_\\rho\\) were given diffuse Gaussian prior distributions \\[\\begin{equation} \\beta_\\phi, \\beta_\\rho \\sim \\mathcal{N}(0, 1000). \\end{equation}\\] Independent Gaussian processes \\(u(s)\\) and \\(v(s)\\) were specified by a Matérn kernel (Stein 1999) with shared hyperparameters. Gamma penalised complexity (Simpson et al. 2017; Fuglstad et al. 2019) prior distributions were used for the standard deviation \\(\\sigma\\) and range \\(\\rho\\) hyperparameters such that (Brown 2015) \\[\\begin{align} \\mathbb{P}(\\sigma < 4) &= 0.975, \\\\ \\mathbb{P}(\\rho < 200\\text{km}) &= 0.975. \\end{align}\\] The smoothness parameter \\(\\nu\\) was fixed to 1. The zero-inflated likelihood in Equation (6.20) is not compatible with R-INLA. Section 2.2 of Brown (2015) demonstrates use of R-INLA to fit a simpler LGM model which includes covariates. Instead, Bilodeau, Stringer, and Tang (2022) implemented this model in TMB. Inference was then performed using Gaussian marginals and AGHQ via aghq and NUTS via tmbstan. This section considers inference using three approaches (Table 6.2), extending Bilodeau, Stringer, and Tang (2022) by including AGHQ with Laplace marginals. 
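As a rough illustration of the zero-inflated binomial likelihood in Equation (6.20), the following R sketch evaluates the probability mass function for a single village. The function name `dzibinom` and the example values are hypothetical: they are chosen for illustration and are not taken from the TMB template of Bilodeau, Stringer, and Tang (2022).

```r
# Zero-inflated binomial mass function following Equation (6.20).
# y: observed count, n: sample size,
# phi: suitability probability, rho: disease prevalence.
# The name `dzibinom` is illustrative only.
dzibinom <- function(y, n, phi, rho) {
  (1 - phi) * (y == 0) + phi * dbinom(y, size = n, prob = rho)
}

# Hypothetical village: 50 people sampled, suitability 0.7, prevalence 0.1
p <- dzibinom(0:50, n = 50, phi = 0.7, rho = 0.1)

sum(p)  # the mass function sums to one over 0, ..., n
p[1]    # P(y = 0): structural zeros plus binomial sampling zeros
```

The probability of a zero count combines two sources: the village being unsuitable for transmission, with probability \\(1 - \\phi(s_i)\\), and a suitable village in which no cases happen to be sampled.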
Table 6.2: The inference methods and software considered to fit the Loa loa ELGM in Section 6.2.2. Method Software Details Gaussian, AGHQ TMB and aghq \\(k = 3\\) Laplace, AGHQ TMB and aghq \\(k = 3\\) NUTS tmbstan 4 chains of 5000 iterations, with default NUTS settings as implemented in rstan (Carpenter et al. 2017) Bilodeau, Stringer, and Tang (2022) found that NUTS did not converge for the full model, but did converge when the values of \\(\\beta_\\phi\\) and \\(\\beta_\\rho\\) were fixed at their posterior mode (obtained using AGHQ with Gaussian marginals). To allow for comparison between Gaussian and Laplace marginals, the same approach was taken here. After obtaining posterior inferences at each \\(s_i\\), the gstat::krige function (E. J. Pebesma 2004) was used to implement conditional Gaussian field simulation [E. Pebesma and Bivand (2023); Chapter 12] over a fine spatial grid. Independent latent field and hyperparameter samples were used in each conditional simulation. For each method (Table 6.2) 500 conditional Gaussian field simulations were obtained. 6.2.2.1 Results Figure 6.11: Posterior mean of the suitability \\(\\mathbb{E}[\\phi_\\texttt{LA}(s)]\\) (Panel A) and prevalence \\(\\mathbb{E}[\\rho_\\texttt{LA}(s)]\\) (Panel B) random fields computed using Laplace marginals. Inferences over this fine spatial grid were obtained using conditional Gaussian field simulation as implemented by gstat::krige. Figure 6.11 shows the suitability and prevalence posterior means across the fine grid obtained using AGHQ with Laplace marginals. Figure 6.12: Difference between the suitability posterior means with Gaussian marginals \\(\\mathbb{E}[\\phi_\\texttt{G}(s)]\\) and Laplace marginals \\(\\mathbb{E}[\\phi_\\texttt{LA}(s)]\\) to NUTS results. While the Gaussian approximation appears to systematically underestimate suitability, results from the Laplace approximation are substantially closer to results from NUTS.
As \\(\\beta_\\phi\\) was fixed, differences in approximation accuracy between the Gaussian and Laplace approximations of \\(\\phi(s)\\) are due only to differences in estimation of \\(u(s)\\). The diverging colour palette used in this figure is from Thyng et al. (2016). Figure 6.13: Difference between the prevalence posterior means with Gaussian marginals \\(\\mathbb{E}[\\rho_\\texttt{G}(s)]\\) and Laplace marginals \\(\\mathbb{E}[\\rho_\\texttt{LA}(s)]\\) to NUTS results. Like the suitability in Figure 6.12, the error of the Gaussian approximation is higher than that of the Laplace approximation. As \\(\\beta_\\rho\\) was fixed, this difference is a result of differences in estimation of \\(v(s)\\). The diverging colour palette used in this figure is from Thyng et al. (2016). Figure 6.14: Absolute difference between the Gaussian and Laplace marginal posterior means and standard deviations to NUTS results at each \\(u(s_i), v(s_i): i \\in [190]\\). Relative differences are in Figure C.4. For close to every node, the Laplace approximation produced a more accurate posterior mean than the Gaussian approximation. For the posterior standard deviation (SD), the picture was more mixed. Figure 6.15: The element of the latent field with the maximum absolute difference to NUTS for the posterior mean was \\(u_{184}\\). While the Gaussian approximation has substantial error as compared with NUTS, the Laplace approximation is a close match. For both the suitability and prevalence posterior mean, using Laplace marginals rather than Gaussian marginals substantially reduced error compared to NUTS (Figures 6.12 and 6.13). As the hyperparameter posteriors for each approach were the same, differences in Gaussian field simulation results were due to differences in latent field posterior marginals at each of the 190 sites, shown in Figure 6.14. At some sites, the differences in ECDF were substantial (Figure 6.15).
This improvement holds even though draws from the Laplace marginals, unlike draws from the mixture of Gaussians used to construct the Gaussian marginals, do not take posterior dependences into account. Figure C.3 shows that the results from NUTS were suitable for use, and therefore that this comparison is valid. Figure 6.16: The number of minutes taken to perform inference for the Loa loa ELGM using each approach given in Table 6.2. Laplace marginals with AGHQ took 12% of the time taken by NUTS (23.1 hours) (Figure 6.16). That said, Gaussian marginals with AGHQ took less than a minute to run: substantially less than the 2.77 hours taken by the Laplace marginals. A less naive Laplace implementation may achieve a runtime more competitive with the Gaussian. 6.3 The Naomi model The work in this chapter was conducted in search of a fast and accurate Bayesian inference method for the Naomi model (Eaton et al. 2021). Software has been developed for Naomi to allow countries to input their data and interactively generate estimates during week-long workshops as a part of a yearly process supported by UNAIDS. Generation of estimates by country teams, rather than external agencies or researchers, is an important and distinctive feature of the HIV response. Drawing on expertise closest to the data being modelled improves the accuracy of the process, as well as strengthening trust in the resulting estimates, creating a virtuous cycle of data quality, use and ownership (Noor 2022). To allow interactive review and iteration of model results by workshop participants, any inference procedure for Naomi should ideally be fast and have low memory usage. Additionally, it should be reliable and automatic, across a range of country settings. Naomi is a complex model, comprised of multiple linked generalized linear mixed models (GLMMs), and as such these requirements present a challenging Bayesian inference problem.
This section begins (Section 6.3.1) by describing a simplified version of Naomi. The model is simplified in that it is defined only at the time of the most recent household survey with HIV testing. The nowcasting and temporal projection components of the complete model are omitted. These time points play a limited role in inference as they correspond to a small proportion of the total data. As such, findings about inference for the simplified model are likely transferable to the complete model. Description of some features of the simplified model is left to the more exhaustive Appendix C.4. After outlining the model, Section 6.3.2 explains why it is an ELGM (Stringer, Brown, and Stafford 2022) rather than an LGM (Håvard Rue, Martino, and Chopin 2009). 6.3.1 Model structure Naomi synthesises data from three sources to estimate HIV indicators at the district level, by age and sex. It may be described as having three components, corresponding to these three data sources. The model components are: the household survey component (Section 6.3.1.2); the antenatal care (ANC) clinic testing component (Section 6.3.1.3); the antiretroviral therapy (ART) attendance component (Section 6.3.1.4). After specifying common notation used throughout the model (Section 6.3.1.1) each of these components is described in turn. 6.3.1.1 Notation Consider a country in sub-Saharan Africa where a household survey with complex design has taken place. Let \\(x \\in \\mathcal{X}\\) index district, \\(a \\in \\mathcal{A}\\) index five-year age group, and \\(s \\in \\mathcal{S}\\) index sex. For ease of notation, let \\(i\\) index the finest district-age-sex division included in the model. (A district-age-sex specific quantity \\(z_{x,a,s}\\) may then be written as \\(z_i\\). When required the district, age, and sex corresponding to the index \\(i\\) may be recovered by \\(x(i) = x\\), \\(a(i) = a\\), and \\(s(i) = s\\).)
Let: \\(N_i \\in \\mathbb{N}\\) be the known, fixed population size; \\(\\rho_i \\in [0, 1]\\) be the HIV prevalence; \\(\\alpha_i \\in [0, 1]\\) be the ART coverage; \\(\\kappa_i \\in [0, 1]\\) be the proportion recently infected among HIV positive persons; \\(\\lambda_i > 0\\) be the annual HIV incidence rate. Some observations are made at an aggregate level over a collection of strata \\(i\\) rather than for a single \\(i\\). Let \\(I \\subseteq \\mathcal{X} \\times \\mathcal{A} \\times \\mathcal{S}\\) be a set of indices \\(i\\) for which an aggregate observation is reported. The set of all \\(I\\) is denoted \\(\\mathcal{I}\\) such that \\(I \\in \\mathcal{I}\\). 6.3.1.2 Household survey component Independent logistic regression models are specified for HIV prevalence and ART coverage in the general population. Without giving the linear predictors in detail, these models are specified by \\[\\begin{equation} \\text{logit}(\\rho_i) = \\eta^\\rho_i, \\tag{6.21} \\end{equation}\\] and \\[\\begin{equation} \\text{logit}(\\alpha_i) = \\eta^\\alpha_i. \\tag{6.22} \\end{equation}\\] HIV incidence rate is modelled on the log scale as \\[\\begin{equation} \\log(\\lambda_i) = \\eta^\\lambda_i. \\tag{6.23} \\end{equation}\\] The structured additive predictor \\(\\eta^\\lambda_i\\) includes terms for adult HIV prevalence and adult ART coverage. The proportion recently infected among HIV positive persons is linked to HIV incidence via \\[\\begin{equation} \\kappa_i = 1- \\exp \\left( - \\lambda_i \\cdot \\frac{1 - \\rho_i}{\\rho_i} \\cdot (\\Omega_T - \\beta_T) - \\beta_T \\right), \\tag{6.24} \\end{equation}\\] where the mean duration of recent infection \\(\\Omega_T\\) and the proportion of long-term HIV infections misclassified as recent \\(\\beta_T\\) are set based on informative priors for the particular HIV test used. The three processes in Equations (6.21), (6.22), and (6.23) are each primarily informed by household survey data. 
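To make the link between incidence and recency in Equation (6.24) concrete, the R sketch below evaluates \\(\\kappa_i\\) for a single stratum. The function name `recent_prop` and all numerical values are hypothetical, chosen only for illustration. In particular the values of \\(\\Omega_T\\) (assumed here to be expressed in years, to match the annual incidence rate) and \\(\\beta_T\\) are assumptions rather than the values used in Naomi.

```r
# Proportion recently infected among HIV positive persons, Equation (6.24).
# lambda: annual HIV incidence rate, rho: HIV prevalence,
# omega_T: mean duration of recent infection (assumed to be in years),
# beta_T: proportion of long-term infections misclassified as recent.
# The name `recent_prop` is illustrative only.
recent_prop <- function(lambda, rho, omega_T, beta_T) {
  1 - exp(-lambda * (1 - rho) / rho * (omega_T - beta_T) - beta_T)
}

# Hypothetical stratum: incidence of 5 per 1000 per year, prevalence 10%
kappa <- recent_prop(lambda = 0.005, rho = 0.1, omega_T = 130 / 365, beta_T = 0)
kappa
```

With these inputs, a higher incidence rate or a lower prevalence both increase the proportion of HIV positive persons who were recently infected, as the ratio \\((1 - \\rho_i) / \\rho_i\\) converts a rate among the susceptible population into a rate relative to those living with HIV.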
Let \\(j\\) denote a surveyed individual, in district-age-sex stratum \\(i(j)\\). Weighted aggregate survey observations are calculated based on individual responses \\(\\theta_j \\in \\{0, 1\\}\\) as \\[\\begin{equation} \\hat \\theta_I = \\frac{\\sum_{i(j) \\in I} w_j \\cdot\\theta_j}{\\sum_{i(j) \\in I} w_j}. \\tag{6.25} \\end{equation}\\] Survey weights \\(w_j\\) for each of \\(\\theta \\in \\{\\rho, \\alpha, \\kappa\\}\\) are supplied by the survey provider. These weights aim to reduce bias by decreasing possible correlation between response and recording mechanism (Meng 2018). The weighted aggregate number of outcomes is obtained by multiplying Equation (6.25) by the Kish effective sample size [ESS; Kish (1965)] \\[\\begin{equation} y^{\\theta}_{I} = m^{\\theta}_{I} \\hat \\theta_{I}, \\tag{6.26} \\end{equation}\\] where \\[\\begin{equation} m^{\\theta}_I = \\frac{\\left(\\sum_{i(j) \\in I} w_j\\right)^2}{\\sum_{i(j) \\in I} w_j^2}. \\tag{6.27} \\end{equation}\\] As the Kish ESS is maximised by constant survey weights, in exchange for reducing bias, survey weighting increases variance. Equations (6.25) and (6.27) are slightly imprecise in that the notation used does not reflect the fact that \\(j\\) only runs over individuals within the relevant denominator. In particular, for ART coverage \\(\\alpha\\) and the proportion recently infected among HIV positive persons \\(\\kappa\\), only those individuals who are HIV positive are included in the set. The denominator for HIV prevalence \\(\\rho\\) includes all individuals. The weighted aggregate number of outcomes is modelled using a binomial working likelihood (Chen, Wakefield, and Lumley 2014) defined to operate on the reals \\[\\begin{equation} y^{\\theta}_{I} \\sim \\text{xBin}(m^{\\theta}_{I}, \\theta_{I}).
\\tag{6.28} \\end{equation}\\] The terms \\(\\theta_{I}\\) are the following weighted aggregates \\[\\begin{equation} \\rho_{I} = \\frac{\\sum_{i \\in I} N_i \\rho_i}{\\sum_{i \\in I} N_i}, \\quad \\alpha_{I} = \\frac{\\sum_{i \\in I} N_i \\rho_i \\alpha_i}{\\sum_{i \\in I} N_i \\rho_i}, \\quad \\kappa_{I} = \\frac{\\sum_{i \\in I} N_i \\rho_i \\kappa_i}{\\sum_{i \\in I} N_i \\rho_i}, \\tag{6.29} \\end{equation}\\] where the denominators of \\(\\alpha_{I}\\) and \\(\\kappa_{I}\\) reflect their restriction to HIV positive persons. 6.3.1.3 ANC testing component Women attending ANC clinics are routinely tested for HIV, to help prevent mother-to-child transmission. HIV prevalence \\(\\rho^\\text{ANC}_i \\in [0, 1]\\) and ART coverage \\(\\alpha^\\text{ANC}_i \\in [0, 1]\\) among pregnant women are modelled as offset from the general population indicators. (For \\(s(i)\\) male, these quantities are not defined.) Again not detailing the linear predictors, the model is of the form \\[\\begin{align} \\text{logit}(\\rho^\\text{ANC}_i) &= \\text{logit}(\\rho_i) + \\eta^{\\rho^\\text{ANC}}_i, \\\\ \\text{logit}(\\alpha^\\text{ANC}_i) &= \\text{logit}(\\alpha_i) + \\eta^{\\alpha^\\text{ANC}}_i. \\end{align}\\] The terms \\(\\eta^{\\rho^\\text{ANC}}_i\\) and \\(\\eta^{\\alpha^\\text{ANC}}_i\\) can be interpreted as the differences in HIV prevalence and ART coverage between pregnant women attending ANC, and the general population. As such, the household survey data inform the ANC indicators, and the ANC indicators in turn inform the general population indicators. These two processes are informed by likelihoods specified for aggregate ANC clinic data from the year of the most recent survey.
Let: the number of ANC clients with ascertained status be fixed as \\(m^{\\rho^\\text{ANC}}_I\\); the number of those with positive status be \\(y^{\\rho^\\text{ANC}}_I \\leq m^{\\rho^\\text{ANC}}_I\\); the number of those already on ART prior to their first ANC visit be \\(y^{\\alpha^\\text{ANC}}_I \\leq y^{\\rho^\\text{ANC}}_I\\). These data are modelled using nested binomial likelihoods \\[\\begin{align*} y^{\\rho^\\text{ANC}}_I &\\sim \\text{Bin}(m^{\\rho^\\text{ANC}}_I, \\rho^\\text{ANC}_{I}), \\\\ y^{\\alpha^\\text{ANC}}_I &\\sim \\text{Bin}(y^{\\rho^\\text{ANC}}_I, \\alpha^\\text{ANC}_{I}). \\end{align*}\\] It is not necessary to use an extended binomial working likelihood, as in Section 3.5, because the ANC data are not survey weighted and therefore are integer valued. Analogous to Equation (6.29) in the household survey component, the weighted aggregates used here are \\[\\begin{equation*} \\rho^\\text{ANC}_{I} = \\frac{\\sum_{i \\in I} \\Psi_i \\rho_i^\\text{ANC}}{\\sum_{i \\in I} \\Psi_i}, \\quad \\alpha^\\text{ANC}_{I} = \\frac{\\sum_{i \\in I} \\Psi_i \\rho_i^\\text{ANC} \\alpha^\\text{ANC}_i}{\\sum_{i \\in I} \\Psi_i \\rho_i^\\text{ANC}}, \\end{equation*}\\] where \\(\\Psi_i\\) are the number of pregnant women, which are assumed to be fixed. 6.3.1.4 ART attendance component Data on attendance of ART clinics are routinely collected. These data provide helpful information about HIV prevalence and coverage of ART, but are challenging to use because people living with HIV sometimes choose to access ART services outside of the district that they reside in. (Indeed, this section of the model remains a challenge, and is under active development (Esra et al. 2024).) Multinomial logistic regression equations are used to model the probabilities of individuals accessing treatment outside their home district. Briefly, let \\(\\gamma_{x, x'}\\) be the probability that a person on ART residing in district \\(x\\) receives ART in district \\(x'\\).
These probabilities are set to \\(\\gamma_{x, x'} = 0\\) unless \\(x = x'\\) or the two districts are neighbouring such that \\(x \\sim x'\\). As such, it is assumed that no one travels beyond their district or its immediate neighbours to receive ART services. (Of course, in reality this assumption is violated.) The log-odds are modelled using a structured additive predictor which only depends on the home district \\(x\\) \\[\\begin{equation} \\tilde \\gamma_{x, x'} = \\text{logit}(\\gamma_{x, x'}) = \\eta_{x}^{\\tilde \\gamma}. \\end{equation}\\] As a result, it is assumed that travel to each neighbouring district, for all age-sex strata, is equally likely. Let the number of people observed receiving ART in stratum \\(i\\) be \\(y^{A}_i\\) with corresponding aggregate \\[\\begin{equation} y^{A}_I = \\sum_{i \\in I} y^{A}_i. \\tag{6.30} \\end{equation}\\] Let the probability of a person in stratum \\(i\\) travelling from district \\(x(i) = x\\) to \\(x'\\) to receive ART be \\[\\begin{equation} \\pi_{i, x(i) = x, x'} = \\rho_{i} \\alpha_{i} \\gamma_{x(i) = x, x'}. \\tag{6.31} \\end{equation}\\] These probabilities are the product of three probabilities, each for a person in stratum \\(i\\): the probability of having HIV \\(\\rho_{i}\\), the probability of taking ART \\(\\alpha_{i}\\), and the probability of travelling from district \\(x(i) = x\\) to district \\(x'\\) to receive ART \\(\\gamma_{x(i) = x, x'}\\). Let the unobserved count of people in stratum \\(i\\) who travel to \\(x'\\) to receive ART be \\(A_{i, x(i) = x, x'}\\), such that \\[\\begin{equation} A_i = \\sum_{x' \\sim x, x' = x} A_{i, x(i) = x', x}. \\end{equation}\\] Each unobserved count can be considered as arising from a binomial distribution, with sample size given by the population in stratum \\(i\\), here with \\(x(i) = x'\\) such that \\[\\begin{equation} A_{i, x(i) = x', x} \\sim \\text{Bin}(N_{i, x(i) = x'}, \\pi_{i, x(i) = x', x}).
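The effect of a logit-scale offset, as used for the ANC indicators above, can be sketched in a few lines of R. The prevalence values and the offset `eta_anc` below are hypothetical and chosen only to illustrate the transformation. In Naomi the offsets are structured additive predictors estimated from data.

```r
# General population HIV prevalence in three hypothetical strata
rho <- c(0.05, 0.10, 0.20)

# Hypothetical logit-scale offset for pregnant women attending ANC
eta_anc <- -0.3

# logit(rho_anc) = logit(rho) + eta_anc, mapped back to the probability scale
rho_anc <- plogis(qlogis(rho) + eta_anc)
rho_anc
```

Because the offset acts on the logit scale, it corresponds to a constant odds ratio \\(\\exp(\\eta)\\) between the two populations, and the implied ANC prevalence always remains within \\([0, 1]\\).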
\\end{equation}\\] Each aggregate attendance observation (Equation (6.30)) is modelled using a Gaussian approximation to a sum of binomials. This sum is over both the strata \\(i \\in I\\) and the number of ART clients travelling from district \\(x(i) = x'\\) to \\(x\\) to receive treatment. The Gaussian approximation is \\[\\begin{equation} y^{A}_I \\sim \\mathcal{N}(\\mu^A_I, {\\sigma^A_I}^2), \\end{equation}\\] where the mean is \\[\\begin{equation} \\mu^A_I = \\sum_{i \\in I} \\sum_{x' \\sim x, x' = x} N_{i, x(i) = x'} \\cdot \\pi_{i, x(i) = x', x}, \\tag{6.32} \\end{equation}\\] and the variance is \\[\\begin{equation} {\\sigma^A_I}^2 = \\sum_{i \\in I} \\sum_{x' \\sim x, x' = x} N_{i, x(i) = x'} \\cdot \\pi_{i, x(i) = x', x} \\cdot (1 - \\pi_{i, x(i) = x', x}). \\tag{6.33} \\end{equation}\\] Equations (6.32) and (6.33) are based on a Gaussian approximation to the binomial distribution \\(\\text{Bin}(n, p)\\) with mean \\(np\\) and variance \\(np(1 - p)\\), together with the equations for a linear combination of Gaussian random variables. 6.3.2 Naomi as an ELGM In all, Naomi is a joint model on the observations \\[\\begin{equation} \\mathbf{y} = (y^{\\theta}_I), \\quad \\theta \\in \\{\\rho, \\alpha, \\kappa, \\rho^\\text{ANC}, \\alpha^\\text{ANC}, A\\}, \\quad I \\in \\mathcal{I}. \\end{equation}\\] The observations are modelled using the structured additive predictor \\(\\boldsymbol{\\mathbf{\\eta}}\\), which includes intercept effects, age random effects, and spatial random effects which may be concatenated into the latent field \\(\\mathbf{x}\\). The latent field is controlled by hyperparameters \\(\\boldsymbol{\\mathbf{\\theta}}\\) which include standard deviations, first-order autoregressive model correlation parameters, and reparameterised Besag-York-Mollie model [BYM2; Simpson et al. (2017)] proportion parameters. These features are described in more detail in Appendix C.4. 
Naomi has a large Gaussian latent field, governed by a smaller number of hyperparameters \\(m < N\\). However, it has complexities which place it outside the class of LGMs, as defined in Section 3.3.4. Instead, it is an ELGM, as defined in Section 3.3.5. In an ELGM, each mean response is allowed to depend non-linearly upon more than one structured additive predictor. The departures of Naomi from the LGM framework are enumerated below. When dependence on a specific number of structured additive predictors is given, it is in isolation, rather than in conjunction. Throughout Naomi, processes are modelled at the finest district-age-sex division \\(i\\), but likelihoods are defined for observations aggregated over sets of indices \\(i \\in I\\). As such, these aggregate observations are related to \\(|I|\\) structured additive predictors, rather than just one. Multiple link functions are used in Naomi, such that there is no one inverse link function \\(g\\) as specified in the definition of an LGM. This is a relatively minor point, and it is possible to specify models with several likelihoods in R-INLA by setting family to be vector valued [Section 6.4; Gómez-Rubio (2020)]. In Section 6.3.1.2, HIV incidence depends on district-level adult HIV prevalence and ART coverage (Equation (C.4)). Each \\(\\log(\\lambda_i)\\) therefore depends on 28 structured additive predictors, where 28 arises by the product of 2 sexes (male and female), 7 age groups (\\(\\{\\text{15-19}, \\ldots, \\text{45-49}\\}\\)), and 2 indicators, HIV prevalence and ART coverage. This reflects basic HIV epidemiology: incidence of sexually transmitted HIV is proportional to unsuppressed viral load among an individual’s potential sexual partners. The district-level adult averages are used as a proxy.
In Section 6.3.1.2, the proportion recently infected \\(\\kappa_i\\) is given by a non-linear function (Equation (6.24)) of HIV incidence \\(\\lambda_i\\), HIV prevalence \\(\\rho_i\\), mean duration of recent infection \\(\\Omega_T\\) and proportion of long-term HIV infections misclassified as recent \\(\\beta_T\\). Though arguably a contortion of the ELGM framework, by considering \\(\\Omega_T\\) and \\(\\beta_T\\) as (Gaussian) linear predictors, each \\(\\kappa_i\\) depends on four structured additive predictors. In Section 6.3.1.3, HIV prevalence and ART coverage among pregnant women are modelled as offset from their respective indicators in the general population. Thus each mean response depends on two structured additive predictors. The copy feature in R-INLA [Section 6.5; Gómez-Rubio (2020)] allows for this type of model structure. In Section 6.3.1.3, nested binomial likelihoods are used. In Section 6.3.1.4 a multinomial model with softmax link function is used. The multinomial likelihood takes as input \\(|\\{x': x' \\sim x\\}| + 1\\) structured additive predictors, one for each neighbouring district plus one for remaining in the home district. In Section 6.3.1.4 the probability of an individual receiving ART in a given district is the product of three probabilities. Though intended for use with LGMs, the advanced features of R-INLA [Chapter 6; Gómez-Rubio (2020)] allow for fitting of some ELGMs as described above. In some sense then, the above exercise is mostly academic rather than practical. The crux is that Naomi cannot be fit using R-INLA because it is not possible to specify such a complex model using a formula interface. The limitations of modelling with formula interfaces are not unique to R-INLA. Indeed, any such statistical software will see requests from users for additional features. The practical impossibility of meeting all feature requests motivates a more universal INLA implementation (Section 6.2) for advanced users.
6.4 AGHQ in moderate dimensions Inference for the Naomi model was previously conducted using a marginal Laplace approximation, and optimisation over the hyperparameters, implemented using TMB. This approach was illustrated for the epilepsy example in Section 6.2.1.2 and is analogous for Naomi. It would be desirable to instead integrate with respect to the hyperparameters, taking an INLA-like approach as described in Section 6.1.3. Section 6.2 attends to part of the challenge, by developing INLA methods which are compatible with the Naomi model log-posterior as implemented in TMB. However, naive quadrature methods are not directly applicable to Naomi. This is because Naomi has \\(m = 24\\) hyperparameters. Although \\(m = 24\\) cannot be described as high-dimensional, it is certainly more than the \\(m < 4\\) or so hyperparameters typical for use of INLA. Hence here the term moderate-dimensional is used. Naive use of AGHQ with the product rule requires evaluation of \\(|\\mathcal{Q}(m, k)| = k^m\\) quadrature points. This would be intractable for \\(m = 24\\) and any \\(k > 1\\). As a result, a quadrature rule which does not scale exponentially is required for integrating out the Naomi model hyperparameters. This section focuses on the development of an AGHQ rule for moderate dimension, for use within an inference procedure for the Naomi model. Though the rule is to be applied within a nested Laplace approximation approach, it is not limited to this setting. 6.4.1 AGHQ with variable levels Rather than having the same number of quadrature nodes for each dimension of \\(\\boldsymbol{\\mathbf{\\theta}}\\), it is possible to use a variable number of nodes per dimension. In line with the terminology used in the mvQuad package, the number of nodes per dimension are referred to as “levels”. Let \\(\\mathbf{k} = (k_1, \\ldots, k_m)\\) be a vector of levels, where each \\(k_j \\in \\mathbb{Z}^+\\).
A GHQ grid with (potentially) variable levels is then given by \\[\\begin{equation} \\mathcal{Q}(m, \\mathbf{k}) = \\mathcal{Q}(1, k_1) \\times \\cdots \\times \\mathcal{Q}(1, k_m). \\end{equation}\\] The size of this grid is given by the product of the levels \\(|\\mathcal{Q}(m, \\mathbf{k})| = \\prod_{j = 1}^m k_j\\). The corresponding weighting function is given by \\[\\begin{equation} \\omega(\\mathbf{z}) = \\prod_{j = 1}^m \\omega_{k_j}(z_j). \\end{equation}\\] This expression is a product of the univariate weighting functions for the relevant GHQ rule with \\(k_j\\) nodes. 6.4.2 Principal components analysis A special case of the variable levels approach above is to set the first \\(s \\leq m\\) levels to be \\(k\\) and the remaining \\(m - s \\geq 0\\) levels to be one. Denote \\(\\mathcal{Q}(m, s, k)\\) to be \\(\\mathcal{Q}(m, \\mathbf{k})\\) with levels \\(k_j = k, j \\leq s\\) and \\(k_j = 1, j > s\\) for some \\(s \\leq m\\). For example, for \\(m = 2\\) and \\(s = 1\\) then \\(\\mathbf{k} = (k, 1)\\). When the spectral decomposition is used to adapt the quadrature nodes, this choice of levels is analogous to principal components analysis (PCA). Figure 6.17 illustrates PCA-AGHQ for a case when \\(m = 2\\) and \\(s = 1\\). Since AGHQ with \\(k = 1\\) corresponds to the Laplace approximation, PCA-AGHQ can be interpreted as performing AGHQ on the first \\(s\\) principal components of the inverse curvature, and a Laplace approximation on the remaining \\(m - s\\) principal components. As such, it may be argued that PCA-AGHQ provides a natural compromise between the EB and AGHQ integration strategies. 
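The saving from variable levels can be illustrated with a short base R sketch. It first compares grid sizes for the dense rule and the PCA-style rule, using \\(m = 24\\), \\(k = 3\\) and an illustrative \\(s = 8\\), and then builds a small unadapted grid \\(\\mathcal{Q}(2, \\mathbf{k})\\) with \\(\\mathbf{k} = (3, 1)\\). The helper names `grid_size`, `gh_rule` and `variable_grid` are hypothetical and do not correspond to functions in aghq or mvQuad. The univariate rule is computed with the Golub-Welsch eigenvalue method, with weights normalised against the standard normal density, a convention which may differ by a known factor from the \\(\\omega_k\\) used elsewhere in this chapter.

```r
# Size of a product grid with levels k_1, ..., k_m
grid_size <- function(levels) prod(levels)

m <- 24; k <- 3; s <- 8
grid_size(rep(k, m))                  # dense rule: 3^24, roughly 2.8e11 nodes
grid_size(c(rep(k, s), rep(1, m - s)))  # PCA-style rule: 3^8 = 6561 nodes

# Univariate Gauss-Hermite rule with k nodes (Golub-Welsch), with weights
# normalised with respect to the standard normal density (an assumption)
gh_rule <- function(k) {
  if (k == 1) return(list(nodes = 0, weights = 1))
  J <- matrix(0, k, k)
  off <- sqrt(seq_len(k - 1))
  J[cbind(1:(k - 1), 2:k)] <- off
  J[cbind(2:k, 1:(k - 1))] <- off
  e <- eigen(J, symmetric = TRUE)
  list(nodes = e$values, weights = e$vectors[1, ]^2)
}

# Product grid with a (potentially) different level in each dimension
variable_grid <- function(levels) {
  rules <- lapply(levels, gh_rule)
  nodes <- expand.grid(lapply(rules, `[[`, "nodes"))
  weights <- expand.grid(lapply(rules, `[[`, "weights"))
  list(nodes = as.matrix(nodes), weights = apply(weights, 1, prod))
}

g <- variable_grid(c(3, 1))
nrow(g$nodes)  # three nodes, all with second coordinate equal to zero
```

In this sketch the nodes are not yet adapted: shifting by the mode and scaling by a decomposition of the inverse curvature would be applied afterwards to the columns corresponding to the retained dimensions.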
For concreteness, the normalising constant obtained by application of PCA-AGHQ to integration of the marginal Laplace approximation (Equation (6.12)) is given by \\[\\begin{equation} \\tilde p_\\texttt{PCA}(\\mathbf{y}) = |\\hat{\\mathbf{E}}_{\\texttt{LA}} \\hat{\\mathbf{\\Lambda}}_{\\texttt{LA}}^{1/2}|\\sum_{\\mathbf{z} \\in \\mathcal{Q}(m, s, k)} \\tilde p_\\texttt{LA}(\\hat{\\mathbf{E}}_{\\texttt{LA}, s} \\hat{\\mathbf{\\Lambda}}_{\\texttt{LA}, s}^{1/2} \\mathbf{z} + \\hat{\\boldsymbol{\\mathbf{\\theta}}}_\\texttt{LA}, \\mathbf{y}) \\omega(\\mathbf{z}), \\end{equation}\\] where \\(\\hat{\\mathbf{E}}_{\\texttt{LA}, s}\\) is an \\(m \\times s\\) matrix containing the first \\(s\\) eigenvectors, \\(\\hat{\\mathbf{\\Lambda}}_{\\texttt{LA}, s}\\) is the \\(s \\times s\\) diagonal matrix containing the first \\(s\\) eigenvalues, and \\[\\begin{equation} \\omega(\\mathbf{z}) = \\prod_{j = 1}^s \\omega_k(z_j) \\times \\prod_{j = s + 1}^m \\omega_1(z_j). \\end{equation}\\] Figure 6.17: Consider the function \\(f(z_1, z_2) = \\text{sn}(0.5 z_1, \\alpha = 2) \\cdot \\text{sn}(0.8 z_1 - 0.5 z_2, \\alpha = -2)\\) as described in Figure 6.3. Panel A shows the usual AGHQ nodes with a spectral matrix decomposition. Panel B shows the adapted PCA-AGHQ nodes \\(\\mathcal{Q}(2, 1, 3)\\). These nodes correspond exactly to those in Panel A along the first eigenvector. The proportion of variation explained by this direction is around 95%, with the remaining 5% explained by the second eigenvector. 6.5 Malawi case-study Figure 6.18: District-level HIV prevalence, ART coverage, and new HIV cases and HIV incidence for adults 15-49 in Malawi. Inference here was conducted using a Gaussian approximation and EB via TMB. This section presents a case-study of approximate Bayesian inference methods applied to the Naomi model in Malawi.
Data from Malawi has previously been used to demonstrate the Naomi model, including as a part of the naomi R package vignette available from https://github.com/mrc-ide/naomi. Malawi was chosen for the vignette and this case-study in part because it has a small number of districts, \\(n = 30\\), limiting the computational demand of the model. Three Bayesian inference approaches were considered: Gaussian marginals and EB with TMB. This approach was previously used in production for Naomi. As short-hand, this approach is referred to as GEB. Gaussian marginals and PCA-AGHQ with TMB. This is a novel approach, enabled by the methodological work of Section 6.4. As short-hand, this approach is referred to as GPCA-AGHQ. NUTS with tmbstan. Conditional on assessing chain convergence and suitability, to be discussed in Section 6.5.1, inferences from NUTS represent a gold-standard. The TMB C++ user-template used to specify the log-posterior, described in Appendix C.4.4, was the same for each approach. The dimension of the latent field was \\(N = 467\\) and the dimension of the hyperparameters was \\(m = 24\\). For GEB and GPCA-AGHQ, hyperparameter and latent field samples were simulated following deterministic inference. For all methods, age-sex-district specific HIV prevalence, ART coverage and HIV incidence were simulated from the latent field and hyperparameter posterior samples. Model outputs from GEB are illustrated in Figure 6.18. 6.5.1 NUTS convergence and suitability The Naomi model was difficult to efficiently sample from using NUTS via tmbstan. Four chains run in parallel for 100 thousand iterations each were required to obtain acceptable NUTS diagnostics. For ease-of-storage, the samples were thinned by a factor of 20, resulting in 5000 iterations kept per chain, with the first 2500 removed as burn-in. The effective sample size ratios were typically low (Figure C.6). 
The lowest effective sample size was 208 (2.5% quantile 318, 50% quantile 1231, and 97.5% quantile 2776; Panel C.7A). The largest potential scale reduction factor was 1.021 (2.5% quantile 1, 50% quantile 1.003, and 97.5% quantile 1.017; Panel C.7B). Though inaccuracies remain possible, these diagnostics are sufficient to treat inferences obtained from NUTS as a gold-standard. Correlation structure in the posterior can result in sampler inefficiency. Each of the four pairs of AR1 log standard deviation \\(\\log(\\sigma)\\) and logit lag-one autocorrelation parameter \\(\\text{logit}(\\phi)\\) posteriors was positively correlated (mean absolute correlation 0.81, Figure C.8). These parameters are only partially identifiable, as variation can either be explained by high standard deviation and high autocorrelation or low standard deviation and low autocorrelation. On the other hand, the BYM2 log standard deviation \\(\\log(\\sigma)\\) and logit proportion parameter \\(\\text{logit}(\\phi)\\) were, as designed, more orthogonal (mean absolute correlation 0.17, Figure C.9). The informativeness of data about a parameter can be summarised by the posterior contraction (Schad, Betancourt, and Vasishth 2021) which compares the prior variance \\(\\mathbb{V}_\\text{prior}(\\phi)\\) to posterior variance \\(\\mathbb{V}_\\text{post}(\\phi)\\) via \\[\\begin{equation} c(\\phi) = 1 - \\frac{\\mathbb{V}_\\text{post}(\\phi)}{\\mathbb{V}_\\text{prior}(\\phi)}. \\end{equation}\\] A contraction close to one indicates that the data are highly informative about the parameter, while a contraction close to zero indicates that the posterior variance is similar to the prior variance. Posterior variances were extracted from NUTS results, and prior variances obtained by simulating from a model with the likelihood component removed (Figure C.10). The average posterior contraction was positive for all latent field parameter vectors, and for the majority of hyperparameters (Figure C.11). However, for seven hyperparameters the posterior contraction was very close to zero. Furthermore, for some latent field parameter vectors, the average contraction was small.
Based on these findings, these parameters may not be identifiable.
6.5.2 Use of PCA-AGHQ
Figure 6.19: Under PCA, the proportion of total variation explained is given by the sum of the first \\(s\\) eigenvalues over the sum of all eigenvalues. A typical rule-of-thumb is to include dimensions sufficient to explain 90% of total variation. In this case, for computational reasons, 87% was considered sufficient.
Figure 6.20: The full rank original covariance matrix (Panel A) was closely reproduced by its reduced rank (\\(s = 8\\)) matrix approximation (Panel B).
For the PCA-AGHQ quadrature grid, a scree plot based on the spectral decomposition of \\(\\hat {\\mathbf{H}}_\\texttt{LA}^{-1}\\) (as defined in Equation (6.13)) was used to select the number of principal components to keep (Figure 6.19). Keeping \\(s = 8\\) principal components was sufficient to explain 87% of total variation. The reduced rank approximation to the inverse curvature with this choice of \\(s\\) was visually similar to the full rank matrix (Figure 6.20).
Figure 6.21: Each principal component loading, obtained by the eigendecomposition of the inverse curvature, gives the direction of maximum variation conditional on inclusion of each previous principal component loading. For example, the first principal component loading is a sum of log_sigma_alpha_as and logit_phi_alpha_as.
The principal component (PC) loadings (Figure 6.21) provide interpretable information about which directions had the greatest variation. Many of the first PC loadings are sums of two hyperparameters. As such, there is some redundancy in the hyperparameter parameterisation, supporting the findings of Section 6.5.1 regarding correlation structure in the hyperparameter posterior. It is exactly this correlation structure that PCA, and PCA-AGHQ, looks to utilise.
Figure 6.22: The grey histograms show the 24 hyperparameter marginal distributions obtained with NUTS.
The green lines indicate the position of the 6561 PCA-AGHQ nodes projected onto each hyperparameter marginal. For some hyperparameters, the PCA-AGHQ nodes vary over the domain of the posterior marginal distribution, while for others they concentrate at the mode.
Projecting the \\(3^8 = 6561\\) PCA-AGHQ quadrature nodes onto each hyperparameter dimension, there was substantial variation in coverage by hyperparameter (Figure 6.22). Approximately 12 hyperparameters had well covered marginals: greater than the 8 naively obtained with a dense grid, but nonetheless far fewer than the full 24. Coverage was higher among hyperparameters on the logistic scale, and lower among hyperparameters on the logarithmic scale. This discrepancy occurred due to logistic hyperparameters naturally having higher posterior marginal standard deviation than logarithmic hyperparameters (Figure C.13).
6.5.3 Time taken
Figure 6.23: The number of hours taken to perform inference for the Naomi ELGM (Section 6.3.1) using each approach.
Inference with NUTS took 79 hours, while inference with GPCA-AGHQ took 1.2 hours and GEB just 0.9 minutes (Figure 6.23). Both the NUTS and GPCA-AGHQ algorithms can be run under a range of settings, trading off accuracy and runtime.
6.5.4 Inference comparison
Posterior inferences from GEB, GPCA-AGHQ and NUTS were compared using point estimates (Section 6.5.4.1) and distributional quantities (Section 6.5.4.2).
6.5.4.1 Point estimates
Figure 6.24: The latent field posterior mean and posterior standard deviation point estimates from each inference method as compared with those from NUTS. The root mean square error (RMSE) and mean absolute error (MAE) are displayed in the top left. For both the posterior mean and posterior standard deviation, GPCA-AGHQ reduced RMSE and MAE as compared with GEB.
Latent field point estimates obtained from GPCA-AGHQ were closer to the gold-standard results from NUTS than those obtained from GEB (Figure 6.24).
The root mean square error (RMSE) between posterior mean estimates from GPCA-AGHQ and NUTS (0.063) was 20% lower than that between GEB and NUTS (0.078). For the posterior standard deviation estimates, there was a substantial 60% reduction in RMSE: from 0.14 (GEB) to 0.05 (GPCA-AGHQ). However, puzzlingly, improvements in latent field estimate accuracy only transferred to model outputs to a limited extent (Figures C.15 and C.16).
6.5.4.2 Distributional quantities
6.5.4.2.1 Kolmogorov-Smirnov
Figure 6.25: The average Kolmogorov-Smirnov (KS) test statistic for each latent field parameter of the Naomi model. Vectors of parameters were grouped together. For points above the dashed line at zero, performance of GEB was better. For points below the dashed line, performance of GPCA-AGHQ was better. Most notably, for the latent field parameters ui_lambda_x the test statistic for GEB was substantially higher than for GPCA-AGHQ. This parameter, of length 32, corresponds to \\(\\mathbf{u}_x^\\lambda\\) and plays a key role in the ART attendance component of the Naomi model (Section 6.3.1.4).
Figure 6.26: The parameter ui_lambda_x[26] had the greatest difference between the KS test statistic of GEB against NUTS and that of GPCA-AGHQ against NUTS. For this parameter, the potential scale reduction factor was 1 and effective sample size was 2100.
The two-sample Kolmogorov-Smirnov (KS) test statistic (Smirnov 1948) is the maximum absolute difference between two ECDFs \\(F(\\omega) = \\frac{1}{n} \\sum_{i = 1}^n \\mathbb{I}_{\\phi_i \\leq \\omega}\\). It is a relatively stringent, worst case, measure of distance between empirical distributions. The average KS test statistic for GPCA-AGHQ (0.077) was 8.6% less than the average KS test statistic for GEB (0.084). For both GEB and GPCA-AGHQ the KS test statistic for a parameter was correlated with low NUTS ESS (Figure C.17). This may be due to difficulties estimating particular parameters for all inference methods, or high KS values caused by NUTS inaccuracies.
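To make the definition of the two-sample KS test statistic concrete, the following is a minimal Python sketch using only the standard library. It is illustrative rather than the code used to produce the results in this section, and the function names `ecdf` and `ks_statistic` are hypothetical. The sketch relies on the fact that the difference between two ECDFs can only change at observed sample values, so it is sufficient to evaluate it there.

```python
from bisect import bisect_right


def ecdf(samples):
    """Return the empirical CDF of `samples` as a function of omega."""
    ordered = sorted(samples)
    n = len(ordered)

    def F(omega):
        # Proportion of samples less than or equal to omega
        return bisect_right(ordered, omega) / n

    return F


def ks_statistic(samples_a, samples_b):
    """Maximum absolute difference between the ECDFs of two samples."""
    F_a, F_b = ecdf(samples_a), ecdf(samples_b)
    # Both ECDFs are right-continuous step functions which jump only at
    # observed values, so the maximum is attained at one of those values
    points = list(samples_a) + list(samples_b)
    return max(abs(F_a(x) - F_b(x)) for x in points)
```

For example, `ks_statistic(draws_geb, draws_nuts)` would compare marginal posterior draws of a single parameter from two inference methods, where both arguments are assumed to be lists of numbers.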
6.5.4.2.2 Maximum mean discrepancy
Let \\(\\Phi^{1} = \\{\\boldsymbol{\\mathbf{\\phi}}^1_i\\}_{i = 1}^n\\) and \\(\\Phi^2 = \\{\\boldsymbol{\\mathbf{\\phi}}^2_i\\}_{i = 1}^n\\) be two sets of joint posterior samples, and \\(k\\) be a kernel. The maximum mean discrepancy (MMD; Gretton et al. 2006) is a measure of distance between joint distributions, and can be estimated empirically from samples by \\[\\begin{equation} \\text{MMD}(\\Phi^1, \\Phi^2) = \\sqrt{\\frac{1}{n^2} \\sum_{i, j = 1}^n k(\\boldsymbol{\\mathbf{\\phi}}^1_i, \\boldsymbol{\\mathbf{\\phi}}^1_j) - \\frac{2}{n^2} \\sum_{i, j = 1}^n k(\\boldsymbol{\\mathbf{\\phi}}_i^1, \\boldsymbol{\\mathbf{\\phi}}_j^2) + \\frac{1}{n^2} \\sum_{i, j = 1}^n k(\\boldsymbol{\\mathbf{\\phi}}^2_i, \\boldsymbol{\\mathbf{\\phi}}^2_j)}. \\end{equation}\\] The kernel was set to \\(k(\\boldsymbol{\\mathbf{\\phi}}^1, \\boldsymbol{\\mathbf{\\phi}}^2) = \\exp(-\\sigma \\lVert \\boldsymbol{\\mathbf{\\phi}}^1 - \\boldsymbol{\\mathbf{\\phi}}^2 \\rVert^2)\\) with \\(\\sigma\\) estimated from data using the kernlab package (Karatzoglou et al. 2019). The first and third order MMD statistics for GEB were 0.08 and 0.0048. Those of GPCA-AGHQ (0.078 and 0.0044) were just 3% and 7% lower.
6.5.5 Exceedance probabilities
As a more realistic use case for the Naomi model outputs, consider the two following case-studies based on exceedance probabilities.
6.5.5.1 Meeting the second 90
Ambitious targets for scaling up ART treatment have been developed by UNAIDS, with the goal of ending the AIDS epidemic by 2030 (UNAIDS 2014). Meeting the 90-90-90 fast-track target requires that 90% of people living with HIV know their status, 90% of those are on ART, and 90% of those have a suppressed viral load. Inferences from Naomi can be used to identify treatment gaps by calculating the probability that the second 90 target has been met, that is \\(\\mathbb{P}(\\alpha_i > 0.9^2 = 0.81)\\) for each stratum \\(i\\).
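As a minimal sketch of how such an exceedance probability might be estimated from posterior samples, the following Python snippet computes the proportion of samples above a threshold. The stratum names and sample values are hypothetical, chosen purely for illustration, and are not Naomi outputs.

```python
def exceedance_probability(samples, threshold):
    """Monte Carlo estimate of P(quantity > threshold) from posterior samples."""
    return sum(1 for x in samples if x > threshold) / len(samples)


# Hypothetical posterior samples of ART coverage for two strata
art_coverage_samples = {
    "stratum_1": [0.78, 0.80, 0.82, 0.83, 0.85],
    "stratum_2": [0.70, 0.72, 0.75, 0.79, 0.80],
}

# Second 90 target: 90% of the 90% who know their status are on ART
second_90 = 0.9 ** 2

prob_met = {
    name: exceedance_probability(draws, second_90)
    for name, draws in art_coverage_samples.items()
}
```

The same function could be applied to incidence samples with a threshold of 0.01. In practice many more samples per stratum would be needed for the estimate to be stable.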
Figure 6.27: The probability each stratum has met the second 90 (ART coverage above 81%) calculated using each inference method, as compared with NUTS. The root mean square error (RMSE) and mean absolute error (MAE) are displayed in the top left.
Strata probabilities of having met the second 90 target were more accurately estimated by GPCA-AGHQ than GEB (Figure 6.27). However, both GPCA-AGHQ and GEB had substantial error as compared to results from NUTS, particularly for girls and women. This discrepancy in accuracy by sex may be caused by interactions between the household survey and ANC components of the model creating a more challenging posterior geometry.
6.5.5.2 Finding strata with high incidence
Some HIV interventions are cost-effective only within high HIV incidence settings, typically defined as higher than 1% incidence per year. Inferences from Naomi can be used to calculate the probability of a stratum having high incidence by evaluating \\(\\mathbb{P}(\\lambda_i > 0.01)\\).
Figure 6.28: The probability each stratum has high HIV incidence (above 1% per year) calculated using each inference method, as compared with NUTS. The root mean square error (RMSE) and mean absolute error (MAE) are displayed in the top left.
GPCA-AGHQ gave more accurate estimates of the probability that a stratum has high HIV incidence than GEB (Figure 6.28). Again, both methods had significant error. Unlike in Section 6.5.5.1, there was little difference in error by sex.
6.6 Discussion
This chapter made two main contributions. First, the universal INLA implementation of Section 6.2. Second, the PCA-AGHQ rule (Section 6.4). Both were applied to the Naomi model in Malawi in Section 6.5. These contributions are discussed in turn, before outlining suggestions for future work.
6.6.1 A universal INLA implementation
Monnahan and Kristensen (2018) write that “to our knowledge, TMB is the only software platform capable of toggling between integration tools [the Laplace approximation and NUTS] so effortlessly”. Section 6.2 made important progress towards adding INLA to the integration tools easily accessible using TMB. Reaching this milestone would be of significant value to both applied and methodological researchers. The implementation is not intended to replace R-INLA, and indeed for the majority of users a formula-based interface is preferred. Both formula-based and universal statistical tools have value, as they inhabit different use-cases. For the NUTS algorithm, a universal interface is available via Stan, and packages such as brms (Bürkner 2017) and rstanarm (Goodrich et al. 2020) enable researchers to fit common models using a formula interface. Furthermore, developers of formula-based tools do have incentives to engage with the needs of their users, and indeed do so. For example, after a request that the generalised binomial distribution used in Equation (6.28) be included in R-INLA, a prototype version was shortly made available. That said, it is ultimately more sustainable for advanced users to have capacity to implement their own distributions and models.
6.6.2 PCA-AGHQ with application to INLA for Naomi
For the simplified Naomi model applied to data from Malawi, GPCA-AGHQ more accurately inferred latent field posterior marginal distributions than GEB. However, model output posterior marginals did not see the same improvements. Approximate posterior exceedance probabilities from both GEB and GPCA-AGHQ had systematic inaccuracies as compared with NUTS. GEB and GPCA-AGHQ were substantially faster than NUTS, which took over two days to reach convergence. Inaccuracies in model outputs from GEB and GPCA-AGHQ do have potential to meaningfully mislead policy (Sections 6.5.5.1 and 6.5.5.2).
As such, where possible, gold-standard NUTS results should be computed. Though NUTS is too slow to run during a workshop, it could be run afterwards. As the UNAIDS HIV estimates process occurs annually, requiring days to compute more accurate estimates is viable. That said, Malawi is one of the countries with the fewest districts. As NUTS took days to run in Malawi, for larger countries, with hundreds of districts, it may be impossible to run NUTS to convergence, and approximate methods may be required. To empower users, GPCA-AGHQ and NUTS could be added to the Naomi web interface (https://naomi.unaids.org) as alternatives to GEB. Analysts would be able to quickly iterate over model options using EB, before switching to a more accurate approach once they are happy with the results. PCA-AGHQ can be adjusted to suit the computational budget available by choice of the number of dimensions kept in the PCA \\(s\\) and the number of points per dimension \\(k\\). The scree plot is a well established heuristic for choosing \\(s\\). Heuristics for choosing \\(k\\) are less well established. Whether it is preferable for a given computational budget to increase \\(s\\) or increase \\(k\\) is an open question. Further strategies, such as gradually lowering \\(k\\) over the principal components, could also be considered.
6.6.3 Suggestions for future work
Finally, this section presents suggestions for future work based on this chapter. Some suggestions relate more to individual contributions, others take a broader view, or relate to multiple contributions.
6.6.3.1 Further comparisons
Comparison to further Bayesian inference methods could be included in Section 6.5. Four possibilities stand out as being particularly valuable: There exist other quadrature rules for moderate dimension, such as the CCD. It would be of interest to compare INLA with a PCA-AGHQ rule to INLA with other such quadrature rules.
Rather than use quadrature to integrate the marginal Laplace approximation, an alternative approach is to run HMC (Monnahan and Kristensen 2018; C. Margossian et al. 2020). When run to convergence, inferential error of this method would solely be due to the Laplace approximation, helping to clarify the extent to which the inferential error of INLA is attributable to the quadrature grid. Preliminary testing of this approach, using tmbstan and setting laplace = TRUE, did not show immediate success, though this could likely be improved with further work. NUTS is not especially well suited to sampling from Gaussian latent field models like Naomi. Other MCMC algorithms, such as blocked Gibbs sampling (Geman and Geman 1984) or slice sampling (Neal 2003), could be considered. It may be difficult to implement such algorithms using TMB. Many MCMC algorithms are implemented and customisable (including, for example, the choice of block structure) within the NIMBLE probabilistic programming language (Valpine et al. 2017). Having to rewrite the Naomi model log-posterior outside of TMB would be a substantial downside. Finally, it would be of substantial interest to implement the Naomi model using the iterative INLA method via inlabru. However, as inlabru, like R-INLA, is based on a formula interface, it may not be possible to do so directly.
6.6.3.2 Better quadrature grids
PCA-AGHQ is a sensible approach to allocating more computational effort to dimensions which contribute more to the integral in question. However, its application to Naomi surfaced instances where it overlooked potential benefits, or otherwise did not behave as one might wish: The amount of variation explained in the Hessian matrix may not be of direct interest. For the Naomi model, interest is in the effect of including each dimension on the relevant model outputs. As such, using alternative measures of importance from sensitivity analysis, such as Shapley values (Shapley et al. 1953) or Sobol indices, could be preferable.
Use of PCA is challenging when the dimensions have different scales. For the Naomi model, logit-scale hyperparameters were systematically favoured over those on the log-scale. When the quadrature rule is used within an INLA algorithm, it is more important to allocate quadrature nodes to those hyperparameter marginals which are non-Gaussian. This is because the Laplace approximation is exact when the integrand is Gaussian, so a single quadrature node is sufficient. The difficulty is, of course, knowing in advance which marginals will be non-Gaussian. This could be done if there were a cheap way to obtain posterior means, which could then be compared to posterior modes obtained using optimisation. Another approach would be to measure the fit of marginal samples from a cheap approximation, like EB. The measures of fit would have to be for marginals, ruling out approaches like PSIS (Yao et al. 2018) which operate on joint distributions. Finally, it may be possible to achieve better performance by pruning and prerotation, as discussed by Jäckel (2005).
6.6.3.3 Computational improvements
Approximation: The most significant improvement would likely come from using approximations to the Laplace marginals. In particular, the simplified Laplace marginals of Wood (2020) (Section 6.1.3.4) should be implemented, as the ELGM setting has relatively dense precision matrices.
Parallelisation: Integration over a moderate number of hyperparameters resulted in use of quadrature grids with a large number of nodes. Computation at each node is independent, so algorithm run-time could potentially be significantly improved using parallel computing. This point is discussed by Kristensen et al. (2016) who highlight that TMB could be applied to perform function evaluations in parallel, for example using the parallel R package.
Hardware: Further computational speed-ups might be obtained using graphics processing units (GPUs) specialised for the relevant matrix operations.
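The independence of computation across quadrature nodes can be illustrated with a small Python sketch using only the standard library. It is a sketch under stated assumptions, not the implementation used in this chapter: the grid is unadapted (no shift to the mode or rotation by the eigendecomposition), the three-point Gauss-Hermite rule is written for a standard normal weight function, and the names `pca_grid` and `integrate` are hypothetical. The grid uses three points in each of the first \\(s\\) dimensions and one point in the remaining \\(m - s\\), so that it contains \\(3^s\\) nodes.

```python
import math
from concurrent.futures import ThreadPoolExecutor
from itertools import product

# Three-point Gauss-Hermite rule for a standard normal weight function
# (an assumption of this sketch): nodes are the roots of He_3(x) = x^3 - 3x
GH3 = [(-math.sqrt(3.0), 1.0 / 6.0), (0.0, 2.0 / 3.0), (math.sqrt(3.0), 1.0 / 6.0)]
# One-point rule: a single node at zero carrying all of the weight
GH1 = [(0.0, 1.0)]


def pca_grid(m, s):
    """Product grid with 3 points in the first s dimensions, 1 in the other m - s."""
    rules = [GH3] * s + [GH1] * (m - s)
    grid = []
    for combo in product(*rules):
        node = tuple(z for z, _ in combo)
        weight = math.prod(w for _, w in combo)
        grid.append((node, weight))
    return grid


def integrate(f, grid, workers=4):
    """Evaluate f at every node concurrently, then form the weighted sum."""
    nodes = [node for node, _ in grid]
    # Each evaluation depends only on its own node, so the order of
    # evaluation does not matter and the work can be shared out
    with ThreadPoolExecutor(max_workers=workers) as pool:
        values = list(pool.map(f, nodes))
    return sum(w * v for (_, w), v in zip(grid, values))


# With m = 24 hyperparameters and s = 8 dimensions kept, the grid has 3^8 nodes
n_nodes = len(pca_grid(24, 8))
```

A thread pool is used here only to keep the sketch short. For pure Python functions, threads do not give a real speed-up, so in practice a process pool or compiled function evaluations would be needed.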
6.6.3.4 Statistical theory
The class of functions which are integrated exactly by PCA-AGHQ remains to be shown. Theorem 1 of Stringer, Brown, and Stafford (2022) bounds the total variation error of AGHQ, establishing convergence in probability of coverage probabilities under the approximate posterior distribution to those under the true posterior distribution. Similar theory could be established for PCA-AGHQ, or more generally AGHQ with varying levels. The challenge of connecting this theory to nested use of any quadrature rule, like that in the INLA algorithm, remains an important open question.
6.6.3.5 Testing quadrature assumptions
It may be possible to test the assumptions made by use of AGHQ grids, allowing their suitability for a particular integral to be assessed. Specifically, AGHQ assumes that the integrand is closely approximated by a polynomial multiplied by a Gaussian density. Given NUTS hyperparameter samples (or better yet, hyperparameter samples from the Laplace NUTS hybrid discussed in Section 6.6.3.1) this assumption could be tested by fitting a model using a polynomial times Gaussian kernel. This approach could be generalised to also test the suitability of PCA-AGHQ grids.
6.6.3.6 Exploration of the accuracy of INLA for complex models
The universal INLA implementation can be used to measure the accuracy of INLA for a wider range of models than was previously possible. An important benefit of using TMB is that comparisons to NUTS can easily be made using exactly the same model template. Among the ELGM-type structures of particular interest for spatial epidemiology are aggregated likelihood models and evidence synthesis models.
6.6.3.7 Methods dissemination
The approach used to implement Laplace marginals with TMB was relatively ad-hoc, and involved modification of the TMB C++ template (Section 6.2.1.4). For wider dissemination of this method, it is important that the user is not burdened with making these modifications.
One possibility would be to change the random argument in TMB::MakeADFun to allow for indexing. Another (less desirable) option would be to algorithmically generate the modified TMB C++ template based on the original template.
Figure 6.29: Monthly R package downloads from the Comprehensive R Archive Network (CRAN) for brms, glmmTMB, nimble, rstan and TMB, obtained using the cranlogs (Csárdi 2023) R package. Unfortunately, R-INLA is not available from CRAN, and so could not be included in this figure. The official rstan documentation recommends installation of a development version hosted outside CRAN. As such, this metric may underestimate the popularity of rstan.
Though gaining in popularity, the user-base of TMB is relatively small, and package downloads are in large part driven by use of the more easy-to-use glmmTMB package (Figure 6.29). For users unfamiliar with C++, it can be challenging to use TMB directly. One possibility is to look to disseminate methods via the users of glmmTMB. Another approach would be to implement methods in other probabilistic programming languages, such as Stan or NIMBLE. Implementation in Stan is made possible by the bridgestan package (Ward 2023), which provides access to the methods of a Stan model, and could be combined with the prototyping of an adjoint-differentiated Laplace approximation done in Stan by C. Margossian et al. (2020). The ratio of downloads of rstan as compared with brms suggests a larger proportion of Stan users are interested in specifying their own model. Implementation in NIMBLE is also possible as of version >1.0.0, which includes functionality for automatic differentiation and Laplace approximation (Part V; de Valpine et al. 2023), like TMB, built using CppAD. Both NIMBLE and Stan developers are actively looking into implementation of algorithms combining the Laplace approximation and quadrature.
References
Bachl, Fabian E, Finn Lindgren, David L Borchers, and Janine B Illian. 2019.
“inlabru: an R package for Bayesian spatial modelling from ecological survey data.” Methods in Ecology and Evolution 10 (6): 760–66. Baydin, Atılım Günes, Barak A Pearlmutter, Alexey Andreyevich Radul, and Jeffrey Mark Siskind. 2017. “Automatic differentiation in machine learning: a survey.” The Journal of Machine Learning Research 18 (1): 5595–5637. Bell, Bradley. 2023. “CppAD: a package for C++ algorithmic differentiation.” http://www.coin-or.org/CppAD. Berild, Martin Outzen, Sara Martino, Virgilio Gómez-Rubio, and Håvard Rue. 2022. “Importance Sampling with the Integrated Nested Laplace Approximation.” Journal of Computational and Graphical Statistics 31 (4): 1225–37. Bilodeau, Blair, Alex Stringer, and Yanbo Tang. 2022. “Stochastic convergence rates and applications of adaptive quadrature in Bayesian inference.” Journal of the American Statistical Association, 1–11. Blangiardo, Marta, Michela Cameletti, Gianluca Baio, and Håvard Rue. 2013. “Spatial and spatio-temporal models with R-INLA.” Spatial and Spatio-Temporal Epidemiology 4: 33–49. Bolker, Benjamin M, Beth Gardner, Mark Maunder, Casper W Berg, Mollie Brooks, Liza Comita, Elizabeth Crone, et al. 2013. “Strategies for fitting nonlinear ecological models in R, AD Model Builder, and BUGS.” Methods in Ecology and Evolution 4 (6): 501–12. Bollhöfer, Matthias, Olaf Schenk, Radim Janalik, Steve Hamm, and Kiran Gullapalli. 2020. “State-of-the-art sparse direct solvers.” Parallel Algorithms in Computational Science and Engineering, 3–33. Box, George EP, and Kenneth B Wilson. 1992. “On the experimental attainment of optimum conditions.” In Breakthroughs in Statistics: Methodology and Distribution, 270–310. Springer. Breslow, Norman E, and David G Clayton. 1993. “Approximate inference in generalized linear mixed models.” Journal of the American Statistical Association 88 (421): 9–25. 
Brooks, Mollie E, Kasper Kristensen, Koen J Van Benthem, Arni Magnusson, Casper W Berg, Anders Nielsen, Hans J Skaug, Martin Machler, and Benjamin M Bolker. 2017. “glmmTMB balances speed and flexibility among packages for zero-inflated generalized linear mixed modeling.” The R Journal 9 (2): 378–400. Brown, Patrick E. 2015. “Model-based geostatistics the easy way.” Journal of Statistical Software 63: 1–24. Bürkner, Paul-Christian. 2017. “brms: An R Package for Bayesian Multilevel Models Using Stan.” Journal of Statistical Software 80 (1): 1–28. https://doi.org/10.18637/jss.v080.i01. Carpenter, Bob, Andrew Gelman, Matthew D Hoffman, Daniel Lee, Ben Goodrich, Michael Betancourt, Marcus Brubaker, Jiqiang Guo, Peter Li, and Allen Riddell. 2017. “Stan: A probabilistic programming language.” Journal of Statistical Software 76 (1). Casella, George. 1985. “An introduction to empirical Bayes data analysis.” The American Statistician 39 (2): 83–87. Chen, Cici, Jon Wakefield, and Thomas Lumely. 2014. “The use of sampling weights in Bayesian hierarchical models for small area estimation.” Spatial and Spatio-Temporal Epidemiology 11: 33–43. Chiuchiolo, Cristian, Janet van Niekerk, and Håvard Rue. 2023. “Joint Posterior Inference for Latent Gaussian Models with r-INLA.” Journal of Statistical Computation and Simulation 93 (5): 723–52. Csárdi, Gábor. 2023. cranlogs: Download Logs from the ’RStudio’ ’CRAN’ Mirror. Davis, Philip J, and Philip Rabinowitz. 1975. Methods of numerical integration. Academic Press. de Valpine, Perry, Christopher Paciorek, Daniel Turek, Nick Michaud, Cliff Anderson-Bergman, Fritz Obermeyer, Claudia Wehrhahn Cortes, Abel Rodrìguez, Duncan Temple Lang, and Sally Paganin. 2023. NIMBLE User Manual (version 1.0.1). https://doi.org/10.5281/zenodo.1211190. Dennis Jr, John E, David M Gay, and Roy E Walsh. 1981. “An adaptive nonlinear least-squares algorithm.” ACM Transactions on Mathematical Software (TOMS) 7 (3): 348–68. 
Diaz, Jose Monsalve, Swaroop Pophale, Oscar Hernandez, David E Bernholdt, and Sunita Chandrasekaran. 2018. “OpenMP 4.5 Validation and Verification Suite for Device Offload.” In Evolving OpenMP for Evolving Architectures: 14th International Workshop on OpenMP, IWOMP 2018, Barcelona, Spain, September 26–28, 2018, Proceedings 14, 82–95. Springer. Diggle, Peter J, and Emanuele Giorgi. 2016. “Model-based geostatistics for prevalence mapping in low-resource settings.” Journal of the American Statistical Association 111 (515): 1096–1120. Eaton, Jeffrey W, Laura Dwyer-Lindgren, Steve Gutreuter, Megan O’Driscoll, Oliver Stevens, Sumali Bajaj, Rob Ashton, et al. 2021. “Naomi: a new modelling tool for estimating HIV epidemic indicators at the district level in sub-Saharan Africa.” Journal of the International AIDS Society 24: e25788. Esra, Rachel, Mpho Mmelesi, Akeem T. Ketlogetswe, Timothy M. Wolock, Adam Howes, Tlotlo Nong, Matshelo Tina Matlhaga, Siphiwe Ratladi, Dinah Ramaabya, and Jeffrey W. Imai-Eaton. 2024. “Improved Indicators for Subnational Unmet Antiretroviral Therapy Need in the Health System: Updates to the Naomi Model in 2023.” Journal of Acquired Immune Deficiency Syndromes 95 (1S): e24–33. https://doi.org/10.1097/QAI.0000000000003324. Fattah, EA, JV Niekerk, and H Rue. 2022. “Smart gradient-an adaptive technique for improving gradient estimation.” Foundations of Data Science. Fournier, David A, Hans J Skaug, Johnoel Ancheta, James Ianelli, Arni Magnusson, Mark N Maunder, Anders Nielsen, and John Sibert. 2012. “AD Model Builder: using automatic differentiation for statistical inference of highly parameterized complex nonlinear models.” Optimization Methods and Software 27 (2): 233–49. Fuglstad, Geir-Arne, Daniel Simpson, Finn Lindgren, and Håvard Rue. 2019. “Constructing priors that penalize the complexity of Gaussian random fields.” Journal of the American Statistical Association 114 (525): 445–52. 
Gaedke-Merzhäuser, Lisa, Janet van Niekerk, Olaf Schenk, and Håvard Rue. 2023. “Parallelized integrated nested Laplace approximations for fast Bayesian inference.” Statistics and Computing 33 (1): 25. Geman, Stuart, and Donald Geman. 1984. “Stochastic relaxation, Gibbs distributions, and the Bayesian restoration of images.” IEEE Transactions on Pattern Analysis and Machine Intelligence, no. 6: 721–41. Gómez-Rubio, Virgilio. 2020. Bayesian inference with INLA. CRC Press. Gómez-Rubio, Virgilio, and Håvard Rue. 2018. “Markov Chain Monte Carlo with the Integrated Nested Laplace Approximation.” Statistics and Computing 28: 1033–51. Goodrich, Ben, Jonah Gabry, Imad Ali, and Sam Brilleman. 2020. “Rstanarm: Bayesian Applied Regression Modeling via Stan.” https://mc-stan.org/rstanarm. Gretton, Arthur, Karsten Borgwardt, Malte Rasch, Bernhard Schölkopf, and Alex Smola. 2006. “A Kernel Method for the Two-Sample-Problem.” Advances in Neural Information Processing Systems 19. Jäckel, Peter. 2005. “A note on multivariate Gauss-Hermite quadrature.” London: ABN-Amro. Re. Karatzoglou, Alexandros, Alex Smola, Kurt Hornik, and Maintainer Alexandros Karatzoglou. 2019. “Package ‘Kernlab’.” CRAN R Project. Kish, Leslie. 1965. Survey sampling. 04; HN29, K5. Kristensen, Kasper, Anders Nielsen, Casper W Berg, Hans Skaug, Bradley M Bell, et al. 2016. “TMB: Automatic Differentiation and Laplace Approximation.” Journal of Statistical Software 70 (i05). Laplace, P. S. 1774. “Memoire sur la probabilite de causes par les evenements.” Memoire de l’Academie Royale Des Sciences. Lenth, Russell. 2009. “Response-Surface Methods in R, Using rsm.” Journal of Statistical Software 32 (7): 1–17. https://doi.org/10.18637/jss.v032.i07. Leppik, IE, FE Dreifuss, T Bowman-Cloyd, N Santilli, M Jacobs, C Crosby, J Cloyd, et al. 1985. “A double-blind crossover evaluation of progabide in partial seizures.” Neurology 35 (4): 285. Lindgren, Finn, Håvard Rue, and Johan Lindström. 2011. 
“An explicit link between Gaussian fields and Gaussian Markov random fields: the stochastic partial differential equation approach.” Journal of the Royal Statistical Society Series B: Statistical Methodology 73 (4): 423–98. Margossian, Charles C. 2019. “A review of automatic differentiation and its efficient implementation.” Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery 9 (4): e1305. Margossian, Charles, Aki Vehtari, Daniel Simpson, and Raj Agrawal. 2020. “Hamiltonian Monte Carlo using an adjoint-differentiated Laplace approximation: Bayesian inference for latent Gaussian models and beyond.” Advances in Neural Information Processing Systems 33: 9086–97. Martino, Sara, and Andrea Riebler. 2020. “Integrated Nested Laplace Approximations (INLA).” In Wiley StatsRef: Statistics Reference Online, 1–19. John Wiley & Sons, Ltd. https://doi.org/https://doi.org/10.1002/9781118445112.stat08212. Martino, Sara, and Håvard Rue. 2009. “Implementing approximate Bayesian inference using Integrated Nested Laplace Approximation: A manual for the inla program.” Department of Mathematical Sciences, NTNU, Norway. Martins, Thiago G, Daniel Simpson, Finn Lindgren, and Håvard Rue. 2013. “Bayesian computing with INLA: new features.” Computational Statistics & Data Analysis 67: 68–83. Meng, Xiao-Li. 2018. “Statistical paradises and paradoxes in big data (i) law of large populations, big data paradox, and the 2016 US presidential election.” The Annals of Applied Statistics 12 (2): 685–726. Monnahan, Cole C, and Kasper Kristensen. 2018. “No-U-turn sampling for fast Bayesian inference in ADMB and TMB: Introducing the adnuts and tmbstan R packages.” PLOS One 13 (5): e0197954. Naylor, John C, and Adrian FM Smith. 1982. “Applications of a method for the efficient computation of posterior distributions.” Journal of the Royal Statistical Society Series C: Applied Statistics 31 (3): 214–25. Neal, Radford M. 2003. “Slice sampling.” The Annals of Statistics 31 (3): 705–67. 
Noor, Abdisalan Mohamed. 2022. “Country Ownership in Global Health.” PLOS Global Public Health 2 (2): e0000113. Osgood-Zimmerman, Aaron, and Jon Wakefield. 2023. “A Statistical Review of Template Model Builder: A Flexible Tool for Spatial Modelling.” International Statistical Review 91 (2): 318–42. Pebesma, Edzer J. 2004. “Multivariable geostatistics in S: the gstat package.” Computers & Geosciences 30: 683–91. Pebesma, Edzer, and Roger Bivand. 2023. Spatial Data Science: With Applications in R. Chapman; Hall/CRC. https://doi.org/10.1201/9780429459016. Press, William H, Teukolsky Saul A, William T Vetterling, and Brian P Flannery. 2007. Numerical recipes 3rd edition: The art of scientific computing. Cambridge University Press. Rue, Håvard. 2001. “Fast sampling of Gaussian Markov random fields.” Journal of the Royal Statistical Society: Series B (Statistical Methodology) 63 (2): 325–38. Rue, Håvard, and Turid Follestad. 2001. “GMRFLib: a C-library for fast and exact simulation of Gaussian Markov random fields.” SIS-2002-236. Rue, Havard, and Leonhard Held. 2005. Gaussian Markov random fields: theory and applications. CRC press. Rue, Håvard, and Sara Martino. 2007. “Approximate Bayesian inference for hierarchical Gaussian Markov random field models.” Journal of Statistical Planning and Inference 137 (10): 3177–92. Rue, Håvard, Sara Martino, and Nicolas Chopin. 2009. “Approximate Bayesian inference for latent Gaussian models by using integrated nested Laplace approximations.” Journal of the Royal Statistical Society: Series B (Statistical Methodology) 71 (2): 319–92. Rue, Håvard, Andrea Riebler, Sigrunn H Sørbye, Janine B Illian, Daniel P Simpson, and Finn K Lindgren. 2017. “Bayesian computing with INLA: a review.” Annual Review of Statistics and Its Application 4: 395–421. Schad, Daniel J, Michael Betancourt, and Shravan Vasishth. 2021. “Toward a Principled Bayesian Workflow in Cognitive Science.” Psychological Methods 26 (1): 103. 
Schlüter, Daniela K, Martial L Ndeffo-Mbah, Innocent Takougang, Tony Ukety, Samuel Wanji, Alison P Galvani, and Peter J Diggle. 2016. “Using community-level prevalence of Loa loa infection to predict the proportion of highly-infected individuals: statistical modelling to support lymphatic filariasis and onchocerciasis elimination programs.” PLOS Neglected Tropical Diseases 10 (12): e0005157. Shapley, Lloyd S et al. 1953. “A value for n-person games.” Princeton University Press Princeton. Simpson, Daniel, Håvard Rue, Andrea Riebler, Thiago G Martins, Sigrunn H Sørbye, et al. 2017. “Penalising model component complexity: A principled, practical approach to constructing priors.” Statistical Science 32 (1): 1–28. Skaug, Hans J. 2009. “Discussion of \"Approximate Bayesian inference for latent Gaussian models by using integrated nested Laplace approximations\".” In Journal of the Royal Statistical Society: Series B (Statistical Methodology), 71:319–92. 2. Wiley Online Library. Smirnov, N. 1948. “Table for Estimating the Goodness of Fit of Empirical Distributions.” Annals of Mathematical Statistics 19 (2): 279–81. Spiegelhalter, David, Andrew Thomas, Nicky Best, and Wally Gilks. 1996. “BUGS 0.5 Examples.” MRC Biostatistics Unit, Institute of Public Health, Cambridge, UK 256. Stein, Michael L. 1999. “Interpolation of spatial data: some theory for kriging.” Stringer, Alex. 2021. “Implementing Approximate Bayesian Inference using Adaptive Quadrature: the aghq Package.” arXiv Preprint arXiv:2101.04468. Stringer, Alex, Patrick Brown, and Jamie Stafford. 2022. “Fast, scalable approximations to posterior distributions in extended latent Gaussian models.” Journal of Computational and Graphical Statistics, 1–15. Thall, Peter F, and Stephen C Vail. 1990. “Some covariance models for longitudinal count data with overdispersion.” Biometrics, 657–71. Thyng, Kristen M, Chad A Greene, Robert D Hetland, Heather M Zimmerle, and Steven F DiMarco. 2016. 
“True Colors of Oceanography: Guidelines for Effective and Accurate Colormap Selection.” Oceanography 29 (3): 9–13. Tierney, Luke, and Joseph B Kadane. 1986. “Accurate approximations for posterior moments and marginal densities.” Journal of the American Statistical Association 81 (393): 82–86. UNAIDS. 2014. “90-90-90. An ambitious treatment target to help end the AIDS epidemic.” ———. 2023b. “The path that ends AIDS: UNAIDS Global AIDS Update 2023.” https://www.unaids.org/en/resources/documents/2023/global-aids-update-2023. Valpine, Perry de, Daniel Turek, Christopher J Paciorek, Clifford Anderson-Bergman, Duncan Temple Lang, and Rastislav Bodik. 2017. “Programming with models: writing statistical algorithms for general model structures with NIMBLE.” Journal of Computational and Graphical Statistics 26 (2): 403–13. Van Niekerk, Janet, Elias Krainski, Denis Rustand, and Håvard Rue. 2023. “A new avenue for Bayesian inference with INLA.” Computational Statistics & Data Analysis 181: 107692. Ward, Brian. 2023. bridgestan: BridgeStan, Accessing Stan Model Functions in R. Weiser, Constantin. 2016. mvQuad: Methods for Multivariate Quadrature. http://CRAN.R-project.org/package=mvQuad. Wood, Simon N. 2017. Generalized additive models: an introduction with R. CRC press. ———. 2020. “Simplified integrated nested Laplace approximation.” Biometrika 107 (1): 223–30. Yao, Yuling, Aki Vehtari, Daniel Simpson, and Andrew Gelman. 2018. “Yes, but did it work?: Evaluating variational inference.” In International Conference on Machine Learning, 5581–90. PMLR. "],["conclusions.html", "7 Conclusions 7.1 Contributions 7.2 Future work 7.3 Broader reflections", " 7 Conclusions This chapter concludes the thesis by discussing its most important contributions, some promising avenues for future work, and broader reflections about the work. 7.1 Contributions Effective response to the HIV epidemic depends on strategic information provided by models of data. 
This thesis contributes both to generating this information and to advancing statistical methods. Chapter 4 found that spatially structured random effects should be used in small-area models for HIV. Kernel models performed better for data simulated from an adjacency-based spatial process than adjacency-based models did for data simulated from a kernel model. However, adjacency-based models performed better under cross-validation of real HIV survey data. Model comparison was conducted using strictly proper scoring rules, with checks for calibration. Figure 7.1: Panel A shows the front page of UNAIDS (2023b). Panel B shows the page containing text and a figure based on the work done in Chapter 5. In this figure, 30 countries are included. Chapter 5 estimated HIV risk group proportions for AGYW to enable implementation of the Global AIDS strategy (UNAIDS 2021b). Risk group proportion estimates were used to behaviourally disaggregate HIV prevalence and incidence and assess the benefits of a variety of risk stratification strategies. This work is the basis for a tool used to prioritise delivery of HIV prevention services by countries in SSA. The tool now encompasses at least 30 countries, expanding from the initial 13 included [Figure 7.1; UNAIDS (2023b)]. Models will be rerun each year to populate the tool with updated information as a part of the UNAIDS annual HIV estimates process. Alongside these applied contributions, Chapter 5 exemplified specification of complex multinomial spatio-temporal models in R-INLA using the Poisson-multinomial transformation, including using two- and three-way Kronecker product interactions. The Naomi model has been used in over 35 countries in SSA to produce district-level estimates of HIV indicators by synthesising evidence from multiple sources. Chapter 6 developed deterministic Bayesian inference methods, motivated by the aim of providing more accurate inferences for this challenging and practically important model. 
Its most important methodological contributions are two-fold. First, an implementation of INLA which is compatible with models specified using a TMB C++ template. For the first time, practitioners can now fit essentially any model using the INLA method. Second, a quadrature rule which combines PCA and AGHQ to naturally extend the applicability of INLA methods to moderate hyperparameter dimension, allowing more complex models to be fit. Additionally, Chapter 6 provides a detailed description and analysis of the Naomi model. Indeed, Esra et al. (2024) used tables and text from Appendix C in an update to Eaton et al. (2021). 7.2 Future work Promising avenues for future work that I might prioritise include: It would be valuable to extend the risk group model developed in Chapter 5, and the resulting tool, to include all adults 15-49. Although AGYW are disproportionately at risk of HIV infection, 56% of new infections in SSA occur in other demographic groups. Modelling of age-stratified sexual partnerships (Wolock et al. 2021) may help to overcome reporting biases by harmonising male and female reporting. This model would likely fall outside the scope of R-INLA, but would be possible to write with TMB and therefore amenable to the inference methods advanced in Chapter 6. Although suitable for early-stage research, wider adoption of the INLA implementation developed in Chapter 6 would be greatly enhanced by improvements to its speed and usability. The most important speed enhancement would come from using the simplified approximation to the Laplace marginals developed by Wood (2020). Although the naive implementation used in this thesis is viable for integrating Laplace marginals over a small number of hyperparameter quadrature nodes, such as the \\(3^2 = 9\\) nodes used in Sections 6.2.1 and 6.2.2, it becomes prohibitively slow for larger numbers. Usability would be improved by providing the method as a part of statistical software, likely via the aghq package.
The primary difficulty that would have to be overcome to do so is that the random argument of TMB::MakeADFun does not allow indexing. Figure 7.2: For the Loa loa ELGM (Section 6.2.2), increasing the number of quadrature nodes per hyperparameter dimension from \\(k = 3\\) to \\(k = 7\\) did little to improve accuracy. On the other hand, using Laplace marginals rather than Gaussian marginals did have a substantial effect (Figures 6.12 and 6.13). It would be valuable to better understand, and ideally have diagnostics for, the circumstances under which accuracy of INLA methods could be improved by additional computation. The universal INLA implementation developed in Chapter 6 enables empirical and methodological research that was previously not possible, or prohibitively difficult. INLA-like methods can now be tested for a broader class of models, such as the Loa loa and Naomi ELGMs (Sections 6.2.2 and 6.5). That a single TMB C++ template for the log-posterior supports inference using multiple methods, including gold-standard NUTS via tmbstan, is a substantial asset in conducting this type of research. As an example research question, within this class of models, what is the best way to obtain accurate inferences within a fixed computational budget? Is it better to use additional hyperparameter grid points, or more accurate latent field approximations? For the Loa loa ELGM in Section 6.2.2, the benefit of using Laplace marginals exceeded that of a denser AGHQ grid (Figure 7.2). It would also be of interest to find methods to obtain accurate inferences for particular parameters, or functions of parameters, using INLA-like methods. For example, in Section 6.5, although the PCA-AGHQ grid improved latent field parameter inferences, it did little to improve model output accuracy. Is there a way in which computational effort could be focused on obtaining accurate estimates of Naomi model outputs?
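To give a sense of scale for the grid sizes discussed here, the following is a rough illustrative calculation, not a result from the thesis. It assumes a product quadrature rule with \\(k\\) nodes on each of \\(d\\) hyperparameter dimensions, and takes \\(d = 2\\), as implied by the \\(3^2 = 9\\) node count quoted earlier. The symbol \\(N\\), standing for the number of latent field elements whose Laplace marginals are computed naively at every node, is introduced here purely for illustration.

```latex
\begin{align*}
\text{number of quadrature nodes} &= k^{d}, \\
k = 3,\ d = 2 &: \quad 3^{2} = 9, \\
k = 7,\ d = 2 &: \quad 7^{2} = 49, \\
\text{naive Laplace marginal evaluations} &\propto N \, k^{d}.
\end{align*}
```

Under these assumptions, moving from \\(k = 3\\) to \\(k = 7\\) multiplies the number of nodes by \\(49 / 9\\), a little over five, and the growth is exponential in \\(d\\). This is one way to see why a denser grid can be costly relative to any gain in accuracy.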
Additionally, it is relatively easy to make alterations to the implementation, facilitating possible innovation in the design of INLA-like algorithms. Previously, it was difficult for researchers not involved in development of R-INLA to engage in methodological work about the INLA method. Theoretical research could be conducted to complement the work described above, extending the findings of Bilodeau, Stringer, and Tang (2022). Such work would benefit from the complete specification (Appendix C.3) of the INLA-like algorithm used in this thesis. 7.3 Broader reflections Conducting the work in this thesis involved testing the boundaries of available statistical software. For example, I found it challenging, if not impossible, to implement a common model using different inferential software. As the Frequently Asked Questions section of the R-INLA website (Havard Rue 2023) notes: “the devil is in the details”. Similarly, I encountered issues implementing a desired collection of different models in a common inferential software. I know from personal experience that my colleagues have encountered similar problems. Needless to say, conflation of statistical models and inference methodologies limits the validity of any findings. To avoid this issue I implemented all models in Chapters 4 and 6 using TMB model templates. (Additionally, I would recommend implementing the model used in Chapter 5 in TMB for future development.) Alongside being sufficiently flexible to meet my model specification requirements, TMB is compatible with a range of inference methodologies, including those advanced in this thesis. As such, TMB remains (Osgood-Zimmerman and Wakefield 2023) an under-rated statistical tool. In demonstrating some of its capabilities, I hope this thesis contributes to its wider adoption. The work done in this thesis, particularly Chapters 4 and 6, focused on producing experimental, empirical evidence.
This approach reflects the complexity of the models and methods used in this thesis. Understanding complex systems from a theoretical perspective can be challenging. That said, in my opinion the work in this thesis could benefit from closer integration with statistical theory. Although a full theoretical understanding of these models or algorithms may be ambitious, better understanding simplified examples, limiting cases, or constituent parts could still prove valuable. Working with the data in Chapter 5 deepened my appreciation for the practical challenges faced in applied work, and for data quality as the linchpin of any successful statistical analysis. Although drawn from the real world, the data in Chapters 4 and 6 underwent substantial cleaning, processing, and vetting before I handled them, as is typical in methodological research. It is important that methodological and theoretical statisticians appreciate the real challenges of applied work, by doing it themselves, or working in close collaboration with those who do. There are both direct and indirect paths to impact for the work in this thesis. Directly, the methodological contributions of Chapters 4 and 6 may eventually lead to marginally more accurate indicator estimates, contributing to a broadly more effective response. However, these improvements in accuracy seem of minor consequence within the broader context of the HIV response, and factors limiting its effectiveness. The applied contributions of Chapter 5 have a more promising case for direct impact. Indeed, I have seen evidence of engagement with this work by decision makers. To the best of my abilities, this thesis, and the work described within it, was written in keeping with the principles of open science. I hope that having done so allows my work to be scrutinised, and more optimistically, built upon. In part this hope has already been realised, as with limited input from me, Dr.
Kathryn Risher was able to extend my code for Chapter 5 to include additional countries (Panel 7.1B). This would not have been possible without tools from the R ecosystem such as rmarkdown and rticles for reporting, devtools for R package development, as well as those written by software engineers within the MRC Centre for Global Infectious Disease Analysis such as orderly and didehpc. It is crucial that academia adjusts to appropriately incentivise software contributions, and encourages adoption of open science best practices. Work done to inform public health decision making should be held to high standards of transparency, reproducibility and collaboration. This is especially so in an outbreak response scenario (Grieve et al. 2023), where time is limited and decisions may be of significant consequence. References Bilodeau, Blair, Alex Stringer, and Yanbo Tang. 2022. “Stochastic convergence rates and applications of adaptive quadrature in Bayesian inference.” Journal of the American Statistical Association, 1–11. Eaton, Jeffrey W, Laura Dwyer-Lindgren, Steve Gutreuter, Megan O’Driscoll, Oliver Stevens, Sumali Bajaj, Rob Ashton, et al. 2021. “Naomi: a new modelling tool for estimating HIV epidemic indicators at the district level in sub-Saharan Africa.” Journal of the International AIDS Society 24: e25788. Esra, Rachel, Mpho Mmelesi, Akeem T. Ketlogetswe, Timothy M. Wolock, Adam Howes, Tlotlo Nong, Matshelo Tina Matlhaga, Siphiwe Ratladi, Dinah Ramaabya, and Jeffrey W. Imai-Eaton. 2024. “Improved Indicators for Subnational Unmet Antiretroviral Therapy Need in the Health System: Updates to the Naomi Model in 2023.” Journal of Acquired Immune Deficiency Syndromes 95 (1S): e24–33. https://doi.org/10.1097/QAI.0000000000003324. Grieve, Richard, Youqi Yang, Sam Abbott, Giridhara R Babu, Malay Bhattacharyya, Natalie Dean, Stephen Evans, et al. 2023.
“The Importance of Investing in Data, Models, Experiments, Team Science, and Public Trust to Help Policymakers Prepare for the Next Pandemic.” PLOS Global Public Health 3 (11): e0002601. Osgood-Zimmerman, Aaron, and Jon Wakefield. 2023. “A Statistical Review of Template Model Builder: A Flexible Tool for Spatial Modelling.” International Statistical Review 91 (2): 318–42. Rue, Havard. 2023. “‘R-INLA‘ Project - FAQ.” https://www.r-inla.org/faq. UNAIDS. 2021b. “Global AIDS strategy 2021–2026. End inequalities. End AIDS.” ———. 2023b. “The path that ends AIDS: UNAIDS Global AIDS Update 2023.” https://www.unaids.org/en/resources/documents/2023/global-aids-update-2023. Wolock, Timothy M, Seth Flaxman, Kathryn A Risher, Tawanda Dadirai, Simon Gregson, and Jeffrey W Eaton. 2021. “Evaluating distributional regression strategies for modelling self-reported sexual age-mixing.” Edited by Eduardo Franco, Talía Malagón, and Adam Akullian. eLife 10 (June): e68318. https://doi.org/10.7554/eLife.68318. Wood, Simon N. 2020. “Simplified integrated nested Laplace approximation.” Biometrika 107 (1): 223–30. "],["models-for-areal-spatial-structure.html", "A Models for areal spatial structure A.1 Comparison of AGHQ to NUTS A.2 Lengthscale prior sensitivity A.3 Simulation study A.4 HIV study", " A Models for areal spatial structure A.1 Comparison of AGHQ to NUTS Figure A.1: A comparison of time taken to fit AGHQ via aghq as compared with NUTS via tmbstan for each inferential model. For the models run using NUTS via tmbstan there was significant variation in time taken depending on initial random seed. As such, these timings and more broadly the inferences obtained from NUTS in Appendix A.1 should be interpreted with appropriate skepticism. Figure A.2: A comparison of the posterior means and standard deviations obtained with AGHQ via aghq as compared with NUTS via tmbstan fitting an IID inferential model to IID synthetic data on the grid geometry (Panel 4.6E).
For NUTS, the minimum ESS was 1686, and the maximum value of the potential scale reduction factor was 1.00. Figure A.3: A comparison of the posterior means and standard deviations obtained with AGHQ via aghq as compared with NUTS via tmbstan fitting a Besag inferential model to IID synthetic data on the grid geometry (Panel 4.6E). For NUTS, the minimum ESS was 1056, and the maximum value of the potential scale reduction factor was 1.00. Figure A.4: A comparison of the posterior means and standard deviations obtained with AGHQ via aghq as compared with NUTS via tmbstan fitting a BYM2 inferential model to IID synthetic data on the grid geometry (Panel 4.6E). For NUTS, the minimum ESS was 35, and the maximum value of the potential scale reduction factor was 1.06. Figure A.5: A comparison of the posterior means and standard deviations obtained with AGHQ via aghq as compared with NUTS via tmbstan fitting a FCK inferential model to IID synthetic data on the grid geometry (Panel 4.6E). For NUTS, the minimum ESS was 355, and the maximum value of the potential scale reduction factor was 1.01. Figure A.6: A comparison of the posterior means and standard deviations obtained with AGHQ via aghq as compared with NUTS via tmbstan fitting a CK inferential model to IID synthetic data on the grid geometry (Panel 4.6E). For NUTS, the minimum ESS was 1471, and the maximum value of the potential scale reduction factor was 1.00. Figure A.7: A comparison of the posterior means and standard deviations obtained with AGHQ via aghq as compared with NUTS via tmbstan fitting a FIK inferential model to IID synthetic data on the grid geometry (Panel 4.6E). For NUTS, the minimum ESS was 289, and the maximum value of the potential scale reduction factor was 1.01. Figure A.8: A comparison of the posterior means and standard deviations obtained with AGHQ via aghq as compared with NUTS via tmbstan fitting an IK inferential model to IID synthetic data on the grid geometry (Panel 4.6E).
For NUTS, the minimum ESS was 1623, and the maximum value of the potential scale reduction factor was 1.00. A.2 Lengthscale prior sensitivity Table A.1: Six lengthscale prior distributions were considered for use in the simulation (Section 4.3) and HIV prevalence (Section 4.4) studies.

| Description | Prior | Additional details |
|---|---|---|
| Gamma | \\(l \\sim \\text{Gamma}(1, 1)\\) | \\(-\\) |
| Geometry-informed inverse-gamma | \\(l \\sim \\text{IG}(a, b)\\) | The parameters \\(a\\) and \\(b\\) were chosen such that 5% of the prior mass was below the 5% quantile, and 5% above the 95% quantile, of distance between points (Betancourt 2017) |
| Geometry-informed normal | \\(l \\sim \\mathcal{N}^{+}(0, \\sigma)\\) | The parameter \\(\\sigma\\) was set to one third of the difference between the minimum and maximum distance between points (Betancourt 2017) |
| Log-normal | \\(l \\sim \\text{Log-normal}(0, 1)\\) | \\(-\\) |
| Non-informative | \\(p(l) = 1\\) | This is an improper prior in that it does not integrate to one |
| Oracle normal | \\(l \\sim \\mathcal{N}^{+}(2.5, 1)\\) | The mean of this prior was set to the true value of the lengthscale |

Figure A.9: The probability density for each lengthscale prior distribution as given in Table A.1. Figure A.10: Lengthscale posterior distributions obtained using NUTS to fit a centroid kernel model to integrated kernel data. The true value, 2.5, is shown as a dashed vertical line. Six different lengthscale prior distributions were considered as given in Table A.1. The geometry used was the grid (Panel 4.6E). A.3 Simulation study A.3.1 Lengthscale recovery Figure A.11: The lengthscale posterior mean and 95% credible interval obtained using the centroid kernel model on integrated kernel data for the first 40 simulation replicates on each geometry. The true lengthscale, and lengthscale obtained using the heuristic method of N. Best et al. (1999), are shown as dashed horizontal lines.
A.3.2 BYM2 proportion Figure A.12: The BYM2 proportion parameter posterior mean and 95% credible interval obtained for the first 40 simulation replicates for the realistic geometries. When the simulated data is IID, the BYM2 proportion parameter is in the majority of cases below 0.5, indicating that the noise was inferred to be mostly IID (spatially unstructured). When the simulated data is either Besag or IK, the BYM2 proportion parameter is in the majority of cases above 0.5, indicating that the noise was inferred to be mostly Besag (spatially structured). A.3.3 Mean squared error Table A.2: The average mean squared error (MSE) of each inferential model in estimating \\(\\rho\\), under different simulation and geometry settings. Entries for FCK and CK on geometry 2 are empty because these models were undefined in that case. The values in this table are expressed in thousandths.
Simulation model Inferential model IID Besag BYM2 FCK CK FIK IK
1
IID 8.20 7.56 7.99 7.84 7.67 7.90 7.61
Besag 7.31 6.39 7.15 7.31 6.76 7.27 6.63
IK 7.44 6.30 7.27 7.74 6.83 7.58 6.62
2
IID 8.43 7.62 8.23 - - 7.99 8.32
Besag 7.56 6.58 7.39 - - 7.25 6.42
IK 7.16 5.91 6.95 - - 6.91 4.95
3
IID 8.23 7.72 8.19 8.09 7.85 8.05 7.75
Besag 7.73 6.71 7.63 7.78 7.01 7.55 6.67
IK 7.56 6.24 7.30 7.75 6.78 7.53 6.18
4
IID 8.71 8.03 8.49 8.53 8.31 8.35 8.12
Besag 7.48 6.65 7.34 7.55 7.08 7.44 6.89
IK 7.38 6.11 7.12 7.60 6.71 7.45 6.36
Grid
IID 7.63 7.65 7.66 7.72 7.79 7.89 7.84
Besag 4.06 3.29 3.77 3.94 3.36 3.71 3.32
IK 5.97 4.30 4.81 4.98 3.50 4.47 3.41
Côte d’Ivoire
IID 7.72 7.78 7.74 7.89 7.99 8.08 7.96
Besag 4.88 3.96 4.45 4.62 4.07 4.36 4.00
IK 5.61 3.96 4.50 4.73 3.18 4.19 3.10
Texas
IID 7.63 7.71 7.65 8.59 8.05 8.60 7.80
Besag 5.13 4.05 4.62 4.60 4.36 4.34 4.26
IK 6.29 4.51 5.06 4.44 3.45 4.04 3.37
A.3.4 Continuous ranked probability score Table A.3: The average continuous ranked probability score (CRPS) of each inferential model in estimating \\(\\rho\\), under different simulation and geometry settings. Entries for FCK and CK on geometry 2 are empty because the model was undefined in that case. The units used in this table are thousandths.
Simulation model Inferential model IID Besag BYM2 FCK CK FIK IK
1
IID 32.6 33.9 32.7 32.1 33.4 32.3 33.5
Besag 30.7 29.5 30.6 30.7 30.0 30.7 29.9
IK 31.2 29.1 31.1 32.1 30.1 31.7 29.7
2
IID 33.1 33.4 32.8 - - 32.7 39.9
Besag 32.0 30.6 31.6 - - 31.2 33.2
IK 28.9 26.2 28.6 - - 28.4 24.2
3
IID 32.9 33.8 33.1 32.4 33.5 32.6 35.0
Besag 32.9 31.1 32.4 33.0 31.5 32.2 31.6
IK 30.7 28.1 30.3 31.4 29.0 30.8 27.9
4
IID 34.3 34.9 34.2 34.2 34.8 33.8 34.7
Besag 32.3 31.2 31.9 32.1 31.8 31.9 31.7
IK 29.8 27.3 29.3 30.5 28.3 29.9 27.7
Grid
IID 32.4 34.2 32.5 33.1 34.0 35.1 35.1
Besag 24.6 22.7 23.3 23.4 23.8 23.5 24.1
IK 28.7 23.7 24.6 24.4 21.1 23.1 21.0
Côte d’Ivoire
IID 32.4 34.5 32.5 33.7 34.8 35.8 35.6
Besag 26.5 24.4 24.9 25.3 25.9 25.3 26.0
IK 27.7 22.2 23.4 23.6 19.6 22.2 19.6
Texas
IID 32.1 34.0 32.3 39.2 35.7 40.0 35.6
Besag 27.3 24.7 25.3 27.1 27.5 26.9 27.0
IK 29.7 24.5 25.4 23.0 20.8 22.3 20.9
Figure A.13: The mean CRPS with 95% credible interval in estimating \\(\\rho\\) using each inferential model and simulation model on the first vignette geometry (Panel 4.6A). Credible intervals were generated using 1.96 times the standard error. Figure A.14: The mean CRPS with 95% credible interval in estimating \\(\\rho\\) using each inferential model and simulation model on the second vignette geometry (Panel 4.6B). Credible intervals were generated using 1.96 times the standard error. Figure A.15: The mean CRPS with 95% credible interval in estimating \\(\\rho\\) using each inferential model and simulation model on the third vignette geometry (Panel 4.6C). Credible intervals were generated using 1.96 times the standard error. Figure A.16: The mean CRPS with 95% credible interval in estimating \\(\\rho\\) using each inferential model and simulation model on the fourth vignette geometry (Panel 4.6D). Credible intervals were generated using 1.96 times the standard error.
Figure A.17: Choropleths showing the mean value of the CRPS in estimating \\(\\rho\\), under each inferential model and simulation model, at each area of the first vignette geometry (Panel 4.6A). Figure A.18: Choropleths showing the mean value of the CRPS in estimating \\(\\rho\\), under each inferential model and simulation model, at each area of the second vignette geometry (Panel 4.6B). Figure A.19: Choropleths showing the mean value of the CRPS in estimating \\(\\rho\\), under each inferential model and simulation model, at each area of the third vignette geometry (Panel 4.6C). Figure A.20: Choropleths showing the mean value of the CRPS in estimating \\(\\rho\\), under each inferential model and simulation model, at each area of the fourth vignette geometry (Panel 4.6D). Figure A.21: Choropleths showing the mean value of the CRPS in estimating \\(\\rho\\), under each inferential model and simulation model, at each area of the grid geometry (Panel 4.6E). Figure A.22: Choropleths showing the mean value of the CRPS in estimating \\(\\rho\\), under each inferential model and simulation model, at each area of the Côte d’Ivoire geometry (Panel 4.6F). Figure A.23: Choropleths showing the mean value of the CRPS in estimating \\(\\rho\\), under each inferential model and simulation model, at each area of the Texas geometry (Panel 4.6G). A.3.5 Calibration Figure A.24: Probability integral transform histograms and empirical cumulative distribution function difference plots for \\(\\rho\\), under each inferential model and simulation model, for the first vignette geometry (Panel 4.6A). Figure A.25: Probability integral transform histograms and empirical cumulative distribution function difference plots for \\(\\rho\\), under each inferential model and simulation model, for the second vignette geometry (Panel 4.6B). 
Figure A.26: Probability integral transform histograms and empirical cumulative distribution function difference plots for \\(\\rho\\), under each inferential model and simulation model, for the third vignette geometry (Panel 4.6C). Figure A.27: Probability integral transform histograms and empirical cumulative distribution function difference plots for \\(\\rho\\), under each inferential model and simulation model, for the fourth vignette geometry (Panel 4.6D). Figure A.28: Probability integral transform histograms and empirical cumulative distribution function difference plots for \\(\\rho\\), under each inferential model and simulation model, for the grid geometry (Panel 4.6E). Figure A.29: Probability integral transform histograms and empirical cumulative distribution function difference plots for \\(\\rho\\), under each inferential model and simulation model, for the Côte d’Ivoire geometry (Panel 4.6F). Figure A.30: Probability integral transform histograms and empirical cumulative distribution function difference plots for \\(\\rho\\), under each inferential model and simulation model, for the Texas geometry (Panel 4.6G). A.4 HIV study A.4.1 Lengthscale Figure A.31: The lengthscale hyperparameter prior and posterior distributions for each of the four considered PHIA surveys (Table 4.3), using both the CK and IK inferential models. A.4.2 BYM2 proportion Figure A.32: The BYM2 proportion hyperparameter prior and posterior distributions for each of the four considered PHIA surveys (Table 4.3). A value of zero corresponds to IID noise. A value of one corresponds to Besag noise. For each survey, excluding the Côte d’Ivoire 2017 PHIA, the posterior distribution for the BYM2 proportion is concentrated towards a value of one. This result can be interpreted as suggesting that the variation in HIV prevalence from these surveys is spatially structured. 
A.4.3 Estimates Figure A.33: The HIV prevalence posterior mean and 95% credible interval for each area of Côte d’Ivoire, based on the 2017 PHIA survey. Direct estimates obtained from the survey are as shown in Panel 4.10A. Figure A.34: The HIV prevalence posterior mean and 95% credible interval for each area of Malawi, based on the 2016 PHIA survey. Direct estimates obtained from the survey are as shown in Panel 4.10B. Figure A.35: The HIV prevalence posterior mean and 95% credible interval for each area of Tanzania, based on the 2017 PHIA survey. Direct estimates obtained from the survey are as shown in Panel 4.10C. Figure A.36: The HIV prevalence posterior mean and 95% credible interval for each area of Zimbabwe, based on the 2016 PHIA survey. Direct estimates obtained from the survey are as shown in Panel 4.10D. A.4.4 Cross-validation A.4.4.1 Mean squared error Table A.4: The mean pointwise leave-one-out and spatial leave-one-out MSE in estimating \\(\\rho_i\\), with standard errors, for each inferential model across the four considered PHIA surveys. The units used in this table are thousandths.
PHIA survey Mean squared error (units: 1/1000) IID Besag BYM2 FCK CK FIK IK
LOO
Côte d’Ivoire, 2017 0.21 0.22 0.20 0.21 0.19 0.21 0.20
Malawi, 2016 7.10 2.39 2.59 3.59 3.70 2.43 2.54
Tanzania, 2017 1.66 1.14 1.43 0.95 0.65 0.78 0.66
Zimbabwe, 2016 4.76 2.51 2.54 2.51 1.88 2.15 1.83
SLOO
Côte d’Ivoire, 2017 0.20 0.22 0.21 0.24 0.25 0.26 0.25
Malawi, 2016 7.13 2.41 3.32 8.22 7.95 7.05 6.70
Tanzania, 2017 1.65 1.09 2.46 1.86 2.80 1.86 2.59
Zimbabwe, 2016 4.73 2.49 3.44 3.95 3.36 3.93 3.42
A.4.4.2 Continuous ranked probability score Figure A.37: The pointwise CRPS in estimating \\(\\rho_i\\) using either leave-one-out or spatial leave-one-out cross-validation, with mean and 95% credible interval, for the Côte d’Ivoire 2017 PHIA survey (Panel 4.10A). Figure A.38: The pointwise CRPS in estimating \\(\\rho_i\\) using either leave-one-out or spatial leave-one-out cross-validation, with mean and 95% credible interval, for the Malawi 2016 PHIA survey (Panel 4.10B). Figure A.39: The pointwise CRPS in estimating \\(\\rho_i\\) using either leave-one-out or spatial leave-one-out cross-validation, with mean and 95% credible interval, for the Tanzania 2017 PHIA survey (Panel 4.10C). Figure A.40: The pointwise CRPS in estimating \\(\\rho_i\\) using either leave-one-out or spatial leave-one-out cross-validation, with mean and 95% credible interval, for the Zimbabwe 2016 PHIA survey (Panel 4.10D). Figure A.41: Choropleth showing the pointwise CRPS in estimating \\(\\rho_i\\) using either leave-one-out or spatial leave-one-out cross-validation for the Côte d’Ivoire 2017 PHIA survey (Panel 4.10A). Figure A.42: Choropleth showing the pointwise CRPS in estimating \\(\\rho_i\\) using either leave-one-out or spatial leave-one-out cross-validation for the Malawi 2016 PHIA survey (Panel 4.10B). Figure A.43: Choropleth showing the pointwise CRPS in estimating \\(\\rho_i\\) using either leave-one-out or spatial leave-one-out cross-validation for the Tanzania 2017 PHIA survey (Panel 4.10C).
Figure A.44: Choropleth showing the pointwise CRPS in estimating \\(\\rho_i\\) using either leave-one-out or spatial leave-one-out cross-validation for the Zimbabwe 2016 PHIA survey (Panel 4.10D). Figure A.45: Probability integral transform histograms and empirical cumulative distribution function difference plots in estimating \\(\\rho\\) for the Côte d’Ivoire 2017 PHIA survey (Panel 4.10A). Figure A.46: Probability integral transform histograms and empirical cumulative distribution function difference plots in estimating \\(\\rho\\) for the Malawi 2016 PHIA survey (Panel 4.10B). Figure A.47: Probability integral transform histograms and empirical cumulative distribution function difference plots in estimating \\(\\rho\\) for the Tanzania 2017 PHIA survey (Panel 4.10C). Figure A.48: Probability integral transform histograms and empirical cumulative distribution function difference plots in estimating \\(\\rho\\) for the Zimbabwe 2016 PHIA survey (Panel 4.10D). References Best, N, N Arnold, A Thomas, L Waller, and E Conlon. 1999. “Bayesian models for spatially correlated disease and exposure data.” In Bayesian Statistics 6: Proceedings of the Sixth Valencia International Meeting, 6:131. Oxford University Press. Betancourt, Michael. 2017. “Robust Gaussian processes in Stan.” https://betanalpha.github.io/assets/case\\%5Fstudies/gp\\%5Fpart3/part3.html. "],["a-model-for-risk-group-proportions.html", "B A model for risk group proportions B.1 The Global AIDS Strategy B.2 Household survey data B.3 Spatial analysis levels B.4 Survey questions and risk group allocation B.5 Additional figures", " B A model for risk group proportions B.1 The Global AIDS Strategy Table B.1: Prioritisation strata for AGYW given by UNAIDS (2021b) based on HIV incidence in the general population and behavioural risk.
Prioritisation strata Criterion
Low 0.3-1.0% incidence and low-risk behaviour, or <0.3% incidence and high-risk behaviour
Moderate 1.0-3.0% incidence and low-risk behaviour, or 0.3-1.0% incidence and high-risk behaviour
High 1.0-3.0% incidence and high-risk behaviour
Very high >3.0% incidence
Table B.2: Commitments recommended by UNAIDS (2021b) to be met for each HIV intervention, given in terms of the proportion of the AGYW prioritisation strata reached. The symbol “-” represents no commitment.
Intervention Low Moderate High Very High
Condoms and lube for those with non-regular partner(s), unknown STI status, not on PrEP 50% 70% 95% 95%
STI screening and treatment 10% 10% 80% 80%
Access to PEP - - 50% 90%
PrEP use - 5% 50% 50%
Economic empowerment - - 20% 20%
B.2 Household survey data #ejwvuleznx 
.gt_sourcenotes { color: #333333; background-color: #FFFFFF; border-bottom-style: none; border-bottom-width: 2px; border-bottom-color: #D3D3D3; border-left-style: none; border-left-width: 2px; border-left-color: #D3D3D3; border-right-style: none; border-right-width: 2px; border-right-color: #D3D3D3; } #ejwvuleznx .gt_sourcenote { font-size: 90%; padding-top: 4px; padding-bottom: 4px; padding-left: 5px; padding-right: 5px; } #ejwvuleznx .gt_left { text-align: left; } #ejwvuleznx .gt_center { text-align: center; } #ejwvuleznx .gt_right { text-align: right; font-variant-numeric: tabular-nums; } #ejwvuleznx .gt_font_normal { font-weight: normal; } #ejwvuleznx .gt_font_bold { font-weight: bold; } #ejwvuleznx .gt_font_italic { font-style: italic; } #ejwvuleznx .gt_super { font-size: 65%; } #ejwvuleznx .gt_footnote_marks { font-size: 75%; vertical-align: 0.4em; position: initial; } #ejwvuleznx .gt_asterisk { font-size: 100%; vertical-align: 0; } #ejwvuleznx .gt_indent_1 { text-indent: 5px; } #ejwvuleznx .gt_indent_2 { text-indent: 10px; } #ejwvuleznx .gt_indent_3 { text-indent: 15px; } #ejwvuleznx .gt_indent_4 { text-indent: 20px; } #ejwvuleznx .gt_indent_5 { text-indent: 25px; } Table B.3: The sample size by age group for each included survey in the analysis. The column “TS question” refers to whether or not the survey included a specific question about transactional sex (TS). 
Type Year TS question Sample size 15-19 20-24 25-29 Total Botswana BAIS 2013 ✓ 557 588 649 1794 Cameroon DHS 2004 ✗ 2678 2210 1732 6620 DHS 2011 ✗ 3588 3115 2656 9359 PHIA 2017 ✗ 2140 1923 1851 5914 DHS 2018 ✓ 3349 2463 2345 8157 Kenya DHS 2003 ✗ 1819 1709 1391 4919 DHS 2008 ✗ 1767 1743 1420 4930 DHS 2014 ✗ 2861 2534 2858 8253 Lesotho DHS 2004 ✗ 1761 1456 1026 4243 DHS 2009 ✗ 1834 1545 1195 4574 DHS 2014 ✗ 1537 1293 1069 3899 PHIA 2017 ✓ 1156 1202 1054 3412 Mozambique AIS 2009 ✗ 1031 1106 987 3124 DHS 2011 ✗ 3065 2468 2340 7873 AIS 2015 ✗ 1554 1390 1080 4024 Malawi DHS 2000 ✗ 2914 2998 2358 8270 DHS 2004 ✗ 2407 2823 2135 7365 DHS 2010 ✗ 5032 4387 4309 13728 DHS 2015 ✓ 5273 5094 3976 14343 PHIA 2016 ✓ 1646 1934 1511 5091 Namibia DHS 2000 ✗ 1428 1313 1099 3840 DHS 2006 ✗ 2203 1870 1544 5617 DHS 2013 ✗ 1852 1709 1482 5043 PHIA 2017 ✓ 1491 1525 1370 4386 Eswatini DHS 2006 ✗ 1265 1027 731 3023 PHIA 2017 ✗ 1031 895 811 2737 Tanzania AIS 2003 ✗ 1466 1377 1270 4113 AIS 2007 ✗ 2137 1676 1509 5322 DHS 2010 ✗ 2221 1860 1613 5694 AIS 2012 ✗ 2474 1923 1815 6212 Uganda DHS 2000 ✗ 1687 1541 1326 4554 DHS 2006 ✗ 1948 1661 1406 5015 AIS 2011 ✗ 2451 2164 1921 6536 DHS 2011 ✗ 2025 1664 1614 5303 DHS 2016 ✓ 4276 3782 3014 11072 PHIA 2016 ✗ 3289 3059 2574 8922 South Africa DHS 2016 ✓ 1505 1408 1397 4310 Zambia DHS 2007 ✗ 1598 1405 1373 4376 DHS 2013 ✗ 3685 3036 2789 9510 PHIA 2016 ✓ 2120 2045 1619 5784 DHS 2018 ✓ 3112 2687 2166 7965 Zimbabwe DHS 1999 ✗ 1468 1230 1011 3709 DHS 2005 ✗ 2128 1943 1438 5509 DHS 2010 ✗ 1966 1796 1680 5442 DHS 2015 ✓ 2154 1779 1647 5580 PHIA 2016 ✓ 2114 1817 1573 5504 Total 103063 92173 79734 274970 Table B.4: All of the household surveys that were excluded from the risk group model in Section 5.3. Survey Reason for exclusion Mozambique 2003 DHS No GPS coordinates available to place survey clusters within districts. Tanzania 2015 DHS Insufficient sexual behaviour questions. Uganda 2004 AIS Unable to download region boundaries. 
Zambia 2002 DHS No GPS coordinates available to place survey clusters within districts. B.3 Spatial analysis levels Table B.5: The number of areas and analysis level for each country that was used in the analysis. Country Number of areas Analysis level Botswana 27 Health district Cameroon 58 Department Kenya 47 County Lesotho 10 District Mozambique 161 District Malawi 33 Health district and cities Namibia 38 District Eswatini 4 Region Tanzania 195 District Uganda 136 District South Africa 52 District Zambia 116 District Zimbabwe 63 District B.4 Survey questions and risk group allocation Table B.6: The behavioural survey questions included in AIDS Indicator Survey (AIS) and Demographic and Health Surveys (DHS) used to determine AGYW risk group membership. Variable(s) Description \\(\\texttt{v501}\\) Current marital status of the respondent. \\(\\texttt{v529}\\) Computed time since last sexual intercourse. \\(\\texttt{v531}\\) Age at first sexual intercourse–imputed. \\(\\texttt{v766b}\\) Number of sexual partners during the last 12 months (including husband). \\(\\texttt{v767[a, b, c]}\\) Relationship with last three sexual partners. Options are: spouse, boyfriend not living with respondent, other friend, casual acquaintance, relative, commercial sex worker, live-in partner, other. \\(\\texttt{v791a}\\) Had sex in return for gifts, cash or anything else in the past 12 months. (Asked only to women 15-24 who are not in a union.) Table B.7: The behavioural survey questions included in Population-Based HIV Impact Assessment (PHIA) surveys used to determine AGYW risk group membership. Variable(s) Description \\(\\texttt{part12monum}\\) Number of sexual partners during the last 12 months (including husband). \\(\\texttt{part12modkr}\\) Reason for leaving blank. \\(\\texttt{partlivew[1, 2, 3]}\\) Does the person you had sex with live in this household? \\(\\texttt{partrelation[1, 2, 3]}\\) Relationship with last three sexual partners. 
Options are: husband, live-in partner, partner (not living with), ex-spouse/partner, friend/acquaintance, sex worker, sex worker client, stranger, other, don’t know, refused. \\(\\texttt{sellsx12mo}\\) Had sex for money and/or gifts in the last 12 months. \\(\\texttt{buysx12mo}\\) Paid money or given gifts for sex in the last 12 months. B.5 Additional figures Figure B.1: The proportion of posterior variance explained by each random effect, calculated as a ratio of the random effect variance posterior mean to the sum of all random effect variance posterior means. To allow calculation of this metric by country, the model was run for each country individually. Figure B.2: For the 20-24 and 25-29 age groups, the proportion of AGYW in the one cohabiting partner and non-regular or multiple partner(s) risk groups was bimodal. References UNAIDS. 2021b. “Global AIDS strategy 2021–2026. End inequalities. End AIDS.” "],["naomi-aghq-appendix.html", "C Fast approximate Bayesian inference C.1 Epilepsy example C.2 Loa loa example C.3 AGHQ with Laplace marginals algorithm C.4 Simplified Naomi model description C.5 NUTS convergence and suitability C.6 Use of PCA-AGHQ C.7 Inference comparison", " C Fast approximate Bayesian inference C.1 Epilepsy example C.1.1 TMB C++ template
// epil.cpp
#include <TMB.hpp>

template <class Type>
Type objective_function<Type>::operator()()
{
  DATA_INTEGER(N);
  DATA_INTEGER(J);
  DATA_INTEGER(K);
  DATA_MATRIX(X);
  DATA_VECTOR(y);
  DATA_MATRIX(E); // Epsilon matrix

  PARAMETER_VECTOR(beta);
  PARAMETER_VECTOR(epsilon);
  PARAMETER_VECTOR(nu);
  PARAMETER(l_tau_epsilon);
  PARAMETER(l_tau_nu);

  Type tau_epsilon = exp(l_tau_epsilon);
  Type tau_nu = exp(l_tau_nu);
  Type sigma_epsilon = sqrt(1 / tau_epsilon);
  Type sigma_nu = sqrt(1 / tau_nu);
  vector<Type> eta(X * beta + nu + E * epsilon);
  vector<Type> lambda(exp(eta));

  Type nll;
  nll = Type(0.0);

  // Note: dgamma() is parameterised as (shape, scale)
  // R-INLA is parameterised as (shape, rate)
  nll -= dlgamma(l_tau_epsilon, Type(0.001), Type(1.0 / 0.001), true);
  nll -= dlgamma(l_tau_nu, Type(0.001), Type(1.0 / 0.001), true);
  nll -= dnorm(epsilon, Type(0), sigma_epsilon, true).sum();
  nll -= dnorm(nu, Type(0), sigma_nu, true).sum();
  nll -= dnorm(beta, Type(0), Type(100), true).sum();
  nll -= dpois(y, lambda, true).sum();

  ADREPORT(tau_epsilon);
  ADREPORT(tau_nu);

  return(nll);
}
C.1.2 Modified TMB C++ template
// epil_modified.cpp
#include <TMB.hpp>

template <class Type>
Type objective_function<Type>::operator()()
{
  DATA_INTEGER(N);
  DATA_INTEGER(J);
  DATA_INTEGER(K);
  DATA_MATRIX(X);
  DATA_VECTOR(y);
  DATA_MATRIX(E); // Epsilon matrix

  DATA_IVECTOR(x_starts); // Start index of each subvector of x
  DATA_IVECTOR(x_lengths); // Length of each subvector of x
  DATA_INTEGER(i); // Index i

  PARAMETER(x_i);
  PARAMETER_VECTOR(x_minus_i);

  vector<Type> x(301);
  int k = 0;
  for (int j = 0; j < 301; j++) {
    if (j + 1 == i) { // +1 because C++ does zero-indexing
      x(j) = x_i;
    } else {
      x(j) = x_minus_i(k);
      k++;
    }
  }

  vector<Type> beta = x.segment(x_starts(0), x_lengths(0));
  vector<Type> epsilon = x.segment(x_starts(1), x_lengths(1));
  vector<Type> nu = x.segment(x_starts(2), x_lengths(2));

  PARAMETER(l_tau_epsilon);
  PARAMETER(l_tau_nu);

  Type tau_epsilon = exp(l_tau_epsilon);
  Type tau_nu = exp(l_tau_nu);
  Type sigma_epsilon = sqrt(1 / tau_epsilon);
  Type sigma_nu = sqrt(1 / tau_nu);
  vector<Type> eta(X * beta + nu + E * epsilon);
  vector<Type> lambda(exp(eta));

  Type nll;
  nll = Type(0.0);

  // Note: dgamma() is parameterised as (shape, scale)
  // R-INLA is parameterised as (shape, rate)
  nll -= dlgamma(l_tau_epsilon, Type(0.001), Type(1.0 / 0.001), true);
  nll -= dlgamma(l_tau_nu, Type(0.001), Type(1.0 / 0.001), true);
  nll -= dnorm(epsilon, Type(0), sigma_epsilon, true).sum();
  nll -= dnorm(nu, Type(0), sigma_nu, true).sum();
  nll -= dnorm(beta, Type(0), Type(100), true).sum();
  nll -= dpois(y, lambda, true).sum();

  ADREPORT(tau_epsilon);
  ADREPORT(tau_nu);

  return(nll);
}
C.1.3 Stan C++ template
// epil.stan
data {
  int<lower=0> N; // Number of patients
  int<lower=0> J; // Number of clinic visits
  int<lower=0> K; // Number of predictors (inc. intercept)
  matrix[N * J, K] X; // Design matrix
  int<lower=0> y[N * J]; // Outcome variable
  matrix[N * J, N] E; // Epsilon matrix
}

parameters {
  vector[K] beta; // Vector of coefficients
  vector[N] epsilon; // Patient specific errors
  vector[N * J] nu; // Patient-visit errors
  real<lower=0> tau_epsilon; // Precision of epsilon
  real<lower=0> tau_nu; // Precision of nu
}

transformed parameters {
  vector[N * J] eta = X * beta + nu + E * epsilon;
}

model {
  beta ~ normal(0, 100);
  tau_epsilon ~ gamma(0.001, 0.001);
  tau_nu ~ gamma(0.001, 0.001);
  epsilon ~ normal(0, sqrt(1 / tau_epsilon));
  nu ~ normal(0, sqrt(1 / tau_nu));
  y ~ poisson_log(eta);
}
C.1.4 NUTS convergence and suitability C.1.4.1 tmbstan Figure C.1: Traceplots for the tmbstan parameters with the lowest ESS and highest potential scale reduction factor. These were l_tau_nu (an \\(\\text{ESS}\\) of 377) and beta[3] (an \\(\\hat R\\) of 1.006). C.1.4.2 rstan Figure C.2: Traceplots for the rstan parameters with the lowest ESS and highest potential scale reduction factor. These were tau_nu (an \\(\\text{ESS}\\) of 437) and tau_nu (an \\(\\hat R\\) of 1.009). Rather than plotting the traceplot for tau_nu twice, the parameter epsilon[18] is included, which had the second highest \\(\\hat R\\) of 1.008. C.2 Loa loa example C.2.1 NUTS convergence and suitability Figure C.3: Traceplots for the parameters with the lowest ESS and highest potential scale reduction factor for the Loa loa ELGM example. C.2.2 Inference comparison Figure C.4: Relative difference between the Gaussian and Laplace marginal posterior means and standard deviations to NUTS results at each \\(u(s_i), v(s_i): i \\in [190]\\). Absolute differences are in Figure 6.14. C.3 AGHQ with Laplace marginals algorithm This section provides the INLA-like algorithm for AGHQ with Laplace marginals used in this thesis. 
The algorithm for AGHQ with Gaussian marginals used in this thesis is as given in Stringer, Brown, and Stafford (2022), and implemented in the aghq package. Calculate the mode, Hessian at the mode, lower Cholesky, and Laplace approximation \\[\\begin{align} \\hat{\\boldsymbol{\\mathbf{\\theta}}} &= \\arg \\max_{\\boldsymbol{\\mathbf{\\theta}}} {\\tilde p_\\texttt{LA}(\\boldsymbol{\\mathbf{\\theta}}, \\mathbf{y})}, \\\\ \\hat{\\mathbf{H}} &= - \\frac{\\partial^2}{\\partial \\boldsymbol{\\mathbf{\\theta}} \\partial \\boldsymbol{\\mathbf{\\theta}}^\\top} \\log \\tilde p_\\texttt{LA}(\\boldsymbol{\\mathbf{\\theta}}, \\mathbf{y}) \\rvert_{\\boldsymbol{\\mathbf{\\theta}} = \\hat{\\boldsymbol{\\mathbf{\\theta}}}}, \\\\ \\hat{\\mathbf{H}}^{-1} &= \\hat{\\mathbf{L}} \\hat{\\mathbf{L}}^\\top, \\\\ \\tilde p_\\texttt{LA}(\\boldsymbol{\\mathbf{\\theta}}, \\mathbf{y}) &= \\frac{p(\\mathbf{y}, \\mathbf{x}, \\boldsymbol{\\mathbf{\\theta}})}{\\tilde p_\\texttt{G}(\\mathbf{x} \\, | \\, \\boldsymbol{\\mathbf{\\theta}}, \\mathbf{y})} \\Big\\rvert_{\\mathbf{x} = \\hat{\\mathbf{x}}(\\boldsymbol{\\mathbf{\\theta}})}, \\end{align}\\] where \\(\\tilde p_\\texttt{G}(\\mathbf{x} \\, | \\, \\boldsymbol{\\mathbf{\\theta}}, \\mathbf{y}) = \\mathcal{N}(\\mathbf{x} \\, | \\, \\hat{\\mathbf{x}}(\\boldsymbol{\\mathbf{\\theta}}), \\hat{\\mathbf{H}}(\\boldsymbol{\\mathbf{\\theta}})^{-1})\\) is a Gaussian approximation to \\(p(\\mathbf{x} \\, | \\, \\boldsymbol{\\mathbf{\\theta}}, \\mathbf{y})\\) with mode and precision matrix given by \\[\\begin{align} \\hat{\\mathbf{x}}(\\boldsymbol{\\mathbf{\\theta}}) &= \\arg \\max_\\mathbf{x} \\log p(\\mathbf{y}, \\mathbf{x}, \\boldsymbol{\\mathbf{\\theta}}), \\\\ \\hat{\\mathbf{H}}(\\boldsymbol{\\mathbf{\\theta}}) &= - \\frac{\\partial^2}{\\partial \\mathbf{x} \\partial \\mathbf{x}^\\top} \\log p(\\mathbf{y}, \\mathbf{x}, \\boldsymbol{\\mathbf{\\theta}}) \\rvert_{\\mathbf{x} = \\hat{\\mathbf{x}}(\\boldsymbol{\\mathbf{\\theta}})}. 
\\end{align}\\] Generate a set of nodes \\(\\mathbf{u} \\in \\mathcal{Q}(m, k)\\) and weights \\(\\omega: \\mathbf{u} \\to \\mathbb{R}\\) from a Gauss-Hermite quadrature rule with \\(k\\) nodes per dimension. Adapt these nodes based on the mode and lower Cholesky via \\(\\boldsymbol{\\mathbf{\\theta}}(\\mathbf{u}) = \\hat{\\boldsymbol{\\mathbf{\\theta}}} + \\hat{\\mathbf{L}} \\mathbf{u}\\). Use this quadrature rule to calculate the normalising constant \\(\\tilde p_{\\texttt{AQ}}(\\mathbf{y})\\) as follows \\[\\begin{equation} \\tilde p_{\\texttt{AQ}}(\\mathbf{y}) = \\sum_{\\mathbf{u} \\in \\mathcal{Q}(m, k)} \\tilde p_\\texttt{LA}(\\boldsymbol{\\mathbf{\\theta}}(\\mathbf{u}), \\mathbf{y}) \\omega(\\mathbf{u}). \\tag{C.1} \\end{equation}\\] For \\(i \\in [N]\\) generate \\(l\\) nodes \\(x_i(\\mathbf{v})\\) via a Gauss-Hermite quadrature rule \\(\\mathbf{v} \\in \\mathcal{Q}(1, l)\\) adapted based on the mode \\(\\hat{\\mathbf{x}}(\\boldsymbol{\\mathbf{\\theta}})_i\\) and standard deviation \\(\\sqrt{\\text{diag}[\\hat{\\mathbf{H}}(\\boldsymbol{\\mathbf{\\theta}})^{-1}]_i}\\) of the Gaussian marginal. A value of \\(l \\geq 4\\) is recommended to enable B-spline interpolation. 
For \\(x_i \\in \\{ x_i(\\mathbf{v}) \\}_{\\mathbf{v} \\in \\mathcal{Q}(1, l)}\\) and \\(\\boldsymbol{\\mathbf{\\theta}} \\in \\{ \\boldsymbol{\\mathbf{\\theta}}(\\mathbf{u}) \\}_{\\mathbf{u} \\in \\mathcal{Q}(m, k)}\\) calculate the modes and Hessians \\[\\begin{align} \\hat{\\mathbf{x}}_{-i}(x_i, \\boldsymbol{\\mathbf{\\theta}}) &= \\arg \\max_{\\mathbf{x}_{-i}} \\log p(\\mathbf{y}, x_i, \\mathbf{x}_{-i}, \\boldsymbol{\\mathbf{\\theta}}), \\\\ \\hat{\\mathbf{H}}_{-i, -i}(x_i, \\boldsymbol{\\mathbf{\\theta}}) &= - \\frac{\\partial^2}{\\partial \\mathbf{x}_{-i} \\partial \\mathbf{x}_{-i}^\\top} \\log p(\\mathbf{y}, x_i, \\mathbf{x}_{-i}, \\boldsymbol{\\mathbf{\\theta}}) \\rvert_{\\mathbf{x}_{-i} = \\hat{\\mathbf{x}}_{-i}(x_i, \\boldsymbol{\\mathbf{\\theta}})}, \\end{align}\\] where optimisation to obtain \\(\\hat{\\mathbf{x}}_{-i}(x_i, \\boldsymbol{\\mathbf{\\theta}})\\) can be initialised at \\(\\hat{\\mathbf{x}}(\\boldsymbol{\\mathbf{\\theta}})_{-i}\\). For \\(x_i \\in \\{ x_i(\\mathbf{v}) \\}_{\\mathbf{v} \\in \\mathcal{Q}(1, l)}\\) calculate \\[\\begin{equation} \\tilde p_\\texttt{AQ}(x_i \\, | \\, \\mathbf{y}) = \\frac{\\tilde p_\\texttt{LA}(x_i, \\mathbf{y})}{\\tilde p_{\\texttt{AQ}}(\\mathbf{y})}, \\tag{C.2} \\end{equation}\\] where \\[\\begin{equation} \\tilde p_\\texttt{LA}(x_i, \\mathbf{y}) = \\sum_{\\mathbf{u} \\in \\mathcal{Q}(m, k)} \\tilde p_\\texttt{LA}(x_i, \\boldsymbol{\\mathbf{\\theta}}(\\mathbf{u}), \\mathbf{y}) \\omega(\\mathbf{u}), \\end{equation}\\] and \\[\\begin{equation} \\tilde p_\\texttt{LA}(x_i, \\boldsymbol{\\mathbf{\\theta}}, \\mathbf{y}) = \\frac{p(x_i, \\mathbf{x}_{-i}, \\boldsymbol{\\mathbf{\\theta}}, \\mathbf{y})}{\\tilde p_\\texttt{G}(\\mathbf{x}_{-i} \\, | \\, x_i, \\boldsymbol{\\mathbf{\\theta}}, \\mathbf{y})} \\Big\\rvert_{\\mathbf{x}_{-i} = \\hat{\\mathbf{x}}_{-i}(x_i, \\boldsymbol{\\mathbf{\\theta}})}. 
\\end{equation}\\] Equation (C.2) can be calculated using the estimate of the evidence given in Equation (C.1), but it is more numerically accurate, and requires little extra computation, to use the estimate \\[\\begin{equation} \\tilde p_{\\texttt{AQ}}(\\mathbf{y}) = \\sum_{\\mathbf{v} \\in \\mathcal{Q}(1, l)} \\tilde p_\\texttt{LA}(x_i(\\mathbf{v}), \\mathbf{y}) \\omega(\\mathbf{v}) \\end{equation}\\] Given \\(\\{x_i(\\mathbf{v}), \\tilde p_\\texttt{AQ}(x_i(\\mathbf{v}) \\, | \\, \\mathbf{y})\\}_{\\mathbf{v} \\in \\mathcal{Q}(1, l)}\\) create a spline interpolant to each posterior marginal on the log-scale. Samples, and thereby relevant posterior marginal summaries, may be obtained using inverse transform sampling. C.4 Simplified Naomi model description This section describes the simplified version of the Naomi model (Eaton et al. 2021) in more detail. The concise \\(i\\) indexing used in Section 6.3 is replaced by a more complete \\(x, s, a\\) indexing. There are four sections: Section C.4.1 gives the process specifications, giving the terms in each structured additive predictor, along with their distributions. Section C.4.2 gives additional details about the likelihood terms not provided in Section 6.3. Section C.4.3 gives identifiability constraints used in circumstances where incomplete data is available for the country. Section C.4.4 provides details of the TMB implementation. C.4.1 Process specification Table C.1: The Naomi model can be conceptualised as having five processes. This table gives the number of latent field parameters and hyperparameters in each process, where \\(n\\) is the number of districts in the country. 
Model component Latent field Hyperparameter Section C.4.1.1 HIV prevalence \\(22 + 5n\\) 9 Section C.4.1.2 ART coverage \\(25 + 5n\\) 9 Section C.4.1.3 HIV incidence rate \\(2 + n\\) 3 Section C.4.1.4 ANC testing \\(2 + 2n\\) 2 Section C.4.1.5 ART attendance \\(n\\) 1 Total \\(51 + 14n\\) 24 C.4.1.1 HIV prevalence HIV prevalence \\(\\rho_{x, s, a} \\in [0, 1]\\) was modelled on the logit scale using the structured additive predictor \\[\\begin{equation} \\text{logit}(\\rho_{x, s, a}) = \\beta^\\rho_0 + \\beta_{S}^{\\rho, s = \\text{M}} + \\mathbf{u}^\\rho_a + \\mathbf{u}_a^{\\rho, s = \\text{M}} + \\mathbf{u}^\\rho_x + \\mathbf{u}_x^{\\rho, s = \\text{M}} + \\mathbf{u}_x^{\\rho, a < 15} + \\boldsymbol{\\mathbf{\\eta}}^\\rho_{R_x, s, a}. \\tag{C.3} \\end{equation}\\] Table C.2 provides a description of the terms included in Equation (C.3). Independent half-normal prior distributions were chosen for the five standard deviation terms \\[\\begin{equation} \\{\\sigma_A^\\rho, \\sigma_{AS}^\\rho, \\sigma_X^\\rho, \\sigma_{XS}^\\rho, \\sigma_{XA}^\\rho\\} \\sim \\mathcal{N}^{+}(0, 2.5), \\end{equation}\\] independent uniform prior distributions for the two AR1 correlation parameters \\[\\begin{equation} \\{\\phi_A^\\rho, \\phi_{AS}^\\rho\\} \\sim \\mathcal{U}(-1, 1), \\end{equation}\\] and independent beta prior distributions for the two BYM2 proportion parameters \\[\\begin{equation} \\{\\phi_X^\\rho, \\phi_{XS}^\\rho\\} \\sim \\text{Beta}(0.5, 0.5). \\end{equation}\\] Table C.2: Each term in Equation (C.3) together with, where applicable, its prior distribution and a written description of its role. 
Term Distribution Description \\(\\beta^\\rho_0\\) \\(\\mathcal{N}(0, 5)\\) Intercept \\(\\beta_{S}^{\\rho, s = \\text{M}}\\) \\(\\mathcal{N}(0, 5)\\) The difference in logit prevalence for men compared to women \\(\\mathbf{u}^\\rho_a\\) \\(\\text{AR}1(\\sigma_A^\\rho, \\phi_A^\\rho)\\) Age random effects for women \\(\\mathbf{u}_a^{\\rho, s = \\text{M}}\\) \\(\\text{AR}1(\\sigma_{AS}^\\rho, \\phi_{AS}^\\rho)\\) Age random effects for the difference in logit prevalence for men compared to women of age \\(a\\) \\(\\mathbf{u}^\\rho_x\\) \\(\\text{BYM}2(\\sigma_X^\\rho, \\phi_X^\\rho)\\) Spatial random effects for women \\(\\mathbf{u}_x^{\\rho, s = \\text{M}}\\) \\(\\text{BYM}2(\\sigma_{XS}^\\rho, \\phi_{XS}^\\rho)\\) Spatial random effects for the difference in logit prevalence for men compared to women in district \\(x\\) \\(\\mathbf{u}_x^{\\rho, a < 15}\\) \\(\\text{ICAR}(\\sigma_{XA}^\\rho)\\) Spatial random effects for the difference in logit paediatric prevalence compared to adult women prevalence in district \\(x\\) \\(\\boldsymbol{\\mathbf{\\eta}}^\\rho_{R_x, s, a}\\) \\(-\\) Fixed offsets specifying assumed odds ratios for prevalence outside the age ranges for which data were available. Calculated from Spectrum model (Stover et al. 2019) outputs for region \\(R_x\\) C.4.1.2 ART coverage ART coverage \\(\\alpha_{x, s, a} \\in [0, 1]\\) was modelled on the logit scale using the structured additive predictor \\[\\begin{equation} \\text{logit}(\\alpha_{x, s, a}) = \\beta^\\alpha_0 + \\beta_{S}^{\\alpha, s = \\text{M}} + \\mathbf{u}^\\alpha_a + \\mathbf{u}_a^{\\alpha, s = \\text{M}} + \\mathbf{u}^\\alpha_x + \\mathbf{u}_x^{\\alpha, s = \\text{M}} + \\mathbf{u}_x^{\\alpha, a < 15} + \\boldsymbol{\\mathbf{\\eta}}^\\alpha_{R_x, s, a} \\end{equation}\\] with terms and prior distributions analogous to the HIV prevalence process model in Section C.4.1.1 above. 
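As an illustration of how the logit-scale predictors in Equation (C.3) and its ART coverage analogue map to probabilities, the following is a minimal C++ sketch, not the Naomi TMB implementation. The struct `PredictorTerms` and the functions `inv_logit` and `logit_scale_predictor` are hypothetical names introduced here. The sketch assumes that terms superscripted \\(s = \\text{M}\\) enter only for men and that the term superscripted \\(a < 15\\) enters only for paediatric age groups, in line with the descriptions in Table C.2.

```cpp
#include <cmath>

// Inverse logit, arranged so that exp() is never called on a large positive value.
double inv_logit(double eta) {
  if (eta >= 0.0) {
    return 1.0 / (1.0 + std::exp(-eta));
  }
  const double e = std::exp(eta);
  return e / (1.0 + e);
}

// Hypothetical container for the terms of Equation (C.3) in one (x, s, a) cell.
struct PredictorTerms {
  double beta_0;      // Intercept
  double beta_S;      // Sex effect (assumed to apply when s = M)
  double u_a;         // Age random effect
  double u_a_male;    // Age random effect for men relative to women
  double u_x;         // Spatial random effect
  double u_x_male;    // Spatial random effect for men relative to women
  double u_x_paed;    // Spatial random effect for a < 15 relative to adult women
  double eta_offset;  // Fixed Spectrum offset
};

// Sum of the terms that are active for the given sex and age group.
double logit_scale_predictor(const PredictorTerms& t, bool is_male, bool is_under_15) {
  double eta = t.beta_0 + t.u_a + t.u_x + t.eta_offset;
  if (is_male) {
    eta += t.beta_S + t.u_a_male + t.u_x_male;
  }
  if (is_under_15) {
    eta += t.u_x_paed;
  }
  return eta;
}

// Probability-scale indicator, e.g. prevalence or ART coverage.
double indicator_probability(const PredictorTerms& t, bool is_male, bool is_under_15) {
  return inv_logit(logit_scale_predictor(t, is_male, is_under_15));
}
```

For instance, with an intercept of -2 and all other terms set to zero, the implied value for an adult woman is `inv_logit(-2)`, roughly 12%.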
C.4.1.3 HIV incidence rate HIV incidence rate \\(\\lambda_{x, s, a} > 0\\) was modelled on the log scale using the structured additive predictor \\[\\begin{equation} \\log(\\lambda_{x, s, a}) = \\beta_0^\\lambda + \\beta_S^{\\lambda, s = \\text{M}} + \\log(\\rho_{x}^{\\text{15-49}}) + \\log(1 - \\omega \\cdot \\alpha_{x}^{\\text{15-49}}) + \\mathbf{u}_x^\\lambda + \\boldsymbol{\\mathbf{\\eta}}_{R_x, s, a}^\\lambda. \\tag{C.4} \\end{equation}\\] Table C.3 provides a description of the terms included in Equation (C.4). Table C.3: Each term in Equation (C.4) together with, where applicable, its prior distribution and a written description of its role. Term Distribution Description \\(\\beta^\\lambda_0\\) \\(\\mathcal{N}(0, 5)\\) Intercept term proportional to the average HIV transmission rate for untreated HIV positive adults \\(\\beta_S^{\\lambda, s = \\text{M}}\\) \\(\\mathcal{N}(0, 5)\\) The log incidence rate ratio for men compared to women \\(\\rho_{x}^{\\text{15-49}}\\) \\(-\\) The HIV prevalence among adults 15-49 in district \\(x\\) calculated by aggregating age-specific HIV prevalences \\(\\alpha_{x}^{\\text{15-49}}\\) \\(-\\) The ART coverage among adults 15-49 in district \\(x\\) calculated by aggregating age-specific ART coverages \\(\\omega = 0.7\\) \\(-\\) Average reduction in HIV transmission rate per increase in population ART coverage fixed based on inputs to the Estimation and Projection Package (EPP) model \\(\\mathbf{u}_x^\\lambda\\) \\(\\mathcal{N}(0, \\sigma^\\lambda)\\) IID spatial random effects with \\(\\sigma^\\lambda \\sim \\mathcal{N}^+(0, 1)\\) \\(\\boldsymbol{\\mathbf{\\eta}}^\\lambda_{R_x, s, a}\\) \\(-\\) Fixed log incidence rate ratios by sex and age group calculated from Spectrum model outputs for region \\(R_x\\) The proportion recently infected among HIV positive persons \\(\\kappa_{x, s, a} \\in [0, 1]\\) was modelled as \\[\\begin{equation} \\kappa_{x, s, a} = 1 - \\exp \\left(- \\lambda_{x, s, a} \\cdot \\frac{1 - \\rho_{x, s, 
a}}{\\rho_{x, s, a}} \\cdot (\\Omega_T - \\beta_T ) - \\beta_T \\right), \\end{equation}\\] where \\(\\Omega_T \\sim \\mathcal{N}(\\Omega_{T_0}, \\sigma^{\\Omega_T})\\) is the mean duration of recent infection, and \\(\\beta_T \\sim \\mathcal{N}^{+}(\\beta_{T_0}, \\sigma^{\\beta_T})\\) is the false recent ratio. The prior distribution for \\(\\Omega_T\\) was informed by the characteristics of the recent infection testing algorithm. For PHIA surveys this was \\(\\Omega_{T_0} = 130 \\text{ days}\\) and \\(\\sigma^{\\Omega_T} = 6.12 \\text{ days}\\). For PHIA surveys, no false recency was assumed, such that \\(\\beta_{T_0} = 0.0\\), \\(\\sigma^{\\beta_T} = 0.0\\), and \\(\\beta_T = 0\\). C.4.1.4 ANC testing HIV prevalence \\(\\rho_{x, a}^\\text{ANC}\\) and ART coverage \\(\\alpha_{x, a}^\\text{ANC}\\) among pregnant women were modelled as being offset on the logit scale from the corresponding district-age indicators \\(\\rho_{x, F, a}\\) and \\(\\alpha_{x, F, a}\\) according to \\[\\begin{align} \\text{logit}(\\rho_{x, a}^{\\text{ANC}}) &= \\text{logit}(\\rho_{x, F, a}) + \\beta^{\\rho^{\\text{ANC}}} + \\mathbf{u}_x^{\\rho^{\\text{ANC}}} + \\boldsymbol{\\mathbf{\\eta}}_{R_x, a}^{\\rho^{\\text{ANC}}}, \\tag{C.5} \\\\ \\text{logit}(\\alpha_{x, a}^{\\text{ANC}}) &= \\text{logit}(\\alpha_{x, F, a}) + \\beta^{\\alpha^{\\text{ANC}}} + \\mathbf{u}_x^{\\alpha^{\\text{ANC}}} + \\boldsymbol{\\mathbf{\\eta}}_{R_x, a}^{\\alpha^{\\text{ANC}}}. \\tag{C.6} \\end{align}\\] Table C.4 provides a description of the terms included in Equation (C.5) and Equation (C.6). Table C.4: Each term in Equations (C.5) and (C.6) together with, where applicable, its prior distribution and a written description of its role. The notation \\(\\theta\\) is used as a stand-in for \\(\\theta \\in \\{\\rho, \\alpha\\}\\). 
Term Distribution Description \\(\\beta^{\\theta^{\\text{ANC}}}\\) \\(\\mathcal{N}(0, 5)\\) Intercept giving the average difference between population and ANC outcomes \\(\\mathbf{u}_x^{\\theta^{\\text{ANC}}}\\) \\(\\mathcal{N}(0, \\sigma_X^{\\theta^{\\text{ANC}}})\\) IID district random effects with \\(\\sigma_X^{\\theta^{\\text{ANC}}} \\sim \\mathcal{N}^+(0, 1)\\) \\(\\boldsymbol{\\mathbf{\\eta}}_{R_x, a}^{\\theta^{\\text{ANC}}}\\) \\(-\\) Offsets for the log fertility rate ratios for HIV positive women compared to HIV negative women and for women on ART to HIV positive women not on ART, calculated from Spectrum model outputs for region \\(R_x\\) In the full Naomi model, for adult women 15-49 the number of ANC clients \\(\\Psi_{x, a} > 0\\) was modelled as \\[\\begin{equation} \\log (\\Psi_{x, a}) = \\log (N_{x, \\text{F}, a}) + \\psi_{R_x, a} + \\beta^\\psi + \\mathbf{u}_x^\\psi, \\end{equation}\\] where \\(N_{x, \\text{F}, a}\\) are the female population sizes, \\(\\psi_{R_x, a}\\) are fixed age-sex fertility ratios in Spectrum region \\(R_x\\), \\(\\beta^\\psi\\) are log rate ratios for the number of ANC clients relative to the predicted fertility, and \\(\\mathbf{u}_x^\\psi \\sim \\mathcal{N}(0, \\sigma^\\psi)\\) are district random effects. Here these terms are fixed to \\(\\beta^\\psi = 0\\) and \\(\\mathbf{u}_x^\\psi = \\mathbf{0}\\) such that \\(\\Psi_{x, a}\\) are simply constants. C.4.1.5 ART attendance Let \\(\\gamma_{x, x'} \\in [0, 1]\\) be the probability that a person on ART residing in district \\(x\\) receives ART in district \\(x'\\). Assume that \\(\\gamma_{x, x'} = 0\\) for \\(x' \\notin \\{x, \\text{ne}(x)\\}\\) such that individuals seek treatment only in their district of residence or its neighbours \\(\\text{ne}(x) = \\{x': x' \\sim x\\}\\), where \\(\\sim\\) is an adjacency relation, and \\(\\sum_{x' \\in \\{x, \\text{ne}(x)\\}} \\gamma_{x, x'} = 1\\). 
The probabilities \\(\\gamma_{x, x'}\\) for \\(x \\sim x'\\) were modelled using a multinomial logistic regression model, based on the log-odds ratios \\[\\begin{equation} \\tilde \\gamma_{x, x'} = \\log \\left( \\frac{\\gamma_{x, x'}}{1 - \\gamma_{x, x'}} \\right) = \\tilde \\gamma_0 + \\mathbf{u}_x^{\\tilde \\gamma}. \\tag{C.7} \\end{equation}\\] Table C.5 provides a description of the terms included in Equation (C.7). Fixing \\(\\tilde \\gamma_{x, x} = 0\\), the multinomial probabilities may be recovered using the softmax \\[\\begin{equation} \\gamma_{x, x'} = \\frac{\\exp(\\tilde \\gamma_{x, x'})}{\\sum_{x^\\star \\in \\{x, \\text{ne}(x)\\}} \\exp(\\tilde \\gamma_{x, x^\\star})}. \\end{equation}\\] Table C.5: Each term in Equation (C.7) together with, where applicable, its prior distribution and a written description of its role. As no terms include \\(x'\\), \\(\\gamma_{x, x'}\\) is only a function of \\(x\\). Term Distribution Description \\(\\tilde \\gamma_0\\) \\(-\\) Fixed intercept \\(\\tilde \\gamma_0 = -4\\). Implies a prior mean on \\(\\gamma_{x, x'}\\) of 1.8%, such that a priori \\((100 - 1.8 \\times |\\text{ne}(x)|)\\%\\) of ART clients in district \\(x\\) obtain treatment in their home district \\(\\mathbf{u}_x^{\\tilde \\gamma}\\) \\(\\mathcal{N}(0, \\sigma_X^{\\tilde \\gamma})\\) District random effects, with \\(\\sigma_X^{\\tilde \\gamma} \\sim \\mathcal{N}^+(0, 2.5)\\) C.4.2 Additional likelihood specification Though Section 6.3 provides a complete description of Naomi’s likelihood specification, some additional details are provided here. 
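Before turning to those details, the softmax recovery of the ART attendance probabilities in Section C.4.1.5 can be sketched in C++ as follows. This is a minimal sketch under stated assumptions, not the Naomi TMB code: the function name `attendance_probabilities` and its interface (a vector of neighbour log-odds, with the home district fixed to zero) are hypothetical choices made for illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <vector>

// Attendance probabilities for residents of a single district x.
// Input: gamma_tilde for each neighbour of x (hypothetical interface).
// Output: element 0 is the home district, whose log-odds are fixed to 0;
// elements 1, ..., k are the neighbours in the order supplied.
std::vector<double> attendance_probabilities(
    const std::vector<double>& gamma_tilde_neighbours) {
  // Subtract the largest log-odds before exponentiating, for numerical stability.
  double max_log_odds = 0.0;  // The home district contributes a log-odds of 0
  for (double g : gamma_tilde_neighbours) {
    assert(std::isfinite(g));
    max_log_odds = std::max(max_log_odds, g);
  }

  std::vector<double> probs;
  probs.reserve(gamma_tilde_neighbours.size() + 1);
  probs.push_back(std::exp(0.0 - max_log_odds));
  for (double g : gamma_tilde_neighbours) {
    probs.push_back(std::exp(g - max_log_odds));
  }

  // Normalise so that the probabilities sum to one.
  double total = 0.0;
  for (double p : probs) {
    total += p;
  }
  for (double& p : probs) {
    p /= total;
  }
  return probs;
}
```

With a single neighbour and the fixed intercept \\(\\tilde \\gamma_0 = -4\\) (random effect set to zero), the neighbour probability is \\(1 / (1 + \\exp(4))\\), about 1.8%, which matches the prior mean quoted in Table C.5.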
C.4.2.1 Household survey data

The generalised binomial \\(y \\sim \\text{xBin}(m, p)\\) is defined for \\(y, m \\in \\mathbb{R}^+\\) with \\(y \\leq m\\) such that \\[\\begin{align} \\log p(y) = &\\log \\Gamma(m + 1) - \\log \\Gamma(y + 1) \\\\ &- \\log \\Gamma(m - y + 1) + y \\log p + (m - y) \\log(1 - p), \\end{align}\\] where the gamma function \\(\\Gamma\\) is such that \\(\\forall n \\in \\mathbb{N}\\), \\(\\Gamma(n) = (n - 1)!\\).

C.4.3 Identifiability constraints

If data are missing, some parameters are fixed to default values to help with identifiability. In particular:

- If survey data on HIV prevalence or ART coverage by age and sex are not available then \\(\\mathbf{u}_a^\\theta = 0\\) and \\(\\mathbf{u}_{a, s = \\text{M}}^\\theta = 0\\). In this case, the average age-sex pattern from Spectrum is used. For the Malawi case-study (Section 6.5), HIV prevalence and ART coverage data are not available for those aged 65+. As a result, there are \\(|\\{\\text{0-4}, \\ldots, \\text{60-64}\\}| = 13\\) age groups included for the age random effects.
- If no ART data, either survey or ART programme, are available but data on ART coverage among ANC clients are available, the level of ART coverage is not identifiable, but spatial variation is identifiable. In this instance, overall ART coverage is determined by the Spectrum offset, and only area random effects are estimated such that \\[\\begin{equation} \\text{logit} \\left(\\alpha_{x, s, a} \\right) = \\mathbf{u}_x^\\alpha + \\boldsymbol{\\mathbf{\\eta}}_{R_x, s, a}^\\alpha. \\end{equation}\\]
- If survey data on recent HIV infection are not included in the model, then \\(\\beta_0^\\lambda = \\beta_S^{\\lambda, s = \\text{M}} = 0\\) and \\(\\mathbf{u}_x^\\lambda = \\mathbf{0}\\).
The sex ratio for HIV incidence is determined by the sex incidence rate ratio from Spectrum, and the incidence rate in all districts is modelled assuming the same average HIV transmission rate for untreated adults, but varies according to district-level estimates of HIV prevalence and ART coverage.

C.4.4 Implementation

The TMB C++ code for the negative log-posterior of the simplified Naomi model is available from https://github.com/athowes/naomi-aghq. For ease of understanding, Table C.6 provides the correspondence between the mathematical notation used in Section C.4 and the variable names used in the TMB code, for all hyperparameters and latent field parameters. For further reference on the TMB software see Kristensen (2021).

Table C.6: Correspondence between the variable name used in the Naomi TMB template and the mathematical notation used in Appendix C.4. The parameter type, either a hyperparameter or element of the latent field, is also given. All of the parameters are defined on the real scale in some dimension. The final three columns (\\(\\rho\\), \\(\\alpha\\), and \\(\\lambda\\)) indicate which component of the model the parameter is primarily used in.
Variable name Notation Type Domain \\(\\rho\\) \\(\\alpha\\) \\(\\lambda\\) logit_phi_rho_x \\(\\text{logit}(\\phi_X^\\rho)\\) Hyper \\(\\mathbb{R}\\) Yes log_sigma_rho_x \\(\\log(\\sigma_X^\\rho)\\) Hyper \\(\\mathbb{R}\\) Yes logit_phi_rho_xs \\(\\text{logit}(\\phi_{XS}^\\rho)\\) Hyper \\(\\mathbb{R}\\) Yes log_sigma_rho_xs \\(\\log(\\sigma_{XS}^\\rho)\\) Hyper \\(\\mathbb{R}\\) Yes logit_phi_rho_a \\(\\text{logit}(\\phi_A^\\rho)\\) Hyper \\(\\mathbb{R}\\) Yes log_sigma_rho_a \\(\\log(\\sigma_A^\\rho)\\) Hyper \\(\\mathbb{R}\\) Yes logit_phi_rho_as \\(\\text{logit}(\\phi_{AS}^\\rho)\\) Hyper \\(\\mathbb{R}\\) Yes log_sigma_rho_as \\(\\log(\\sigma_{AS}^\\rho)\\) Hyper \\(\\mathbb{R}\\) Yes log_sigma_rho_xa \\(\\log(\\sigma_{XA}^\\rho)\\) Hyper \\(\\mathbb{R}\\) Yes logit_phi_alpha_x \\(\\text{logit}(\\phi_X^\\alpha)\\) Hyper \\(\\mathbb{R}\\) Yes log_sigma_alpha_x \\(\\log(\\sigma_X^\\alpha)\\) Hyper \\(\\mathbb{R}\\) Yes logit_phi_alpha_xs \\(\\text{logit}(\\phi_{XS}^\\alpha)\\) Hyper \\(\\mathbb{R}\\) Yes log_sigma_alpha_xs \\(\\log(\\sigma_{XS}^\\alpha)\\) Hyper \\(\\mathbb{R}\\) Yes logit_phi_alpha_a \\(\\text{logit}(\\phi_A^\\alpha)\\) Hyper \\(\\mathbb{R}\\) Yes log_sigma_alpha_a \\(\\log(\\sigma_A^\\alpha)\\) Hyper \\(\\mathbb{R}\\) Yes logit_phi_alpha_as \\(\\text{logit}(\\phi_{AS}^\\alpha)\\) Hyper \\(\\mathbb{R}\\) Yes log_sigma_alpha_as \\(\\log(\\sigma_{AS}^\\alpha)\\) Hyper \\(\\mathbb{R}\\) Yes log_sigma_alpha_xa \\(\\log(\\sigma_{XA}^\\alpha)\\) Hyper \\(\\mathbb{R}\\) Yes OmegaT_raw \\(\\Omega_T\\) Hyper \\(\\mathbb{R}\\) Yes log_betaT \\(\\log(\\beta_T)\\) Hyper \\(\\mathbb{R}\\) Yes log_sigma_lambda_x \\(\\log(\\sigma^\\lambda)\\) Hyper \\(\\mathbb{R}\\) Yes log_sigma_ancrho_x \\(\\log(\\sigma_X^{\\rho^{\\text{ANC}}})\\) Hyper \\(\\mathbb{R}\\) Yes log_sigma_ancalpha_x \\(\\log(\\sigma_X^{\\alpha^{\\text{ANC}}})\\) Hyper \\(\\mathbb{R}\\) Yes log_sigma_or_gamma \\(\\log(\\sigma_X^{\\tilde \\gamma})\\) Hyper \\(\\mathbb{R}\\) beta_rho 
\\((\\beta^\\rho_0, \\beta_{s}^{\\rho, s = \\text{M}})\\) Latent \\(\\mathbb{R}^2\\) Yes beta_alpha \\((\\beta^\\alpha_0, \\beta_{S}^{\\alpha, s = \\text{M}})\\) Latent \\(\\mathbb{R}^2\\) Yes beta_lambda \\((\\beta_0^\\lambda, \\beta_S^{\\lambda, s = \\text{M}})\\) Latent \\(\\mathbb{R}^2\\) Yes beta_anc_rho \\(\\beta^{\\rho^{\\text{ANC}}}\\) Latent \\(\\mathbb{R}\\) Yes beta_anc_alpha \\(\\beta^{\\alpha^{\\text{ANC}}}\\) Latent \\(\\mathbb{R}\\) Yes u_rho_x \\(\\mathbf{w}^\\rho_x\\) Latent \\(\\mathbb{R}^{n}\\) Yes us_rho_x \\(\\mathbf{v}^\\rho_x\\) Latent \\(\\mathbb{R}^{n}\\) Yes u_rho_xs \\(\\mathbf{w}_x^{\\rho, s = \\text{M}}\\) Latent \\(\\mathbb{R}^{n}\\) Yes us_rho_xs \\(\\mathbf{v}_x^{\\rho, s = \\text{M}}\\) Latent \\(\\mathbb{R}^{n}\\) Yes u_rho_a \\(\\mathbf{u}^\\rho_a\\) Latent \\(\\mathbb{R}^{10}\\) Yes u_rho_as \\(\\mathbf{u}_a^{\\rho, s = \\text{M}}\\) Latent \\(\\mathbb{R}^{10}\\) Yes u_rho_xa \\(\\mathbf{u}_x^{\\rho, a < 15}\\) Latent \\(\\mathbb{R}^{n}\\) Yes u_alpha_x \\(\\mathbf{w}^\\alpha_x\\) Latent \\(\\mathbb{R}^{n}\\) Yes us_alpha_x \\(\\mathbf{v}^\\alpha_x\\) Latent \\(\\mathbb{R}^{n}\\) Yes u_alpha_xs \\(\\mathbf{w}_x^{\\alpha, s = \\text{M}}\\) Latent \\(\\mathbb{R}^{n}\\) Yes us_alpha_xs \\(\\mathbf{v}_x^{\\alpha, s = \\text{M}}\\) Latent \\(\\mathbb{R}^{n}\\) Yes u_alpha_a \\(\\mathbf{u}^\\alpha_a\\) Latent \\(\\mathbb{R}^{13}\\) Yes u_alpha_as \\(\\mathbf{u}_a^{\\alpha, s = \\text{M}}\\) Latent \\(\\mathbb{R}^{10}\\) Yes u_alpha_xa \\(\\mathbf{u}_x^{\\alpha, a < 15}\\) Latent \\(\\mathbb{R}^{n}\\) Yes ui_lambda_x \\(\\mathbf{u}_x^\\lambda\\) Latent \\(\\mathbb{R}^{n}\\) Yes ui_anc_rho_x \\(\\mathbf{u}_x^{\\rho^{\\text{ANC}}}\\) Latent \\(\\mathbb{R}^{n}\\) Yes ui_anc_alpha_x \\(\\mathbf{u}_x^{\\alpha^{\\text{ANC}}}\\) Latent \\(\\mathbb{R}^{n}\\) Yes log_or_gamma \\(\\mathbf{u}_x^{\\tilde \\gamma}\\) Latent \\(\\mathbb{R}^{n}\\) C.5 NUTS convergence and suitability Figure C.5: For NUTS run on the Naomi ELGM, the maximum potential 
scale reduction factor was 1.021, below the value of 1.05 typically used as a cutoff for acceptable chain mixing, indicating that the results are acceptable to use. Additionally, the vast majority (93.7%) of \\(\\hat R\\) values were less than 1.1.

Figure C.6: The efficiency of NUTS, as measured by the ratio of effective sample size to total number of iterations run, was low for most parameters (Panel A). As a result, the number of iterations required for the effective number of samples (mean 1265) to be satisfactory was high (Panel B).

Figure C.7: Traceplots for the parameter with the lowest ESS, log_sigma_alpha_xs (an \\(\\text{ESS}\\) of 208, Panel A), and the parameter with the highest potential scale reduction factor, ui_lambda_x[10] (an \\(\\hat R\\) of 1.021, Panel B).

Figure C.8: Pairs plots for the parameters \\(\\log(\\sigma_{A}^\\rho)\\) and \\(\\text{logit}(\\phi_{A}^\\rho)\\), or log_sigma_rho_a and logit_phi_rho_a as implemented in code. These parameters are the log standard deviation and logit lag-one correlation parameter of an AR1 process. In the posterior distribution obtained with NUTS, they have a high degree of correlation.

Figure C.9: Pairs plots for the parameters \\(\\log(\\sigma_X^\\alpha)\\) and \\(\\text{logit}(\\phi_X^\\alpha)\\), or log_sigma_alpha_x and logit_phi_alpha_x as implemented in code. These parameters are the log standard deviation and logit BYM2 proportion parameter of a BYM2 process. In the posterior distribution obtained with NUTS, they are close to uncorrelated.

Figure C.10: Prior standard deviations were calculated by using NUTS to simulate from the prior distribution. This approach is more convenient than simulating directly from the model, but can lead to inaccuracies.

Figure C.11: The posterior contraction for each parameter in the model. Values are averaged for parameters of length greater than one.
The posterior contraction is zero when the prior distribution and posterior distribution have the same standard deviation. This could indicate that the data are not informative about the parameter. The closer the posterior contraction is to one, the more the marginal posterior distribution has concentrated about a single point.

C.6 Use of PCA-AGHQ

Figure C.12: The standard deviation of the quadrature nodes can be used as a measure of coverage of the posterior marginal distribution. Nodes spaced evenly within the marginal distribution would be expected to have uniformly distributed quantiles, corresponding to a standard deviation of 0.2867, shown as a dashed line.

Figure C.13: The estimated posterior marginal standard deviation of each hyperparameter varied substantially based on its scale, either logarithmic or logistic.

Figure C.14: The logarithm of the normalising constant estimated using PCA-AGHQ and a range of possible values of \\(k = 2, 3, 5\\) and \\(s \\leq 8\\). Over this range of settings, the estimate of the logarithm of the normalising constant did not converge. The time taken by GPCA-AGHQ increases exponentially with the number of PCA-AGHQ dimensions kept.

C.7 Inference comparison

C.7.1 Point estimates

Figure C.15: Differences in Naomi model output posterior means as estimated by GEB and GPCA-AGHQ compared to NUTS. Each point is an estimate of the indicator for a particular stratum. In all cases, error is reduced by GPCA-AGHQ, most of all for ART coverage.

Figure C.16: Differences in Naomi model output posterior standard deviations as estimated by GEB and GPCA-AGHQ compared to NUTS. Each point is an estimate of the indicator for a particular stratum. Error is increased by GPCA-AGHQ for HIV prevalence and HIV incidence, and reduced for ART coverage.

C.7.2 Distributional quantities

Figure C.17: The Kolmogorov-Smirnov (KS) test statistic for each latent field parameter is correlated with the effective sample size (ESS) from NUTS, for both GEB and GPCA-AGHQ.
This may be because parameters which are harder to estimate with INLA-like methods also have posterior distributions which are more difficult to sample from. Alternatively, it may be that high KS values are caused by inaccurate NUTS estimates generated by limited effective samples.

Akaike, Hirotugu. 1973. “Information theory as an extension of the maximum likelihood principle.” In Second International Symposium on Information Theory, edited by BN Petrov and F Csaki. Budapest: Academiai Kiado. Aldor-Noiman, Sivan, Lawrence D Brown, Andreas Buja, Wolfgang Rolke, and Robert A Stine. 2013. “The power to see: A new graphical test of normality.” The American Statistician 67 (4): 249–60. Arambepola, Rohan, Tim CD Lucas, Anita K Nandi, Peter W Gething, and Ewan Cameron. 2022. “A simulation study of disaggregation regression for spatial disease mapping.” Statistics in Medicine 41 (1): 1–16. Auvert, Bertran, Dirk Taljaard, Emmanuel Lagarde, Joelle Sobngwi-Tambekou, Rémi Sitta, and Adrian Puren. 2005. “Randomized, controlled intervention trial of male circumcision for reduction of HIV infection risk: the ANRS 1265 Trial.” PLOS Medicine 2 (11): e298. Bachl, Fabian E, Finn Lindgren, David L Borchers, and Janine B Illian. 2019. “inlabru: an R package for Bayesian spatial modelling from ecological survey data.” Methods in Ecology and Evolution 10 (6): 760–66. Baeten, Jared M, Deborah Donnell, Patrick Ndase, Nelly R Mugo, James D Campbell, Jonathan Wangisi, Jordan W Tappero, et al. 2012. “Antiretroviral Prophylaxis for HIV Prevention in Heterosexual Men and Women.” New England Journal of Medicine 367 (5): 399–410. Bailey, Michael A. 2023. “A New Paradigm for Polling.” Harvard Data Science Review 5 (3). Bailey, Robert C, Stephen Moses, Corette B Parker, Kawango Agot, Ian Maclean, John N Krieger, Carolyn FM Williams, Richard T Campbell, and Jeckoniah O Ndinya-Achola. 2007.
“Male circumcision for HIV prevention in young men in Kisumu, Kenya: a randomised controlled trial.” The Lancet 369 (9562): 643–56. Baker, Stuart G. 1994. “The multinomial-Poisson transformation.” Journal of the Royal Statistical Society: Series D (The Statistician) 43 (4): 495–504. Baral, Stefan, Chris Beyrer, Kathryn Muessig, Tonia Poteat, Andrea L Wirtz, Michele R Decker, Susan G Sherman, and Deanna Kerrigan. 2012. “Burden of HIV among female sex workers in low-income and middle-income countries: a systematic review and meta-analysis.” The Lancet Infectious Diseases 12 (7): 538–49. Barré-Sinoussi, Françoise, Jean-Claude Chermann, Fran Rey, Marie Therese Nugeyre, Sophie Chamaret, Jacqueline Gruest, Charles Dauguet, et al. 1983. “Isolation of a T-lymphotropic retrovirus from a patient at risk for acquired immune deficiency syndrome (AIDS).” Science 220 (4599): 868–71. Baydin, Atılım Günes, Barak A Pearlmutter, Alexey Andreyevich Radul, and Jeffrey Mark Siskind. 2017. “Automatic differentiation in machine learning: a survey.” The Journal of Machine Learning Research 18 (1): 5595–5637. Bell, Bradley. 2023. “CppAD: a package for C++ algorithmic differentiation.” http://www.coin-or.org/CppAD. Bennett, James E, Helen Tamura-Wicks, Robbie M Parks, Richard T Burnett, C Arden Pope III, Matthew J Bechle, Julian D Marshall, Goodarz Danaei, and Majid Ezzati. 2019. “Particulate matter air pollution and national and county life expectancy loss in the USA: A spatiotemporal analysis.” PLOS Medicine 16 (7): e1002856. Berger, James. 2006. “The Case for objective Bayesian analysis.” Bayesian Analysis 1 (3): 385–402. Berild, Martin Outzen, Sara Martino, Virgilio Gómez-Rubio, and Håvard Rue. 2022. “Importance Sampling with the Integrated Nested Laplace Approximation.” Journal of Computational and Graphical Statistics 31 (4): 1225–37. Bernardo, José M, and Adrian FM Smith. 2001. Bayesian theory. John Wiley & Sons. Besag, Julian, Jeremy York, and Annie Mollié. 1991. 
“Bayesian image restoration, with two applications in spatial statistics.” Annals of the Institute of Statistical Mathematics 43 (1): 1–20. Best, N, N Arnold, A Thomas, L Waller, and E Conlon. 1999. “Bayesian models for spatially correlated disease and exposure data.” In Bayesian Statistics 6: Proceedings of the Sixth Valencia International Meeting, 6:131. Oxford University Press. Best, Nicky, Sylvia Richardson, and Andrew Thomson. 2005. “A comparison of Bayesian spatial models for disease mapping.” Statistical Methods in Medical Research 14 (1): 35–59. Betancourt, Michael. 2017. “Robust Gaussian processes in Stan.” https://betanalpha.github.io/assets/case_studies/gp_part3/part3.html. Bhatt, Samir, DJ Weiss, E Cameron, D Bisanzio, B Mappin, U Dalrymple, KE Battle, et al. 2015. “The effect of malaria control on Plasmodium falciparum in Africa between 2000 and 2015.” Nature 526 (7572): 207–11. Bilodeau, Blair, Alex Stringer, and Yanbo Tang. 2022. “Stochastic convergence rates and applications of adaptive quadrature in Bayesian inference.” Journal of the American Statistical Association, 1–11. Bivand, Roger S, Edzer J Pebesma, and Virgilio Gómez-Rubio. 2008. Applied spatial data analysis with R. Springer. Blangiardo, Marta, Michela Cameletti, Gianluca Baio, and Håvard Rue. 2013. “Spatial and spatio-temporal models with R-INLA.” Spatial and Spatio-Temporal Epidemiology 4: 33–49. Blei, David M, Alp Kucukelbir, and Jon D McAuliffe. 2017. “Variational inference: A review for statisticians.” Journal of the American Statistical Association 112 (518): 859–77. Bolker, Benjamin M, Beth Gardner, Mark Maunder, Casper W Berg, Mollie Brooks, Liza Comita, Elizabeth Crone, et al. 2013. “Strategies for fitting nonlinear ecological models in R, AD Model Builder, and BUGS.” Methods in Ecology and Evolution 4 (6): 501–12. Bollhöfer, Matthias, Olaf Schenk, Radim Janalik, Steve Hamm, and Kiran Gullapalli. 2020.
“State-of-the-art sparse direct solvers.” Parallel Algorithms in Computational Science and Engineering, 3–33. Bosse, Nikos I, Sam Abbott, Anne Cori, Edwin van Leeuwen, Johannes Bracher, and Sebastian Funk. 2023. “Scoring epidemiological forecasts on transformed scales.” PLOS Computational Biology 19 (8): e1011393. Bosse, Nikos I., Hugo Gruson, Anne Cori, Edwin van Leeuwen, Sebastian Funk, and Sam Abbott. 2022. “Evaluating Forecasts with scoringutils in R.” arXiv. https://arxiv.org/abs/2205.07090. Box, George EP, and Kenneth B Wilson. 1992. “On the experimental attainment of optimum conditions.” In Breakthroughs in Statistics: Methodology and Distribution, 270–310. Springer. Bradley, Valerie C, Shiro Kuriwaki, Michael Isakov, Dino Sejdinovic, Xiao-Li Meng, and Seth Flaxman. 2021. “Unrepresentative Big Surveys Significantly Overestimated US Vaccine Uptake.” Nature 600 (7890): 695–700. Breslow, Norman E, and David G Clayton. 1993. “Approximate inference in generalized linear mixed models.” Journal of the American Statistical Association 88 (421): 9–25. Brier, Glenn W. 1950. “Verification of forecasts expressed in terms of probability.” Monthly Weather Review 78 (1): 1–3. Brooks, Mollie E, Kasper Kristensen, Koen J Van Benthem, Arni Magnusson, Casper W Berg, Anders Nielsen, Hans J Skaug, Martin Machler, and Benjamin M Bolker. 2017. “glmmTMB balances speed and flexibility among packages for zero-inflated generalized linear mixed modeling.” The R Journal 9 (2): 378–400. Brown, Patrick E. 2015. “Model-based geostatistics the easy way.” Journal of Statistical Software 63: 1–24. Broyles, Laura N, Robert Luo, Debi Boeras, and Lara Vojnov. 2023. “The risk of sexual transmission of HIV in individuals with low-level HIV viraemia: a systematic review.” The Lancet. Brugh, Kristen N, Quinn Lewis, Cameron Haddad, Jon Kumaresan, Timothy Essam, and Michelle S Li. 2021. 
“Characterizing and mapping the spatial variability of HIV risk among adolescent girls and young women: A cross-county analysis of population-based surveys in Eswatini, Haiti, and Mozambique.” PLOS One 16 (12): e0261520. Bürkner, Paul-Christian. 2017. “brms: An R Package for Bayesian Multilevel Models Using Stan.” Journal of Statistical Software 80 (1): 1–28. https://doi.org/10.18637/jss.v080.i01. Bürkner, Paul-Christian, Jonah Gabry, and Aki Vehtari. 2020. “Approximate Leave-Future-Out Cross-Validation for Bayesian Time Series Models.” Journal of Statistical Computation and Simulation 90 (14): 2499–2523. Carpenter, Bob, Andrew Gelman, Matthew D Hoffman, Daniel Lee, Ben Goodrich, Michael Betancourt, Marcus Brubaker, Jiqiang Guo, Peter Li, and Allen Riddell. 2017. “Stan: A probabilistic programming language.” Journal of Statistical Software 76 (1). Casella, George. 1985. “An introduction to empirical Bayes data analysis.” The American Statistician 39 (2): 83–87. CDC. 2014. “Understanding the HIV Care Continuum.” CDC. http://www.cdc.gov/hiv/pdf/dhap_continuum.pdf. Chau, Siu Lun, Shahine Bouabid, and Dino Sejdinovic. 2021. “Deconditional downscaling with Gaussian processes.” Advances in Neural Information Processing Systems 34: 17813–25. Chen, Cici, Jon Wakefield, and Thomas Lumley. 2014. “The use of sampling weights in Bayesian hierarchical models for small area estimation.” Spatial and Spatio-Temporal Epidemiology 11: 33–43. Chiuchiolo, Cristian, Janet van Niekerk, and Håvard Rue. 2023. “Joint Posterior Inference for Latent Gaussian Models with R-INLA.” Journal of Statistical Computation and Simulation 93 (5): 723–52. Chopin, Nicolas, Omiros Papaspiliopoulos, et al. 2020. An introduction to sequential Monte Carlo. Vol. 4. Springer. Cleland, John, J Ties Boerma, Michel Caraël, and Sharon S Weir. 2004. “Monitoring sexual behaviour in general populations: a synthesis of lessons of the past decade.” Sexually Transmitted Infections 80 (suppl 2): ii1–7.
Cohen, Myron S, Ying Q Chen, Marybeth McCauley, Theresa Gamble, Mina C Hosseinipour, Nagalingeswaran Kumarasamy, James G Hakim, et al. 2011. “Prevention of HIV-1 infection with early antiretroviral therapy.” New England Journal of Medicine 365 (6): 493–505. Cooper, Alex, Dan Simpson, Lauren Kennedy, Catherine Forbes, and Aki Vehtari. 2024. “Cross-Validatory Model Selection for Bayesian Autoregressions with Exogenous Regressors.” Bayesian Analysis 1 (1): 1–25. Cramb, SM, EW Duncan, PD Baade, and KL Mengersen. 2018. “Investigation of Bayesian spatial models.” Cancer Council Queensland; Queensland University of Technology (QUT). Crampin, Amelia C, Albert Dube, Sebastian Mboma, Alison Price, Menard Chihana, Andreas Jahn, Angela Baschieri, et al. 2012. “Profile: the Karonga health and demographic surveillance system.” International Journal of Epidemiology 41 (3): 676–85. Cressie, Noel, and Christopher K Wikle. 2015. Statistics for spatio-temporal data. John Wiley & Sons. Csárdi, Gábor. 2023. cranlogs: Download Logs from the ’RStudio’ ’CRAN’ Mirror. Davis, Philip J, and Philip Rabinowitz. 1975. Methods of numerical integration. Academic Press. Dawid, A Philip. 1984. “Present position and potential developments: Some personal views statistical theory the prequential approach.” Journal of the Royal Statistical Society: Series A (General) 147 (2): 278–90. de Valpine, Perry, Christopher Paciorek, Daniel Turek, Nick Michaud, Cliff Anderson-Bergman, Fritz Obermeyer, Claudia Wehrhahn Cortes, Abel Rodrìguez, Duncan Temple Lang, and Sally Paganin. 2023. NIMBLE User Manual (version 1.0.1). https://doi.org/10.5281/zenodo.1211190. Dean, CB, MD Ugarte, and AF Militino. 2001. “Detecting interaction between random region and fixed age effects in disease mapping.” Biometrics 57 (1): 197–202. Dempster, Arthur P, Nan M Laird, and Donald B Rubin. 1977. 
“Maximum likelihood from incomplete data via the EM algorithm.” Journal of the Royal Statistical Society: Series B (Methodological) 39 (1): 1–22. Dennis Jr, John E, David M Gay, and Roy E Walsh. 1981. “An adaptive nonlinear least-squares algorithm.” ACM Transactions on Mathematical Software (TOMS) 7 (3): 348–68. Diaz, Jose Monsalve, Swaroop Pophale, Oscar Hernandez, David E Bernholdt, and Sunita Chandrasekaran. 2018. “OpenMP 4.5 Validation and Verification Suite for Device Offload.” In Evolving OpenMP for Evolving Architectures: 14th International Workshop on OpenMP, IWOMP 2018, Barcelona, Spain, September 26–28, 2018, Proceedings 14, 82–95. Springer. Diggle, Peter J, and Emanuele Giorgi. 2016. “Model-based geostatistics for prevalence mapping in low-resource settings.” Journal of the American Statistical Association 111 (515): 1096–1120. Diggle, Peter J, Paula Moraga, Barry Rowlingson, Benjamin M Taylor, et al. 2013. “Spatial and spatio-temporal log-Gaussian Cox processes: extending the geostatistical paradigm.” Statistical Science 28 (4): 542–63. Dominguez, Kenneth L., Dawn K. Smith, Vasavi Thomas, Nicole Crepaz, Karen Lang, Walid Heneine, Janet M. McNicholl, et al. 2016. “Updated Guidelines for Antiretroviral Postexposure Prophylaxis After Sexual, Injection Drug Use, or Other Nonoccupational Exposure to HIV—United States, 2016.” https://stacks.cdc.gov/view/cdc/38856. Donegan, Connor. 2022. “geostan: An R package for Bayesian spatial analysis.” The Journal of Open Source Software 7 (79): 4716. https://doi.org/10.21105/joss.04716. Duane, Simon, Anthony D Kennedy, Brian J Pendleton, and Duncan Roweth. 1987. “Hybrid Monte Carlo.” Physics Letters B 195 (2): 216–22. Duncan, Earl W, Nicole M White, and Kerrie Mengersen. 2017. “Spatial smoothing in Bayesian models: a comparison of weights matrix specifications and their impact on inference.” International Journal of Health Geographics 16 (1): 1–16. 
Dwyer-Lindgren, Laura, Michael A Cork, Amber Sligar, Krista M Steuben, Kate F Wilson, Naomi R Provost, Benjamin K Mayala, et al. 2019. “Mapping HIV prevalence in sub-Saharan Africa between 2000 and 2017.” Nature 570 (7760): 189–93. Dwyer-Lindgren, Laura, Abraham D Flaxman, Marie Ng, Gillian M Hansen, Christopher JL Murray, and Ali H Mokdad. 2015. “Drinking patterns in US counties from 2002 to 2012.” American Journal of Public Health 105 (6): 1120–27. Eaton, Jeffrey W, Laura Dwyer-Lindgren, Steve Gutreuter, Megan O’Driscoll, Oliver Stevens, Sumali Bajaj, Rob Ashton, et al. 2021. “Naomi: a new modelling tool for estimating HIV epidemic indicators at the district level in sub-Saharan Africa.” Journal of the International AIDS Society 24: e25788. Economist Impact. 2023. “A triple dividend: the health, social and economic gains from financing the HIV response in Africa.” Esra, Rachel, Mpho Mmelesi, Akeem T. Ketlogetswe, Timothy M. Wolock, Adam Howes, Tlotlo Nong, Matshelo Tina Matlhaga, Siphiwe Ratladi, Dinah Ramaabya, and Jeffrey W. Imai-Eaton. 2024. “Improved Indicators for Subnational Unmet Antiretroviral Therapy Need in the Health System: Updates to the Naomi Model in 2023.” Journal of Acquired Immune Deficiency Syndromes 95 (1S): e24–33. https://doi.org/10.1097/QAI.0000000000003324. Fattah, EA, JV Niekerk, and H Rue. 2022. “Smart gradient-an adaptive technique for improving gradient estimation.” Foundations of Data Science. Fay, Robert E, and Roger A Herriot. 1979. “Estimates of income for small places: an application of James-Stein procedures to census data.” Journal of the American Statistical Association 74 (366a): 269–77. Fisher, Ronald Aylmer. 1936. “Design of experiments.” British Medical Journal 1 (3923): 554. FitzJohn, Rich, Robert Ashton, Alex Hill, Martin Eden, Wes Hinsley, Emma Russell, and James Thompson. 2023. Orderly: Lightweight Reproducible Reporting. Flaxman, Seth R, Yu-Xiang Wang, and Alexander J Smola. 2015. “Who supported Obama in 2012? 
Ecological inference through distribution regression.” In Proceedings of the 21th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 289–98. Follestad, Turid, and Håvard Rue. 2003. “Modelling spatial variation in disease risk using Gaussian Markov random field proxies for Gaussian random fields.” Fournier, David A, Hans J Skaug, Johnoel Ancheta, James Ianelli, Arni Magnusson, Mark N Maunder, Anders Nielsen, and John Sibert. 2012. “AD Model Builder: using automatic differentiation for statistical inference of highly parameterized complex nonlinear models.” Optimization Methods and Software 27 (2): 233–49. Freni-Sterrantino, Anna, Massimo Ventrucci, and Håvard Rue. 2018. “A note on intrinsic conditional autoregressive models for disconnected graphs.” Spatial and Spatio-Temporal Epidemiology 26: 25–34. Fuglstad, Geir-Arne, Daniel Simpson, Finn Lindgren, and Håvard Rue. 2019. “Constructing priors that penalize the complexity of Gaussian random fields.” Journal of the American Statistical Association 114 (525): 445–52. Gaedke-Merzhäuser, Lisa, Janet van Niekerk, Olaf Schenk, and Håvard Rue. 2023. “Parallelized integrated nested Laplace approximations for fast Bayesian inference.” Statistics and Computing 33 (1): 25. Garnier, Simon, Ross, Noam, Rudis, Robert, Camargo, et al. 2023. viridis(Lite) - Colorblind-Friendly Color Maps for R. https://doi.org/10.5281/zenodo.4679423. Gärtner, Thomas, Peter A Flach, Adam Kowalczyk, and Alexander J Smola. 2002. “Multi-instance kernels.” In ICML, 2:7. 3. Gelfand, Alan E, Li Zhu, and Bradley P Carlin. 2001. “On the change of support problem for spatio-temporal data.” Biostatistics 2 (1): 31–45. Gelman, Andrew. 2005. “Analysis of variance—why it is more important than ever.” ———. 2006. “Prior distributions for variance parameters in hierarchical models (comment on article by Browne and Draper).” Bayesian Analysis 1 (3): 515–34. ———. 2007. 
“Struggles with survey weighting and regression modeling.” Gelman, Andrew, John B Carlin, Hal S Stern, David B Dunson, Aki Vehtari, and Donald B Rubin. 2013. Bayesian data analysis. CRC press. Gelman, Andrew, Jessica Hwang, and Aki Vehtari. 2014. “Understanding predictive information criteria for Bayesian models.” Statistics and Computing 24 (6): 997–1016. Gelman, Andrew, and Donald B Rubin. 1992. “Inference from iterative simulation using multiple sequences.” Statistical Science, 457–72. Gelman, Andrew, Daniel Simpson, and Michael Betancourt. 2017. “The prior can often only be understood in the context of the likelihood.” Entropy 19 (10): 555. Gelman, Andrew, Aki Vehtari, Daniel Simpson, Charles C Margossian, Bob Carpenter, Yuling Yao, Lauren Kennedy, Jonah Gabry, Paul-Christian Bürkner, and Martin Modrák. 2020. “Bayesian workflow.” arXiv Preprint arXiv:2011.01808. Geman, Stuart, and Donald Geman. 1984. “Stochastic relaxation, Gibbs distributions, and the Bayesian restoration of images.” IEEE Transactions on Pattern Analysis and Machine Intelligence, no. 6: 721–41. Giordano, Ryan, Tamara Broderick, and Michael I. Jordan. 2018. “Covariances, Robustness, and Variational Bayes.” Journal of Machine Learning Research 19 (51): 1–49. http://jmlr.org/papers/v19/17-670.html. Global Burden of Disease Collaborative Network. 2019. “Global Burden of Disease Study 2019 (GBD 2019) Results.” Institute for Health Metrics and Evaluation (IHME). https://vizhub.healthdata.org/gbd-results/. Glynn, Judith R, Ndoliwe Kayuni, Emmanuel Banda, Fiona Parrott, Sian Floyd, Monica Francis-Chizororo, Misheck Nkhata, et al. 2011. “Assessing the validity of sexual behaviour reports in a whole population survey in rural Malawi.” PLOS One 6 (7): e22840. Gneiting, Tilmann, and Adrian E Raftery. 2007. “Strictly proper scoring rules, prediction, and estimation.” Journal of the American Statistical Association 102 (477): 359–78. 
Godfrey-Faussett, Peter, Luisa Frescura, Quarraisha Abdool Karim, Michaela Clayton, Peter D Ghys, and the 2025 prevention targets working group. 2022. “HIV Prevention for the Next Decade: Appropriate, Person-Centred, Prioritised, Effective, Combination Prevention.” PLOS Medicine 19 (9): e1004102. Goldstein, Michael. 2006. “Subjective Bayesian analysis: principles and practice.” Gómez-Rubio, Virgilio. 2020. Bayesian inference with INLA. CRC Press. Gómez-Rubio, Virgilio, and Håvard Rue. 2018. “Markov Chain Monte Carlo with the Integrated Nested Laplace Approximation.” Statistics and Computing 28: 1033–51. Goodrich, Ben, Jonah Gabry, Imad Ali, and Sam Brilleman. 2020. “Rstanarm: Bayesian Applied Regression Modeling via Stan.” https://mc-stan.org/rstanarm. Gössl, Christoff, Dorothee P Auer, and Ludwig Fahrmeir. 2001. “Bayesian spatiotemporal inference in functional magnetic resonance imaging.” Biometrics 57 (2): 554–62. Gottlieb, Michael S, Howard M Schanker, Peng Thim Fan, Andrew Saxon, Joel D Weisman, Irving Pozalski, et al. 1981. “Pneumocystis pneumonia—Los Angeles.” Morbidity and Mortality Weekly Report 30 (21): 1–3. Grabowski, M Kate, David M Serwadda, Ronald H Gray, Gertrude Nakigozi, Godfrey Kigozi, Joseph Kagaayi, Robert Ssekubugu, et al. 2017. “HIV prevention efforts and incidence of HIV in Uganda.” New England Journal of Medicine 377 (22): 2154–66. Gray, Ronald H, Godfrey Kigozi, David Serwadda, Frederick Makumbi, Stephen Watya, Fred Nalugoda, Noah Kiwanuka, et al. 2007. “Male circumcision for HIV prevention in men in Rakai, Uganda: a randomised trial.” The Lancet 369 (9562): 657–66. Gregson, Simon, Geoffrey P Garnett, Constance A Nyamukapa, Timothy B Hallett, James JC Lewis, Peter R Mason, Stephen K Chandiwana, and Roy M Anderson. 2006. “HIV decline associated with behavior change in eastern Zimbabwe.” Science 311 (5761): 664–66. Gretton, Arthur, Karsten Borgwardt, Malte Rasch, Bernhard Schölkopf, and Alex Smola. 2006.
“A Kernel Method for the Two-Sample-Problem.” Advances in Neural Information Processing Systems 19. Grieve, Richard, Youqi Yang, Sam Abbott, Giridhara R Babu, Malay Bhattacharyya, Natalie Dean, Stephen Evans, et al. 2023. “The Importance of Investing in Data, Models, Experiments, Team Science, and Public Trust to Help Policymakers Prepare for the Next Pandemic.” PLOS Global Public Health 3 (11): e0002601. Haining, Robert P. 2003. Spatial data analysis: theory and practice. Cambridge University Press. Hájek, Jaroslav. 1971. “Discussion of ‘An essay on the logical foundations of survey sampling, part I’.” Foundations of Statistical Inference (Proc. Sympos., Univ. Waterloo, Ontario, 1970), 236. Hamelijnck, O, T Damoulas, K Wang, and MA Girolami. 2019. “Multi-resolution multi-task Gaussian processes.” Advances in Neural Information Processing Systems 32. Hastie, Trevor, and Robert Tibshirani. 1987. “Generalized additive models: some applications.” Journal of the American Statistical Association 82 (398): 371–86. Hastings, W. K. 1970. “Monte Carlo Sampling Methods Using Markov Chains and Their Applications.” Biometrika 57 (1): 97–109. http://www.jstor.org/stable/2334940. Helleringer, Stéphane, Hans-Peter Kohler, Linda Kalilani-Phiri, James Mkandawire, and Benjamin Armbruster. 2011. “The reliability of sexual partnership histories: implications for the measurement of partnership concurrency during surveys.” AIDS (London, England) 25 (4): 503. Hodgins, Caroline, James Stannah, Salome Kuchukhidze, Lycias Zembe, Jeffrey W Eaton, Marie-Claude Boily, and Mathieu Maheu-Giroux. 2022. “Population sizes, HIV prevalence, and HIV prevention among men who paid for sex in sub-Saharan Africa (2000–2020): A meta-analysis of 87 population-based surveys.” PLOS Medicine 19 (1): e1003861. Hoffman, Matthew D, Andrew Gelman, et al. 2014. “The No-U-Turn sampler: adaptively setting path lengths in Hamiltonian Monte Carlo.” J. Mach. Learn. Res. 15 (1): 1593–623. Howes, Adam. 2023a. 
arealutils: Utility functions for beyond-borders. ———. 2023b. multi.utils: Utility functions for multi-agyw. Howes, Adam, Kathryn A. Risher, Van Kính Nguyen, Oliver Stevens, Katherine M. Jia, Timothy M. Wolock, Rachel T. Esra, et al. 2023. “Spatio-temporal estimates of HIV risk group proportions for adolescent girls and young women across 13 priority countries in sub-Saharan Africa.” PLOS Global Public Health 3 (4): 1–14. https://doi.org/10.1371/journal.pgph.0001731. ICAP. 2023. “Population-based HIV impact assessment: guiding the global HIV response.” https://phia.icap.columbia.edu. Jäckel, Peter. 2005. “A note on multivariate Gauss-Hermite quadrature.” London: ABN-Amro. Re. Jia, Katherine M, Hallie Eilerts, Olanrewaju Edun, Kevin Lam, Adam Howes, Matthew L Thomas, and Jeffrey W Eaton. 2022. “Risk scores for predicting HIV incidence among adult heterosexual populations in sub-Saharan Africa: a systematic review and meta-analysis.” Journal of the International AIDS Society 25 (1): e25861. Jin, Harry, Arjee Restar, and Chris Beyrer. 2021. “Overview of the Epidemiological Conditions of HIV Among Key Populations in Africa.” Journal of the International AIDS Society 24: e25716. Johnson, L, and RE Dorrington. 2020. “Thembisa version 4.3: A model for evaluating the impact of HIV/AIDS in South Africa.” View Article. Johnson, Olatunji, Peter Diggle, and Emanuele Giorgi. 2019. “A spatially discrete approximation to log-Gaussian Cox processes for modelling aggregated disease count data.” Statistics in Medicine 38 (24): 4871–87. Karatzoglou, Alexandros, Alex Smola, Kurt Hornik, and Maintainer Alexandros Karatzoglou. 2019. “Package ‘Kernlab’.” CRAN R Project. Kassanjee, Reshma, Thomas A. McWalter, Till Bärnighausen, and Alex Welte. 2012. “A New General Biomarker-Based Incidence Estimator.” Epidemiology 23 (5). Kelsall, Julia, and Jonathan Wakefield. 2002. 
“Modeling spatial variation in disease risk: a geostatistical approach.” Journal of the American Statistical Association 97 (459): 692–701. Khoury, Muin J, Michael F Iademarco, and William T Riley. 2016. “Precision public health for the era of precision medicine.” American Journal of Preventive Medicine 50 (3): 398–401. Kish, Leslie. 1965. Survey sampling. 04; HN29, K5. Knorr-Held, Leonhard. 2000. “Bayesian modelling of inseparable space-time variation in disease risk.” Statistics in Medicine 19 (17-18): 2555–67. Konstantinoudis, Garyfallos, Dominic Schuhmacher, Håvard Rue, and Ben D Spycher. 2020. “Discrete versus continuous domain models for disease mapping.” Spatial and Spatio-Temporal Epidemiology 32: 100319. Kristensen, Kasper. 2021. “The comprehensive TMB documentation.” https://kaskr.github.io/adcomp/_book/Introduction.html. Kristensen, Kasper, Anders Nielsen, Casper W Berg, Hans Skaug, Bradley M Bell, et al. 2016. “TMB: Automatic Differentiation and Laplace Approximation.” Journal of Statistical Software 70 (i05). Laplace, P. S. 1774. “Memoire sur la probabilite de causes par les evenements.” Memoire de l’Academie Royale Des Sciences. Law, Ho Chung, Dino Sejdinovic, Ewan Cameron, Tim Lucas, Seth Flaxman, Katherine Battle, and Kenji Fukumizu. 2018. “Variational learning on aggregate outputs with Gaussian processes.” Advances in Neural Information Processing Systems 31. Lee, Duncan. 2011. “A comparison of conditional autoregressive models used in Bayesian disease mapping.” Spatial and Spatio-Temporal Epidemiology 2 (2): 79–89. Lenth, Russell. 2009. “Response-Surface Methods in R, Using rsm.” Journal of Statistical Software 32 (7): 1–17. https://doi.org/10.18637/jss.v032.i07. Leppik, IE, FE Dreifuss, T Bowman-Cloyd, N Santilli, M Jacobs, C Crosby, J Cloyd, et al. 1985. “A double-blind crossover evaluation of progabide in partial seizures.” Neurology 35 (4): 285. Leroux, Brian G, Xingye Lei, and Norman Breslow. 2000. 
“Estimation of disease rates in small areas: a new mixed model for spatial dependence.” In Statistical Models in Epidemiology, the Environment, and Clinical Trials, 179–91. Springer. Li, Ye, Patrick Brown, Dionne C Gesink, and Håvard Rue. 2012. “Log Gaussian Cox processes and spatially aggregated disease incidence data.” Statistical Methods in Medical Research 21 (5): 479–507. https://doi.org/10.1177/0962280212446326. Lindgren, Finn, Håvard Rue, and Johan Lindström. 2011. “An explicit link between Gaussian fields and Gaussian Markov random fields: the stochastic partial differential equation approach.” Journal of the Royal Statistical Society Series B: Statistical Methodology 73 (4): 423–98. Margossian, Charles C. 2019. “A review of automatic differentiation and its efficient implementation.” Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery 9 (4): e1305. Margossian, Charles C, and Andrew Gelman. 2023. “For How Many Iterations Should We Run Markov Chain Monte Carlo?” arXiv Preprint arXiv:2311.02726. Margossian, Charles, Aki Vehtari, Daniel Simpson, and Raj Agrawal. 2020. “Hamiltonian Monte Carlo using an adjoint-differentiated Laplace approximation: Bayesian inference for latent Gaussian models and beyond.” Advances in Neural Information Processing Systems 33: 9086–97. Martin, Gael M, David T Frazier, and Christian P Robert. 2023. “Computing Bayes: From then ‘til now.” Statistical Science 1 (1): 1–17. Martino, Sara, and Andrea Riebler. 2020. “Integrated Nested Laplace Approximations (INLA).” In Wiley StatsRef: Statistics Reference Online, 1–19. John Wiley & Sons, Ltd. https://doi.org/https://doi.org/10.1002/9781118445112.stat08212. Martino, Sara, and Håvard Rue. 2009. “Implementing approximate Bayesian inference using Integrated Nested Laplace Approximation: A manual for the inla program.” Department of Mathematical Sciences, NTNU, Norway. Martins, Thiago G, Daniel Simpson, Finn Lindgren, and Håvard Rue. 2013. 
“Bayesian computing with INLA: new features.” Computational Statistics & Data Analysis 67: 68–83. Matheson, James E, and Robert L Winkler. 1976. “Scoring rules for continuous probability distributions.” Management Science 22 (10): 1087–96. Mayala, Benjamin K., Samir Bhatt, and Peter Gething. 2020. “Predicting HIV/AIDS at Subnational Levels using DHS Covariates related to HIV.” DHS Spatial Analysis Reports 18. Rockville, Maryland, USA: ICF. McCullagh, Peter, and John A Nelder. 1989. Generalized linear models. Routledge. McElreath, Richard. 2020. Statistical rethinking: A Bayesian course with examples in R and Stan. CRC press. McGillen, Jessica B, John Stover, Daniel J Klein, Sinokuthemba Xaba, Getrude Ncube, Mutsa Mhangara, Geraldine N Chipendo, et al. 2018. “The Emerging Health Impact of Voluntary Medical Male Circumcision in Zimbabwe: An Evaluation Using Three Epidemiological Models.” PLOS One 13 (7): e0199453. Meng, Xiao-Li. 2018. “Statistical paradises and paradoxes in big data (i) law of large populations, big data paradox, and the 2016 US presidential election.” The Annals of Applied Statistics 12 (2): 685–726. Metropolis, Nicholas, Arianna W Rosenbluth, Marshall N Rosenbluth, Augusta H Teller, and Edward Teller. 1953. “Equation of State Calculations by Fast Computing Machines.” J. Chem. Phys 21: 1087. Meyer-Rath, Gesine, Jessica B McGillen, Diego F Cuadros, Timothy B Hallett, Samir Bhatt, Njeri Wabiri, Frank Tanser, and Thomas Rehle. 2018. “Targeting the Right Interventions to the Right People and Places: The Role of Geospatial Analysis in HIV Program Planning.” AIDS (London, England) 32 (8): 957. Minka, Thomas P. 2001. “Expectation Propagation for approximate Bayesian inference.” In Proceedings of the 17th Conference in Uncertainty in Artificial Intelligence, 362–69. Monnahan, Cole C, and Kasper Kristensen. 2018. “No-U-turn sampling for fast Bayesian inference in ADMB and TMB: Introducing the adnuts and tmbstan R packages.” PLOS One 13 (5): e0197954. 
Monod, Mélodie, Andrea Brizzi, Ronald M. Galiwango, Robert Ssekubugu, Yu Chen, Xiaoyue Xi, Edward Nelson Kankaka, et al. 2023. “Longitudinal Population-Level HIV Epidemiologic and Genomic Surveillance Highlights Growing Gender Disparity of HIV Transmission in Uganda.” Nature Microbiology. Morris, Mitzi, Katherine Wheeler-Martin, Dan Simpson, Stephen J. Mooney, Andrew Gelman, and Charles DiMaggio. 2019. “Bayesian hierarchical spatial models: Implementing the Besag York Mollié model in stan.” Spatial and Spatio-Temporal Epidemiology 31: 100301. https://doi.org/https://doi.org/10.1016/j.sste.2019.100301. Nandi, Anita K, Tim CD Lucas, Rohan Arambepola, Peter Gething, and Daniel J Weiss. 2023. “disaggregation: An R Package for Bayesian Spatial Disaggregation Modeling.” Journal of Statistical Software 106: 1–19. Naylor, John C, and Adrian FM Smith. 1982. “Applications of a method for the efficient computation of posterior distributions.” Journal of the Royal Statistical Society Series C: Applied Statistics 31 (3): 214–25. Neal, Radford M. 2003. “Slice sampling.” The Annals of Statistics 31 (3): 705–67. Neal, Radford M et al. 2011. “MCMC using Hamiltonian dynamics.” Handbook of Markov Chain Monte Carlo 2 (11): 2. Nguyen, Van Kính, and Jeffrey W. Eaton. 2022. “Trends and country-level variation in age at first sex in sub-Saharan Africa among birth cohorts entering adulthood between 1985 and 2020.” BMC Public Health 22 (1): 1120. https://doi.org/10.1186/s12889-022-13451-y. Nnko, Soori, J Ties Boerma, Mark Urassa, Gabriel Mwaluko, and Basia Zaba. 2004. “Secretive females or swaggering males?: An assessment of the quality of sexual partnership reporting in rural Tanzania.” Social Science & Medicine 59 (2): 299–310. Noor, Abdisalan Mohamed. 2022. “Country Ownership in Global Health.” PLOS Global Public Health 2 (2): e0000113. Okabe, Masataka, and Kei Ito. 2008. “Color Universal Design (CUD): How to Make Figures and Presentations That Are Friendly to Colorblind People.” 2008. 
http://jfly.iam.u-tokyo.ac.jp/color/. Openshaw, S, and P. J. Taylor. 1979. “A million or so correlation coefficients, three experiments on the modifiable areal unit problem.” Statistical Applications in the Spatial Science, 127–44. Ord, Toby. 2013. “The moral imperative toward cost-effectiveness in global health.” Center for Global Development 12. Organization, World Health et al. 2022. Consolidated Guidelines on HIV, Viral Hepatitis and STI Prevention, Diagnosis, Treatment and Care for Key Populations. World Health Organization. Osgood-Zimmerman, Aaron, and Jon Wakefield. 2023. “A Statistical Review of Template Model Builder: A Flexible Tool for Spatial Modelling.” International Statistical Review 91 (2): 318–42. Paciorek, Christopher J et al. 2013. “Spatial models for point and areal data using Markov random fields on a fine grid.” Electronic Journal of Statistics 7: 946–72. Paciorek, Christopher J., and Mark J. Schervish. 2006. “Spatial modelling using a new class of nonstationary covariance functions.” Environmetrics 17 (5): 483–506. https://doi.org/https://doi.org/10.1002/env.785. Parks, Robbie M, James E Bennett, Helen Tamura-Wicks, Vasilis Kontis, Ralf Toumi, Goodarz Danaei, and Majid Ezzati. 2020. “Anomalously warm temperatures are associated with increased injury deaths.” Nature Medicine 26 (1): 65–70. Pebesma, Edzer. 2018. “Simple Features for R: Standardized Support for Spatial Vector Data.” The R Journal 10 (1): 439–46. https://doi.org/10.32614/RJ-2018-009. Pebesma, Edzer J. 2004. “Multivariable geostatistics in S: the gstat package.” Computers & Geosciences 30: 683–91. Pebesma, Edzer, and Roger Bivand. 2023. Spatial Data Science: With Applications in R. Chapman; Hall/CRC. https://doi.org/10.1201/9780429459016. Pettit, LI. 1990. “The conditional predictive ordinate for the normal distribution.” Journal of the Royal Statistical Society: Series B (Methodological) 52 (1): 175–84. Pfeffermann, Danny et al. 2013. 
“New Important Developments in Small Area Estimation.” Statistical Science 28 (1): 40–68. Pisani, Elizabeth, Stefano Lazzari, Neff Walker, and Bernhard Schwartländer. 2003. “HIV surveillance: a global perspective.” Journal of Acquired Immune Deficiency Syndromes 32: S3–11. Porcu, Emilio, Reinhard Furrer, and Douglas Nychka. 2021. “30 Years of space–time covariance functions.” Wiley Interdisciplinary Reviews: Computational Statistics 13 (2): e1512. Press, William H, Teukolsky Saul A, William T Vetterling, and Brian P Flannery. 2007. Numerical recipes 3rd edition: The art of scientific computing. Cambridge University Press. R Core Team. 2022. R: A Language and Environment for Statistical Computing. Vienna, Austria: R Foundation for Statistical Computing. https://www.R-project.org. Rashid, T, JE Bennett, D Muller, A Cross, J Pearson-Stuttard, H Daby, D Fecht, B Davies, and M Ezzati. 2023. “Inequalities in mortality from leading cancers in districts of England from 2002 to 2019: population-based high-resolution spatiotemporal analysis of vital registration data.” The Lancet Oncology. http://hdl.handle.net/10044/1/107364. Riebler, Andrea, Sigrunn H Sørbye, Daniel Simpson, and Håvard Rue. 2016. “An intuitive Bayesian spatial model for disease mapping that accounts for scaling.” Statistical Methods in Medical Research 25 (4): 1145–65. Risher, Kathryn A, Anne Cori, Georges Reniers, Milly Marston, Clara Calvert, Amelia Crampin, Tawanda Dadirai, et al. 2021. “Age patterns of HIV incidence in eastern and southern Africa: a modelling analysis of observational population-based cohort studies.” The Lancet HIV 8 (7): e429–39. Robert, Christian P, and George Casella. 2005. “Monte Carlo Statistical Methods (Springer Texts in Statistics).” Springer. Roberts, Gareth O., and Jeffrey S. Rosenthal. 2004. “General state space Markov chains and MCMC algorithms.” Probability Surveys 1 (none): 20–71. https://doi.org/10.1214/154957804100000024. Roy, Vivekananda. 2020. 
“Convergence diagnostics for Markov chain Monte Carlo.” Annual Review of Statistics and Its Application 7: 387–412. Rue, Havard. 2023. “‘R-INLA‘ Project - FAQ.” https://www.r-inla.org/faq. Rue, Håvard. 2001. “Fast sampling of Gaussian Markov random fields.” Journal of the Royal Statistical Society: Series B (Statistical Methodology) 63 (2): 325–38. ———. 2020. “Comment on R-INLA Discussion Group thread.” Rue, Håvard, and Turid Follestad. 2001. “GMRFLib: a C-library for fast and exact simulation of Gaussian Markov random fields.” SIS-2002-236. Rue, Havard, and Leonhard Held. 2005. Gaussian Markov random fields: theory and applications. CRC press. Rue, Håvard, and Sara Martino. 2007. “Approximate Bayesian inference for hierarchical Gaussian Markov random field models.” Journal of Statistical Planning and Inference 137 (10): 3177–92. Rue, Håvard, Sara Martino, and Nicolas Chopin. 2009. “Approximate Bayesian inference for latent Gaussian models by using integrated nested Laplace approximations.” Journal of the Royal Statistical Society: Series B (Statistical Methodology) 71 (2): 319–92. Rue, Håvard, Andrea Riebler, Sigrunn H Sørbye, Janine B Illian, Daniel P Simpson, and Finn K Lindgren. 2017. “Bayesian computing with INLA: a review.” Annual Review of Statistics and Its Application 4: 395–421. Säilynoja, Teemu, Paul-Christian Bürkner, and Aki Vehtari. 2022. “Graphical test for discrete uniformity and its applications in goodness-of-fit evaluation and multiple sample comparison.” Statistics and Computing 32 (2): 32. Saracco, James F, J Andrew Royle, David F DeSante, and Beth Gardner. 2010. “Modeling spatial variation in avian survival and residency probabilities.” Ecology 91 (7): 1885–91. Saul, Janet, Gretchen Bachman, Shannon Allen, Nora F Toiv, Caroline Cooney, and Ta’Adhmeeka Beamon. 2018. “The DREAMS core package of interventions: a comprehensive approach to preventing HIV among adolescent girls and young women.” PLOS One 13 (12): e0208167. Saunders, Daniel. 2023. 
“The Besag-York-Mollie Model for Spatial Data.” In PyMC Examples, edited by PyMC Team. https://doi.org/10.5281/zenodo.5654871. Schad, Daniel J, Michael Betancourt, and Shravan Vasishth. 2021. “Toward a Principled Bayesian Workflow in Cognitive Science.” Psychological Methods 26 (1): 103. Schlüter, Daniela K, Martial L Ndeffo-Mbah, Innocent Takougang, Tony Ukety, Samuel Wanji, Alison P Galvani, and Peter J Diggle. 2016. “Using community-level prevalence of Loa loa infection to predict the proportion of highly-infected individuals: statistical modelling to support lymphatic filariasis and onchocerciasis elimination programs.” PLOS Neglected Tropical Diseases 10 (12): e0005157. Schmid, Volker J, Brandon Whitcher, Anwar R Padhani, N Jane Taylor, and Guang-Zhong Yang. 2006. “Bayesian methods for pharmacokinetic models in dynamic contrast-enhanced magnetic resonance imaging.” IEEE Transactions on Medical Imaging 25 (12): 1627–36. Shapley, Lloyd S et al. 1953. “A value for n-person games.” Princeton University Press Princeton. Shumway, Robert H, and David S Stoffer. 2017. Time Series Analysis and Its Applications With R Examples. Springer. Siegfried, Nandi, Lize van der Merwe, Peter Brocklehurst, and Tin Tin Sint. 2011. “Antiretrovirals for reducing the risk of mother-to-child transmission of HIV infection.” Cochrane Database of Systematic Reviews, no. 7. Simpson, Daniel, Håvard Rue, Andrea Riebler, Thiago G Martins, Sigrunn H Sørbye, et al. 2017. “Penalising model component complexity: A principled, practical approach to constructing priors.” Statistical Science 32 (1): 1–28. Sisson, Scott A, Yanan Fan, and Mark Beaumont. 2018. Handbook of approximate Bayesian computation. CRC Press. Skaug, Hans J. 2009. “Discussion of \"Approximate Bayesian inference for latent Gaussian models by using integrated nested Laplace approximations\".” In Journal of the Royal Statistical Society: Series B (Statistical Methodology), 71:319–92. 2. Wiley Online Library. 
Slaymaker, Emma, Kathryn A. Risher, Ramadhani Abdul, Milly Marston, Keith Tomlin, Robert Newton, Anthony Ndyanabo, et al. 2020. “Risk factors for new HIV infections in the general population in sub-Saharan Africa.” Smirnov, N. 1948. “Table for Estimating the Goodness of Fit of Empirical Distributions.” Annals of Mathematical Statistics 19 (2): 279–81. Smith, Nathaniel, and Stéfan van der Walt. 2015. “A Better Default Colormap for Matplotlib.” In Proceedings of the 14th Python in Science Conference (SciPy). Sørbye, Sigrunn Holbek, and Håvard Rue. 2014. “Scaling intrinsic Gaussian Markov random field priors in spatial modelling.” Spatial Statistics 8: 39–51. ———. 2017. “Penalised complexity priors for stationary autoregressive processes.” Journal of Time Series Analysis 38 (6): 923–35. Spiegelhalter, David J, Nicola G Best, Bradley P Carlin, and Angelika Van Der Linde. 2002. “Bayesian measures of model complexity and fit.” Journal of the Royal Statistical Society: Series B (Statistical Methodology) 64 (4): 583–639. Spiegelhalter, David, Andrew Thomas, Nicky Best, and Wally Gilks. 1996. “BUGS 0.5 Examples.” MRC Biostatistics Unit, Institute of Public Health, Cambridge, UK 256. Stan Development Team. 2023. Stan Reference Manual. https://mc-stan.org/docs/reference-manual/index.html. Stein, Michael L. 1999. “Interpolation of spatial data: some theory for kriging.” Stevens, Oliver, Keith Sabin, Rebecca Anderson, Sonia Arias Garcia, Kalai Willis, Amrita Rao, Anne F. McIntyre, et al. 2023. “Population size, HIV prevalence, and antiretroviral therapy coverage among key populations in sub-Saharan Africa: collation and synthesis of survey data 2010-2023.” medRxiv. https://www.medrxiv.org/content/early/2023/11/22/2022.07.27.22278071. Stover, John, Robert Glaubius, Lynne Mofenson, Caitlin M Dugdale, Mary-Ann Davies, Gabriela Patten, and Constantin Yiannoutsos. 2019. 
“Updates to the Spectrum/AIM model for estimating key HIV indicators at national and subnational levels.” AIDS (London, England) 33 (Suppl 3): S227. Stover, John, and Yu Teng. 2021. “The impact of condom use on the HIV epidemic.” Gates Open Research 5. Stringer, Alex. 2021. “Implementing Approximate Bayesian Inference using Adaptive Quadrature: the aghq Package.” arXiv Preprint arXiv:2101.04468. Stringer, Alex, Patrick Brown, and Jamie Stafford. 2022. “Fast, scalable approximations to posterior distributions in extended latent Gaussian models.” Journal of Computational and Graphical Statistics, 1–15. Tanaka, Yusuke, Toshiyuki Tanaka, Tomoharu Iwata, Takeshi Kurashima, Maya Okawa, Yasunori Akagi, and Hiroyuki Toda. 2019. “Spatially aggregated Gaussian processes with multivariate areal outputs.” In Advances in Neural Information Processing Systems, 3005–15. Tanser, Frank, Tulio de Oliveira, Mathieu Maheu-Giroux, and Till Bärnighausen. 2014. “Concentrated HIV sub-epidemics in generalized epidemic settings.” Current Opinion in HIV and AIDS 9 (2): 115. Tatem, Andrew J. 2017. “WorldPop, open data for spatial demography.” Scientific Data 4 (1): 1–4. Teh, Yee Whye, Bryn Elesedy, Bobby He, Michael Hutchinson, Sheheryar Zaidi, Avishkar Bhoopchand, Ulrich Paquet, Nenad Tomasev, Jonathan Read, and Peter J. Diggle. 2022. “Efficient Bayesian inference of Instantaneous Reproduction Numbers at Fine Spatial Scales, with an Application to Mapping and Nowcasting the Covid-19 Epidemic in British Local Authorities.” Journal of the Royal Statistical Society Series A: Statistics in Society 185 (1): S65–85. https://doi.org/10.1111/rssa.12971. Thall, Peter F, and Stephen C Vail. 1990. “Some covariance models for longitudinal count data with overdispersion.” Biometrics, 657–71. The Global Fund. 2018. The Global Fund Measurement Framework for Adolescent Girls and Young Women Programs. 
https://www.theglobalfund.org/media/8076/me\\%5Fadolescentsgirlsandyoungwomenprograms\\%5Fframeworkmeasurement\\%5Fen.pdf. Thigpen, Michael C, Poloko M Kebaabetswe, Lynn A Paxton, Dawn K Smith, Charles E Rose, Tebogo M Segolodi, Faith L Henderson, et al. 2012. “Antiretroviral Preexposure Prophylaxis for Heterosexual HIV Transmission in Botswana.” New England Journal of Medicine 367 (5): 423–34. Thyng, Kristen M, Chad A Greene, Robert D Hetland, Heather M Zimmerle, and Steven F DiMarco. 2016. “True Colors of Oceanography: Guidelines for Effective and Accurate Colormap Selection.” Oceanography 29 (3): 9–13. Tierney, Luke, and Joseph B Kadane. 1986. “Accurate approximations for posterior moments and marginal densities.” Journal of the American Statistical Association 81 (393): 82–86. Tobler, Waldo R. 1970. “A computer movie simulating urban growth in the Detroit region.” Economic Geography 46 (sup1): 234–40. Tokdar, Surya T, and Robert E Kass. 2010. “Importance sampling: a review.” Wiley Interdisciplinary Reviews: Computational Statistics 2 (1): 54–60. UN General Assembly. 2016. “Political Declaration on HIV and AIDS: On the Fast Track to Accelerate the Fight Against HIV and to End the AIDS Epidemic by 2030.” In. UNAIDS. 2014. “90-90-90. An ambitious treatment target to help end the AIDS epidemic.” UNAIDS. 2021a. “2021 UNAIDS Global AIDS Update - Confronting Inequalities - Lessons for pandemic responses from 40 Years of AIDS.” Geneva, Switzerland. UNAIDS. 2021b. “Global AIDS strategy 2021–2026. End inequalities. End AIDS.” UNAIDS. 2022. “In Danger: UNAIDS Global AIDS Update 2022.” https://www.unaids.org/en/resources/documents/2022/in-danger-global-aids-update. ———. 2023a. “AIDSinfo: Global data on HIV epidemiology and response.” https://aidsinfo.unaids.org/. ———. 2023b. “The path that ends AIDS: UNAIDS Global AIDS Update 2023.” https://www.unaids.org/en/resources/documents/2023/global-aids-update-2023. UNAIDS and WHO. 2021. 
“Voluntary Medical Male Circumcision Progress Brief.” UNAIDS. https://hivpreventioncoalition.unaids.org/wp-content/uploads/2021/04/JC3022_VMMC_4-pager_En_v3.pdf. UNAIDS, WHO, et al. 2022. Using Recency Assays for HIV Surveillance: 2022 Technical Guidance. World Health Organization. UNICEF. 2019. “Adolescent & social norms situation in Mozambique.” https://www.unicef.org/mozambique/en/adolescent-social-norms. U.S. Department of State. 2022. “Latest Global Program Results.” https://www.state.gov/wp-content/uploads/2022/11/PEPFAR-Latest-Global-Results_December-2022.pdf. USAID. 2012. “Sampling and Household Listing Manual: Demographic and Health Surveys Methodology.” https://dhsprogram.com/pubs/pdf/DHSM4/DHS6_Sampling_Manual_Sept2012_DHSM4.pdf. Utazi, C Edson, Julia Thorley, VA Alegana, MJ Ferrari, Kristine Nilsen, Saki Takahashi, CJE Metcalf, Justin Lessler, and AJ Tatem. 2019. “A spatial regression model for the disaggregation of areal unit based data to high-resolution grids with application to vaccination coverage mapping.” Statistical Methods in Medical Research 28 (10-11): 3226–41. Valpine, Perry de, Daniel Turek, Christopher J Paciorek, Clifford Anderson-Bergman, Duncan Temple Lang, and Rastislav Bodik. 2017. “Programming with models: writing statistical algorithms for general model structures with NIMBLE.” Journal of Computational and Graphical Statistics 26 (2): 403–13. Van Niekerk, Janet, Elias Krainski, Denis Rustand, and Håvard Rue. 2023. “A new avenue for Bayesian inference with INLA.” Computational Statistics & Data Analysis 181: 107692. Vehtari, Aki, Andrew Gelman, and Jonah Gabry. 2017. “Practical Bayesian Model Evaluation Using Leave-One-Out Cross-Validation and WAIC.” Statistics and Computing 27: 1413–32. Vehtari, Aki, and Janne Ojanen. 2012. “A survey of Bayesian predictive methods for model assessment, selection and comparison.” Statistics Surveys 6 (none): 142–228. https://doi.org/10.1214/12-SS102. Wakefield, J, and S Morris. 1999. 
“Spatial dependence and errors-in-variables in environmental epidemiology.” Bayesian Statistics 6: 657–84. Wakefield, Jonathan, and Hilary Lyons. 2010. “Spatial Aggregation and the Ecological Fallacy.” In Chapman & Hall/CRC Handbooks of Modern Statistical Methods, 2010:541–58. https://doi.org/10.1201/9781420072884-c30. Ward, Brian. 2023. bridgestan: BridgeStan, Accessing Stan Model Functions in R. Watanabe, Sumio. 2013. “A widely applicable Bayesian information criterion.” Journal of Machine Learning Research 14 (Mar): 867–97. Weiser, Constantin. 2016. mvQuad: Methods for Multivariate Quadrature. http://CRAN.R-project.org/package=mvQuad. Weiss, Daniel J, Bonnie Mappin, Ursula Dalrymple, Samir Bhatt, Ewan Cameron, Simon I Hay, and Peter W Gething. 2015. “Re-examining environmental correlates of Plasmodium falciparum malaria endemicity: a data-intensive variable selection approach.” Malaria Journal 14 (1): 1–18. WHO and UNAIDS. 2007. “New Data on Male Circumcision and HIV Prevention: Policy and Programme Implications.” Geneva: World Health Organization. Wilke, Claus O. 2019. Fundamentals of Data Visualization: A Primer on Making Informative and Compelling Figures. O’Reilly Media. Wilson, Katie, and Jon Wakefield. 2018. “Pointless spatial modeling.” Biostatistics 21 (2): e17–32. https://doi.org/10.1093/biostatistics/kxy041. Wolock, Timothy M, Seth Flaxman, Kathryn A Risher, Tawanda Dadirai, Simon Gregson, and Jeffrey W Eaton. 2021. “Evaluating distributional regression strategies for modelling self-reported sexual age-mixing.” Edited by Eduardo Franco, Talía Malagón, and Adam Akullian. eLife 10 (June): e68318. https://doi.org/10.7554/eLife.68318. Wood, Simon N. 2017. Generalized additive models: an introduction with R. CRC press. ———. 2020. “Simplified integrated nested Laplace approximation.” Biometrika 107 (1): 223–30. Wringe, A, I Cremin, J Todd, N McGrath, I Kasamba, K Herbst, P Mushore, B Żaba, and E Slaymaker. 2009. 
“Comparative assessment of the quality of age-at-event reporting in three HIV cohort studies in sub-Saharan Africa.” Sexually Transmitted Infections 85 (Suppl 1): i56–63. Yao, Yuling, Aki Vehtari, Daniel Simpson, and Andrew Gelman. 2018. “Yes, but did it work?: Evaluating variational inference.” In International Conference on Machine Learning, 5581–90. PMLR. Yousefi, Fariba, Michael T Smith, and Mauricio Alvarez. 2019. “Multi-task learning for aggregated data using Gaussian processes.” Advances in Neural Information Processing Systems 32. Zaba, Basia, Elizabeth Pisani, Emma Slaymaker, and J Ties Boerma. 2004. “Age at first sex: understanding recent trends in African demographic surveys.” Sexually Transmitted Infections 80 (suppl 2): ii28–35. References Eaton, Jeffrey W, Laura Dwyer-Lindgren, Steve Gutreuter, Megan O’Driscoll, Oliver Stevens, Sumali Bajaj, Rob Ashton, et al. 2021. “Naomi: a new modelling tool for estimating HIV epidemic indicators at the district level in sub-Saharan Africa.” Journal of the International AIDS Society 24: e25788. Kristensen, Kasper. 2021. “The comprehensive TMB documentation.” https://kaskr.github.io/adcomp/_book/Introduction.html. Stover, John, Robert Glaubius, Lynne Mofenson, Caitlin M Dugdale, Mary-Ann Davies, Gabriela Patten, and Constantin Yiannoutsos. 2019. “Updates to the Spectrum/AIM model for estimating key HIV indicators at national and subnational levels.” AIDS (London, England) 33 (Suppl 3): S227. Stringer, Alex, Patrick Brown, and Jamie Stafford. 2022. “Fast, scalable approximations to posterior distributions in extended latent Gaussian models.” Journal of Computational and Graphical Statistics, 1–15. "],["404.html", "Page not found", " Page not found The page you requested cannot be found (perhaps it was moved or renamed). You may want to try searching to find the page's new location, or use the table of contents to find the page you are looking for. 
"]] +[["index.html", "Bayesian spatio-temporal methods for small-area estimation of HIV indicators Welcome Acknowledgements Abbreviations Notations", " Bayesian spatio-temporal methods for small-area estimation of HIV indicators Adam Thomas Howes Abstract Progress towards ending AIDS as a public health threat by 2030 is not being made fast enough. Effective public health response requires accurate, timely, high-resolution estimates of epidemic and demographic indicators. Limitations of available data and statistical methodology make obtaining these estimates difficult. I developed and applied Bayesian spatio-temporal methods to meet this challenge. First, I used scoring rules to compare models for area-level spatial structure with both simulated and real data. Second, I estimated district-level HIV risk group proportions, enabling behavioural prioritisation of prevention services, as put forward in the UNAIDS Global AIDS Strategy. Third, I developed a novel deterministic Bayesian inference method, combining adaptive Gauss-Hermite quadrature with principal component analysis, motivated by the Naomi district-level model of HIV indicators. In developing this method, I implemented integrated nested Laplace approximations using automatic differentiation, enabling use of this algorithm for a wider class of models. Together, the contributions in this thesis help to guide precision HIV policy in sub-Saharan Africa, as well as advancing Bayesian methods for spatio-temporal data. Welcome This is the e-book version of my PhD thesis, submitted to Imperial College London in accordance with the requirements of the degree of Doctor of Philosophy in Modern Statistics and Statistical Machine Learning. If you would prefer, you can view the PDF version. The associated GitHub repository for this thesis is athowes/thesis. A concise introduction to the work is available via my thesis defense slides, or (slightly less concise) longer slides for a lab group meeting. 
The corrections for this thesis are also available online. If you notice any typos or other issues with the work, feel free to open an issue on GitHub, or submit a pull request. Acknowledgements I would first like to express my gratitude to Seth Flaxman and Jeff Imai-Eaton for their mentorship. Their guidance has been crucial in shaping this thesis, and my development as a scientist. Thanks to the HIV Inference Group at Imperial for exposing me to impact driven research, helping me to learn to present my work, and tolerating a statistician. I am grateful to have been a part of the Modern Statistics and Statistical Machine Learning Centre for Doctoral Training at Imperial and Oxford, and the Machine Learning and Global Health Network. Thanks to Antoine, Chris, Enrico, Phil, Yanni, Tim, Liza, and Theo for conversations, some of which were about research. This work was made possible by funding provided by the EPSRC and Bill & Melinda Gates Foundation. There are many worse ways to spend billions of dollars than fighting poverty and disease. Thanks to Mike McLaren, Kevin Esvelt, the Nucleic Acid Observatory team, and the Sculpting Evolution lab for hosting my visit to the MIT Media Lab. I left Cambridge with appropriately raised aspirations, Google document templates, and only a little terrified about the future. Thanks to Trenton, Lenni, Lenny, Geetha, Janika, Simon, Phil, Frances, Leilani and Tammy. Thanks to Alex Stringer, and the Department of Statistics and Actuarial Science, for hosting my visit to the University of Waterloo. Without Alex, Chapter 6 would not have been possible, and I’d still be waiting for Markov chains begun in Chapter 4 to converge. Tim Lucas and Patrick Brown put me in touch with Alex, and Håvard Rue and Finn Lindgren gave helpful answers on the R-INLA discussion group. Thanks also to Kate, my tour guide in Waterloo, and Midtown Yoga for helping me stay balanced.
My sense for what matters has been shaped, and arguably improved, by the Effective Altruism community. Thank you to the Meridian, Trajan, and LEAH offices for hosting me this final year. Thanks to my housemates in Hackney: August, Dewi, Henry, Jerome, Johnny, and Tamara. Not to be all Bay area, but I’m proud of the community we’ve built. Pınar believed in me and my research at times when I didn’t. Thanks to Mr Sam, and attendees of the Manshead grit salt, for conferring upon me the status of stats man. No thanks to Simon Marshall, he didn’t help, if anything he held me back. I extend my deepest thanks to my parents, Deborah and Karl, and my grandparents, Kath and Tony, whose love and support have granted me the privilege to pursue my interests. Abbreviations Abbreviation Definition AIDS Acquired ImmunoDeficiency Syndrome AIS AIDS Indicator Survey ANC Antenatal Clinic AGHQ Adaptive Gauss-Hermite Quadrature ART Antiretroviral Therapy BIC Bayesian Information Criterion BF Bayes Factor CAR Conditionally Auto-regressive CCD Central Composite Design CDC Centers for Disease Control and Prevention CPO Conditional Predictive Ordinate CRPS Continuous Ranked Probability Score DALY Disability Adjusted Life Year DDC Data Defect Correlation DHS Demographic and Health Surveys DIC Deviance Information Criterion EB Empirical Bayes ECDF Empirical Cumulative Distribution Function ELGM Extended Latent Gaussian Model ESS Effective Sample Size FSW Female Sex Worker(s) GP Gaussian Process GLM Generalised Linear Model GLMM Generalised Linear Mixed effects Model GMRF Gaussian Markov Random Field Global Fund Global Fund to Fight AIDS, Tuberculosis, and Malaria HMC Hamiltonian Monte Carlo HIV Human Immunodeficiency Virus ICAR Intrinsic Conditionally Auto-regressive IID Independent and Identically Distributed INLA Integrated Nested Laplace Approximation LM Linear Model LGM Latent Gaussian Model LS Log Score MCMC Markov Chain Monte Carlo MSM Men who have Sex with Men NUTS No-U-Turn Sampler PEP
Post-Exposure Prophylaxis PEPFAR President’s Emergency Plan for AIDS Relief PHIA Population-based HIV Impact Assessment PIT Probability Integral Transform PLHIV People Living with HIV PPL Probabilistic Programming Language PrEP Pre-Exposure Prophylaxis PMTCT Prevention of Mother-to-Child Transmission PWID People Who Inject Drugs SAE Small-Area Estimation SR Scoring Rule SPSR Strictly Proper Scoring Rule SSA Sub-Saharan Africa STI Sexually Transmitted Infection TGP Transgender People TaSP Treatment as Prevention UNAIDS The Joint United Nations Programme on HIV/AIDS VI Variational Inference VMMC Voluntary Medical Male Circumcision WAIC Watanabe-Akaike Information Criterion Notations Notation Definition \\(\\propto\\) Proportional to. \\(\\mathbb{R}\\) The set of real numbers. \\(\\mathbb{Z}\\) The set of integers. \\(\\mathbb{Z}^+\\) The set of positive integers. \\(\\rho\\) HIV prevalence. \\(\\lambda\\) HIV incidence. \\(\\alpha\\) ART coverage. \\(\\mathcal{S}\\) Spatial study region \\(\\mathcal{S} \\subseteq \\mathbb{R}^2\\). \\(s \\in \\mathcal{S}\\) Point location. \\(\\mathcal{T}\\) Temporal study period \\(\\mathcal{T} \\subseteq \\mathbb{R}\\). \\(t \\in \\mathcal{T}\\) Time. \\(\\mathbf{y}\\) Data, a \\(n\\)-vector \\((y_1, \\ldots, y_n)\\). \\(\\boldsymbol{\\phi}\\) Parameters, a \\(d\\)-vector \\((\\phi_1, \\ldots, \\phi_d)\\). \\(\\mathbf{x}\\) Latent field, a \\(N\\)-vector \\((x_1, \\ldots, x_N)\\). \\(\\boldsymbol{\\theta}\\) Hyperparameters, a \\(m\\)-vector \\((\\theta_1, \\ldots, \\theta_m)\\). \\(x \\sim p(x)\\) \\(x\\) has the probability distribution \\(p(x)\\). \\(A_i\\) Areal unit. \\(A_i \\sim A_j\\) Adjacency between areal units. \\(\\mathbf{u}\\) Random effects, often spatial. \\(\\mathbf{H}\\) Hessian matrix. \\(\\mathbf{R}\\) Structure matrix. \\(\\mathbf{Q}\\) Precision matrix. \\(\\boldsymbol{\\mathbf{\\Sigma}}\\) Covariance matrix. 
\\(\\mathbf{M}^{-}\\) The generalised inverse of a (potentially rank-deficient) matrix \\(\\mathbf{M}\\). \\(\\mathcal{N}\\) Gaussian distribution. \\(k: \\mathcal{X} \\times \\mathcal{X} \\to \\mathbb{R}\\) Kernel function on the space \\(\\mathcal{X}\\). \\(\\mathcal{Q}\\) A set of quadrature nodes. \\(\\omega: \\mathcal{Q} \\to \\mathbb{R}\\) A quadrature weighting function. \\(\\mathcal{Q}(m, k)\\) Gauss-Hermite quadrature points in \\(m\\) dimensions with \\(k\\) nodes per dimension, constructed according to a product rule. \\(\\varphi\\) A standard (multivariate) Gaussian density. "],["introduction.html", "1 Introduction 1.1 Chapter overview", " 1 Introduction This thesis is about applied and methodological Bayesian statistics. It is applied and methodological in that the primary concern is real-world questions and the means to answer them. The statistical approach is Bayesian because probability theory is used to arrive at conclusions based on models for observed data. The applied focus of this thesis is in obtaining the strategic information needed to plan the response to the HIV (human immunodeficiency virus) epidemic in sub-Saharan Africa (SSA). Over 40 years since the beginning of the epidemic, HIV is the largest annual cause of disability adjusted life years (DALYs) among non-infants in SSA [Global Burden of Disease Collaborative Network (2019); Figure 1.1]. Quantification of the epidemic using statistics is a crucial part of the public health response. Effective implementation of HIV prevention and treatment requires strategic information. However, producing suitable estimates of relevant indicators is complicated by a range of statistical challenges. Figure 1.1: HIV is the largest cause of annual DALYs among individuals aged >1 year in SSA (Global Burden of Disease Collaborative Network 2019).
One DALY represents the loss of the equivalent of one year of full health, and is calculated by the sum of years of life lost and years lost due to disability. Weights used to account for disability vary between 0 (full health) and 1 (death) depending on the severity of the condition. The data used were gathered in national household surveys or routinely collected from healthcare facilities providing HIV services. Important features of these data are the location and time at which observations were recorded. Spatio-temporal data have important recurring commonalities across a diverse range of application settings. The work conducted in this thesis uses and aspires to contribute to techniques from spatio-temporal statistics. Computation is an essential part of modern statistical practice. Each project in this thesis, and the thesis itself, is accompanied by R (R Core Team 2022) code, hosted on GitHub at https://github.com/athowes. To facilitate reproducible research, the R package orderly (FitzJohn et al. 2023) was used to structure code repositories. 1.1 Chapter overview This thesis is structured as follows: Chapter 2 provides an overview of the HIV/AIDS epidemic and describes the challenges faced by surveillance efforts. Chapter 3 introduces the statistical concepts and notation used throughout the thesis, focusing on Bayesian modelling and computation, spatio-temporal statistics, and survey methods. Chapter 4: The prevailing model for spatial structure used in small-area estimation (Besag, York, and Mollié 1991) was intended to analyse a grid of pixels. In disease mapping, areas correspond to the administrative divisions of a country, which are typically not a grid. I used simulation and survey data studies to evaluate the practical consequences of this concern. Chapter 5: Adolescent girls and young women are a demographic group at disproportionate risk of HIV infection.
The Global AIDS Strategy recommends prioritising interventions on the basis of behaviour to prevent the most new infections using the limited available resources. I estimated the size of behavioural risk groups across priority countries to enable implementation of this strategy. Additionally, I assessed the potential benefits of the strategy in terms of numbers of new infections prevented. This work (Howes et al. 2023) was included in the UNAIDS (Joint United Nations Programme on HIV/AIDS) Global AIDS Update 2022 and 2023. Chapter 6: The Naomi small-area estimation model (Eaton et al. 2021) is used by countries to estimate district-level HIV indicators. First, to allow for compatibility with Naomi, I implemented integrated nested Laplace approximations using automatic differentiation, opening the door to a new class of fast, flexible, and accurate Bayesian inference algorithms. The implementation was demonstrated using models for a clinical trial of an epilepsy drug, and for the prevalence of the parasitic worm Loa loa. Second, I developed an approximate Bayesian inference method combining adaptive Gauss-Hermite quadrature with principal components analysis. I applied these methods to data from Malawi, and analysed the consequences of the inference method choice for policy relevant outcomes. Chapter 7: Finally, I discuss contributions of the research, avenues for future work, and some broader reflections. Though chronological order is recommended, Chapters 4, 5 and 6 may be read in any order, or as stand-alone studies, if preferred. References Besag, Julian, Jeremy York, and Annie Mollié. 1991. “Bayesian image restoration, with two applications in spatial statistics.” Annals of the Institute of Statistical Mathematics 43 (1): 1–20. Eaton, Jeffrey W, Laura Dwyer-Lindgren, Steve Gutreuter, Megan O’Driscoll, Oliver Stevens, Sumali Bajaj, Rob Ashton, et al. 2021.
“Naomi: a new modelling tool for estimating HIV epidemic indicators at the district level in sub-Saharan Africa.” Journal of the International AIDS Society 24: e25788. FitzJohn, Rich, Robert Ashton, Alex Hill, Martin Eden, Wes Hinsley, Emma Russell, and James Thompson. 2023. Orderly: Lightweight Reproducible Reporting. Global Burden of Disease Collaborative Network. 2019. “Global Burden of Disease Study 2019 (GBD 2019) Results.” Institute for Health Metrics and Evaluation (IHME). https://vizhub.healthdata.org/gbd-results/. Howes, Adam, Kathryn A. Risher, Van Kính Nguyen, Oliver Stevens, Katherine M. Jia, Timothy M. Wolock, Rachel T. Esra, et al. 2023. “Spatio-temporal estimates of HIV risk group proportions for adolescent girls and young women across 13 priority countries in sub-Saharan Africa.” PLOS Global Public Health 3 (4): 1–14. https://doi.org/10.1371/journal.pgph.0001731. R Core Team. 2022. R: A Language and Environment for Statistical Computing. Vienna, Austria: R Foundation for Statistical Computing. https://www.R-project.org. "],["hiv-aids.html", "2 The HIV/AIDS epidemic 2.1 Background 2.2 HIV surveillance", " 2 The HIV/AIDS epidemic 2.1 Background HIV is a retrovirus which infects humans. If untreated, infection with HIV can develop into a more advanced stage known as acquired immunodeficiency syndrome (AIDS). HIV primarily attacks a type of white blood cell vital for proper function of the immune system. As a result, AIDS is characterised by increased risk of developing opportunistic infections such as tuberculosis or Pneumocystis pneumonias, which can result in death. The first AIDS cases were reported in Los Angeles in the early 1980s (Gottlieb et al. 1981; Barré-Sinoussi et al. 1983). Since then, HIV has spread globally. Transmission occurs by exposure to specific bodily fluids of an infected person. The most common mode of transmission is via unprotected anal or vaginal sex. 
Transmission can also occur from a mother to her baby, or when drug injection equipment is shared. Approximately 86 million people have become infected with HIV, and of those 40 million have died of AIDS-related causes (UNAIDS 2023a). An ongoing global effort has been made to respond to the epidemic. The multifaceted response has been shaped by local communities, civil society organisations, national governments, research institutions, pharmaceutical companies, international agencies like the Joint United Nations Programme on HIV/AIDS (UNAIDS), and global health initiatives such as the President’s Emergency Plan for AIDS Relief (PEPFAR) and the Global Fund to Fight AIDS, Tuberculosis, and Malaria (the Global Fund). As an indication of the scale of the response, the investment of $100 billion by PEPFAR constitutes the “largest commitment by a single nation to address a single disease in history” (U.S. Department of State 2022). Figure 2.1: Globally, yearly new HIV infections peaked in 1995, and have since decreased by 59%. Yearly AIDS-related deaths peaked in 2004, and have since decreased by 68% (UNAIDS 2023a). Much of the global disease burden is concentrated in eastern and southern Africa, as well as western and central Africa. The unit “M” refers to millions. The colour palette used in this figure, and throughout the thesis, is that of Okabe and Ito (2008). It is designed to be colour-blind friendly, and the default used by Wilke (2019). Implementation of HIV prevention and treatment has significantly reduced the number of new HIV infections and AIDS-related deaths per year since their respective peaks (Figure 2.1). The most significant evidence-based interventions, in more or less chronological order of introduction, are described below: Condoms are an inexpensive and effective method for prevention of HIV and other sexually transmitted infections (STIs) such as Chlamydia trachomatis, Neisseria gonorrhoeae, syphilis, and Trichomonas vaginalis.
Condom usage has increased significantly since 1990, which is estimated to have averted 117 million new HIV infections (Stover and Teng 2021). However, there remain significant but difficult to close gaps in condom usage. Antiretroviral therapy (ART) is a combination of drugs which stop the virus from replicating in the body. A person living with HIV who takes ART daily can live a full and healthy life, transforming what was once a terminal illness to a treatable chronic condition. Of the 39 million people living with HIV (PLHIV) in 2022, around 76% were accessing ART. The number of AIDS-related deaths, 21 million, estimated to have been averted by ART is staggering (UNAIDS 2023b). ART reduces the amount of virus in the blood and genital secretions. If the virus is undetectable then there is significant evidence that it cannot be transmitted sexually (Cohen et al. 2011; Broyles et al. 2023). For this reason, in addition to providing life saving treatment, ART also operates as prevention. Approaches to lowering risk of HIV transmission in this way are referred to as treatment as prevention (TaSP). Particular efforts have been made to provide pregnant women with ART to reduce the chance of mother-to-child transmission (MTCT) (Siegfried et al. 2011). Voluntary medical male circumcision (VMMC) partially protects against female-to-male HIV acquisition. Three landmark randomised control trials (RCTs) (Auvert et al. 2005; Gray et al. 2007; R. C. Bailey et al. 2007) found complete surgical removal of the foreskin to result in a reduction of HIV acquisition in men by 50-60%. Based on this evidence, VMMC has been recommended since 2007 by the World Health Organization (WHO) and UNAIDS as a key HIV intervention in high-prevalence settings (WHO and UNAIDS 2007). 
Scale up of VMMC across 15 priority countries between 2008 and 2019 is estimated to have already averted 340 thousand new HIV infections, though the future number of new HIV infections averted is likely to be much higher (McGillen et al. 2018; UNAIDS and WHO 2021). Pre-exposure prophylaxis (PrEP) and post-exposure prophylaxis (PEP) are antiretroviral drugs which can be taken before and after exposure to prevent transmission. PrEP has been shown to be effective at an individual level across a number of RCTs (Baeten et al. 2012; Thigpen et al. 2012), but there are few population-level studies. Though PEP cannot be studied with RCTs, observational studies indicate it is highly effective (Dominguez et al. 2016). These medical interventions are more costly than other prevention options, so are primarily useful in high risk settings. Though implementation of these interventions has enabled important progress, there remains much more to do. In 2022, 1.3 million people were newly infected with HIV and there were 630 thousand AIDS-related deaths, more than one death every minute (UNAIDS 2022). Bold fast-track targets have been set to accelerate the end of AIDS as a global public health threat by 2030 (UN General Assembly 2016). To meet these targets in the context of disruption to HIV services caused by the COVID-19 pandemic and a potential shortfall in HIV funding, renewed commitments are required (Economist Impact 2023). For available resources to have the greatest impact, it is important that the right HIV interventions are prioritised to the right populations, in the right place, and at the right time. By analogy to precision medicine, this paradigm has been termed precision public health (Khoury, Iademarco, and Riley 2016). While precision medicine tailors treatments to individuals, precision public health tailors treatments to populations. 
The importance of precision public health is underscored by the vast potential differences in the cost-effectiveness of any given intervention, with some interventions orders of magnitude more impactful than others (Ord 2013). Figure 2.2: Adult (15-49) HIV prevalence varies substantially both within and between countries in SSA. The estimates from 2023 were generated by country teams using the Naomi small-area estimation model in a process supported by UNAIDS, and are available from UNAIDS (2023a). White filled points are country-level estimates, and coloured points are district-level estimates. Results from Nigeria were not published. Data collection in the Cabo Delgado province of Mozambique was disrupted by conflict. Obtaining results for the Democratic Republic of the Congo required removing some districts from the model. Disease burden varies substantially across multiple spatial scales. In some countries, the epidemic is concentrated in small populations, and national HIV prevalence is low. In others, the epidemic is sustained by heterosexual transmission, and national HIV prevalence is higher (typically >1%). These two epidemic settings are sometimes described as concentrated and generalised, respectively. Most countries severely affected by HIV are in sub-Saharan Africa (SSA). It is estimated that 66% of the 39 million PLHIV worldwide live in SSA. HIV prevalence in adults aged 15-49 is above 10% in some countries in southern Africa. Some districts even exceed 20% (Figure 2.2). Indeed, just as there is variation between countries, there is variation within countries. As an illustration, adult HIV prevalence at the district municipality level in South Africa ranges from 6% in Namakwa to 30% in uMkhanyakude. Accordingly, the work in this thesis is centred on measurement of HIV at the district level in SSA. In all countries and contexts, some groups of people are at much higher risk than others. 
Groups of people at increased risk of HIV infection are known as key populations (KPs). Examples include men who have sex with men (MSM), female sex workers (FSW), people who inject drugs (PWID), and transgender people (TGP) (Stevens et al. 2023). KPs are often marginalised, and face legal and social barriers. Concentrated settings are defined by the majority of new HIV infections occurring in KPs and their sexual partners. In generalised settings like SSA, though concentrated subepidemics do occur (Tanser et al. 2014), risk is more diffuse across the population. In SSA adolescent girls and young women (AGYW) are a large demographic group at increased risk of HIV infection (Risher et al. 2021; Monod et al. 2023) but not typically considered a KP. Chapter 5 focuses on measurement of HIV for AGYW and FSW. There are a number of ways to practically implement differentiated HIV treatment and prevention services (Godfrey-Faussett et al. 2022). These include geographic and demographic prioritisation (Meyer-Rath et al. 2018), key population services (Organization et al. 2022), and risk screening based on individual-level risk characteristics (Jia et al. 2022). Each approach requires strategic information about HIV disease burden. This thesis focuses on using HIV surveillance to inform geographic and demographic prioritisation. 2.2 HIV surveillance HIV surveillance refers to the collection, analysis, interpretation and dissemination of data relating to HIV (Pisani et al. 2003). Surveillance can be used to track epidemic indicators, identify at-risk populations, uncover drivers of transmission, implement prevention and treatment programs, and assess their impact. Important indicators to measure include: HIV prevalence is the proportion \\(\\rho \\in [0, 1]\\) of a population who have HIV. The number of PLHIV is given by \\(N\\rho\\), where \\(N\\) is the (living) population size. 
Increases in HIV prevalence, and the number of PLHIV, can be caused either by new HIV infections or more PLHIV remaining alive by taking treatment. For this reason caution should be taken in directly interpreting changes in HIV prevalence. Nonetheless, as a primary measure of population disease burden, HIV prevalence is vital in calculating all of the other indicators given below. HIV incidence is the rate \\(\\lambda > 0\\) of new HIV infections. In writing, HIV incidence is often given as a number of new infections per 1000 person years. The number of new HIV infections that occur during a given time is the integral of the rate of HIV incidence over time \\(\\lambda_t\\) multiplied by the size of the susceptible population. Let \\(\\rho_t\\) be the HIV prevalence, and \\(N_t\\) be the population size, at time \\(t\\). Then the number of new HIV infections which occur during a given period of time is given by \\[ I = \\int \\lambda_t \\cdot (1 - \\rho_t) \\cdot N_t \\text{d}t. \\] Planning, delivery, and evaluation of prevention programming relies on estimates of HIV incidence and the number of new HIV infections. Knowing whether the rate of new infections is rising or declining within specific populations is crucial. ART coverage is the proportion \\(\\alpha \\in [0, 1]\\) of PLHIV who are on ART. The number of people taking ART is given by \\(N \\cdot \\rho \\cdot \\alpha\\). Estimates of ART coverage play a direct role in planning provision of treatment services, and finding unmet treatment need. Recent infection is the proportion \\(\\kappa \\in [0, 1]\\) of PLHIV who have been recently infected. Recency assays use biomarkers to distinguish between recent and longstanding infection, with varying sensitivity and specificity. Estimates of recent infection are primarily used to help estimate HIV incidence (Kassanjee et al. 2012; UNAIDS, WHO, et al. 2022). Awareness of status is the proportion \\(\\xi \\in [0, 1]\\) of PLHIV who have been diagnosed with HIV.
Programming of HIV testing and diagnosis is informed by estimates of awareness of HIV status. HIV diagnosis allows for linkage to care and progression along the HIV treatment cascade and care continuum (CDC 2014). 2.2.1 Data Measuring the HIV indicators above requires data. To give the most complete picture of the epidemic, it is important to use multiple sources of data. The most prominent categories are: Household surveys are large, national, cross-sectional studies. The surveys conducted in the most countries are Demographic and Health Surveys [DHS; USAID (2012)], which include a wide range of health related questions, and more HIV-specific Population-based HIV Impact Assessment [PHIA; ICAP (2023)] and AIDS Indicator Surveys (AIS). Some countries also implement their own survey series, such as the South Africa Behavioural, Sero-status and Media Impact Survey (SABSSM). Household surveys provide high quality standardised data about HIV, typically designed to furnish nationally-representative estimates. Both DHS and PHIA surveys collect demographic, behavioural, and clinical information. Additionally, HIV testing is conducted via home-based testing, with results returned immediately, or anonymous dried blood spot testing. Programmatic data refer to data routinely collected during delivery of health services. Examples include data from antenatal care (ANC), HIV testing, and ART service delivery. Due to their integration with regular service delivery, programmatic data are available at higher frequency than other data sources. However, in comparison with designed studies, less control can be exercised over collection of programmatic data. It is common to encounter issues of data quality and reliability, as well as bias, in working with programmatic data. Cohort studies follow a group of people over time. Outcomes may be measured more systematically in a cohort study than in other study designs.
The data from cohort studies have particular use in informing otherwise difficult to estimate epidemiological parameters. Such parameters include disease progression and mortality rates, transmission dynamics, and treatment outcomes. Examples of population-based cohort studies in SSA include the Manicaland Project Open Cohort Study in Zimbabwe (Gregson et al. 2006), the Rakai Community Cohort Study in Uganda (Grabowski et al. 2017), and the Karonga Demographic Surveillance Site in Malawi (Crampin et al. 2012). 2.2.2 Challenges Obtaining reliable, timely estimates of the HIV indicators at an appropriate spatial resolution using the available data sources is challenging. The most significant difficulties faced are enumerated below, providing important context for the work in this thesis: Data sparsity: Collection of data is costly and time consuming. As a result, limited direct data might be available for the particular time, location, or population of interest. For example, in many countries the last conducted household survey is several years out of date. Furthermore, the sample sizes in household surveys are typically designed to be representative at the national level. As a result, data for subpopulations are usually sparse. Missing data: The sampling frame of a survey may not correspond to the target population. For example, some KPs are difficult to reach, and may be omitted from sampling frames (Jin, Restar, and Beyrer 2021). Additionally, individuals included on the sampling frame may choose not to respond. Each of these issues can be characterised as being problems of missing data. Response and measurement biases: Individuals may be hesitant to disclose their HIV status, or report higher risk behaviours, due to social desirability bias or a fear of discrimination or stigma. Furthermore, individuals may be unaware of their HIV status.
When available, biomarker data can be used to overcome under-reporting of HIV status, but still may be subject to measurement errors. Biases in behavioural data can be more difficult to disentangle. Denominators and demography: Many indicators are rates or proportions, which rely on estimates of the population at risk in the denominator. For example, HIV prevalence is a proportion of the population, and HIV incidence is a rate per person-years at risk. Accurately estimating population denominators over space, time, and demographies is itself a challenging task (Tatem 2017). Taking a ratio of uncertain quantities amplifies uncertainty, but is rarely properly accounted for. Inconsistent data collection and reporting: The sources of data that are collected might vary across space and time. Additionally, reporting protocols or definitions for the same data source can also change. Though household surveys tend to be more consistent than programmatic data, the questions included and design of the surveys do change. Reliance on epidemiological parameters: Indicators rely on estimates of epidemiological parameters such as rates of disease progression. These parameters may not generalise to the setting of interest. Further, they are typically applied coarsely, and without proper accounting for uncertainty. 2.2.3 Statistical approaches The challenges above make direct interpretation of the data often misleading or impossible. Careful statistical modelling is required to mitigate these limitations as effectively as possible. The most important statistical approaches for estimating HIV indicators used in this thesis are: Borrowing information: When little direct data are available, data judged to be indirectly related can be used to help improve estimation. For example, if limited data are available for individuals of a certain age, it is likely reasonable to make use of data for individuals of a similar age. 
As well as over age groups, information can be borrowed between and within countries, and across times. Chapter 4 discusses models for borrowing information over space. These models, along with others for borrowing information in other dimensions, are applied in Chapters 5 and 6. Evidence synthesis: Multiple sources of evidence can be combined to overcome the limitations of any one data source. For example, infrequently run household surveys can be complemented by more up-to-date programmatic data. Chapter 6 develops methods suitable for the complex statistical models required to integrate data sources. Multiple data sources are used in Chapter 5 to overcome the limitations of household surveys for measuring KP population sizes. Expert guidance: Expert epidemiological, demographic, and local stakeholder guidance can be used to improve estimates. Ensuring the quality of any data used in the estimation process is essential. Indeed, careful validation of data by country teams is a crucial part of the yearly UNAIDS HIV estimates process. Uncertainty quantification: Conclusions drawn by synthesising multiple incomplete data sources are unlikely to be firm and unanimous. It is therefore especially important that the uncertainties inherent to any statistical analysis are accurately and transparently presented. The Bayesian statistical paradigm introduced in Chapter 3 and used throughout this thesis is particularly well suited to handling of uncertainty. References Auvert, Bertran, Dirk Taljaard, Emmanuel Lagarde, Joelle Sobngwi-Tambekou, Rémi Sitta, and Adrian Puren. 2005. “Randomized, controlled intervention trial of male circumcision for reduction of HIV infection risk: the ANRS 1265 Trial.” PLOS Medicine 2 (11): e298. Baeten, Jared M, Deborah Donnell, Patrick Ndase, Nelly R Mugo, James D Campbell, Jonathan Wangisi, Jordan W Tappero, et al. 2012. 
“Antiretroviral Prophylaxis for HIV Prevention in Heterosexual Men and Women.” New England Journal of Medicine 367 (5): 399–410. Bailey, Robert C, Stephen Moses, Corette B Parker, Kawango Agot, Ian Maclean, John N Krieger, Carolyn FM Williams, Richard T Campbell, and Jeckoniah O Ndinya-Achola. 2007. “Male circumcision for HIV prevention in young men in Kisumu, Kenya: a randomised controlled trial.” The Lancet 369 (9562): 643–56. Barré-Sinoussi, Françoise, Jean-Claude Chermann, Fran Rey, Marie Therese Nugeyre, Sophie Chamaret, Jacqueline Gruest, Charles Dauguet, et al. 1983. “Isolation of a T-lymphotropic retrovirus from a patient at risk for acquired immune deficiency syndrome (AIDS).” Science 220 (4599): 868–71. Broyles, Laura N, Robert Luo, Debi Boeras, and Lara Vojnov. 2023. “The risk of sexual transmission of HIV in individuals with low-level HIV viraemia: a systematic review.” The Lancet. CDC. 2014. “Understanding the HIV Care Continuum.” CDC. http://www.cdc.gov/hiv/pdf/dhap_continuum.pdf. Cohen, Myron S, Ying Q Chen, Marybeth McCauley, Theresa Gamble, Mina C Hosseinipour, Nagalingeswaran Kumarasamy, James G Hakim, et al. 2011. “Prevention of HIV-1 infection with early antiretroviral therapy.” New England Journal of Medicine 365 (6): 493–505. Crampin, Amelia C, Albert Dube, Sebastian Mboma, Alison Price, Menard Chihana, Andreas Jahn, Angela Baschieri, et al. 2012. “Profile: the Karonga health and demographic surveillance system.” International Journal of Epidemiology 41 (3): 676–85. Dominguez, Kenneth L., Dawn K. Smith, Vasavi Thomas, Nicole Crepaz, Karen Lang, Walid Heneine, Janet M. McNicholl, et al. 2016. “Updated Guidelines for Antiretroviral Postexposure Prophylaxis After Sexual, Injection Drug Use, or Other Nonoccupational Exposure to HIV—United States, 2016.” https://stacks.cdc.gov/view/cdc/38856. Economist Impact. 2023. 
“A triple dividend: the health, social and economic gains from financing the HIV response in Africa.” Godfrey-Faussett, Peter, Luisa Frescura, Quarraisha Abdool Karim, Michaela Clayton, Peter D Ghys, and 2025 prevention targets working group). 2022. “HIV Prevention for the Next Decade: Appropriate, Person-Centred, Prioritised, Effective, Combination Prevention.” PLOS Medicine 19 (9): e1004102. Gottlieb, Michael S, Howard M Schanker, Peng Thim Fan, Andrew Saxon, Joel D Weisman, Irving Pozalski, et al. 1981. “Pneumocystis pneumonia—Los Angeles.” Morbidity and Mortality Weekly Report 30 (21): 1–3. Grabowski, M Kate, David M Serwadda, Ronald H Gray, Gertrude Nakigozi, Godfrey Kigozi, Joseph Kagaayi, Robert Ssekubugu, et al. 2017. “HIV prevention efforts and incidence of HIV in Uganda.” New England Journal of Medicine 377 (22): 2154–66. Gray, Ronald H, Godfrey Kigozi, David Serwadda, Frederick Makumbi, Stephen Watya, Fred Nalugoda, Noah Kiwanuka, et al. 2007. “Male circumcision for HIV prevention in men in Rakai, Uganda: a randomised trial.” The Lancet 369 (9562): 657–66. Gregson, Simon, Geoffrey P Garnett, Constance A Nyamukapa, Timothy B Hallett, James JC Lewis, Peter R Mason, Stephen K Chandiwana, and Roy M Anderson. 2006. “HIV decline associated with behavior change in eastern Zimbabwe.” Science 311 (5761): 664–66. ICAP. 2023. “Population-based HIV impact assessment: guiding the global HIV response.” https://phia.icap.columbia.edu. Jia, Katherine M, Hallie Eilerts, Olanrewaju Edun, Kevin Lam, Adam Howes, Matthew L Thomas, and Jeffrey W Eaton. 2022. “Risk scores for predicting HIV incidence among adult heterosexual populations in sub-Saharan Africa: a systematic review and meta-analysis.” Journal of the International AIDS Society 25 (1): e25861. Jin, Harry, Arjee Restar, and Chris Beyrer. 2021. “Overview of the Epidemiological Conditions of HIV Among Key Populations in Africa.” Journal of the International AIDS Society 24: e25716. Kassanjee, Reshma, Thomas A. 
McWalter, Till Bärnighausen, and Alex Welte. 2012. “A New General Biomarker-Based Incidence Estimator.” Epidemiology 23 (5). Khoury, Muin J, Michael F Iademarco, and William T Riley. 2016. “Precision public health for the era of precision medicine.” American Journal of Preventive Medicine 50 (3): 398–401. McGillen, Jessica B, John Stover, Daniel J Klein, Sinokuthemba Xaba, Getrude Ncube, Mutsa Mhangara, Geraldine N Chipendo, et al. 2018. “The Emerging Health Impact of Voluntary Medical Male Circumcision in Zimbabwe: An Evaluation Using Three Epidemiological Models.” PLOS One 13 (7): e0199453. Meyer-Rath, Gesine, Jessica B McGillen, Diego F Cuadros, Timothy B Hallett, Samir Bhatt, Njeri Wabiri, Frank Tanser, and Thomas Rehle. 2018. “Targeting the Right Interventions to the Right People and Places: The Role of Geospatial Analysis in HIV Program Planning.” AIDS (London, England) 32 (8): 957. Monod, Mélodie, Andrea Brizzi, Ronald M. Galiwango, Robert Ssekubugu, Yu Chen, Xiaoyue Xi, Edward Nelson Kankaka, et al. 2023. “Longitudinal Population-Level HIV Epidemiologic and Genomic Surveillance Highlights Growing Gender Disparity of HIV Transmission in Uganda.” Nature Microbiology. Okabe, Masataka, and Kei Ito. 2008. “Color Universal Design (CUD): How to Make Figures and Presentations That Are Friendly to Colorblind People.” 2008. http://jfly.iam.u-tokyo.ac.jp/color/. Ord, Toby. 2013. “The moral imperative toward cost-effectiveness in global health.” Center for Global Development 12. Organization, World Health et al. 2022. Consolidated Guidelines on HIV, Viral Hepatitis and STI Prevention, Diagnosis, Treatment and Care for Key Populations. World Health Organization. Pisani, Elizabeth, Stefano Lazzari, Neff Walker, and Bernhard Schwartländer. 2003. “HIV surveillance: a global perspective.” Journal of Acquired Immune Deficiency Syndromes 32: S3–11. Risher, Kathryn A, Anne Cori, Georges Reniers, Milly Marston, Clara Calvert, Amelia Crampin, Tawanda Dadirai, et al. 2021. 
“Age patterns of HIV incidence in eastern and southern Africa: a modelling analysis of observational population-based cohort studies.” The Lancet HIV 8 (7): e429–39. Siegfried, Nandi, Lize van der Merwe, Peter Brocklehurst, and Tin Tin Sint. 2011. “Antiretrovirals for reducing the risk of mother-to-child transmission of HIV infection.” Cochrane Database of Systematic Reviews, no. 7. Stevens, Oliver, Keith Sabin, Rebecca Anderson, Sonia Arias Garcia, Kalai Willis, Amrita Rao, Anne F. McIntyre, et al. 2023. “Population size, HIV prevalence, and antiretroviral therapy coverage among key populations in sub-Saharan Africa: collation and synthesis of survey data 2010-2023.” medRxiv. https://www.medrxiv.org/content/early/2023/11/22/2022.07.27.22278071. Stover, John, and Yu Teng. 2021. “The impact of condom use on the HIV epidemic.” Gates Open Research 5. Tanser, Frank, Tulio de Oliveira, Mathieu Maheu-Giroux, and Till Bärnighausen. 2014. “Concentrated HIV sub-epidemics in generalized epidemic settings.” Current Opinion in HIV and AIDS 9 (2): 115. Tatem, Andrew J. 2017. “WorldPop, open data for spatial demography.” Scientific Data 4 (1): 1–4. Thigpen, Michael C, Poloko M Kebaabetswe, Lynn A Paxton, Dawn K Smith, Charles E Rose, Tebogo M Segolodi, Faith L Henderson, et al. 2012. “Antiretroviral Preexposure Prophylaxis for Heterosexual HIV Transmission in Botswana.” New England Journal of Medicine 367 (5): 423–34. UN General Assembly. 2016. “Political Declaration on HIV and AIDS: On the Fast Track to Accelerate the Fight Against HIV and to End the AIDS Epidemic by 2030.” In. UNAIDS. 2022. “In Danger: UNAIDS Global AIDS Update 2022.” https://www.unaids.org/en/resources/documents/2022/in-danger-global-aids-update. ———. 2023a. “AIDSinfo: Global data on HIV epidemiology and response.” https://aidsinfo.unaids.org/. ———. 2023b. “The path that ends AIDS: UNAIDS Global AIDS Update 2023.” https://www.unaids.org/en/resources/documents/2023/global-aids-update-2023. UNAIDS and WHO. 
2021. “Voluntary Medical Male Circumcision Progress Brief.” UNAIDS. https://hivpreventioncoalition.unaids.org/wp-content/uploads/2021/04/JC3022_VMMC_4-pager_En_v3.pdf. UNAIDS, WHO, et al. 2022. Using Recency Assays for HIV Surveillance: 2022 Technical Guidance. World Health Organization. U.S. Department of State. 2022. “Latest Global Program Results.” https://www.state.gov/wp-content/uploads/2022/11/PEPFAR-Latest-Global-Results_December-2022.pdf. USAID. 2012. “Sampling and Household Listing Manual: Demographic and Health Surveys Methodology.” https://dhsprogram.com/pubs/pdf/DHSM4/DHS6_Sampling_Manual_Sept2012_DHSM4.pdf. WHO and UNAIDS. 2007. “New Data on Male Circumcision and HIV Prevention: Policy and Programme Implications.” Geneva: World Health Organization. Wilke, Claus O. 2019. Fundamentals of Data Visualization: A Primer on Making Informative and Compelling Figures. O’Reilly Media. "],["bayes-st.html", "3 Bayesian spatio-temporal statistics 3.1 Bayesian statistics 3.2 Spatio-temporal statistics 3.3 Model structure 3.4 Model comparison 3.5 Survey methods", " 3 Bayesian spatio-temporal statistics 3.1 Bayesian statistics Bayesian statistics is a mathematical paradigm for learning from data. Two reasons stand out as to why it is especially well suited to facing the challenges presented in Section 2.2. First, it allows for principled and flexible integration of prior domain knowledge. Second, uncertainty over all unknown quantities is handled as an integral part of the Bayesian paradigm. This section provides a brief and at times opinionated overview of Bayesian statistics. For a more complete introduction, I recommend Gelman et al. (2013), McElreath (2020) or Gelman et al. (2020). 3.1.1 Bayesian modelling The Bayesian approach to data analysis is based on construction of a probability model for the observed data \\(\\mathbf{y} = (y_1, \\ldots, y_n)\\). 
Parameters \\(\\boldsymbol{\\mathbf{\\phi}} = (\\phi_1, \\ldots, \\phi_d)\\) are used to describe features of the data. Both the data and parameters are assumed to be random variables, and their joint probability distribution is written as \\(p(\\mathbf{y}, \\boldsymbol{\\mathbf{\\phi}})\\). Subsequent calculations, and the conclusions which follow from them, are made based on manipulating the model using probability theory. Models are most naturally constructed from two parts, known as the likelihood \\(p(\\mathbf{y} \\, | \\, \\boldsymbol{\\mathbf{\\phi}})\\) and the prior distribution \\(p(\\boldsymbol{\\mathbf{\\phi}})\\). The joint distribution is obtained by the product of these two parts \\[\\begin{equation} p(\\mathbf{y}, \\boldsymbol{\\mathbf{\\phi}}) = p(\\mathbf{y} \\, | \\, \\boldsymbol{\\mathbf{\\phi}}) p(\\boldsymbol{\\mathbf{\\phi}}). \\tag{3.1} \\end{equation}\\] The likelihood, as a function of \\(\\boldsymbol{\\mathbf{\\phi}}\\) with \\(\\mathbf{y}\\) fixed, reflects the probability of observing the data when the value of the parameters is \\(\\boldsymbol{\\mathbf{\\phi}}\\). The prior distribution encapsulates beliefs about the parameters \\(\\boldsymbol{\\mathbf{\\phi}}\\) before the data are observed. Recommendations for specifying prior distributions vary. The extent to which subjective information should be incorporated into the prior distribution is a central issue. Proponents of the objective Bayesian paradigm (Berger 2006) put forward that the prior distribution should be non-informative, so as not to introduce subjectivity into the analysis. Others see subjectivity as fundamental to scientific inquiry, with no viable alternative (Goldstein 2006). Though subjectivity is typically discussed with regard to the prior distribution, as we will see in Section 3.3, the distinction between prior distribution and likelihood is not always clear.
As such, it may be argued that issues of subjectivity are not unique to prior distribution specification, and ultimately that the challenge of specifying the data generating process – that is, \\(p(\\mathbf{y}, \\boldsymbol{\\mathbf{\\phi}})\\) – is better thought of more holistically (Gelman, Simpson, and Betancourt 2017). The probability model can be simulated from to obtain samples \\((\\mathbf{y}, \\boldsymbol{\\mathbf{\\phi}}) \\sim p(\\mathbf{y}, \\boldsymbol{\\mathbf{\\phi}})\\). If samples of the data \\(\\mathbf{y}\\) differ too greatly from what the analyst would expect to see in reality, then the model fails to capture their prior scientific understanding. Models that do not produce plausible data samples can be refined. Checks of this kind [Gelman et al. (2013); Chapter 6] can be used to help iteratively build models, gradually adding complexity as required. 3.1.2 Bayesian computation Figure 3.1: An example of Bayesian modelling and computation for a simple one parameter model. Here the likelihood is \\(y_i \\sim \\text{Poisson}(\\phi)\\) for \\(i = 1, 2, 3\\) and the prior distribution on the rate parameter \\(\\phi > 0\\) is \\(\\phi \\sim \\text{Gamma}(3, 1)\\). Observed data \\(\\mathbf{y} = (1, 2, 3)\\) was simulated from the distribution \\(\\text{Poisson}(2.5)\\). As such, the true data generating process is within the space of models being considered. This situation is sometimes known (Bernardo and Smith 2001) as the \\(\\mathcal{M}\\)-closed world, in contrast to the \\(\\mathcal{M}\\)-open world where the model is said to be misspecified. Further, the posterior distribution is available in closed form as \\(\\text{Gamma}(9, 4)\\). This is because the posterior distribution is in the same family of probability distributions as the prior distribution. Models of this kind are described as being conjugate. Conjugate models are often used because of their convenience. 
Though other models may be more suitable, Bayesian inference will typically be more computationally demanding than for conjugate models. The posterior distribution here is more tightly peaked than the prior distribution. Contraction of this kind is typical, but not always the case. Having constructed a model (Equation (3.1)), the primary goal in a Bayesian analysis is to obtain the posterior distribution \\(p(\\boldsymbol{\\mathbf{\\phi}} \\, | \\, \\mathbf{y})\\). This distribution encapsulates probabilistic beliefs about the parameters given the observed data. As such, the posterior distribution has a central role in use of the statistical analysis for decision making. Using the eponymous Bayes’ theorem, the posterior distribution is obtained by \\[\\begin{equation} p(\\boldsymbol{\\mathbf{\\phi}} \\, | \\, \\mathbf{y}) = \\frac{p(\\mathbf{y}, \\boldsymbol{\\mathbf{\\phi}})}{p(\\mathbf{y})} = \\frac{p(\\mathbf{y} \\, | \\, \\boldsymbol{\\mathbf{\\phi}}) p(\\boldsymbol{\\mathbf{\\phi}})}{p(\\mathbf{y})}. \\tag{3.2} \\end{equation}\\] Unfortunately, most of the time it is intractable to calculate the posterior distribution analytically. This is because of the potentially high-dimensional integral \\[\\begin{equation} p(\\mathbf{y}) = \\int p(\\mathbf{y}, \\boldsymbol{\\mathbf{\\phi}}) \\text{d}\\boldsymbol{\\mathbf{\\phi}} \\tag{3.3} \\end{equation}\\] in the denominator of Equation (3.2). The result of this integral is known as the evidence \\(p(\\mathbf{y})\\), and quantifies the probability of obtaining the data under the model. Hence, although it is easy to evaluate a quantity proportional to the posterior distribution \\[\\begin{equation} p(\\boldsymbol{\\mathbf{\\phi}} \\, | \\, \\mathbf{y}) \\propto p(\\mathbf{y} \\, | \\, \\boldsymbol{\\mathbf{\\phi}}) p(\\boldsymbol{\\mathbf{\\phi}}), \\end{equation}\\] it is typically difficult to evaluate the posterior distribution itself. 
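As a small illustration of the evidence integral in Equation (3.3), the following Python sketch revisits the one-parameter example of Figure 3.1, where the integral is one-dimensional and can be approximated on a grid. The grid size, upper limit, and function names are assumptions made for this illustration, and the midpoint rule is only one of many possible quadrature rules.

```python
import math

# Setting of Figure 3.1: y_i ~ Poisson(phi), phi ~ Gamma(3, 1) (shape, rate)
y = [1, 2, 3]
prior_shape, prior_rate = 3.0, 1.0


def log_joint(phi):
    # log p(y, phi) = log-likelihood + log-prior, as in Equation (3.1)
    log_lik = sum(yi * math.log(phi) - phi - math.lgamma(yi + 1) for yi in y)
    log_prior = (prior_shape * math.log(prior_rate) - math.lgamma(prior_shape)
                 + (prior_shape - 1) * math.log(phi) - prior_rate * phi)
    return log_lik + log_prior


def gamma_pdf(phi, shape, rate):
    # Density of a Gamma(shape, rate) distribution
    return math.exp(shape * math.log(rate) - math.lgamma(shape)
                    + (shape - 1) * math.log(phi) - rate * phi)


# Midpoint rule on (0, 20): grid size and upper limit are illustrative choices
n_grid, upper = 20000, 20.0
h = upper / n_grid
grid = [(k + 0.5) * h for k in range(n_grid)]
evidence = sum(math.exp(log_joint(phi)) for phi in grid) * h


def posterior(phi):
    # Equation (3.2): joint density divided by the evidence
    return math.exp(log_joint(phi)) / evidence


# The normalised density should agree with the closed form Gamma(9, 4)
print(evidence, posterior(2.0), gamma_pdf(2.0, 9.0, 4.0))
```

In one dimension a grid of this kind is cheap. The number of grid points needed grows quickly with the dimension of the parameter, which is one way to see why the integral is described as difficult in general.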
Further, even given a closed form expression for the posterior distribution, if \\(\\boldsymbol{\\mathbf{\\phi}}\\) is of moderate to high dimension, then it is not obvious how to evaluate expressions of interest, which are usually themselves integrals, or expectations, with respect to the posterior distribution. The difficulty in performing Bayesian inference may be thought of as analogous to the difficulty in calculating integrals. As with integration, in specific cases closed form analytic solutions are available. Figure 3.1 illustrates one such case, where the prior distribution and posterior distribution are in the same family of probability distributions. In the more general case, no analytic solution is available, and computational methods must be relied on. Broadly, computational strategies for approximating the posterior distribution (Martin, Frazier, and Robert 2023) may be divided into Monte Carlo algorithms and deterministic approximations. 3.1.2.1 Monte Carlo algorithms Monte Carlo algorithms (Robert and Casella 2005) aim to generate samples from the posterior distribution \\[\\begin{equation} \\boldsymbol{\\mathbf{\\phi}}_s \\sim p(\\boldsymbol{\\mathbf{\\phi}} \\, | \\, \\mathbf{y}), \\quad s = 1, \\ldots, S. \\tag{3.4} \\end{equation}\\] These samples may be used in any future computations involving the posterior distribution or functions of it. For example, if \\(G = G(\\boldsymbol{\\mathbf{\\phi}})\\) is a function, then the expectation of \\(G\\) with respect to the posterior distribution can be approximated by \\[\\begin{equation} \\mathbb{E}(G \\, | \\, \\mathbf{y}) = \\int G(\\boldsymbol{\\mathbf{\\phi}}) p(\\boldsymbol{\\mathbf{\\phi}} \\, | \\, \\mathbf{y}) \\text{d} \\boldsymbol{\\mathbf{\\phi}} \\approx \\frac{1}{S} \\sum_{s = 1}^S G(\\boldsymbol{\\mathbf{\\phi}}_s), \\end{equation}\\] using the samples from the posterior distribution in Equation (3.4).
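The Monte Carlo approximation above can be sketched in a few lines of Python for the example of Figure 3.1. This is a minimal illustration under a strong simplifying assumption: because the posterior is known to be Gamma(9, 4), samples are drawn from it directly using the standard library, rather than by MCMC. The seed, number of samples, and the threshold of 3 are arbitrary choices for the illustration.

```python
import random

random.seed(1)   # arbitrary seed, for reproducibility of the sketch
S = 100000       # number of posterior samples

# random.gammavariate takes (shape, scale), so a rate of 4 is a scale of 1 / 4
samples = [random.gammavariate(9.0, 1.0 / 4.0) for _ in range(S)]


def G(phi):
    # Identity function, so that E(G | y) is the posterior mean
    return phi


# Empirical average over samples approximates the posterior expectation
post_mean = sum(G(phi) for phi in samples) / S

# Probabilities are expectations of indicator functions
tail_prob = sum(1 for phi in samples if phi > 3.0) / S

print(post_mean, tail_prob)
```

The exact posterior mean in this example is 9 / 4 = 2.25, which gives a reference value for the Monte Carlo estimate.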
Most quantities of interest can be cast as posterior expectations, which may then be approximated empirically using samples in this way. Of course, it remains to discuss how the samples are obtained. Markov chain Monte Carlo (MCMC) methods (Roberts and Rosenthal 2004) are the most popular class of sampling algorithms. Using MCMC, posterior samples are generated by simulating from an ergodic Markov chain with the posterior distribution as its stationary distribution. The Metropolis-Hastings [MH; Metropolis et al. (1953); Hastings (1970)] algorithm uses a proposal distribution \\(q(\\boldsymbol{\\mathbf{\\phi}}_{s + 1} \\, | \\, \\boldsymbol{\\mathbf{\\phi}}_s)\\) to generate candidate parameters for the next step in the Markov chain. These candidate parameters are then accepted or rejected with a probability determined by their log-posterior evaluation. Many MCMC algorithms, including the Gibbs sampler (Geman and Geman 1984), can be thought of as special cases of MH. Other notable classes of sampling algorithms include importance sampling [IS; Tokdar and Kass (2010)] methods, which use weighted samples, sequential Monte Carlo [SMC; Chopin, Papaspiliopoulos, et al. (2020)] methods, which are based on sampling from a sequence of distributions, and approximate Bayesian computation [ABC; Sisson, Fan, and Beaumont (2018)], which works by comparing simulated data to observed data, and does not require evaluation of the log-posterior. Though these methods have found applications in specific domains, MCMC is currently more widely used. The most important benefits of MCMC are its generality, theoretical reliability, and implementation in accessible software packages. As an illustration of MCMC supported by software, this thesis uses the No-U-Turn sampler [NUTS; Hoffman, Gelman, et al. (2014)], a Hamiltonian Monte Carlo [HMC; Duane et al. (1987); Neal et al. (2011)] algorithm, as implemented in the Stan (Carpenter et al.
2017) probabilistic programming language (PPL). HMC uses derivatives of the posterior distribution to generate efficient MH proposal distributions based on Hamiltonian dynamics. Three tuning parameters control the behaviour of the HMC algorithm [Section 15.2; Stan Development Team (2023)]. NUTS automatically adapts these parameters based on local properties of the posterior distribution. Though not a one-size-fits-all solution, NUTS has been shown empirically to be a good choice for sampling from a range of posterior distributions. Figure 3.2 shows an example of using the NUTS MCMC algorithm to sample from a posterior distribution. After running an MCMC sampler, it is important that diagnostic checks are used to evaluate whether the Markov chain has reached its stationary distribution. If so, the Markov chain is said to have converged, and its samples may be used to compute posterior quantities. Though it is possible to check poor convergence in some cases, we may never be sure that a Markov chain has converged, and thus that results computed from MCMC will be accurate. Panel 3.2B shows the traceplot for a Markov chain which appears to have converged, and moves freely through the range of plausible parameter values. A range of convergence diagnostics have been developed for MCMC (Roy 2020; C. C. Margossian and Gelman 2023). Two widely used examples are the potential scale reduction factor \\(\\hat R\\) (Gelman and Rubin 1992), which compares the variance between and within parallel Markov chains, and the effective sample size (ESS), which measures the efficiency of samples drawn from MCMC. Figure 3.2: NUTS can be used to sample from the posterior distribution described in Figure 3.1. Panel A shows a histogram of the NUTS samples as compared to the true posterior. The visual appearance of a histogram depends highly on the number of bins chosen, though it does not depend on tuning parameters like kernel density estimation. 
Other visualisations, such as empirical cumulative difference function plots, though less initially intuitive, are preferred for accurate distributional sample comparisons. Panel B is a traceplot showing the path of the Markov chain \\(\\{\\phi_s\\}_{s = 1}^{1000}\\) as it explores the posterior distribution. In this case, the Markov chain moves freely throughout the posterior distribution, without getting stuck in any one location for long, indicating good performance of the sampler. Panel C shows convergence of the empirical posterior mean \\(\\frac{1}{s} \\sum_{l \\leq s} \\phi_l\\) to the true value of \\(\\mathbb{E}(\\phi)\\) as more iterations of the Markov chain are included in the sum. In this case, the samples from NUTS are highly accurate in estimating this posterior expectation. 3.1.2.2 Deterministic approximations The Monte Carlo methods discussed in Section 3.1.2.1 make use of stochasticity to generate samples from the posterior distribution. Deterministic approximations offer an alternative approach, often focused more directly on approximating the posterior distribution or posterior normalising constant. These approaches can be faster than Monte Carlo methods, especially for large datasets or models. That said, they lack strong theoretical guarantees of accuracy. One prominent deterministic approximation is the Laplace approximation. It involves approximating the posterior normalising constant using Laplace’s method of integration. This is equivalent to approximating the posterior distribution by a Gaussian distribution. Numerical integration, or quadrature, is another deterministic approach in which the posterior normalising constant is approximated using a weighted sum of evaluations of the unnormalised posterior distribution. The integrated nested Laplace approximation [INLA; Håvard Rue, Martino, and Chopin (2009)] combines quadrature with the Laplace approximation. These methods are used throughout this thesis. 
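To make the Laplace approximation concrete, the sketch below applies it to the Gamma(9, 4) posterior of Figure 3.1, where the exact normalising constant is known and can be used as a check. It is an illustration only: the approximation is taken on the original scale of the parameter, the mode is found by Newton iterations from an arbitrary starting value, and all names are hypothetical rather than taken from any software used in this thesis.

```python
import math

shape, rate = 9.0, 4.0   # posterior of Figure 3.1, Gamma(shape, rate)


def log_post(phi):
    # Unnormalised log-posterior, up to an additive constant
    return (shape - 1) * math.log(phi) - rate * phi


def grad(phi):
    # First derivative of the unnormalised log-posterior
    return (shape - 1) / phi - rate


def hess(phi):
    # Second derivative of the unnormalised log-posterior
    return -(shape - 1) / phi ** 2


# Newton iterations to find the posterior mode
phi = 1.0
for _ in range(50):
    phi = phi - grad(phi) / hess(phi)
mode = phi

# Gaussian approximation: mean at the mode, variance from the curvature
sd = math.sqrt(-1.0 / hess(mode))

# Laplace approximation to the log normalising constant
log_norm_laplace = (log_post(mode) + 0.5 * math.log(2.0 * math.pi)
                    - 0.5 * math.log(-hess(mode)))

# Exact log normalising constant of the unnormalised Gamma density
log_norm_exact = math.lgamma(shape) - shape * math.log(rate)

print(mode, sd, math.exp(log_norm_laplace - log_norm_exact))
```

In this example the mode is 2 and the ratio of approximate to exact normalising constants is close to, but not exactly, one, reflecting that the Gamma posterior is not exactly Gaussian.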
In-depth discussion is left to Chapter 6. Variational inference [VI; Blei, Kucukelbir, and McAuliffe (2017)] is another important deterministic approximation. The well-known expectation maximisation [EM; Dempster, Laird, and Rubin (1977)] and expectation propagation [EP; Minka (2001)] algorithms are closely related to VI. In VI, the approximate posterior distribution is assumed to belong to a particular family of functions. Optimisation algorithms are then used to choose the best member of that family, typically by minimising the Kullback-Leibler divergence to the posterior distribution. VI lacks theoretical guarantees and is known to often inaccurately estimate posterior variances (Giordano, Broderick, and Jordan 2018). As such, statisticians tend to approach VI with caution, despite its relatively widespread acceptance within the machine learning community. Developing diagnostics to evaluate the accuracy of VI is an important area of ongoing research (Yao et al. 2018). 3.1.3 Interplay between modelling and computation Modern computational techniques and software like PPLs have succeeded in abstracting away calculation of the posterior distribution from the analyst for many models. However, depending on the measure used, computation arguably remains intractable in the majority of cases. The analyst therefore needs to be concerned not only with choosing a model suitable for the data, but also with choosing a model for which the posterior distribution may tractably be calculated in reasonable time. As such, there is an important interplay between modelling and computation, wherein models are bound by the limits of computation. As computational techniques and tools improve, the space of models available to the analyst expands. Expanding the space of models practically available to analysts is exactly the focus of Chapter 6. 3.2 Spatio-temporal statistics Space and time are important features of infectious disease data, including those related to HIV.
The field of spatio-temporal statistics (Cressie and Wikle 2015) is concerned with such observations, indexed by spatial and temporal location. It unifies the fields of spatial statistics (Bivand et al. 2008), concerned with observations indexed by space, and time series analysis (Shumway and Stoffer 2017), concerned with observations indexed by time. First, Section 3.2.1 characterises the shared properties of spatio-temporal data. Then, Section 3.2.2 describes how these properties facilitate the class of small-area estimation methods used in this thesis. 3.2.1 Properties of spatio-temporal data Three important properties are discussed in this section: scale, correlation structure, and size. 3.2.1.1 Scale Figure 3.3: In Panel A, the spatial location of Cape Town in South Africa can be considered a point, and the ZF Mgcawu District Municipality (DM) can be considered an area. In Panel B, World AIDS Day, designated on the 1st of December every year, can be considered a point in time, whereas the second fiscal quarter, running through April, May and June, and denoted by Q2, represents a period of time. In reality, both Cape Town and World AIDS Day are areas, rather than true point locations. Instances of infinitesimal point locations in everyday life, outside of mathematical abstraction, are rare. The scale of spatio-temporal data refers to its extent and resolution. Its extent is the size of the spatial study region and length of time over which data were collected. Its resolution is how fine-grained those observations were. In this thesis, the spatial study region \\(\\mathcal{S} \\subseteq \\mathbb{R}^2\\) used is typically a country or collection of countries. It is assumed to have two dimensions, corresponding to latitude and longitude. Observations may be associated with a point \\(s \\in \\mathcal{S}\\) or area \\(A \\subseteq \\mathcal{S}\\) in the spatial study region, illustrated in Panel 3.3A.
The temporal study period \\(\\mathcal{T} \\subseteq \\mathbb{R}\\) can more generally be assumed to be one-dimensional. This feature, together with the fact that time only moves forward, is what distinguishes space and time. As with space, observations may be associated with a point \\(t \\in \\mathcal{T}\\) or period of time \\(T \\subseteq \\mathcal{T}\\), illustrated in Panel 3.3B. The change-of-support problem (Gelfand, Zhu, and Carlin 2001) occurs when data are modelled at a scale different to the one at which they were observed. For example, in this thesis, particularly Chapter 4, point data are modelled at the area level. Special cases of the change-of-support problem include downscaling, upscaling, and dealing with so-called misaligned data. It is also possible that spatio-temporal observations of the same process are made at multiple scales. Jointly modelling data at different scales is a challenge closely related to the change-of-support problem. 3.2.1.2 Correlation structure In “The Design of Experiments” Fisher (1936) observed that neighbouring crops were more likely to have similar yields than those far apart. This observation was later termed Tobler’s first law of geography: “everything is related to everything else, but near things are more related than distant things” (Tobler 1970). As well as space, Tobler’s first law applies to time, in that observations made close together in time tend to be similar. This law can be formalised using space-time covariance functions, measuring the dependence of observations across their spatial and temporal dimensions. A space-time covariance structure (Porcu, Furrer, and Nychka 2021) is said to be separable when it can be factorised as a product of individual spatial and temporal covariances, and nonseparable when it cannot. A separable space-time covariance could have spatial and temporal components which are either independent and identically distributed (IID) or structured (Knorr-Held 2000).
Spatial covariance functions are called isotropic when they apply equally in all directions, and stationary when they are invariant over space. Temporal covariance structures are often periodic, corresponding to daily, weekly, monthly, quarterly, or yearly cycles. That spatio-temporal data are rarely IID is a statistically important point. The consequence is that it is rare to have true replicates available. Typically, only a single instance of a spatio-temporal process can ever be realised. 3.2.1.3 Size Data with both spatial and temporal dimensions are often large. For example, observations collected every week across a number of sites in a country can easily number in the thousands. Storage and mathematical operations with large spatio-temporal data can be challenging. Further, models for spatio-temporal data typically require many parameters. Whereas large IID data can be modelled using a small number of parameters, each observation in a spatio-temporal dataset may need to be characterised by its own parameters. In combination, large data (big \\(n\\)) and models with a large number of parameters (big \\(d\\)) make Bayesian inference, and other complex mathematical operations, challenging for spatio-temporal data. 3.2.2 Small-area estimation Figure 3.4: Simulation of a simple random sample \\(y_i \\sim \\text{Bin}(m, p_i)\\) with varying sample size \\(m = 5, 25, 125\\) in each of the \\(i = 1, \\ldots, 156\\) constituencies of Zambia. Direct estimates were obtained by the empirical ratio of data to sample size. Modelled estimates were obtained using a logistic regression with linear predictor given by an intercept and a spatial random effect. Estimates of HIV indicators for Zambia have previously been generated at the district level, comprising 116 spatial units. Moving forward, there is interest in generating estimates at the higher-resolution constituency level, as program planning is devolved locally.
The viridis colour palette, as implemented by the viridis R package (Garnier et al. 2023), was used in this figure. It is used often throughout this thesis because it is perceptually uniform and accessible to colourblind viewers (Smith and Walt 2015). This figure was adapted from a presentation given for the Zambia HIV Estimates Technical Working Group, available from https://github.com/athowes/zambia-unaids. Figure 3.5: The setting of this figure matches that of Figure 3.4. Estimates from surveys with higher sample size have higher sample Pearson correlation coefficient \\(R\\) with the underlying truth, illustrating the benefit of collecting more data. For a fixed sample size, however, correlation can be improved by using modelled estimates to borrow information across spatial units, rather than using the higher-variance direct estimates. Points along the dashed diagonal line correspond to agreement between the estimate obtained from the survey and the underlying truth used to generate the data. For each sample size, using a spatial model increases the correlation between the estimates and underlying truth. The effect is more pronounced for lower sample sizes. Data always have some cost to collect. This cost can be significant, or even prohibitive, especially for data relating to people, where collection is difficult to automate. In spatio-temporal statistics, there are a large number of possible locations in space and time. Given the cost of data collection, often no or limited direct observations may be available for any given space-time location. Direct estimates of indicators of interest are either impossible or inaccurate in this setting. Small-area estimation [SAE; Pfeffermann et al. (2013)] methods aim to overcome the limitations of small data by sharing information. In the spatio-temporal setting, sharing of information occurs across space and time.
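The unreliability of direct estimates from small samples can be sketched with a short simulation in the spirit of Figure 3.4. This is an illustration under stated assumptions rather than a reproduction of that figure: the true proportions are drawn from a hypothetical uniform distribution instead of the values used in the thesis, no spatial model is fitted, and the seed is arbitrary.

```python
import random

random.seed(2)   # arbitrary seed
n_areas = 156    # number of areas, matching the setting of Figure 3.4

# Hypothetical true proportions: an assumption made for this sketch
truth = [random.uniform(0.05, 0.30) for _ in range(n_areas)]


def pearson(x, y):
    # Sample Pearson correlation coefficient
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def direct_estimates(m):
    # Simulate y_i ~ Bin(m, p_i) and return the direct estimates y_i / m
    estimates = []
    for p in truth:
        y = sum(1 for _ in range(m) if random.random() < p)
        estimates.append(y / m)
    return estimates


results = {m: direct_estimates(m) for m in (5, 25, 125)}
correlations = {m: pearson(results[m], truth) for m in results}

for m in (5, 25, 125):
    print(m, sorted(set(results[m]))[:6], round(correlations[m], 2))
```

With a sample size of 5, every direct estimate is a multiple of 0.2, so areas with quite different true proportions can receive identical estimates. Larger sample sizes allow finer-grained estimates that track the truth more closely.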
Prior knowledge that observations in one spatio-temporal location are correlated with those at another (Section 3.2.1.2) can be used to improve estimates. Figures 3.4 and 3.5 illustrate the unreliability of direct estimates from small sample sizes, and the benefit of using a spatial model to overcome this limitation. The effect is most pronounced for the sample size of 5, where the only possible direct estimates are 0, 0.2, 0.4, 0.6, 0.8 and 1. Using a spatial model to borrow information across space in this case results in improvement of the Pearson correlation coefficient between the estimates and the true underlying values from 0.34 to 0.53. SAE methods are not only useful in the spatio-temporal setting. More generally, they apply in any situation where data are limited for subpopulations of interest. Just as these subpopulations can be generated by spatio-temporal variables, they can be generated by other variables. One such example is demographic variables. Analogous to spatio-temporal correlation structure, we also can often expect there to be demographic correlation structure. For example, those of the same sex are more likely to be similar, as are those of similar ages or socio-economic strata. 3.3 Model structure The spatio-temporal data used in this thesis are not IID (Section 3.2.1.2). This section discusses ways to use statistical models to encode more complex relations between observations mathematically. Simple structures are discussed first, beginning with the linear model. Extensions are introduced one at a time, culminating in the model structures used throughout the thesis. 3.3.1 Linear model In a linear model, each observation \\(y_i\\) with \\(i \\in [n]\\) is modelled using a Gaussian distribution \\[\\begin{equation} y_i \\sim \\mathcal{N}(\\mu_i, \\sigma). 
\\end{equation}\\] The conditional mean \\(\\mu_i\\) is assumed to be linearly related to a collection of \\(p\\) covariates \\(z_{1i}, \\ldots, z_{pi}\\) \\[\\begin{align} \\mu_i &= \\eta_i \\\\ \\eta_i &= \\beta_0 + \\sum_{l = 1}^{p} \\beta_l z_{li}. \\tag{3.5} \\end{align}\\] Priors may be placed on the regression coefficients, as well as the observation standard deviation \\[\\begin{align} \\beta_l &\\sim p(\\beta_l), \\quad l = 0, \\ldots, p, \\\\ \\sigma &\\sim p(\\sigma). \\end{align}\\] While the linear model provides a useful foundation, its strong assumptions and limited flexibility call for careful use. 3.3.2 Generalised linear model Generalised linear models (GLMs) extend the linear model by allowing the conditional mean \\(\\mu_i\\) to be connected to the linear predictor \\(\\eta_i\\) via an inverse link function \\(g\\) as follows \\[\\begin{align} y_i &\\sim p(y_i \\, | \\, \\eta_i), \\\\ \\mu_i &= \\mathbb{E}(y_i \\, | \\, \\eta_i) = g(\\eta_i). \\end{align}\\] The logistic function \\(g(\\eta) = \\exp(\\eta) / (1 + \\exp(\\eta))\\) is commonly used as an inverse link function to ensure that the conditional mean is in the range \\((0, 1)\\). Similarly, the exponential function \\(g(\\eta) = \\exp(\\eta)\\) can be used to ensure the conditional mean is positive. The linear model is a special case of a GLM where \\(g\\) is the identity. GLMs also admit a wider range of likelihoods \\(p(y_i \\, | \\, \\eta_i)\\) than linear models, typically restricted to the so-called exponential family of distributions. The equation for the linear predictor is the same as the linear model case in Equation (3.5). 3.3.3 Generalised linear mixed effects model In a generalised linear mixed effects model (GLMM) the linear predictor of the GLM is extended as follows \\[\\begin{equation} \\eta_i = \\beta_0 + \\sum_{l = 1}^{p} \\beta_l z_{li} + \\sum_{k = 1}^{r} u_k(w_{ki}). \\tag{3.6} \\end{equation}\\] The terms \\(\\beta_l\\) are referred to as fixed effects.
The terms \\(u_k\\) are called random effects, and are functions of additional covariates \\(w_{ki}\\). The terms fixed effects and random effects have notoriously many different and incompatible definitions which unfortunately can cause confusion (Gelman 2005). Random effects allow for more complex sharing of information between observations. To demonstrate this fact, first consider the model \\[\\begin{equation} \\eta_i = \\beta_0. \\end{equation}\\] In this model all observations are assumed to be equivalent, and as such information is said to be completely pooled together. Second, consider the so-called no pooling model \\[\\begin{equation} \\eta_i = \\beta_0 + \\beta_1 z_i, \\end{equation}\\] with \\(z_i \\in \\{0, 1\\}\\) a binary covariate. Now, there are two groups of observations, each with its own mean: \\(\\beta_0\\) for the first group and \\(\\beta_0 + \\beta_1\\) for the second. No amount of information is shared between the two groups. Finally, consider an intermediate between these two extremes, known as the partial pooling model. In the partial pooling model, the extent to which information is shared between groups is learnt rather than fixed to either extreme at the outset, as with the complete or no pooling models. The parameter \\(\\beta_0\\) applies to all groups, and each group is differentiated by a specific value of the random effects \\(u_i\\). Random effects can be structured to share information between some observations more than others. In spatio-temporal statistics, structured spatial and temporal random effects are often used to encode smoothness in space or time. In contrast, unstructured random effects treat groups of observations as being exchangeable. Generalised additive models [GAMs; Wood (2017); Hastie and Tibshirani (1987)] are another class of models which extend GLMs.
Though GAMs place more of a focus on using \\(u_k\\) to model non-linear relationships between covariates and the response variable, they can also be cast to fit into the GLMM framework. 3.3.4 Latent Gaussian model Latent Gaussian models [LGMs; Håvard Rue, Martino, and Chopin (2009)] are a type of GLMM in which Gaussian priors are used for many of the model's parameters. More specifically, these parameters are \\(\\beta_0\\), \\(\\{\\beta_l\\}\\), \\(\\{u_k(\\cdot)\\}\\), and can be collected into a vector \\(\\mathbf{x} \\in \\mathbb{R}^N\\) called the latent field. The Gaussian prior distribution is then \\[\\begin{equation} \\mathbf{x} \\sim \\mathcal{N}(\\mathbf{0}, \\mathbf{Q}(\\boldsymbol{\\mathbf{\\theta}}_2)^{-1}), \\end{equation}\\] where \\(\\boldsymbol{\\mathbf{\\theta}}_2 \\in \\mathbb{R}^{s_2}\\) are hyperparameters, with \\(s_2\\) assumed small. The vector \\(\\boldsymbol{\\mathbf{\\theta}}_1 \\in \\mathbb{R}^{s_1}\\), with \\(s_1\\) assumed small, contains additional parameters of the likelihood. Let \\(\\boldsymbol{\\mathbf{\\theta}} = (\\boldsymbol{\\mathbf{\\theta}}_1, \\boldsymbol{\\mathbf{\\theta}}_2) \\in \\mathbb{R}^m\\) with \\(m = s_1 + s_2\\) be all hyperparameters, with prior distribution \\(p(\\boldsymbol{\\mathbf{\\theta}})\\). The posterior distribution under an LGM is then \\[\\begin{equation} p(\\mathbf{x}, \\boldsymbol{\\mathbf{\\theta}} \\, | \\, \\mathbf{y}) \\propto p(\\mathbf{y} \\, | \\, \\mathbf{x}, \\boldsymbol{\\mathbf{\\theta}}) p(\\mathbf{x} \\, | \\, \\boldsymbol{\\mathbf{\\theta}}) p(\\boldsymbol{\\mathbf{\\theta}}), \\end{equation}\\] with the complete set of parameters \\(\\boldsymbol{\\mathbf{\\phi}} = (\\mathbf{x}, \\boldsymbol{\\mathbf{\\theta}})\\), and \\(N + m = d\\). In an LGM, like the more general GLMM case as given in Equation (3.6), there is a one-to-one correspondence between observations \\(y_i\\) and elements of the linear predictor \\(\\eta_i\\).
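To make the three factors of the LGM posterior concrete, the following Python sketch evaluates the unnormalised log posterior for a deliberately small example. Everything specific here is an assumption chosen for illustration rather than a model used in this thesis: a Poisson likelihood with log link and \\(\\eta_i = \\beta_0 + u_i\\), a diagonal precision matrix \\(\\mathbf{Q}\\), a single hyperparameter \\(\\theta = \\log \\tau_u\\) with a standard Gaussian prior, and a fixed intercept precision `tau_beta`. As the Poisson likelihood has no additional parameters, \\(s_1 = 0\\) and \\(m = 1\\).

```python
import math


def log_joint(y, beta0, u, log_tau_u, tau_beta=0.001):
    """Return log p(y | x, theta) + log p(x | theta) + log p(theta).

    Latent field x = (beta0, u_1, ..., u_n); hyperparameter theta = log(tau_u).
    All distributional choices are illustrative assumptions.
    """
    tau_u = math.exp(log_tau_u)

    # Likelihood: y_i ~ Poisson(exp(eta_i)) with eta_i = beta0 + u_i,
    # so each observation corresponds to exactly one linear predictor
    log_lik = 0.0
    for y_i, u_i in zip(y, u):
        eta_i = beta0 + u_i
        log_lik += y_i * eta_i - math.exp(eta_i) - math.lgamma(y_i + 1)

    # Latent field prior: x ~ N(0, Q(theta)^{-1}) with diagonal precision Q
    x = [beta0] + list(u)
    q_diag = [tau_beta] + [tau_u] * len(u)
    log_prior_x = sum(
        0.5 * math.log(q) - 0.5 * math.log(2 * math.pi) - 0.5 * q * x_j ** 2
        for q, x_j in zip(q_diag, x)
    )

    # Hyperparameter prior: theta ~ N(0, 1)
    log_prior_theta = -0.5 * math.log(2 * math.pi) - 0.5 * log_tau_u ** 2

    return log_lik + log_prior_x + log_prior_theta


value = log_joint(y=[2, 0, 1], beta0=0.1, u=[0.3, -0.2, 0.0], log_tau_u=0.5)
```

A structured spatial or temporal random effect would replace the diagonal precision with a sparse, non-diagonal \\(\\mathbf{Q}\\); the three-term structure of the log posterior would be unchanged.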
3.3.5 Extended latent Gaussian model Extended latent Gaussian models [ELGMs; Stringer, Brown, and Stafford (2022)] facilitate modelling of data with greater non-linearities than an LGM. In an ELGM, the structured additive predictor is redefined as \\[\\begin{equation} \\boldsymbol{\\mathbf{\\eta}} = (\\eta_1, \\ldots, \\eta_{N_n}), \\end{equation}\\] where \\(N_n \\in \\mathbb{N}\\) is a function of \\(n\\). Unlike in the LGM case, it is possible that \\(N_n \\neq n\\). Each mean response \\(\\mu_i\\) now depends on some subset \\(\\mathcal{J}_i \\subseteq [N_n]\\) of indices of \\(\\boldsymbol{\\mathbf{\\eta}}\\), with \\(\\cup_{i = 1}^n \\mathcal{J}_i = [N_n]\\) and \\(1 \\leq |\\mathcal{J}_i| \\leq N_n\\), where \\([N_n] = \\{1, \\ldots, N_n\\}\\). The inverse link function \\(g(\\cdot)\\) is redefined for each observation to be a possibly many-to-one mapping \\(g_i: \\mathbb{R}^{|\\mathcal{J}_i|} \\to \\mathbb{R}\\), such that \\(\\mu_i = g_i(\\boldsymbol{\\mathbf{\\eta}}_{\\mathcal{J}_i})\\). Put together, ELGMs are of the form \\[\\begin{align*} y_i &\\sim p(y_i \\, | \\, \\boldsymbol{\\mathbf{\\eta}}_{\\mathcal{J}_i}, \\boldsymbol{\\mathbf{\\theta}}_1), \\quad i = 1, \\ldots, n, \\\\ \\mu_i &= \\mathbb{E}(y_i \\, | \\, \\boldsymbol{\\mathbf{\\eta}}_{\\mathcal{J}_i}) = g_i(\\boldsymbol{\\mathbf{\\eta}}_{\\mathcal{J}_i}), \\\\ \\eta_j &= \\beta_0 + \\sum_{l = 1}^{p} \\beta_l z_{lj} + \\sum_{k = 1}^{r} u_k(w_{kj}), \\quad j \\in [N_n]. \\end{align*}\\] The latent field and hyperparameter prior distributions are equivalent to the LGM case. Though the ELGM model class was only introduced recently, it connects much of the work done in this thesis.
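As an illustration of a many-to-one inverse link \\(g_i\\), the Python sketch below uses a softmax over blocks of structured additive predictors, as would arise in a multinomial logistic regression. The predictor values, the block structure, and the names `softmax_mean`, `index_sets` and `positions` are hypothetical, and indices are zero-based in the code whereas \\([N_n]\\) is one-based in the text.

```python
import math


def softmax_mean(eta_block, k):
    """Many-to-one inverse link g_i: probability of category k given a block of predictors."""
    shift = max(eta_block)  # subtract the maximum for numerical stability
    weights = [math.exp(e - shift) for e in eta_block]
    return weights[k] / sum(weights)


# Hypothetical predictors: two multinomial units with three categories each
eta = [0.0, 1.0, -1.0, 0.5, 0.5, 0.5]

# J_i for each of the n = 6 observations: every observation depends on its whole block
index_sets = [[0, 1, 2]] * 3 + [[3, 4, 5]] * 3

# Category of observation i within its block
positions = [0, 1, 2, 0, 1, 2]

# mu_i = g_i(eta_{J_i}) depends on three predictors rather than one
mu = [softmax_mean([eta[j] for j in J_i], k) for J_i, k in zip(index_sets, positions)]
```

Here each mean depends on \\(|\\mathcal{J}_i| = 3\\) elements of the predictor, which is not permitted under the one-to-one correspondence of an LGM.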
While it can be transformed to an LGM using the multinomial-Poisson transformation (Baker 1994), the multinomial logistic regression model used in Chapter 5 is most naturally written as an ELGM, where each observation depends on the set of structured additive predictors corresponding to the set of multinomial observations. In Chapter 6, the Naomi small-area estimation model used to produce estimates of HIV indicators is shown to have ELGM-like features. 3.4 Model comparison Many models can be fit to the same data during the course of an analysis. Model comparison methods are used to determine which is the most suitable for use. This section focuses on measuring suitability via the model’s predictive performance (Vehtari and Ojanen 2012). Ideally, new data \\(\\tilde{\\mathbf{y}} = (\\tilde{y}_1, \\ldots, \\tilde{y}_n)\\) drawn from the true data generating process would be available to test predictive performance. The log predictive density for new data (LPD) (Gelman, Hwang, and Vehtari 2014) is one measure of out-of-sample predictive performance given by \\[\\begin{equation} \\text{lpd} = \\sum_{i = 1}^n \\log p(\\tilde y_i \\, | \\, \\mathbf{y}) = \\sum_{i = 1}^n \\log \\int p(\\tilde y_i \\, | \\, \\boldsymbol{\\mathbf{\\phi}}) p(\\boldsymbol{\\mathbf{\\phi}} \\, | \\, \\mathbf{y}) \\text{d}\\boldsymbol{\\mathbf{\\phi}}. \\tag{3.7} \\end{equation}\\] The expected LPD (ELPD) integrates the LPD over the data generating process to give a measure of expected performance \\[\\begin{equation} \\text{elpd} = \\sum_{i = 1}^n \\int \\log p(\\tilde y_i \\, | \\, \\mathbf{y}) \\, p(\\tilde y_i) \\text{d} \\tilde y_i, \\tag{3.8} \\end{equation}\\] where \\(p(\\tilde y_i)\\) is the density of the true data generating process. In reality, such data are not usually available, and instead the ELPD must be approximated using the available data. 3.4.1 Information criteria Information criteria can be constructed to approximate the ELPD using adjusted within-sample predictive performance. The Akaike [AIC; Akaike (1973)] and deviance [DIC; D. J. Spiegelhalter et al.
(2002)] information criteria estimate ELPD by \\[\\begin{equation} \\text{elpd}_\\texttt{IC} = \\log p(\\mathbf{y} \\, | \\, \\hat{\\boldsymbol{\\mathbf{\\phi}}}), \\tag{3.9} \\end{equation}\\] where \\(\\hat{\\boldsymbol{\\mathbf{\\phi}}}\\) is a maximum likelihood estimate (AIC) or Bayesian point estimate (DIC). The widely applicable information criterion [WAIC; Watanabe (2013)] improves upon Equation (3.9) by instead using the predictive density of the data \\[\\begin{align} \\text{elpd}_\\texttt{WAIC} = \\sum_{i = 1}^n \\log p(y_i \\, | \\, \\mathbf{y}). \\tag{3.10} \\end{align}\\] As both Equations (3.9) and (3.10) are based on within-sample measures, they overestimate the ELPD. As such, they are adjusted downward by a complexity penalty \\(p_{\\texttt{IC}}\\). The particular penalty varies depending on the information criterion used. 3.4.2 Cross-validation Cross-validation is an alternative way to estimate the ELPD. Rather than use a complexity penalty, as in Section 3.4.1, to adjust a within-sample estimate, cross-validation (CV) partitions the data into training and held-out sets of data. For example, in a leave-one-out (LOO) CV there are \\(n\\) partitions, where each held-out set is a single observation. The LOO-CV estimate of ELPD is \\[\\begin{align} \\text{elpd}_\\texttt{LOO-CV} = \\sum_{i = 1}^n \\log p(y_i \\, | \\, \\mathbf{y}_{-i}), \\tag{3.11} \\end{align}\\] where the subscript \\(-i\\) refers to all elements of the vector excluding \\(i\\). Naively, computing \\(\\text{elpd}_\\texttt{LOO-CV}\\) requires refitting the model \\(n\\) times. This can be computationally costly, and so approximation strategies have been developed. Importance sampling methods using the full posterior as a proposal are a notable example, including Pareto-smoothed importance sampling [PSIS; Vehtari, Gelman, and Gabry (2017)]. Equation (3.11) is additive, and treats each observation as an independent unit of information.
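The difference between the within-sample quantity in Equation (3.10) and the LOO-CV estimate in Equation (3.11) can be sketched with a model simple enough that every predictive density is available in closed form. The model here is an assumption made purely for illustration: Gaussian observations with known variance `sigma2` and a conjugate Gaussian prior on the mean. The naive strategy of refitting the model \\(n\\) times is used, which is only feasible because each refit is a one-line conjugate update.

```python
import math


def normal_logpdf(x, mean, var):
    """Log density of N(mean, var) evaluated at x."""
    return -0.5 * math.log(2 * math.pi * var) - 0.5 * (x - mean) ** 2 / var


def posterior_mean_var(y, prior_mean, prior_var, sigma2):
    """Conjugate update for the mean of Gaussian data with known variance sigma2."""
    precision = 1 / prior_var + len(y) / sigma2
    mean = (prior_mean / prior_var + sum(y) / sigma2) / precision
    return mean, 1 / precision


def log_predictive(y_new, y, prior_mean, prior_var, sigma2):
    """log p(y_new | y): the posterior predictive is Gaussian in this model."""
    mean, var = posterior_mean_var(y, prior_mean, prior_var, sigma2)
    return normal_logpdf(y_new, mean, var + sigma2)


def elpd_within(y, prior_mean=0.0, prior_var=1.0, sigma2=1.0):
    """Within-sample sum of log p(y_i | y), before any complexity penalty."""
    return sum(log_predictive(y_i, y, prior_mean, prior_var, sigma2) for y_i in y)


def elpd_loo(y, prior_mean=0.0, prior_var=1.0, sigma2=1.0):
    """Exact LOO-CV: refit without observation i, then score observation i."""
    total = 0.0
    for i, y_i in enumerate(y):
        y_minus_i = y[:i] + y[i + 1:]
        total += log_predictive(y_i, y_minus_i, prior_mean, prior_var, sigma2)
    return total


y = [1.2, -0.3, 0.8, 2.1]  # hypothetical data
print(elpd_within(y), elpd_loo(y))
```

In this conjugate model the within-sample value is never smaller than the LOO-CV value, which is the optimism that the complexity penalty is intended to correct. Note that `elpd_loo` is a plain sum over observations, each scored as a separate unit.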
Special care is therefore required in applying cross-validation techniques to dependent (Section 3.2.1.2) spatio-temporal data. For example, Bürkner, Gabry, and Vehtari (2020) and Cooper et al. (2024) use “leave-future-out” (LFO) cross-validation in the time-series context. Similarly, in Chapter 4 I apply a spatial-leave-one-out (SLOO) cross-validation scheme. 3.4.3 Scoring rules Scoring rules [SR; Gneiting and Raftery (2007)] measure the quality of probabilistic forecasts. The log score, used above in the ELPD, is one example of a scoring rule. However, it is by no means the only possibility. Any information criterion (Section 3.4.1) or cross-validation strategy (Section 3.4.2) can be redefined using a different scoring rule, or utility function more broadly. Possible examples include the root mean square error (RMSE), variance explained (\\(R^2\\)) or classification accuracy. The log score (LS) is popular, in part because it is an example of a strictly proper scoring rule (SPSR). A scoring rule is strictly proper when the forecaster maximises their expected reward only by reporting their true probability distribution. Any scoring rule which does not admit this property is susceptible to manipulation, in some sense. The continuous ranked probability score [CRPS; Matheson and Winkler (1976)], which generalises the Brier score (Brier 1950) beyond binary classification, is another example of an SPSR. Ideally, the correct scoring rule to use in an analysis should be determined based upon the application setting. 3.4.4 Bayes factors Finally, the evidence \\(p(\\mathbf{y})\\), given in Equation (3.3), can also be used as a measure of model performance.
If \\(\\mathcal{M}_0\\) and \\(\\mathcal{M}_1\\) are two competing models, then the Bayes factor comparing \\(\\mathcal{M}_0\\) to \\(\\mathcal{M}_1\\) is \\[\\begin{equation} B_{01} = \\frac{p(\\mathbf{y} \\, | \\, \\mathcal{M}_0)}{p(\\mathbf{y} \\, | \\, \\mathcal{M}_1)}, \\end{equation}\\] where \\(p(\\mathbf{y} \\, | \\, \\mathcal{M})\\) denotes the evidence under model \\(\\mathcal{M}\\). The Bayes factor can be interpreted as supporting the maximum a posteriori model. If \\(B_{01} > 1\\) then support is provided for \\(\\mathcal{M}_0\\) and if \\(B_{01} < 1\\) then support is provided for \\(\\mathcal{M}_1\\). Bayes factors can also be framed as predictive criteria according to the decomposition \\[\\begin{equation} p(\\mathbf{y}) = p(y_1) p(y_2 \\, | \\, y_1) \\cdots p(y_n \\, | \\, y_{n - 1}, \\ldots, y_1). \\end{equation}\\] 3.5 Survey methods Large national household surveys (Section 2.2.1) provide the highest quality population-level information about HIV indicators in SSA. Demographic and Health Surveys [DHS; USAID (2012)] are funded by the United States Agency for International Development (USAID) and run every three to five years in most countries. Population-based HIV Impact Assessment (PHIA) surveys are funded by PEPFAR and run every four to five years in high HIV burden countries. Analysis of responses from surveys can require specific methods. This section provides required background, before describing the survey design approach used by household surveys in SSA, and the methods used to analyse these data in this thesis. 3.5.1 Background Consider a population of \\(N\\) individuals, indexed by \\(i\\), with outcomes of interest \\(y_i\\). If a census were run, with all responses recorded, then any population quantities of interest could be directly calculated. However, running a census is usually too expensive or otherwise impractical.
As such, in a survey only a subset of individuals are sampled: let \\(S_i\\) be an indicator for whether or not individual \\(i\\) is sampled. Furthermore, only a subset of those sampled have their outcome recorded, due to nonresponse or otherwise: let \\(R_i\\) be an indicator for whether or not \\(y_i\\) is recorded. If \\(S_i = 0\\) then \\(R_i = 0\\), and if \\(S_i = 1\\) then individual \\(i\\) may not respond such that \\(R_i = 0\\). Consider a function \\(G_i = G(y_i)\\). The population mean of \\(G\\) is \\[\\begin{equation} \\bar G = \\frac{1}{N} \\sum_{i = 1}^N G(y_i), \\end{equation}\\] and a direct estimate of \\(\\bar G\\) based on the recorded subset of the population is \\[\\begin{equation} \\bar G_R = \\frac{\\sum_{i = 1}^N R_i G(y_i)}{\\sum_{i = 1}^N R_i}, \\tag{3.12} \\end{equation}\\] where \\(m_R = \\sum_{i = 1}^N R_i\\) is the recorded sample size. In a probability sample, individuals are selected to be included in the survey at random. On the other hand, in a non-probability sample, inclusion or exclusion from the survey is deterministic. A simple random sample (SRS) is a probability sample where the sampling probability for each individual is equal \\(\\mathbb{P}(S_i = 1) = 1 / N\\). A survey design is called complex when the sampling probabilities for each individual vary, such that \\(\\mathbb{P}(S_i = 1) = \\pi_i\\), with \\(\\sum_{i = 1}^N \\pi_i = 1\\) and \\(\\pi_i > 0\\). Complex survey designs can offer both greater practicality and statistical efficiency than a SRS. However, care is required in analysing data collected using complex survey designs. Under a complex design, not accounting for unequal sampling probabilities will result in bias. That said, even under SRS, analogous bias can be caused by nonresponse. 3.5.2 Survey design The DHS employs a two-stage sampling procedure, outlined here following USAID (2012).
In the first stage, enumeration areas (EAs) from a recently conducted census are typically used as the primary sampling unit, or cluster. Each cluster is assigned to a stratum \\(h\\) by region, as well as by urban-rural status. After appropriate stratum sample sizes \\(n_h\\) are determined, EAs are sampled with probability proportional to the number of households \\[\\begin{equation} \\pi_{1hj} = n_h \\times \\frac{N_{hj}}{\\sum_j N_{hj}}, \\end{equation}\\] where \\(N_{hj}\\) is the number of households in stratum \\(h\\) and cluster \\(j\\). In the second stage, the secondary sampling units are households. All households in the selected cluster are listed, before being sampled systematically at a regular interval, with equal probability \\[\\begin{equation} \\pi_{2hj} = \\frac{n_{hj}}{N_{hj}}, \\end{equation}\\] where \\(n_{hj}\\) is the number of households selected in cluster \\(j\\) and stratum \\(h\\). All adults are interviewed in each selected household. As a result, the probability an individual is sampled is equal to the probability their household is sampled \\(\\pi_{hj} = \\pi_{1hj} \\times \\pi_{2hj}\\). 3.5.3 Survey analysis Suppose a survey is run with complex design, and sampling probabilities \\(\\pi_i\\). Some individuals are more likely to be included in the survey than others. By over-weighting the responses of those unlikely to be included, and under-weighting the responses of those likely to be included, this feature can be taken into account. Design weights \\(\\delta_i = 1 / \\pi_i\\) can be thought of as the number of individuals in the population represented by the \\(i\\)th sampled individual. Let \\[\\begin{equation} \\mathbb{P}(R_i = 1 \\, | \\, S_i = 1) = \\upsilon_i \\end{equation}\\] be the probability of response for sampled individual \\(i\\).
Nonresponse can be handled using nonresponse weights \\(\\gamma_i = 1 / \\upsilon_i\\), which analogously can be thought of as the number of sampled individuals represented by the \\(i\\)th recorded individual. Multiplying the design and nonresponse weights gives survey weights \\(\\omega_i = \\delta_i \\times \\gamma_i\\). Extending Equation (3.12), a weighted estimate (Hájek 1971) of the population mean using the survey weights \\(\\omega_i\\) is \\[\\begin{equation} \\bar G_\\omega = \\frac{\\sum_{i = 1}^N \\omega_i R_i G(y_i)}{\\sum_{i = 1}^N \\omega_i R_i}. \\tag{3.13} \\end{equation}\\] Following Meng (2018) and Bradley et al. (2021), decomposing the additive error \\(\\bar G_\\omega - \\bar G\\) of Equation (3.13) provides useful intuition as to the benefits of survey weighting (M. A. Bailey 2023). Under SRS, the error is then a product of three terms \\[\\begin{align} \\bar G_\\omega - \\bar G &= \\frac{\\mathbb{E}(\\omega_i R_i G_i)}{\\mathbb{E}(\\omega_i R_i)} - \\mathbb{E}(G_i) = \\frac{\\mathbb{C}(\\omega_i R_i, G_i)}{\\mathbb{E}(\\omega_i R_i)} \\\\ &= \\rho_{R_\\omega, G} \\times \\sqrt{\\frac{N - m_{R_\\omega}}{m_{R_\\omega}}} \\times \\sigma_G, \\end{align}\\] where \\(R_\\omega = \\omega R\\). The first term is called the data defect correlation (DDC), and measures the correlation between the weighted recording mechanism and the given function of the outcome of interest. The DDC is minimised when \\(G \\perp \\!\\!\\! \\perp R_\\omega\\). The second term is the data scarcity, and measures the effective proportion of the population who have been recorded. Finally, the third term is the problem difficulty, and measures the intrinsic difficulty of the estimation problem. This term is independent of the sampling or analysis method used. This thesis uses hierarchical Bayesian models defined using weighted direct survey estimates (Fay and Herriot 1979).
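The construction of survey weights and the weighted estimate in Equation (3.13) can be sketched directly in Python. The function names `survey_weights` and `hajek_mean` and all numerical values are hypothetical, chosen for a population of four individuals so that the arithmetic can be followed by hand.

```python
def survey_weights(pi, upsilon):
    """Survey weights omega_i = delta_i * gamma_i.

    delta_i = 1 / pi_i is the design weight and gamma_i = 1 / upsilon_i
    is the nonresponse weight.
    """
    return [(1 / p) * (1 / u) for p, u in zip(pi, upsilon)]


def hajek_mean(g, recorded, weights):
    """Weighted estimate of the population mean of G over recorded individuals."""
    numerator = sum(w * r * g_i for w, r, g_i in zip(weights, recorded, g))
    denominator = sum(w * r for w, r in zip(weights, recorded))
    return numerator / denominator


# Hypothetical population of four individuals
pi = [0.5, 0.25, 0.125, 0.125]   # sampling probabilities
upsilon = [1.0, 0.5, 0.5, 1.0]   # response probabilities
g = [1, 0, 1, 0]                 # G(y_i), for example an indicator of HIV status
recorded = [1, 1, 1, 0]          # R_i: the fourth individual is not recorded

omega = survey_weights(pi, upsilon)           # weights 2, 8, 16 and 8
unweighted = hajek_mean(g, recorded, [1] * 4) # Equation (3.12): 2 / 3
weighted = hajek_mean(g, recorded, omega)     # Equation (3.13): 18 / 26
```

Because the weights appear in both the numerator and denominator, the estimate is unchanged if all weights are multiplied by a constant, so only relative weights matter.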
Following Chen, Wakefield, and Lumley (2014), the sampling distribution of these direct estimates is arrived at by estimating the variance of Equation (3.13). Although this approach acknowledges the complex survey design, it has some important limitations. Importantly, it ignores clustering structure within the observations \\(i\\). Furthermore, as a two-step procedure, it fails to fully propagate uncertainty from a Bayesian perspective. While progress has been made in dealing with survey data, the Gelman (2007) claim that “survey weighting is a mess” still holds some weight. References Akaike, Hirotugu. 1973. “Information theory and an extension of the maximum likelihood principle.” In Second International Symposium on Information Theory, edited by B. N. Petrov and F. Csaki. Budapest: Akademiai Kiado. Bailey, Michael A. 2023. “A New Paradigm for Polling.” Harvard Data Science Review 5 (3). Baker, Stuart G. 1994. “The multinomial-Poisson transformation.” Journal of the Royal Statistical Society: Series D (The Statistician) 43 (4): 495–504. Berger, James. 2006. “The Case for objective Bayesian analysis.” Bayesian Analysis 1 (3): 385–402. Bernardo, José M, and Adrian FM Smith. 2001. Bayesian theory. John Wiley & Sons. Bivand, Roger S, Edzer J Pebesma, and Virgilio Gómez-Rubio. 2008. Applied spatial data analysis with R. Springer. Blei, David M, Alp Kucukelbir, and Jon D McAuliffe. 2017. “Variational inference: A review for statisticians.” Journal of the American Statistical Association 112 (518): 859–77. Bradley, Valerie C, Shiro Kuriwaki, Michael Isakov, Dino Sejdinovic, Xiao-Li Meng, and Seth Flaxman. 2021. “Unrepresentative Big Surveys Significantly Overestimated US Vaccine Uptake.” Nature 600 (7890): 695–700. Brier, Glenn W. 1950. “Verification of forecasts expressed in terms of probability.” Monthly Weather Review 78 (1): 1–3. Bürkner, Paul-Christian, Jonah Gabry, and Aki Vehtari. 2020.
“Approximate Leave-Future-Out Cross-Validation for Bayesian Time Series Models.” Journal of Statistical Computation and Simulation 90 (14): 2499–2523. Carpenter, Bob, Andrew Gelman, Matthew D Hoffman, Daniel Lee, Ben Goodrich, Michael Betancourt, Marcus Brubaker, Jiqiang Guo, Peter Li, and Allen Riddell. 2017. “Stan: A probabilistic programming language.” Journal of Statistical Software 76 (1). Chen, Cici, Jon Wakefield, and Thomas Lumley. 2014. “The use of sampling weights in Bayesian hierarchical models for small area estimation.” Spatial and Spatio-Temporal Epidemiology 11: 33–43. Chopin, Nicolas, Omiros Papaspiliopoulos, et al. 2020. An introduction to sequential Monte Carlo. Vol. 4. Springer. Cooper, Alex, Dan Simpson, Lauren Kennedy, Catherine Forbes, and Aki Vehtari. 2024. “Cross-Validatory Model Selection for Bayesian Autoregressions with Exogenous Regressors.” Bayesian Analysis 1 (1): 1–25. Cressie, Noel, and Christopher K Wikle. 2015. Statistics for spatio-temporal data. John Wiley & Sons. Dempster, Arthur P, Nan M Laird, and Donald B Rubin. 1977. “Maximum likelihood from incomplete data via the EM algorithm.” Journal of the Royal Statistical Society: Series B (Methodological) 39 (1): 1–22. Duane, Simon, Anthony D Kennedy, Brian J Pendleton, and Duncan Roweth. 1987. “Hybrid Monte Carlo.” Physics Letters B 195 (2): 216–22. Fay, Robert E, and Roger A Herriot. 1979. “Estimates of income for small places: an application of James-Stein procedures to census data.” Journal of the American Statistical Association 74 (366a): 269–77. Fisher, Ronald Aylmer. 1936. “Design of experiments.” British Medical Journal 1 (3923): 554. Garnier, Simon, Ross, Noam, Rudis, Robert, Camargo, et al. 2023. viridis(Lite) - Colorblind-Friendly Color Maps for R. https://doi.org/10.5281/zenodo.4679423. Gelfand, Alan E, Li Zhu, and Bradley P Carlin. 2001. “On the change of support problem for spatio-temporal data.” Biostatistics 2 (1): 31–45. Gelman, Andrew. 2005.
“Analysis of variance—why it is more important than ever.” ———. 2007. “Struggles with survey weighting and regression modeling.” Gelman, Andrew, John B Carlin, Hal S Stern, David B Dunson, Aki Vehtari, and Donald B Rubin. 2013. Bayesian data analysis. CRC press. Gelman, Andrew, Jessica Hwang, and Aki Vehtari. 2014. “Understanding predictive information criteria for Bayesian models.” Statistics and Computing 24 (6): 997–1016. Gelman, Andrew, and Donald B Rubin. 1992. “Inference from iterative simulation using multiple sequences.” Statistical Science, 457–72. Gelman, Andrew, Daniel Simpson, and Michael Betancourt. 2017. “The prior can often only be understood in the context of the likelihood.” Entropy 19 (10): 555. Gelman, Andrew, Aki Vehtari, Daniel Simpson, Charles C Margossian, Bob Carpenter, Yuling Yao, Lauren Kennedy, Jonah Gabry, Paul-Christian Bürkner, and Martin Modrák. 2020. “Bayesian workflow.” arXiv Preprint arXiv:2011.01808. Geman, Stuart, and Donald Geman. 1984. “Stochastic relaxation, Gibbs distributions, and the Bayesian restoration of images.” IEEE Transactions on Pattern Analysis and Machine Intelligence, no. 6: 721–41. Giordano, Ryan, Tamara Broderick, and Michael I. Jordan. 2018. “Covariances, Robustness, and Variational Bayes.” Journal of Machine Learning Research 19 (51): 1–49. http://jmlr.org/papers/v19/17-670.html. Gneiting, Tilmann, and Adrian E Raftery. 2007. “Strictly proper scoring rules, prediction, and estimation.” Journal of the American Statistical Association 102 (477): 359–78. Goldstein, Michael. 2006. “Subjective Bayesian analysis: principles and practice.” Hájek, Jaroslav. 1971. “Discussion of ‘An essay on the logical foundations of survey sampling, part I’.” Foundations of Statistical Inference (Proc. Sympos., Univ. Waterloo, Ontario, 1970), 236. Hastie, Trevor, and Robert Tibshirani. 1987. “Generalized additive models: some applications.” Journal of the American Statistical Association 82 (398): 371–86. Hastings, W. K. 1970. 
“Monte Carlo Sampling Methods Using Markov Chains and Their Applications.” Biometrika 57 (1): 97–109. http://www.jstor.org/stable/2334940. Hoffman, Matthew D, Andrew Gelman, et al. 2014. “The No-U-Turn sampler: adaptively setting path lengths in Hamiltonian Monte Carlo.” J. Mach. Learn. Res. 15 (1): 1593–623. Knorr-Held, Leonhard. 2000. “Bayesian modelling of inseparable space-time variation in disease risk.” Statistics in Medicine 19 (17-18): 2555–67. Margossian, Charles C, and Andrew Gelman. 2023. “For How Many Iterations Should We Run Markov Chain Monte Carlo?” arXiv Preprint arXiv:2311.02726. Martin, Gael M, David T Frazier, and Christian P Robert. 2023. “Computing Bayes: From then ‘til now.” Statistical Science 1 (1): 1–17. Matheson, James E, and Robert L Winkler. 1976. “Scoring rules for continuous probability distributions.” Management Science 22 (10): 1087–96. McElreath, Richard. 2020. Statistical rethinking: A Bayesian course with examples in R and Stan. CRC press. Meng, Xiao-Li. 2018. “Statistical paradises and paradoxes in big data (i) law of large populations, big data paradox, and the 2016 US presidential election.” The Annals of Applied Statistics 12 (2): 685–726. Metropolis, Nicholas, Arianna W Rosenbluth, Marshall N Rosenbluth, Augusta H Teller, and Edward Teller. 1953. “Equation of State Calculations by Fast Computing Machines.” J. Chem. Phys 21: 1087. Minka, Thomas P. 2001. “Expectation Propagation for approximate Bayesian inference.” In Proceedings of the 17th Conference in Uncertainty in Artificial Intelligence, 362–69. Neal, Radford M et al. 2011. “MCMC using Hamiltonian dynamics.” Handbook of Markov Chain Monte Carlo 2 (11): 2. Pfeffermann, Danny et al. 2013. “New Important Developments in Small Area Estimation.” Statistical Science 28 (1): 40–68. Porcu, Emilio, Reinhard Furrer, and Douglas Nychka. 2021. “30 Years of space–time covariance functions.” Wiley Interdisciplinary Reviews: Computational Statistics 13 (2): e1512. 
Robert, Christian P, and George Casella. 2005. “Monte Carlo Statistical Methods (Springer Texts in Statistics).” Springer. Roberts, Gareth O., and Jeffrey S. Rosenthal. 2004. “General state space Markov chains and MCMC algorithms.” Probability Surveys 1 (none): 20–71. https://doi.org/10.1214/154957804100000024. Roy, Vivekananda. 2020. “Convergence diagnostics for Markov chain Monte Carlo.” Annual Review of Statistics and Its Application 7: 387–412. Rue, Håvard, Sara Martino, and Nicolas Chopin. 2009. “Approximate Bayesian inference for latent Gaussian models by using integrated nested Laplace approximations.” Journal of the Royal Statistical Society: Series B (Statistical Methodology) 71 (2): 319–92. Shumway, Robert H, and David S Stoffer. 2017. Time Series Analysis and Its Applications With R Examples. Springer. Sisson, Scott A, Yanan Fan, and Mark Beaumont. 2018. Handbook of approximate Bayesian computation. CRC Press. Smith, Nathaniel, and Stéfan van der Walt. 2015. “A Better Default Colormap for Matplotlib.” In Proceedings of the 14th Python in Science Conference (SciPy). Spiegelhalter, David J, Nicola G Best, Bradley P Carlin, and Angelika Van Der Linde. 2002. “Bayesian measures of model complexity and fit.” Journal of the Royal Statistical Society: Series B (Statistical Methodology) 64 (4): 583–639. Stan Development Team. 2023. Stan Reference Manual. https://mc-stan.org/docs/reference-manual/index.html. Stringer, Alex, Patrick Brown, and Jamie Stafford. 2022. “Fast, scalable approximations to posterior distributions in extended latent Gaussian models.” Journal of Computational and Graphical Statistics, 1–15. Tobler, Waldo R. 1970. “A computer movie simulating urban growth in the Detroit region.” Economic Geography 46 (sup1): 234–40. Tokdar, Surya T, and Robert E Kass. 2010. “Importance sampling: a review.” Wiley Interdisciplinary Reviews: Computational Statistics 2 (1): 54–60. USAID. 2012. 
“Sampling and Household Listing Manual: Demographic and Health Surveys Methodology.” https://dhsprogram.com/pubs/pdf/DHSM4/DHS6_Sampling_Manual_Sept2012_DHSM4.pdf. Vehtari, Aki, Andrew Gelman, and Jonah Gabry. 2017. “Practical Bayesian Model Evaluation Using Leave-One-Out Cross-Validation and WAIC.” Statistics and Computing 27: 1413–32. Vehtari, Aki, and Janne Ojanen. 2012. “A survey of Bayesian predictive methods for model assessment, selection and comparison.” Statistics Surveys 6 (none): 142–228. https://doi.org/10.1214/12-SS102. Watanabe, Sumio. 2013. “A widely applicable Bayesian information criterion.” Journal of Machine Learning Research 14 (Mar): 867–97. Wood, Simon N. 2017. Generalized additive models: an introduction with R. CRC press. Yao, Yuling, Aki Vehtari, Daniel Simpson, and Andrew Gelman. 2018. “Yes, but did it work?: Evaluating variational inference.” In International Conference on Machine Learning, 5581–90. PMLR. "],["beyond-borders.html", "4 Models for areal spatial structure 4.1 Models based on adjacency 4.2 Models using kernels 4.3 Simulation study 4.4 HIV prevalence study 4.5 Discussion", " 4 Models for areal spatial structure This chapter is about spatial random effect model specifications for areal data. A simple model based on the adjacency structure between areas is popular in HIV small-area estimation and beyond. The analysis aimed to determine if using a more complex model would result in more accurate predictions. Modelling spatial correlation is particularly important for small-area estimation of HIV because the covariates most strongly associated with HIV, such as sexual behaviour and STI status (Mayala, Bhatt, and Gething 2020), are difficult to measure. As a result, previous small-area models of HIV have found that including covariates only modestly improves predictive performance (Supplementary Figure 20, Dwyer-Lindgren et al. 2019). The lack of predictive covariates emphasises the role of modelling spatial variation.
For mapping of other infectious diseases, such as malaria, where transmission is driven by more predictive and easily measurable environmental factors, explanatory covariates are more easily available and directly modelling spatial correlation is less important (Weiss et al. 2015; Bhatt et al. 2015). Spatial variation in areal data is often modelled using spatial random effects (Haining 2003; Cramb et al. 2018). The most common class of models used to specify spatial random effects is that of Gaussian Markov random fields [GMRFs; Havard Rue and Held (2005)]. These models combine a Gaussian distribution with Markov conditional independence assumptions between areas. Observations in areas close together are assumed to be related, with more distant relationships not directly accounted for. Perhaps the simplest GMRF model is that of Besag, York, and Mollié (1991) in which information is borrowed equally from each adjacent area, based on a binary relationship. The Besag model is attractive as it requires minimal additional modelling choices and is accessibly implemented in software such as R-INLA (Blangiardo et al. 2013), rstan (Morris et al. 2019; Donegan 2022), NIMBLE [Chapter 9; de Valpine et al. (2023)] and PyMC (Saunders 2023), among others. As a result, it has been widely used, including to model bird population dynamics from capture-recapture data (Saracco et al. 2010); for the analysis of magnetic resonance images (Gössl, Auer, and Fahrmeir 2001; Schmid et al. 2006); to map mortality from cancers (Rashid et al. 2023), injuries (Parks et al. 2020), and air pollution (Bennett et al. 2019); and to model alcohol use patterns (Dwyer-Lindgren et al. 2015). The Besag model was designed for image analysis, on a regular grid. However, for more irregular geometries, the assumptions made are unrealistic and appear to be violated. The administrative divisions of a country used in small-area estimation are one example of a more irregular geometry.
This chapter tests the hypothesis that using more realistic assumptions about spatial structure improves the performance of small-area estimation models. Performance in this context refers to accurate forecasts of parameters as measured by scoring rules. In doing so, practical recommendations for modelling areal spatial structure are offered. Code for the analysis in this chapter is available from https://github.com/athowes/beyond-borders, and supported by the arealutils R package (Howes 2023a). 4.1 Models based on adjacency This section discusses spatial random effect models based on a symmetric adjacency relation \\(i \\sim j\\) between areas \\(A_i\\) and \\(A_j\\). Adjacency is typically defined by a shared border, though other choices are possible (Christopher J. Paciorek et al. 2013). 4.1.1 The Besag model Figure 4.1: Panel A shows the districts of Zimbabwe. Panel B shows the corresponding adjacency graph \\(\\mathcal{G}\\) with vertices positioned at the centre of the area they correspond to, and edges between adjacent areas. The Besag model (Besag, York, and Mollié 1991) is an improper conditional auto-regressive (ICAR) model where the conditional mean of the random effect \\(u_i\\) is the average of its neighbours \\(\\{u_j\\}_{j \\sim i}\\) and the precision is proportional to the number of neighbours. The full conditional distribution of the \\(i\\)th spatial random effect is given by \\[\\begin{equation} u_i \\, | \\, \\mathbf{u}_{-i} \\sim \\mathcal{N} \\left(\\frac{1}{n_{\\delta i}} \\sum_{j: j \\sim i} u_j, \\frac{1}{n_{\\delta i}\\tau_u}\\right), \\tag{4.1} \\end{equation}\\] where \\(\\delta i\\) is the set of neighbours of \\(A_i\\) with cardinality \\(n_{\\delta i} = |\\delta i|\\) and \\(\\mathbf{u}_{-i}\\) is the vector of spatial random effects with the \\(i\\)th entry removed. 
By Brook’s lemma (Havard Rue and Held 2005) the set of full conditionals of the Besag model is equivalent to the Gaussian Markov random field (GMRF) given by \\[\\begin{equation} \\mathbf{u} \\sim \\mathcal{N}(\\mathbf{0}, \\tau_u^{-1} \\mathbf{R}^{-}). \\tag{4.2} \\end{equation}\\] The matrix \\(\\mathbf{R}^{-}\\) is the generalised inverse of the rank-deficient structure matrix \\(\\mathbf{R}\\) with entries \\[\\begin{equation} R_{ij} = \\begin{cases} n_{\\delta i}, & i = j \\\\ -1, & i \\sim j \\\\ 0, & \\text{otherwise.} \\end{cases} \\end{equation}\\] The Markov property arises due to the conditional independence structure \\(p(u_i \\, | \\, \\mathbf{u}_{-i}) = p(u_i \\, | \\, \\mathbf{u}_{\\delta i})\\) whereby each area only depends on its neighbours. This is reflected in the sparsity of \\(\\mathbf{R}\\) such that \\(u_i \\perp u_j \\, | \\, \\mathbf{u}_{-ij}\\) if and only if \\(R_{ij} = 0\\). The structure matrix \\(\\mathbf{R}\\) may also be expressed as the Laplacian matrix of the adjacency graph \\(\\mathcal{G} = (\\mathcal{V}, \\mathcal{E})\\) with vertices \\(v \\in \\mathcal{V}\\) corresponding to each area and edges \\(e \\in \\mathcal{E}\\) between vertices \\(i\\) and \\(j\\) when \\(i \\sim j\\). Figure 4.1 shows the districts of Zimbabwe with corresponding adjacency graph. Rewriting Equation (4.2), the probability density function of \\(\\mathbf{u}\\) is \\[\\begin{equation} p(\\mathbf{u}) \\propto \\tau_u^{\\frac{n - n_c}{2}} \\times \\exp \\left( -\\frac{\\tau_u}{2} \\mathbf{u}^\\top \\mathbf{R} \\mathbf{u} \\right) \\propto \\exp \\left( -\\frac{\\tau_u}{2} \\sum_{i \\sim j} (u_i - u_j)^2 \\right). \\tag{4.3} \\end{equation}\\] This density is a function of the pairwise differences \\(u_i - u_j\\) and so is invariant to the addition of a constant \\(c\\) to each entry \\(p(\\mathbf{u}) = p(\\mathbf{u} + c\\mathbf{1})\\). As a result, there is an improper uniform distribution on the average of the \\(u_i\\). 
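The identity in Equation (4.3) can be checked numerically. The following is a minimal sketch, assuming a made-up four-area adjacency graph; the helper names `structure_matrix` and `quadratic_form` are illustrative and do not come from the arealutils package.

```python
def structure_matrix(n, edges):
    """Besag structure matrix R: number of neighbours on the diagonal,
    -1 for adjacent pairs, 0 otherwise (the Laplacian of the adjacency graph)."""
    R = [[0] * n for _ in range(n)]
    for i, j in edges:
        R[i][j] = R[j][i] = -1
        R[i][i] += 1
        R[j][j] += 1
    return R


def quadratic_form(R, u):
    """Compute u^T R u for a matrix stored as nested lists."""
    n = len(u)
    return sum(u[i] * R[i][j] * u[j] for i in range(n) for j in range(n))


# Hypothetical geometry: four areas with edges 0-1, 1-2, 2-3 and 1-3.
edges = [(0, 1), (1, 2), (2, 3), (1, 3)]
R = structure_matrix(4, edges)

u = [0.5, -1.0, 2.0, 0.25]
pairwise = sum((u[i] - u[j]) ** 2 for i, j in edges)  # sum over adjacent pairs
shifted = [x + 3.0 for x in u]  # add the same constant to every entry

print(quadratic_form(R, u), pairwise, quadratic_form(R, shifted))
```

The three printed values agree: the quadratic form equals the sum of squared differences over adjacent pairs, and is unchanged by adding a constant to every entry, which is the impropriety described above.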
If \\(\\mathcal{G}\\) is connected, in that by traversing the edges, any vertex can be reached from any other vertex, then there is only one impropriety in the model and \\(\\text{rank}(\\mathbf{R}) = n - 1\\), while if \\(\\mathcal{G}\\) is disconnected, and composed of \\(n_c \\geq 2\\) connected components with index sets \\(I_1, \\ldots, I_{n_c}\\), then the corresponding structure matrix \\(\\mathbf{R}\\) has rank \\(n - n_c\\) and the density is invariant to the addition of a constant to each of the connected components \\(p(\\mathbf{u}_{I}) = p(\\mathbf{u}_{I} + c\\mathbf{1})\\) where \\(I = I_1, \\ldots, I_{n_c}\\). 4.1.2 Best practices for the Besag model Directly using the Besag model as described in Section 4.1.1 has several practical limitations in applied settings. To overcome these limitations, Freni-Sterrantino, Ventrucci, and Rue (2018) recommend three best practices: The structure matrix \\(\\mathbf{R}\\) should be rescaled to have generalised variance equal to one. The generalised variance of a matrix is defined by the geometric mean of the diagonal elements of its generalised inverse. For the structure matrix that is \\[\\begin{equation} \\sigma^2_{\\text{GV}}(\\mathbf{R}) = \\prod_{i = 1}^n (R^-_{ii})^{1/n} = \\exp \\left( \\frac{1}{n} \\sum_{i = 1}^n \\log (R^-_{ii}) \\right). \\end{equation}\\] The scaled structure matrix \\(\\mathbf{R}^\\star\\) is given by \\[\\begin{equation} \\mathbf{R}^\\star = \\mathbf{R} / \\sigma^2_{\\text{GV}}(\\mathbf{R}). \\end{equation}\\] As the diagonal elements \\(R^-_{ii}\\) correspond to marginal variances, the generalised variance gives a measure of the average marginal variance. This measure, introduced by Sørbye and Rue (2014), ignores off-diagonal entries. More broadly, other measures of typical variance could be used. Scaling mitigates the influence of the adjacency graph on the variance of \\(\\mathbf{u}\\).
For consistent and interpretable prior distribution selection, it is important to allow the variance to be controlled by \\(\\tau_u\\) alone. When the adjacency graph is disconnected it is not appropriate to scale the structure matrix \\(\\mathbf{R}\\) uniformly. This is because, given the precision \\(\\tau_u\\), local smoothing operates on each connected component independently. As such, each connected component \\(I = I_1, \\ldots, I_{n_c}\\) should be scaled independently to have generalised variance one \\[\\begin{equation} \\mathbf{R}^\\star_I = \\mathbf{R}_I / \\sigma^2_{\\text{GV}}(\\mathbf{R}_I) \\end{equation}\\] where \\(\\mathbf{R}_I\\) is the sub-matrix of the structure matrix corresponding to index set \\(I\\). When one of the connected components is a single area, known either as a singleton or an island, the probability density \\[\\begin{equation} p(u_i) \\propto \\exp \\left( -\\frac{\\tau_u}{2} \\sum_{i \\sim j} (u_i - u_j)^2 \\right) \\propto 1 \\end{equation}\\] has no dependence on \\(u_i\\). This is equivalent to using an improper prior. To avoid this, each singleton should be set to have independent Gaussian noise \\(u_i \\sim \\mathcal{N}(0, \\tau_u^{-1})\\). To avoid confounding of the spatial random effects with the intercept, it is recommended to place a sum-to-zero constraint on each non-singleton connected component. In other words, \\[\\begin{equation} \\sum_{i \\in I} u_i = 0, \\quad |I| > 1. \\end{equation}\\] As such, the total number of sum-to-zero constraints is equal to the number of non-singleton connected components. 4.1.3 Concerns about the Besag model The Besag model was originally proposed by Besag, York, and Mollié (1991) for use in image analysis. In this setting, areas correspond to pixels arranged in a regular lattice structure. In an image, the data point at each pixel can be thought of as an average of the intensity or colour over the space represented by the pixel.
Since its original proposal, the Besag model has seen wider use. However, for small-area estimation of HIV, the spatial structure corresponds to administrative units. These administrative units may have a more irregular spatial structure than a lattice. Furthermore, data points may not come about by uniform averaging over a space. For example, population density may vary across the area. These considerations raise concerns about the Besag model’s applicability to the small-area estimation setting, which we explore in this section. The discussion is closely linked to the modifiable areal unit problem (Openshaw and Taylor 1979), whereby statistical conclusions change as a result of seemingly arbitrary changes in data aggregation, and the challenge of ecological inference and the ecological fallacy (Jonathan Wakefield and Lyons 2010). 4.1.3.1 Compression to adjacency Figure 4.2: Though they are quite different, the geometries shown in panels A, B, C, and D each have the same adjacency graph. Therefore, each geometry would have the same distribution under the Besag model. A fundamental objection is that summarising a geometry by an adjacency graph represents a loss of information. Many geometries share the same adjacency graph, and as such are isomorphic under the Besag model (Figure 4.2). Though not in itself a problem, this fact prompts consideration whether the class of geometries with the same adjacency graph is sufficiently similar to merit identical models. Intuitively, the more regular the spatial structure, the less information is lost in compression to an adjacency graph. In image analysis, very little spatial information is lost in compression of a lattice structure to an adjacency graph. On the other hand, the regions of a country, determined by political and geographic forces, tend to display greater irregularity. The appropriateness of adjacency compression varies therefore by the type of geometry common to the application setting. 
The regularity of realistic geometries may help to constrain each class to be similar. In other words, although pathological geometries can be constructed, they might be implausible in statistical practice and so of limited concern. 4.1.3.2 Mean structure In the Besag model all adjacent areas count equally in the equation for the conditional mean. This assumption is unsatisfying, as for most geometries we expect different amounts of correlation between neighbouring areas. Figure 4.2 illustrates a number of heuristic features for neighbour importance. In Panel 4.2C, the area with a longer shared border would be expected to be more highly correlated. In Panel 4.2D, the area with a closer centre would be expected to be more highly correlated. 4.1.3.3 Variance structure Figure 4.3: A sequence of geometries where the number of neighbours of area one grows by one at each iteration, as the shaded area is split into more areas. In the limit, the precision of the spatial random effect in the first area tends to infinity. This is not reasonable behaviour if the amount of information being shared is not also increasing. In Equation (4.1) the precision of \\(u_i\\) is proportional to its number of neighbours \\(n_{\\delta i}\\). It follows that as \\(n_{\\delta i} \\to \\infty\\) then \\(\\text{Var}(u_i) \\to 0\\). This is illustrated by Figure 4.3 where the area on the right is repeatedly divided such that its number of neighbours increases. This property is a consequence of averaging the conditional mean over a greater number of areas, which, in certain situations, can correspond to a greater amount of information. However, if the amount of information in the shaded area remains fixed, it is inappropriate that \\(\\text{Var}(u_1)\\) should tend to zero as a result of drawing additional, arbitrary, boundaries. 
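As a small numerical illustration of this behaviour, the sketch below evaluates the full conditional of Equation (4.1) for an area whose neighbouring region is split into progressively more pieces. The function name `besag_conditional`, the random effect values, and the choice of unit precision are assumptions made for illustration only.

```python
def besag_conditional(u, neighbours, i, tau_u=1.0):
    """Mean and variance of u_i given all other entries, following Equation (4.1)."""
    nbrs = neighbours[i]
    mean = sum(u[j] for j in nbrs) / len(nbrs)
    variance = 1.0 / (len(nbrs) * tau_u)
    return mean, variance


# Area 0 borders a region which is split into k pieces, each with the same value.
variances = []
for k in (1, 2, 4, 8):
    neighbours = {0: list(range(1, k + 1))}
    u = [0.0] + [0.3] * k
    mean, variance = besag_conditional(u, neighbours, 0)
    variances.append(variance)

print(variances)
```

The conditional mean is the same in every case, as the split pieces carry the same value, yet the conditional variance halves each time the number of pieces doubles.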
In the image analysis setting this modelling assumption is reasonable: each pixel represents a fixed amount of information and a higher pixel density represents a greater amount of information. On the other hand, in public health and epidemiology, drawing boundaries to create additional areas is not expected to correspond to a greater amount of information. Figure 4.4: Each of the shaded areas in the geometry in Panel A are split into two in Panel B. As a second example of undesirable behaviour, suppose we fit a Besag model upon identical data using each of the two geometries in Figure 4.4. If the spatial variation is relatively smooth, dividing the shaded areas into two will result in a lower estimated variance \\(\\sigma^2_u\\) in Panel 4.4B as compared with Panel 4.4A because there will appear to be less variation between neighbouring areas. This problem does not only apply locally: since the effect of \\(\\sigma^2_u\\) applies everywhere, the smoothing will change even in unaltered parts of the study region. 4.1.4 Weighted ICAR models The Besag model is a special case of a more general class of (zero-mean) weighted ICAR models. These models can be specified in terms of scaled weights \\(\\{b_{ij}\\}_{j \\sim i}\\) and a precision vector \\(\\boldsymbol{\\mathbf{\\kappa}} = (\\kappa_i)_{i \\in [n]}\\). The full conditionals are then \\[\\begin{equation} u_i \\, | \\, \\mathbf{u}_{-i} \\sim \\mathcal{N} \\left( \\sum_{j: j \\sim i} b_{ij} u_j, \\frac{1}{\\kappa_i \\tau_u} \\right). \\tag{4.4} \\end{equation}\\] Setting \\(b_{ij} = 1 / n_{\\delta i}\\) and \\(\\kappa_i = n_{\\delta i}\\) recovers the Besag model in Equation (4.1). 
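One way to obtain valid scaled weights and precisions for Equation (4.4) is to start from a symmetric matrix of unscaled weights and normalise each row. The sketch below assumes three mutually adjacent areas with made-up centroid distances and exponential-decay weights; the function names are hypothetical.

```python
import math


def scaled_weights(W):
    """Return (kappa, B) with kappa_i = sum_k w_ik and b_ij = w_ij / kappa_i."""
    n = len(W)
    kappa = [sum(W[i]) for i in range(n)]
    B = [[W[i][j] / kappa[i] for j in range(n)] for i in range(n)]
    return kappa, B


def weighted_structure_matrix(W):
    """Return R = D_kappa - W for a symmetric unscaled weights matrix W."""
    n = len(W)
    kappa = [sum(row) for row in W]
    return [[(kappa[i] if i == j else 0.0) - W[i][j] for j in range(n)]
            for i in range(n)]


# Hypothetical distances between the centroids of three mutually adjacent areas.
d = [[0.0, 1.0, 2.0],
     [1.0, 0.0, 1.5],
     [2.0, 1.5, 0.0]]

# Unscaled weights w_ij = exp(-d_ij) for distinct areas, zero on the diagonal.
W = [[math.exp(-d[i][j]) if i != j else 0.0 for j in range(3)] for i in range(3)]
kappa, B = scaled_weights(W)
R = weighted_structure_matrix(W)

# Binary adjacency weights give the Besag special case.
W_adj = [[0.0 if i == j else 1.0 for j in range(3)] for i in range(3)]
kappa_adj, B_adj = scaled_weights(W_adj)
```

With binary adjacency weights, each precision equals the number of neighbours and each scaled weight equals its reciprocal, which is the Besag model. With distance-based weights, nearer neighbours contribute more to the conditional mean, while the product of scaled weight and precision stays symmetric in the two areas.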
The structure matrix \\(\\mathbf{R}\\) corresponding to the more general full conditionals in Equation (4.4) is \\[\\begin{equation} \\mathbf{R} = \\mathbf{D}_\\kappa(\\mathbf{I} - \\mathbf{B}), \\end{equation}\\] where the scaled weights matrix \\(\\mathbf{B}\\) has elements \\[\\begin{equation} B_{ij} = \\begin{cases} b_{ij}, & \\text{for } i \\sim j, \\\\ 0, & \\text{for } i = j \\text{ or } i \\nsim j, \\end{cases} \\end{equation}\\] and the matrix \\(\\mathbf{D}_\\kappa\\) is given by \\(\\text{diag}(\\kappa_1, \\ldots, \\kappa_n)\\). Ensuring that the structure matrix is symmetric requires that for all \\(i, j \\in [n]\\) \\[\\begin{equation} - b_{ij} \\kappa_i = - b_{ji} \\kappa_j. \\end{equation}\\] To meet this condition, it can be simpler to directly consider symmetry of the unscaled weights matrix \\[\\begin{equation} \\mathbf{W} = \\mathbf{D}_\\kappa \\mathbf{B}, \\end{equation}\\] such that \\(\\mathbf{R} = \\mathbf{D}_\\kappa - \\mathbf{W}\\). For the Besag model the unscaled weights matrix \\(\\mathbf{W}\\) corresponds to the adjacency matrix. Scaled weights can be recovered by \\(b_{ij} = w_{ij} / \\kappa_i\\) where \\(\\kappa_i = \\sum_{k: k \\sim i} w_{ik}\\). Duncan, White, and Mengersen (2017) provide discussion of methods for specifying \\(\\mathbf{W}\\), including \\[\\begin{align} w_{ij} &= \\left( \\frac{1}{d_{ij}} \\right), \\\\ w_{ij} &= \\exp (-d_{ij}). \\end{align}\\] Weighted ICAR models appear to overcome some of the limitations discussed in Section 4.1.3. 4.1.5 The reparameterised Besag-York-Mollié model Often, as well as spatial correlation, there exists IID over-dispersion in the residuals and it is inappropriate to use purely spatially structured random effects in the model.
The Besag-York-Mollié (BYM) model of Besag, York, and Mollié (1991) accounts for this in a natural way by decomposing the spatial random effect \\(\\mathbf{u} = \\mathbf{v} + \\mathbf{w}\\) into a sum of an unstructured IID component \\(\\mathbf{v}\\) and a spatially structured Besag component \\(\\mathbf{w}\\). Each component has its own respective precision parameter \\(\\tau_v\\) and \\(\\tau_w\\). The resulting distribution is \\[\\begin{equation} \\mathbf{u} \\sim \\mathcal{N}(0, \\tau_v^{-1} \\mathbf{I} + \\tau_w^{-1} \\mathbf{R}^{-}) \\tag{4.5}. \\end{equation}\\] Including both \\(\\mathbf{v}\\) and \\(\\mathbf{w}\\) is intended to enable the model to learn the relative extent of the unstructured and structured components via \\(\\tau_v\\) and \\(\\tau_w\\). However, in the BYM model, scaling of the Besag precision matrix \\(\\mathbf{Q}\\) is not taken into account despite this issue being particularly pertinent when dealing with multiple sources of noise. In particular, placing a joint prior distribution \\[\\begin{equation} (\\tau_v, \\tau_w) \\sim p(\\tau_v, \\tau_w) \\end{equation}\\] which does not privilege either component is more easily accomplished if \\(\\mathbf{Q}\\) and \\(\\mathbf{I}\\) have the same scale. Additionally, supposing one has a prior belief that the over-dispersion is primarily IID and \\(\\mathbf{v}\\) accounts for the majority of the dispersion, then it is not immediately obvious how to represent this belief, without inadvertently altering the prior distribution on the amount of overall variation. This highlights identifiability issues of the parameters \\((\\tau_v, \\tau_w)\\) resulting from their non-orthogonality. Building on the models of Leroux, Lei, and Breslow (2000) and Dean, Ugarte, and Militino (2001) which tackle this identifiability problem, but do not scale the spatially structured noise, Simpson et al. (2017) propose a reparameterisation \\((\\tau_v, \\tau_w) \\mapsto (\\tau_u, \\phi)\\) of the BYM model. 
This is known as the BYM2 model and given by \\[\\begin{align} \\mathbf{u} = \\frac{1}{\\sqrt{\\tau_u}} \\left( \\sqrt{1- \\phi} \\, \\mathbf{v} + \\sqrt{\\phi} \\, \\mathbf{w}^\\star \\right), \\tag{4.6} \\end{align}\\] where \\(\\tau_u\\) is the marginal precision of \\(\\mathbf{u}\\), \\(\\phi \\in [0, 1]\\) gives the proportion of the marginal variance explained by the spatially structured component, and \\(\\mathbf{w}^\\star\\) is a scaled version of \\(\\mathbf{w}\\) with precision matrix given by the scaled structure matrix \\(\\mathbf{R}^\\star\\). When \\(\\phi = 0\\) the random effects are IID, and when \\(\\phi = 1\\) the random effects follow the Besag model. To borrow an analogy (Håvard Rue 2020) the parameterisation \\((\\tau_v, \\tau_w)\\) is like having one hot water and one cold water tap, whereas the parameterisation \\((\\tau_u, \\phi)\\) is like a mixer tap where the amount of water and its temperature can be adjusted separately. Although the BYM and BYM2 models were originally proposed using the Besag model as the spatially structured component, this need not be the case. Indeed, more broadly it is reasonable to consider convolved random effects (of a form analogous to that in Equation (4.5) or (4.6)) with any model for spatially structured noise. Any limitations of the model for spatially structured random effects are inherited by the convolved random effects. 4.2 Models using kernels Section 4.1 reviewed ways to construct spatial random effect precision matrices using an adjacency relation. An alternative approach is to define the covariance matrix using an areal kernel function which gives a measure of similarity between two areas. Such a function may be specified as \\[\\begin{equation} K: \\mathcal{P}(\\mathcal{S}) \\times \\mathcal{P}(\\mathcal{S}) \\to \\mathbb{R}, \\tag{4.7} \\end{equation}\\] where \\(\\mathcal{P}\\) denotes the power set such that \\(\\mathcal{P}(\\mathcal{S})\\) is the space of subsets of the study region.
If the function \\(K\\) is positive semi-definite, then define areal kernel spatial random effects by \\[\\begin{equation} \\mathbf{u} \\sim \\mathcal{N} \\left( 0, \\frac{1}{\\tau_u} \\mathbf{K} \\right), \\tag{4.8} \\end{equation}\\] where the \\(n \\times n\\) Gram matrix \\(\\mathbf{K}\\) with entries \\(K_{ij} = K(A_i, A_j)\\) is a valid covariance matrix. The precision parameter \\(\\tau_u\\) is placed outside of the Gram matrix, analogous to the relation of the precision and structure matrices, but could be omitted. Areal kernels may be thought of as a type of kernels on sets (Gärtner et al. 2002). It is challenging to think directly about the correlation structure between areas. Instead, most well-known spatial process models define the correlation structure between points using a kernel function \\[\\begin{equation} k: \\mathcal{S} \\times \\mathcal{S} \\to \\mathbb{R}. \\tag{4.9} \\end{equation}\\] A simple method, and the one considered here henceforth, is to construct \\(K\\) (Equation (4.7)) from \\(k\\) (Equation (4.9)) by averaging the kernel \\(k\\) computed on some number of points representing each area. In Section 4.2.1 one point is used, and in Section 4.2.2 multiple points are used. 4.2.1 Centroid kernel The simplest approach is to use a single point to represent each area such that \\[\\begin{equation} K(A_i, A_j) = k(p_i, p_j). \\end{equation}\\] A natural choice is the centroid \\(p_i = c_i\\), given by the arithmetic mean of the latitude and longitude. (Note that it is not guaranteed for the centroid to lie within the area i.e. it is possible \\(c_i \\notin A_i\\), and more generally points representing an area may not be contained by that area.) This choice results in the centroid kernel \\[\\begin{equation} K(A_i, A_j) = k(c_i, c_j). \\tag{4.10} \\end{equation}\\] The centroid kernel has been used in environmental epidemiology (J. 
Wakefield and Morris 1999), for US election modelling (Flaxman, Wang, and Smola 2015), and to model the reproduction number of COVID-19 (Teh et al. 2022). In a model comparison study Nicky Best, Richardson, and Thomson (2005) (Section 3) simulated data representing heterogeneous exposure to air pollution, including elevated rates of exposure near two hypothetical point source locations, and found that the centroid kernel tended to over-smooth the high-risk areas. That said, it is unsurprising that a stationary covariance function would struggle to recover non-stationary structure. 4.2.2 Integrated kernel Rather than choosing a single representative point, an alternative is to more completely represent the area by integrating the kernel over the areas of interest (Kelsall and Wakefield 2002; Follestad and Rue 2003). This results in the integrated kernel \\[\\begin{equation} K(A_i, A_j) = \\frac{1}{|A_i||A_j|} \\int_{A_i} \\int_{A_j} k(s, s') \\text{d} s \\text{d} s'. \\tag{4.11} \\end{equation}\\] Unlike for the centroid kernel where \\(K_{ii} = 1\\) for all \\(i\\), the marginal variance of the \\(i\\)th spatial random effect \\(K_{ii} = K(A_i, A_i)\\) varies depending on the area: becoming larger for more compact areas and smaller for areas which are of greater extent or more spread out, because \\(k\\) is then averaged over pairs of points which are further apart. This covariance structure is equivalent to that obtained by aggregating a spatially continuous Gaussian process with kernel \\(k\\) over the areal partition. In the machine learning literature, models of this kind have been studied under the name aggregated Gaussian processes (Law et al. 2018; Tanaka et al. 2019; Yousefi, Smith, and Alvarez 2019; Hamelijnck et al. 2019; Chau, Bouabid, and Sejdinovic 2021). Examples of use of this model in statistical practice are rare. 4.2.2.1 Accounting for heterogeneity Additional information accounting for heterogeneity over \\(A_i\\) may be incorporated into the integrated kernel.
This can be accomplished using weighting distributions \\(\\{w_i\\}\\) which represent an unequal contribution of each point to the similarity measure. The weighted integrated kernel is given by \\[\\begin{equation} K(A_i, A_j) = \\frac{1}{|A_i||A_j|}\\int_{A_i} \\int_{A_j} w_i(s) w_j(s') k(s, s') \\text{d} s \\text{d} s'. \\tag{4.12} \\end{equation}\\] This areal kernel may be useful in disease mapping. For example, areas with populations who live close to a shared border are likely to be more strongly correlated than areas whose populations live far apart. This detail could be accounted for by weighting according to a high resolution measure of population density. Though weighted centroids, for example, may also be used in Equation (4.10), accounting for heterogeneity over an area is more natural within the integrated kernel than the centroid kernel. 4.2.2.2 Computation Figure 4.5: The \\(n = 33\\) districts of Malawi. Panel A shows the centroids as in Section 4.2.1. Panel B shows \\(L_i = 10\\) randomly chosen points, Panel C hexagonal points, and Panel D grid points in each area, each generated using the sf::st_sample function (E. Pebesma 2018). Most of the time it is not possible to calculate Equation (4.12) analytically. Instead, consider \\(n\\) collections of \\(L_i\\) samples \\(\\{s^{(i)}_l \\}_{l = 1}^{L_i} \\sim \\mathcal{U}(A_i)\\) drawn uniformly from each area. Then the integral may be approximated using Monte Carlo by the double sum \\[\\begin{equation} K(A_i, A_j) \\approx \\frac{1}{L_i L_j} \\sum_{l = 1}^{L_i} \\sum_{m = 1}^{L_j} w_i \\left( s^{(i)}_l \\right) w_j \\left( s^{(j)}_m \\right) k \\left( s^{(i)}_l, s^{(j)}_m \\right). \\tag{4.13} \\end{equation}\\] Equivalently, samples drawn from \\(W_i\\), the distribution on \\(A_i\\) with density proportional to \\(w_i\\), may be used without weighting by \\(w_i(s)\\). Nodes may also be selected deterministically to give a numerical quadrature estimate of the kernel. Figure 4.5 shows three possible ways of choosing points \\(s^{(i)}_l\\), together with the centroids approach.
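The Monte Carlo approximation in Equation (4.13) can be sketched as follows, assuming uniform weights, rectangular areas with made-up coordinates, and a Matérn kernel with smoothness 3/2. In practice the points would be drawn from real boundaries, for example with sf::st_sample as in Figure 4.5; all function names here are hypothetical.

```python
import math
import random


def matern32(s, t, l=1.0):
    """Matérn kernel with smoothness 3/2, length-scale l and unit variance."""
    a = math.sqrt(3.0) * math.dist(s, t) / l
    return (1.0 + a) * math.exp(-a)


def sample_rectangle(rect, L, rng):
    """Draw L points uniformly from an axis-aligned rectangle (xmin, ymin, xmax, ymax)."""
    xmin, ymin, xmax, ymax = rect
    return [(rng.uniform(xmin, xmax), rng.uniform(ymin, ymax)) for _ in range(L)]


def integrated_gram(samples, kernel):
    """Approximate each K(A_i, A_j) by the double-sum average with uniform weights."""
    n = len(samples)
    K = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i, n):
            total = sum(kernel(s, t) for s in samples[i] for t in samples[j])
            K[i][j] = K[j][i] = total / (len(samples[i]) * len(samples[j]))
    return K


rng = random.Random(1)
# Hypothetical geometry: one small square area and two larger rectangular areas.
areas = [(0.0, 0.0, 0.5, 0.5), (0.5, 0.0, 3.0, 2.0), (0.0, 0.5, 0.5, 2.0)]
samples = [sample_rectangle(area, 50, rng) for area in areas]
K = integrated_gram(samples, matern32)
```

In this sketch the diagonal entry for the small square exceeds that for the large rectangle, as pairs of points within a compact area are closer together. Reusing the same sample for both arguments on the diagonal includes the terms where the two points coincide, which biases the diagonal entries slightly upwards when the number of points is small.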
Computing the \\(n \\times n\\) Gram matrix \\(K\\) requires \\[\\begin{equation} \\mathcal{O} \\left( \\sum_{i = 1}^n \\sum_{j = 1}^n L_i L_j \\right) \\end{equation}\\] evaluations of the kernel \\(k\\). This imposes a significant computational cost if the Gram matrix is often recomputed during inference. For example, when the kernel has hyperparameters which are learnt during MCMC, the Gram matrix is recomputed for each proposed set of hyperparameters. As such, there is a limit on the size of \\(L_i\\) which it is feasible to use. Kelsall and Wakefield (2002) encounter this challenge, and take the approach of using a discrete hyperparameter prior to reduce the number of Gram matrix constructions and inversions required. 4.2.2.3 Connection to log-Gaussian Cox processes The log-Gaussian Cox process (LGCP) framework (Diggle et al. 2013) arrives naturally at the integrated kernel formulation (Li et al. 2012). A Cox process is an inhomogeneous Poisson process with a continuous stochastic intensity function \\(\\{ x(s), s \\in \\mathcal{S} \\}\\) such that conditional on the realisation of \\(x(s)\\) the number of points in any area \\(A_i\\) follows a Poisson distribution. The rate parameter of this Poisson distribution is explicitly aggregated as follows \\[\\begin{equation} y_i \\, | \\, x(s) \\sim \\text{Poisson} \\left(\\int_{s \\in A_i} x(s) \\text{d}s \\right). \\end{equation}\\] In an LGCP the log intensity \\(\\log x(s) = \\eta(s)\\) is modelled using a Gaussian process prior \\(\\eta(s) \\sim \\mathcal{GP}(\\mu(s), k(s, s'))\\). O. Johnson, Diggle, and Giorgi (2019) obtain Equation (4.12) by considering a discrete Poisson log-linear mixed model approximation to a continuous LGCP, whereby \\(\\eta(s)\\) is approximated by a piecewise constant \\(\\eta_i = \\mu_i + u_i\\) in each area \\(A_i\\).
The \\(i\\)th discrete spatial random effect is then \\(u_i = \\int_{A_i} w_i(s) u(s) \\text{d}s\\), with covariance structure \\[\\begin{equation} \\text{Cov} \\left( \\int_{A_i} w_i(s) u(s) \\text{d}s, \\int_{A_j} w_j(s') u(s') \\text{d}s' \\right) = \\int_{A_i} \\int_{A_j} w_i(s) w_j(s') k(s, s') \\text{d}s\\text{d}s', \\end{equation}\\] corresponding to an areal integrated kernel with a logarithmic link function and Poisson likelihood. 4.2.2.4 Connection to disaggregation regression Disaggregation regression, also known as downscaling or interpolation, is another closely related approach. Rather than focusing on the aggregate nature of areal observations as a route towards better area-level estimates, disaggregation regression aims to produce high-resolution or point-level estimates from areal observations (Utazi et al. 2019; Arambepola et al. 2022; Nandi et al. 2023). These two tasks are similar, and indeed it could be argued that accurate point-level estimates are a necessary intermediate step towards accurate area-level estimates. However, disaggregation regression is challenging without auxiliary covariate information, and therefore unlikely to be applicable to small-area estimation of HIV. 4.3 Simulation study This simulation study tests the ability of inferential models with varying spatial random effect specifications to accurately recover small-area quantities. The data and modelling choices were designed with a spatial epidemiology application in mind. 4.3.1 Synthetic data Table 4.1: The three spatial random effect models used to generate synthetic data in the simulation study (Section 4.3). 
| Model | Details |
|---|---|
| IID | \\(\\mathbf{u} \\sim \\mathcal{N}(0, \\mathbf{I}_n)\\) |
| Besag | \\(\\mathbf{u} \\sim \\mathcal{N}(0, {\\mathbf{R}^\\star}^{-})\\) as in Section 4.1.1 |
| Integrated kernel (IK) | \\(\\mathbf{u} \\sim \\mathcal{N}(0, \\mathbf{K}^\\star)\\) as in Section 4.2.2 with Matérn kernel, \\(\\nu = 3/2, l = 2.5\\) and \\(L_i = 100\\) points per area |

Data \\(\\mathbf{y} = (y_i)_{i \\in [n]}\\) were simulated from a binomial likelihood \\(y_i \\sim \\text{Bin}(m_i, \\rho_i)\\). The probabilities \\(\\rho_i \\in [0, 1]\\) were linked to linear predictors \\(\\eta_i \\in \\mathbb{R}\\) via \\[\\begin{equation} \\log \\left( \\frac{\\rho_i}{1 - \\rho_i} \\right) = \\eta_i = \\beta_0 + u_i, \\quad i \\in [n]. \\end{equation}\\] Spatial random effects were generated according to three different models (Table 4.1). Sample sizes were fixed as \\(m_i = 25\\) for all \\(i \\in [n]\\), the intercept parameter as \\(\\beta_0 = -2\\) and the spatial random effect precision parameter as \\(\\tau_u = 1\\).

Figure 4.6: Seven geometries were considered in the simulation study. These were the four geometries from Figure 4.2 shown in Panel A, B, C and D, and three more realistic geometries shown in Panel E, F and G.

Seven geometries were considered (Figure 4.6). These included the four vignette geometries from Figure 4.2 which share an adjacency graph. Three more realistic geometries were included to represent plausible variation over spatial regularity for the small-area estimation setting. From the most to the least spatially regular, these geometries were: a \\(6 \\times 6\\) lattice grid; the 33 districts of Côte d’Ivoire; and the 36 congressional districts of Texas. For each of the three spatial random effect models and seven geometries 250 synthetic data were generated, resulting in a total of 5250 synthetic data.

4.3.2 Inferential models

Table 4.2: The spatial random effect models used for inference. Each model is implemented in the arealutils R package (Howes 2023a).
The BYM2 model was implemented using the sparsity preserving parameterisation described in Section 3.2 of Riebler et al. (2016).

| Model | Details |
|---|---|
| IID | \\(\\mathbf{u} \\sim \\mathcal{N}(0, \\tau_u^{-1} \\mathbf{I}_n)\\) |
| Besag | \\(\\mathbf{u} \\sim \\mathcal{N}(0, \\tau_u^{-1} {\\mathbf{R}^\\star}^{-})\\) as in Section 4.1.1 |
| BYM2 | \\(\\mathbf{u} = \\tau_u^{-1} ( \\sqrt{1 - \\pi} \\, \\mathbf{v} + \\sqrt{\\pi} \\, \\mathbf{w}^\\star)\\) as in Section 4.1.5 with \\(\\pi \\sim \\text{Beta}(0.5, 0.5)\\) |
| FCK | \\(\\mathbf{u} \\sim \\mathcal{N}(0, \\tau_u^{-1} \\mathbf{K})\\) with \\(K_{ij} = k(c_i, c_j)\\) as in Section 4.2.1 with fixed length-scale \\(l\\) |
| CK | \\(\\mathbf{u} \\sim \\mathcal{N}(0, \\tau_u^{-1} \\mathbf{K})\\) with \\(K_{ij} = k(c_i, c_j)\\) as in Section 4.2.1 with length-scale prior distribution \\(l \\sim \\mathrm{Inv{\\text-}Gamma}(a, b)\\) with \\(a, b\\) set based on the geometry |
| FIK | \\(\\mathbf{u} \\sim \\mathcal{N}(0, \\tau_u^{-1} \\mathbf{K})\\) with \\(K_{ij} = K(A_i, A_j)\\) as in Section 4.2.2 with hexagonal points (Panel 4.5C), \\(L_i = 10\\), and fixed length-scale \\(l\\) |
| IK | \\(\\mathbf{u} \\sim \\mathcal{N}(0, \\tau_u^{-1} \\mathbf{K})\\) with \\(K_{ij} = K(A_i, A_j)\\) as in Section 4.2.2 with hexagonal points (Panel 4.5C), \\(L_i = 10\\), and length-scale prior distribution \\(l \\sim \\mathrm{Inv{\\text-}Gamma}(a, b)\\) with \\(a, b\\) set based on the geometry |

Seven inferential models were fit to the synthetic data (Table 4.2). Apart from the spatial random effect specification, each inferential model corresponded exactly to the simulation model.

4.3.2.1 Kernels

Gram matrices were computed using the Matérn kernel \\(k: \\mathcal{S} \\times \\mathcal{S} \\to \\mathbb{R}\\) (Stein 1999) given by \\[\\begin{equation} k(s, s') = \\frac{1}{2^{\\nu - 1}\\Gamma(\\nu)} \\left(\\frac{\\sqrt{2\\nu}\\lvert s - s' \\rvert}{l}\\right)^\\nu B_\\nu\\left(\\frac{\\sqrt{2\\nu}\\lvert s - s' \\rvert}{l}\\right).
\\tag{4.14} \\end{equation}\\] In Equation (4.14): \\(B_\\nu\\) is the modified Bessel function of the second kind; \\(|s - s'|\\) is the Euclidean distance between the point locations \\(s\\) and \\(s'\\); \\(\\nu\\) is the smoothness hyperparameter; \\(l\\) is the length-scale hyperparameter on the latitude-longitude scale. We fixed the smoothness hyperparameter \\(\\nu\\) to \\(3/2\\) to avoid concerns regarding the joint identifiability of the smoothness and lengthscale hyperparameters. This value matches that used to simulate data, and simplifies Equation (4.14) as follows \\[\\begin{equation} k(s, s') = \\left(1 + \\frac{\\sqrt{3} \\lvert s - s' \\rvert}{l} \\right) \\exp \\left(- \\frac{\\sqrt{3} \\lvert s - s' \\rvert}{l} \\right). \\end{equation}\\] The number of points per area \\(L_i\\) was set to 10 with a hexagonal spacing structure (Panel 4.5C). The actual values of \\(L_i\\) sometimes differed from 10 because sf::st_sample with type = \"hexagonal\" does not guarantee exactly the specified number of samples are returned (E. Pebesma 2018). 4.3.2.2 Prior distributions A weakly informative half-Gaussian prior was placed on the standard deviation such that \\(\\sigma_u \\sim \\mathcal{N}_+(0, 2.5^2)\\) (Gelman 2006). The value 2.5 avoids placing significant prior density on the region \\(\\sigma_u > 5\\), which after logistic transformation would facilitate undesirable variation on the probability scale very close to zero or one. A weakly informative \\(\\mathcal{N}(-2, 1)\\) prior was placed on \\(\\beta_0\\), setting most of the prior probability density for \\(\\text{logit}^{-1}(\\beta_0)\\) within a range typical for a disease prevalence. In cases where the length-scale \\(l\\) was fixed, it was set based on the geometry such that points an average distance apart had 1% correlation (N. Best et al. 1999). 
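As an illustration of the fixed length-scale heuristic, the following minimal sketch evaluates the \\(\\nu = 3/2\\) Matérn kernel and solves numerically for the length-scale at which two points a given distance apart have 1% correlation. The function names, the bisection search and the example distance of 1.5 are assumptions made for illustration; they are not taken from the arealutils package.

```python
import math


def matern32(distance: float, lengthscale: float) -> float:
    """Matern kernel with smoothness nu = 3/2 and unit marginal variance."""
    scaled = math.sqrt(3.0) * distance / lengthscale
    return (1.0 + scaled) * math.exp(-scaled)


def lengthscale_for_correlation(distance: float, target: float = 0.01,
                                tol: float = 1e-10) -> float:
    """Find l such that matern32(distance, l) equals target.

    The correlation increases monotonically with the length-scale, so a
    bisection on the log scale is sufficient.
    """
    lo, hi = distance * 1e-6, distance * 1e6
    while hi - lo > tol * hi:
        mid = math.sqrt(lo * hi)  # geometric midpoint
        if matern32(distance, mid) < target:
            lo = mid  # correlation too small: length-scale must increase
        else:
            hi = mid
    return math.sqrt(lo * hi)


# Hypothetical average distance between points in some geometry
average_distance = 1.5
l_fixed = lengthscale_for_correlation(average_distance)
print(l_fixed, matern32(average_distance, l_fixed))
```

Because the kernel depends on distance only through the ratio of distance to length-scale, the solution scales linearly with the chosen distance.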
In cases where a prior distribution was set on the length-scale it was \\(l \\sim \\mathrm{Inv{\\text-}Gamma}(a, b)\\), with \\(a\\) and \\(b\\) chosen for each geometry such that 5% of the prior mass was below the 5% quantile for distance between points and 5% of the prior mass was above the 95% quantile (Betancourt 2017). The sensitivity analysis in Appendix A.2 illustrates the extent to which six possible lengthscale prior distributions (Figure A.9) affect the lengthscale posterior distribution (Figure A.10). 4.3.2.3 Inference Approximate Bayesian inference was conducted using adaptive Gauss-Hermite quadrature [AGHQ; Stringer, Brown, and Stafford (2022)] with \\(k = 3\\) quadrature points over a marginal Laplace approximation via the aghq package (Stringer 2021). Models were implemented using a Template Model Builder C++ template for the log-posterior via the TMB package (Kristensen et al. 2016). Appendix A.1 compares posterior mean and standard deviations from AGHQ to those obtained using the No-U-Turn Sampler (NUTS) Hamiltonian Monte Carlo (HMC) algorithm run using Stan (Carpenter et al. 2017) via the tmbstan package (Monnahan and Kristensen 2018). 4.3.3 Model assessment Let the parameter \\(\\phi\\) have posterior marginal \\(f(\\phi) = p(\\phi \\, | \\, \\mathbf{y})\\) with cumulative distribution function \\(F\\). Let \\(\\phi_s\\) be samples \\(s \\in [S]\\) from \\(f\\). Here, the number of samples per posterior marginal was \\(S = 200\\). Let \\(\\omega\\) be the true value of \\(\\phi\\) used in the simulation. The accuracy of latent field parameter and hyperparameter posterior marginals from each model were assessed using three methods. These were the mean squared error (MSE), the continuous ranked probability score [CRPS; Matheson and Winkler (1976)], and the probability integral transform (PIT; Dawid (1984)) values. 
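A minimal sketch of how these three measures might be computed from posterior samples is given here. It uses the standard sample-based estimators, with the PIT value taken as the empirical cumulative distribution function of the samples evaluated at the true value. The function names and the toy samples are illustrative assumptions, not code from this study.

```python
def mse(samples, truth):
    """Mean squared error of posterior samples about the true value."""
    return sum((x - truth) ** 2 for x in samples) / len(samples)


def crps(samples, truth):
    """Sample-based estimate of the continuous ranked probability score."""
    n = len(samples)
    # Average absolute error of the samples about the true value
    accuracy = sum(abs(x - truth) for x in samples) / n
    # Half the average absolute difference between pairs of samples
    spread = sum(abs(x - y) for x in samples for y in samples) / (2 * n * n)
    return accuracy - spread


def pit(samples, truth):
    """Proportion of samples at or below the true value."""
    return sum(x <= truth for x in samples) / len(samples)


# Toy example: four "posterior samples" and a true value of 1.5
toy_samples = [0.0, 1.0, 2.0, 3.0]
print(mse(toy_samples, 1.5), crps(toy_samples, 1.5), pit(toy_samples, 1.5))
```

The double sum in the CRPS estimate costs \\(\\mathcal{O}(S^2)\\) operations, which is inexpensive for the sample sizes used here.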
The MSE is a simple and popular measure, calculated using samples as \\[\\begin{equation} \\text{MSE}(f, \\omega) \\approx \\frac{1}{S} \\sum_{s = 1}^S (\\phi_s - \\omega)^2. \\end{equation}\\]

The CRPS is a strictly proper scoring rule (SPSR) which has favourable properties and is often regarded as a default choice (Gneiting and Raftery 2007). Scoring rules which are not proper can reward a misrepresentation of beliefs. The CRPS is \\[\\begin{equation} \\text{CRPS}(f, \\omega) = \\int_{-\\infty}^{\\infty} (F(\\phi) - \\mathbb{I} \\{\\phi \\geq \\omega \\} )^2 \\text{d}\\phi. \\tag{4.15} \\end{equation}\\] The CRPS may be estimated using samples by \\[\\begin{equation} \\text{CRPS}(f, \\omega) \\approx \\frac{1}{S} \\sum_{s = 1}^S | \\phi_s - \\omega | - \\frac{1}{2S^2} \\sum_{s = 1}^S \\sum_{l = 1}^S | \\phi_s - \\phi_l |. \\tag{4.16} \\end{equation}\\]

A posterior marginal is calibrated if over repeated simulations the quantile of the true value, known as the PIT value, is uniformly distributed such that \\[\\begin{equation} F(\\omega) \\approx \\frac{1}{S} \\sum_{s = 1}^S \\mathbb{I} \\{\\phi_s \\leq \\omega \\} = q \\sim \\mathcal{U}[0, 1]. \\tag{4.17} \\end{equation}\\] If Equation (4.17) holds then at any given nominal coverage \\(1 - \\alpha\\) the proportion of quantile-based credible intervals containing \\(\\omega\\) is also \\(1 - \\alpha\\). Uniformity was assessed using PIT histograms (Dawid 1984) and empirical cumulative distribution function (ECDF) difference plots (Aldor-Noiman et al. 2013) with simultaneous confidence bands as described in Säilynoja, Bürkner, and Vehtari (2022).

4.3.4 Results

4.3.4.1 Vignette geometries

As each geometry only had three areas, the sample size of 250 synthetic data was insufficient to distinguish between inferential models for the vignette geometries. Figures A.13, A.14, A.15 and A.16 show that almost all 95% credible intervals for the mean CRPS in estimating \\(\\rho_i\\) overlap.
Additionally, for the vignette geometries, both the heuristic method for fixing the lengthscale and the lengthscale prior distribution were misspecified. Three points were insufficient to learn the lengthscale, and as such misspecification of the prior distribution propagated to the posterior distribution (Figure A.11). To produce higher resolution and more meaningful results, the simulation study for the vignette geometries should be rerun with two changes: first, the sample size should be increased; second, the study should be specified more carefully with regard to the lengthscale.

4.3.4.2 Realistic geometries

Figure 4.7: The mean CRPS in estimating \\(\\rho_i\\) and its standard error for each inferential model and simulation model on the grid geometry (Panel 4.6E). The mean value averages over both areas and simulation runs.

Figure 4.8: The mean CRPS in estimating \\(\\rho_i\\) and its standard error for each inferential model and simulation model on the Côte d’Ivoire geometry (Panel 4.6F). The mean value averages over both areas and simulation runs.

Figure 4.9: The mean CRPS in estimating \\(\\rho_i\\) and its standard error for each inferential model and simulation model on the Texas geometry (Panel 4.6G). The mean value averages over both areas and simulation runs.

The two problems with the vignette geometry study did not apply to the more realistic geometries. Figures 4.7, 4.8 and 4.9 show mean CRPS values in estimating \\(\\rho_i\\) with 95% credible intervals which rarely overlap, and hence provide meaningful findings. Mean MSE and CRPS values are provided in Tables A.2 and A.3. The mean values are an average over both the number of areas in each geometry and the number of simulations run.

The mean CRPS varied substantially between the three models (Table 4.1) used to simulate synthetic data. IID structure is harder to predict than spatial structure, and to a lesser extent, Besag structure is harder to predict than IK.
This observation is explained by correlation structure making forecasting easier.

For IID synthetic data, the IID and BYM2 models performed well. The BYM2 model also performed almost as well as the Besag model on the spatially structured synthetic data. Appendix A.3.2 shows that the BYM2 proportion parameter successfully recovers either IID or spatial structure. Meanwhile, the IID model performed poorly on spatially structured synthetic data.

The performance of kernel models on IID and Besag synthetic data diminished with increasingly spatially irregular geometry. For the most part, differences between the centroid and integrated kernel models were small, even for synthetic data generated from the IK model. Only for the IK simulated data was there a significant difference between the kernel models with a fixed lengthscale and those with a prior distribution set on the lengthscale.

Interpretation of CRPS choropleths (Figures 4.7, 4.8 and 4.9) was challenging primarily due to two factors: varying scores by simulation model, and limited sample size at the area-level. It would be relatively simple to remedy these challenges, such that figures of this kind could help to uncover precise findings about spatial random effect models.

For IID synthetic data, spatial models tend to produce “U”-shaped ECDF difference plots (Figures A.28, A.29 and A.30). In other words, the quantile of the true value is too often near zero or one. This pattern corresponds to over-smoothing.

4.4 HIV prevalence study

Simulation studies are a valuable tool for experimenting on models in controlled environments. However, it is difficult to capture the complexity of a realistic applied scenario using simulation. Therefore, it is important to complement simulation studies with studies conducted on real data. To this end, model performance was compared in estimating district-level HIV prevalence \\(\\rho_i \\in [0, 1]\\) in adults aged 15-49.
Household survey data from four countries in sub-Saharan Africa were used (Table 4.3, Figure 4.10).

Table 4.3: The four PHIA household surveys included in the HIV prevalence study (Section 4.4).

| Country | Survey | Number of areas | Analysis level |
|---|---|---|---|
| Côte d’Ivoire | PHIA 2017 | 33 | Regions |
| Malawi | PHIA 2016 | 31 | Health districts and cities, with islands removed |
| Tanzania | PHIA 2017 | 26 | Regions, with islands removed |
| Zimbabwe | PHIA 2016 | 60 | Districts |

4.4.1 Household survey data

Figure 4.10: Adult (15-49) HIV prevalence from the most recent PHIA survey conducted in Côte d’Ivoire (Panel A), Malawi (Panel B), Tanzania (Panel C), and Zimbabwe (Panel D). These estimates are survey weighted according to Equation (4.18).

Data from the most recent publicly available Population-based HIV Impact Assessment (PHIA) survey were used in each country. Let \\(y_{ij} \\in \\{0, 1\\}\\) be the survey response for individual \\(j\\) in area \\(i\\). The survey designs used were complex in that each individual had potentially unequal probabilities \\(\\pi_{ij}\\) of being included in the survey. Sampling weights \\[\\begin{equation} w_{ij} = \\frac{1}{\\pi_{ij}} \\end{equation}\\] were used to account for the complex survey design. The survey weighted prevalence in area \\(i\\) is \\[\\begin{equation} \\rho_i^\\star = \\frac{\\sum_{j} w_{ij} y_{ij}}{\\sum_{j} w_{ij}}. \\tag{4.18} \\end{equation}\\] The effective number of cases \\(y_i^\\star = \\rho_i^\\star \\cdot m_i^\\star\\) is given by the product of the weighted prevalence and the Kish effective sample size (Kish 1965) \\[\\begin{equation} m_i^\\star = \\frac{(\\sum_j w_{ij})^2}{\\sum_j w_{ij}^2}, \\end{equation}\\] and may be intuitively thought of as what would have been observed had the survey been a simple random sample.

4.4.2 Inferential models

The inferential models used correspond to those in Section 4.3 with a small modification.
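The survey-weighted quantities of Section 4.4.1 which enter these models can be illustrated with a short sketch. It computes the weighted prevalence \\(\\rho_i^\\star\\), the Kish effective sample size \\(m_i^\\star\\) and the effective number of cases \\(y_i^\\star\\) for a single area. The function names and the toy weights and responses are hypothetical, chosen only to show the arithmetic.

```python
def weighted_prevalence(weights, responses):
    """Survey-weighted prevalence: sum(w * y) / sum(w)."""
    return sum(w * y for w, y in zip(weights, responses)) / sum(weights)


def kish_effective_sample_size(weights):
    """Kish effective sample size: (sum w)^2 / sum(w^2)."""
    return sum(weights) ** 2 / sum(w * w for w in weights)


def effective_cases(weights, responses):
    """Effective number of cases: weighted prevalence times Kish size."""
    return (weighted_prevalence(weights, responses)
            * kish_effective_sample_size(weights))


# Toy area with four respondents (hypothetical weights and 0/1 responses)
w = [1.0, 1.0, 2.0, 4.0]
y = [1, 0, 1, 0]
print(weighted_prevalence(w, y))        # 3 / 8
print(kish_effective_sample_size(w))    # 64 / 22
print(effective_cases(w, y))            # generally not an integer
```

With equal weights the Kish effective sample size reduces to the number of respondents, and unequal weights can only reduce it.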
As before, prevalences \\(\\rho_i\\) were modelled via \\(\\text{logit}(\\rho_i) = \\beta_0 + u_i\\) with spatial random effect specification varied according to Table 4.2. Due to survey weighting, the effective number of cases \\(y_i^\\star \\in \\mathbb{R}\\) and effective sample size \\(m_i^\\star \\in \\mathbb{R}\\) may not be integers. Following Chen, Wakefield, and Lumley (2014) a generalised binomial distribution \\(y_i^\\star \\sim \\text{xBin}(m_i^\\star, \\rho_i)\\) was used, with working likelihood for \\(m^\\star_i \\geq y^\\star_i\\) given by \\[\\begin{equation} p(y_i^\\star \\, | \\, m_i^\\star, \\rho_i) = \\frac{\\Gamma(m_i^\\star + 1)}{\\Gamma(y_i^\\star + 1) \\Gamma(m_i^\\star - y_i^\\star + 1)} \\rho_i ^{y_i^\\star} (1 - \\rho_i)^{(m_i^\\star - y_i^\\star)}. \\tag{4.19} \\end{equation}\\]

4.4.3 Model comparison

Figure 4.11: In leave-one-out (LOO) cross-validation, one observation is left out of the training data and predicted upon in each fold. The spatial-leave-one-out (SLOO) cross-validation scheme considered here is similar, only differing in that observations corresponding to adjacent areas are also left out of the training data.

Each model was assessed using two schemes (Figure 4.11): regular leave-one-out cross-validation (LOO-CV) and spatial leave-one-out cross-validation (SLOO-CV). At each fold the CRPS, MSE and quantile (as in Section 4.3.3) of posterior predictive samples as compared with the observed data were computed. In this section, the number of samples per posterior marginal was \\(S = 1000\\).

4.4.4 Results

Figure 4.12: The mean pointwise leave-one-out and spatial leave-one-out CRPS in estimating \\(\\rho_i\\) using each inferential model for the four PHIA surveys described in Table 4.3. The 95% credible intervals shown are generated using 1.96 times the standard error.
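The two cross-validation schemes of Section 4.4.3, and the interval construction described in the caption of Figure 4.12, can be sketched as follows. The adjacency structure, function names and toy scores are hypothetical, and taking the standard error as the sample standard deviation divided by the square root of the number of folds is an assumption made for this illustration.

```python
import math
import statistics


def held_out_areas(i, neighbours, spatial=False):
    """Areas removed from the training data when predicting area i.

    LOO removes only area i; SLOO also removes the areas adjacent to i.
    """
    held_out = {i}
    if spatial:
        held_out |= set(neighbours[i])
    return held_out


def mean_with_interval(scores, z=1.96):
    """Mean pointwise score with a mean +/- z * standard error interval."""
    mean = statistics.fmean(scores)
    se = statistics.stdev(scores) / math.sqrt(len(scores))
    return mean, mean - z * se, mean + z * se


# Hypothetical geometry: four areas arranged in a line
neighbours = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
print(held_out_areas(1, neighbours))                # LOO fold for area 1
print(held_out_areas(1, neighbours, spatial=True))  # SLOO fold for area 1

# Hypothetical pointwise CRPS values, one per fold
print(mean_with_interval([0.01, 0.02, 0.03]))
```

Under SLOO the training set for each fold is smaller than under LOO, and by construction contains no immediate neighbour of the held-out area.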
Table 4.4: The mean pointwise leave-one-out and spatial leave-one-out CRPS in estimating \\(\\rho_i\\) for each inferential model across the four considered PHIA surveys. The units used in this table are thousandths. For standard errors, see Figure 4.12.

Continuous ranked probability score (units: 1/1000)

| Scheme | PHIA survey | IID | Besag | BYM2 | FCK | CK | FIK | IK |
|---|---|---|---|---|---|---|---|---|
| LOO | Côte d’Ivoire, 2017 | 6.6 | 6.6 | 6.7 | 6.7 | 6.9 | 6.9 | 6.9 |
| LOO | Malawi, 2016 | 31.7 | 19.5 | 19.6 | 22.7 | 22.8 | 21.4 | 21.0 |
| LOO | Tanzania, 2017 | 14.9 | 12.1 | 13.4 | 10.7 | 9.5 | 10.3 | 10.6 |
| LOO | Zimbabwe, 2016 | 28.9 | 20.8 | 20.9 | 21.7 | 21.6 | 21.4 | 22.0 |
| SLOO | Côte d’Ivoire, 2017 | 6.5 | 6.6 | 6.6 | 6.4 | 6.9 | 6.4 | 6.8 |
| SLOO | Malawi, 2016 | 31.6 | 19.3 | 19.9 | 26.5 | 29.0 | 25.1 | 28.3 |
| SLOO | Tanzania, 2017 | 14.9 | 12.1 | 18.1 | 16.0 | 17.6 | 15.4 | 16.9 |
| SLOO | Zimbabwe, 2016 | 29.1 | 20.8 | 25.2 | 26.7 | 26.2 | 26.1 | 26.3 |

The results (Figure 4.12, Table 4.4, Table A.4) for each survey were as follows:

For the 2017 PHIA survey in Côte d’Ivoire, all of the models performed similarly, using both LOO- and SLOO-CV (Figure A.37). The pointwise CRPS for all models was high at one outlying district in the survey, Grand-Ponts. It is difficult to see how any spatial random effect model would perform well in this situation, without additional covariates or using a distribution with heavier tails than the Gaussian. The CK and IK models had lengthscale posterior distributions largely unchanged from their prior distribution (Figure A.31). This uncertainty in lengthscale resulted in wide 95% credible intervals for prevalence for the CK and IK models in Figure A.33.
This example shows the importance of care when using kernel models, and when setting prior distributions on their hyperparameters. It is surprising that this behaviour appears not to have resulted in poor LOO or SLOO performance. For this survey the BYM2 proportion posterior distribution was also similar to its prior distribution, in contrast to each of the other surveys which had BYM2 proportion posteriors peaked at one, corresponding to spatially structured noise (Figure A.32).

For the 2016 PHIA survey in Malawi the Besag and BYM2 models performed the best, followed by the kernel models, and then the IID model (Figure A.38). While the LOO and SLOO CRPS values for IID, Besag and BYM2 models were similar, for the kernel models forecasting performance was substantially reduced by leaving out adjacent districts. This finding is surprising, as the kernel models make use of more distant correlations, and it is the adjacency-based models that one would intuitively expect to be hampered more by the SLOO-CV. For the IID model, that LOO and SLOO performance are similar is no surprise as in all cases the IID model should be predicting the mean. Though less data are available in the SLOO case, this should be of little consequence.

For the 2017 PHIA survey in Tanzania (Figure A.39), under LOO-CV the kernel models performed better, but under SLOO-CV there was a significant drop in performance.

Finally, for the 2016 PHIA survey in Zimbabwe, performance for each of the spatially structured models was similar (Figure A.40). Again, under SLOO-CV, performance of the BYM2 and kernel-based models dropped. Differences within the kernel-based models for this survey, and indeed across all four surveys, were limited.

4.5 Discussion

4.5.1 Modelling

Though there are situations where other models perform better, on the whole this study supports the use of adjacency-based spatial random effect models.
For the study on HIV survey data, adjacency-based models performed well, if not the best, in all cases. That is not to say that there is no significant benefit to using the corresponding kernel model for inference when data are truly generated from a kernel model. However, the transferability of this finding to applied settings is limited by the following factors. First and foremost, it is usually impossible to know that real data were generated from any particular process. Second, the synthetic data study used the same kernel, Matérn with \\(\\nu = 3/2\\) (Equation (4.14)), for both simulation and inference, and as such represents a best-case scenario. Third, specification of the lengthscale prior distribution is challenging, and easy to do badly. Finally, aggregation via the integrated kernel occurred at the level of the latent field, despite the fact that most of the time we expect aggregation to occur at the level of the data. If the link function \\(g\\) is the identity or linear then the two are equivalent, but non-linear link functions create a discrepancy, which this study did not address.

This chapter did not consider use of the stochastic partial differential equation (SPDE) approximation of Lindgren, Rue, and Lindström (2011) as a potentially more computationally efficient way to implement integrated kernel models (Wilson and Wakefield 2018). As the underlying models are ultimately similar, both being a continuous Matérn random field over space aggregated at an area-level, the findings from this work are likely to apply to use of the SPDE approximation. Nonetheless, it would be of value to confirm this empirically.

This chapter used area-level models for data which arise by aggregation of point-level data. However, Konstantinoudis et al. (2020) found that using a point-level LGCP model rather than an area-level BYM model may have significant benefits.
The work in this chapter does not address the broader question of under which circumstances use of an area or point-level model is sensible. The adjacency-based models considered in this study were limited to the Besag and BYM2 model. Although these are perhaps the most widely used adjacency-based models, others could have been considered. Examples include the more general weighted ICAR model discussed in Section 4.1.4. Additionally, it would be of interest to implement the integrated kernel model with population-based weighting (Section 4.2.2.1). The models used for spatial structure in this chapter were all stationary. Although stationarity assumptions may be violated by HIV survey data, it remains challenging to estimate non-stationary spatial structure (Christopher J. Paciorek and Schervish 2006). 4.5.2 Model comparison Previous spatial random effect comparison studies (Nicky Best, Richardson, and Thomson 2005; Lee 2011) were limited to the DIC measure of model performance. Use of the DIC is strongly discouraged by Vehtari, Gelman, and Gabry (2017). This study used less flawed measures of model performance, such as the cross-validated CRPS. It would be beneficial to compute the DIC and WAIC in Section 4.4 as a comparison. Additionally, the measures used in this study were computed and presented by individual area. With refinements to the sample sizes used, these area-specific measures of performance could enable more nuanced conclusions about the use of spatial random effect models. Cross-validation was performed using \\(\\rho\\) as the forecasting target, rather than \\(y\\) as is typical. This decision was made because applied interest is in forecasting HIV prevalence at a district level, not forecasting the outcome of a household survey. It could be argued that a district does not become more important to forecast well by virtue of surveying a larger sample size in that district. 
That said, an alternative viewpoint is that forecast accuracy should be incentivised in proportion to district population size, such that PLHIV is accurately estimated. If sample size is proportional to population size, then forecasting \\(y\\) could be a useful proxy. Choice of the particular parameter, or transformation of that parameter (Nikos I. Bosse et al. 2023), to score is an ongoing topic of research. The CRPS was used in preference to the log-score. Whereas the log-score requires a kernel density estimate of the posterior distribution, and is therefore sensitive to tuning parameters, the CRPS can be estimated from samples alone. A downside of use of the CRPS and MSE is their relative lack of interpretability. For example, it is difficult to determine whether a forecast is good, or suitable for practical use, on the basis of its CRPS or MSE. Measures such as the skill score have been used to contrast forecast performance with some baseline. A constant model, with no random effects, could be used as such a baseline. 4.5.3 Inference A strength of this work is that all of the inferential models (Table 4.2) in this chapter were implemented in TMB. Inference was then conducted using AGHQ over the marginal Laplace approximation using the aghq package. The accuracy of inferences was compared to gold-standard results from NUTS obtained using the tmbstan package. An earlier version of this study used R-INLA. Not all of the inferential models were compatible with R-INLA, so rstan was used in some cases. However, due to the difference in inference algorithm, this study design conflated statistical models with inference algorithms. Consistent use of TMB, a fast and flexible tool for spatial modelling (Osgood-Zimmerman and Wakefield 2023), overcame this limitation. Chapter 6 extends TMB to implement the INLA algorithm of R-INLA. References Aldor-Noiman, Sivan, Lawrence D Brown, Andreas Buja, Wolfgang Rolke, and Robert A Stine. 2013. 
“The power to see: A new graphical test of normality.” The American Statistician 67 (4): 249–60. Arambepola, Rohan, Tim CD Lucas, Anita K Nandi, Peter W Gething, and Ewan Cameron. 2022. “A simulation study of disaggregation regression for spatial disease mapping.” Statistics in Medicine 41 (1): 1–16. Bennett, James E, Helen Tamura-Wicks, Robbie M Parks, Richard T Burnett, C Arden Pope III, Matthew J Bechle, Julian D Marshall, Goodarz Danaei, and Majid Ezzati. 2019. “Particulate matter air pollution and national and county life expectancy loss in the USA: A spatiotemporal analysis.” PLOS Medicine 16 (7): e1002856. Besag, Julian, Jeremy York, and Annie Mollié. 1991. “Bayesian image restoration, with two applications in spatial statistics.” Annals of the Institute of Statistical Mathematics 43 (1): 1–20. Best, N, N Arnold, A Thomas, L Waller, and E Conlon. 1999. “Bayesian models for spatially correlated disease and exposure data.” In Bayesian Statistics 6: Proceedings of the Sixth Valencia International Meeting, 6:131. Oxford University Press. Best, Nicky, Sylvia Richardson, and Andrew Thomson. 2005. “A comparison of Bayesian spatial models for disease mapping.” Statistical Methods in Medical Research 14 (1): 35–59. Betancourt, Michael. 2017. “Robust Gaussian processes in Stan.” https://betanalpha.github.io/assets/case\\%5Fstudies/gp\\%5Fpart3/part3.html. Bhatt, Samir, DJ Weiss, E Cameron, D Bisanzio, B Mappin, U Dalrymple, KE Battle, et al. 2015. “The effect of malaria control on Plasmodium falciparum in Africa between 2000 and 2015.” Nature 526 (7572): 207–11. Blangiardo, Marta, Michela Cameletti, Gianluca Baio, and Håvard Rue. 2013. “Spatial and spatio-temporal models with R-INLA.” Spatial and Spatio-Temporal Epidemiology 4: 33–49. Bosse, Nikos I, Sam Abbott, Anne Cori, Edwin van Leeuwen, Johannes Bracher, and Sebastian Funk. 2023. “Scoring epidemiological forecasts on transformed scales.” PLOS Computational Biology 19 (8): e1011393. 
Carpenter, Bob, Andrew Gelman, Matthew D Hoffman, Daniel Lee, Ben Goodrich, Michael Betancourt, Marcus Brubaker, Jiqiang Guo, Peter Li, and Allen Riddell. 2017. “Stan: A probabilistic programming language.” Journal of Statistical Software 76 (1). Chau, Siu Lun, Shahine Bouabid, and Dino Sejdinovic. 2021. “Deconditional downscaling with Gaussian processes.” Advances in Neural Information Processing Systems 34: 17813–25. Chen, Cici, Jon Wakefield, and Thomas Lumely. 2014. “The use of sampling weights in Bayesian hierarchical models for small area estimation.” Spatial and Spatio-Temporal Epidemiology 11: 33–43. Cramb, SM, EW Duncan, PD Baade, and KL Mengersen. 2018. “Investigation of Bayesian spatial models.” Cancer Council Queensland; Queensland University of Technology (QUT). Dawid, A Philip. 1984. “Present position and potential developments: Some personal views statistical theory the prequential approach.” Journal of the Royal Statistical Society: Series A (General) 147 (2): 278–90. de Valpine, Perry, Christopher Paciorek, Daniel Turek, Nick Michaud, Cliff Anderson-Bergman, Fritz Obermeyer, Claudia Wehrhahn Cortes, Abel Rodrìguez, Duncan Temple Lang, and Sally Paganin. 2023. NIMBLE User Manual (version 1.0.1). https://doi.org/10.5281/zenodo.1211190. Dean, CB, MD Ugarte, and AF Militino. 2001. “Detecting interaction between random region and fixed age effects in disease mapping.” Biometrics 57 (1): 197–202. Diggle, Peter J, Paula Moraga, Barry Rowlingson, Benjamin M Taylor, et al. 2013. “Spatial and spatio-temporal log-Gaussian Cox processes: extending the geostatistical paradigm.” Statistical Science 28 (4): 542–63. Donegan, Connor. 2022. “geostan: An R package for Bayesian spatial analysis.” The Journal of Open Source Software 7 (79): 4716. https://doi.org/10.21105/joss.04716. Duncan, Earl W, Nicole M White, and Kerrie Mengersen. 2017. 
“Spatial smoothing in Bayesian models: a comparison of weights matrix specifications and their impact on inference.” International Journal of Health Geographics 16 (1): 1–16. Dwyer-Lindgren, Laura, Michael A Cork, Amber Sligar, Krista M Steuben, Kate F Wilson, Naomi R Provost, Benjamin K Mayala, et al. 2019. “Mapping HIV prevalence in sub-Saharan Africa between 2000 and 2017.” Nature 570 (7760): 189–93. Dwyer-Lindgren, Laura, Abraham D Flaxman, Marie Ng, Gillian M Hansen, Christopher JL Murray, and Ali H Mokdad. 2015. “Drinking patterns in US counties from 2002 to 2012.” American Journal of Public Health 105 (6): 1120–27. Flaxman, Seth R, Yu-Xiang Wang, and Alexander J Smola. 2015. “Who supported Obama in 2012? Ecological inference through distribution regression.” In Proceedings of the 21th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 289–98. Follestad, Turid, and Håvard Rue. 2003. “Modelling spatial variation in disease risk using Gaussian Markov random field proxies for Gaussian random fields.” Freni-Sterrantino, Anna, Massimo Ventrucci, and Håvard Rue. 2018. “A note on intrinsic conditional autoregressive models for disconnected graphs.” Spatial and Spatio-Temporal Epidemiology 26: 25–34. Gärtner, Thomas, Peter A Flach, Adam Kowalczyk, and Alexander J Smola. 2002. “Multi-instance kernels.” In ICML, 2:7. 3. Gelman, Andrew. 2006. “Prior distributions for variance parameters in hierarchical models (comment on article by Browne and Draper).” Bayesian Analysis 1 (3): 515–34. Gneiting, Tilmann, and Adrian E Raftery. 2007. “Strictly proper scoring rules, prediction, and estimation.” Journal of the American Statistical Association 102 (477): 359–78. Gössl, Christoff, Dorothee P Auer, and Ludwig Fahrmeir. 2001. “Bayesian spatiotemporal inference in functional magnetic resonance imaging.” Biometrics 57 (2): 554–62. Haining, Robert P. 2003. Spatial data analysis: theory and practice. Cambridge University Press.
Hamelijnck, O, T Damoulas, K Wang, and MA Girolami. 2019. “Multi-resolution multi-task Gaussian processes.” Advances in Neural Information Processing Systems 32. Howes, Adam. 2023a. arealutils: Utility functions for beyond-borders. Johnson, Olatunji, Peter Diggle, and Emanuele Giorgi. 2019. “A spatially discrete approximation to log-Gaussian Cox processes for modelling aggregated disease count data.” Statistics in Medicine 38 (24): 4871–87. Kelsall, Julia, and Jonathan Wakefield. 2002. “Modeling spatial variation in disease risk: a geostatistical approach.” Journal of the American Statistical Association 97 (459): 692–701. Kish, Leslie. 1965. Survey sampling. 04; HN29, K5. Konstantinoudis, Garyfallos, Dominic Schuhmacher, Håvard Rue, and Ben D Spycher. 2020. “Discrete versus continuous domain models for disease mapping.” Spatial and Spatio-Temporal Epidemiology 32: 100319. Kristensen, Kasper, Anders Nielsen, Casper W Berg, Hans Skaug, Bradley M Bell, et al. 2016. “TMB: Automatic Differentiation and Laplace Approximation.” Journal of Statistical Software 70 (i05). Law, Ho Chung, Dino Sejdinovic, Ewan Cameron, Tim Lucas, Seth Flaxman, Katherine Battle, and Kenji Fukumizu. 2018. “Variational learning on aggregate outputs with Gaussian processes.” Advances in Neural Information Processing Systems 31. Lee, Duncan. 2011. “A comparison of conditional autoregressive models used in Bayesian disease mapping.” Spatial and Spatio-Temporal Epidemiology 2 (2): 79–89. Leroux, Brian G, Xingye Lei, and Norman Breslow. 2000. “Estimation of disease rates in small areas: a new mixed model for spatial dependence.” In Statistical Models in Epidemiology, the Environment, and Clinical Trials, 179–91. Springer. Li, Ye, Patrick Brown, Dionne C Gesink, and Håvard Rue. 2012. “Log Gaussian Cox processes and spatially aggregated disease incidence data.” Statistical Methods in Medical Research 21 (5): 479–507. https://doi.org/10.1177/0962280212446326. 
Lindgren, Finn, Håvard Rue, and Johan Lindström. 2011. “An explicit link between Gaussian fields and Gaussian Markov random fields: the stochastic partial differential equation approach.” Journal of the Royal Statistical Society Series B: Statistical Methodology 73 (4): 423–98. Matheson, James E, and Robert L Winkler. 1976. “Scoring rules for continuous probability distributions.” Management Science 22 (10): 1087–96. Mayala, Benjamin K., Samir Bhatt, and Peter Gething. 2020. “Predicting HIV/AIDS at Subnational Levels using DHS Covariates related to HIV.” DHS Spatial Analysis Reports 18. Rockville, Maryland, USA: ICF. Monnahan, Cole C, and Kasper Kristensen. 2018. “No-U-turn sampling for fast Bayesian inference in ADMB and TMB: Introducing the adnuts and tmbstan R packages.” PLOS One 13 (5): e0197954. Morris, Mitzi, Katherine Wheeler-Martin, Dan Simpson, Stephen J. Mooney, Andrew Gelman, and Charles DiMaggio. 2019. “Bayesian hierarchical spatial models: Implementing the Besag York Mollié model in stan.” Spatial and Spatio-Temporal Epidemiology 31: 100301. https://doi.org/https://doi.org/10.1016/j.sste.2019.100301. Nandi, Anita K, Tim CD Lucas, Rohan Arambepola, Peter Gething, and Daniel J Weiss. 2023. “disaggregation: An R Package for Bayesian Spatial Disaggregation Modeling.” Journal of Statistical Software 106: 1–19. Openshaw, S, and P. J. Taylor. 1979. “A million or so correlation coefficients, three experiments on the modifiable areal unit problem.” Statistical Applications in the Spatial Science, 127–44. Osgood-Zimmerman, Aaron, and Jon Wakefield. 2023. “A Statistical Review of Template Model Builder: A Flexible Tool for Spatial Modelling.” International Statistical Review 91 (2): 318–42. Paciorek, Christopher J et al. 2013. “Spatial models for point and areal data using Markov random fields on a fine grid.” Electronic Journal of Statistics 7: 946–72. Paciorek, Christopher J., and Mark J. Schervish. 2006. 
“Spatial modelling using a new class of nonstationary covariance functions.” Environmetrics 17 (5): 483–506. https://doi.org/https://doi.org/10.1002/env.785. Parks, Robbie M, James E Bennett, Helen Tamura-Wicks, Vasilis Kontis, Ralf Toumi, Goodarz Danaei, and Majid Ezzati. 2020. “Anomalously warm temperatures are associated with increased injury deaths.” Nature Medicine 26 (1): 65–70. Pebesma, Edzer. 2018. “Simple Features for R: Standardized Support for Spatial Vector Data.” The R Journal 10 (1): 439–46. https://doi.org/10.32614/RJ-2018-009. Rashid, T, JE Bennett, D Muller, A Cross, J Pearson-Stuttard, H Daby, D Fecht, B Davies, and M Ezzati. 2023. “Inequalities in mortality from leading cancers in districts of England from 2002 to 2019: population-based high-resolution spatiotemporal analysis of vital registration data.” The Lancet Oncology. http://hdl.handle.net/10044/1/107364. Riebler, Andrea, Sigrunn H Sørbye, Daniel Simpson, and Håvard Rue. 2016. “An intuitive Bayesian spatial model for disease mapping that accounts for scaling.” Statistical Methods in Medical Research 25 (4): 1145–65. ———. 2020. “Comment on R-INLA Discussion Group thread.” Rue, Havard, and Leonhard Held. 2005. Gaussian Markov random fields: theory and applications. CRC press. Säilynoja, Teemu, Paul-Christian Bürkner, and Aki Vehtari. 2022. “Graphical test for discrete uniformity and its applications in goodness-of-fit evaluation and multiple sample comparison.” Statistics and Computing 32 (2): 32. Saracco, James F, J Andrew Royle, David F DeSante, and Beth Gardner. 2010. “Modeling spatial variation in avian survival and residency probabilities.” Ecology 91 (7): 1885–91. Saunders, Daniel. 2023. “The Besag-York-Mollie Model for Spatial Data.” In PyMC Examples, edited by PyMC Team. https://doi.org/10.5281/zenodo.5654871. Schmid, Volker J, Brandon Whitcher, Anwar R Padhani, N Jane Taylor, and Guang-Zhong Yang. 2006. 
“Bayesian methods for pharmacokinetic models in dynamic contrast-enhanced magnetic resonance imaging.” IEEE Transactions on Medical Imaging 25 (12): 1627–36. Simpson, Daniel, Håvard Rue, Andrea Riebler, Thiago G Martins, Sigrunn H Sørbye, et al. 2017. “Penalising model component complexity: A principled, practical approach to constructing priors.” Statistical Science 32 (1): 1–28. Sørbye, Sigrunn Holbek, and Håvard Rue. 2014. “Scaling intrinsic Gaussian Markov random field priors in spatial modelling.” Spatial Statistics 8: 39–51. Stein, Michael L. 1999. “Interpolation of spatial data: some theory for kriging.” Stringer, Alex. 2021. “Implementing Approximate Bayesian Inference using Adaptive Quadrature: the aghq Package.” arXiv Preprint arXiv:2101.04468. Stringer, Alex, Patrick Brown, and Jamie Stafford. 2022. “Fast, scalable approximations to posterior distributions in extended latent Gaussian models.” Journal of Computational and Graphical Statistics, 1–15. Tanaka, Yusuke, Toshiyuki Tanaka, Tomoharu Iwata, Takeshi Kurashima, Maya Okawa, Yasunori Akagi, and Hiroyuki Toda. 2019. “Spatially aggregated Gaussian processes with multivariate areal outputs.” In Advances in Neural Information Processing Systems, 3005–15. Teh, Yee Whye, Bryn Elesedy, Bobby He, Michael Hutchinson, Sheheryar Zaidi, Avishkar Bhoopchand, Ulrich Paquet, Nenad Tomasev, Jonathan Read, and Peter J. Diggle. 2022. “Efficient Bayesian inference of Instantaneous Reproduction Numbers at Fine Spatial Scales, with an Application to Mapping and Nowcasting the Covid-19 Epidemic in British Local Authorities.” Journal of the Royal Statistical Society Series A: Statistics in Society 185 (1): S65–85. https://doi.org/10.1111/rssa.12971. Utazi, C Edson, Julia Thorley, VA Alegana, MJ Ferrari, Kristine Nilsen, Saki Takahashi, CJE Metcalf, Justin Lessler, and AJ Tatem. 2019. 
“A spatial regression model for the disaggregation of areal unit based data to high-resolution grids with application to vaccination coverage mapping.” Statistical Methods in Medical Research 28 (10-11): 3226–41. Vehtari, Aki, Andrew Gelman, and Jonah Gabry. 2017. “Practical Bayesian Model Evaluation Using Leave-One-Out Cross-Validation and WAIC.” Statistics and Computing 27: 1413–32. Wakefield, J, and S Morris. 1999. “Spatial dependence and errors-in-variables in environmental epidemiology.” Bayesian Statistics 6: 657–84. Wakefield, Jonathan, and Hilary Lyons. 2010. “Spatial Aggregation and the Ecological Fallacy.” In Chapman & Hall/CRC Handbooks of Modern Statistical Methods, 2010:541–58. https://doi.org/10.1201/9781420072884-c30. Weiss, Daniel J, Bonnie Mappin, Ursula Dalrymple, Samir Bhatt, Ewan Cameron, Simon I Hay, and Peter W Gething. 2015. “Re-examining environmental correlates of Plasmodium falciparum malaria endemicity: a data-intensive variable selection approach.” Malaria Journal 14 (1): 1–18. Wilson, Katie, and Jon Wakefield. 2018. “Pointless spatial modeling.” Biostatistics 21 (2): e17–32. https://doi.org/10.1093/biostatistics/kxy041. Yousefi, Fariba, Michael T Smith, and Mauricio Alvarez. 2019. “Multi-task learning for aggregated data using Gaussian processes.” Advances in Neural Information Processing Systems 32. "],["multi-agyw.html", "5 A model for risk group proportions 5.1 Background 5.2 Data 5.3 Model for risk group proportions 5.4 Prevalence and incidence by risk group 5.5 Results 5.6 Discussion", " 5 A model for risk group proportions This chapter describes an application of Bayesian spatio-temporal statistics to small-area estimation of HIV risk group proportions. This work was conducted in collaboration with colleagues from the MRC Centre for Global Infectious Disease Analysis and UNAIDS. I developed the statistical model, building upon an earlier version of the analysis conducted by Dr. Kathryn Risher. 
The model and results for 13 countries are presented in Howes et al. (2023). Outputs are implemented in a spreadsheet tool (https://hivtools.unaids.org/shipp/) for use in national HIV response planning. The tool is being updated by inclusion of more countries to the analysis, and extension of the methodology, including to additional risk groups. Code for the analysis in this chapter is available from https://github.com/athowes/multi-agyw and supported by the multi.utils R package (Howes 2023b). 5.1 Background Figure 5.1: Risk of acquiring HIV depends on both individual-level risk behaviour and population-level HIV incidence. It is assumed here that with no individual-level risk behaviour, there is no risk of acquiring HIV, independent of the population-level HIV incidence. The risk scale is intended to be illustrative, rather than interpreted quantitatively. In SSA, adolescent girls and young women (AGYW) aged 15-29 are at increased risk of HIV infection. AGYW account for only 28% of the population, but comprise 44% of new infections (UNAIDS 2021a). HIV incidence for AGYW is 2.4 times higher than for similarly aged (15-29) males. The social and biological reasons for this disparity include structural vulnerabilities and power imbalances, age patterns of sexual mixing, a younger age at first sex, and increased susceptibility to HIV infection. On this basis, AGYW have been identified as a priority population for HIV prevention services. Significant investments, including by the Global Fund (The Global Fund 2018) and the DREAMS (Determined, Resilient, Empowered, AIDS-free, Mentored, and Safe) partnership (Saul et al. 2018), have been made to support prevention programming. The Global AIDS Strategy 2021-2026 (UNAIDS 2021b) was adopted by the United Nations (UN) General Assembly in June 2021, and “outlines the strategic priorities and actions to be implemented by global, regional, country and community partners to get on-track to ending AIDS”. 
It proposed stratifying HIV prevention packages to AGYW based on two factors: local population-level HIV incidence, and individual-level sexual risk behaviour. Risk of acquiring HIV depends importantly on both factors. As such, prioritisation of prevention services is more efficient if both are taken into account. Figure 5.1 illustrates this fact stylistically. The strategy encourages programmes to define targets for the proportion of AGYW to be reached with a range of interventions (Table B.2) based on prioritisation strata which incorporate behavioural risk (Table B.1). Implementation of the strategy by national HIV programmes and stakeholders requires estimates of the population size and HIV incidence in each risk group by location. In this chapter, I used a Bayesian spatio-temporal model (Section 5.3) of behavioural data from household surveys (Section 5.2) to estimate HIV risk group proportions. To then estimate risk group specific HIV prevalence and HIV incidences (Section 5.4), I combined the proportion estimates with population size, HIV prevalence and HIV incidence estimates, as well as risk group specific HIV incidence rate ratios, and HIV prevalence rate ratios. Finally, by ordering district, age, risk group strata by HIV incidence, I estimated an upper bound for the number of new HIV infections that could be averted under different risk prioritisation strategies (Section 5.4.3). 5.2 Data 5.2.1 Behavioural data from household surveys I used household survey data from 13 countries identified by the Global Fund (The Global Fund 2018) as priority countries for implementation of AGYW HIV prevention. These countries were Botswana, Cameroon, Kenya, Lesotho, Malawi, Mozambique, Namibia, South Africa, Eswatini, Tanzania, Uganda, Zambia and Zimbabwe. 
Surveys conducted in these countries between 1999 and 2018 were included where women were interviewed about their sexual behaviour and sufficient geographic data were available to locate survey clusters to health districts. There were 46 suitable surveys (Figure 5.2), with a total sample size of 274,970 women aged 15-29 years. Of the respondents, 103,063 were aged 15-19 years, 92,173 were aged 20-24 years, and 79,734 were aged 25-29 years. The median number of surveys per country was four, ranging from one in both Botswana and South Africa to six in Uganda. Figure 5.2: Surveys conducted 1999-2018 that were used in the analysis by year, survey type, sample size, and whether the survey included a specific question about transactional sex. That is, whether the respondent had “had sex in return for gifts, cash or anything else in the past 12 months”. Survey type included AIDS Indicator Surveys (AIS), Demographic and Health Surveys (DHS), the Botswana AIDS Impact Survey 2013 (BAIS), and Population-based HIV Impact Assessment (PHIA) surveys. For each survey, respondents were classified into one of four behavioural risk groups according to reported sexual risk behaviour in the past 12 months (Figure 5.3), which I index by \\(k\\). In increasing order of HIV acquisition risk, these risk groups were:

- \\(k = 1\\): Not sexually active,
- \\(k = 2\\): One cohabiting sexual partner,
- \\(k = 3\\): Non-regular or multiple sexual partner(s), and
- \\(k = 4\\): Reporting transactional sex.

Table 5.1: HIV risk groups and HIV incidence rate ratios relative to AGYW with one cohabiting sexual partner. The incidence rate ratio for women with non-regular or multiple sexual partner(s) was derived from analysis of longitudinal data by Slaymaker et al. (2020).
Among female sex workers (FSW), the incidence rate ratio (25.0, 13.0, 9.0, 6.0, 3.0) depended on the level of HIV incidence among the general population (<0.1%, 0.1-0.3%, 0.3-1.0%, 1.0-3.0%, >3.0%), such that higher local HIV incidence in the general population corresponded to a lower incidence rate ratio for FSW. Estimates of HIV incidence rate ratios for FSW were derived by UNAIDS based on patterns of relative HIV prevalence among FSW compared to general population prevalence.

| Risk group | Description | Incidence rate ratio |
|---|---|---|
| None | Not sexually active | 0.0 |
| Low | One cohabiting sexual partner | 1.0 (baseline) |
| High | Non-regular or multiple partner(s) | 1.72 |
| Very High | Reporting transactional sex (later adjusted to correspond to FSW) | 3.0-25.0 (varied depending on local HIV incidence) |

The HIV incidence rate ratio \\(\\text{RR}_k\\), used to calculate HIV incidence, was assumed to vary by risk group (Table 5.1). The one cohabiting partner risk group was set as baseline such that \\(\\text{RR}_2 = 1\\). For the \\(k = 4\\) risk group, the HIV incidence rate ratio was further assumed to vary by local HIV incidence among the general population. Exact survey questions varied slightly across survey types and between survey phases. Questions captured information about whether the respondent had been sexually active in the past twelve months, and if so with how many partners. For their three most recent partners, respondents were also asked about the type of partnership. Possible partnership types included spouse, cohabiting partner, partner not cohabiting with respondent, friend, sex worker, sex work client, and other. The survey questions used are in Appendix B.4. In the case of inconsistent responses, women were categorised according to the highest risk group they fell into, ensuring that the categories were mutually exclusive. Some surveys included a specific question asking if the respondent had received or given money or gifts for sex in the past twelve months.
In these surveys, 2.64% of women reported transactional sex. In surveys without such a question, women almost never (0.01%) answered that one of their three most recent partners was a sex work client. This incomparability made it inappropriate to include surveys without a specific transactional sex question when estimating the proportion of the population who engaged in transactional sex. Of the total 46 surveys included in the analysis, 12 had a specific transactional sex question, with a total sample size of 62,853 (28,753 aged 15-19 years, 26,324 aged 20-24 years, and 7,776 aged 25-29 years). The sample size for women aged 25-29 is smaller because there were 6 DHS surveys which excluded women 25-29 from the transactional sex survey question. Table B.3 gives the sample size by age group for every survey included in the analysis. Figure 5.3: Flowchart describing how respondents were classified to HIV risk groups based on their survey responses. 5.2.2 Other data In addition to the household survey behavioural data, I used estimates of population, PLHIV and new HIV infections stratified according to district and age group from HIV estimates published by UNAIDS that were developed using the Naomi model (Eaton et al. 2021). I used the most recent 2022 estimates for all countries, apart from Mozambique where, due to data accuracy concerns, I used the 2021 estimates (in which the Cabo Delgado province is excluded due to disruption by conflict). I used administrative area hierarchy and geographic boundaries corresponding to those used for health service planning by countries (Table B.5). Exceptions were Cameroon and Kenya, where I conducted analyses one level higher at the department and county levels, respectively. 5.3 Model for risk group proportions Owing to the incomparability in estimating the \\(k = 4\\) risk group across surveys, I took a two-stage modelling approach to estimate the four risk group proportions. 
Denote being in either the third or fourth risk group as \\(k = 3^{+}\\). First, using all the surveys, I used a spatio-temporal multinomial logistic regression model to estimate the proportion of AGYW in the risk groups \\(k \\in \\{1, 2, 3^{+}\\}\\). This model is described in Section 5.3.1. Then, using only those surveys with a specific transactional sex question, I fit a spatial logistic regression model to estimate the proportion of those in the \\(k = 3^{+}\\) risk group that were in the \\(k = 3\\) and \\(k = 4\\) risk groups respectively. This model is described in Section 5.3.2. 5.3.1 Spatio-temporal multinomial logistic regression Let \\(i \\in \\{1, \\ldots, n\\}\\) denote districts partitioning the 13 studied AGYW priority countries \\(c[i] \\in \\{1, \\ldots, 13\\}\\). Consider the years 1999-2018 denoted as \\(t \\in \\{1, \\ldots, T\\}\\), and age groups \\(a \\in \\{\\text{15-19}, \\text{20-24}, \\text{25-29}\\}\\). Let \\(p_{itak} > 0\\) with \\(\\sum_{k = 1}^{3^{+}} p_{itak} = 1\\), be the probabilities of membership of risk group \\(k\\). 5.3.1.1 Multinomial logistic regression A standard multinomial logistic regression model (e.g. Gelman et al. 2013) is specified by \\[\\begin{align} \\mathbf{y}_{ita} &= (y_{ita1}, \\ldots, y_{ita3^{+}})^\\top \\sim \\text{Multinomial}(m_{ita}; \\, p_{ita1}, \\ldots, p_{ita3^{+}}), \\tag{5.1} \\\\ \\log \\left( \\frac{p_{itak}}{p_{ita1}} \\right) &= \\eta_{itak}, \\quad k = 2, 3^{+}, \\tag{5.2} \\end{align}\\] where the number in risk group \\(k\\) is \\(y_{itak}\\), the fixed sample size is \\(m_{ita} = \\sum_{k = 1}^{3^{+}} y_{itak}\\), and \\(k = 1\\) is chosen as the baseline category. This model is not a latent Gaussian model [LGM; Håvard Rue, Martino, and Chopin (2009)] because each observation \\(y_{itak}\\) for \\(k \\in \\{1, 2, 3^{+}\\}\\) depends non-linearly on multiple structured additive predictors \\(\\{\\eta_{itak}, k = 1, 2, 3^{+}\\}\\). 
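The baseline-category parameterisation in Equations (5.1) and (5.2) can be illustrated with a minimal base R sketch. The linear predictor values and sample size below are hypothetical and chosen only for illustration; they are not estimates from this analysis.

```r
# Hypothetical linear predictors for the non-baseline categories k = 2 and k = 3+
eta <- c(k2 = 0.4, k3plus = -1.1)

# The baseline category k = 1 has log-odds zero relative to itself
log_odds <- c(k1 = 0, eta)

# Invert Equation (5.2): p_k = exp(eta_k) / sum_j exp(eta_j), with eta_1 = 0
p <- exp(log_odds) / sum(exp(log_odds))

# One multinomial draw with fixed sample size m, as in Equation (5.1)
set.seed(1)
m <- 50  # hypothetical sample size
y <- as.vector(rmultinom(n = 1, size = m, prob = p))
names(y) <- names(p)
```

Taking `log(p[-1] / p[1])` recovers `eta`, which is the sense in which the first category acts as the reference.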
The model, defined over 940 districts, 20 years, 3 age groups, and 3 risk groups, is too large for MCMC to be tractable in reasonable time. To recast this model as an LGM, I used the multinomial-Poisson transformation (detailed in Section 5.3.1.2). This modification allowed inference to be performed using the INLA (Håvard Rue, Martino, and Chopin 2009) algorithm via the R-INLA package (Martins et al. 2013). 5.3.1.2 The multinomial-Poisson transformation The multinomial-Poisson transformation (Baker 1994) reframes a given multinomial logistic regression model, like that described in Equations (5.1) and (5.2), as an equivalent Poisson log-linear model. The equivalent model is of the form \\[\\begin{align} y_{itak} &\\sim \\text{Poisson}(\\kappa_{itak}), \\tag{5.3} \\\\ \\log(\\kappa_{itak}) &= \\eta_{itak}. \\tag{5.4} \\end{align}\\] The basis of the transformation is that conditional on their sum Poisson counts are jointly multinomially distributed (McCullagh and Nelder 1989) as follows \\[\\begin{equation} \\mathbf{y}_{ita} \\, | \\, m_{ita} \\sim \\text{Multinomial} \\left( m_{ita}; \\frac{\\kappa_{ita1}}{\\kappa_{ita}}, \\ldots, \\frac{\\kappa_{ita3^{+}}}{\\kappa_{ita}} \\right), \\tag{5.5} \\end{equation}\\] where \\(\\kappa_{ita} = \\sum_{k = 1}^{3^{+}} \\kappa_{itak}\\). The probabilities \\(p_{itak}\\) may then be obtained using the softmax function \\[\\begin{equation} p_{itak} = \\frac{\\exp(\\eta_{itak})}{\\sum_{k = 1}^{3^{+}} \\exp(\\eta_{itak})} = \\frac{\\kappa_{itak}}{\\sum_{k = 1}^{3^{+}} \\kappa_{itak}} = \\frac{\\kappa_{itak}}{\\kappa_{ita}}. \\end{equation}\\] Under the equivalent model, in Equation (5.3) the sample sizes \\(m_{ita}\\) are treated as random rather than fixed such that \\[\\begin{equation} m_{ita} = \\sum_k y_{itak} \\sim \\text{Poisson} \\left( \\sum_k \\kappa_{itak} \\right) = \\text{Poisson} \\left( \\kappa_{ita} \\right). 
\\tag{5.6} \\end{equation}\\] Using Equation (5.5) for \\(p(\\mathbf{y}_{ita} \\, | \\, m_{ita})\\) and Equation (5.6) for \\(p(m_{ita})\\), the joint distribution is given by \\[\\begin{align} p(\\mathbf{y}_{ita}, m_{ita}) &= \\exp(-\\kappa_{ita}) \\frac{(\\kappa_{ita})^{m_{ita}}}{m_{ita}!} \\times \\frac{m_{ita}!}{\\prod_k y_{itak}!} \\prod_k \\left( \\frac{\\kappa_{itak}}{\\kappa_{ita}} \\right)^{y_{itak}} \\\\ &= \\prod_k \\left( \\frac{\\exp(-\\kappa_{itak}) \\left( \\kappa_{itak} \\right)^{y_{itak}}}{y_{itak}!} \\right) \\\\ &= \\prod_k \\text{Poisson} \\left( y_{itak} \\, | \\, \\kappa_{itak} \\right). \\tag{5.7} \\end{align}\\] As expected, Equation (5.7) corresponds to the product of independent Poisson likelihoods defined in Equation (5.3). This exercise demonstrates that the Poisson log-linear model contains within it a multinomial likelihood, with a Poisson prior on the sample size. For this model to be equivalent to a multinomial logistic regression model, the normalisation constants \\(m_{ita}\\) must be recovered exactly. That is to say, their posterior distributions should be as close as possible to a Dirac delta distribution with value zero everywhere but the known value of the sample size. To ensure that this is the case, observation-specific random effects \\(\\theta_{ita}\\) can be included in the equation for the linear predictor. Multiplying each of \\(\\{\\kappa_{itak}\\}_{k = 1}^{3^+}\\) by \\(\\exp(\\theta_{ita})\\) has no effect on the category probabilities, but does provide the necessary flexibility for \\(\\kappa_{ita}\\) to recover \\(m_{ita}\\) exactly. Although in theory an improper prior distribution \\(p(\\theta_{ita}) \\propto 1\\) should be used, I found that in practice, by keeping \\(\\eta_{itak}\\) otherwise small using appropriate constraints, so that arbitrarily large values of \\(\\theta_{ita}\\) are not required, it is sufficient (and practically preferable for inference) to instead use a vague prior distribution.
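The factorisation behind Equation (5.7) can be checked numerically. The following is a minimal base R sketch, using hypothetical rates and counts, which compares the product of independent Poisson probabilities with the product of a Poisson probability for the total and a multinomial probability conditional on that total.

```r
# Hypothetical Poisson rates kappa_k and observed counts y_k for k = 1, 2, 3+
kappa <- c(2.5, 1.2, 0.3)
y <- c(3, 1, 0)
m <- sum(y)  # total count, playing the role of the sample size

# Product of independent Poisson probabilities, as in Equation (5.7)
lhs <- prod(dpois(y, lambda = kappa))

# Poisson probability of the total (Equation (5.6)) multiplied by the
# multinomial probability of the counts given the total (Equation (5.5))
rhs <- dpois(m, lambda = sum(kappa)) *
  dmultinom(y, size = m, prob = kappa / sum(kappa))

# The two quantities agree up to floating point error
agree <- isTRUE(all.equal(lhs, rhs))
```

The check holds for any non-negative counts and positive rates, so the particular numbers above carry no significance.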
5.3.1.3 Model specifications I considered four models (Table 5.2) for \\(\\eta_{itak}\\) in the equivalent Poisson log-linear model of the form \\[\\begin{equation} \\eta_{itak} = \\theta_{ita} + \\beta_k + \\zeta_{c[i]k} + \\alpha_{ac[i]k} + u_{ik} + \\gamma_{tk}. \\end{equation}\\] Observation random effects \\(\\theta_{ita} \\sim \\mathcal{N}(0, 1000^2)\\) with a vague prior distribution were included in all models to ensure the multinomial-Poisson transformation was valid. To capture country-specific proportion estimates for each category, I included category random effects \\(\\beta_k \\sim \\mathcal{N}(0, \\tau_\\beta^{-1})\\) and country-category random effects \\(\\zeta_{ck} \\sim \\mathcal{N}(0, \\tau_\\zeta^{-1})\\). Heterogeneity in risk group proportions by age was allowed by including age-country-category random effects \\(\\alpha_{ack} \\sim \\mathcal{N}(0, \\tau_\\alpha^{-1})\\). Several specifications were considered for the space-category \\(u_{ik}\\) and time-category effects \\(\\gamma_{tk}\\), described in Sections 5.3.1.3.1 and 5.3.1.3.2.

Table 5.2: Four multinomial regression models were considered. Observation random effects \\(\\theta_{ita}\\), included in all models, are omitted from this table.

| Model | Category \\(\\beta_k\\) | Country \\(\\zeta_{ck}\\) | Age \\(\\alpha_{ack}\\) | Spatial \\(u_{ik}\\) | Temporal \\(\\gamma_{tk}\\) |
|---|---|---|---|---|---|
| M1 | IID | IID | IID | IID | IID |
| M2 | IID | IID | IID | Besag | IID |
| M3 | IID | IID | IID | IID | AR1 |
| M4 | IID | IID | IID | Besag | AR1 |

Use of the multinomial-Poisson transformation required all random effects to include interaction with category \\(k\\), because any random effects which did not include interaction with category would give no change in category probabilities. The only exceptions were the observation random effects, which were included as a device to ensure the transformation is valid, rather than to model the data.
5.3.1.3.1 Spatial random effects For the space-category random effects \\(u_{ik}\\) I considered two specifications:

1. Independent and identically distributed (IID) \\(u_{ik} \\sim \\mathcal{N}(0, \\tau_u^{-1})\\),
2. The Besag improper conditional autoregressive (ICAR) model (Besag, York, and Mollié 1991) grouped by category \\[ \\mathbf{u} = (u_{11}, \\ldots, u_{n1}, \\ldots, u_{1{3^{+}}}, \\ldots, u_{n3^{+}})^\\top \\sim \\mathcal{N}(\\mathbf{0}, (\\tau_u \\mathbf{R}^\\star_u)^{-}). \\] The scaled structure matrix \\(\\mathbf{R}^\\star_u = \\mathbf{R}^\\star_b \\otimes \\mathbf{I}\\) is given by the Kronecker product of the scaled Besag structure matrix \\(\\mathbf{R}^\\star_b\\) and the identity matrix \\(\\mathbf{I}\\), and \\(\\mathbf{A}^{-}\\) denotes the generalised matrix inverse of \\(\\mathbf{A}\\).

I followed best practices for the Besag model as described in Chapter 4. To implement the Kronecker product I used the group option in R-INLA [Section 3.5.5; Gómez-Rubio (2020)] setting the random effect to be f(area_idx, model = \"besag\", group = cat_idx, control.group = list(model = \"iid\"), ...). Though the Kronecker product is symmetric, performance is better in R-INLA when the more complicated effect is written as the first variable rather than the grouping variable. In preliminary testing I used the BYM2 model (Simpson et al. 2017) in place of the Besag. I found that the proportion parameter posteriors tended to be highly peaked at the value one. For simplicity and to avoid numerical issues, by using Besag random effects I effectively decided to fix this proportion to one.
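To make the structure of the grouped Besag precision concrete, the following is a minimal sketch in base R, under stated assumptions: a hypothetical adjacency graph of four areas on a line, three categories, and no scaling of the structure matrix. With the elements of \\(\\mathbf{u}\\) stacked area-within-category, one copy of the Besag structure matrix per category is `kronecker(diag(K), R_b)` in R's convention, which is the same matrix as \\(\\mathbf{R}_b \\otimes \\mathbf{I}\\) up to a reordering of the elements of \\(\\mathbf{u}\\). This is not the R-INLA implementation, only an illustration of why a generalised inverse is needed.

```r
# Hypothetical adjacency for four areas on a line: 1 - 2 - 3 - 4
W <- matrix(0, 4, 4)
W[cbind(1:3, 2:4)] <- 1
W <- W + t(W)

# Unscaled Besag (ICAR) structure matrix: neighbour counts on the diagonal,
# -1 for each pair of neighbouring areas
R_b <- diag(rowSums(W)) - W

# One independent copy of the spatial effect per category (K = 3 here)
K <- 3
R_u <- kronecker(diag(K), R_b)

# Rank via the number of non-zero eigenvalues
rank_of <- function(R) sum(eigen(R, symmetric = TRUE)$values > 1e-8)
print(c(rank_of(R_b), rank_of(R_u)))
```

For a connected graph the structure matrix loses one rank per category, so the grouped matrix is singular and the Gaussian distribution is improper unless constraints are imposed.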
5.3.1.3.2 Temporal random effects For the time-category random effects \\(\\gamma_{tk}\\) I considered two specifications:

1. IID \\(\\gamma_{tk} \\sim \\mathcal{N}(0, \\tau_\\gamma^{-1})\\),
2. First order autoregressive (AR1) grouped by category \\[ \\boldsymbol{\\mathbf{\\gamma}} = (\\gamma_{11}, \\ldots, \\gamma_{13^{+}}, \\ldots, \\gamma_{T1}, \\ldots, \\gamma_{T3^{+}})^\\top \\sim \\mathcal{N}(\\mathbf{0}, (\\tau_\\gamma \\mathbf{R}^\\star_\\gamma)^{-}). \\] The scaled structure matrix \\(\\mathbf{R}^\\star_\\gamma = \\mathbf{R}^\\star_r \\otimes \\mathbf{I}\\) is given by the Kronecker product of a scaled AR1 structure matrix \\(\\mathbf{R}^\\star_r\\) and the identity matrix \\(\\mathbf{I}\\). The AR1 structure matrix \\(\\mathbf{R}_r\\) is given by the precision matrix of the random effects \\(\\mathbf{r} = (r_1, \\ldots, r_T)^\\top\\) specified by \\[\\begin{align} r_1 &\\sim \\mathcal{N} \\left( 0, \\frac{1}{1 - \\rho^2} \\right), \\\\ r_t &= \\rho r_{t - 1} + \\epsilon_t, \\quad t = 2, \\ldots, T, \\end{align}\\] where \\(\\epsilon_t \\sim \\mathcal{N}(0, 1)\\) and \\(|\\rho| < 1\\).

As with the structured spatial random effects, I implemented this Kronecker product using the group option via f(year_idx, model = \"ar1\", group = cat_idx, control.group = list(model = \"iid\"), ...). Again, the variable with the more complicated model was written first. 5.3.1.3.3 Note on spatio-temporal interaction random effects I also considered including separable space-time-category random effects \\(\\delta_{itk}\\) in the model, using the specification \\[\\begin{equation} \\boldsymbol{\\mathbf{\\delta}} = (\\delta_{111}, \\ldots, \\delta_{nT3^{+}})^\\top \\sim \\mathcal{N}(\\mathbf{0}, (\\tau_\\delta \\mathbf{R}^\\star_\\delta)^{-}), \\end{equation}\\] where \\(\\mathbf{R}^\\star_\\delta\\) is a Kronecker product of the relevant space, time and category structure matrices.
These specifications were: IID spatial and IID temporal (Type I) \\(\\mathbf{R}^\\star_\\delta = \\mathbf{I} \\otimes \\mathbf{I} \\otimes \\mathbf{I}\\), Besag spatial and IID temporal (Type II) \\(\\mathbf{R}^\\star_\\delta = \\mathbf{R}^\\star_b \\otimes \\mathbf{I} \\otimes \\mathbf{I}\\), IID spatial and AR1 temporal (Type III) \\(\\mathbf{R}^\\star_\\delta = \\mathbf{I} \\otimes \\mathbf{R}^\\star_a \\otimes \\mathbf{I}\\), Besag spatial and AR1 (Type IV) \\(\\mathbf{R}^\\star_\\delta = \\mathbf{R}^\\star_b \\otimes \\mathbf{R}^\\star_a \\otimes \\mathbf{I}\\), where the first, second and third elements of the Kronecker product represent space, time and category (always IID) structure matrices respectively. The interaction type in brackets (e.g. Type I) is given according to the Knorr-Held (2000) framework. Though three-way Kronecker products are not directly supported in R-INLA, I implemented each specification using a combination of the group and replicate options [Section 6.5.2; Gómez-Rubio (2020)]. For example, for the Type IV effects the random effects were specified by f(area_idx_copy, model = \"besag\", group = year_idx, replicate = cat_idx, control.group = list(model = \"ar1\")). I was able to run these models for single countries, keeping only years at which surveys occurred in those countries. However, when fitting all countries jointly I found inclusion of the space-time-category random effects to be intractable, and as such decided not to include them in the model. 5.3.1.3.4 Prior distributions All random effect precision parameters \\[\\begin{equation} \\tau \\in \\{\\tau_\\beta, \\tau_\\zeta, \\tau_\\alpha, \\tau_u, \\tau_\\gamma, \\tau_\\delta\\} \\end{equation}\\] were given independent penalised complexity (PC) prior distributions (Simpson et al. 
2017) with base model \\(\\sigma = 0\\) given by \\[\\begin{equation} p(\\tau) = 0.5 \\nu \\tau^{-3/2} \\exp \\left( - \\nu \\tau^{-1/2} \\right), \\end{equation}\\] where \\(\\nu = - \\ln(0.01) / 2.5\\) such that \\(\\mathbb{P}(\\sigma > 2.5) = 0.01\\). For the lag-one correlation parameter \\(\\rho\\), I used the PC prior distribution, as derived by Sørbye and Rue (2017), with base model \\(\\rho = 1\\) and condition \\(\\mathbb{P}(\\rho > 0) = 0.75\\). I chose the base model \\(\\rho = 1\\) corresponding to no change in behaviour over time, rather than the alternative \\(\\rho = 0\\) corresponding to no correlation in behaviour over time, as I judged the former to be more plausible a priori. 5.3.1.4 Identifiability constraints To facilitate interpretability of the posterior inferences, I applied sum-to-zero constraints (Table 5.3) such that none of the category interaction random effects altered overall category probabilities. In testing of the space-time-category random effects, I applied analogous sum-to-zero constraints to maintain roles of the space-category and time-category random effects. In some cases it was not possible to implement all three sets of constraints for the three-way interactions in R-INLA. Table 5.3: Applying sum-to-zero constraints to interaction effects ensured that the main effect was not interfered with.
| Random effects | Constraints |
|---|---|
| Category | \\(\\sum_k \\beta_k = 0\\) |
| Country | \\(\\sum_c \\zeta_{ck} = 0, \\, \\forall \\, k\\) |
| Age-country | \\(\\sum_a \\alpha_{ack} = 0, \\, \\forall \\, c, k\\) |
| Spatial | \\(\\sum_i u_{ik} = 0, \\, \\forall \\, k\\) |
| Temporal | \\(\\sum_t \\gamma_{tk} = 0, \\, \\forall \\, k\\) |
| Spatio-temporal | \\(\\sum_i \\delta_{itk} = 0, \\, \\forall \\, t, k; \\sum_t \\delta_{itk} = 0, \\, \\forall \\, i, k; \\sum_k \\delta_{itk} = 0, \\, \\forall \\, i, t\\) |

5.3.1.5 Survey weighted likelihood I accounted for the survey design using a weighted pseudo-likelihood where the observed counts \\(y\\) are replaced by effective counts \\(y^\\star\\), as described in Section 3.5. These counts may not be integers, and as such the Poisson likelihood given in Equation (5.3) is not appropriate. Instead, I used a generalised Poisson pseudo-likelihood \\(y^\\star \\sim \\text{xPoisson}(\\kappa)\\) given by \\[\\begin{equation} p(y^\\star) = \\frac{\\kappa^{y^\\star}}{\\left \\lfloor{y^\\star!}\\right \\rfloor } \\exp \\left(- \\kappa \\right), \\end{equation}\\] to extend the Poisson distribution to non-integer weighted counts. This working likelihood is implemented by family = \"xPoisson\" in R-INLA. 5.3.1.6 Model selection I selected the model including Besag spatial random effects and IID temporal random effects based on the conditional predictive ordinate (CPO) criterion (Pettit 1990). For comparison, I also computed the deviance information criterion (DIC) (D. J. Spiegelhalter et al. 2002) and widely applicable information criterion (WAIC) (Watanabe 2013). Each of these criteria can be calculated in R-INLA without requiring model refitting. The results are presented in Table 5.4 and Figure 5.4. Figure 5.4: For the multinomial logistic regression model, under the conditional predictive ordinate (CPO) criterion, including Besag spatial random effects rather than IID spatial random effects improved model performance.
On the other hand, under the deviance information criterion (DIC) and widely applicable information criterion (WAIC), where smaller values are preferred, the opposite was true. The relatively poor DIC and WAIC performance of Besag random effects was due to outlying values of these criteria for three of four surveys in Tanzania, and as such may be erroneous. Though IID temporal random effects are preferred by all criteria, AR1 temporal random effects performed very similarly, likely as there is a limited amount of temporal variation in the data to describe.

Table 5.4: Conditional predictive ordinate (CPO), deviance information criterion (DIC), and widely applicable information criterion (WAIC) values for the multinomial logistic regression model specifications with corresponding standard errors.

| Criterion | M1 | M2 | M3 | M4 |
|---|---|---|---|---|
| CPO | 5573 (36) | 5772 (36) | 5574 (36) | 5771 (36) |
| DIC | 100780 (300) | 101588 (317) | 100781 (300) | 101589 (317) |
| WAIC | 103763 (358) | 105008 (383) | 103763 (358) | 105009 (383) |

5.3.2 Spatial logistic regression To estimate the proportion of those in the \\(k = 3^{+}\\) risk group that were in the \\(k = 3\\) and \\(k = 4\\) risk groups respectively, I fit logistic regression models of the form \\[\\begin{align} y_{ia4} &\\sim \\text{Binomial} \\left( y_{ia3} + y_{ia4}, q_{ia} \\right), \\tag{5.8} \\\\ q_{ia} &= \\text{logit}^{-1} \\left( \\eta_{ia} \\right), \\end{align}\\] where \\[\\begin{equation} q_{ia} = \\frac{p_{ia4}}{p_{ia3} + p_{ia4}} = \\frac{p_{ia4}}{p_{ia{3^+}}}. \\end{equation}\\] This two-step approach allowed all surveys to be included in the multinomial regression model, but only those surveys with a specific transactional sex question to be included in the logistic regression model. As all such surveys occurred in the years 2013-2018 (Figure 5.2) I assumed no dependence on time, hence omission of the index \\(t\\). Model specification for the linear predictor \\(\\eta_{ia}\\) is discussed in Section 5.3.2.1 to follow.
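The way the two model components combine can be sketched as follows. This is a minimal illustration in base R with hypothetical values for the multinomial probabilities and the linear predictor. It only shows the arithmetic implied by the definition of \\(q_{ia}\\), not the fitted models.

```r
# Hypothetical multinomial probabilities for categories 1, 2 and 3+
# in a single district and age group
p_multi <- c(p1 = 0.30, p2 = 0.45, p3plus = 0.25)

# Hypothetical linear predictor from the logistic regression model
eta <- -2
q <- plogis(eta)  # inverse logit

# Split the 3+ category into the k = 3 and k = 4 risk groups
p_full <- c(
  p_multi[["p1"]],
  p_multi[["p2"]],
  (1 - q) * p_multi[["p3plus"]],
  q * p_multi[["p3plus"]]
)
names(p_full) <- paste0("p", 1:4)
print(round(p_full, 4))
```

By construction the four probabilities still sum to one, and the ratio of the fourth to the sum of the third and fourth recovers \\(q_{ia}\\).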
5.3.2.1 Model specifications Table 5.5: Six logistic regression models were considered. The covariate cfswever denotes the proportion of men who have ever paid for sex and cfswrecent denotes the proportion of men who have paid for sex in the past 12 months.

| Model | Intercept \\(\\beta_0\\) | Country \\(\\zeta_{c}\\) | Age \\(\\alpha_{ac}\\) | Spatial \\(u_{i}\\) | Covariates |
|---|---|---|---|---|---|
| L1 | Constant | IID | IID | IID | None |
| L2 | Constant | IID | IID | Besag | None |
| L3 | Constant | IID | IID | IID | cfswever |
| L4 | Constant | IID | IID | Besag | cfswever |
| L5 | Constant | IID | IID | IID | cfswrecent |
| L6 | Constant | IID | IID | Besag | cfswrecent |

I considered six logistic regression models (Table 5.5). Each included a constant intercept \\(\\beta_0 \\sim \\mathcal{N}(-2, 1^2)\\), country random effects \\(\\zeta_{c} \\sim \\mathcal{N}(0, \\tau_\\zeta^{-1})\\), and age-country random effects \\(\\alpha_{ac} \\sim \\mathcal{N}(0, \\tau_\\alpha^{-1})\\). The Gaussian prior distribution on \\(\\beta_0\\) placed 95% prior probability on the range 2-50% for the percentage of those with non-regular or multiple partners who report transactional sex. I considered two specifications (IID, Besag) for the spatial random effects \\(u_i\\). To aid estimation with sparse data, I also considered national-level covariates for the proportion of men who have paid for sex ever or in the last twelve months (Hodgins et al. 2022). For both random effect precision parameters \\(\\tau \\in \\{\\tau_\\alpha, \\tau_\\zeta\\}\\) I used the PC prior distribution with base model \\(\\sigma = 0\\) and \\(\\mathbb{P}(\\sigma > 2.5) = 0.01\\). For both regression parameters \\(\\beta \\in \\{\\beta_\\texttt{cfswever}, \\beta_\\texttt{cfswrecent}\\}\\) I used the prior distribution \\(\\beta \\sim \\mathcal{N}(0, 2.5^2)\\). 5.3.2.2 Survey weighted likelihood As with the multinomial regression model, I used survey weighted counts \\(y^\\star\\) and sample sizes \\(m^\\star\\).
I used a generalised binomial pseudo-likelihood \\(y^\\star \\sim \\text{xBinomial}(m^\\star, q)\\) given by \\[\\begin{equation} p(y^\\star \\, | \\, m^\\star, q) = \\binom{\\lfloor m^\\star \\rfloor}{\\lfloor y^\\star \\rfloor} q^{y^\\star} (1 - q)^{m^\\star - y^\\star} \\end{equation}\\] to extend the binomial distribution to non-integer weighted counts and sample sizes. This working likelihood is implemented by family = \"xBinomial\" in R-INLA. 5.3.2.3 Model selection I selected the model including Besag spatial effects and cfswrecent covariates according to the CPO criterion. All results, including DIC and WAIC, are presented in Table 5.6 and Figure 5.5. Inclusion of Besag spatial random effects, rather than IID, consistently improved performance. Benefits from inclusion of covariates were more marginal. As some countries had no suitable surveys, I nonetheless preferred to include covariate information so that estimates in these countries would be based on some country-specific data. Figure 5.5: For the logistic regression model, the CPO, DIC, and WAIC each agreed that the model containing Besag spatial random effects and the cfswrecent covariates was best. Inclusion of Besag spatial random effects consistently improved each criterion, whereas improvements from inclusion of any covariates were marginal.

Table 5.6: CPO, DIC, and WAIC values for the logistic regression model specifications with corresponding standard errors.

| Criterion | L1 | L2 | L3 | L4 | L5 | L6 |
|---|---|---|---|---|---|---|
| CPO | 950 (15) | 969 (15) | 951 (15) | 970 (15) | 950 (15) | 970 (15) |
| DIC | 4662 (110) | 4605 (111) | 4662 (110) | 4605 (111) | 4662 (110) | 4605 (111) |
| WAIC | 4692 (115) | 4624 (115) | 4692 (115) | 4624 (115) | 4692 (115) | 4624 (115) |

5.3.3 Female sex worker population size adjustment Domain experts do not consider having had sex “in return for gifts, cash or anything else in the past 12 months” sufficient to constitute sex work.
For this reason, I adjusted the estimates obtained based on the transactional sex survey question to match alternatively obtained age-country FSW population size estimates. Taking this approach retained subnational variation informed by the transactional sex survey question. I used the estimates of adult (15-49) FSW population size by country from a Bayesian meta-analysis of key population specific data sources (Stevens et al. 2023). To disaggregate these estimates by age, I took the following steps. First, I calculated the total sexually debuted population in each age group, by country. To describe the distribution of age at first sex, I used skew logistic distributions (Nguyen and Eaton 2022) with cumulative distribution function given by \\[\\begin{equation} F(x) = \\left(1 + \\exp(\\kappa_c (\\mu_c - x)) \\right)^{- \\gamma_c}, \\end{equation}\\] where \\(\\kappa_c, \\mu_c, \\gamma_c > 0\\) are country-specific scale, location and skewness parameters respectively. Next, I used the assumed \\(\\text{Gamma}(\\alpha = 10.4, \\beta = 0.36)\\) FSW age distribution in South Africa from the Thembisa model (L. Johnson and Dorrington 2020) to calculate the implied ratio between the number of FSW and the sexually debuted population in each age group. I assumed the South African ratios were applicable to every country, allowing calculation of the number of FSW by age group in all 13 countries. The resulting age trends obtained (Figure 5.6) reflect country-level variation in demographics and age-at-first-sex. Altering the FSW population size estimates requires that other risk group population size estimates are also altered such that the corresponding risk group proportion estimates sum to one. Here, estimates of the non-regular or multiple sexual partner(s) population size were altered to facilitate changing of the FSW population size.
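The ingredients of the age disaggregation can be sketched in base R as follows. This is an illustration under several assumptions rather than the procedure as implemented: the skew logistic parameters, age group populations and FSW total are hypothetical, the second Gamma parameter is taken to be a rate, and the proportion sexually debuted is evaluated at the midpoint of each age group for simplicity.

```r
# Skew logistic CDF for age at first sex, following the form of F(x) given above
skew_logistic_cdf <- function(x, kappa, mu, gamma) {
  (1 + exp(kappa * (mu - x)))^(-gamma)
}

age_lower <- c(15, 20, 25)
age_upper <- age_lower + 5
age_mid <- age_lower + 2.5

# Hypothetical parameters and populations for a single country
debuted_prop <- skew_logistic_cdf(age_mid, kappa = 0.6, mu = 17, gamma = 1)
population <- c(1000, 900, 800)
debuted_pop <- debuted_prop * population

# Share of the Gamma(10.4, 0.36) age distribution falling in each age group
# (assumption: the second parameter is a rate)
fsw_age_share <- pgamma(age_upper, shape = 10.4, rate = 0.36) -
  pgamma(age_lower, shape = 10.4, rate = 0.36)

# Allocate a hypothetical FSW total for these ages in proportion to the shares
fsw_total <- 30
fsw_by_age <- fsw_total * fsw_age_share / sum(fsw_age_share)

# Implied ratio of FSW to the sexually debuted population by age group
print(fsw_by_age / debuted_pop)
```

In the analysis itself these ratios were derived for South Africa and then applied to the sexually debuted populations of the other countries, so the sketch covers only the single-country calculation.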
Figure 5.6: The disaggregation procedure I used produces an age distribution for FSW peaking in the 20-24 and 25-29 age groups, and declining for older age groups. 5.4 Prevalence and incidence by risk group Using the most recent risk group proportion estimates, I calculated the following indicators, stratified by district, age group and risk group: HIV prevalence \\(\\rho_{iak}\\), the number of people living with HIV (PLHIV) \\(H_{iak}\\), HIV incidence \\(\\lambda_{iak}\\), and the number of new HIV infections \\(I_{iak}\\). To do so, I disaggregated district, age group specific Naomi estimates by risk group. 5.4.1 Disaggregation of Naomi prevalence estimates To disaggregate HIV prevalence, I began by estimating HIV prevalence log odds ratios \\(\\log(\\text{OR}_k)\\) relative to the general population. First, I calculated age, country, and risk group specific (as well as general population) HIV prevalence \\(\\rho_{cak}\\) using bio-marker survey data from all 46 surveys included in the risk group model (Section 5.2.1). I then fit a logistic regression model, with indicator functions for each risk group, and an indicator for being in the general population. The fitted regression coefficients in this model \\(\\beta_k\\) correspond to log odds \\(\\log \\rho_k - \\log(1 - \\rho_k)\\). The required log odds ratios may then be obtained by taking the difference between the log odds for each risk group and the log odds for the general population. To allow the log odds ratio for the highest risk group to vary based on general population prevalence I fit a linear regression of the FSW log odds against the general population log odds. I ensured that log odds ratios for the FSW risk group were at least as large as those for the multiple or non-regular partner(s) risk group. Given the fitted log odds ratios, I disaggregated Naomi estimates of PLHIV \\(H_{ia}\\) on the logit scale using numerical optimisation.
To do so, I found the values of \\(\\theta_{ia}\\) which brought the function \\[\\begin{equation} f(\\theta_{ia}) = \\sum_{k = 1}^4 \\left( \\text{logistic}(\\theta_{ia} + \\log(\\text{OR}_k)) \\cdot N_{iak} \\right) - H_{ia}, \\end{equation}\\] as close as possible to zero, where \\(\\text{logistic}(x) = \\exp(x) / (1 + \\exp(x))\\) such that \\(\\text{logistic}(\\hat \\theta_{ia} + \\log(\\text{OR}_k)) = \\rho_{iak}\\). These values were given by \\[\\begin{equation} \\hat \\theta_{ia} = \\arg\\min_{\\theta_{ia} \\in [-10, 10]} f(\\theta_{ia})^2. \\end{equation}\\] The number of PLHIV was then obtained as \\(H_{iak} = \\rho_{iak} N_{iak}\\), where \\(N_{iak}\\) is the risk group population size. 5.4.2 Disaggregation of Naomi incidence estimates I used linear disaggregation to calculate the number of new HIV infections by risk group \\[\\begin{align} I_{ia} &= \\sum_k I_{iak} = \\sum_k \\lambda_{iak} (1 - \\rho_{iak}) N_{iak} \\\\ &= 0 + \\lambda_{ia2} (1 - \\rho_{ia2}) N_{ia2} + \\lambda_{ia3} (1 - \\rho_{ia3}) N_{ia3} + \\lambda_{ia4} (1 - \\rho_{ia4}) N_{ia4} \\\\ &= \\lambda_{ia2} \\left((1 - \\rho_{ia2}) N_{ia2} + \\text{RR}_{3} (1 - \\rho_{ia3}) N_{ia3} + \\text{RR}_4(\\lambda_{ia}) (1 - \\rho_{ia4}) N_{ia4} \\right), \\end{align}\\] where \\(\\text{RR}_{2}\\), \\(\\text{RR}_{3}\\) and \\(\\text{RR}_{4}(\\cdot)\\) are the HIV risk ratios given in Table 5.1, and \\((1 - \\rho_{iak}) N_{iak}\\) are the susceptible population sizes in each risk group. The risk ratio for FSW was defined as a function of district-level incidence in the general population \\(\\lambda_{ia}\\). Risk group specific HIV incidence estimates were then given by \\[\\begin{align} \\lambda_{ia1} &= 0, \\\\ \\lambda_{ia2} &= \\frac{I_{ia}}{(1 - \\rho_{ia2}) N_{ia2} + \\text{RR}_{3} (1 - \\rho_{ia3}) N_{ia3} + \\text{RR}_4(\\lambda_{ia}) (1 - \\rho_{ia4}) N_{ia4}}, \\\\ \\lambda_{ia3} &= \\text{RR}_{3} \\lambda_{ia2}, \\\\ \\lambda_{ia4} &= \\text{RR}_4(\\lambda_{ia}) \\lambda_{ia2}.
\\end{align}\\] These equations were evaluated using Naomi model estimates of the number of new HIV infections \\(I_{ia} = \\lambda_{ia} N_{ia}\\). The number of new HIV infections were \\(I_{iak} = \\lambda_{iak} N_{iak}\\). 5.4.3 Expected new infections reached To quantify the number of new infections that could be reached prioritising according to each possible stratification of the population, I took the following approach, which I illustrate for stratification by age. First, I aggregated the number of new HIV infections and HIV incidence (calculated above in Section 5.4.2) such that \\[\\begin{align} I_a &= \\sum_{ik} I_{iak}, \\\\ \\lambda_a &= I_a / \\sum_{ik} (1 - \\rho_{iak}) N_{iak}. \\end{align}\\] I then considered prioritising individuals by age group \\(a\\) in order of decreasing HIV incidence \\(\\lambda_a\\). By cumulatively summing the expected infections, for each fraction of the total population reached (0-100%) I calculated the fraction of total expected new infections that would be reached. In this instance, as there are three age groups, the resulting function was piecewise linear with three segments. This analysis was repeated for all \\(2^3 = 8\\) possible combinations of stratification by location, age, and risk group. 5.5 Results 5.5.1 Model for risk group proportions 5.5.1.1 Estimates Figure 5.7: The posterior mean of the AGYW risk group proportions over space in 2018. Estimates are stratified by risk group (columns) and five-year age group (rows). Countries in grey were not included in the analysis. A limitation of this figure is that using a common colour scale, though desirable for other reasons, makes it challenging to see spatial variation in the FSW risk group. Figure 5.8: National (in white) and subnational (in color) posterior means of the risk group proportions. Estimates are stratified by risk group (columns) and five-year age group (rows).
Though the information presented is similar to that of Figure 5.7, this figure presents a clear view of within- and between-country variation in risk group proportions. Figure B.1 and Figure 5.8 show posterior mean estimates for the proportion in each risk group for the final model in 2018, the most recent year included in our analysis. I focused on the most recent estimates because they are the most relevant to inform ongoing HIV policy. In subsequent results, all estimates refer to 2018, unless otherwise indicated. The median national FSW proportion was 1.1% (95% CI 0.4–1.9) for the 15-19 age group, 1.6% (95% CI 0.6–2.8) for the 20-24 age group and 1.9% (95% CI 0.5–3.5) for the 25-29 age group, in line with the results displayed in Figure 5.6. In the 20-24 and 25-29 year age groups, the majority of women were either cohabiting or had non-regular or multiple partner(s). Countries in eastern and central Africa (Cameroon, Kenya, Malawi, Mozambique, Tanzania, Uganda, Zambia and Zimbabwe) had a higher proportion of women in these age groups cohabiting (63.1% [95% CI 35–78.7%] compared with 21.3% [95% CI 10.1–48.8%] with non-regular partner[s]). In contrast, countries in southern Africa (Botswana, Eswatini, Lesotho, Namibia and South Africa) had a higher proportion with non-regular or multiple partner(s) (58.9% [95% CI 43.2–70.5%], compared with 23.4% [95% CI 9.7–39.1%] cohabiting). This finding is the most notable feature of between-country variation shown in Figure 5.8. Figure 5.7 shows the geographic delineation to pass along the border of Mozambique, through the interior of Zimbabwe and along the border of Zambia. The bimodality of the 20-24 and 25-29 year age groups is shown in Figure B.2. In the median district, 57.9% of adolescent girls 15-19 were not sexually active (95% credible interval [CI] at the district-level 27.7–79.7). 
The country of Mozambique was an exception, where the majority of adolescent girls 15-19 (64.23%) were sexually active in the past year and close to a third (34.17%) were cohabiting with a partner. 5.5.1.2 Coverage assessment Figure 5.9: Probability integral transform (PIT) histograms (top row) and empirical cumulative distribution function (ECDF) difference plots (bottom row) for the final selected model. To assess the calibration of the fitted model, I calculated the quantile \\(q\\) of each observation within the posterior predictive distribution. For calibrated models, these quantiles, known as probability integral transform (PIT) values (Dawid 1984; Nikos I. Bosse et al. 2022), should follow a uniform distribution \\(q \\sim \\mathcal{U}[0, 1]\\). To generate samples from the posterior predictive distribution, I applied the multinomial likelihood to samples from the latent field, setting the sample size to be the floor of the Kish effective sample size. Using the PIT values, it is possible to calculate the empirical coverage of all \\((1 - \\alpha)100\\)% equal-tailed posterior predictive credible intervals. These empirical coverages can be compared to the nominal coverage \\((1 - \\alpha)\\) for each value of \\(\\alpha \\in [0, 1]\\) to give empirical cumulative distribution function (ECDF) difference values. This approach has the advantage of considering all possible confidence values at once. To test for uniformity, I used the binomial distribution based simultaneous confidence bands for ECDF difference values developed by Säilynoja, Bürkner, and Vehtari (2022). I found the only significant deviation from uniformity occurred in the right-hand tail of the one cohabiting partner risk group. That is to say, the proportion of the PIT values which were greater than 0.95 was significantly more than would be expected under a calibrated model. 
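The PIT and ECDF difference calculations described above can be sketched in base R as follows. This is a simplified illustration, not the assessment itself: the observations and posterior predictive draws are simulated from a standard normal distribution (so the toy model is calibrated by construction), whereas the analysis used multinomial draws, for which ties between draws and observations would need additional care.

```r
set.seed(1)

# Hypothetical setting: n observations, each with S posterior predictive draws
n <- 200
S <- 1000
y_obs <- rnorm(n)
y_rep <- matrix(rnorm(n * S), nrow = n, ncol = S)

# PIT value: proportion of posterior predictive draws at or below the observation
# (row i of y_rep is compared with y_obs[i])
pit <- rowMeans(y_rep <= y_obs)

# ECDF difference: empirical CDF of the PIT values minus the uniform CDF
z <- seq(0, 1, by = 0.01)
ecdf_diff <- ecdf(pit)(z) - z

# Empirical coverage of the equal-tailed 95% posterior predictive interval
coverage_95 <- mean(pit > 0.025 & pit < 0.975)
print(coverage_95)
```

Plotting `ecdf_diff` against `z` gives the ECDF difference plot. Simultaneous confidence bands of the kind developed by Säilynoja, Bürkner, and Vehtari (2022) are not computed in this sketch.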
5.5.1.3 Variance decomposition Age group was the most important factor explaining variation in risk group proportions, accounting for 65.9% (95% CI 54.1–74.9%) of total variation. The primary change in risk group proportions by age group occurs between the 15-19 age group and 20-29 age group (Figure 5.7). The next most important factor was location. Country-level differences explained 20.9% (95% CI 11.9–34.5%) of variation, while district-level variation within countries explained 11.3% (95% CI 8.2–15.3%). Temporal changes only explained 0.9% (95% CI 0.6–1.4%) of variation, indicating very little change in risk group proportions over time. I found similar variance decomposition results fitting each country individually (Figure B.1) and using other model specifications. 5.5.2 Prevalence and incidence by risk group Figure 5.10: Percentage of new infections reached across all 13 countries, taking a variety of risk stratification approaches, against the percentage of at risk population required to be reached. For any given fraction of AGYW prioritised, substantially more new infections were reached by strategies that included behavioural risk stratification. Reaching half of all expected new infections required reaching 19.4% of the population when stratifying by subnational area and age, but only 10.6% when behavioural stratification was included (Figure 5.10). The majority of this benefit came from reaching FSW, who were 1.3% of the population but 10.6% of all new infections. Considering each country separately, on average, reaching half of new infections in each country required reaching 14.6% (range 8.7-21.8%) of the population when stratifying by area and age, reducing to 5.1% (range 2.1-13.2%) when behaviour was included. The relative importance of stratifying by age, location and behaviour varied between countries, analogous to the varying contribution of each to the total variance (Section 5.5.1.3). 
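The calculation behind statements such as "reaching half of all expected new infections required reaching a given percentage of the population" can be sketched in base R as follows. The three strata, their susceptible population sizes and their incidence rates are hypothetical toy values chosen for illustration.

```r
# Hypothetical strata: susceptible population and incidence rate in each
susceptible <- c(700, 100, 200)
incidence <- c(0.01, 0.05, 0.02)
infections <- incidence * susceptible

# Prioritise strata from highest to lowest incidence
ord <- order(incidence, decreasing = TRUE)
cum_pop <- c(0, cumsum(susceptible[ord]) / sum(susceptible))
cum_inf <- c(0, cumsum(infections[ord]) / sum(infections))

# Fraction of the population needed to reach half of expected new infections,
# by linear interpolation along the piecewise linear curve
pop_for_half <- approx(x = cum_inf, y = cum_pop, xout = 0.5)$y
print(pop_for_half)
```

With these toy values a quarter of the population accounts for half of the expected new infections. Finer stratification can only move the curve upwards, which is the pattern shown in Figure 5.10.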
5.6 Discussion In this chapter, I estimated the proportion of AGYW who fall into different risk groups at a district level in 13 sub-Saharan African countries. These estimates support consideration of differentiated prevention programming according to geographic locations and risk behaviour, as outlined in the Global AIDS Strategy. Systematic differences in risk by age groups, and variation within and between countries, explained the large majority of variation in risk group proportions. Changes over time were negligible in the overall variation in risk group proportions. The proportion of 15-19 year olds who are sexually active, and among women aged 20-29 years, norms around cohabitation especially varied across districts and countries. This variation underscores the need for these granular data to implement HIV prevention options aligned to local norms and risk behaviours. I considered four risk groups based on sexual behaviour, the most proximal determinant of risk. Other factors, such as condom usage or type of sexual act, may account for additional heterogeneity in risk from sexual behaviour. However, I did not include these factors in view of measurement difficulties, concerns about consistency across contexts, and the operational benefits of describing risk parsimoniously. Sexual behaviour confers risk only when AGYW reside in geographic locations where there is unsuppressed viral load among their potential partners. I did not include more distal determinants, such as school attendance, orphanhood, or gender empowerment, as I expect their effects on risk to largely be mediated by more proximal determinants. However, to effectively implement programming, it is crucial to understand these factors, as well as the broader structural barriers and limits to personal agency faced by AGYW. Importantly, programs must ensure that intervention prioritisation occurs without stigmatising or blaming AGYW. 
By considering a range of possible risk stratification strategies, I showed that successful implementation of a risk-stratified approach would allow substantially more of those at risk for infections to be identified before infection occurs. A considerable proportion of estimated new infections were among FSW, supporting the case for HIV programming efforts focused on key population groups (Baral et al. 2012). There is substantial variation in the importance of prioritisation by age, location and behaviour within each country. This highlights the importance of understanding and tailoring HIV prevention efforts to country-specific contexts. By standardising the analysis across all 13 countries, I showed the additional efficiency benefits of resource allocation between countries. I found a geographic delineation in the proportion of women cohabiting between southern and eastern Africa, calling attention to a divide attributable to many cultural, social, and economic factors. The delineation does not represent a boundary between predominately Christian and Muslim populations, which is further north. I also note that the high numbers of adolescent girls aged 15-19 cohabiting in Mozambique is markedly different from the other countries (UNICEF 2019). Brugh et al. (2021) previously geographically mapped AGYW HIV risk groups using biomarker and behavioural data from the most recent surveys in Eswatini, Haiti and Mozambique to define and subsequently map risk groups with a range of machine learning techniques. My work builds on Brugh et al. (2021) by including more countries, integrating a greater number of surveys, and connecting risk group proportions with HIV epidemic indicators to help inform programming. My modelled estimates of risk group proportions improve upon direct survey results for three reasons. 
First, by taking a modular modelling approach, I integrated all relevant survey information from multiple years, allowing estimation of the FSW proportion for surveys without a specific transactional sex question. Second, whereas direct estimates exhibit large sampling variability at a district level, I alleviated this issue using spatio-temporal smoothing (Figure 5.11). Third, I provided estimates in all district-years, including those not directly sampled by surveys, allowing estimates to be consistently fed into further analysis and planning pipelines such as my analysis of risk group specific prevalence and incidence. Figure 5.11: The modelled estimates display more plausible spatial smoothness than the direct estimates. In addition, missing values in the direct estimates are appropriately infilled by the model. The final surveys included in the risk group model were conducted in 2018. The analysis may be updated with more surveys as they become available. I do not anticipate that the risk group proportions will change substantially, as I found that they did not change significantly over time. My analysis focused on females aged 15-29 years, and could be extended to consider optimisation of prevention more broadly, accounting for new infections among adults 15-49 which occur in women 30-49 and men 15-49. Estimating sexual risk behaviour in adults 15-49 would be a crucial step toward greater understanding of the dynamics of the HIV epidemic in sub-Saharan Africa, and would allow incidence models to include stratification of individuals by sexual risk. 5.6.1 Limitations This analysis was subject to challenges shared by most approaches to monitoring sexual behaviour in the general population (Cleland et al. 2004). In particular, under-reporting of higher risk sexual behaviours among AGYW could affect the validity of my risk group proportion estimates. Due to social stigma or disapproval, respondents may be reluctant to report non-marital partners (Nnko et al.
2004; Helleringer et al. 2011) or may bias their reporting of sexual debut (Zaba et al. 2004; Wringe et al. 2009; Nguyen and Eaton 2022). For guidance of resource allocation, differing rates of under-reporting by country, district, year or age group are particularly concerning for the applicability of my results. While it may be reasonable to assume a constant rate over space-time, the same cannot be said for age: aspects of under-reporting have been shown to decline as respondents age (Glynn et al. 2011), suggesting that the elevated risks I found for younger women are likely conservative estimates. If present, these reporting biases will also have distorted the estimates of infection risk ratios and prevalence ratios I used in my analysis, likely over-attributing risk to higher risk groups. I have the least confidence in my estimates for the FSW risk group. As well as having the smallest sample sizes, my transactional sex estimates do not overcome the difficulties of sampling hard-to-reach groups. I inherit any limitations of the national FSW estimates (Stevens et al. 2023), to which I adjust my estimates of transactional sex. Furthermore, I do not consider seasonal migration patterns, which may particularly affect FSW population size. More generally, I did not consider covariates potentially predictive of risk group proportions (such as sociodemographic characteristics, education, local economic activity, cultural and religious norms and attitudes), which are typically difficult to measure spatially. Identifying measurable correlates of risk, or particular settings in which time-concentrated HIV risk occurs, is an important area for further research to improve risk prioritisation and precision HIV programme delivery. The efficiency of each stratified prevention strategy depends on the ability of programmes to identify and effectively reach those in each stratum.
My analysis of new infections potentially averted assumed a “best-case” scenario where AGYW in every stratum can be reached perfectly, and should therefore be interpreted as illustrating the potentially obtainable benefits rather than benefits which would be obtained from any specific intervention strategy. In practice, stratified prevention strategies are likely to be substantially less efficient than this best-case scenario. Factors I did not consider include the greater administrative burden of more complex strategies, variation in difficulty or feasibility of reaching individuals in each stratum, variation in the range or effectiveness of interventions by stratum, and changes in strata membership that may occur during the course of a year. Identifying and reaching behavioural strata may be particularly challenging. Empirical evaluations of behavioural risk screening tools have found only moderate discriminatory ability (Jia et al. 2022), and risk behaviour may change rapidly among young populations, increasing the challenge of effectively delivering appropriately timed prevention packages. This consideration may motivate selecting risk groups based on easily observable attributes, such as attendance of a particular service or facility, rather than sexual behaviour. In conducting this work, there was insufficient engagement with country experts or civil society organisations. As a result, in early use of the risk group tool the FSW population size estimates were met with some disagreement in Malawi. In that instance, the cause of the disagreement was the external model inputs used. In future, estimates should be generated and reviewed by country teams. 5.6.2 Conclusion I estimated HIV risk group proportions, HIV prevalences and HIV incidences for AGYW aged 15-19, 20-24 and 25-29 years at a district level in 13 priority countries. Using these estimates, I analysed the number of infections that could be reached by prioritisation based upon location, age and behaviour.
Though subject to limitations, these estimates provide data that national HIV programmes can use to set targets and implement differentiated HIV prevention strategies as outlined in the Global AIDS Strategy. Successfully implementing this approach would result in more efficiently reaching a greater number of those at risk of infection. Among AGYW, there was systematic variation in sexual behaviour by age and location, but not over time. Age group variation was primarily attributable to age of sexual debut (ages 15-24). Spatial variation was particularly present between those who reported one cohabiting partner versus non-regular or multiple partners. Risk group proportions did not change substantially over time, indicating that norms relating to sexual behaviour are relatively static. These findings underscore the importance of providing effective HIV prevention options tailored to the needs of particular age groups, as well as local norms around sexual partnerships. References Baker, Stuart G. 1994. “The multinomial-Poisson transformation.” Journal of the Royal Statistical Society: Series D (The Statistician) 43 (4): 495–504. Baral, Stefan, Chris Beyrer, Kathryn Muessig, Tonia Poteat, Andrea L Wirtz, Michele R Decker, Susan G Sherman, and Deanna Kerrigan. 2012. “Burden of HIV among female sex workers in low-income and middle-income countries: a systematic review and meta-analysis.” The Lancet Infectious Diseases 12 (7): 538–49. Besag, Julian, Jeremy York, and Annie Mollié. 1991. “Bayesian image restoration, with two applications in spatial statistics.” Annals of the Institute of Statistical Mathematics 43 (1): 1–20. Bosse, Nikos I., Hugo Gruson, Anne Cori, Edwin van Leeuwen, Sebastian Funk, and Sam Abbott. 2022. “Evaluating Forecasts with scoringutils in R.” arXiv. https://arxiv.org/abs/2205.07090. Brugh, Kristen N, Quinn Lewis, Cameron Haddad, Jon Kumaresan, Timothy Essam, and Michelle S Li. 2021. 
“Characterizing and mapping the spatial variability of HIV risk among adolescent girls and young women: A cross-county analysis of population-based surveys in Eswatini, Haiti, and Mozambique.” PLOS One 16 (12): e0261520. Cleland, John, J Ties Boerma, Michel Caraël, and Sharon S Weir. 2004. “Monitoring sexual behaviour in general populations: a synthesis of lessons of the past decade.” Sexually Transmitted Infections 80 (suppl 2): ii1–7. Dawid, A Philip. 1984. “Present position and potential developments: Some personal views statistical theory the prequential approach.” Journal of the Royal Statistical Society: Series A (General) 147 (2): 278–90. Eaton, Jeffrey W, Laura Dwyer-Lindgren, Steve Gutreuter, Megan O’Driscoll, Oliver Stevens, Sumali Bajaj, Rob Ashton, et al. 2021. “Naomi: a new modelling tool for estimating HIV epidemic indicators at the district level in sub-Saharan Africa.” Journal of the International AIDS Society 24: e25788. Gelman, Andrew, John B Carlin, Hal S Stern, David B Dunson, Aki Vehtari, and Donald B Rubin. 2013. Bayesian data analysis. CRC press. Glynn, Judith R, Ndoliwe Kayuni, Emmanuel Banda, Fiona Parrott, Sian Floyd, Monica Francis-Chizororo, Misheck Nkhata, et al. 2011. “Assessing the validity of sexual behaviour reports in a whole population survey in rural Malawi.” PLOS One 6 (7): e22840. Gómez-Rubio, Virgilio. 2020. Bayesian inference with INLA. CRC Press. Helleringer, Stéphane, Hans-Peter Kohler, Linda Kalilani-Phiri, James Mkandawire, and Benjamin Armbruster. 2011. “The reliability of sexual partnership histories: implications for the measurement of partnership concurrency during surveys.” AIDS (London, England) 25 (4): 503. Hodgins, Caroline, James Stannah, Salome Kuchukhidze, Lycias Zembe, Jeffrey W Eaton, Marie-Claude Boily, and Mathieu Maheu-Giroux. 2022. 
“Population sizes, HIV prevalence, and HIV prevention among men who paid for sex in sub-Saharan Africa (2000–2020): A meta-analysis of 87 population-based surveys.” PLOS Medicine 19 (1): e1003861. ———. 2023b. multi.utils: Utility functions for multi-agyw. Howes, Adam, Kathryn A. Risher, Van Kính Nguyen, Oliver Stevens, Katherine M. Jia, Timothy M. Wolock, Rachel T. Esra, et al. 2023. “Spatio-temporal estimates of HIV risk group proportions for adolescent girls and young women across 13 priority countries in sub-Saharan Africa.” PLOS Global Public Health 3 (4): 1–14. https://doi.org/10.1371/journal.pgph.0001731. Jia, Katherine M, Hallie Eilerts, Olanrewaju Edun, Kevin Lam, Adam Howes, Matthew L Thomas, and Jeffrey W Eaton. 2022. “Risk scores for predicting HIV incidence among adult heterosexual populations in sub-Saharan Africa: a systematic review and meta-analysis.” Journal of the International AIDS Society 25 (1): e25861. Johnson, L, and RE Dorrington. 2020. “Thembisa version 4.3: A model for evaluating the impact of HIV/AIDS in South Africa.” View Article. Knorr-Held, Leonhard. 2000. “Bayesian modelling of inseparable space-time variation in disease risk.” Statistics in Medicine 19 (17-18): 2555–67. Martins, Thiago G, Daniel Simpson, Finn Lindgren, and Håvard Rue. 2013. “Bayesian computing with INLA: new features.” Computational Statistics & Data Analysis 67: 68–83. McCullagh, Peter, and John A Nelder. 1989. Generalized linear models. Routledge. Nguyen, Van Kính, and Jeffrey W. Eaton. 2022. “Trends and country-level variation in age at first sex in sub-Saharan Africa among birth cohorts entering adulthood between 1985 and 2020.” BMC Public Health 22 (1): 1120. https://doi.org/10.1186/s12889-022-13451-y. Nnko, Soori, J Ties Boerma, Mark Urassa, Gabriel Mwaluko, and Basia Zaba. 2004. “Secretive females or swaggering males?: An assessment of the quality of sexual partnership reporting in rural Tanzania.” Social Science & Medicine 59 (2): 299–310. Pettit, LI. 1990. 
“The conditional predictive ordinate for the normal distribution.” Journal of the Royal Statistical Society: Series B (Methodological) 52 (1): 175–84. Rue, Håvard, Sara Martino, and Nicolas Chopin. 2009. “Approximate Bayesian inference for latent Gaussian models by using integrated nested Laplace approximations.” Journal of the Royal Statistical Society: Series B (Statistical Methodology) 71 (2): 319–92. Säilynoja, Teemu, Paul-Christian Bürkner, and Aki Vehtari. 2022. “Graphical test for discrete uniformity and its applications in goodness-of-fit evaluation and multiple sample comparison.” Statistics and Computing 32 (2): 32. Saul, Janet, Gretchen Bachman, Shannon Allen, Nora F Toiv, Caroline Cooney, and Ta’Adhmeeka Beamon. 2018. “The DREAMS core package of interventions: a comprehensive approach to preventing HIV among adolescent girls and young women.” PLOS One 13 (12): e0208167. Simpson, Daniel, Håvard Rue, Andrea Riebler, Thiago G Martins, Sigrunn H Sørbye, et al. 2017. “Penalising model component complexity: A principled, practical approach to constructing priors.” Statistical Science 32 (1): 1–28. Slaymaker, Emma, Kathryn A. Risher, Ramadhani Abdul, Milly Marston, Keith Tomlin, Robert Newton, Anthony Ndyanabo, et al. 2020. “Risk factors for new HIV infections in the general population in sub-Saharan Africa.” ———. 2017. “Penalised complexity priors for stationary autoregressive processes.” Journal of Time Series Analysis 38 (6): 923–35. Spiegelhalter, David J, Nicola G Best, Bradley P Carlin, and Angelika Van Der Linde. 2002. “Bayesian measures of model complexity and fit.” Journal of the Royal Statistical Society: Series B (Statistical Methodology) 64 (4): 583–639. Stevens, Oliver, Keith Sabin, Rebecca Anderson, Sonia Arias Garcia, Kalai Willis, Amrita Rao, Anne F. McIntyre, et al. 2023. “Population size, HIV prevalence, and antiretroviral therapy coverage among key populations in sub-Saharan Africa: collation and synthesis of survey data 2010-2023.” medRxiv. 
https://www.medrxiv.org/content/early/2023/11/22/2022.07.27.22278071. The Global Fund. 2018. The Global Fund Measurement Framework for Adolescent Girls and Young Women Programs. https://www.theglobalfund.org/media/8076/me_adolescentsgirlsandyoungwomenprograms_frameworkmeasurement_en.pdf. UNAIDS. 2021a. “2021 UNAIDS Global AIDS Update - Confronting Inequalities - Lessons for pandemic responses from 40 Years of AIDS.” Geneva, Switzerland. UNAIDS. 2021b. “Global AIDS strategy 2021–2026. End inequalities. End AIDS.” UNICEF. 2019. “Adolescent & social norms situation in Mozambique.” https://www.unicef.org/mozambique/en/adolescent-social-norms. Watanabe, Sumio. 2013. “A widely applicable Bayesian information criterion.” Journal of Machine Learning Research 14 (Mar): 867–97. Wringe, A, I Cremin, J Todd, N McGrath, I Kasamba, K Herbst, P Mushore, B Żaba, and E Slaymaker. 2009. “Comparative assessment of the quality of age-at-event reporting in three HIV cohort studies in sub-Saharan Africa.” Sexually Transmitted Infections 85 (Suppl 1): i56–63. Zaba, Basia, Elizabeth Pisani, Emma Slaymaker, and J Ties Boerma. 2004. “Age at first sex: understanding recent trends in African demographic surveys.” Sexually Transmitted Infections 80 (suppl 2): ii28–35. "],["naomi-aghq.html", "6 Fast approximate Bayesian inference 6.1 Inference methods and software 6.2 A universal INLA implementation 6.3 The Naomi model 6.4 AGHQ in moderate dimensions 6.5 Malawi case-study 6.6 Discussion", " 6 Fast approximate Bayesian inference This chapter describes the development of a novel deterministic Bayesian inference approach, motivated by the Naomi small-area estimation model (Eaton et al. 2021). Over 35 countries (UNAIDS 2023b) have used the Naomi model web interface (https://naomi.unaids.org) to produce subnational estimates of HIV indicators.
In Naomi, evidence is synthesised from household surveys and routinely collected health data to generate estimates of HIV indicators by district, age, and sex. The complexity and size of the model make obtaining fast and accurate Bayesian inferences challenging. As such, development of the approach required overcoming both methodological challenges and implementation difficulties. The methods in this chapter combine Laplace approximations with adaptive quadrature, and are descended from the integrated nested Laplace approximation (INLA) method pioneered by Håvard Rue, Martino, and Chopin (2009). The INLA method has enabled fast and accurate Bayesian inferences for a vast array of models, across a large number of scientific fields (Håvard Rue et al. 2017). The success of INLA is in large part due to its accessible implementation in the R-INLA software. The INLA method and the R-INLA software are nearly ubiquitous in applied settings. However, the Naomi model is not compatible with R-INLA. The foremost reason is that Naomi is too complex to be expressed using a formula interface of the form y ~ .... Additionally, Naomi has more hyperparameters (moderate-dimensional, >20) than can typically be handled using INLA (low-dimensional, certainly below 10). As a result, inferences for the Naomi model have previously been obtained using an empirical Bayes [EB; Casella (1985)] approximation to full Bayesian inference, with the Laplace approximation implemented by the more flexible Template Model Builder [TMB; Kristensen et al. (2016)] R package. Under the EB approximation, the hyperparameters are fixed by optimising an approximation to the marginal posterior. This is undesirable as fixing the hyperparameters underestimates their uncertainty. Ultimately, the resulting overconfidence may lead to worse HIV prevention policy decisions. Most methodological work relating to INLA has taken place using the R-INLA software package. There are two notable exceptions.
First, the simplified INLA approach of Wood (2020), implemented in the mgcv R package, proposed a fast Laplace approximation approach which does not rely on Markov structure of the latent field in the same way as Håvard Rue, Martino, and Chopin (2009). Second, Stringer, Brown, and Stafford (2022) extended the scope and scalability of INLA by avoiding augmenting the latent field with the noisy structured additive predictors. This enables the application of INLA to a wider class of extended latent Gaussian models, which includes Naomi. Van Niekerk et al. (2023) refer to this as the “modern” formulation of the INLA method, as opposed to the “classic” formulation of Håvard Rue, Martino, and Chopin (2009), and it is now included in R-INLA using inla.mode = \"experimental\". Stringer, Brown, and Stafford (2022) also propose use of the adaptive Gauss-Hermite quadrature [AGHQ; Naylor and Smith (1982)] rule to perform integration with respect to the hyperparameters. The methodological contributions of this chapter extend Stringer, Brown, and Stafford (2022) in two directions: First, a universally applicable implementation of INLA with Laplace marginals, where automatic differentiation via TMB is used to obtain the derivatives required for the Laplace approximation. For users of R-INLA, the Stringer, Brown, and Stafford (2022) approach is analogous to method = \"gaussian\", while the approach newly implemented in this chapter is analogous to method = \"laplace\". Section 6.2 demonstrates the implementation using two examples, one compatible with R-INLA and one incompatible. Second, a quadrature rule which combines AGHQ with principal components analysis to enable integration over moderate-dimensional spaces, described in Section 6.4. This quadrature rule is used to perform inference for the Naomi model by integrating the marginal Laplace approximation with respect to the moderate-dimensional hyperparameters within an INLA algorithm implemented in TMB in Section 6.5. 
This work was conducted in collaboration with Prof. Alex Stringer, whom I visited at the University of Waterloo during the fall term of 2022. Code for the analysis in this chapter is available from https://github.com/athowes/naomi-aghq. 6.1 Inference methods and software This section reviews existing deterministic Bayesian inference methods (Sections 6.1.1, 6.1.2, 6.1.3) and the software implementing them (Section 6.1.4). Recall that inference comprises obtaining the posterior distribution \\[\\begin{equation} p(\\boldsymbol{\\mathbf{\\phi}} \\, | \\, \\mathbf{y}) = \\frac{p(\\boldsymbol{\\mathbf{\\phi}}, \\mathbf{y})}{p(\\mathbf{y})}, \\tag{6.1} \\end{equation}\\] or some way to compute relevant functions of it. The posterior distribution encapsulates beliefs about the parameters \\(\\boldsymbol{\\mathbf{\\phi}} = (\\phi_1, \\ldots, \\phi_d)\\) having observed data \\(\\mathbf{y} = (y_1, \\ldots, y_n)\\). Here I assume these quantities are expressible as vectors. Inference is a sensible goal because (under Bayesian decision theory) the posterior distribution is sufficient for use in decision making. More specifically, given a loss function \\(l(a, \\boldsymbol{\\mathbf{\\phi}})\\), the expected posterior loss of a decision \\(a\\) depends on the data only via the posterior distribution \\[\\begin{equation} \\mathbb{E}(l(a, \\boldsymbol{\\mathbf{\\phi}}) \\, | \\, \\mathbf{y}) = \\int_{\\mathbb{R}^d} l(a, \\boldsymbol{\\mathbf{\\phi}}) p(\\boldsymbol{\\mathbf{\\phi}} \\, | \\, \\mathbf{y}) \\text{d}\\boldsymbol{\\mathbf{\\phi}}. \\end{equation}\\] For example, historic data about treatment demand are only required for planning of HIV treatment service provision in so far as they alter the posterior distribution of current demand. The information provided for strategic response to the HIV epidemic may therefore be thought of as functions of some posterior distribution. It is usually intractable to obtain the posterior distribution. 
This is because the denominator in Equation (6.1) contains a potentially high-dimensional integral over the \\(d \\in \\mathbb{Z}^+\\) -dimensional parameters \\[\\begin{equation} p(\\mathbf{y}) = \\int_{\\mathbb{R}^d} p(\\mathbf{y}, \\boldsymbol{\\mathbf{\\phi}}) \\text{d}\\boldsymbol{\\mathbf{\\phi}}. \\tag{6.2} \\end{equation}\\] This quantity is sometimes called the evidence or posterior normalising constant. As a result, approximations to the posterior distribution \\(\\tilde p(\\boldsymbol{\\mathbf{\\phi}} \\, | \\, \\mathbf{y})\\) are typically used in place of the exact posterior distribution. Some approximate Bayesian inference methods, like Markov chain Monte Carlo (MCMC), avoid directly calculating the posterior normalising constant. Instead they find ways to work with the unnormalised posterior distribution \\[\\begin{equation} p(\\boldsymbol{\\mathbf{\\phi}} \\, | \\, \\mathbf{y}) \\propto p(\\boldsymbol{\\mathbf{\\phi}}, \\mathbf{y}), \\end{equation}\\] where \\(p(\\mathbf{y})\\) is not a function of \\(\\boldsymbol{\\mathbf{\\phi}}\\) and so can be removed as a constant. Other approximate Bayesian inference methods can more directly be thought of as ways to estimate the posterior normalising constant (Equation (6.2)). The methods in this chapter fall into this latter category, and are sometimes described as deterministic Bayesian inference methods because they do not make fundamental use of randomness. 6.1.1 The Laplace approximation Laplace’s method (Laplace 1774) is a technique used to approximate integrals of the form \\[\\begin{equation} \\int \\exp(C h(\\mathbf{z})) \\text{d}\\mathbf{z}, \\end{equation}\\] where \\(C > 0\\) is a constant, \\(h\\) is a function which is twice-differentiable, and \\(\\mathbf{z}\\) are generic variables. The Laplace approximation (Tierney and Kadane 1986) is obtained by application of Laplace’s method to calculate the posterior normalising constant (Equation (6.2)). 
Let \\(h(\\boldsymbol{\\mathbf{\\phi}}) = \\log p(\\boldsymbol{\\mathbf{\\phi}}, \\mathbf{y})\\) such that \\[\\begin{equation} p(\\mathbf{y}) = \\int_{\\mathbb{R}^d} p(\\mathbf{y}, \\boldsymbol{\\mathbf{\\phi}}) \\text{d}\\boldsymbol{\\mathbf{\\phi}} = \\int_{\\mathbb{R}^d} \\exp(h(\\boldsymbol{\\mathbf{\\phi}})) \\text{d}\\boldsymbol{\\mathbf{\\phi}}. \\end{equation}\\] Laplace’s method involves approximating the function \\(h\\) by its second order Taylor expansion. This expansion is taken about a maximum of \\(h\\), where the first order term vanishes. Let \\[\\begin{equation} \\hat{\\boldsymbol{\\mathbf{\\phi}}} = \\arg\\max_{\\boldsymbol{\\mathbf{\\phi}}} h(\\boldsymbol{\\mathbf{\\phi}}) \\tag{6.3} \\end{equation}\\] be the posterior mode, and \\[\\begin{equation} \\hat {\\mathbf{H}} = - \\frac{\\partial^2}{\\partial \\boldsymbol{\\mathbf{\\phi}} \\partial \\boldsymbol{\\mathbf{\\phi}}^\\top} h(\\boldsymbol{\\mathbf{\\phi}}) \\rvert_{\\boldsymbol{\\mathbf{\\phi}} = \\hat{\\boldsymbol{\\mathbf{\\phi}}}} \\tag{6.4} \\end{equation}\\] be the Hessian matrix evaluated at the posterior mode. The Laplace approximation to the posterior normalising constant (Equation (6.2)) is then \\[\\begin{align} \\tilde p_{\\texttt{LA}}(\\mathbf{y}) &= \\int_{\\mathbb{R}^d} \\exp \\left( h(\\hat{\\boldsymbol{\\mathbf{\\phi}}}) - \\frac{1}{2} (\\boldsymbol{\\mathbf{\\phi}} - \\hat{\\boldsymbol{\\mathbf{\\phi}}})^\\top \\hat {\\mathbf{H}} (\\boldsymbol{\\mathbf{\\phi}} - \\hat{\\boldsymbol{\\mathbf{\\phi}}}) \\right) \\text{d}\\boldsymbol{\\mathbf{\\phi}} \\tag{6.5} \\\\ &= p(\\hat{\\boldsymbol{\\mathbf{\\phi}}}, \\mathbf{y}) \\cdot \\frac{(2 \\pi)^{d/2}}{| \\hat {\\mathbf{H}} |^{1/2}}.
\\tag{6.6} \\end{align}\\] The result above is calculated using the known normalising constant of the Gaussian distribution \\[\\begin{equation} p_\\texttt{G}(\\boldsymbol{\\mathbf{\\phi}} \\, | \\, \\mathbf{y}) = \\mathcal{N} \\left( \\boldsymbol{\\mathbf{\\phi}} \\, | \\, \\hat{\\boldsymbol{\\mathbf{\\phi}}}, \\hat {\\mathbf{H}}^{-1} \\right) = \\frac{| \\hat {\\mathbf{H}} |^{1/2}}{(2 \\pi)^{d/2}} \\exp \\left( - \\frac{1}{2} (\\boldsymbol{\\mathbf{\\phi}} - \\hat{\\boldsymbol{\\mathbf{\\phi}}})^\\top \\hat {\\mathbf{H}} (\\boldsymbol{\\mathbf{\\phi}} - \\hat{\\boldsymbol{\\mathbf{\\phi}}}) \\right). \\end{equation}\\] The Laplace approximation may be thought of as approximating the posterior distribution by a Gaussian distribution \\(p(\\boldsymbol{\\mathbf{\\phi}} \\, | \\, \\mathbf{y}) \\approx p_\\texttt{G}(\\boldsymbol{\\mathbf{\\phi}} \\, | \\, \\mathbf{y})\\) such that \\[\\begin{equation} \\tilde p_{\\texttt{LA}}(\\mathbf{y}) = \\frac{p(\\boldsymbol{\\mathbf{\\phi}}, \\mathbf{y})}{p_\\texttt{G}(\\boldsymbol{\\mathbf{\\phi}} \\, | \\, \\mathbf{y})} \\Big\\rvert_{\\boldsymbol{\\mathbf{\\phi}} = \\hat{\\boldsymbol{\\mathbf{\\phi}}}}. \\end{equation}\\] Calculation of the Laplace approximation requires obtaining the second derivative of \\(h\\) with respect to \\(\\boldsymbol{\\mathbf{\\phi}}\\) (Equation (6.4)). Derivatives may also be used to improve the performance of the optimisation algorithm used to obtain the maximum of \\(h\\) (Equation (6.3)) by providing access to the gradient of \\(h\\) with respect to \\(\\boldsymbol{\\mathbf{\\phi}}\\). Figure 6.1: Demonstration of the Laplace approximation for the simple Bayesian inference example of Figure 3.1. The unnormalised posterior is \\(p(\\phi, \\mathbf{y}) = \\phi^8 \\exp(-4 \\phi)\\), and can be recognised as the unnormalised gamma distribution \\(\\text{Gamma}(9, 4)\\).
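The single-parameter example of Figure 6.1 can be checked numerically. The following is a minimal Python sketch, not the code used to produce the figure, and its variable names are illustrative. It assumes only the unnormalised posterior stated in the caption, and evaluates the posterior mode (Equation (6.3)), the Hessian (Equation (6.4)) and the Laplace approximation (Equation (6.6)) alongside the exact gamma normalising constant.

```python
import math

# Unnormalised posterior from Figure 6.1: p(phi, y) = phi^8 * exp(-4 * phi),
# so h(phi) = log p(phi, y) = 8 * log(phi) - 4 * phi.
def h(phi):
    return 8.0 * math.log(phi) - 4.0 * phi

# Setting h'(phi) = 8 / phi - 4 to zero gives the posterior mode.
phi_hat = 8.0 / 4.0

# Negative second derivative of h at the mode: -h''(phi) = 8 / phi^2.
hessian = 8.0 / phi_hat ** 2

# Laplace approximation on the log scale with d = 1:
# h(phi_hat) + (1/2) * log(2 * pi) - (1/2) * log(hessian).
log_laplace = h(phi_hat) + 0.5 * math.log(2.0 * math.pi) - 0.5 * math.log(hessian)

# Exact value: the integral of phi^8 * exp(-4 * phi) over (0, inf)
# is Gamma(9) / 4^9.
log_exact = math.lgamma(9.0) - 9.0 * math.log(4.0)

print(round(log_exact, 6), round(log_laplace, 6))
```

The mode and Hessian here are both equal to 2, which corresponds to the mean and precision of the Gaussian approximation in this example.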
The true log normalising constant is \\(\\log p(\\mathbf{y}) = \\log\\Gamma(9) - 9 \\log(4) = -1.872046\\), whereas the Laplace approximate log normalising constant is \\(\\log \\tilde p_{\\texttt{LA}}(\\mathbf{y}) = -1.882458\\), resulting from the Gaussian approximation \\(p_\\texttt{G}(\\phi \\, | \\, \\mathbf{y}) = \\mathcal{N}(\\phi \\, | \\,\\mu = 2, \\tau = 2)\\). 6.1.1.1 The marginal Laplace approximation Approximating the full joint posterior distribution using a Gaussian distribution may be inaccurate. An alternative is to approximate the marginal posterior distribution of some subset of the parameters, referred to as the marginal Laplace approximation. It remains to integrate out the remaining parameters, using another more suitable method. This approach is the basis of the INLA method. Let \\(\\boldsymbol{\\mathbf{\\phi}} = (\\mathbf{x}, \\boldsymbol{\\mathbf{\\theta}})\\) and consider a three-stage hierarchical model \\[\\begin{equation} p(\\mathbf{y}, \\mathbf{x}, \\boldsymbol{\\mathbf{\\theta}}) = p(\\mathbf{y} \\, | \\, \\mathbf{x}, \\boldsymbol{\\mathbf{\\theta}}) p(\\mathbf{x} \\, | \\, \\boldsymbol{\\mathbf{\\theta}}) p(\\boldsymbol{\\mathbf{\\theta}}), \\end{equation}\\] where \\(\\mathbf{x} = (x_1, \\ldots, x_N)\\) is the latent field, and \\(\\boldsymbol{\\mathbf{\\theta}} = (\\theta_1, \\ldots, \\theta_m)\\) are the hyperparameters. 
Applying a Gaussian approximation to the latent field, we have \\(h(\\mathbf{x}, \\boldsymbol{\\mathbf{\\theta}}) = \\log p(\\mathbf{y}, \\mathbf{x}, \\boldsymbol{\\mathbf{\\theta}})\\) with \\(N\\)-dimensional posterior mode \\[\\begin{equation} \\hat{\\mathbf{x}}(\\boldsymbol{\\mathbf{\\theta}}) = \\arg\\max_{\\mathbf{x}} h(\\mathbf{x}, \\boldsymbol{\\mathbf{\\theta}}) \\tag{6.7} \\end{equation}\\] and \\((N \\times N)\\)-dimensional Hessian matrix evaluated at the posterior mode \\[\\begin{equation} \\hat {\\mathbf{H}}(\\boldsymbol{\\mathbf{\\theta}}) = - \\frac{\\partial^2}{\\partial \\mathbf{x} \\partial \\mathbf{x}^\\top} h(\\mathbf{x}, \\boldsymbol{\\mathbf{\\theta}}) \\rvert_{\\mathbf{x} = \\hat{\\mathbf{x}}(\\boldsymbol{\\mathbf{\\theta}})}. \\tag{6.8} \\end{equation}\\] Dependence on the hyperparameters \\(\\boldsymbol{\\mathbf{\\theta}}\\) is made explicit in both Equation (6.7) and (6.8) such that there is a Gaussian approximation to the marginal posterior of the latent field \\(\\tilde p_\\texttt{G}(\\mathbf{x} \\, | \\, \\boldsymbol{\\mathbf{\\theta}}, \\mathbf{y}) = \\mathcal{N}(\\mathbf{x} \\, | \\, \\hat{\\mathbf{x}}(\\boldsymbol{\\mathbf{\\theta}}), \\hat{\\mathbf{H}}(\\boldsymbol{\\mathbf{\\theta}})^{-1})\\) at each value \\(\\boldsymbol{\\mathbf{\\theta}}\\) in the space \\(\\mathbb{R}^m\\). 
The resulting marginal Laplace approximation, for a particular value of the hyperparameters, is then \\[\\begin{align} \\tilde p_{\\texttt{LA}}(\\boldsymbol{\\mathbf{\\theta}}, \\mathbf{y}) &= \\int_{\\mathbb{R}^N} \\exp \\left( h(\\hat{\\mathbf{x}}(\\boldsymbol{\\mathbf{\\theta}}), \\boldsymbol{\\mathbf{\\theta}}) - \\frac{1}{2} (\\mathbf{x} - \\hat{\\mathbf{x}}(\\boldsymbol{\\mathbf{\\theta}}))^\\top \\hat {\\mathbf{H}}(\\boldsymbol{\\mathbf{\\theta}}) (\\mathbf{x} - \\hat{\\mathbf{x}}(\\boldsymbol{\\mathbf{\\theta}})) \\right) \\text{d}\\mathbf{x} \\tag{6.9} \\\\ &= \\exp(h(\\hat{\\mathbf{x}}(\\boldsymbol{\\mathbf{\\theta}}), \\boldsymbol{\\mathbf{\\theta}})) \\cdot \\frac{(2 \\pi)^{N/2}}{| \\hat {\\mathbf{H}}(\\boldsymbol{\\mathbf{\\theta}}) |^{1/2}} \\\\ &= \\frac{p(\\mathbf{y}, \\mathbf{x}, \\boldsymbol{\\mathbf{\\theta}})}{\\tilde p_\\texttt{G}(\\mathbf{x} \\, | \\, \\boldsymbol{\\mathbf{\\theta}}, \\mathbf{y})} \\Big\\rvert_{\\mathbf{x} = \\hat{\\mathbf{x}}(\\boldsymbol{\\mathbf{\\theta}})}. \\end{align}\\] The marginal Laplace approximation is most accurate when the marginal posterior \\(p(\\mathbf{x} \\, | \\, \\boldsymbol{\\mathbf{\\theta}}, \\mathbf{y})\\) is accurately approximated by a Gaussian distribution. For the class of latent Gaussian models (Håvard Rue, Martino, and Chopin 2009) the prior distribution on the latent field is Gaussian \\[\\begin{equation} \\mathbf{x} \\sim \\mathcal{N}(\\mathbf{x} \\, | \\, \\boldsymbol{\\mathbf{\\theta}}) = \\mathcal{N}(\\mathbf{x} \\, | \\, \\mathbf{0}, \\mathbf{Q}(\\boldsymbol{\\mathbf{\\theta}})^{-1}), \\end{equation}\\] with assumed zero mean \\(\\mathbf{0}\\), and precision matrix \\(\\mathbf{Q}(\\boldsymbol{\\mathbf{\\theta}})\\).
The resulting marginal posterior distribution \\[\\begin{align} p(\\mathbf{x} \\, | \\, \\boldsymbol{\\mathbf{\\theta}}, \\mathbf{y}) &\\propto \\mathcal{N}(\\mathbf{x} \\, | \\, \\boldsymbol{\\mathbf{\\theta}}) p(\\mathbf{y} \\, | \\, \\mathbf{x}, \\boldsymbol{\\mathbf{\\theta}}) \\\\ &\\propto \\exp \\left( - \\frac{1}{2} \\mathbf{x}^\\top \\mathbf{Q}(\\boldsymbol{\\mathbf{\\theta}}) \\mathbf{x} + \\log p(\\mathbf{y} \\, | \\, \\mathbf{x}, \\boldsymbol{\\mathbf{\\theta}}) \\right) \\end{align}\\] is not exactly Gaussian. However, its deviation can be expected to be small if \\(\\log p(\\mathbf{y} \\, | \\, \\mathbf{x}, \\boldsymbol{\\mathbf{\\theta}})\\) is small (Blangiardo et al. 2013). 6.1.2 Quadrature Quadrature is a method used to approximate integrals using a weighted sum of function evaluations. As with the Laplace approximation, it is deterministic in that the computational procedure is not intrinsically random. Let \\(\\mathcal{Q}\\) be a set of quadrature nodes \\(\\mathbf{z} \\in \\mathcal{Q}\\) and \\(\\omega: \\mathbb{R}^d \\to \\mathbb{R}\\) be a weighting function. Then, quadrature can be used to estimate the posterior normalising constant (Equation (6.2)) by \\[\\begin{equation} \\tilde p_{\\mathcal{Q}}(\\mathbf{y}) = \\sum_{\\mathbf{z} \\in \\mathcal{Q}} p(\\mathbf{y}, \\mathbf{z}) \\omega(\\mathbf{z}). \\end{equation}\\] To illustrate quadrature for a simple example, consider integrating the univariate function \\(f(z) = z \\sin(z)\\) between \\(z = 0\\) and \\(z = \\pi\\). This integral can be calculated analytically using integration by parts and evaluates to \\(\\pi\\). 
A quadrature approximation of this integral is \\[\\begin{equation} \\pi = \\sin(z) - z \\cos(z) \\bigg|_0^\\pi = \\int_{0}^\\pi z \\sin(z) \\text{d} z \\approx \\sum_{z \\in \\mathcal{Q}} z \\sin(z) \\omega(z), \\tag{6.10} \\end{equation}\\] where \\(\\mathcal{Q} = \\{z_1, \\ldots, z_k\\}\\) is a set of \\(k\\) quadrature nodes and \\(\\omega: \\mathbb{R} \\to \\mathbb{R}\\) is a weighting function. The trapezoid rule is an example of a quadrature rule, in which quadrature nodes are spaced throughout the domain with \\(\\epsilon_i = z_i - z_{i - 1} > 0\\) for \\(1 < i \\leq k\\). The weighting function is \\[\\begin{equation} \\omega(z_i) = \\begin{cases} (\\epsilon_i + \\epsilon_{i + 1}) / 2 & 1 < i < k, \\\\ \\epsilon_2 / 2 & i = 1, \\\\ \\epsilon_k / 2 & i = k. \\end{cases} \\end{equation}\\] With equal spacing \\(\\epsilon_i = \\epsilon\\), interior nodes have weight \\(\\epsilon\\) and the two end nodes have weight \\(\\epsilon / 2\\). Figure 6.2 shows application of the trapezoid rule to integration of \\(z \\sin(z)\\) as described in Equation (6.10). The more quadrature nodes are used, the more accurate the estimate of the integral is. Under some regularity conditions on \\(f\\), as the spacing between quadrature nodes \\(\\epsilon \\to 0\\) the estimate obtained using the trapezoid rule converges to the true value of the integral. Indeed, this approach was used by Riemann to provide the first rigorous definition of the integral.

Figure 6.2: The trapezoid rule with \\(k = 5, 10, 20\\) equally-spaced (\\(\\epsilon_i = \\epsilon > 0\\)) quadrature nodes can be used to integrate the function \\(f(z) = z \\sin(z)\\), shown in green, in the domain \\([0, \\pi]\\). Here, the exact solution is \\(\\pi \\approx 3.1416\\). As \\(k\\) increases and more nodes are used in the computation, the quadrature estimate becomes closer to the exact solution. The trapezoid rule estimate is given by the sum of the areas of the grey trapezoids.

Quadrature methods are most effective when integrating over low dimensions, say three or fewer.
This is because the number of quadrature nodes at which the function is required to be evaluated in the computation grows exponentially with the dimension. For even moderate dimension, this quickly makes computation intractable. For example, using 5, 10, or 20 quadrature nodes per dimension, as in Figure 6.2, in five dimensions (rather than one, as shown) would require 3125, 100000 or 3200000 quadrature nodes respectively. Though quadrature is easily parallelisable, in that function evaluations at each node are entirely independent, solutions requiring the evaluation of millions of quadrature nodes are unlikely to be tractable.

6.1.2.1 Gauss-Hermite quadrature

It is possible to construct quadrature rules which use relatively few nodes and are highly accurate when the integrand adheres to certain assumptions [Chapter 4; Press et al. (2007)]. Gauss-Hermite quadrature [GHQ; Davis and Rabinowitz (1975)] is a quadrature rule designed to integrate functions of the form \\(f(\\mathbf{z}) = \\varphi(\\mathbf{z}) P_\\alpha(\\mathbf{z})\\) exactly, that is with no error, such that \\[\\begin{equation} \\int \\varphi(\\mathbf{z}) P_\\alpha(\\mathbf{z}) \\text{d} \\mathbf{z} = \\sum_{\\mathbf{z} \\in \\mathcal{Q}} \\varphi(\\mathbf{z}) P_\\alpha(\\mathbf{z}) \\omega(\\mathbf{z}). \\tag{6.11} \\end{equation}\\] In this equation, the term \\(\\varphi(\\cdot)\\) is a standard multivariate normal density \\(\\mathcal{N}(\\cdot \\, | \\, \\mathbf{0}, \\mathbf{I})\\), where \\(\\mathbf{0}\\) and \\(\\mathbf{I}\\) are the zero-vector and identity matrix of relevant dimension, and the term \\(P_\\alpha(\\cdot)\\) is a polynomial with highest degree monomial \\(\\alpha \\leq 2k - 1\\), where \\(k\\) is the number of quadrature nodes per dimension. GHQ is attractive for Bayesian inference problems because posterior distributions are typically well approximated by functions of this form.
Support for this statement is provided by the Bernstein–von Mises theorem, which states that, under some regularity conditions, as the number of data points increases the posterior distribution converges to a Gaussian. I follow the notation for GHQ established by Bilodeau, Stringer, and Tang (2022). First, to construct the univariate GHQ rule for \\(z \\in \\mathbb{R}\\), let \\(H_k(z)\\) be the \\(k\\)th (probabilist’s) Hermite polynomial \\[\\begin{equation} H_k(z) = (-1)^k \\exp(z^2 / 2) \\frac{\\text{d}^k}{\\text{d}z^k} \\exp(-z^2 / 2). \\end{equation}\\] The Hermite polynomials are orthogonal with respect to the standard Gaussian probability density function \\[\\begin{equation} \\int H_k(z) H_l(z) \\varphi(z) \\text{d} z = k! \\, \\delta_{kl}, \\end{equation}\\] where \\(\\delta_{kl} = 1\\) if \\(k = l\\) and \\(\\delta_{kl} = 0\\) otherwise. The GHQ nodes \\(z \\in \\mathcal{Q}(1, k)\\) are given by the \\(k\\) zeros of the \\(k\\)th Hermite polynomial. For \\(k = 1, 2, 3\\) these zeros, up to three decimal places, are \\[\\begin{align} H_1(z) = z = 0 \\implies \\mathcal{Q}(1, 1) &= \\{0\\}, \\\\ H_2(z) = z^2 - 1 = 0 \\implies \\mathcal{Q}(1, 2) &= \\{-1, 1\\}, \\\\ H_3(z) = z^3 - 3z = 0 \\implies \\mathcal{Q}(1, 3) &= \\{-1.732, 0, 1.732\\}. \\end{align}\\] The quadrature nodes are symmetric about zero, and include zero when \\(k\\) is odd. The corresponding weighting function \\(\\omega: \\mathcal{Q}(1, k) \\to \\mathbb{R}\\) chosen to satisfy Equation (6.11) is given by \\[\\begin{equation} \\omega(z) = \\frac{k!}{\\varphi(z) [H_{k + 1}(z)]^2}. \\end{equation}\\] Multivariate GHQ rules are usually constructed using the product rule with identical univariate GHQ rules in each dimension. As such, in \\(d\\) dimensions, the multivariate GHQ nodes \\(\\mathbf{z} \\in \\mathcal{Q}(d, k)\\) are defined by \\[\\begin{equation} \\mathcal{Q}(d, k) = \\mathcal{Q}(1, k)^d = \\mathcal{Q}(1, k) \\times \\cdots \\times \\mathcal{Q}(1, k).
\\end{equation}\\] The corresponding weighting function \\(\\omega: \\mathcal{Q}(d, k) \\to \\mathbb{R}\\) is given by a product of the univariate weighting functions \\(\\omega(\\mathbf{z}) = \\prod_{j = 1}^d \\omega(z_j)\\). 6.1.2.2 Adaptive quadrature In adaptive quadrature, the quadrature nodes and weights selected depend on the specific integrand being considered. For example, adaptive use of the trapezoid rule requires specifying a rule for the start point, end point, and spacing between quadrature nodes. It is particularly important to use an adaptive quadrature rule for Bayesian inference problems because the posterior normalising constant \\(p(\\mathbf{y})\\) is a function of the data. No fixed quadrature rule can be expected to effectively integrate all possible posterior distributions. In adaptive GHQ [AGHQ; Naylor and Smith (1982)] the quadrature nodes are shifted by the mode of the integrand, and rotated based on a matrix decomposition of the inverse curvature at the mode. To demonstrate AGHQ, consider its application to calculation of the posterior normalising constant. The relevant transformation of the GHQ nodes \\(\\mathcal{Q}(d, k)\\) is \\[\\begin{equation} \\boldsymbol{\\mathbf{\\phi}}(\\mathbf{z}) = \\hat{\\mathbf{P}} \\mathbf{z} + \\hat{\\boldsymbol{\\mathbf{\\phi}}}, \\end{equation}\\] where \\(\\hat{\\mathbf{P}}\\) is a matrix decomposition of \\(\\hat{\\boldsymbol{\\mathbf{H}}}^{-1} = \\hat{\\mathbf{P}} \\hat{\\mathbf{P}}^\\top\\). To account for the transformation, the weighting function may be redefined to include a matrix determinant, analogous to the Jacobian determinant, or more simply the matrix determinant may be written outside the integral. 
Taking the latter approach, the resulting adaptive quadrature estimate of the posterior normalising constant is \\[\\begin{align} \\tilde p_{\\texttt{AQ}}(\\mathbf{y}) &= | \\hat{\\mathbf{P}} | \\sum_{\\mathbf{z} \\in \\mathcal{Q}(d, k)} p(\\mathbf{y}, \\boldsymbol{\\mathbf{\\phi}}(\\mathbf{z})) \\omega(\\mathbf{z}) \\\\ &= | \\hat{\\mathbf{P}} | \\sum_{\\mathbf{z} \\in \\mathcal{Q}(d, k)} p(\\mathbf{y}, \\hat{\\mathbf{P}} \\mathbf{z} + \\hat{\\boldsymbol{\\mathbf{\\phi}}}) \\omega(\\mathbf{z}). \\end{align}\\] The quantities \\(\\hat{\\boldsymbol{\\mathbf{\\phi}}}\\) and \\(\\hat{\\boldsymbol{\\mathbf{H}}}\\) are exactly those given in Equations (6.3) and (6.4) and used in the Laplace approximation. Indeed, when \\(k = 1\\), AGHQ corresponds to the Laplace approximation. To see this, note that \\(H_1(z)\\) has the single zero \\(z = 0\\), such that the adapted node is given by the mode \\(\\boldsymbol{\\mathbf{\\phi}}(\\mathbf{z} = \\mathbf{0} = 0 \\times \\cdots \\times 0) = \\hat{\\boldsymbol{\\mathbf{\\phi}}}\\). The weighting function is given by \\[\\begin{equation} \\omega(0)^d = \\left( \\frac{1!}{\\varphi(0) H_{2}(0)^2} \\right)^d = \\left( \\frac{1}{\\varphi(0)} \\right)^d = \\left(2 \\pi\\right)^{d / 2}. \\end{equation}\\] The AGHQ estimate of the normalising constant for \\(k = 1\\) is then given by \\[\\begin{equation} \\tilde p_{\\texttt{AQ}}(\\mathbf{y}) = p(\\mathbf{y}, \\hat{\\boldsymbol{\\mathbf{\\phi}}}) \\cdot | \\hat{\\mathbf{P}} | \\cdot (2 \\pi)^{d / 2} = p(\\mathbf{y}, \\hat{\\boldsymbol{\\mathbf{\\phi}}}) \\cdot \\frac{(2 \\pi)^{d / 2}}{| \\hat{\\mathbf{H}} | ^{1/2}}, \\end{equation}\\] which corresponds to the Laplace approximation \\(\\tilde p_{\\texttt{LA}}(\\mathbf{y})\\) given in Equation (6.6). This connection supports AGHQ being a natural extension of the Laplace approximation when greater accuracy than \\(k = 1\\) is required.
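The univariate construction above can be sketched in a few lines of Python. This is a minimal illustration rather than a reference implementation: the function names are hypothetical, the nodes are hard-coded as the zeros of the probabilist's polynomials \\(H_1(z) = z\\), \\(H_2(z) = z^2 - 1\\) and \\(H_3(z) = z^3 - 3z\\) (so only \\(k \\leq 3\\) is supported), and the integrand \\(\\exp(a \\theta - e^\\theta)\\), which integrates to \\(\\Gamma(a)\\), is an example chosen here purely because its mode and curvature are available analytically.

```python
import math

def phi(z):
    # Standard normal density.
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def hermite(k, z):
    # Probabilist's Hermite polynomial H_k(z), from the recurrence
    # H_{j+1}(z) = z H_j(z) - j H_{j-1}(z) with H_0 = 1 and H_1 = z.
    h_prev, h = 1.0, z
    if k == 0:
        return h_prev
    for j in range(1, k):
        h_prev, h = h, z * h - j * h_prev
    return h

# Zeros of H_k in closed form, for k = 1, 2, 3 only.
NODES = {1: [0.0], 2: [-1.0, 1.0], 3: [-math.sqrt(3.0), 0.0, math.sqrt(3.0)]}

def ghq_weight(k, z):
    # omega(z) = k! / (phi(z) * H_{k+1}(z)^2), evaluated at a zero z of H_k.
    return math.factorial(k) / (phi(z) * hermite(k + 1, z) ** 2)

def aghq_1d(f, mode, sd, k):
    # Shift the nodes by the mode, scale them by sd = H^{-1/2}, and write
    # the scale factor (the 1 x 1 determinant) outside the sum.
    return sd * sum(f(sd * z + mode) * ghq_weight(k, z) for z in NODES[k])

def log_gamma_integrand(theta, a=5.0):
    # Illustrative integrand: exp(a * theta - exp(theta)) integrates to Gamma(a).
    return math.exp(a * theta - math.exp(theta))

a = 5.0
mode, sd = math.log(a), 1.0 / math.sqrt(a)  # analytic mode and curvature
laplace = log_gamma_integrand(mode) * math.sqrt(2.0 * math.pi) * sd
for k in (1, 2, 3):
    print(k, aghq_1d(log_gamma_integrand, mode, sd, k), laplace, math.gamma(a))
```

With \\(k = 1\\) the printed estimate coincides with the Laplace value, which for this particular integrand is Stirling's approximation to \\(\\Gamma(a)\\).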
Figure 6.3: The Gauss-Hermite quadrature nodes \\(\\mathbf{z} \\in \\mathcal{Q}(2, 3)\\) for a two-dimensional integral with three nodes per dimension (Panel A). Adaption occurs based on the mode (Panel B) and covariance of the integrand via either the Cholesky (Panel C) or spectral (Panel D) decomposition of the inverse curvature at the mode. Here, the integrand is \\(f(z_1, z_2) = \\text{sn}(0.5 z_1, \\alpha = 2) \\cdot \\text{sn}(0.8 z_1 - 0.5 z_2, \\alpha = -2)\\), where \\(\\text{sn}(\\cdot)\\) is the standard skewnormal probability density function with shape parameter \\(\\alpha \\in \\mathbb{R}\\).

Two alternatives for the matrix decomposition \\(\\hat{\\boldsymbol{\\mathbf{H}}}^{-1} = \\hat{\\mathbf{P}} \\hat{\\mathbf{P}}^\\top\\) are the Cholesky and spectral decompositions (Jäckel 2005). For the Cholesky decomposition \\(\\hat{\\mathbf{P}} = \\hat{\\mathbf{L}}\\), where \\[\\begin{equation} \\hat{\\mathbf{L}} = \\begin{pmatrix} \\hat{L}_{11} & 0 & \\cdots & 0 \\\\ \\hat{L}_{21} & \\hat{L}_{22} & \\ddots & \\vdots \\\\ \\vdots & \\ddots& \\ddots& 0 \\\\ \\hat{L}_{d1} & \\ldots& \\hat{L}_{d(d-1)} & \\hat{L}_{dd}\\\\ \\end{pmatrix} \\end{equation}\\] is a lower triangular matrix. For the spectral decomposition \\(\\hat{\\mathbf{P}} = \\hat{\\mathbf{E}} \\hat{\\mathbf{\\Lambda}}^{1/2}\\), where \\(\\hat{\\mathbf{E}} = (\\hat{\\mathbf{e}}_{1}, \\ldots, \\hat{\\mathbf{e}}_{d})\\) contains the eigenvectors of \\(\\hat{\\mathbf{H}}^{-1}\\) and \\(\\hat{\\mathbf{\\Lambda}}\\) is a diagonal matrix containing its eigenvalues \\((\\hat \\lambda_{1}, \\ldots, \\hat \\lambda_{d})\\). Figure 6.3 demonstrates GHQ and AGHQ for a two-dimensional example, using both decomposition approaches. Using the Cholesky decomposition results in adapted quadrature nodes which collapse along one of the dimensions, as a result of the matrix \\(\\hat{\\mathbf{L}}\\) being lower triangular.
On the other hand, using the spectral decomposition results in adapted quadrature nodes which lie along the orthogonal eigenvectors of \\(\\hat{\\mathbf{H}}^{-1}\\). Using AGHQ, Bilodeau, Stringer, and Tang (2022) provide the first stochastic convergence rate for adaptive quadrature applied to Bayesian inference. 6.1.3 Integrated nested Laplace approximation The integrated nested Laplace approximation (INLA) method (Håvard Rue, Martino, and Chopin 2009) combines marginal Laplace approximations with quadrature to enable approximation of posterior marginal distributions. Consider the marginal Laplace approximation (Section 6.1.1.1) for a three-stage hierarchical model given by \\[\\begin{equation} \\tilde p_{\\texttt{LA}}(\\boldsymbol{\\mathbf{\\theta}}, \\mathbf{y}) = \\frac{p(\\mathbf{y}, \\mathbf{x}, \\boldsymbol{\\mathbf{\\theta}})}{\\tilde p_\\texttt{G}(\\mathbf{x} \\, | \\, \\boldsymbol{\\mathbf{\\theta}}, \\mathbf{y})} \\Big\\rvert_{\\mathbf{x} = \\hat{\\mathbf{x}}(\\boldsymbol{\\mathbf{\\theta}})}. \\end{equation}\\] To complete approximation of the posterior normalising constant, the marginal Laplace approximation can be integrated over the hyperparameters using a quadrature rule (Section 6.1.2) \\[\\begin{equation} \\tilde p(\\mathbf{y}) = \\sum_{\\mathbf{z} \\in \\mathcal{Q}} \\tilde p_\\texttt{LA}(\\mathbf{z}, \\mathbf{y}) \\omega(\\mathbf{z}). \\tag{6.12} \\end{equation}\\] Though any choice of quadrature rule is possible, following Stringer, Brown, and Stafford (2022) here I consider use of AGHQ. Let \\(\\mathbf{z} \\in \\mathcal{Q}(m, k)\\) be the \\(m\\)-dimensional GHQ nodes constructed using the product rule with \\(k\\) nodes per dimension, and \\(\\omega: \\mathbb{R}^m \\to \\mathbb{R}\\) the corresponding weighting function. 
These nodes are adapted by \\(\\boldsymbol{\\mathbf{\\theta}}(\\mathbf{z}) = \\hat{\\mathbf{P}}_\\texttt{LA} \\mathbf{z} + \\hat{\\boldsymbol{\\mathbf{\\theta}}}_\\texttt{LA}\\) where \\[\\begin{align} \\hat{\\boldsymbol{\\mathbf{\\theta}}}_\\texttt{LA} &= \\arg\\max_{\\boldsymbol{\\mathbf{\\theta}}} \\log \\tilde p_{\\texttt{LA}}(\\boldsymbol{\\mathbf{\\theta}}, \\mathbf{y}), \\\\ \\hat{\\boldsymbol{\\mathbf{H}}}_\\texttt{LA} &= - \\frac{\\partial^2}{\\partial \\boldsymbol{\\mathbf{\\theta}} \\partial \\boldsymbol{\\mathbf{\\theta}}^\\top} \\log \\tilde p_{\\texttt{LA}}(\\boldsymbol{\\mathbf{\\theta}}, \\mathbf{y}) \\rvert_{\\boldsymbol{\\mathbf{\\theta}} = \\hat{\\boldsymbol{\\mathbf{\\theta}}}_\\texttt{LA}}, \\tag{6.13} \\\\ \\hat{\\boldsymbol{\\mathbf{H}}}_\\texttt{LA}^{-1} &= \\hat{\\mathbf{P}}_\\texttt{LA} \\hat{\\mathbf{P}}_\\texttt{LA}^\\top. \\end{align}\\] The nested AGHQ estimate of the posterior normalising constant is then \\[\\begin{equation} \\tilde p_{\\texttt{AQ}}(\\mathbf{y}) = | \\hat{\\mathbf{P}}_\\texttt{LA} | \\sum_{\\mathbf{z} \\in \\mathcal{Q}(m, k)} \\tilde p_\\texttt{LA}(\\boldsymbol{\\mathbf{\\theta}}(\\mathbf{z}), \\mathbf{y}) \\omega(\\mathbf{z}). \\tag{6.14} \\end{equation}\\] This estimate can be used to normalise the marginal Laplace approximation as follows \\[\\begin{equation} \\tilde p_{\\texttt{LA}}(\\boldsymbol{\\mathbf{\\theta}} \\, | \\, \\mathbf{y}) = \\frac{\\tilde p_{\\texttt{LA}}(\\boldsymbol{\\mathbf{\\theta}}, \\mathbf{y})}{\\tilde p_{\\texttt{AQ}}(\\mathbf{y})}. \\end{equation}\\] The posterior marginals \\(\\tilde p(\\theta_j \\, | \\, \\mathbf{y})\\) may be obtained by \\[\\begin{align} \\tilde p(\\theta_j \\, | \\, \\mathbf{y}) = \\int \\tilde p_{\\texttt{LA}}(\\boldsymbol{\\mathbf{\\theta}} \\, | \\, \\mathbf{y}) \\text{d} \\boldsymbol{\\mathbf{\\theta}}_{-j}. \\end{align}\\] These integrals may be computed by reusing the AGHQ rule. More recent methods are discussed in Section 3.2 of Martins et al. (2013).
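Equation (6.14) and the normalisation that follows it can be sketched for \\(m = 2\\) hyperparameters. The Python below is an illustration under stated assumptions, not the implementation used in this thesis: all function names are hypothetical, the Cholesky factor is written out by hand for the \\(2 \\times 2\\) case, only \\(k \\leq 3\\) is supported, and `stand_in_joint` is an invented stand-in for \\(\\tilde p_\\texttt{LA}(\\boldsymbol{\\mathbf{\\theta}}, \\mathbf{y})\\), chosen because its mode, Hessian and exact integral \\(\\Gamma(3) \\Gamma(5)\\) are known.

```python
import itertools
import math

def phi(z):
    # Standard normal density.
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def hermite(k, z):
    # Probabilist's Hermite polynomial via H_{j+1} = z H_j - j H_{j-1}.
    h_prev, h = 1.0, z
    if k == 0:
        return h_prev
    for j in range(1, k):
        h_prev, h = h, z * h - j * h_prev
    return h

# Zeros of H_k in closed form, for k = 1, 2, 3 only.
NODES = {1: [0.0], 2: [-1.0, 1.0], 3: [-math.sqrt(3.0), 0.0, math.sqrt(3.0)]}

def ghq_weight(k, z):
    return math.factorial(k) / (phi(z) * hermite(k + 1, z) ** 2)

def chol_of_inverse_2x2(h):
    # Lower-triangular L with L L^T = h^{-1}, for a symmetric
    # positive-definite 2 x 2 matrix h.
    a, b, d = h[0][0], h[0][1], h[1][1]
    det = a * d - b * b
    inv00, inv10, inv11 = d / det, -b / det, a / det
    l11 = math.sqrt(inv00)
    l21 = inv10 / l11
    l22 = math.sqrt(inv11 - l21 * l21)
    return [[l11, 0.0], [l21, l22]]

def adapted_rule(mode, hessian, k):
    # Product-rule nodes adapted as theta(z) = L z + mode, each paired with
    # the weight |L| * omega(z1) * omega(z2).
    L = chol_of_inverse_2x2(hessian)
    det_L = L[0][0] * L[1][1]
    rule = []
    for z1, z2 in itertools.product(NODES[k], repeat=2):
        theta = (L[0][0] * z1 + mode[0], L[1][0] * z1 + L[1][1] * z2 + mode[1])
        rule.append((theta, det_L * ghq_weight(k, z1) * ghq_weight(k, z2)))
    return rule

A1, A2 = 3.0, 5.0

def stand_in_joint(theta):
    # Invented stand-in for p_LA(theta, y). With u = (theta1, theta1 + theta2)
    # it equals exp(A1 u1 - e^{u1} + A2 u2 - e^{u2}), and so integrates to
    # Gamma(A1) * Gamma(A2) because the change of variables has unit Jacobian.
    u1, u2 = theta[0], theta[0] + theta[1]
    return math.exp(A1 * u1 - math.exp(u1) + A2 * u2 - math.exp(u2))

# Analytic mode and negative Hessian of log stand_in_joint.
mode = (math.log(A1), math.log(A2) - math.log(A1))
hessian = [[A1 + A2, A2], [A2, A2]]

rule = adapted_rule(mode, hessian, k=3)
p_aq = sum(stand_in_joint(theta) * w for theta, w in rule)       # Equation (6.14)
normalised = [(theta, stand_in_joint(theta) / p_aq) for theta, _ in rule]
print(p_aq, math.gamma(A1) * math.gamma(A2))
```

By construction, summing the normalised density values against the same adapted weights returns one, which is what allows the rule to be reused for later integrals over the hyperparameters.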
Multiple methods have been proposed for obtaining the joint approximation \\(\\tilde p(\\mathbf{x} \\, | \\, \\mathbf{y})\\) or the individual marginals \\(\\tilde p(x_i \\, | \\, \\mathbf{y})\\). Four methods are presented below, trading off accuracy against computational expense.

6.1.3.1 Gaussian marginals

Most easily, inferences for the latent field can be obtained by approximation of \\(p(\\mathbf{x} \\, | \\, \\mathbf{y})\\) using another application of the quadrature rule (Håvard Rue and Martino 2007) \\[\\begin{align} p(\\mathbf{x} \\, | \\, \\mathbf{y}) &= \\int p(\\mathbf{x}, \\boldsymbol{\\mathbf{\\theta}} \\, | \\, \\mathbf{y}) \\text{d} \\boldsymbol{\\mathbf{\\theta}} = \\int p(\\mathbf{x} \\, | \\, \\boldsymbol{\\mathbf{\\theta}}, \\mathbf{y}) p(\\boldsymbol{\\mathbf{\\theta}} \\, | \\, \\mathbf{y}) \\text{d} \\boldsymbol{\\mathbf{\\theta}} \\\\ &\\approx |\\hat{\\mathbf{P}}_\\texttt{LA}| \\sum_{\\mathbf{z} \\in \\mathcal{Q}(m, k)} \\tilde p_\\texttt{G}(\\mathbf{x} \\, | \\, \\boldsymbol{\\mathbf{\\theta}}(\\mathbf{z}), \\mathbf{y}) \\tilde p_\\texttt{LA}(\\boldsymbol{\\mathbf{\\theta}}(\\mathbf{z}) \\, | \\, \\mathbf{y}) \\omega(\\mathbf{z}). \\tag{6.15} \\end{align}\\] The quadrature rule \\(\\mathbf{z} \\in \\mathcal{Q}(m, k)\\) is used both internally to normalise the marginal Laplace approximation, and externally to perform integration with respect to the hyperparameters. Equation (6.15) is a mixture of Gaussian distributions \\[\\begin{equation} \\tilde p_\\texttt{G}(\\mathbf{x} \\, | \\, \\boldsymbol{\\mathbf{\\theta}}(\\mathbf{z}), \\mathbf{y}), \\tag{6.16} \\end{equation}\\] each with multinomial probabilities \\[\\begin{equation} \\lambda(\\mathbf{z}) = |\\hat{\\mathbf{P}}_\\texttt{LA}| \\tilde p_\\texttt{LA}(\\boldsymbol{\\mathbf{\\theta}}(\\mathbf{z}) \\, | \\, \\mathbf{y}) \\omega(\\mathbf{z}), \\end{equation}\\] where \\(\\sum \\lambda(\\mathbf{z}) = 1\\) and \\(\\lambda(\\mathbf{z}) > 0\\).
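Because the weights \\(\\lambda(\\mathbf{z})\\) are positive and sum to one, they can be treated as the probabilities of a categorical distribution over the quadrature nodes. The sketch below shows the resulting two-stage draw in Python for a single latent field element. It is illustrative only: the function name is hypothetical, the three weights, means and standard deviations are invented numbers rather than output from any fitted model, and a univariate Gaussian is used in place of the full multivariate \\(\\tilde p_\\texttt{G}\\).

```python
import random

def sample_mixture(lam, means, sds, n, seed=1):
    # lam[j] is the probability of node j, and (means[j], sds[j]) describe
    # the univariate Gaussian attached to node j.
    rng = random.Random(seed)
    # Stage 1: draw a node index for each sample with probabilities lam.
    nodes = rng.choices(range(len(lam)), weights=lam, k=n)
    # Stage 2: draw from the Gaussian belonging to the selected node.
    return [rng.gauss(means[j], sds[j]) for j in nodes]

# Hypothetical three-node example (values invented for illustration).
lam = [0.2, 0.6, 0.2]
means = [-1.0, 0.5, 2.0]
sds = [0.5, 0.5, 0.5]

draws = sample_mixture(lam, means, sds, n=20000)
mixture_mean = sum(l * m for l, m in zip(lam, means))
print(sum(draws) / len(draws), mixture_mean)
```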
Samples may therefore be naturally obtained for the complete vector \\(\\mathbf{x}\\) jointly by first drawing a node \\(\\mathbf{z} \\in \\mathcal{Q}(m, k)\\) with multinomial probabilities \\(\\lambda(\\mathbf{z})\\) then drawing a sample from the corresponding Gaussian distribution in Equation (6.16). Algorithms for fast and exact simulation from a Gaussian distribution have been developed, including by Håvard Rue (2001). The posterior marginals for any subset of the complete vector can simply be obtained by keeping the relevant entries of \\(\\mathbf{x}\\). 6.1.3.2 Laplace marginals An alternative higher accuracy, but more computationally expensive, approach is to calculate a Laplace approximation to the marginal posterior \\[\\begin{equation} \\tilde p_\\texttt{LA}(x_i, \\boldsymbol{\\mathbf{\\theta}}, \\mathbf{y}) = \\frac{p(x_i, \\mathbf{x}_{-i}, \\boldsymbol{\\mathbf{\\theta}}, \\mathbf{y})}{\\tilde p_\\texttt{G}(\\mathbf{x}_{-i} \\, | \\, x_i, \\boldsymbol{\\mathbf{\\theta}}, \\mathbf{y})} \\Big\\rvert_{\\mathbf{x}_{-i} = \\hat{\\mathbf{x}}_{-i}(x_i, \\boldsymbol{\\mathbf{\\theta}})}. 
\\tag{6.17} \\end{equation}\\] Here, the variable \\(x_i\\) is excluded from the Gaussian approximation such that \\[\\begin{equation} \\tilde p_\\texttt{G}(\\mathbf{x}_{-i} \\, | \\, x_i, \\boldsymbol{\\mathbf{\\theta}}, \\mathbf{y}) = \\mathcal{N}(\\mathbf{x}_{-i} \\, | \\, \\hat{\\mathbf{x}}_{-i}(x_i, \\boldsymbol{\\mathbf{\\theta}}), \\hat{\\mathbf{H}}_{-i, -i}(x_i, \\boldsymbol{\\mathbf{\\theta}})), \\end{equation}\\] with \\((N - 1)\\)-dimensional posterior mode \\[\\begin{equation} \\hat{\\mathbf{x}}_{-i}(x_i, \\boldsymbol{\\mathbf{\\theta}}) = \\arg\\max_{\\mathbf{x}_{-i}} \\log p(\\mathbf{y}, x_i, \\mathbf{x}_{-i}, \\boldsymbol{\\mathbf{\\theta}}), \\end{equation}\\] and \\([(N - 1) \\times (N - 1)]\\)-dimensional Hessian matrix evaluated at the posterior mode \\[\\begin{equation} \\hat{\\mathbf{H}}_{-i, -i}(x_i, \\boldsymbol{\\mathbf{\\theta}}) = - \\frac{\\partial^2}{\\partial \\mathbf{x}_{-i} \\partial \\mathbf{x}_{-i}^\\top} \\log p(\\mathbf{y}, x_i, \\mathbf{x}_{-i}, \\boldsymbol{\\mathbf{\\theta}}) \\rvert_{\\mathbf{x}_{-i} = \\hat{\\mathbf{x}}_{-i}(x_i, \\boldsymbol{\\mathbf{\\theta}})}. \\end{equation}\\] The approximate posterior marginal \\(\\tilde p(x_i \\, | \\, \\mathbf{y})\\) may be obtained by normalising the marginal Laplace approximation (Equation (6.17)) before performing integration with respect to the hyperparameters (as in Equation (6.15)). The normalised Laplace approximation is \\[\\begin{equation} \\tilde p_\\texttt{LA}(x_i, \\boldsymbol{\\mathbf{\\theta}} \\, | \\, \\mathbf{y}) = \\frac{\\tilde p_\\texttt{LA}(x_i, \\boldsymbol{\\mathbf{\\theta}}, \\mathbf{y})}{\\tilde p(\\mathbf{y})}, \\end{equation}\\] where either the estimate of the evidence in Equation (6.14) may be reused or a de novo estimate can be computed.
Integration with respect to the hyperparameters is performed via \\[\\begin{align} p(x_i \\, | \\, \\mathbf{y}) &= \\int p(x_i, \\boldsymbol{\\mathbf{\\theta}} \\, | \\, \\mathbf{y}) \\text{d} \\boldsymbol{\\mathbf{\\theta}} \\\\ &\\approx |\\hat{\\mathbf{P}}_\\texttt{LA}| \\sum_{\\mathbf{z} \\in \\mathcal{Q}(m, k)} \\tilde p_\\texttt{LA}(x_i, \\boldsymbol{\\mathbf{\\theta}}(\\mathbf{z}) \\, | \\, \\mathbf{y}) \\omega(\\mathbf{z}). \\tag{6.18} \\end{align}\\] Equation (6.18) is a mixture of the normalised Laplace approximations \\(\\tilde p_\\texttt{LA}(x_i, \\boldsymbol{\\mathbf{\\theta}} \\, | \\, \\mathbf{y})\\) over the hyperparameter quadrature nodes. However, unlike the Gaussian case (Section 6.1.3.1) it is not easy to directly sample each Laplace approximation. As such, Equation (6.18) may instead be represented by its evaluation at a number of nodes. One approach is to choose these nodes based on a one-dimensional AGHQ rule, using the mode and standard deviation of the Gaussian approximation to avoid unnecessary computation of the Laplace marginal mode and standard deviation. The probability density function of the marginal posterior may then be recovered using a Lagrange polynomial or spline interpolant to the log probabilities. An important downside of the Laplace approach is that posterior dependences between posterior marginal draws are not preserved, unlike in the mixture of Gaussians case (Equation (6.15)). Recent work using Gaussian copulas (Chiuchiolo, Niekerk, and Rue 2023) aims to retain the accuracy of the Laplace marginals strategy while obtaining a joint approximation.

6.1.3.3 Simplified Laplace marginals

When the latent field \\(\\mathbf{x}\\) is a Gauss-Markov random field [GMRF; Håvard Rue and Held (2005)] it is possible to efficiently approximate the Laplace marginals in Section 6.1.3.2. The simplified approximation is achieved by a Taylor expansion on the numerator and denominator of Equation (6.17) up to third order.
The approach is analogous to correcting the Gaussian approximation in Section 6.1.3.1 for location and skewness. Details are left to Section 3.2.3 of Håvard Rue, Martino, and Chopin (2009).

6.1.3.4 Simplified INLA

Wood (2020) describes a method for approximating the Laplace marginals without depending on the Markov structure, while still achieving equivalent efficiency. This work was motivated by a setting in which, similar to extended latent Gaussian models [ELGMs; Stringer, Brown, and Stafford (2022)], precision matrices are not typically as sparse as those of GMRFs. Details are left to Wood (2020).

6.1.3.5 Augmenting a noisy structured additive predictor to the latent field

Discussion of INLA is concluded by briefly mentioning a difference in implementation between Håvard Rue, Martino, and Chopin (2009) and Stringer, Brown, and Stafford (2022). Specifically, Håvard Rue, Martino, and Chopin (2009) augment the latent field to include a noisy structured additive predictor as follows \\[\\begin{align} \\boldsymbol{\\mathbf{\\eta}}^\\star &= \\boldsymbol{\\mathbf{\\eta}} + \\boldsymbol{\\mathbf{\\epsilon}}, \\\\ \\boldsymbol{\\mathbf{\\epsilon}} &\\sim \\mathcal{N}(\\mathbf{0}, \\tau^{-1} \\mathbf{I}_n), \\\\ \\mathbf{x}^\\star &= (\\boldsymbol{\\mathbf{\\eta}}^\\star, \\mathbf{x}). \\end{align}\\] Stringer, Brown, and Stafford (2022) (Section 3.2) omit this augmentation, highlighting that it creates difficulties for fitting ELGMs, fitting LGMs to large datasets, and theoretical study of the approximation error. Similarly, in what Van Niekerk et al. (2023) (Section 2.1) refer to as the “modern” formulation of INLA, the latent field is not augmented. The crux of the issue regards the dimensions and sparsity structure of the Hessian matrix \\(\\hat {\\mathbf{H}}(\\boldsymbol{\\mathbf{\\theta}})\\). Details are left to Stringer, Brown, and Stafford (2022). Based on these findings, this thesis does not augment the latent field.
6.1.4 Software

6.1.4.1 R-INLA

The R-INLA software (Martins et al. 2013) implements the INLA method, as well as the stochastic partial differential equation (SPDE) approach of Lindgren, Rue, and Lindström (2011). R-INLA is the R interface to the core inla program, which is written in C (Martino and Rue 2009). Algorithms for sampling from GMRFs are used from the GMRFLib C library (Håvard Rue and Follestad 2001). First and second derivatives are either hard coded, or computed numerically using central finite differences (Fattah, Niekerk, and Rue 2022). For a review of recent computational features of R-INLA, including parallelism via OpenMP (Diaz et al. 2018) and use of the PARDISO sparse linear equation solver (Bollhöfer et al. 2020), see Gaedke-Merzhäuser et al. (2023). Further information about R-INLA, including recent developments, can be found at https://r-inla.org. The connection between the latent field \\(\\mathbf{x}\\) and structured additive predictor \\(\\boldsymbol{\\mathbf{\\eta}}\\) is specified in R-INLA using a formula interface of the form y ~ .... The interface is similar to that used in the lm function in the core stats R package. For example, a model with one fixed effect a and one IID random effect b, has the formula y ~ a + f(b, model = \"iid\"). This interface is easy to engage with for new users, but can be limiting for more advanced users. The approach used to compute the marginals \\(\\tilde p(x_i \\, | \\, \\mathbf{y})\\) can be chosen by setting method to \"gaussian\" (Section 6.1.3.1), \"laplace\" (Section 6.1.3.2) or \"simplified.laplace\" (Section 6.1.3.3). The quadrature grid used can be chosen by setting int.strategy to \"eb\" (empirical Bayes, one quadrature node), \"grid\" (a dense grid), or \"ccd\" [Box-Wilson central composite design; Box and Wilson (1992)]. Figure 6.4 demonstrates the latter two integration strategies. By default, the \"grid\" strategy is used for \\(m \\leq 2\\) and the \"ccd\" strategy is used for \\(m > 2\\).
Various software packages have been built using R-INLA. Perhaps the most substantial is the inlabru R package (Bachl et al. 2019). As well as a simplified syntax, inlabru provides capabilities for fitting more general non-linear structured additive predictor expressions via linearisation and repeat use of R-INLA. These complex model components are specified in inlabru using the bru_mapper system. See the inlabru package vignettes for additional details. Further inference procedures which leverage R-INLA include INLA within MCMC (Gómez-Rubio and Rue 2018) and importance sampling with INLA (Berild et al. 2022). Figure 6.4: Consider the function \\(f(z_1, z_2) = \\text{sn}(0.5 z_1, \\alpha = 2) \\cdot \\text{sn}(0.8 z_1 - 0.5 z_2, \\alpha = -2)\\) as described in Figure 6.3. Panel A shows the grid method as used in R-INLA and detailed in Section 3.1 of Håvard Rue, Martino, and Chopin (2009). Briefly, equally-weighted quadrature points are generated by starting at the mode and taking steps of size \\(\\delta_z\\) along each eigenvector of the inverse curvature at the mode, scaled by the eigenvalues, until the difference in log-scale function evaluations (compared to the mode) is below a threshold \\(\\delta_\\pi\\). Intermediate values are included if they have sufficient log-scale function evaluation. Here, I set \\(\\delta_z = 0.75\\) and \\(\\delta_\\pi = 2\\). Panel B shows a CCD as used in R-INLA and detailed in Section 6.5 of Håvard Rue, Martino, and Chopin (2009). The CCD was generated using the rsm R package (Lenth 2009), and is comprised of: one centre point; four factorial points, used to help estimate linear effects; and four star points, used to help estimate the curvature. 6.1.4.2 TMB Template Model Builder [TMB; Kristensen et al. (2016)] is an R package which implements the Laplace approximation. In TMB, derivatives are obtained using automatic differentiation, also known as algorithmic differentiation [AD; Baydin et al. (2017)]. 
The approach of AD is to decompose any function into a sequence of elementary operations with known derivatives. The known derivatives of the elementary operations may then be composed by repeat use of the chain rule to obtain the function’s derivative. A review of AD and how it can be efficiently implemented is provided by C. C. Margossian (2019). TMB uses the C++ package CppAD (Bell 2023) for AD [Section 3; Kristensen et al. (2016)]. The development of TMB was strongly inspired by the Automatic Differentiation Model Builder [ADMB; Fournier et al. (2012); Bolker et al. (2013)] project. An algorithm is used in TMB to automatically determine matrix sparsity structure [Section 4.2; Kristensen et al. (2016)]. The R package Matrix and C++ package Eigen are then used for sparse and dense matrix calculations. Kristensen et al. (2016) highlight the modular design philosophy of TMB. Models are specified in TMB using a C++ template file which evaluates \\(\\log p(\\mathbf{y}, \\mathbf{x}, \\boldsymbol{\\mathbf{\\theta}})\\) in a Bayesian context or \\(\\log p(\\mathbf{y} \\, | \\, \\mathbf{x}, \\boldsymbol{\\mathbf{\\theta}})\\) in a frequentist setting. Other software packages have been developed which also use TMB C++ templates. The tmbstan R package (Monnahan and Kristensen 2018) allows running the Hamiltonian Monte Carlo (HMC) algorithm via Stan. The aghq R package (Stringer 2021) allows use of AGHQ, and AGHQ over the marginal Laplace approximation, via the mvQuad R package (Weiser 2016). The glmmTMB R package (Brooks et al. 2017) allows specification of common GLMM models via a formula interface. It is also possible to extract the TMB objective function used by glmmTMB, which may then be passed into aghq or tmbstan. A review of the use of TMB for spatial modelling, including comparison to R-INLA, is provided by Osgood-Zimmerman and Wakefield (2023). 
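The decomposition into elementary operations with known derivatives, described at the start of this section, can be sketched using forward-mode AD with dual numbers. The Python below is a toy illustration of the principle only: it is not how CppAD or TMB are implemented, the class and function names are hypothetical, and just the operations needed for the example (addition, subtraction, multiplication and the exponential) are supported. The example differentiates a single Poisson log-likelihood term \\(y \\eta - e^\\eta\\), with constant terms dropped, with respect to \\(\\eta\\).

```python
import math

class Dual:
    # A number value + deriv * eps with eps^2 = 0, so that deriv carries the
    # derivative through every elementary operation.
    def __init__(self, value, deriv=0.0):
        self.value = value
        self.deriv = deriv

    def __add__(self, other):
        other = lift(other)
        return Dual(self.value + other.value, self.deriv + other.deriv)

    __radd__ = __add__

    def __sub__(self, other):
        other = lift(other)
        return Dual(self.value - other.value, self.deriv - other.deriv)

    def __mul__(self, other):
        # Product rule: (u v)' = u' v + u v'.
        other = lift(other)
        return Dual(self.value * other.value,
                    self.deriv * other.value + self.value * other.deriv)

    __rmul__ = __mul__

def lift(x):
    # Treat plain numbers as constants, which have zero derivative.
    return x if isinstance(x, Dual) else Dual(float(x))

def dual_exp(x):
    # Chain rule for the exponential: (e^u)' = e^u u'.
    return Dual(math.exp(x.value), math.exp(x.value) * x.deriv)

def derivative(f, x):
    # Seed the input with derivative one and read off the output derivative.
    return f(Dual(x, 1.0)).deriv

def poisson_loglik(eta, y=3.0):
    # One Poisson log-likelihood term, up to an additive constant.
    return y * eta - dual_exp(eta)

print(derivative(poisson_loglik, 0.5), 3.0 - math.exp(0.5))
```

The two printed numbers agree because the derivative is assembled exactly from the rules for each elementary operation, rather than approximated by finite differences.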
6.1.4.3 Other software The mgcv [Mixed GAM computation vehicle; Wood (2017)] R package estimates generalised additive models (GAMs) specified using a formula interface. This package is briefly mentioned so as to note that the function mgcv::ginla implements the simplified INLA approach of Wood (2020) (Section 6.1.3.4). 6.2 A universal INLA implementation This section is about implementation of the INLA method using AD via the TMB package. Both the Gaussian and Laplace latent field marginal approximations are implemented. The implementation is universal in that it is compatible with any model with a TMB C++ template, rather than based on a restrictive formula interface. The TMB probabilistic programming language is described as “universal” in that it is an extension of the Turing-complete general purpose language C++. Martino and Riebler (2020) note that “implementing INLA from scratch is a complex task” and as a result “applications of INLA are limited to the (large class of) models implemented [in R-INLA]”. A universal INLA implementation facilitates application of the method to models which are not compatible with R-INLA. The Naomi model is one among many examples. Section 5 of Osgood-Zimmerman and Wakefield (2023) notes that “R-INLA is capable of using higher-quality approximations than TMB” (hyperparameter integration and latent field Laplace marginals) and “in return TMB is applicable to a wider class of models”. Yet there is no inherent reason for these capabilities to be in conflict: it is possible to have both high-quality approximations and flexibility. The potential benefits of a more flexible INLA implementation based on AD were noted by Skaug (2009) (a coauthor of TMB) in discussion of Håvard Rue, Martino, and Chopin (2009), who noted that such a system would be “fast, flexible, and easy-to-use”, as well as “automatic from a user’s perspective”. As this suggestion was made close to 15 years ago, it is surprising that its potential remains unrealised. 
I demonstrate the universal implementation with two examples: Section 6.2.1 considers a generalised linear mixed model (GLMM) of an epilepsy drug. The model was used in Section 5.2 of Håvard Rue, Martino, and Chopin (2009), and is compatible with R-INLA. For some parameters there is a notable difference in approximation error depending on use of Gaussian or Laplace marginals. This example demonstrates the correspondence between the Laplace marginal implementation developed in TMB, and that of R-INLA with method set to \"laplace\". Section 6.2.2 considers an extended latent Gaussian model (ELGM) of a tropical parasitic infection. The model was used in Section 5.2 of Bilodeau, Stringer, and Tang (2022), and is not compatible with R-INLA. This example demonstrates the benefit of a more widely applicable INLA implementation.

6.2.1 Epilepsy GLMM

Thall and Vail (1990) considered a GLMM for an epilepsy drug double-blind clinical trial (Leppik et al. 1985). This model was modified by Breslow and Clayton (1993) and widely disseminated as a part of the BUGS [Bayesian inference using Gibbs sampling; D. Spiegelhalter et al. (1996)] manual. Patients \\(i = 1, \\ldots, 59\\) were each assigned either a new drug \\(\\texttt{Trt}_i = 1\\) or a placebo \\(\\texttt{Trt}_i = 0\\). Each patient made four visits to the clinic \\(j = 1, \\ldots, 4\\), and the observations \\(y_{ij}\\) are the number of seizures of the \\(i\\)th person in the two weeks preceding their \\(j\\)th clinic visit (Figure 6.5). The covariates used in the model were baseline seizure counts \\(\\texttt{Base}_i\\), treatment \\(\\texttt{Trt}_i\\), age \\(\\texttt{Age}_i\\), and an indicator for the final clinic visit \\({\\texttt{V}_4}_j\\). Each of the covariates was centred.
The observations were modelled using a Poisson distribution \\[\\begin{equation} y_{ij} \\sim \\text{Poisson}(e^{\\eta_{ij}}), \\end{equation}\\] with structured additive predictor \\[\\begin{align} \\eta_{ij} &= \\beta_0 + \\beta_\\texttt{Base} \\log(\\texttt{Base}_i / 4) + \\beta_\\texttt{Trt} \\texttt{Trt}_i + \\beta_{\\texttt{Trt} \\times \\texttt{Base}} (\\texttt{Trt}_i \\times \\log(\\texttt{Base}_i / 4)) \\\\ &+ \\beta_\\texttt{Age} \\log(\\texttt{Age}_i) + \\beta_{\\texttt{V}_4} {\\texttt{V}_4}_j + \\epsilon_i + \\nu_{ij}, \\quad i \\in [59], \\quad j \\in [4]. \\tag{6.19} \\end{align}\\] The prior distribution on each of the regression parameters, including the intercept \\(\\beta_0\\), was \\(\\mathcal{N}(0, 100^2)\\). The patient \\(\\epsilon_i \\sim \\mathcal{N}(0, 1/\\tau_\\epsilon)\\) and patient-visit \\(\\nu_{ij} \\sim \\mathcal{N}(0, 1/\\tau_\\nu)\\) random effects were IID with gamma precision prior distributions \\(\\tau_\\epsilon, \\tau_\\nu \\sim \\Gamma(0.001, 0.001)\\). Figure 6.5: The number of seizures in the treatment group was fewer, on average, than the number of seizures in the control group. This is not sufficient to conclude that the treatment was effective. The GLMM accounts for differences between the treatment and control group, including in baseline seizures and age, and so can be used to help estimate a causal treatment effect. Table 6.1: The inference methods and software considered to fit the epilepsy GLMM in Section 6.2.1. 
| Section | Method | Software |
| --- | --- | --- |
| Section 6.2.1.1 | Gaussian latent field marginals, EB over hyperparameters | R-INLA |
| Section 6.2.1.1 | Gaussian latent field marginals, grid over hyperparameters | R-INLA |
| Section 6.2.1.1 | Laplace latent field marginals, EB over hyperparameters | R-INLA |
| Section 6.2.1.1 | Laplace latent field marginals, grid over hyperparameters | R-INLA |
| Section 6.2.1.2 | Gaussian latent field marginals, EB over hyperparameters | TMB |
| Section 6.2.1.3 | Gaussian latent field marginals, AGHQ over hyperparameters | TMB and aghq |
| Section 6.2.1.4 | Laplace latent field marginals, EB over hyperparameters | TMB |
| Section 6.2.1.5 | Laplace latent field marginals, AGHQ over hyperparameters | TMB and aghq |
| Section 6.2.1.6 | NUTS | tmbstan |
| Section 6.2.1.7 | NUTS | rstan |

Inference for the epilepsy GLMM was conducted using a range of approaches (Table 6.1). Section 6.2.1.8 compares the results. The foremost objective of this exercise is to demonstrate correspondence between inferences obtained from R-INLA and those from TMB. Furthermore, illustrative code is used throughout this section to enhance understanding of the methods and software used. As such, this section is more verbose than later sections.

6.2.1.1 INLA with R-INLA

The epilepsy data are available from the R-INLA package.
The covariates may be obtained and their transformations centred by:

    library(dplyr)  # provides %>% and mutate()

    centre <- function(x) (x - mean(x))

    Epil <- Epil %>%
      mutate(
        CTrt = centre(Trt),
        ClBase4 = centre(log(Base/4)),
        CV4 = centre(V4),
        ClAge = centre(log(Age)),
        CBT = centre(Trt * log(Base/4))
      )

The structured additive predictor in Equation (6.19) is then specified by:

    formula <- y ~ 1 + CTrt + ClBase4 + CV4 + ClAge + CBT +
      f(rand, model = "iid", hyper = tau_prior) +
      f(Ind, model = "iid", hyper = tau_prior)

The object tau_prior specifies the \\(\\Gamma(0.001, 0.001)\\) precision prior:

    tau_prior <- list(prec = list(
      prior = "loggamma", param = c(0.001, 0.001), initial = 1, fixed = FALSE
    ))

The prior is specified as loggamma because R-INLA represents the precision internally on the log scale, to avoid any \\(\\tau > 0\\) constraints. Inference may then be performed, specifying the latent field posterior marginals approach strat and quadrature approach int_strat:

    beta_prior <- list(mean = 0, prec = 1 / 100^2)

    epil_inla <- function(strat, int_strat) {
      inla(
        formula,
        control.fixed = beta_prior,
        family = "poisson",
        data = Epil,
        control.inla = list(strategy = strat, int.strategy = int_strat),
        control.predictor = list(compute = TRUE),
        control.compute = list(config = TRUE)
      )
    }

The object beta_prior specifies the \\(\\mathcal{N}(0, 100^2)\\) regression coefficient prior. The Poisson likelihood is specified via the family argument. Inferences may then be obtained via the fit object:

    fit <- epil_inla(strat = "gaussian", int_strat = "grid")

As described in Section 6.1.4.1, strat may be set to one of \"gaussian\", \"laplace\", or \"simplified.laplace\" and int_strat may be set to one of \"eb\", \"grid\", or \"ccd\".

6.2.1.2 Gaussian marginals and EB with TMB

With TMB, the log-posterior of the model is specified using a C++ template. For simple models, writing this template is usually a more involved task than specifying the formula object required for R-INLA.
The TMB C++ template epil.cpp for the epilepsy GLMM is in Appendix C.1.1. This template specifies exactly the same model as R-INLA in Section 6.2.1.1. It is not trivial to do this, because each detail of the model must match. Lines with a DATA prefix specify the fixed data inputs to be passed to TMB. For example, the data \\(\\mathbf{y}\\) are passed via:

    DATA_VECTOR(y);

Lines with a PARAMETER prefix specify the parameters \\(\\boldsymbol{\\mathbf{\\phi}} = (\\mathbf{x}, \\boldsymbol{\\mathbf{\\theta}})\\) to be estimated. For example, the regression coefficients \\(\\boldsymbol{\\mathbf{\\beta}}\\) are specified by:

    PARAMETER_VECTOR(beta);

It is recommended to specify all parameters on the real scale to help performance of the optimisation procedure. More familiar versions of parameters, such as the precision rather than log precision, may be created outside the PARAMETER section. Lines of the form nll -= ddist(...) increment the negative log-posterior, where dist is the name of a distribution. For example, the Gaussian prior distributions on \\(\\boldsymbol{\\mathbf{\\beta}}\\) are implemented by:

    nll -= dnorm(beta, Type(0), Type(100), true).sum();

In R, the TMB user template may now be compiled and linked:

    compile("epil.cpp")
    dyn.load(dynlib("epil"))

An objective function obj implementing \\(\\tilde p_{\\texttt{LA}}(\\boldsymbol{\\mathbf{\\theta}}, \\mathbf{y})\\) and its first and second derivatives may then be created:

    obj <- TMB::MakeADFun(
      data = dat,
      parameters = param,
      random = c("beta", "epsilon", "nu"),
      DLL = "epil"
    )

The object dat is a list of data inputs passed to TMB. The object param is a list of parameter starting values passed to TMB. The argument random determines which parameters are to be integrated out with a Gaussian approximation, here set to c(\"beta\", \"epsilon\", \"nu\").
Mathematically, these parameters correspond to the latent field \\[\\begin{equation} (\\beta_0, \\beta_\\texttt{Base}, \\beta_\\texttt{Trt}, \\beta_{\\texttt{Trt} \\times \\texttt{Base}}, \\beta_\\texttt{Age}, \\beta_{\\texttt{V}_4}, \\epsilon_1, \\ldots, \\epsilon_{59}, \\nu_{1,1}, \\ldots, \\nu_{59,4}) = (\\boldsymbol{\\mathbf{\\beta}}, \\boldsymbol{\\mathbf{\\epsilon}}, \\boldsymbol{\\mathbf{\\nu}}) = \\mathbf{x}. \\end{equation}\\]

The objective function obj may then be optimised using a gradient based optimiser to obtain \\(\\hat{\\boldsymbol{\\mathbf{\\theta}}}_\\texttt{LA}\\). Here I use a quasi-Newton method (Dennis Jr, Gay, and Walsh 1981) as implemented by nlminb from the stats R package, making use of the first derivative obj$gr of the objective function:

    opt <- nlminb(
      start = obj$par,
      objective = obj$fn,
      gradient = obj$gr,
      control = list(iter.max = 1000, trace = 0)
    )

The sdreport function is used to evaluate the Hessian matrix of the parameters at a particular value. Typically, these Hessian matrices are for the hyperparameters, and based on the marginal Laplace approximation. Setting par.fixed to the previously obtained opt$par returns \\(\\hat{\\boldsymbol{\\mathbf{H}}}_\\texttt{LA}\\). However, by setting getJointPrecision = TRUE the full Hessian matrix for the hyperparameters and latent field together is returned:

    sd_out <- TMB::sdreport(
      obj,
      par.fixed = opt$par,
      getJointPrecision = TRUE
    )

Figure 6.6: A submatrix of the full parameter Hessian obtained from TMB::sdreport with getJointPrecision = TRUE on the log scale. Entries for the latent field parameters \\(\\boldsymbol{\\mathbf{\\epsilon}}\\) and \\(\\boldsymbol{\\mathbf{\\nu}}\\) are omitted due to their respective lengths of 59 and 236. Light grey entries correspond to zeros on the real scale, which cannot be log transformed.
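As a minimal sketch of how a joint precision matrix of this kind might be used, Gaussian marginal standard deviations are the square roots of the diagonal of its inverse. The 3 by 3 matrix Q below is a made-up stand-in for the matrix returned in sd_out$jointPrecision, not output from the epilepsy model, and the parameter labels are for illustration only.

```r
# Toy stand-in for a joint precision matrix (hypothetical values)
labels <- c("beta", "epsilon", "l_tau")
Q <- matrix(
  c( 4, -1,  0,
    -1,  3, -1,
     0, -1,  2),
  nrow = 3, byrow = TRUE,
  dimnames = list(labels, labels)
)

# The covariance matrix is the inverse of the precision matrix
Sigma <- solve(Q)

# Marginal standard deviations account for dependence between parameters
marginal_sd <- sqrt(diag(Sigma))

# Inverting only the diagonal ignores dependence and understates uncertainty
naive_sd <- 1 / sqrt(diag(Q))
```

For a model with a large latent field the precision matrix is sparse, so in practice a sparse solver would likely be preferred to a dense call to solve.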
Note that the epilepsy GLMM may also be succinctly fit in a frequentist setting (that is, using improper hyperparameter priors \\(p(\\boldsymbol{\\mathbf{\\theta}}) \\propto 1\\)) using the formula interface provided by glmmTMB:

    fit <- glmmTMB(
      y ~ 1 + CTrt + ClBase4 + CV4 + ClAge + CBT + (1 | rand) + (1 | Ind),
      data = Epil,
      family = poisson(link = "log")
    )

6.2.1.3 Gaussian marginals and AGHQ with TMB

The objective function obj created in Section 6.2.1.2 may be directly passed to aghq to perform inference by integrating the marginal Laplace approximation over the hyperparameters using AGHQ. The argument k specifies the number of quadrature nodes to be used per hyperparameter dimension. Here there are two hyperparameters \\(\\boldsymbol{\\mathbf{\\theta}} = (\\tau_\\epsilon, \\tau_\\nu)\\), and k is set to three, such that in total there are \\(3^2 = 9\\) quadrature nodes:

    init <- c(param$l_tau_epsilon, param$l_tau_nu)
    fit <- aghq::marginal_laplace_tmb(obj, k = 3, startingvalue = init)

Draws from the mixture of Gaussians approximating the latent field posterior distribution (Equation (6.15)) can be obtained by passing the fitted object to sample_marginal:

    samples <- aghq::sample_marginal(fit, M = 1000)$samps

For a more complete aghq vignette, see Stringer (2021).

6.2.1.4 Laplace marginals and EB with TMB

The Laplace latent field marginal \\(\\tilde p_\\texttt{LA}(x_i, \\boldsymbol{\\mathbf{\\theta}}, \\mathbf{y})\\) may be obtained using TMB by setting random to \\(\\mathbf{x}_{-i}\\) in the MakeADFun function call to approximate \\(p(\\mathbf{x}_{-i} \\, | \\,x_i, \\boldsymbol{\\mathbf{\\theta}}, \\mathbf{y})\\) with a Gaussian distribution. However, it is not directly possible to do this, because the random argument takes a vector of strings as input (e.g. c(\"beta\", \"epsilon\", \"nu\")) and does not have a native method for indexing.
Instead, I took the following steps to modify the TMB C++ template and enable the desired indexing:

1. Include DATA_INTEGER(i) to pass the index \\(i\\) to TMB via the data argument of MakeADFun.
2. Concatenate the latent field to PARAMETER_VECTOR(x_minus_i) and PARAMETER(x_i) such that random can be set to x_minus_i in the call to MakeADFun.
3. Include DATA_IVECTOR(x_lengths) and DATA_IVECTOR(x_starts) to pass the (integer) start point and lengths of each subvector of \\(\\mathbf{x}\\) via the data argument of MakeADFun. The \\(j\\)th subvector may then be obtained from within the TMB template via x.segment(x_starts(j), x_lengths(j)).

The modified TMB C++ template epil_modified.cpp for the epilepsy GLMM is in Appendix C.1.2, and may be compared to the unmodified version to provide an example of implementing the above steps. After suitable alterations are made to dat and param, it is then possible to obtain the desired objective function in TMB via:

    compile("epil_modified.cpp")
    dyn.load(dynlib("epil_modified"))

    obj_i <- MakeADFun(
      data = dat,
      parameters = param,
      random = "x_minus_i",
      DLL = "epil_modified",
      silent = TRUE
    )

This section takes an EB approach, fixing the hyperparameters to their modal value \\(\\boldsymbol{\\mathbf{\\theta}} = \\hat{\\boldsymbol{\\mathbf{\\theta}}}_\\texttt{LA}\\) obtained previously in opt. The latent field marginals approximation is then directly proportional to the unnormalised Laplace approximation obtained above as obj_i, evaluated at \\((x_i, \\hat{\\boldsymbol{\\mathbf{\\theta}}}_\\texttt{LA})\\) \\[\\begin{align} \\tilde p(x_i \\, | \\, \\mathbf{y}) &\\approx \\tilde p_\\texttt{LA}(x_i \\, | \\, \\hat{\\boldsymbol{\\mathbf{\\theta}}}_\\texttt{LA}, \\mathbf{y}) \\tilde p_\\texttt{LA}(\\hat{\\boldsymbol{\\mathbf{\\theta}}}_\\texttt{LA} \\, | \\, \\mathbf{y}) \\\\ &\\propto \\tilde p_\\texttt{LA}(x_i, \\hat{\\boldsymbol{\\mathbf{\\theta}}}_\\texttt{LA}, \\mathbf{y}).
\\end{align}\\]

This expression may be evaluated at a set of GHQ nodes \\(z \\in \\mathcal{Q}(1, l)\\) adapted \\(z \\mapsto x_i(z)\\) based on the mode and standard deviation of the Gaussian marginal. Here, \\(l = 5\\) quadrature nodes were chosen to allow spline interpolation of the resulting log-posterior. Each evaluation of obj_i, which involves an inner optimisation loop to compute the Laplace approximation, can be initialised by \\(\\mathbf{x}_{-i}\\) set to the mode of the full \\(N\\)-dimensional Gaussian approximation \\(p_\\texttt{G}(\\mathbf{x} \\, | \\, \\hat{\\boldsymbol{\\mathbf{\\theta}}}_\\texttt{LA}, \\mathbf{y})\\) with the \\(i\\)th entry removed \\(\\hat{\\mathbf{x}}(\\boldsymbol{\\mathbf{\\theta}})_{-i}\\). This is an efficient approach because the \\((N - 1)\\)-dimensional posterior mode, with \\(x_i\\) fixed, is likely to be similar to the \\(N\\)-dimensional posterior mode with the \\(i\\)th entry removed. A normalised posterior can be obtained by computing a de novo posterior normalising constant based on the set of evaluated \\(l\\) quadrature nodes.

This approach requires creation of the objective function obj_i for \\(i = 1, \\ldots, N\\). Each of these functions is then evaluated at a set of \\(l\\) quadrature nodes. It is inefficient to run MakeADFun from scratch for each \\(i\\), when only one data input i is changing. TMB does have a DATA_UPDATE macro, which would allow changing of data “on the R side” without retaping via:

    obj_i$env$data$i <- i

Although this approach would be more efficient, if else statements on data items which can be updated (as used in epil_modified.cpp) are not supported, so this is not yet possible.

6.2.1.5 Laplace marginals and AGHQ with TMB

The approach taken in Section 6.2.1.4 may be extended by integrating the marginal Laplace approximation with respect to the hyperparameters.
To perform this integration, the quadrature nodes used to integrate \\(p_\\texttt{LA}({\\boldsymbol{\\mathbf{\\theta}}}, \\mathbf{y})\\) may be reused. The latent field marginal approximation is then \\[\\begin{equation} \\tilde p(x_i \\, | \\, \\mathbf{y}) \\propto \\sum_{\\mathbf{z} \\in \\mathcal{Q}(m, k)} \\tilde p_\\texttt{LA}(x_i, \\boldsymbol{\\mathbf{\\theta}}(\\mathbf{z}), \\mathbf{y}) \\omega(\\mathbf{z}). \\end{equation}\\] As in Section 6.2.1.4 this expression may be evaluated at a set of \\(l\\) quadrature nodes, and normalised de novo. Each objective function inner optimisation can be initialised using the mode \\(\\hat{\\mathbf{x}}(\\boldsymbol{\\mathbf{\\theta}}(\\mathbf{z}))_{-i}\\) of \\(p_\\texttt{G}(\\mathbf{x} \\, | \\, \\boldsymbol{\\mathbf{\\theta}}(\\mathbf{z}), \\mathbf{y})\\). Integration over the hyperparameters requires each of the \\(N\\) objective functions to be evaluated at \\(k \\times l\\) points, rather than the \\(1 \\times l\\) points required in the EB approach. The complete algorithm is given in Appendix C.3.

6.2.1.6 NUTS with tmbstan

Running NUTS with tmbstan using the objective function obj is straightforward:

    fit <- tmbstan::tmbstan(obj = obj, chains = 4, laplace = FALSE)

As specified above, the objective function with no marginal Laplace approximation is used. To instead use the marginal Laplace approximation, set laplace = TRUE. Four chains of 2000 iterations, with the first 1000 iterations from each chain discarded as warm-up, were run. Convergence diagnostics are in Appendix C.1.4.1.

6.2.1.7 NUTS with rstan

To gauge the efficiency of tmbstan relative to rstan, the epilepsy model was also implemented in Stan. The Stan template epil.stan for the epilepsy GLMM is in Appendix C.1.3. This may be of interest to users familiar with Stan syntax, to help provide context for TMB. The Stan template was validated to be equivalent to the TMB template up to a constant of proportionality.
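One way such a check might be carried out (a sketch only, and not necessarily the procedure used here) is to evaluate both unnormalised log-posteriors at several parameter values and confirm that their difference does not vary. The two functions below are hypothetical stand-ins for the TMB and Stan log-densities: the same Gaussian log-density written with and without its normalising constant.

```r
set.seed(1)

# Hypothetical stand-ins for two implementations of one log-posterior
log_post_a <- function(x) sum(dnorm(x, mean = 0, sd = 2, log = TRUE))
log_post_b <- function(x) -sum(x^2) / (2 * 2^2)

# Evaluate both functions at random parameter vectors of length three
test_points <- replicate(20, rnorm(3), simplify = FALSE)
diffs <- vapply(
  test_points,
  function(x) log_post_a(x) - log_post_b(x),
  numeric(1)
)

# The densities are proportional if the log difference is constant
constant_offset <- diff(range(diffs)) < 1e-10
```

With TMB, the first function could wrap the negated obj$fn, since obj$fn returns the negative log-posterior. Both implementations would need to be evaluated on the same parameter scale for the comparison to be meaningful.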
Inferences from Stan may be obtained by:

    fit <- rstan::stan(file = "epil.stan", data = dat, chains = 4)

As for tmbstan, four chains of 2000 iterations, with 1000 iterations of burn-in, were run. Convergence diagnostics are in Appendix C.1.4.2.

6.2.1.8 Comparison

Figure 6.7: Percentage difference in posterior summary estimate obtained from NUTS as compared to that obtained from a Gaussian (Section 6.2.1.3) or Laplace marginal (Section 6.2.1.5) with AGHQ over the hyperparameters. NUTS results were obtained with tmbstan. Results from R-INLA and TMB are similar, especially for the posterior mean, but do differ in places. Differences could be attributable to bias corrections used in R-INLA.

Figure 6.8: The ECDF and ECDF difference for the \\(\\beta_0\\) latent field parameter. For this parameter, the Gaussian marginal results are inaccurate, and are corrected almost entirely by the Laplace marginal. An ECDF difference of zero corresponds to obtaining exactly the same results as NUTS, taken to be the gold-standard. Crucially, results obtained using R-INLA and TMB implementations are similar.

Posterior means and standard deviations for the six regression parameters \\(\\boldsymbol{\\mathbf{\\beta}}\\) from the inference methods implemented in TMB (Sections 6.2.1.2, 6.2.1.3, 6.2.1.4, and 6.2.1.5) were highly similar to their R-INLA analogues in Section 6.2.1.1 (Figure 6.7). Posterior distributions obtained were also similar. Figure 6.8 shows ECDF difference plots for Gaussian or Laplace marginals from TMB and R-INLA (as compared with results from NUTS implemented in tmbstan) for \\(\\beta_0\\). These results provide evidence that the implementation of INLA in TMB is correct.

Figure 6.9: The number of seconds taken to perform inference for the epilepsy GLMM using each method and software implementation given in Table 6.1.

Figure 6.9 shows the number of seconds taken to fit the epilepsy GLMM for each approach.
Gaussian marginals with either EB or AGHQ via TMB were the fastest approach. All of the approaches using R-INLA took a similar amount of time. The approaches using TMB to implement Laplace marginals were slower than their equivalent in R-INLA. The TMB implementation is relatively naive, based on a simple for loop, and does not use the more advanced approximations of R-INLA. Laplace marginals in TMB with AGHQ (\\(k^2 = 3^2 = 9\\) quadrature nodes) took 3.4 times as long as Laplace marginals in TMB with EB (\\(k^2 = 1^2 = 1\\) quadrature node).

For this problem, the tmbstan implementation of NUTS took 38.9% of the time taken by the rstan implementation. Diagnostics (Figures C.1 and C.2) show that both implementations converged. Monnahan and Kristensen (2018) (Supporting information) found runtime with rstan and tmbstan to be comparable, so the relatively large difference in this case is surprising.

6.2.2 Loa loa ELGM

Figure 6.10: Empirical prevalence of Loa loa in 190 sampled villages in Cameroon and Nigeria. The map in Panel A shows the village locations, empirical prevalences, presence of zeros, and sample sizes. The zeros are typically located in close proximity to each other. The histogram in Panel B shows the empirical prevalences, and high number of zeros.

Bilodeau, Stringer, and Tang (2022) considered an ELGM for the prevalence of the parasitic worm Loa loa. Counts of cases \\(y_i \\in \\mathbb{N}^{+}\\) from a sample of size \\(n_i \\in \\mathbb{N}^{+}\\) were obtained from field studies in \\(n = 190\\) villages in Cameroon and Nigeria [Schlüter et al. (2016); Figure 6.10]. Some areas are thought to be unsuitable for disease transmission, and possibly as a result there is a relatively high number of villages with zero prevalence.
To account for the possibility of structural zeros, following Diggle and Giorgi (2016), a zero-inflated binomial likelihood was used \\[\\begin{equation} p(y_i) = (1 - \\phi(s_i)) \\mathbb{I}(y_i = 0) + \\phi(s_i) \\text{Bin}(y_i \\, | \\, n_i, \\rho(s_i)) \\tag{6.20} \\end{equation}\\] where \\(s_i \\in \\mathbb{R}^2\\) is the village location, \\(\\phi(s_i) \\in [0, 1]\\) is the suitability probability, and \\(\\rho(s_i) \\in [0, 1]\\) is the disease prevalence. The prevalence and suitability were modelled jointly using logistic regressions \\[\\begin{align} \\text{logit}[\\phi(s)] &= \\beta_\\phi + u(s), \\\\ \\text{logit}[\\rho(s)] &= \\beta_\\rho + v(s). \\end{align}\\] The two regression coefficients \\(\\beta_\\phi\\) and \\(\\beta_\\rho\\) were given diffuse Gaussian prior distributions \\[\\begin{equation} \\beta_\\phi, \\beta_\\rho \\sim \\mathcal{N}(0, 1000). \\end{equation}\\] Independent Gaussian processes \\(u(s)\\) and \\(v(s)\\) were specified by a Matérn kernel (Stein 1999) with shared hyperparameters. Gamma penalised complexity (Simpson et al. 2017; Fuglstad et al. 2019) prior distributions were used for the standard deviation \\(\\sigma\\) and range \\(\\rho\\) hyperparameters such that (Brown 2015) \\[\\begin{align} \\mathbb{P}(\\sigma < 4) &= 0.975, \\\\ \\mathbb{P}(\\rho < 200\\text{km}) &= 0.975. \\end{align}\\] The smoothness parameter \\(\\nu\\) was fixed to 1. The zero-inflated likelihood in Equation (6.20) is not compatible with R-INLA. Section 2.2 of Brown (2015) demonstrates use of R-INLA to fit a simpler LGM model which includes covariates. Instead, Bilodeau, Stringer, and Tang (2022) implemented this model in TMB. Inference was then performed using Gaussian marginals and AGHQ via aghq and NUTS via tmbstan. This section considers inference using three approaches (Table 6.2), extending Bilodeau, Stringer, and Tang (2022) by including AGHQ with Laplace marginals. 
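To make the zero-inflated likelihood in Equation (6.20) concrete, the following sketch evaluates it for a single village in base R. The function name zib_loglik and the values of n, phi, and rho are assumptions chosen for illustration, and this is not the TMB template used for the model.

```r
# Log-likelihood of Equation (6.20) for one village
# phi: suitability probability, rho: disease prevalence
zib_loglik <- function(y, n, phi, rho) {
  log((1 - phi) * (y == 0) + phi * dbinom(y, size = n, prob = rho))
}

# Hypothetical values for illustration
n <- 20
phi <- 0.7
rho <- 0.15

# The probabilities over all possible counts sum to one
total <- sum(exp(zib_loglik(0:n, n, phi, rho)))

# Zero inflation raises the probability of observing no cases
p_zero_zib <- exp(zib_loglik(0, n, phi, rho))
p_zero_bin <- dbinom(0, size = n, prob = rho)
```

Setting phi to one recovers the ordinary binomial likelihood, which is one quick way to check an implementation of this kind.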
Table 6.2: The inference methods and software considered to fit the Loa loa ELGM in Section 6.2.2.

| Method | Software | Details |
| --- | --- | --- |
| Gaussian, AGHQ | TMB and aghq | \\(k = 3\\) |
| Laplace, AGHQ | TMB and aghq | \\(k = 3\\) |
| NUTS | tmbstan | 4 chains of 5000 iterations, with default NUTS settings as implemented in rstan (Carpenter et al. 2017) |

Bilodeau, Stringer, and Tang (2022) found that NUTS did not converge for the full model, but did converge when the values of \\(\\beta_\\phi\\) and \\(\\beta_\\rho\\) were fixed at their posterior mode (obtained using AGHQ with Gaussian marginals). To allow for comparison between Gaussian and Laplace marginals, the same approach was taken here.

After obtaining posterior inferences at each \\(s_i\\), the gstat::krige function (E. J. Pebesma 2004) was used to implement conditional Gaussian field simulation [E. Pebesma and Bivand (2023); Chapter 12] over a fine spatial grid. Independent latent field and hyperparameter samples were used in each conditional simulation. For each method (Table 6.2) 500 conditional Gaussian field simulations were obtained.

6.2.2.1 Results

Figure 6.11: Posterior mean of the suitability \\(\\mathbb{E}[\\phi_\\texttt{LA}(s)]\\) (Panel A) and prevalence \\(\\mathbb{E}[\\rho_\\texttt{LA}(s)]\\) (Panel B) random fields computed using Laplace marginals. Inferences over this fine spatial grid were obtained using conditional Gaussian field simulation as implemented by gstat::krige.

Figure 6.11 shows the suitability and prevalence posterior means across the fine grid obtained using AGHQ with Laplace marginals.

Figure 6.12: Difference between the suitability posterior means with Gaussian marginals \\(\\mathbb{E}[\\phi_\\texttt{G}(s)]\\) and Laplace marginals \\(\\mathbb{E}[\\phi_\\texttt{LA}(s)]\\) to NUTS results. While the Gaussian approximation appears to systematically underestimate suitability, results from the Laplace approximation are substantially closer to results from NUTS.
As \\(\\beta_\\phi\\) was fixed, differences in approximation accuracy between the Gaussian and Laplace approximations of \\(\\phi(s)\\) are due only to differences in estimation of \\(u(s)\\). The diverging colour palette used in this figure is from Thyng et al. (2016).

Figure 6.13: Difference between the prevalence posterior means with Gaussian marginals \\(\\mathbb{E}[\\rho_\\texttt{G}(s)]\\) and Laplace marginals \\(\\mathbb{E}[\\rho_\\texttt{LA}(s)]\\) to NUTS results. Like the suitability in Figure 6.12, the error of the Gaussian approximation is higher than that of the Laplace approximation. As \\(\\beta_\\rho\\) was fixed, this difference is a result of differences in estimation of \\(v(s)\\). The diverging colour palette used in this figure is from Thyng et al. (2016).

Figure 6.14: Absolute difference between the Gaussian and Laplace marginal posterior means and standard deviations to NUTS results at each \\(u(s_i), v(s_i): i \\in [190]\\). Relative differences are in Figure C.4. For close to every node, the Laplace approximation produced a more accurate posterior mean than the Gaussian approximation. For the posterior standard deviation (SD), the picture was more mixed.

Figure 6.15: The element of the latent field with maximum difference in absolute difference to NUTS for the posterior mean was \\(u_{184}\\). While the Gaussian approximation has substantial error as compared with NUTS, the Laplace approximation is a close match.

For both the suitability and prevalence posterior mean, using Laplace marginals rather than Gaussian marginals substantially reduced error compared to NUTS (Figures 6.12 and 6.13). As the hyperparameter posteriors for each approach were the same, differences in Gaussian field simulation results were due to differences in latent field posterior marginals at each of the 190 sites, shown in Figure 6.14. At some sites, the differences in ECDF were substantial (Figure 6.15).
This improvement was obtained even though draws from the Laplace marginals do not take posterior dependences into account, unlike the draws from the mixture of Gaussians used to construct the Gaussian marginals. Figure C.3 shows that the results from NUTS were suitable for use, and therefore that this comparison is valid.

Figure 6.16: The number of minutes taken to perform inference for the Loa loa ELGM using each approach given in Table 6.2.

Laplace marginals with AGHQ took 12% of the time taken (23.1 hours) by NUTS (Figure 6.16). That said, Gaussian marginals with AGHQ took less than a minute to run: substantially less than the 2.77 hours taken by the Laplace marginals. A less naive Laplace implementation may achieve a runtime more competitive with the Gaussian.

6.3 The Naomi model

The work in this chapter was conducted in search of a fast and accurate Bayesian inference method for the Naomi model (Eaton et al. 2021). Software has been developed for Naomi to allow countries to input their data and interactively generate estimates during week-long workshops as a part of a yearly process supported by UNAIDS. Generation of estimates by country teams, rather than external agencies or researchers, is an important and distinctive feature of the HIV response. Drawing on expertise closest to the data being modelled improves the accuracy of the process, as well as strengthening trust in the resulting estimates, creating a virtuous cycle of data quality, use and ownership (Noor 2022).

To allow interactive review and iteration of model results by workshop participants, any inference procedure for Naomi should ideally be fast and have low memory usage. Additionally, it should be reliable and automatic, across a range of country settings. Naomi is a complex model, composed of multiple linked generalised linear mixed models (GLMMs), and as such these requirements present a challenging Bayesian inference problem.
This section begins (Section 6.3.1) by describing a simplified version of Naomi. The model is simplified in that it is defined only at the time of the most recent household survey with HIV testing. The nowcasting and temporal projection components of the complete model are omitted. These time points play a limited role in inference as they correspond to a small proportion of the total data. As such, findings about inference for the simplified model are likely transferable to the complete model. Description of some features of the simplified model is left to the more exhaustive Appendix C.4. After outlining the model, Section 6.3.2 explains why it is an ELGM (Stringer, Brown, and Stafford 2022) rather than an LGM (Håvard Rue, Martino, and Chopin 2009).

6.3.1 Model structure

Naomi synthesises data from three sources to estimate HIV indicators at the district level, by age and sex. It may be described as having three components, corresponding to these three data sources. The model components are: the household survey component (Section 6.3.1.2); the antenatal care (ANC) clinic testing component (Section 6.3.1.3); the antiretroviral therapy (ART) attendance component (Section 6.3.1.4). After specifying common notation used throughout the model (Section 6.3.1.1) each of these components is described in turn.

6.3.1.1 Notation

Consider a country in sub-Saharan Africa where a household survey with complex design has taken place. Let \\(x \\in \\mathcal{X}\\) index district, \\(a \\in \\mathcal{A}\\) index five-year age group, and \\(s \\in \\mathcal{S}\\) index sex. For ease of notation, let \\(i\\) index the finest district-age-sex division included in the model. (A district-age-sex specific quantity \\(z_{x,a,s}\\) may then be written as \\(z_i\\). When required the district, age, and sex corresponding to the index \\(i\\) may be recovered by \\(x(i) = x\\), \\(a(i) = a\\), and \\(s(i) = s\\).)
Let: \\(N_i \\in \\mathbb{N}\\) be the known, fixed population size; \\(\\rho_i \\in [0, 1]\\) be the HIV prevalence; \\(\\alpha_i \\in [0, 1]\\) be the ART coverage; \\(\\kappa_i \\in [0, 1]\\) be the proportion recently infected among HIV positive persons; \\(\\lambda_i > 0\\) be the annual HIV incidence rate. Some observations are made at an aggregate level over a collection of strata \\(i\\) rather than for a single \\(i\\). Let \\(I \\subseteq \\mathcal{X} \\times \\mathcal{A} \\times \\mathcal{S}\\) be a set of indices \\(i\\) for which an aggregate observation is reported. The set of all \\(I\\) is denoted \\(\\mathcal{I}\\) such that \\(I \\in \\mathcal{I}\\). 6.3.1.2 Household survey component Independent logistic regression models are specified for HIV prevalence and ART coverage in the general population. Without giving the linear predictors in detail, these models are specified by \\[\\begin{equation} \\text{logit}(\\rho_i) = \\eta^\\rho_i, \\tag{6.21} \\end{equation}\\] and \\[\\begin{equation} \\text{logit}(\\alpha_i) = \\eta^\\alpha_i. \\tag{6.22} \\end{equation}\\] HIV incidence rate is modelled on the log scale as \\[\\begin{equation} \\log(\\lambda_i) = \\eta^\\lambda_i. \\tag{6.23} \\end{equation}\\] The structured additive predictor \\(\\eta^\\lambda_i\\) includes terms for adult HIV prevalence and adult ART coverage. The proportion recently infected among HIV positive persons is linked to HIV incidence via \\[\\begin{equation} \\kappa_i = 1- \\exp \\left( - \\lambda_i \\cdot \\frac{1 - \\rho_i}{\\rho_i} \\cdot (\\Omega_T - \\beta_T) - \\beta_T \\right), \\tag{6.24} \\end{equation}\\] where the mean duration of recent infection \\(\\Omega_T\\) and the proportion of long-term HIV infections misclassified as recent \\(\\beta_T\\) are set based on informative priors for the particular HIV test used. The three processes in Equations (6.21), (6.22), and (6.23) are each primarily informed by household survey data. 
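As a small numerical illustration of Equation (6.24), the sketch below computes the proportion recently infected for a few incidence rates. The function name recent_proportion and all input values are hypothetical, chosen only to show the shape of the relationship, and are not Naomi defaults.

```r
# Equation (6.24): proportion recently infected among HIV positive persons
recent_proportion <- function(lambda, rho, omega_T, beta_T) {
  1 - exp(-lambda * (1 - rho) / rho * (omega_T - beta_T) - beta_T)
}

# Hypothetical inputs, for illustration only
kappa <- recent_proportion(
  lambda = c(0.001, 0.005, 0.01),  # annual HIV incidence rates
  rho = 0.1,                       # HIV prevalence
  omega_T = 0.36,                  # mean duration of recent infection
  beta_T = 0                       # no long-term infections misclassified
)
```

Holding prevalence fixed, a higher incidence rate gives a larger recent proportion whenever omega_T exceeds beta_T, which is how recency data can inform the incidence rate.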
Let \\(j\\) denote a surveyed individual, in district-age-sex strata \\(i(j)\\). Weighted aggregate survey observations are calculated based on individual responses \\(\\theta_j \\in \\{0, 1\\}\\) as \\[\\begin{equation} \\hat \\theta_I = \\frac{\\sum_{i(j) \\in I} w_j \\cdot\\theta_j}{\\sum_{i(j) \\in I} w_j}. \\tag{6.25} \\end{equation}\\] Survey weights \\(w_j\\) for each of \\(\\theta \\in \\{\\rho, \\alpha, \\kappa\\}\\) are supplied by the survey provider. These weights aim to reduce bias by decreasing possible correlation between response and recording mechanism (Meng 2018). The weighted aggregate numbers of outcomes are obtained by multiplying Equation (6.25) by the Kish effective sample size [ESS; Kish (1965)] \\[\\begin{equation} y^{\\theta}_{I} = m^{\\theta}_{I} \\hat \\theta_{I}, \\tag{6.26} \\end{equation}\\] where \\[\\begin{equation} m^{\\theta}_I = \\frac{\\left(\\sum_{i(j) \\in I} w_j\\right)^2}{\\sum_{i(j) \\in I} w_j^2}. \\tag{6.27} \\end{equation}\\] As the Kish ESS is maximised by constant survey weights, in exchange for reducing bias, survey weighting increases variance.

Equations (6.25) and (6.27) are slightly imprecise in that the notation used does not reflect the fact that \\(j\\) only runs over individuals within the relevant denominator. In particular, for ART coverage \\(\\alpha\\) and the proportion recently infected among HIV positive persons \\(\\kappa\\), only those individuals who are HIV positive are included in the set. The denominator for HIV prevalence \\(\\rho\\) includes all individuals.

The weighted aggregate numbers of outcomes are modelled using a binomial working likelihood (Chen, Wakefield, and Lumley 2014) defined to operate on the reals \\[\\begin{equation} y^{\\theta}_{I} \\sim \\text{xBin}(m^{\\theta}_{I}, \\theta_{I}).
\\tag{6.28} \\end{equation}\\] The terms \\(\\theta_{I}\\) are the following weighted aggregates \\[\\begin{equation} \\rho_{I} = \\frac{\\sum_{i \\in I} N_i \\rho_i}{\\sum_{i \\in I} N_i}, \\quad \\alpha_{I} = \\frac{\\sum_{i \\in I} N_i \\rho_i \\alpha_i}{\\sum_{i \\in I} N_i \\rho_i}, \\quad \\kappa_{I} = \\frac{\\sum_{i \\in I} N_i \\rho_i \\kappa_i}{\\sum_{i \\in I} N_i \\rho_i}, \\tag{6.29} \\end{equation}\\] where the denominators of \\(\\alpha_{I}\\) and \\(\\kappa_{I}\\) reflect their restriction to HIV positive persons.

6.3.1.3 ANC testing component

Women attending ANC clinics are routinely tested for HIV, to help prevent mother-to-child transmission. HIV prevalence \\(\\rho^\\text{ANC}_i \\in [0, 1]\\) and ART coverage \\(\\alpha^\\text{ANC}_i \\in [0, 1]\\) among pregnant women are modelled as offsets from the general population indicators. (For \\(s(i)\\) male, these quantities are not defined.) Again not detailing the linear predictors, the model is of the form \\[\\begin{align} \\text{logit}(\\rho^\\text{ANC}_i) &= \\text{logit}(\\rho_i) + \\eta^{\\rho^\\text{ANC}}_i, \\\\ \\text{logit}(\\alpha^\\text{ANC}_i) &= \\text{logit}(\\alpha_i) + \\eta^{\\alpha^\\text{ANC}}_i. \\end{align}\\] The terms \\(\\eta^{\\rho^\\text{ANC}}_i\\) and \\(\\eta^{\\alpha^\\text{ANC}}_i\\) can be interpreted as the differences, on the logit scale, in HIV prevalence and ART coverage between pregnant women attending ANC and the general population. As such, information flows in both directions: the household survey data inform the ANC indicators, and the ANC data inform the general population indicators. These two processes are informed by likelihoods specified for aggregate ANC clinic data from the year of the most recent survey.
Let: the number of ANC clients with ascertained status be fixed as \\(m^{\\rho^\\text{ANC}}_I\\); the number of those with positive status be \\(y^{\\rho^\\text{ANC}}_I \\leq m^{\\rho^\\text{ANC}}_I\\); the number of those already on ART prior to their first ANC visit be \\(y^{\\alpha^\\text{ANC}}_I \\leq y^{\\rho^\\text{ANC}}_I\\). These data are modelled using nested binomial likelihoods \\[\\begin{align*} y^{\\rho^\\text{ANC}}_I &\\sim \\text{Bin}(m^{\\rho^\\text{ANC}}_I, \\rho^\\text{ANC}_{I}), \\\\ y^{\\alpha^\\text{ANC}}_I &\\sim \\text{Bin}(y^{\\rho^\\text{ANC}}_I, \\alpha^\\text{ANC}_{I}). \\end{align*}\\] It is not necessary to use an extended binomial working likelihood, as in Section 3.5, because the ANC data are not survey weighted and therefore are integer valued. Analogous to Equation (6.29) in the household survey component, the weighted aggregates used here are \\[\\begin{equation*} \\rho^\\text{ANC}_{I} = \\frac{\\sum_{i \\in I} \\Psi_i \\rho_i^\\text{ANC}}{\\sum_{i \\in I} \\Psi_i}, \\quad \\alpha^\\text{ANC}_{I} = \\frac{\\sum_{i \\in I} \\Psi_i \\rho_i^\\text{ANC} \\alpha^\\text{ANC}_i}{\\sum_{i \\in I} \\Psi_i \\rho_i^\\text{ANC}}, \\end{equation*}\\] where \\(\\Psi_i\\) are the numbers of pregnant women, which are assumed to be fixed.

6.3.1.4 ART attendance component

Data on attendance of ART clinics are routinely collected. These data provide helpful information about HIV prevalence and coverage of ART, but are challenging to use because people living with HIV sometimes choose to access ART services outside of the district that they reside in. (Indeed, this section of the model remains a challenge, and is under active development (Esra et al. 2024).) Multinomial logistic regression equations are used to model the probabilities of individuals accessing treatment outside their home district. Briefly, let \\(\\gamma_{x, x'}\\) be the probability that a person on ART residing in district \\(x\\) receives ART in district \\(x'\\).
These probabilities are set to \\(\\gamma_{x, x'} = 0\\) unless \\(x = x'\\) or the two districts are neighbouring such that \\(x \\sim x'\\). As such, it is assumed that no one travels beyond their district or its immediate neighbours to receive ART services. (Of course, in reality this assumption is violated.) The log-odds are modelled using a structured additive predictor which only depends on the home district \\(x\\) \\[\\begin{equation} \\tilde \\gamma_{x, x'} = \\text{logit}(\\gamma_{x, x'}) = \\eta_{x}^{\\tilde \\gamma}. \\end{equation}\\] As a result, it is assumed that travel to each neighbouring district, for all age-sex strata, is equally likely. Let the number of people observed receiving ART in stratum \\(i\\) be \\(y^{A}_i\\) with corresponding aggregate \\[\\begin{equation} y^{A}_I = \\sum_{i \\in I} y^{A}_i. \\tag{6.30} \\end{equation}\\] Let the probability of a person in stratum \\(i\\) travelling from district \\(x(i) = x\\) to \\(x'\\) to receive ART be \\[\\begin{equation} \\pi_{i, x(i) = x, x'} = \\rho_{i} \\alpha_{i} \\gamma_{x(i) = x, x'}. \\tag{6.31} \\end{equation}\\] These probabilities are the product of three probabilities, each for a person in stratum \\(i\\): the probability of having HIV \\(\\rho_{i}\\), the probability of taking ART \\(\\alpha_{i}\\), and the probability of travelling from district \\(x(i) = x\\) to district \\(x'\\) to receive ART \\(\\gamma_{x(i) = x, x'}\\). Let the unobserved count of people in stratum \\(i\\) who travel to \\(x'\\) to receive ART be \\(A_{i, x(i) = x, x'}\\), such that \\[\\begin{equation} A_i = \\sum_{x' \\sim x, x' = x} A_{i, x(i) = x', x}. \\end{equation}\\] Each unobserved count can be considered as arising from a binomial distribution, with sample size given by the population in stratum \\(i\\), here with \\(x(i) = x'\\) such that \\[\\begin{equation} A_{i, x(i) = x', x} \\sim \\text{Bin}(N_{i, x(i) = x'}, \\pi_{i, x(i) = x', x}).
\\end{equation}\\] Each aggregate attendance observation (Equation (6.30)) is modelled using a Gaussian approximation to a sum of binomials. This sum is over both the strata \\(i \\in I\\) and the number of ART clients travelling from district \\(x(i) = x'\\) to \\(x\\) to receive treatment. The Gaussian approximation is \\[\\begin{equation} y^{A}_I \\sim \\mathcal{N}(\\mu^A_I, {\\sigma^A_I}^2), \\end{equation}\\] where the mean is \\[\\begin{equation} \\mu^A_I = \\sum_{i \\in I} \\sum_{x' \\sim x, x' = x} N_{i, x(i) = x'} \\cdot \\pi_{i, x(i) = x', x}, \\tag{6.32} \\end{equation}\\] and the variance is \\[\\begin{equation} {\\sigma^A_I}^2 = \\sum_{i \\in I} \\sum_{x' \\sim x, x' = x} N_{i, x(i) = x'} \\cdot \\pi_{i, x(i) = x', x} \\cdot (1 - \\pi_{i, x(i) = x', x}). \\tag{6.33} \\end{equation}\\] Equations (6.32) and (6.33) are based on a Gaussian approximation to the binomial distribution \\(\\text{Bin}(n, p)\\) with mean \\(np\\) and variance \\(np(1 - p)\\), together with the equations for a linear combination of Gaussian random variables.

6.3.2 Naomi as an ELGM

In all, Naomi is a joint model on the observations \\[\\begin{equation} \\mathbf{y} = (y^{\\theta}_I), \\quad \\theta \\in \\{\\rho, \\alpha, \\kappa, \\rho^\\text{ANC}, \\alpha^\\text{ANC}, A\\}, \\quad I \\in \\mathcal{I}. \\end{equation}\\] The observations are modelled using the structured additive predictor \\(\\boldsymbol{\\mathbf{\\eta}}\\), which includes intercept effects, age random effects, and spatial random effects which may be concatenated into the latent field \\(\\mathbf{x}\\). The latent field is controlled by hyperparameters \\(\\boldsymbol{\\mathbf{\\theta}}\\) which include standard deviations, first-order autoregressive model correlation parameters, and reparameterised Besag-York-Mollié model [BYM2; Simpson et al. (2017)] proportion parameters. These features are described in more detail in Appendix C.4.
Naomi has a large Gaussian latent field, governed by a smaller number of hyperparameters \\(m < N\\). However, it has complexities which place it outside the class of LGMs, as defined in Section 3.3.4. Instead, it is an ELGM, as defined in Section 3.3.5. In an ELGM, each mean response is allowed to depend non-linearly upon more than one structured additive predictor. The departures of Naomi from the LGM framework are enumerated below. When dependence on a specific number of structured additive predictors is given, it is in isolation, rather than in conjunction.

1. Throughout Naomi, processes are modelled at the finest district-age-sex division \\(i\\), but likelihoods are defined for observations aggregated over sets of indices \\(i \\in I\\). As such, these aggregate observations are related to \\(|I|\\) structured additive predictors, rather than just one.
2. Multiple link functions are used in Naomi, such that there is no one inverse link function \\(g\\) as specified in the definition of an LGM. This is a relatively minor point, and it is possible to specify models with several likelihoods in R-INLA by setting family to be vector valued [Section 6.4; Gómez-Rubio (2020)].
3. In Section 6.3.1.2, HIV incidence depends on district-level adult HIV prevalence and ART coverage (Equation (C.4)). Each \\(\\log(\\lambda_i)\\) therefore depends on 28 structured additive predictors, where 28 arises by the product of 2 sexes (male and female), 7 age groups (\\(\\{\\text{15-19}, \\ldots, \\text{45-49}\\}\\)), and 2 indicators, HIV prevalence and ART coverage. This reflects basic HIV epidemiology: incidence of sexually transmitted HIV is proportional to unsuppressed viral load among an individual’s potential sexual partners. The district-level adult averages are used as a proxy.
4. In Section 6.3.1.2, the proportion recently infected \\(\\kappa_i\\) is given by a non-linear function (Equation (6.24)) of HIV incidence \\(\\lambda_i\\), HIV prevalence \\(\\rho_i\\), mean duration of recent infection \\(\\Omega_T\\) and proportion of long-term HIV infections misclassified as recent \\(\\beta_T\\). Though arguably a contortion of the ELGM framework, by considering \\(\\Omega_T\\) and \\(\\beta_T\\) as (Gaussian) linear predictors, each \\(\\kappa_i\\) depends on four structured additive predictors.
5. In Section 6.3.1.3, HIV prevalence and ART coverage among pregnant women are modelled as offsets from their respective indicators in the general population. Thus each mean response depends on two structured additive predictors. The copy feature in R-INLA [Section 6.5; Gómez-Rubio (2020)] allows for this type of model structure.
6. In Section 6.3.1.3, nested binomial likelihoods are used.
7. In Section 6.3.1.4, a multinomial model with softmax link function is used. The multinomial likelihood takes as input \\(|\\{x': x' \\sim x\\}| + 1\\) structured additive predictors, one for each neighbouring district plus one for remaining in the home district.
8. In Section 6.3.1.4, the probability of an individual receiving ART in a given district is the product of three probabilities.

Though intended for use with LGMs, the advanced features of R-INLA [Chapter 6; Gómez-Rubio (2020)] allow for fitting of some ELGMs as described above. In some sense then, the above exercise is mostly academic rather than practical. The crux is that Naomi cannot be fit using R-INLA because it is not possible to specify such a complex model using a formula interface. The limitations of modelling with formula interfaces are not unique to R-INLA. Indeed, any such statistical software will see requests from users for additional features. The practical impossibility of meeting all feature requests motivates a more universal INLA implementation (Section 6.2) for advanced users.
6.4 AGHQ in moderate dimensions

Inference for the Naomi model was previously conducted using a marginal Laplace approximation, and optimisation over the hyperparameters, implemented using TMB. This approach was illustrated for the epilepsy example in Section 6.2.1.2 and is analogous for Naomi. It would be desirable to instead integrate with respect to the hyperparameters, taking an INLA-like approach as described in Section 6.1.3. Section 6.2 attends to part of the challenge, by developing INLA methods which are compatible with the Naomi model log-posterior as implemented in TMB. However, naive quadrature methods are not directly applicable to Naomi. This is because Naomi has \\(m = 24\\) hyperparameters. Although \\(m = 24\\) cannot be described as high-dimensional, it is certainly more than the \\(m < 4\\) or so hyperparameters typical for use of INLA. Hence here the term moderate-dimensional is used. Naive use of AGHQ with the product rule requires evaluation of \\(|\\mathcal{Q}(m, k)| = k^m\\) quadrature points. This would be intractable for \\(m = 24\\) and any \\(k > 1\\). As a result, a quadrature rule which does not scale exponentially is required for integrating out the Naomi model hyperparameters. This section focuses on the development of an AGHQ rule for moderate dimension, for use within an inference procedure for the Naomi model. Though the rule is to be applied within a nested Laplace approximation approach, it is not limited to this setting.

6.4.1 AGHQ with variable levels

Rather than having the same number of quadrature nodes for each dimension of \\(\\boldsymbol{\\mathbf{\\theta}}\\), it is possible to use a variable number of nodes per dimension. In line with the terminology used in the mvQuad package, the numbers of nodes per dimension are referred to as “levels”. Let \\(\\mathbf{k} = (k_1, \\ldots, k_m)\\) be a vector of levels, where each \\(k_j \\in \\mathbb{Z}^+\\).
A GHQ grid with (potentially) variable levels is then given by \\[\\begin{equation} \\mathcal{Q}(m, \\mathbf{k}) = \\mathcal{Q}(1, k_1) \\times \\cdots \\times \\mathcal{Q}(1, k_m). \\end{equation}\\] The size of this grid is given by the product of the levels \\(|\\mathcal{Q}(m, \\mathbf{k})| = \\prod_{j = 1}^m k_j\\). The corresponding weighting function is given by \\[\\begin{equation} \\omega(\\mathbf{z}) = \\prod_{j = 1}^m \\omega_{k_j}(z_j). \\end{equation}\\] This expression is a product of the univariate weighting functions for the relevant GHQ rule with \\(k_j\\) nodes.

6.4.2 Principal components analysis

A special case of the variable levels approach above is to set the first \\(s \\leq m\\) levels to be \\(k\\) and the remaining \\(m - s \\geq 0\\) levels to be one. Denote by \\(\\mathcal{Q}(m, s, k)\\) the grid \\(\\mathcal{Q}(m, \\mathbf{k})\\) with levels \\(k_j = k, j \\leq s\\) and \\(k_j = 1, j > s\\) for some \\(s \\leq m\\). For example, for \\(m = 2\\) and \\(s = 1\\) then \\(\\mathbf{k} = (k, 1)\\). When the spectral decomposition is used to adapt the quadrature nodes, this choice of levels is analogous to principal components analysis (PCA). Figure 6.17 illustrates PCA-AGHQ for a case when \\(m = 2\\) and \\(s = 1\\). Since AGHQ with \\(k = 1\\) corresponds to the Laplace approximation, PCA-AGHQ can be interpreted as performing AGHQ on the first \\(s\\) principal components of the inverse curvature, and a Laplace approximation on the remaining \\(m - s\\) principal components. As such, it may be argued that PCA-AGHQ provides a natural compromise between the EB and AGHQ integration strategies.
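The construction of \\(\\mathcal{Q}(m, s, k)\\) can be sketched in a few lines of Python. This is a minimal illustration, not the implementation used for Naomi: the function names are hypothetical, the one-dimensional nodes are tabulated only for \\(k \\in \\{1, 3\\}\\) and assume the probabilists' Gauss-Hermite convention, and the mode, eigenvectors and eigenvalues in the two-dimensional example are made up.

```python
import itertools
import math

# One-dimensional Gauss-Hermite nodes (probabilists' convention, an assumption
# here), tabulated only for the two levels used in this sketch
GHQ_NODES = {1: [0.0], 3: [-math.sqrt(3.0), 0.0, math.sqrt(3.0)]}

def pca_levels(m, s, k):
    """Levels vector with the first s entries equal to k and the rest equal to 1."""
    return [k] * s + [1] * (m - s)

def grid_size(levels):
    """Size of the product grid: the product of the levels."""
    return math.prod(levels)

def unadapted_grid(levels):
    """Cartesian product of the one-dimensional node sets."""
    return list(itertools.product(*(GHQ_NODES[k] for k in levels)))

def adapt(z, mode, eigvecs, eigvals):
    """Spectral adaptation theta = mode + E Lambda^{1/2} z.

    eigvecs[i][j] is component i of eigenvector j.
    """
    m = len(mode)
    return [
        mode[i] + sum(eigvecs[i][j] * math.sqrt(eigvals[j]) * z[j] for j in range(m))
        for i in range(m)
    ]

# Hypothetical two-dimensional example with m = 2, s = 1, k = 3
angle = math.pi / 6
e1 = [math.cos(angle), math.sin(angle)]   # first eigenvector
e2 = [-math.sin(angle), math.cos(angle)]  # second eigenvector
eigvecs = [[e1[0], e2[0]], [e1[1], e2[1]]]
eigvals = [4.0, 0.25]
mode = [1.0, -1.0]

nodes = [adapt(z, mode, eigvecs, eigvals) for z in unadapted_grid(pca_levels(2, 1, 3))]

print(grid_size(pca_levels(24, 8, 3)))   # PCA-AGHQ grid with s = 8, k = 3
print(grid_size(pca_levels(24, 24, 3)))  # dense product grid with k = 3
```

In the two-dimensional example all three adapted nodes lie on the line through the mode in the direction of the first eigenvector, since the second coordinate of every unadapted node is zero. The two printed sizes contrast the \\(3^8\\) nodes of the reduced grid with the \\(3^{24}\\) nodes a dense grid would need for \\(m = 24\\).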
For concreteness, the normalising constant obtained by application of PCA-AGHQ to integration of the marginal Laplace approximation (Equation (6.12)) is given by \\[\\begin{equation} \\tilde p_\\texttt{PCA}(\\mathbf{y}) = |\\hat{\\mathbf{E}}_{\\texttt{LA}} \\hat{\\mathbf{\\Lambda}}_{\\texttt{LA}}^{1/2}|\\sum_{\\mathbf{z} \\in \\mathcal{Q}(m, s, k)} \\tilde p_\\texttt{LA}(\\hat{\\mathbf{E}}_{\\texttt{LA}, s} \\hat{\\mathbf{\\Lambda}}_{\\texttt{LA}, s}^{1/2} \\mathbf{z} + \\hat{\\boldsymbol{\\mathbf{\\theta}}}_\\texttt{LA}, \\mathbf{y}) \\omega(\\mathbf{z}), \\end{equation}\\] where \\(\\hat{\\mathbf{E}}_{\\texttt{LA}, s}\\) is an \\(m \\times s\\) matrix containing the first \\(s\\) eigenvectors, \\(\\hat{\\mathbf{\\Lambda}}_{\\texttt{LA}, s}\\) is the \\(s \\times s\\) diagonal matrix containing the first \\(s\\) eigenvalues, and \\[\\begin{equation} \\omega(\\mathbf{z}) = \\prod_{j = 1}^s \\omega_k(z_j) \\times \\prod_{j = s + 1}^m \\omega_1(z_j). \\end{equation}\\]

Figure 6.17: Consider the function \\(f(z_1, z_2) = \\text{sn}(0.5 z_1, \\alpha = 2) \\cdot \\text{sn}(0.8 z_1 - 0.5 z_2, \\alpha = -2)\\) as described in Figure 6.3. Panel A shows the usual AGHQ nodes with a spectral matrix decomposition. Panel B shows the adapted PCA-AGHQ nodes \\(\\mathcal{Q}(2, 1, 3)\\). These nodes correspond exactly to those in Panel A along the first eigenvector. The proportion of variation explained by this direction is around 95%, with the remaining 5% explained by the second eigenvector.

6.5 Malawi case-study

Figure 6.18: District-level HIV prevalence, ART coverage, and new HIV cases and HIV incidence for adults 15-49 in Malawi. Inference here was conducted using a Gaussian approximation and EB via TMB.

This section presents a case-study of approximate Bayesian inference methods applied to the Naomi model in Malawi.
Data from Malawi has previously been used to demonstrate the Naomi model, including as a part of the naomi R package vignette available from https://github.com/mrc-ide/naomi. Malawi was chosen for the vignette and this case-study in part because it has a small number of districts, \\(n = 30\\), limiting the computational demand of the model. Three Bayesian inference approaches were considered:

1. Gaussian marginals and EB with TMB. This approach was previously used in production for Naomi. As short-hand, this approach is referred to as GEB.
2. Gaussian marginals and PCA-AGHQ with TMB. This is a novel approach, enabled by the methodological work of Section 6.4. As short-hand, this approach is referred to as GPCA-AGHQ.
3. NUTS with tmbstan. Conditional on assessing chain convergence and suitability, to be discussed in Section 6.5.1, inferences from NUTS represent a gold-standard.

The TMB C++ user-template used to specify the log-posterior, described in Appendix C.4.4, was the same for each approach. The dimension of the latent field was \\(N = 467\\) and the dimension of the hyperparameters was \\(m = 24\\). For GEB and GPCA-AGHQ, hyperparameter and latent field samples were simulated following deterministic inference. For all methods, age-sex-district specific HIV prevalence, ART coverage and HIV incidence were simulated from the latent field and hyperparameter posterior samples. Model outputs from GEB are illustrated in Figure 6.18.

6.5.1 NUTS convergence and suitability

The Naomi model was difficult to efficiently sample from using NUTS via tmbstan. Four chains run in parallel for 100 thousand iterations each were required to obtain acceptable NUTS diagnostics. For ease-of-storage, the samples were thinned by a factor of 20, resulting in 5000 iterations kept per chain, with the first 2500 removed as burn-in. The effective sample size ratios were typically low (Figure C.6).
The lowest effective sample size was 208 (2.5% quantile 318, 50% quantile 1231, and 97.5% quantile 2776; Panel C.7A). The largest potential scale reduction factor was 1.021 (2.5% quantile 1, 50% quantile 1.003, and 97.5% quantile 1.017; Panel C.7B). Though inaccuracies remain possible, these diagnostics are sufficient to treat inferences obtained from NUTS as a gold-standard. Correlation structure in the posterior can result in sampler inefficiency. Each of the four pairs of AR1 log standard deviation \\(\\log(\\sigma)\\) and logit lag-one autocorrelation parameter \\(\\text{logit}(\\phi)\\) posteriors was positively correlated (mean absolute correlation 0.81, Figure C.8). These parameters are only partially identifiable, as variation can either be explained by high standard deviation and high autocorrelation or low standard deviation and low autocorrelation. On the other hand, the BYM2 log standard deviation \\(\\log(\\sigma)\\) and logit proportion parameter \\(\\text{logit}(\\phi)\\) were, as designed, more orthogonal (mean absolute correlation 0.17, Figure C.9). The informativeness of data about a parameter can be summarised by the posterior contraction (Schad, Betancourt, and Vasishth 2021) which compares the posterior variance \\(\\mathbb{V}_\\text{post}(\\phi)\\) to the prior variance \\(\\mathbb{V}_\\text{prior}(\\phi)\\) via \\[\\begin{equation} c(\\phi) = 1 - \\frac{\\mathbb{V}_\\text{post}(\\phi)}{\\mathbb{V}_\\text{prior}(\\phi)}. \\end{equation}\\] Values near one indicate that the data are highly informative about the parameter, and values near zero that the posterior is little changed from the prior. Posterior variances were extracted from NUTS results, and prior variances obtained by simulating from a model with the likelihood component removed (Figure C.10). The average posterior contraction was positive for all latent field parameter vectors, and for the majority of hyperparameters (Figure C.11). However, for seven hyperparameters the posterior contraction was very close to zero. Furthermore, for some latent field parameter vectors, the average contraction was small.
Based on these findings, these parameters may not be identifiable.

6.5.2 Use of PCA-AGHQ

Figure 6.19: Under PCA, the proportion of total variation explained is given by the sum of the first \\(s\\) eigenvalues over the sum of all eigenvalues. A typical rule-of-thumb is to include dimensions sufficient to explain 90% of total variation. In this case, for computational reasons, 87% was considered sufficient.

Figure 6.20: The full rank original covariance matrix (Panel A) was closely reproduced by its reduced rank (\\(s = 8\\)) matrix approximation (Panel B).

For the PCA-AGHQ quadrature grid, a scree plot based on the spectral decomposition of \\(\\hat {\\mathbf{H}}_\\texttt{LA}^{-1}\\) (as defined in Equation (6.13)) was used to select the number of principal components to keep (Figure 6.19). Keeping \\(s = 8\\) principal components was sufficient to explain 87% of total variation. The reduced rank approximation to the inverse curvature with this choice of \\(s\\) was visually similar to the full rank matrix (Figure 6.20).

Figure 6.21: Each principal component loading, obtained by the eigendecomposition of the inverse curvature, gives the direction of maximum variation conditional on inclusion of each previous principal component loading. For example, the first principal component loading is a sum of log_sigma_alpha_as and logit_phi_alpha_as.

The principal component (PC) loadings (Figure 6.21) provide interpretable information about which directions had the greatest variation. Many of the first PC loadings are sums of two hyperparameters. As such, there is some redundancy in the hyperparameter parameterisation, supporting the findings of Section 6.5.1 regarding correlation structure in the hyperparameter posterior. It is exactly this correlation structure that PCA, and PCA-AGHQ, look to utilise.

Figure 6.22: The grey histograms show the 24 hyperparameter marginal distributions obtained with NUTS.
The green lines indicate the position of the 6561 PCA-AGHQ nodes projected onto each hyperparameter marginal. For some hyperparameters, the PCA-AGHQ nodes vary over the domain of the posterior marginal distribution, while for others they concentrate at the mode.

Projecting the \\(3^8 = 6561\\) PCA-AGHQ quadrature nodes onto each hyperparameter dimension, there was substantial variation in coverage by hyperparameter (Figure 6.22). Approximately 12 hyperparameters had well covered marginals: greater than the 8 naively obtained with a dense grid, but nonetheless far fewer than the full 24. Coverage was higher among hyperparameters on the logistic scale, and lower among hyperparameters on the logarithmic scale. This discrepancy occurred due to logistic hyperparameters naturally having higher posterior marginal standard deviation than logarithmic hyperparameters (Figure C.13).

6.5.3 Time taken

Figure 6.23: The number of hours taken to perform inference for the Naomi ELGM (Section 6.3.1) using each approach.

Inference with NUTS took 79 hours, while inference with GPCA-AGHQ took 1.2 hours and GEB just 0.9 minutes (Figure 6.23). Both the NUTS and GPCA-AGHQ algorithms can be run under a range of settings, trading off accuracy and runtime.

6.5.4 Inference comparison

Posterior inferences from GEB, GPCA-AGHQ and NUTS were compared using point estimates (Section 6.5.4.1) and distributional quantities (Section 6.5.4.2).

6.5.4.1 Point estimates

Figure 6.24: The latent field posterior mean and posterior standard deviation point estimates from each inference method as compared with those from NUTS. The root mean square error (RMSE) and mean absolute error (MAE) are displayed in the top left. For both the posterior mean and posterior standard deviation, GPCA-AGHQ reduced RMSE and MAE as compared with GEB.

Latent field point estimates obtained from GPCA-AGHQ were closer to the gold-standard results from NUTS than those obtained from GEB (Figure 6.24).
The root mean square error (RMSE) between posterior mean estimates from GPCA-AGHQ and NUTS (0.063) was 20% lower than that between GEB and NUTS (0.078). For the posterior standard deviation estimates, there was a substantial 60% reduction in RMSE: from 0.14 (GEB) to 0.05 (GPCA-AGHQ). However, puzzlingly, improvements in latent field estimate accuracy only transferred to model outputs to a limited extent (Figures C.15 and C.16).

6.5.4.2 Distributional quantities

6.5.4.2.1 Kolmogorov-Smirnov

Figure 6.25: The average Kolmogorov-Smirnov (KS) test statistic for each latent field parameter of the Naomi model. Vectors of parameters were grouped together. For points above the dashed line at zero, performance of GEB was better. For points below the dashed line, performance of GPCA-AGHQ was better. Most notably, for the latent field parameters ui_lambda_x the test statistic for GEB was substantially higher than for GPCA-AGHQ. This parameter, of length 32, corresponds to \\(\\mathbf{u}_x^\\lambda\\) and plays a key role in the ART attendance component of the Naomi model (Section 6.3.1.4).

Figure 6.26: The parameter ui_lambda_x[26] had the greatest difference between the KS test statistic of GEB to NUTS and that of GPCA-AGHQ to NUTS. For this parameter, the potential scale reduction factor was 1 and effective sample size was 2100.

The two-sample Kolmogorov-Smirnov (KS) test statistic (Smirnov 1948) is the maximum absolute difference between two empirical cumulative distribution functions (ECDFs) \\(F(\\omega) = \\frac{1}{n} \\sum_{i = 1}^n \\mathbb{I}_{\\phi_i \\leq \\omega}\\). It is a relatively stringent, worst case, measure of distance between empirical distributions. The average KS test statistic for GPCA-AGHQ (0.077) was 8.6% less than the average KS test statistic for GEB (0.084). For both GEB and GPCA-AGHQ the KS test statistic for a parameter was correlated with low NUTS ESS (Figure C.17). This may be due to difficulties estimating particular parameters for all inference methods, or high KS values caused by NUTS inaccuracies.
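The two-sample KS statistic described above can be computed directly from two sets of posterior samples. The following Python sketch is a minimal illustration with hypothetical function names and made-up sample values, not the code used to produce the results reported here. It relies on the fact that the difference between two ECDFs only changes at sample points, so the maximum can be found by evaluating at the pooled samples.

```python
import bisect

def ecdf(samples):
    """Return the ECDF F(w) = (1 / n) * #{i : phi_i <= w} as a function."""
    xs = sorted(samples)
    n = len(xs)
    return lambda w: bisect.bisect_right(xs, w) / n

def ks_statistic(samples_a, samples_b):
    """Maximum absolute difference between the two ECDFs."""
    f_a, f_b = ecdf(samples_a), ecdf(samples_b)
    pooled = list(samples_a) + list(samples_b)
    return max(abs(f_a(w) - f_b(w)) for w in pooled)

# Made-up samples standing in for an approximate and a gold-standard marginal
approximate = [1.0, 2.0, 3.0, 4.0]
gold_standard = [3.0, 4.0, 5.0, 6.0]
print(ks_statistic(approximate, gold_standard))
```

The statistic is zero when the two samples coincide and one when their ranges do not overlap, which is why it acts as a worst case measure.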
6.5.4.2.2 Maximum mean discrepancy

Let \\(\\Phi^{1} = \\{\\boldsymbol{\\mathbf{\\phi}}^1_i\\}_{i = 1}^n\\) and \\(\\Phi^2 = \\{\\boldsymbol{\\mathbf{\\phi}}^2_i\\}_{i = 1}^n\\) be two sets of joint posterior samples, and \\(k\\) be a kernel. The maximum mean discrepancy [MMD; Gretton et al. (2006)] is a measure of distance between joint distributions, and can be estimated empirically from samples as \\[\\begin{equation} \\text{MMD}(\\Phi^1, \\Phi^2) = \\sqrt{\\frac{1}{n^2} \\sum_{i, j = 1}^n k(\\boldsymbol{\\mathbf{\\phi}}^1_i, \\boldsymbol{\\mathbf{\\phi}}^1_j) - \\frac{2}{n^2} \\sum_{i, j = 1}^n k(\\boldsymbol{\\mathbf{\\phi}}_i^1, \\boldsymbol{\\mathbf{\\phi}}_j^2) + \\frac{1}{n^2} \\sum_{i, j = 1}^n k(\\boldsymbol{\\mathbf{\\phi}}^2_i, \\boldsymbol{\\mathbf{\\phi}}^2_j)}. \\end{equation}\\] The kernel was set to \\(k(\\boldsymbol{\\mathbf{\\phi}}^1, \\boldsymbol{\\mathbf{\\phi}}^2) = \\exp(-\\sigma \\lVert \\boldsymbol{\\mathbf{\\phi}}^1 - \\boldsymbol{\\mathbf{\\phi}}^2 \\rVert^2)\\) with \\(\\sigma\\) estimated from data using the kernlab package (Karatzoglou et al. 2019). The first and third order MMD statistics for GEB were 0.08 and 0.0048. Those of GPCA-AGHQ (0.078 and 0.0044) were just 3% and 7% lower.

6.5.5 Exceedance probabilities

As a more realistic use case for the Naomi model outputs, consider the following two case-studies based on exceedance probabilities.

6.5.5.1 Meeting the second 90

Ambitious targets for scaling up ART treatment have been developed by UNAIDS, with the goal of ending the AIDS epidemic by 2030 (UNAIDS 2014). Meeting the 90-90-90 fast-track target requires that 90% of people living with HIV know their status, 90% of those are on ART, and 90% of those have a suppressed viral load. Inferences from Naomi can be used to identify treatment gaps by calculating the probability that the second 90 target has been met, that is \\(\\mathbb{P}(\\alpha_i > 0.9^2 = 0.81)\\) for each stratum \\(i\\).
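Given posterior samples of an indicator, an exceedance probability such as \\(\\mathbb{P}(\\alpha_i > 0.81)\\) can be estimated by the proportion of samples above the threshold. The Python sketch below illustrates this Monte Carlo estimate; the function name and the sample values are hypothetical and serve only to show the calculation.

```python
def exceedance_probability(samples, threshold):
    """Monte Carlo estimate of P(X > threshold) from posterior samples of X."""
    return sum(x > threshold for x in samples) / len(samples)

# Made-up posterior samples of ART coverage for one stratum
alpha_samples = [0.78, 0.80, 0.82, 0.83, 0.79, 0.85, 0.81, 0.84]
second_90 = exceedance_probability(alpha_samples, 0.81)

# Made-up posterior samples of annual HIV incidence for one stratum
lambda_samples = [0.004, 0.012, 0.009, 0.015, 0.0095, 0.008, 0.013, 0.007]
high_incidence = exceedance_probability(lambda_samples, 0.01)

print(second_90, high_incidence)
```

Because the estimate depends on the mass of the posterior on either side of a fixed threshold, it is sensitive to errors in both the location and the spread of the approximate posterior.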
Figure 6.27: The probability each stratum has met the second 90 (ART coverage above 81%) calculated using each inference method, as compared with NUTS. The root mean square error (RMSE) and mean absolute error (MAE) are displayed in the top left.

Stratum probabilities of having met the second 90 target were more accurately estimated by GPCA-AGHQ than GEB (Figure 6.27). However, both GPCA-AGHQ and GEB had substantial error as compared to results from NUTS, particularly for girls and women. This discrepancy in accuracy by sex may be caused by interactions between the household survey and ANC components of the model creating a more challenging posterior geometry.

6.5.5.2 Finding strata with high incidence

Some HIV interventions are cost-effective only within high HIV incidence settings, typically defined as higher than 1% incidence per year. Inferences from Naomi can be used to calculate the probability of a stratum having high incidence by evaluating \\(\\mathbb{P}(\\lambda_i > 0.01)\\).

Figure 6.28: The probability each stratum has high HIV incidence (above 1% per year) calculated using each inference method, as compared with NUTS. The root mean square error (RMSE) and mean absolute error (MAE) are displayed in the top left.

GPCA-AGHQ gave more accurate estimates of the probability that a stratum has high HIV incidence than GEB (Figure 6.28). Again, both methods had significant error. Unlike in Section 6.5.5.1, there was little difference in error by sex.

6.6 Discussion

This chapter made two main contributions. First, the universal INLA implementation of Section 6.2. Second, the PCA-AGHQ rule (Section 6.4). Both were applied to the Naomi model in Malawi in Section 6.5. These contributions are discussed in turn, before outlining suggestions for future work.
6.6.1 A universal INLA implementation

Monnahan and Kristensen (2018) write that “to our knowledge, TMB is the only software platform capable of toggling between integration tools [the Laplace approximation and NUTS] so effortlessly”. Section 6.2 made important progress towards adding INLA to the integration tools easily accessible using TMB. Reaching this milestone would be of significant value to both applied and methodological researchers. The implementation is not intended to replace R-INLA, and indeed for the majority of users a formula-based interface is preferred. Both formula-based and universal statistical tools have value, as they serve different use-cases. For the NUTS algorithm, a universal interface is available via Stan, and packages such as brms (Bürkner 2017) and rstanarm (Goodrich et al. 2020) enable researchers to fit common models using a formula interface. Furthermore, developers of formula-based tools do have incentives to engage with the needs of their users, and indeed do so. For example, after a request for the generalised binomial distribution used in Equation (6.28) to be included in R-INLA, a prototype version was made available shortly afterwards. That said, it is ultimately more sustainable for advanced users to have capacity to implement their own distributions and models.

6.6.2 PCA-AGHQ with application to INLA for Naomi

For the simplified Naomi model applied to data from Malawi, GPCA-AGHQ more accurately inferred latent field posterior marginal distributions than GEB. However, model output posterior marginals did not see the same improvements. Approximate posterior exceedance probabilities from both GEB and GPCA-AGHQ had systematic inaccuracies as compared with NUTS. GEB and GPCA-AGHQ were substantially faster than NUTS, which took over two days to reach convergence. Inaccuracies in model outputs from GEB and GPCA-AGHQ do have potential to meaningfully mislead policy (Sections 6.5.5.1 and 6.5.5.2).
As such, where possible, gold-standard NUTS results should be computed. Though NUTS is too slow to run during a workshop, it could be run afterwards. As the UNAIDS HIV estimates process occurs annually, requiring days to compute more accurate estimates is viable. That said, Malawi is one of the countries with the fewest number of districts. As NUTS took days to run in Malawi, for larger countries, with hundreds of districts, it may be impossible to run NUTS to convergence, and approximate methods may be required. To empower users, GPCA-AGHQ and NUTS could be added to the Naomi web interface (https://naomi.unaids.org) as alternatives to GEB. Analysts would be able to quickly iterate over model options using EB, before switching to a more accurate approach once they are happy with the results. PCA-AGHQ can be adjusted to suit the computational budget available by choice of the number of dimensions kept in the PCA \\(s\\) and the number of points per dimension \\(k\\). The scree plot is a well established heuristic for choosing \\(s\\). Heuristics for choosing \\(k\\) are less well established. Whether it is preferable for a given computational budget to increase \\(s\\) or increase \\(k\\) is an open question. Further strategies, such as gradually lowering \\(k\\) over the principal components, could also be considered. 6.6.3 Suggestions for future work Finally, this section presents suggestions for future work based on this chapter. Some suggestions relate more to individual contributions, others take a broader view, or relate to multiple contributions. 6.6.3.1 Further comparisons Comparison to further Bayesian inference methods could be included in Section 6.5. Four possibilities stand out as being particularly valuable: There exist other quadrature rules for moderate dimension, such as the CCD. It would be of interest to compare INLA with a PCA-AGHQ rule to INLA with other such quadrature rules. 
Rather than use quadrature to integrate the marginal Laplace approximation, an alternative approach is to run HMC (Monnahan and Kristensen 2018; C. Margossian et al. 2020). When run to convergence, inferential error of this method would solely be due to the Laplace approximation, helping to clarify the extent to which the inferential error of INLA is attributable to the quadrature grid. Preliminary testing of this approach, using tmbstan and setting laplace = TRUE, did not show immediate success but likely could be worked on. NUTS is not especially well suited to sampling from Gaussian latent field models like Naomi. Other MCMC algorithms, such as blocked Gibbs sampling (Geman and Geman 1984) or slice sampling (Neal 2003), could be considered. It may be difficult to implement such algorithms using TMB. Many MCMC algorithms are implemented and customisable (including, for example, the choice of block structure) within the NIMBLE probabilistic programming language (Valpine et al. 2017). Requiring rewriting the Naomi model log-posterior outside of TMB would be a substantial downside. Finally, it would be of substantial interest to implement the Naomi model using the iterative INLA method via inlabru. However, as inlabru, like R-INLA, is based on a formula interface, it may not be possible to do so directly. 6.6.3.2 Better quadrature grids PCA-AGHQ is a sensible approach to allocating more computational effort to dimensions which contribute more to the integral in question. However, its application to Naomi surfaced instances where it overlooked potential benefits, or otherwise did not behave as one might wish: The amount of variation explained in the Hessian matrix may not be of direct interest. For the Naomi model, interest is in the effect of including each dimension on the relevant model outputs. As such, using alternative measures of importance from sensitivity analysis, such as Shapley values (Shapley et al. 1953) or Sobol indices, could be preferable. 
Use of PCA is challenging when the dimensions have different scales. For the Naomi model, logit-scale hyperparameters were systematically favoured over those on the log-scale. When the quadrature rule is used within an INLA algorithm, it is more important to allocate quadrature nodes to those hyperparameter marginals which are non-Gaussian. This is because the Laplace approximation is exact when the integrand is Gaussian, so a single quadrature node is sufficient. The difficulty is, of course, knowing in advance which marginals will be non-Gaussian. This could be done if there were a cheap way to obtain posterior means, which could then be compared to posterior modes obtained using optimisation. Another approach would be to measure the fit of marginal samples from a cheap approximation, like EB. The measures of fit would have to be for marginals, ruling out approaches like PSIS (Yao et al. 2018) which operate on joint distributions. Finally, it may be possible to achieve better performance by pruning and prerotation, as discussed by Jäckel (2005). 6.6.3.3 Computational improvements Approximation: The most significant improvement would likely come from using approximations to the Laplace marginals. In particular, the simplified Laplace marginals of Wood (2020) (Section 6.1.3.4) should be implemented, as the ELGM setting has relatively dense precision matrices. Parallelisation: Integration over a moderate number of hyperparameters resulted in use of quadrature grids with a large number of nodes. Computation at each node is independent, so algorithm run-time could potentially be significantly improved using parallel computing. This point is discussed by Kristensen et al. (2016), who highlight that TMB could be applied to perform function evaluations in parallel, for example using the parallel R package. Hardware: Further computational speed-ups might be obtained using graphics processing units (GPUs) specialised for the relevant matrix operations.
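As a minimal sketch of the parallelisation point above, the following R code evaluates a function independently at each node of a small quadrature-style grid using the parallel package. The function eval_node and the grid values are hypothetical stand-ins chosen for illustration only: in practice, the per-node computation would be the Laplace approximation at that hyperparameter value, and objects backed by compiled code may need to be constructed separately on each worker.

```r
library(parallel)

# Hypothetical stand-in for the per-node computation: a toy Gaussian
# log-kernel. This is an assumption for illustration, not the Naomi model.
eval_node <- function(theta) {
  -0.5 * sum(theta^2)
}

# An illustrative 3^2 = 9 node product grid over two hyperparameters
nodes <- as.matrix(expand.grid(theta1 = c(-1, 0, 1), theta2 = c(-1, 0, 1)))
node_list <- lapply(seq_len(nrow(nodes)), function(i) nodes[i, ])

# Serial evaluation, kept as a reference
serial_values <- sapply(node_list, eval_node)

# Parallel evaluation: each node is computed independently on a worker
cl <- makeCluster(2)
parallel_values <- parSapply(cl, node_list, eval_node)
stopCluster(cl)
```

Because the nodes do not depend on one another, the parallel and serial results should agree, and any speed-up depends on the cost of each evaluation relative to the overhead of communicating with the workers.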
6.6.3.4 Statistical theory The class of functions which are integrated exactly by PCA-AGHQ remains to be shown. Theorem 1 of Stringer, Brown, and Stafford (2022) bounds the total variation error of AGHQ, establishing convergence in probability of coverage probabilities under the approximate posterior distribution to those under the true posterior distribution. Similar theory could be established for PCA-AGHQ, or more generally AGHQ with varying levels. The challenge of connecting this theory to nested use of any quadrature rule, like that in the INLA algorithm, remains an important open question. 6.6.3.5 Testing quadrature assumptions It may be possible to test the assumptions made by use of AGHQ grids, allowing their suitability for a particular integral to be assessed. Specifically, AGHQ assumes that the integrand is closely approximated by a polynomial multiplied by a Gaussian density. Given NUTS hyperparameter samples (or better yet, hyperparameter samples from the Laplace NUTS hybrid discussed in Section 6.6.3.1) this assumption could be tested by fitting a model using a polynomial times Gaussian kernel. This approach could be generalised to also test the suitability of PCA-AGHQ grids. 6.6.3.6 Exploration of the accuracy of INLA for complex models The universal INLA implementation can be used to measure the accuracy of INLA for a wider range of models than were previously possible. An important benefit of using TMB is that comparisons to NUTS can easily be made using exactly the same model template. Among the ELGM-type structures of particular interest for spatial epidemiology are aggregated likelihood models and evidence synthesis models. 6.6.3.7 Methods dissemination The approach used to implement Laplace marginals with TMB was relatively ad-hoc, and involved modification of the TMB C++ template (Section 6.2.1.4). For wider dissemination of this method, it is important that the user is not burdened with making these modifications. 
One possibility would be to change the random argument in TMB::MakeADFun to allow for indexing. Another (less desirable) option would be to algorithmically generate the modified TMB C++ template based on the original template. Figure 6.29: Monthly R package downloads from the Comprehensive R Archive Network (CRAN) for brms, glmmTMB, nimble, rstan and TMB, obtained using the cranlogs (Csárdi 2023) R package. Unfortunately, R-INLA is not available from CRAN, and so could not be included in this figure. The official rstan documentation recommends installation of a development version hosted outside CRAN. As such, this metric may underestimate the popularity of rstan. Though gaining in popularity, the user-base of TMB is relatively small, and package downloads are in large part driven by use of the easier-to-use glmmTMB package (Figure 6.29). For users unfamiliar with C++, it can be challenging to use TMB directly. One possibility is to look to disseminate methods via the users of glmmTMB. Another approach would be to implement methods in other probabilistic programming languages, such as Stan or NIMBLE. Implementation in Stan is made possible by the bridgestan package (Ward 2023), which provides access to the methods of a Stan model, and could be combined with the prototyping of an adjoint-differentiated Laplace approximation done in Stan by C. Margossian et al. (2020). The ratio of downloads of rstan as compared with brms suggests a larger proportion of Stan users are interested in specifying their own model. Implementation in NIMBLE is also possible as of version >1.0.0, which includes functionality for automatic differentiation and Laplace approximation [Part V; de Valpine et al. (2023)] which, like that of TMB, is built using CppAD. Both NIMBLE and Stan developers are actively looking into implementation of algorithms combining the Laplace approximation and quadrature. References Bachl, Fabian E, Finn Lindgren, David L Borchers, and Janine B Illian. 2019.
“inlabru: an R package for Bayesian spatial modelling from ecological survey data.” Methods in Ecology and Evolution 10 (6): 760–66. Baydin, Atılım Günes, Barak A Pearlmutter, Alexey Andreyevich Radul, and Jeffrey Mark Siskind. 2017. “Automatic differentiation in machine learning: a survey.” The Journal of Machine Learning Research 18 (1): 5595–5637. Bell, Bradley. 2023. “CppAD: a package for C++ algorithmic differentiation.” http://www.coin-or.org/CppAD. Berild, Martin Outzen, Sara Martino, Virgilio Gómez-Rubio, and Håvard Rue. 2022. “Importance Sampling with the Integrated Nested Laplace Approximation.” Journal of Computational and Graphical Statistics 31 (4): 1225–37. Bilodeau, Blair, Alex Stringer, and Yanbo Tang. 2022. “Stochastic convergence rates and applications of adaptive quadrature in Bayesian inference.” Journal of the American Statistical Association, 1–11. Blangiardo, Marta, Michela Cameletti, Gianluca Baio, and Håvard Rue. 2013. “Spatial and spatio-temporal models with R-INLA.” Spatial and Spatio-Temporal Epidemiology 4: 33–49. Bolker, Benjamin M, Beth Gardner, Mark Maunder, Casper W Berg, Mollie Brooks, Liza Comita, Elizabeth Crone, et al. 2013. “Strategies for fitting nonlinear ecological models in R, AD Model Builder, and BUGS.” Methods in Ecology and Evolution 4 (6): 501–12. Bollhöfer, Matthias, Olaf Schenk, Radim Janalik, Steve Hamm, and Kiran Gullapalli. 2020. “State-of-the-art sparse direct solvers.” Parallel Algorithms in Computational Science and Engineering, 3–33. Box, George EP, and Kenneth B Wilson. 1992. “On the experimental attainment of optimum conditions.” In Breakthroughs in Statistics: Methodology and Distribution, 270–310. Springer. Breslow, Norman E, and David G Clayton. 1993. “Approximate inference in generalized linear mixed models.” Journal of the American Statistical Association 88 (421): 9–25. 
Brooks, Mollie E, Kasper Kristensen, Koen J Van Benthem, Arni Magnusson, Casper W Berg, Anders Nielsen, Hans J Skaug, Martin Machler, and Benjamin M Bolker. 2017. “glmmTMB balances speed and flexibility among packages for zero-inflated generalized linear mixed modeling.” The R Journal 9 (2): 378–400. Brown, Patrick E. 2015. “Model-based geostatistics the easy way.” Journal of Statistical Software 63: 1–24. Bürkner, Paul-Christian. 2017. “brms: An R Package for Bayesian Multilevel Models Using Stan.” Journal of Statistical Software 80 (1): 1–28. https://doi.org/10.18637/jss.v080.i01. Carpenter, Bob, Andrew Gelman, Matthew D Hoffman, Daniel Lee, Ben Goodrich, Michael Betancourt, Marcus Brubaker, Jiqiang Guo, Peter Li, and Allen Riddell. 2017. “Stan: A probabilistic programming language.” Journal of Statistical Software 76 (1). Casella, George. 1985. “An introduction to empirical Bayes data analysis.” The American Statistician 39 (2): 83–87. Chen, Cici, Jon Wakefield, and Thomas Lumely. 2014. “The use of sampling weights in Bayesian hierarchical models for small area estimation.” Spatial and Spatio-Temporal Epidemiology 11: 33–43. Chiuchiolo, Cristian, Janet van Niekerk, and Håvard Rue. 2023. “Joint Posterior Inference for Latent Gaussian Models with r-INLA.” Journal of Statistical Computation and Simulation 93 (5): 723–52. Csárdi, Gábor. 2023. cranlogs: Download Logs from the ’RStudio’ ’CRAN’ Mirror. Davis, Philip J, and Philip Rabinowitz. 1975. Methods of numerical integration. Academic Press. de Valpine, Perry, Christopher Paciorek, Daniel Turek, Nick Michaud, Cliff Anderson-Bergman, Fritz Obermeyer, Claudia Wehrhahn Cortes, Abel Rodrìguez, Duncan Temple Lang, and Sally Paganin. 2023. NIMBLE User Manual (version 1.0.1). https://doi.org/10.5281/zenodo.1211190. Dennis Jr, John E, David M Gay, and Roy E Walsh. 1981. “An adaptive nonlinear least-squares algorithm.” ACM Transactions on Mathematical Software (TOMS) 7 (3): 348–68. 
Diaz, Jose Monsalve, Swaroop Pophale, Oscar Hernandez, David E Bernholdt, and Sunita Chandrasekaran. 2018. “OpenMP 4.5 Validation and Verification Suite for Device Offload.” In Evolving OpenMP for Evolving Architectures: 14th International Workshop on OpenMP, IWOMP 2018, Barcelona, Spain, September 26–28, 2018, Proceedings 14, 82–95. Springer. Diggle, Peter J, and Emanuele Giorgi. 2016. “Model-based geostatistics for prevalence mapping in low-resource settings.” Journal of the American Statistical Association 111 (515): 1096–1120. Eaton, Jeffrey W, Laura Dwyer-Lindgren, Steve Gutreuter, Megan O’Driscoll, Oliver Stevens, Sumali Bajaj, Rob Ashton, et al. 2021. “Naomi: a new modelling tool for estimating HIV epidemic indicators at the district level in sub-Saharan Africa.” Journal of the International AIDS Society 24: e25788. Esra, Rachel, Mpho Mmelesi, Akeem T. Ketlogetswe, Timothy M. Wolock, Adam Howes, Tlotlo Nong, Matshelo Tina Matlhaga, Siphiwe Ratladi, Dinah Ramaabya, and Jeffrey W. Imai-Eaton. 2024. “Improved Indicators for Subnational Unmet Antiretroviral Therapy Need in the Health System: Updates to the Naomi Model in 2023.” Journal of Acquired Immune Deficiency Syndromes 95 (1S): e24–33. https://doi.org/10.1097/QAI.0000000000003324. Fattah, EA, JV Niekerk, and H Rue. 2022. “Smart gradient-an adaptive technique for improving gradient estimation.” Foundations of Data Science. Fournier, David A, Hans J Skaug, Johnoel Ancheta, James Ianelli, Arni Magnusson, Mark N Maunder, Anders Nielsen, and John Sibert. 2012. “AD Model Builder: using automatic differentiation for statistical inference of highly parameterized complex nonlinear models.” Optimization Methods and Software 27 (2): 233–49. Fuglstad, Geir-Arne, Daniel Simpson, Finn Lindgren, and Håvard Rue. 2019. “Constructing priors that penalize the complexity of Gaussian random fields.” Journal of the American Statistical Association 114 (525): 445–52. 
Gaedke-Merzhäuser, Lisa, Janet van Niekerk, Olaf Schenk, and Håvard Rue. 2023. “Parallelized integrated nested Laplace approximations for fast Bayesian inference.” Statistics and Computing 33 (1): 25. Geman, Stuart, and Donald Geman. 1984. “Stochastic relaxation, Gibbs distributions, and the Bayesian restoration of images.” IEEE Transactions on Pattern Analysis and Machine Intelligence, no. 6: 721–41. Gómez-Rubio, Virgilio. 2020. Bayesian inference with INLA. CRC Press. Gómez-Rubio, Virgilio, and Håvard Rue. 2018. “Markov Chain Monte Carlo with the Integrated Nested Laplace Approximation.” Statistics and Computing 28: 1033–51. Goodrich, Ben, Jonah Gabry, Imad Ali, and Sam Brilleman. 2020. “Rstanarm: Bayesian Applied Regression Modeling via Stan.” https://mc-stan.org/rstanarm. Gretton, Arthur, Karsten Borgwardt, Malte Rasch, Bernhard Schölkopf, and Alex Smola. 2006. “A Kernel Method for the Two-Sample-Problem.” Advances in Neural Information Processing Systems 19. Jäckel, Peter. 2005. “A note on multivariate Gauss-Hermite quadrature.” London: ABN-Amro. Re. Karatzoglou, Alexandros, Alex Smola, Kurt Hornik, and Maintainer Alexandros Karatzoglou. 2019. “Package ‘Kernlab’.” CRAN R Project. Kish, Leslie. 1965. Survey sampling. 04; HN29, K5. Kristensen, Kasper, Anders Nielsen, Casper W Berg, Hans Skaug, Bradley M Bell, et al. 2016. “TMB: Automatic Differentiation and Laplace Approximation.” Journal of Statistical Software 70 (i05). Laplace, P. S. 1774. “Memoire sur la probabilite de causes par les evenements.” Memoire de l’Academie Royale Des Sciences. Lenth, Russell. 2009. “Response-Surface Methods in R, Using rsm.” Journal of Statistical Software 32 (7): 1–17. https://doi.org/10.18637/jss.v032.i07. Leppik, IE, FE Dreifuss, T Bowman-Cloyd, N Santilli, M Jacobs, C Crosby, J Cloyd, et al. 1985. “A double-blind crossover evaluation of progabide in partial seizures.” Neurology 35 (4): 285. Lindgren, Finn, Håvard Rue, and Johan Lindström. 2011. 
“An explicit link between Gaussian fields and Gaussian Markov random fields: the stochastic partial differential equation approach.” Journal of the Royal Statistical Society Series B: Statistical Methodology 73 (4): 423–98. Margossian, Charles C. 2019. “A review of automatic differentiation and its efficient implementation.” Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery 9 (4): e1305. Margossian, Charles, Aki Vehtari, Daniel Simpson, and Raj Agrawal. 2020. “Hamiltonian Monte Carlo using an adjoint-differentiated Laplace approximation: Bayesian inference for latent Gaussian models and beyond.” Advances in Neural Information Processing Systems 33: 9086–97. Martino, Sara, and Andrea Riebler. 2020. “Integrated Nested Laplace Approximations (INLA).” In Wiley StatsRef: Statistics Reference Online, 1–19. John Wiley & Sons, Ltd. https://doi.org/https://doi.org/10.1002/9781118445112.stat08212. Martino, Sara, and Håvard Rue. 2009. “Implementing approximate Bayesian inference using Integrated Nested Laplace Approximation: A manual for the inla program.” Department of Mathematical Sciences, NTNU, Norway. Martins, Thiago G, Daniel Simpson, Finn Lindgren, and Håvard Rue. 2013. “Bayesian computing with INLA: new features.” Computational Statistics & Data Analysis 67: 68–83. Meng, Xiao-Li. 2018. “Statistical paradises and paradoxes in big data (i) law of large populations, big data paradox, and the 2016 US presidential election.” The Annals of Applied Statistics 12 (2): 685–726. Monnahan, Cole C, and Kasper Kristensen. 2018. “No-U-turn sampling for fast Bayesian inference in ADMB and TMB: Introducing the adnuts and tmbstan R packages.” PLOS One 13 (5): e0197954. Naylor, John C, and Adrian FM Smith. 1982. “Applications of a method for the efficient computation of posterior distributions.” Journal of the Royal Statistical Society Series C: Applied Statistics 31 (3): 214–25. Neal, Radford M. 2003. “Slice sampling.” The Annals of Statistics 31 (3): 705–67. 
Noor, Abdisalan Mohamed. 2022. “Country Ownership in Global Health.” PLOS Global Public Health 2 (2): e0000113. Osgood-Zimmerman, Aaron, and Jon Wakefield. 2023. “A Statistical Review of Template Model Builder: A Flexible Tool for Spatial Modelling.” International Statistical Review 91 (2): 318–42. Pebesma, Edzer J. 2004. “Multivariable geostatistics in S: the gstat package.” Computers & Geosciences 30: 683–91. Pebesma, Edzer, and Roger Bivand. 2023. Spatial Data Science: With Applications in R. Chapman; Hall/CRC. https://doi.org/10.1201/9780429459016. Press, William H, Teukolsky Saul A, William T Vetterling, and Brian P Flannery. 2007. Numerical recipes 3rd edition: The art of scientific computing. Cambridge University Press. Rue, Håvard. 2001. “Fast sampling of Gaussian Markov random fields.” Journal of the Royal Statistical Society: Series B (Statistical Methodology) 63 (2): 325–38. Rue, Håvard, and Turid Follestad. 2001. “GMRFLib: a C-library for fast and exact simulation of Gaussian Markov random fields.” SIS-2002-236. Rue, Havard, and Leonhard Held. 2005. Gaussian Markov random fields: theory and applications. CRC press. Rue, Håvard, and Sara Martino. 2007. “Approximate Bayesian inference for hierarchical Gaussian Markov random field models.” Journal of Statistical Planning and Inference 137 (10): 3177–92. Rue, Håvard, Sara Martino, and Nicolas Chopin. 2009. “Approximate Bayesian inference for latent Gaussian models by using integrated nested Laplace approximations.” Journal of the Royal Statistical Society: Series B (Statistical Methodology) 71 (2): 319–92. Rue, Håvard, Andrea Riebler, Sigrunn H Sørbye, Janine B Illian, Daniel P Simpson, and Finn K Lindgren. 2017. “Bayesian computing with INLA: a review.” Annual Review of Statistics and Its Application 4: 395–421. Schad, Daniel J, Michael Betancourt, and Shravan Vasishth. 2021. “Toward a Principled Bayesian Workflow in Cognitive Science.” Psychological Methods 26 (1): 103. 
Schlüter, Daniela K, Martial L Ndeffo-Mbah, Innocent Takougang, Tony Ukety, Samuel Wanji, Alison P Galvani, and Peter J Diggle. 2016. “Using community-level prevalence of Loa loa infection to predict the proportion of highly-infected individuals: statistical modelling to support lymphatic filariasis and onchocerciasis elimination programs.” PLOS Neglected Tropical Diseases 10 (12): e0005157. Shapley, Lloyd S et al. 1953. “A value for n-person games.” Princeton University Press Princeton. Simpson, Daniel, Håvard Rue, Andrea Riebler, Thiago G Martins, Sigrunn H Sørbye, et al. 2017. “Penalising model component complexity: A principled, practical approach to constructing priors.” Statistical Science 32 (1): 1–28. Skaug, Hans J. 2009. “Discussion of \"Approximate Bayesian inference for latent Gaussian models by using integrated nested Laplace approximations\".” In Journal of the Royal Statistical Society: Series B (Statistical Methodology), 71:319–92. 2. Wiley Online Library. Smirnov, N. 1948. “Table for Estimating the Goodness of Fit of Empirical Distributions.” Annals of Mathematical Statistics 19 (2): 279–81. Spiegelhalter, David, Andrew Thomas, Nicky Best, and Wally Gilks. 1996. “BUGS 0.5 Examples.” MRC Biostatistics Unit, Institute of Public Health, Cambridge, UK 256. Stein, Michael L. 1999. “Interpolation of spatial data: some theory for kriging.” Stringer, Alex. 2021. “Implementing Approximate Bayesian Inference using Adaptive Quadrature: the aghq Package.” arXiv Preprint arXiv:2101.04468. Stringer, Alex, Patrick Brown, and Jamie Stafford. 2022. “Fast, scalable approximations to posterior distributions in extended latent Gaussian models.” Journal of Computational and Graphical Statistics, 1–15. Thall, Peter F, and Stephen C Vail. 1990. “Some covariance models for longitudinal count data with overdispersion.” Biometrics, 657–71. Thyng, Kristen M, Chad A Greene, Robert D Hetland, Heather M Zimmerle, and Steven F DiMarco. 2016. 
“True Colors of Oceanography: Guidelines for Effective and Accurate Colormap Selection.” Oceanography 29 (3): 9–13. Tierney, Luke, and Joseph B Kadane. 1986. “Accurate approximations for posterior moments and marginal densities.” Journal of the American Statistical Association 81 (393): 82–86. UNAIDS. 2014. “90-90-90. An ambitious treatment target to help end the AIDS epidemic.” ———. 2023b. “The path that ends AIDS: UNAIDS Global AIDS Update 2023.” https://www.unaids.org/en/resources/documents/2023/global-aids-update-2023. Valpine, Perry de, Daniel Turek, Christopher J Paciorek, Clifford Anderson-Bergman, Duncan Temple Lang, and Rastislav Bodik. 2017. “Programming with models: writing statistical algorithms for general model structures with NIMBLE.” Journal of Computational and Graphical Statistics 26 (2): 403–13. Van Niekerk, Janet, Elias Krainski, Denis Rustand, and Håvard Rue. 2023. “A new avenue for Bayesian inference with INLA.” Computational Statistics & Data Analysis 181: 107692. Ward, Brian. 2023. bridgestan: BridgeStan, Accessing Stan Model Functions in R. Weiser, Constantin. 2016. mvQuad: Methods for Multivariate Quadrature. http://CRAN.R-project.org/package=mvQuad. Wood, Simon N. 2017. Generalized additive models: an introduction with R. CRC press. ———. 2020. “Simplified integrated nested Laplace approximation.” Biometrika 107 (1): 223–30. Yao, Yuling, Aki Vehtari, Daniel Simpson, and Andrew Gelman. 2018. “Yes, but did it work?: Evaluating variational inference.” In International Conference on Machine Learning, 5581–90. PMLR. "],["conclusions.html", "7 Conclusions 7.1 Contributions 7.2 Future work 7.3 Broader reflections", " 7 Conclusions This chapter concludes the thesis by discussing its most important contributions, some promising avenues for future work, and broader reflections about the work. 7.1 Contributions Effective response to the HIV epidemic depends on strategic information provided by models of data. 
This thesis contributes both to generating this information and to advancing statistical methods. Chapter 4 found that spatially structured random effects should be used in small-area models for HIV. Kernel models performed better for data simulated from an adjacency-based spatial process than adjacency-based models did for data simulated from a kernel model. However, adjacency-based models performed better under cross-validation of real HIV survey data. Model comparison was conducted using strictly proper scoring rules, with checks for calibration. Figure 7.1: Panel A shows the front page of UNAIDS (2023b). Panel B shows the page containing text and a figure based on the work done in Chapter 5. In this figure, 30 countries are included. Chapter 5 estimated HIV risk group proportions for AGYW to enable implementation of the Global AIDS strategy (UNAIDS 2021b). Risk group proportion estimates were used to behaviourally disaggregate HIV prevalence and incidence and assess the benefits of a variety of risk stratification strategies. This work is the basis for a tool used to prioritise delivery of HIV prevention services by countries in SSA. The tool now encompasses at least 30 countries, expanding from the initial 13 included [Figure 7.1; UNAIDS (2023b)]. Models will be rerun each year to populate the tool with updated information as a part of the UNAIDS annual HIV estimates process. Alongside these applied contributions, Chapter 5 exemplified specification of complex multinomial spatio-temporal models in R-INLA using the Poisson-multinomial transformation, including using two- and three-way Kronecker product interactions. The Naomi model has been used in over 35 countries in SSA to produce district-level estimates of HIV indicators by synthesising evidence from multiple sources. Chapter 6 developed deterministic Bayesian inference methods, motivated by the aim of providing more accurate inferences for this challenging and practically important model. 
Its most important methodological contributions are two-fold. First, an implementation of INLA which is compatible with models specified using a TMB C++ template. For the first time, practitioners can now fit essentially any model using the INLA method. Second, a quadrature rule which combines PCA and AGHQ to naturally extend the applicability of INLA methods to moderate hyperparameter dimension, allowing more complex models to be fit. Additionally, Chapter 6 provides detailed description and analysis of the Naomi model. Indeed, Esra et al. (2024) used tables and text from Appendix C in an update to Eaton et al. (2021). 7.2 Future work Promising avenues for future work that I might prioritise include: It would be valuable to extend the risk group model developed in Chapter 5, and the resulting tool, to include all adults 15-49. Although AGYW are disproportionately at risk of HIV infection, 56% of new infections in SSA occur in other demographic groups. Modelling of age-stratified sexual partnerships (Wolock et al. 2021) may help to overcome reporting biases by harmonising male and female reporting. This model would likely fall outside the scope of R-INLA, but would be possible to write with TMB and therefore amenable to the inference methods advanced in Chapter 6. Although suitable for early stage research, wider adoption of the INLA implementation developed in Chapter 6 would be greatly enhanced by improvements to its speed and usability. The most important speed enhancement would come from using the simplified approximation to the Laplace marginals developed by Wood (2020). Although the naive implementation used in this thesis is viable for integrating Laplace marginals over a small number of hyperparameter quadrature nodes, such as the \\(3^2 = 9\\) nodes used in Sections 6.2.1 and 6.2.2, it becomes prohibitively slow for larger numbers. Usability would be improved by providing the method as a part of statistical software, likely via the aghq package.
The primary difficulty which would have to be overcome to do so is that the random argument of TMB::MakeADFun does not allow indexing. Figure 7.2: For the Loa loa ELGM (Section 6.2.2), increasing the number of quadrature nodes per hyperparameter dimension from \\(k = 3\\) to \\(k = 7\\) did little to improve accuracy. On the other hand, using Laplace marginals rather than Gaussian marginals did have a substantial effect (Figures 6.12 and 6.13). It would be valuable to better understand, and aspirationally have diagnostics for, the circumstances under which accuracy of INLA methods could be improved by additional computation. The universal INLA implementation developed in Chapter 6 enables empirical and methodological research that was previously not possible, or prohibitively difficult. INLA-like methods can now be tested for a broader class of models, such as the Loa loa and Naomi ELGMs (Sections 6.2.2 and 6.5). That a single TMB C++ template for the log-posterior supports inference using multiple methods, including gold-standard NUTS via tmbstan, is a substantial asset in conducting this type of research. As an example research question, within this class of models, what is the best way to obtain accurate inferences within a fixed computational budget? Is it better to use additional hyperparameter grid points, or more accurate latent field approximations? For the Loa loa ELGM in Section 6.2.2, the benefit of using Laplace marginals exceeded that of a denser AGHQ grid (Figure 7.2). It would also be of interest to find methods to obtain accurate inferences for particular parameters, or functions of parameters, using INLA-like methods. For example, in Section 6.5, although the PCA-AGHQ grid improved latent field parameter inferences, it did little to improve model output accuracy. Is there a way in which computational effort could be focused on obtaining accurate estimates of Naomi model outputs?
Additionally, it is relatively easy to make alterations to the implementation, facilitating possible innovation in the design of INLA-like algorithms. Previously, it has been difficult for researchers not involved in development of R-INLA to engage in methodological work about the INLA method. Theoretical research could be conducted to complement the work described above, extending the findings of Bilodeau, Stringer, and Tang (2022). Such work would benefit from the complete specification (Appendix C.3) of the INLA-like algorithm used in this thesis. 7.3 Broader reflections Conducting the work in this thesis involved testing the boundaries of available statistical software. For example, I found it challenging, if not impossible, to implement a common model using different inferential software. As the Frequently Asked Questions section of the R-INLA website (Havard Rue 2023) notes: “the devil is in the details”. Similarly, I encountered issues implementing a desired collection of different models in a common inferential software. To my knowledge, my colleagues have encountered similar problems. Needless to say, conflation of statistical models and inference methodologies limits the validity of any findings. To avoid this issue I implemented all models in Chapters 4 and 6 using TMB model templates. (Additionally, I would recommend implementing the model used in Chapter 5 in TMB for future development.) Alongside being sufficiently flexible to meet my model specification requirements, TMB is compatible with a range of inference methodologies, including those advanced in this thesis. As such, TMB remains (Osgood-Zimmerman and Wakefield 2023) an under-rated statistical tool. In demonstrating some of its capabilities, I hope this thesis contributes to its wider adoption. The work done in this thesis, particularly Chapters 4 and 6, focused on producing experimental, empirical evidence.
This approach reflects the complexity of the models and methods used in this thesis. Understanding complex systems from a theoretical perspective can be challenging. That said, in my opinion the work in this thesis could benefit from closer integration with statistical theory. Although a full theoretical understanding of these models or algorithms may be ambitious, better understanding simplified examples, limiting cases, or constituent parts could still prove valuable. Working with the data in Chapter 5 deepened my appreciation for the realistic challenges faced in applied work, and for data quality as the linchpin of any successful statistical analysis. Although drawn from the real world, the data in Chapters 4 and 6 underwent substantial cleaning, processing, and vetting before I handled them, as is typical in methodological research. It is important that methodological and theoretical statisticians appreciate the real challenges of applied work, by doing it themselves, or working in close collaboration with those who do. There are both direct and indirect paths to impact for the work in this thesis. Directly, the methodological contributions of Chapters 4 and 6 may eventually lead to marginally more accurate indicator estimates, contributing to a broadly more effective response. However, these improvements in accuracy seem of minor consequence within the broader context of the HIV response, and factors limiting its effectiveness. The applied contributions of Chapter 5 have a more promising case for direct impact. Indeed, I have seen evidence of engagement with this work by decision makers. To the best of my abilities, this thesis, and the work described within it, was written in keeping with the principles of open science. I hope that having done so allows my work to be scrutinised and, more optimistically, built upon. In part this hope has already been realised, as with limited input from me, Dr.
Kathryn Risher was able to extend my code for Chapter 5 to include additional countries (Panel 7.1B). This would not have been possible without tools from the R ecosystem such as rmarkdown and rticles for reporting, devtools for R package development, as well as those written by software engineers within the MRC Centre for Global Infectious Disease Analysis such as orderly and didehpc. It is crucial that academia adjusts to appropriately incentivise software contributions, and encourage adoption of open science best practices. Work done to inform public health decision making should be held to high standards of transparency, reproducibility and collaboration. This is especially so in an outbreak response scenario (Grieve et al. 2023), where time is limited and decisions may be of significant consequence. References Bilodeau, Blair, Alex Stringer, and Yanbo Tang. 2022. “Stochastic convergence rates and applications of adaptive quadrature in Bayesian inference.” Journal of the American Statistical Association, 1–11. Eaton, Jeffrey W, Laura Dwyer-Lindgren, Steve Gutreuter, Megan O’Driscoll, Oliver Stevens, Sumali Bajaj, Rob Ashton, et al. 2021. “Naomi: a new modelling tool for estimating HIV epidemic indicators at the district level in sub-Saharan Africa.” Journal of the International AIDS Society 24: e25788. Esra, Rachel, Mpho Mmelesi, Akeem T. Ketlogetswe, Timothy M. Wolock, Adam Howes, Tlotlo Nong, Matshelo Tina Matlhaga, Siphiwe Ratladi, Dinah Ramaabya, and Jeffrey W. Imai-Eaton. 2024. “Improved Indicators for Subnational Unmet Antiretroviral Therapy Need in the Health System: Updates to the Naomi Model in 2023.” Journal of Acquired Immune Deficiency Syndromes 95 (1S): e24–33. https://doi.org/10.1097/QAI.0000000000003324. Grieve, Richard, Youqi Yang, Sam Abbott, Giridhara R Babu, Malay Bhattacharyya, Natalie Dean, Stephen Evans, et al. 2023.
“The Importance of Investing in Data, Models, Experiments, Team Science, and Public Trust to Help Policymakers Prepare for the Next Pandemic.” PLOS Global Public Health 3 (11): e0002601. Osgood-Zimmerman, Aaron, and Jon Wakefield. 2023. “A Statistical Review of Template Model Builder: A Flexible Tool for Spatial Modelling.” International Statistical Review 91 (2): 318–42. Rue, Havard. 2023. “‘R-INLA‘ Project - FAQ.” https://www.r-inla.org/faq. UNAIDS. 2021b. “Global AIDS strategy 2021–2026. End inequalities. End AIDS.” ———. 2023b. “The path that ends AIDS: UNAIDS Global AIDS Update 2023.” https://www.unaids.org/en/resources/documents/2023/global-aids-update-2023. Wolock, Timothy M, Seth Flaxman, Kathryn A Risher, Tawanda Dadirai, Simon Gregson, and Jeffrey W Eaton. 2021. “Evaluating distributional regression strategies for modelling self-reported sexual age-mixing.” Edited by Eduardo Franco, Talía Malagón, and Adam Akullian. eLife 10 (June): e68318. https://doi.org/10.7554/eLife.68318. ———. 2020. “Simplified integrated nested Laplace approximation.” Biometrika 107 (1): 223–30. "],["models-for-areal-spatial-structure.html", "A Models for areal spatial structure A.1 Comparison of AGHQ to NUTS A.2 Lengthscale prior sensitivity A.3 Simulation study A.4 HIV study", " A Models for areal spatial structure A.1 Comparison of AGHQ to NUTS Figure A.1: A comparison of time taken to fit AGHQ via aghq as compared with NUTS via tmbstan for each inferential model. For the models run using NUTS via tmbstan there was significant variation in time taken depending on initial random seed. As such, these timings and more broadly the inferences obtained from NUTS in Appendix A.1 should be interpreted with appropriate skepticism. Figure A.2: A comparison of the posterior means and standard deviations obtained with AGHQ via aghq as compared with NUTS via tmbstan fitting an IID inferential model to IID synthetic data on the grid geometry (Panel 4.6E). 
For NUTS, the minimum ESS was 1686, and the maximum value of the potential scale reduction factor was 1.00. Figure A.3: A comparison of the posterior means and standard deviations obtained with AGHQ via aghq as compared with NUTS via tmbstan fitting a Besag inferential model to IID synthetic data on the grid geometry (Panel 4.6E). For NUTS, the minimum ESS was 1056, and the maximum value of the potential scale reduction factor was 1.00. Figure A.4: A comparison of the posterior means and standard deviations obtained with AGHQ via aghq as compared with NUTS via tmbstan fitting a BYM2 inferential model to IID synthetic data on the grid geometry (Panel 4.6E). For NUTS, the minimum ESS was 35, and the maximum value of the potential scale reduction factor was 1.06. Figure A.5: A comparison of the posterior means and standard deviations obtained with AGHQ via aghq as compared with NUTS via tmbstan fitting an FCK inferential model to IID synthetic data on the grid geometry (Panel 4.6E). For NUTS, the minimum ESS was 355, and the maximum value of the potential scale reduction factor was 1.01. Figure A.6: A comparison of the posterior means and standard deviations obtained with AGHQ via aghq as compared with NUTS via tmbstan fitting a CK inferential model to IID synthetic data on the grid geometry (Panel 4.6E). For NUTS, the minimum ESS was 1471, and the maximum value of the potential scale reduction factor was 1.00. Figure A.7: A comparison of the posterior means and standard deviations obtained with AGHQ via aghq as compared with NUTS via tmbstan fitting an FIK inferential model to IID synthetic data on the grid geometry (Panel 4.6E). For NUTS, the minimum ESS was 289, and the maximum value of the potential scale reduction factor was 1.01. Figure A.8: A comparison of the posterior means and standard deviations obtained with AGHQ via aghq as compared with NUTS via tmbstan fitting an IK inferential model to IID synthetic data on the grid geometry (Panel 4.6E).
For NUTS, the minimum ESS was 1623, and the maximum value of the potential scale reduction factor was 1.00. A.2 Lengthscale prior sensitivity Table A.1: Six lengthscale prior distributions were considered for use in the simulation (Section 4.3) and HIV prevalence (Section 4.4) studies.

| Description | Prior | Additional details |
|---|---|---|
| Gamma | \\(l \\sim \\text{Gamma}(1, 1)\\) | \\(-\\) |
| Geometry-informed inverse-gamma | \\(l \\sim \\text{IG}(a, b)\\) | The parameters \\(a\\) and \\(b\\) were chosen such that 5% of the prior mass was below the 5% quantile and 5% was above the 95% quantile of the distance between points (Betancourt 2017) |
| Geometry-informed normal | \\(l \\sim \\mathcal{N}^{+}(0, \\sigma)\\) | The parameter \\(\\sigma\\) was set to one third of the difference between the minimum and maximum distance between points (Betancourt 2017) |
| Log-normal | \\(l \\sim \\text{Log-normal}(0, 1)\\) | \\(-\\) |
| Non-informative | \\(p(l) = 1\\) | This is an improper prior in that it does not integrate to one |
| Oracle normal | \\(l \\sim \\mathcal{N}^{+}(2.5, 1)\\) | The mean of this prior was set to the true value of the lengthscale |

Figure A.9: The probability density for each lengthscale prior distribution as given in Table A.1. Figure A.10: Lengthscale posterior distributions obtained using NUTS to fit a centroid kernel model to integrated kernel data. The true value, 2.5, is shown as a dashed vertical line. Six different lengthscale prior distributions were considered as given in Table A.1. The geometry used was the grid (Panel 4.6E). A.3 Simulation study A.3.1 Lengthscale recovery Figure A.11: The lengthscale posterior mean and 95% credible interval obtained using the centroid kernel model on integrated kernel data for the first 40 simulation replicates on each geometry. The true lengthscale, and lengthscale obtained using the heuristic method of N. Best et al. (1999), are shown as dashed horizontal lines.
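As a rough illustration of the geometry-informed normal prior in Table A.1, the scale \\(\\sigma\\) can be computed as one third of the difference between the largest and smallest distance between points. The sketch below is in Python rather than the R and TMB code used in the thesis, and it assumes the "points" are area centroids with Euclidean distances; the function names and the example grid are hypothetical.

```python
import math
from itertools import combinations


def pairwise_distances(points):
    """Euclidean distances between all distinct pairs of 2D points."""
    return [math.dist(p, q) for p, q in combinations(points, 2)]


def geometry_informed_sigma(points):
    """Scale for the half-normal lengthscale prior.

    Assumption: sigma = (max distance - min distance) / 3, following the
    description in Table A.1. Which points are used is not stated there.
    """
    distances = pairwise_distances(points)
    return (max(distances) - min(distances)) / 3


# Hypothetical example: centroids of a 3-by-3 grid with unit spacing
centroids = [(x, y) for x in range(3) for y in range(3)]
sigma = geometry_informed_sigma(centroids)
print(f"sigma = {sigma:.3f}")
```

With this choice, distances spanning the observed range of the geometry fall within roughly three prior standard deviations of zero.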
A.3.2 BYM2 proportion Figure A.12: The BYM2 proportion parameter posterior mean and 95% credible interval obtained for the first 40 simulation replicates for the realistic geometries. When the simulated data is IID, the BYM2 proportion parameter is in the majority of cases below 0.5, indicating that the noise was inferred to be mostly IID (spatially unstructured). When the simulated data is either Besag or IK, the BYM2 proportion parameter is in the majority of cases above 0.5, indicating that the noise was inferred to be mostly Besag (spatially structured). A.3.3 Mean squared error Table A.2: The average mean squared error (MSE) of each inferential model in estimating \\(\\rho\\), under different simulation and geometry settings. Entries for FCK and CK on geometry 2 are empty because the model was undefined in that case. The units used in this table are thousandths.
Rows are grouped by geometry; columns IID to IK are inferential models.

| Geometry | Simulation model | IID | Besag | BYM2 | FCK | CK | FIK | IK |
|---|---|---|---|---|---|---|---|---|
| 1 | IID | 8.20 | 7.56 | 7.99 | 7.84 | 7.67 | 7.90 | 7.61 |
| 1 | Besag | 7.31 | 6.39 | 7.15 | 7.31 | 6.76 | 7.27 | 6.63 |
| 1 | IK | 7.44 | 6.30 | 7.27 | 7.74 | 6.83 | 7.58 | 6.62 |
| 2 | IID | 8.43 | 7.62 | 8.23 | - | - | 7.99 | 8.32 |
| 2 | Besag | 7.56 | 6.58 | 7.39 | - | - | 7.25 | 6.42 |
| 2 | IK | 7.16 | 5.91 | 6.95 | - | - | 6.91 | 4.95 |
| 3 | IID | 8.23 | 7.72 | 8.19 | 8.09 | 7.85 | 8.05 | 7.75 |
| 3 | Besag | 7.73 | 6.71 | 7.63 | 7.78 | 7.01 | 7.55 | 6.67 |
| 3 | IK | 7.56 | 6.24 | 7.30 | 7.75 | 6.78 | 7.53 | 6.18 |
| 4 | IID | 8.71 | 8.03 | 8.49 | 8.53 | 8.31 | 8.35 | 8.12 |
| 4 | Besag | 7.48 | 6.65 | 7.34 | 7.55 | 7.08 | 7.44 | 6.89 |
| 4 | IK | 7.38 | 6.11 | 7.12 | 7.60 | 6.71 | 7.45 | 6.36 |
| Grid | IID | 7.63 | 7.65 | 7.66 | 7.72 | 7.79 | 7.89 | 7.84 |
| Grid | Besag | 4.06 | 3.29 | 3.77 | 3.94 | 3.36 | 3.71 | 3.32 |
| Grid | IK | 5.97 | 4.30 | 4.81 | 4.98 | 3.50 | 4.47 | 3.41 |
| Côte d’Ivoire | IID | 7.72 | 7.78 | 7.74 | 7.89 | 7.99 | 8.08 | 7.96 |
| Côte d’Ivoire | Besag | 4.88 | 3.96 | 4.45 | 4.62 | 4.07 | 4.36 | 4.00 |
| Côte d’Ivoire | IK | 5.61 | 3.96 | 4.50 | 4.73 | 3.18 | 4.19 | 3.10 |
| Texas | IID | 7.63 | 7.71 | 7.65 | 8.59 | 8.05 | 8.60 | 7.80 |
| Texas | Besag | 5.13 | 4.05 | 4.62 | 4.60 | 4.36 | 4.34 | 4.26 |
| Texas | IK | 6.29 | 4.51 | 5.06 | 4.44 | 3.45 | 4.04 | 3.37 |

A.3.4 Continuous ranked probability score Table A.3: The average continuous ranked probability score (CRPS) of each inferential model in estimating \\(\\rho\\), under different simulation and geometry settings. Entries for FCK and CK on geometry 2 are empty because the model was undefined in that case. The units used in this table are thousandths.
Rows are grouped by geometry; columns IID to IK are inferential models.

| Geometry | Simulation model | IID | Besag | BYM2 | FCK | CK | FIK | IK |
|---|---|---|---|---|---|---|---|---|
| 1 | IID | 32.6 | 33.9 | 32.7 | 32.1 | 33.4 | 32.3 | 33.5 |
| 1 | Besag | 30.7 | 29.5 | 30.6 | 30.7 | 30.0 | 30.7 | 29.9 |
| 1 | IK | 31.2 | 29.1 | 31.1 | 32.1 | 30.1 | 31.7 | 29.7 |
| 2 | IID | 33.1 | 33.4 | 32.8 | - | - | 32.7 | 39.9 |
| 2 | Besag | 32.0 | 30.6 | 31.6 | - | - | 31.2 | 33.2 |
| 2 | IK | 28.9 | 26.2 | 28.6 | - | - | 28.4 | 24.2 |
| 3 | IID | 32.9 | 33.8 | 33.1 | 32.4 | 33.5 | 32.6 | 35.0 |
| 3 | Besag | 32.9 | 31.1 | 32.4 | 33.0 | 31.5 | 32.2 | 31.6 |
| 3 | IK | 30.7 | 28.1 | 30.3 | 31.4 | 29.0 | 30.8 | 27.9 |
| 4 | IID | 34.3 | 34.9 | 34.2 | 34.2 | 34.8 | 33.8 | 34.7 |
| 4 | Besag | 32.3 | 31.2 | 31.9 | 32.1 | 31.8 | 31.9 | 31.7 |
| 4 | IK | 29.8 | 27.3 | 29.3 | 30.5 | 28.3 | 29.9 | 27.7 |
| Grid | IID | 32.4 | 34.2 | 32.5 | 33.1 | 34.0 | 35.1 | 35.1 |
| Grid | Besag | 24.6 | 22.7 | 23.3 | 23.4 | 23.8 | 23.5 | 24.1 |
| Grid | IK | 28.7 | 23.7 | 24.6 | 24.4 | 21.1 | 23.1 | 21.0 |
| Côte d’Ivoire | IID | 32.4 | 34.5 | 32.5 | 33.7 | 34.8 | 35.8 | 35.6 |
| Côte d’Ivoire | Besag | 26.5 | 24.4 | 24.9 | 25.3 | 25.9 | 25.3 | 26.0 |
| Côte d’Ivoire | IK | 27.7 | 22.2 | 23.4 | 23.6 | 19.6 | 22.2 | 19.6 |
| Texas | IID | 32.1 | 34.0 | 32.3 | 39.2 | 35.7 | 40.0 | 35.6 |
| Texas | Besag | 27.3 | 24.7 | 25.3 | 27.1 | 27.5 | 26.9 | 27.0 |
| Texas | IK | 29.7 | 24.5 | 25.4 | 23.0 | 20.8 | 22.3 | 20.9 |

Figure A.13: The mean CRPS with 95% credible interval in estimating \\(\\rho\\) using each inferential model and simulation model on the first vignette geometry (Panel 4.6A). Credible intervals were generated using 1.96 times the standard error. Figure A.14: The mean CRPS with 95% credible interval in estimating \\(\\rho\\) using each inferential model and simulation model on the second vignette geometry (Panel 4.6B). Credible intervals were generated using 1.96 times the standard error. Figure A.15: The mean CRPS with 95% credible interval in estimating \\(\\rho\\) using each inferential model and simulation model on the third vignette geometry (Panel 4.6C). Credible intervals were generated using 1.96 times the standard error. Figure A.16: The mean CRPS with 95% credible interval in estimating \\(\\rho\\) using each inferential model and simulation model on the fourth vignette geometry (Panel 4.6D). Credible intervals were generated using 1.96 times the standard error.
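The intervals in Figures A.13 to A.16 are described as the mean plus or minus 1.96 times the standard error. A minimal sketch of that calculation is given below, in Python rather than the R used in the thesis. It assumes the standard error is the sample standard deviation across simulation replicates divided by the square root of the number of replicates, which the captions do not state; the function name and the input values are hypothetical.

```python
import math
import statistics


def mean_with_interval(scores, z=1.96):
    """Return (mean, lower, upper), with bounds at mean -/+ z * standard error.

    Assumption: standard error = sample standard deviation / sqrt(n).
    """
    n = len(scores)
    mean = statistics.fmean(scores)
    standard_error = statistics.stdev(scores) / math.sqrt(n)
    return mean, mean - z * standard_error, mean + z * standard_error


# Hypothetical CRPS values (in thousandths) from five simulation replicates
crps_values = [30.1, 32.4, 29.8, 31.6, 33.0]
mean, lower, upper = mean_with_interval(crps_values)
print(f"mean = {mean:.2f}, interval = ({lower:.2f}, {upper:.2f})")
```

The interval is symmetric about the mean by construction, and narrows as the number of replicates grows.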
Figure A.17: Choropleths showing the mean value of the CRPS in estimating \\(\\rho\\), under each inferential model and simulation model, at each area of the first vignette geometry (Panel 4.6A).

Figure A.18: Choropleths showing the mean value of the CRPS in estimating \\(\\rho\\), under each inferential model and simulation model, at each area of the second vignette geometry (Panel 4.6B).

Figure A.19: Choropleths showing the mean value of the CRPS in estimating \\(\\rho\\), under each inferential model and simulation model, at each area of the third vignette geometry (Panel 4.6C).

Figure A.20: Choropleths showing the mean value of the CRPS in estimating \\(\\rho\\), under each inferential model and simulation model, at each area of the fourth vignette geometry (Panel 4.6D).

Figure A.21: Choropleths showing the mean value of the CRPS in estimating \\(\\rho\\), under each inferential model and simulation model, at each area of the grid geometry (Panel 4.6E).

Figure A.22: Choropleths showing the mean value of the CRPS in estimating \\(\\rho\\), under each inferential model and simulation model, at each area of the Côte d’Ivoire geometry (Panel 4.6F).

Figure A.23: Choropleths showing the mean value of the CRPS in estimating \\(\\rho\\), under each inferential model and simulation model, at each area of the Texas geometry (Panel 4.6G).

A.3.5 Calibration

Figure A.24: Probability integral transform histograms and empirical cumulative distribution function difference plots for \\(\\rho\\), under each inferential model and simulation model, for the first vignette geometry (Panel 4.6A).

Figure A.25: Probability integral transform histograms and empirical cumulative distribution function difference plots for \\(\\rho\\), under each inferential model and simulation model, for the second vignette geometry (Panel 4.6B).

Figure A.26: Probability integral transform histograms and empirical cumulative distribution function difference plots for \\(\\rho\\), under each inferential model and simulation model, for the third vignette geometry (Panel 4.6C).

Figure A.27: Probability integral transform histograms and empirical cumulative distribution function difference plots for \\(\\rho\\), under each inferential model and simulation model, for the fourth vignette geometry (Panel 4.6D).

Figure A.28: Probability integral transform histograms and empirical cumulative distribution function difference plots for \\(\\rho\\), under each inferential model and simulation model, for the grid geometry (Panel 4.6E).

Figure A.29: Probability integral transform histograms and empirical cumulative distribution function difference plots for \\(\\rho\\), under each inferential model and simulation model, for the Côte d’Ivoire geometry (Panel 4.6F).

Figure A.30: Probability integral transform histograms and empirical cumulative distribution function difference plots for \\(\\rho\\), under each inferential model and simulation model, for the Texas geometry (Panel 4.6G).

A.4 HIV study

A.4.1 Lengthscale

Figure A.31: The lengthscale hyperparameter prior and posterior distributions for each of the four considered PHIA surveys (Table 4.3), using both the CK and IK inferential models.

A.4.2 BYM2 proportion

Figure A.32: The BYM2 proportion hyperparameter prior and posterior distributions for each of the four considered PHIA surveys (Table 4.3). A value of zero corresponds to IID noise. A value of one corresponds to Besag noise.

For each survey, excluding the Côte d’Ivoire 2017 PHIA, the posterior distribution for the BYM2 proportion is concentrated towards a value of one. This result can be interpreted as suggesting that the variation in HIV prevalence from these surveys is spatially structured.
A.4.3 Estimates

Figure A.33: The HIV prevalence posterior mean and 95% credible interval for each area of Côte d’Ivoire, based on the 2017 PHIA survey. Direct estimates obtained from the survey are as shown in Panel 4.10A.

Figure A.34: The HIV prevalence posterior mean and 95% credible interval for each area of Malawi, based on the 2016 PHIA survey. Direct estimates obtained from the survey are as shown in Panel 4.10B.

Figure A.35: The HIV prevalence posterior mean and 95% credible interval for each area of Tanzania, based on the 2017 PHIA survey. Direct estimates obtained from the survey are as shown in Panel 4.10C.

Figure A.36: The HIV prevalence posterior mean and 95% credible interval for each area of Zimbabwe, based on the 2016 PHIA survey. Direct estimates obtained from the survey are as shown in Panel 4.10D.

A.4.4 Cross-validation

A.4.4.1 Mean squared error

Table A.4: The mean pointwise leave-one-out and spatial leave-one-out MSE in estimating \\(\\rho_i\\), with standard errors, for each inferential model across the four considered PHIA surveys. The units used in this table are thousandths.
Mean squared error (units: 1/1000)

| Cross-validation | PHIA survey | IID | Besag | BYM2 | FCK | CK | FIK | IK |
|---|---|---|---|---|---|---|---|---|
| LOO | Côte d’Ivoire, 2017 | 0.21 | 0.22 | 0.20 | 0.21 | 0.19 | 0.21 | 0.20 |
| LOO | Malawi, 2016 | 7.10 | 2.39 | 2.59 | 3.59 | 3.70 | 2.43 | 2.54 |
| LOO | Tanzania, 2017 | 1.66 | 1.14 | 1.43 | 0.95 | 0.65 | 0.78 | 0.66 |
| LOO | Zimbabwe, 2016 | 4.76 | 2.51 | 2.54 | 2.51 | 1.88 | 2.15 | 1.83 |
| SLOO | Côte d’Ivoire, 2017 | 0.20 | 0.22 | 0.21 | 0.24 | 0.25 | 0.26 | 0.25 |
| SLOO | Malawi, 2016 | 7.13 | 2.41 | 3.32 | 8.22 | 7.95 | 7.05 | 6.70 |
| SLOO | Tanzania, 2017 | 1.65 | 1.09 | 2.46 | 1.86 | 2.80 | 1.86 | 2.59 |
| SLOO | Zimbabwe, 2016 | 4.73 | 2.49 | 3.44 | 3.95 | 3.36 | 3.93 | 3.42 |

A.4.4.2 Continuous ranked probability score

Figure A.37: The pointwise CRPS in estimating \\(\\rho_i\\) using either leave-one-out or spatial leave-one-out cross-validation, with mean and 95% credible interval, for the Côte d’Ivoire 2017 PHIA survey (Panel 4.10A).

Figure A.38: The pointwise CRPS in estimating \\(\\rho_i\\) using either leave-one-out or spatial leave-one-out cross-validation, with mean and 95% credible interval, for the Malawi 2016 PHIA survey (Panel 4.10B).

Figure A.39: The pointwise CRPS in estimating \\(\\rho_i\\) using either leave-one-out or spatial leave-one-out cross-validation, with mean and 95% credible interval, for the Tanzania 2017 PHIA survey (Panel 4.10C).

Figure A.40: The pointwise CRPS in estimating \\(\\rho_i\\) using either leave-one-out or spatial leave-one-out cross-validation, with mean and 95% credible interval, for the Zimbabwe 2016 PHIA survey (Panel 4.10D).

Figure A.41: Choropleth showing the pointwise CRPS in estimating \\(\\rho_i\\) using either leave-one-out or spatial leave-one-out cross-validation for the Côte d’Ivoire 2017 PHIA survey (Panel 4.10A).

Figure A.42: Choropleth showing the pointwise CRPS in estimating \\(\\rho_i\\) using either leave-one-out or spatial leave-one-out cross-validation for the Malawi 2016 PHIA survey (Panel 4.10B).

Figure A.43: Choropleth showing the pointwise CRPS in estimating \\(\\rho_i\\) using either leave-one-out or spatial leave-one-out cross-validation for the Tanzania 2017 PHIA survey (Panel 4.10C).
Figure A.44: Choropleth showing the pointwise CRPS in estimating \\(\\rho_i\\) using either leave-one-out or spatial leave-one-out cross-validation for the Zimbabwe 2016 PHIA survey (Panel 4.10D).

Figure A.45: Probability integral transform histograms and empirical cumulative distribution function difference plots in estimating \\(\\rho\\) for the Côte d’Ivoire 2017 PHIA survey (Panel 4.10A).

Figure A.46: Probability integral transform histograms and empirical cumulative distribution function difference plots in estimating \\(\\rho\\) for the Malawi 2016 PHIA survey (Panel 4.10B).

Figure A.47: Probability integral transform histograms and empirical cumulative distribution function difference plots in estimating \\(\\rho\\) for the Tanzania 2017 PHIA survey (Panel 4.10C).

Figure A.48: Probability integral transform histograms and empirical cumulative distribution function difference plots in estimating \\(\\rho\\) for the Zimbabwe 2016 PHIA survey (Panel 4.10D).

References

Best, N, N Arnold, A Thomas, L Waller, and E Conlon. 1999. “Bayesian models for spatially correlated disease and exposure data.” In Bayesian Statistics 6: Proceedings of the Sixth Valencia International Meeting, 6:131. Oxford University Press.

Betancourt, Michael. 2017. “Robust Gaussian processes in Stan.” https://betanalpha.github.io/assets/case\\%5Fstudies/gp\\%5Fpart3/part3.html.
"],["a-model-for-risk-group-proportions.html", "B A model for risk group proportions B.1 The Global AIDS Strategy B.2 Household survey data B.3 Spatial analysis levels B.4 Survey questions and risk group allocation B.5 Additional figures", "
B A model for risk group proportions

B.1 The Global AIDS Strategy

Table B.1: Prioritisation strata for AGYW given by UNAIDS (2021b) based on HIV incidence in the general population and behavioural risk.
| Prioritisation strata | Criterion |
|---|---|
| Low | 0.3-1.0% incidence and low-risk behaviour, or <0.3% incidence and high-risk behaviour |
| Moderate | 1.0-3.0% incidence and low-risk behaviour, or 0.3-1.0% incidence and high-risk behaviour |
| High | 1.0-3.0% incidence and high-risk behaviour |
| Very high | >3.0% incidence |

Table B.2: Commitments recommended by UNAIDS (2021b) to be met for each HIV intervention, given in terms of the proportion of the AGYW prioritisation strata reached. The symbol “-” represents no commitment.

| Intervention | Low | Moderate | High | Very High |
|---|---|---|---|---|
| Condoms and lube for those with non-regular partner(s), unknown STI status, not on PrEP | 50% | 70% | 95% | 95% |
| STI screening and treatment | 10% | 10% | 80% | 80% |
| Access to PEP | - | - | 50% | 90% |
| PrEP use | - | 5% | 50% | 50% |
| Economic empowerment | - | - | 20% | 20% |

B.2 Household survey data

Table B.3: The sample size by age group for each included survey in the analysis. The column “TS question” refers to whether or not the survey included a specific question about transactional sex (TS).
Sample size is given by age group (15-19, 20-24, 25-29) and in total.

| Country | Type | Year | TS question | 15-19 | 20-24 | 25-29 | Total |
|---|---|---|---|---|---|---|---|
| Botswana | BAIS | 2013 | ✓ | 557 | 588 | 649 | 1794 |
| Cameroon | DHS | 2004 | ✗ | 2678 | 2210 | 1732 | 6620 |
| Cameroon | DHS | 2011 | ✗ | 3588 | 3115 | 2656 | 9359 |
| Cameroon | PHIA | 2017 | ✗ | 2140 | 1923 | 1851 | 5914 |
| Cameroon | DHS | 2018 | ✓ | 3349 | 2463 | 2345 | 8157 |
| Kenya | DHS | 2003 | ✗ | 1819 | 1709 | 1391 | 4919 |
| Kenya | DHS | 2008 | ✗ | 1767 | 1743 | 1420 | 4930 |
| Kenya | DHS | 2014 | ✗ | 2861 | 2534 | 2858 | 8253 |
| Lesotho | DHS | 2004 | ✗ | 1761 | 1456 | 1026 | 4243 |
| Lesotho | DHS | 2009 | ✗ | 1834 | 1545 | 1195 | 4574 |
| Lesotho | DHS | 2014 | ✗ | 1537 | 1293 | 1069 | 3899 |
| Lesotho | PHIA | 2017 | ✓ | 1156 | 1202 | 1054 | 3412 |
| Mozambique | AIS | 2009 | ✗ | 1031 | 1106 | 987 | 3124 |
| Mozambique | DHS | 2011 | ✗ | 3065 | 2468 | 2340 | 7873 |
| Mozambique | AIS | 2015 | ✗ | 1554 | 1390 | 1080 | 4024 |
| Malawi | DHS | 2000 | ✗ | 2914 | 2998 | 2358 | 8270 |
| Malawi | DHS | 2004 | ✗ | 2407 | 2823 | 2135 | 7365 |
| Malawi | DHS | 2010 | ✗ | 5032 | 4387 | 4309 | 13728 |
| Malawi | DHS | 2015 | ✓ | 5273 | 5094 | 3976 | 14343 |
| Malawi | PHIA | 2016 | ✓ | 1646 | 1934 | 1511 | 5091 |
| Namibia | DHS | 2000 | ✗ | 1428 | 1313 | 1099 | 3840 |
| Namibia | DHS | 2006 | ✗ | 2203 | 1870 | 1544 | 5617 |
| Namibia | DHS | 2013 | ✗ | 1852 | 1709 | 1482 | 5043 |
| Namibia | PHIA | 2017 | ✓ | 1491 | 1525 | 1370 | 4386 |
| Eswatini | DHS | 2006 | ✗ | 1265 | 1027 | 731 | 3023 |
| Eswatini | PHIA | 2017 | ✗ | 1031 | 895 | 811 | 2737 |
| Tanzania | AIS | 2003 | ✗ | 1466 | 1377 | 1270 | 4113 |
| Tanzania | AIS | 2007 | ✗ | 2137 | 1676 | 1509 | 5322 |
| Tanzania | DHS | 2010 | ✗ | 2221 | 1860 | 1613 | 5694 |
| Tanzania | AIS | 2012 | ✗ | 2474 | 1923 | 1815 | 6212 |
| Uganda | DHS | 2000 | ✗ | 1687 | 1541 | 1326 | 4554 |
| Uganda | DHS | 2006 | ✗ | 1948 | 1661 | 1406 | 5015 |
| Uganda | AIS | 2011 | ✗ | 2451 | 2164 | 1921 | 6536 |
| Uganda | DHS | 2011 | ✗ | 2025 | 1664 | 1614 | 5303 |
| Uganda | DHS | 2016 | ✓ | 4276 | 3782 | 3014 | 11072 |
| Uganda | PHIA | 2016 | ✗ | 3289 | 3059 | 2574 | 8922 |
| South Africa | DHS | 2016 | ✓ | 1505 | 1408 | 1397 | 4310 |
| Zambia | DHS | 2007 | ✗ | 1598 | 1405 | 1373 | 4376 |
| Zambia | DHS | 2013 | ✗ | 3685 | 3036 | 2789 | 9510 |
| Zambia | PHIA | 2016 | ✓ | 2120 | 2045 | 1619 | 5784 |
| Zambia | DHS | 2018 | ✓ | 3112 | 2687 | 2166 | 7965 |
| Zimbabwe | DHS | 1999 | ✗ | 1468 | 1230 | 1011 | 3709 |
| Zimbabwe | DHS | 2005 | ✗ | 2128 | 1943 | 1438 | 5509 |
| Zimbabwe | DHS | 2010 | ✗ | 1966 | 1796 | 1680 | 5442 |
| Zimbabwe | DHS | 2015 | ✓ | 2154 | 1779 | 1647 | 5580 |
| Zimbabwe | PHIA | 2016 | ✓ | 2114 | 1817 | 1573 | 5504 |
| Total | | | | 103063 | 92173 | 79734 | 274970 |

Table B.4: All of the household surveys that were excluded from the risk group model in Section 5.3.

| Survey | Reason for exclusion |
|---|---|
| Mozambique 2003 DHS | No GPS coordinates available to place survey clusters within districts. |
| Tanzania 2015 DHS | Insufficient sexual behaviour questions. |
| Uganda 2004 AIS | Unable to download region boundaries. |
| Zambia 2002 DHS | No GPS coordinates available to place survey clusters within districts. |

B.3 Spatial analysis levels

Table B.5: The number of areas and analysis level for each country that was used in the analysis.

| Country | Number of areas | Analysis level |
|---|---|---|
| Botswana | 27 | Health district |
| Cameroon | 58 | Department |
| Kenya | 47 | County |
| Lesotho | 10 | District |
| Mozambique | 161 | District |
| Malawi | 33 | Health district and cities |
| Namibia | 38 | District |
| Eswatini | 4 | Region |
| Tanzania | 195 | District |
| Uganda | 136 | District |
| South Africa | 52 | District |
| Zambia | 116 | District |
| Zimbabwe | 63 | District |

B.4 Survey questions and risk group allocation

Table B.6: The behavioural survey questions included in AIDS Indicator Survey (AIS) and Demographic and Health Surveys (DHS) used to determine AGYW risk group membership.

| Variable(s) | Description |
|---|---|
| \\(\\texttt{v501}\\) | Current marital status of the respondent. |
| \\(\\texttt{v529}\\) | Computed time since last sexual intercourse. |
| \\(\\texttt{v531}\\) | Age at first sexual intercourse–imputed. |
| \\(\\texttt{v766b}\\) | Number of sexual partners during the last 12 months (including husband). |
| \\(\\texttt{v767[a, b, c]}\\) | Relationship with last three sexual partners. Options are: spouse, boyfriend not living with respondent, other friend, casual acquaintance, relative, commercial sex worker, live-in partner, other. |
| \\(\\texttt{v791a}\\) | Had sex in return for gifts, cash or anything else in the past 12 months. (Asked only to women 15-24 who are not in a union.) |

Table B.7: The behavioural survey questions included in Population-Based HIV Impact Assessment (PHIA) surveys used to determine AGYW risk group membership.

| Variable(s) | Description |
|---|---|
| \\(\\texttt{part12monum}\\) | Number of sexual partners during the last 12 months (including husband). |
| \\(\\texttt{part12modkr}\\) | Reason for leaving blank. |
| \\(\\texttt{partlivew[1, 2, 3]}\\) | Does the person you had sex with live in this household? |
| \\(\\texttt{partrelation[1, 2, 3]}\\) | Relationship with last three sexual partners. Options are: husband, live-in partner, partner (not living with), ex-spouse/partner, friend/acquaintance, sex worker, sex worker client, stranger, other, don’t know, refused. |
| \\(\\texttt{sellsx12mo}\\) | Had sex for money and/or gifts in the last 12 months. |
| \\(\\texttt{buysx12mo}\\) | Paid money or given gifts for sex in the last 12 months. |

B.5 Additional figures

Figure B.1: The proportion of posterior variance explained by each random effect, calculated as a ratio of the random effect variance posterior mean to the sum of all random effect variance posterior means. To allow calculation of this metric by country, the model was run for each country individually.

Figure B.2: For the 20-24 and 25-29 age groups, the proportion of AGYW in the one cohabiting partner and non-regular or multiple partner(s) risk groups was bimodal.

References

UNAIDS. 2021b. “Global AIDS strategy 2021–2026. End inequalities. End AIDS.”
"],["naomi-aghq-appendix.html", "C Fast approximate Bayesian inference C.1 Epilepsy example C.2 Loa loa example C.3 AGHQ with Laplace marginals algorithm C.4 Simplified Naomi model description C.5 NUTS convergence and suitability C.6 Use of PCA-AGHQ C.7 Inference comparison", "
C Fast approximate Bayesian inference

C.1 Epilepsy example

C.1.1 TMB C++ template

    // epil.cpp

    #include <TMB.hpp>

    template <class Type>
    Type objective_function<Type>::operator()()
    {
      DATA_INTEGER(N);
      DATA_INTEGER(J);
      DATA_INTEGER(K);
      DATA_MATRIX(X);
      DATA_VECTOR(y);
      DATA_MATRIX(E); // Epsilon matrix

      PARAMETER_VECTOR(beta);
      PARAMETER_VECTOR(epsilon);
      PARAMETER_VECTOR(nu);
      PARAMETER(l_tau_epsilon);
      PARAMETER(l_tau_nu);

      Type tau_epsilon = exp(l_tau_epsilon);
      Type tau_nu = exp(l_tau_nu);
      Type sigma_epsilon = sqrt(1 / tau_epsilon);
      Type sigma_nu = sqrt(1 / tau_nu);
      vector<Type> eta(X * beta + nu + E * epsilon);
      vector<Type> lambda(exp(eta));

      Type nll;
      nll = Type(0.0);

      // Note: dgamma() is parameterised as (shape, scale)
      // R-INLA is parameterised as (shape, rate)
      nll -= dlgamma(l_tau_epsilon, Type(0.001), Type(1.0 / 0.001), true);
      nll -= dlgamma(l_tau_nu, Type(0.001), Type(1.0 / 0.001), true);
      nll -= dnorm(epsilon, Type(0), sigma_epsilon, true).sum();
      nll -= dnorm(nu, Type(0), sigma_nu, true).sum();
      nll -= dnorm(beta, Type(0), Type(100), true).sum();
      nll -= dpois(y, lambda, true).sum();

      ADREPORT(tau_epsilon);
      ADREPORT(tau_nu);

      return(nll);
    }

C.1.2 Modified TMB C++ template

    // epil_modified.cpp

    #include <TMB.hpp>

    template <class Type>
    Type objective_function<Type>::operator()()
    {
      DATA_INTEGER(N);
      DATA_INTEGER(J);
      DATA_INTEGER(K);
      DATA_MATRIX(X);
      DATA_VECTOR(y);
      DATA_MATRIX(E); // Epsilon matrix

      DATA_IVECTOR(x_starts); // Start index of each subvector of x
      DATA_IVECTOR(x_lengths); // Length of each subvector of x
      DATA_INTEGER(i); // Index i

      PARAMETER(x_i);
      PARAMETER_VECTOR(x_minus_i);

      vector<Type> x(301);
      int k = 0;
      for (int j = 0; j < 301; j++) {
        if (j + 1 == i) { // +1 because C++ does zero-indexing
          x(j) = x_i;
        } else {
          x(j) = x_minus_i(k);
          k++;
        }
      }

      vector<Type> beta = x.segment(x_starts(0), x_lengths(0));
      vector<Type> epsilon = x.segment(x_starts(1), x_lengths(1));
      vector<Type> nu = x.segment(x_starts(2), x_lengths(2));

      PARAMETER(l_tau_epsilon);
      PARAMETER(l_tau_nu);

      Type tau_epsilon = exp(l_tau_epsilon);
      Type tau_nu = exp(l_tau_nu);
      Type sigma_epsilon = sqrt(1 / tau_epsilon);
      Type sigma_nu = sqrt(1 / tau_nu);
      vector<Type> eta(X * beta + nu + E * epsilon);
      vector<Type> lambda(exp(eta));

      Type nll;
      nll = Type(0.0);

      // Note: dgamma() is parameterised as (shape, scale)
      // R-INLA is parameterised as (shape, rate)
      nll -= dlgamma(l_tau_epsilon, Type(0.001), Type(1.0 / 0.001), true);
      nll -= dlgamma(l_tau_nu, Type(0.001), Type(1.0 / 0.001), true);
      nll -= dnorm(epsilon, Type(0), sigma_epsilon, true).sum();
      nll -= dnorm(nu, Type(0), sigma_nu, true).sum();
      nll -= dnorm(beta, Type(0), Type(100), true).sum();
      nll -= dpois(y, lambda, true).sum();

      ADREPORT(tau_epsilon);
      ADREPORT(tau_nu);

      return(nll);
    }

C.1.3 Stan C++ template

    // epil.stan

    data {
      int<lower=0> N;        // Number of patients
      int<lower=0> J;        // Number of clinic visits
      int<lower=0> K;        // Number of predictors (inc. intercept)
      matrix[N * J, K] X;    // Design matrix
      int<lower=0> y[N * J]; // Outcome variable
      matrix[N * J, N] E;    // Epsilon matrix
    }

    parameters {
      vector[K] beta;             // Vector of coefficients
      vector[N] epsilon;          // Patient specific errors
      vector[N * J] nu;           // Patient-visit errors
      real<lower=0> tau_epsilon;  // Precision of epsilon
      real<lower=0> tau_nu;       // Precision of nu
    }

    transformed parameters {
      vector[N * J] eta = X * beta + nu + E * epsilon;
    }

    model {
      beta ~ normal(0, 100);
      tau_epsilon ~ gamma(0.001, 0.001);
      tau_nu ~ gamma(0.001, 0.001);
      epsilon ~ normal(0, sqrt(1 / tau_epsilon));
      nu ~ normal(0, sqrt(1 / tau_nu));
      y ~ poisson_log(eta);
    }

C.1.4 NUTS convergence and suitability

C.1.4.1 tmbstan

Figure C.1: Traceplots for the tmbstan parameters with the lowest ESS and highest potential scale reduction factor. These were l_tau_nu (an \\(\\text{ESS}\\) of 377) and beta[3] (an \\(\\hat R\\) of 1.006).

C.1.4.2 rstan

Figure C.2: Traceplots for the rstan parameters with the lowest ESS and highest potential scale reduction factor. These were tau_nu (an \\(\\text{ESS}\\) of 437) and tau_nu (an \\(\\hat R\\) of 1.009). Rather than plotting the traceplot for tau_nu twice, the parameter epsilon[18] is included, which had the second highest \\(\\hat R\\) of 1.008.

C.2 Loa loa example

C.2.1 NUTS convergence and suitability

Figure C.3: Traceplots for the parameters with the lowest ESS and highest potential scale reduction factor for the Loa loa ELGM example.

C.2.2 Inference comparison

Figure C.4: Relative difference between the Gaussian and Laplace marginal posterior means and standard deviations to NUTS results at each \\(u(s_i), v(s_i): i \\in [190]\\). Absolute differences are in Figure 6.14.

C.3 AGHQ with Laplace marginals algorithm

This section provides the INLA-like algorithm for AGHQ with Laplace marginals used in this thesis.
The algorithm for AGHQ with Gaussian marginals used in this thesis is as given in Stringer, Brown, and Stafford (2022), and implemented in the aghq package. Calculate the mode, Hessian at the mode, lower Cholesky, and Laplace approximation \\[\\begin{align} \\hat{\\boldsymbol{\\mathbf{\\theta}}} &= \\arg \\max_{\\boldsymbol{\\mathbf{\\theta}}} {\\tilde p_\\texttt{LA}(\\boldsymbol{\\mathbf{\\theta}}, \\mathbf{y})}, \\\\ \\hat{\\mathbf{H}} &= - \\frac{\\partial^2}{\\partial \\boldsymbol{\\mathbf{\\theta}} \\partial \\boldsymbol{\\mathbf{\\theta}}^\\top} \\log \\tilde p_\\texttt{LA}(\\boldsymbol{\\mathbf{\\theta}}, \\mathbf{y}) \\rvert_{\\boldsymbol{\\mathbf{\\theta}} = \\hat{\\boldsymbol{\\mathbf{\\theta}}}}, \\\\ \\hat{\\mathbf{H}}^{-1} &= \\hat{\\mathbf{L}} \\hat{\\mathbf{L}}^\\top, \\\\ \\tilde p_\\texttt{LA}(\\boldsymbol{\\mathbf{\\theta}}, \\mathbf{y}) &= \\frac{p(\\mathbf{y}, \\mathbf{x}, \\boldsymbol{\\mathbf{\\theta}})}{\\tilde p_\\texttt{G}(\\mathbf{x} \\, | \\, \\boldsymbol{\\mathbf{\\theta}}, \\mathbf{y})} \\Big\\rvert_{\\mathbf{x} = \\hat{\\mathbf{x}}(\\boldsymbol{\\mathbf{\\theta}})}, \\end{align}\\] where \\(\\tilde p_\\texttt{G}(\\mathbf{x} \\, | \\, \\boldsymbol{\\mathbf{\\theta}}, \\mathbf{y}) = \\mathcal{N}(\\mathbf{x} \\, | \\, \\hat{\\mathbf{x}}(\\boldsymbol{\\mathbf{\\theta}}), \\hat{\\mathbf{H}}(\\boldsymbol{\\mathbf{\\theta}})^{-1})\\) is a Gaussian approximation to \\(p(\\mathbf{x} \\, | \\, \\boldsymbol{\\mathbf{\\theta}}, \\mathbf{y})\\) with mode and precision matrix given by \\[\\begin{align} \\hat{\\mathbf{x}}(\\boldsymbol{\\mathbf{\\theta}}) &= \\arg \\max_\\mathbf{x} \\log p(\\mathbf{y}, \\mathbf{x}, \\boldsymbol{\\mathbf{\\theta}}), \\\\ \\hat{\\mathbf{H}}(\\boldsymbol{\\mathbf{\\theta}}) &= - \\frac{\\partial^2}{\\partial \\mathbf{x} \\partial \\mathbf{x}^\\top} \\log p(\\mathbf{y}, \\mathbf{x}, \\boldsymbol{\\mathbf{\\theta}}) \\rvert_{\\mathbf{x} = \\hat{\\mathbf{x}}(\\boldsymbol{\\mathbf{\\theta}})}. 
\\end{align}\\] Generate a set of nodes \\(\\mathbf{u} \\in \\mathcal{Q}(m, k)\\) and weights \\(\\omega: \\mathcal{Q}(m, k) \\to \\mathbb{R}\\) from a Gauss-Hermite quadrature rule with \\(k\\) nodes per dimension. Adapt these nodes based on the mode and lower Cholesky factor via \\(\\boldsymbol{\\mathbf{\\theta}}(\\mathbf{u}) = \\hat{\\boldsymbol{\\mathbf{\\theta}}} + \\hat{\\mathbf{L}} \\mathbf{u}\\). Use this quadrature rule to calculate the normalising constant \\(\\tilde p_{\\texttt{AQ}}(\\mathbf{y})\\) as follows \\[\\begin{equation} \\tilde p_{\\texttt{AQ}}(\\mathbf{y}) = \\sum_{\\mathbf{u} \\in \\mathcal{Q}(m, k)} \\tilde p_\\texttt{LA}(\\boldsymbol{\\mathbf{\\theta}}(\\mathbf{u}), \\mathbf{y}) \\omega(\\mathbf{u}). \\tag{C.1} \\end{equation}\\] For \\(i \\in [N]\\) generate \\(l\\) nodes \\(x_i(\\mathbf{v})\\) via a Gauss-Hermite quadrature rule \\(\\mathbf{v} \\in \\mathcal{Q}(1, l)\\) adapted based on the mode \\(\\hat{\\mathbf{x}}(\\boldsymbol{\\mathbf{\\theta}})_i\\) and standard deviation \\(\\sqrt{\\text{diag}[\\hat{\\mathbf{H}}(\\boldsymbol{\\mathbf{\\theta}})^{-1}]_i}\\) of the Gaussian marginal. A value of \\(l \\geq 4\\) is recommended to enable B-spline interpolation.
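The node adaptation and the quadrature sum for the normalising constant described so far can be illustrated in one dimension. The following is a minimal sketch only, not the implementation used in this thesis or in the aghq package: it assumes a single hyperparameter, a fixed rule with three nodes, and the physicists' Gauss-Hermite convention (weight function exp(-u^2)), so the adapted weights carry an extra factor of sqrt(2) * sd * exp(u^2). The names `Rule3`, `gauss_hermite_3` and `aghq_normalising_constant` are hypothetical.

```cpp
#include <array>
#include <cmath>
#include <cstddef>
#include <functional>

// Hypothetical container for a three-node Gauss-Hermite rule.
struct Rule3 {
  std::array<double, 3> nodes;
  std::array<double, 3> weights;
};

// Tabulated three-node rule for the weight function exp(-u^2):
// nodes 0 and +/- sqrt(3/2), weights 2 sqrt(pi) / 3 and sqrt(pi) / 6.
Rule3 gauss_hermite_3() {
  const double sqrt_pi = std::sqrt(std::acos(-1.0));
  const double a = std::sqrt(1.5);
  Rule3 rule;
  rule.nodes = {-a, 0.0, a};
  rule.weights = {sqrt_pi / 6.0, 2.0 * sqrt_pi / 3.0, sqrt_pi / 6.0};
  return rule;
}

// Approximate the integral of exp(log_f(theta)) over the real line.
// `mode` plays the role of the posterior mode and `sd` the role of the
// (scalar) lower Cholesky factor of the inverse Hessian at the mode.
double aghq_normalising_constant(const std::function<double(double)>& log_f,
                                 double mode, double sd) {
  const Rule3 rule = gauss_hermite_3();
  double total = 0.0;
  for (std::size_t j = 0; j < rule.nodes.size(); ++j) {
    const double u = rule.nodes[j];
    // Adapted node: shift by the mode, scale by the curvature at the mode.
    const double theta = mode + std::sqrt(2.0) * sd * u;
    // Adapted weight: undo the exp(-u^2) weight function and include the
    // Jacobian of the change of variables.
    const double omega = std::sqrt(2.0) * sd * rule.weights[j] * std::exp(u * u);
    total += std::exp(log_f(theta)) * omega;
  }
  return total;
}
```

When the integrand is exactly a Gaussian kernel with the supplied mode and standard deviation, the adapted rule recovers the normalising constant sd * sqrt(2 pi) up to floating-point error. For several hyperparameters, a product rule over the dimensions would take the place of the single loop.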
For \\(x_i \\in \\{ x_i(\\mathbf{v}) \\}_{\\mathbf{v} \\in \\mathcal{Q}(1, l)}\\) and \\(\\boldsymbol{\\mathbf{\\theta}} \\in \\{ \\boldsymbol{\\mathbf{\\theta}}(\\mathbf{u}) \\}_{\\mathbf{u} \\in \\mathcal{Q}(m, k)}\\) calculate the modes and Hessians \\[\\begin{align} \\hat{\\mathbf{x}}_{-i}(x_i, \\boldsymbol{\\mathbf{\\theta}}) &= \\arg \\max_{\\mathbf{x}_{-i}} \\log p(\\mathbf{y}, x_i, \\mathbf{x}_{-i}, \\boldsymbol{\\mathbf{\\theta}}), \\\\ \\hat{\\mathbf{H}}_{-i, -i}(x_i, \\boldsymbol{\\mathbf{\\theta}}) &= - \\frac{\\partial^2}{\\partial \\mathbf{x}_{-i} \\partial \\mathbf{x}_{-i}^\\top} \\log p(\\mathbf{y}, x_i, \\mathbf{x}_{-i}, \\boldsymbol{\\mathbf{\\theta}}) \\rvert_{\\mathbf{x}_{-i} = \\hat{\\mathbf{x}}_{-i}(x_i, \\boldsymbol{\\mathbf{\\theta}})}, \\end{align}\\] where optimisation to obtain \\(\\hat{\\mathbf{x}}_{-i}(x_i, \\boldsymbol{\\mathbf{\\theta}})\\) can be initialised at \\(\\hat{\\mathbf{x}}(\\boldsymbol{\\mathbf{\\theta}})_{-i}\\). For \\(x_i \\in \\{ x_i(\\mathbf{v}) \\}_{\\mathbf{v} \\in \\mathcal{Q}(1, l)}\\) calculate \\[\\begin{equation} p_\\texttt{AQ}(x_i \\, | \\, \\mathbf{y}) = \\frac{\\tilde p_\\texttt{LA}(x_i, \\mathbf{y})}{\\tilde p_{\\texttt{AQ}}(\\mathbf{y})}, \\tag{C.2} \\end{equation}\\] where \\[\\begin{equation} \\tilde p_\\texttt{LA}(x_i, \\mathbf{y}) = \\sum_{\\mathbf{u} \\in \\mathcal{Q}(m, k)} \\tilde p_\\texttt{LA}(x_i, \\boldsymbol{\\mathbf{\\theta}}(\\mathbf{u}), \\mathbf{y}) \\omega(\\mathbf{u}). \\end{equation}\\] and \\[\\begin{equation} \\tilde p_\\texttt{LA}(x_i, \\boldsymbol{\\mathbf{\\theta}}, \\mathbf{y}) = \\frac{p(x_i, \\mathbf{x}_{-i}, \\boldsymbol{\\mathbf{\\theta}}, \\mathbf{y})}{\\tilde p_\\texttt{G}(\\mathbf{x}_{-i} \\, | \\, x_i, \\boldsymbol{\\mathbf{\\theta}}, \\mathbf{y})} \\Big\\rvert_{\\mathbf{x}_{-i} = \\hat{\\mathbf{x}}_{-i}(x_i, \\boldsymbol{\\mathbf{\\theta}})}. 
\\end{equation}\\] Equation (C.2) can be calculated using the estimate of the evidence given in Equation (C.1), but it is more numerically accurate, and requires little extra computation, to use the estimate \\[\\begin{equation} \\tilde p_{\\texttt{AQ}}(\\mathbf{y}) = \\sum_{\\mathbf{v} \\in \\mathcal{Q}(1, l)} \\tilde p_\\texttt{LA}(x_i(\\mathbf{v}), \\mathbf{y}) \\omega(\\mathbf{v}) \\end{equation}\\] Given \\(\\{x_i(\\mathbf{v}), \\tilde p_\\texttt{AQ}(x_i(\\mathbf{v}) \\, | \\, \\mathbf{y})\\}_{\\mathbf{v} \\in \\mathcal{Q}(1, l)}\\) create a spline interpolant to each posterior marginal on the log-scale. Samples, and thereby relevant posterior marginal summaries, may be obtained using inverse transform sampling. C.4 Simplified Naomi model description This section describes the simplified version of the Naomi model (Eaton et al. 2021) in more detail. The concise \\(i\\) indexing used in Section 6.3 is replaced by a more complete \\(x, s, a\\) indexing. There are four sections: Section C.4.1 gives the process specifications, giving the terms in each structured additive predictor, along with their distributions. Section C.4.2 gives additional details about the likelihood terms not provided in Section 6.3. Section C.4.3 gives identifiability constraints used in circumstances where incomplete data is available for the country. Section C.4.4 provides details of the TMB implementation. C.4.1 Process specification Table C.1: The Naomi model can be conceptualised as having five processes. This table gives the number of latent field parameters and hyperparameters in each process, where \\(n\\) is the number of districts in the country. 
Model component Latent field Hyperparameter Section C.4.1.1 HIV prevalence \\(22 + 5n\\) 9 Section C.4.1.2 ART coverage \\(25 + 5n\\) 9 Section C.4.1.3 HIV incidence rate \\(2 + n\\) 3 Section C.4.1.4 ANC testing \\(2 + 2n\\) 2 Section C.4.1.5 ART attendance \\(n\\) 1 Total \\(51 + 14n\\) 24 C.4.1.1 HIV prevalence HIV prevalence \\(\\rho_{x, s, a} \\in [0, 1]\\) was modelled on the logit scale using the structured additive predictor \\[\\begin{equation} \\text{logit}(\\rho_{x, s, a}) = \\beta^\\rho_0 + \\beta_{S}^{\\rho, s = \\text{M}} + \\mathbf{u}^\\rho_a + \\mathbf{u}_a^{\\rho, s = \\text{M}} + \\mathbf{u}^\\rho_x + \\mathbf{u}_x^{\\rho, s = \\text{M}} + \\mathbf{u}_x^{\\rho, a < 15} + \\boldsymbol{\\mathbf{\\eta}}^\\rho_{R_x, s, a}. \\tag{C.3} \\end{equation}\\] Table C.2 provides a description of the terms included in Equation (C.3). Independent half-normal prior distributions were chosen for the five standard deviation terms \\[\\begin{equation} \\{\\sigma_A^\\rho, \\sigma_{AS}^\\rho, \\sigma_X^\\rho, \\sigma_{XS}^\\rho, \\sigma_{XA}^\\rho\\} \\sim \\mathcal{N}^{+}(0, 2.5), \\end{equation}\\] independent uniform prior distributions for the two AR1 correlation parameters \\[\\begin{equation} \\{\\phi_A^\\rho, \\phi_{AS}^\\rho\\} \\sim \\mathcal{U}(-1, 1), \\end{equation}\\] and independent beta prior distributions for the two BYM2 proportion parameters \\[\\begin{equation} \\{\\phi_X^\\rho, \\phi_{XS}^\\rho\\} \\sim \\text{Beta}(0.5, 0.5). \\end{equation}\\] Table C.2: Each term in Equation (C.3) together with, where applicable, its prior distribution and a written description of its role. 
Term Distribution Description \\(\\beta^\\rho_0\\) \\(\\mathcal{N}(0, 5)\\) Intercept \\(\\beta_{S}^{\\rho, s = \\text{M}}\\) \\(\\mathcal{N}(0, 5)\\) The difference in logit prevalence for men compared to women \\(\\mathbf{u}^\\rho_a\\) \\(\\text{AR}1(\\sigma_A^\\rho, \\phi_A^\\rho)\\) Age random effects for women \\(\\mathbf{u}_a^{\\rho, s = \\text{M}}\\) \\(\\text{AR}1(\\sigma_{AS}^\\rho, \\phi_{AS}^\\rho)\\) Age random effects for the difference in logit prevalence for men compared to women of age \\(a\\) \\(\\mathbf{u}^\\rho_x\\) \\(\\text{BYM}2(\\sigma_X^\\rho, \\phi_X^\\rho)\\) Spatial random effects for women \\(\\mathbf{u}_x^{\\rho, s = \\text{M}}\\) \\(\\text{BYM}2(\\sigma_{XS}^\\rho, \\phi_{XS}^\\rho)\\) Spatial random effects for the difference in logit prevalence for men compared to women in district \\(x\\) \\(\\mathbf{u}_x^{\\rho, a < 15}\\) \\(\\text{ICAR}(\\sigma_{XA}^\\rho)\\) Spatial random effects for the difference between logit paediatric prevalence and logit adult women prevalence in district \\(x\\) \\(\\boldsymbol{\\mathbf{\\eta}}^\\rho_{R_x, s, a}\\) \\(-\\) Fixed offsets specifying assumed odds ratios for prevalence outside the age ranges for which data were available. Calculated from Spectrum model (Stover et al. 2019) outputs for region \\(R_x\\) C.4.1.2 ART coverage ART coverage \\(\\alpha_{x, s, a} \\in [0, 1]\\) was modelled on the logit scale using the structured additive predictor \\[\\begin{equation} \\text{logit}(\\alpha_{x, s, a}) = \\beta^\\alpha_0 + \\beta_{S}^{\\alpha, s = \\text{M}} + \\mathbf{u}^\\alpha_a + \\mathbf{u}_a^{\\alpha, s = \\text{M}} + \\mathbf{u}^\\alpha_x + \\mathbf{u}_x^{\\alpha, s = \\text{M}} + \\mathbf{u}_x^{\\alpha, a < 15} + \\boldsymbol{\\mathbf{\\eta}}^\\alpha_{R_x, s, a} \\end{equation}\\] with terms and prior distributions analogous to the HIV prevalence process model in Section C.4.1.1 above.
C.4.1.3 HIV incidence rate HIV incidence rate \\(\\lambda_{x, s, a} > 0\\) was modelled on the log scale using the structured additive predictor \\[\\begin{equation} \\log(\\lambda_{x, s, a}) = \\beta_0^\\lambda + \\beta_S^{\\lambda, s = \\text{M}} + \\log(\\rho_{x}^{\\text{15-49}}) + \\log(1 - \\omega \\cdot \\alpha_{x}^{\\text{15-49}}) + \\mathbf{u}_x^\\lambda + \\boldsymbol{\\mathbf{\\eta}}_{R_x, s, a}^\\lambda. \\tag{C.4} \\end{equation}\\] Table C.3 provides a description of the terms included in Equation (C.4). Table C.3: Each term in Equation (C.4) together with, where applicable, its prior distribution and a written description of its role. Term Distribution Description \\(\\beta^\\lambda_0\\) \\(\\mathcal{N}(0, 5)\\) Intercept term proportional to the average HIV transmission rate for untreated HIV positive adults \\(\\beta_S^{\\lambda, s = \\text{M}}\\) \\(\\mathcal{N}(0, 5)\\) The log incidence rate ratio for men compared to women \\(\\rho_{x}^{\\text{15-49}}\\) \\(-\\) The HIV prevalence among adults 15-49 in district \\(x\\) calculated by aggregating age-specific HIV prevalences \\(\\alpha_{x}^{\\text{15-49}}\\) \\(-\\) The ART coverage among adults 15-49 in district \\(x\\) calculated by aggregating age-specific ART coverages \\(\\omega = 0.7\\) \\(-\\) Average reduction in HIV transmission rate per increase in population ART coverage fixed based on inputs to the Estimation and Projection Package (EPP) model \\(\\mathbf{u}_x^\\lambda\\) \\(\\mathcal{N}(0, \\sigma^\\lambda)\\) IID spatial random effects with \\(\\sigma^\\lambda \\sim \\mathcal{N}^+(0, 1)\\) \\(\\boldsymbol{\\mathbf{\\eta}}^\\lambda_{R_x, s, a}\\) \\(-\\) Fixed log incidence rate ratios by sex and age group calculated from Spectrum model outputs for region \\(R_x\\) The proportion recently infected among HIV positive persons \\(\\kappa_{x, s, a} \\in [0, 1]\\) was modelled as \\[\\begin{equation} \\kappa_{x, s, a} = 1 - \\exp \\left(- \\lambda_{x, s, a} \\cdot \\frac{1 - \\rho_{x, s, 
a}}{\\rho_{x, s, a}} \\cdot (\\Omega_T - \\beta_T ) - \\beta_T \\right), \\end{equation}\\] where \\(\\Omega_T \\sim \\mathcal{N}(\\Omega_{T_0}, \\sigma^{\\Omega_T})\\) is the mean duration of recent infection, and \\(\\beta_T \\sim \\mathcal{N}^{+}(\\beta_{T_0}, \\sigma^{\\beta_T})\\) is the false recent ratio. The prior distribution for \\(\\Omega_T\\) was informed by the characteristics of the recent infection testing algorithm. For PHIA surveys this was \\(\\Omega_{T_0} = 130 \\text{ days}\\) and \\(\\sigma^{\\Omega_T} = 6.12 \\text{ days}\\). For PHIA surveys there was assumed to be no false recency, such that \\(\\beta_{T_0} = 0.0\\), \\(\\sigma^{\\beta_T} = 0.0\\), and \\(\\beta_T = 0\\). C.4.1.4 ANC testing HIV prevalence \\(\\rho_{x, a}^\\text{ANC}\\) and ART coverage \\(\\alpha_{x, a}^\\text{ANC}\\) among pregnant women were modelled as being offset on the logit scale from the corresponding district-age indicators \\(\\rho_{x, F, a}\\) and \\(\\alpha_{x, F, a}\\) according to \\[\\begin{align} \\text{logit}(\\rho_{x, a}^{\\text{ANC}}) &= \\text{logit}(\\rho_{x, F, a}) + \\beta^{\\rho^{\\text{ANC}}} + \\mathbf{u}_x^{\\rho^{\\text{ANC}}} + \\boldsymbol{\\mathbf{\\eta}}_{R_x, a}^{\\rho^{\\text{ANC}}}, \\tag{C.5} \\\\ \\text{logit}(\\alpha_{x, a}^{\\text{ANC}}) &= \\text{logit}(\\alpha_{x, F, a}) + \\beta^{\\alpha^{\\text{ANC}}} + \\mathbf{u}_x^{\\alpha^{\\text{ANC}}} + \\boldsymbol{\\mathbf{\\eta}}_{R_x, a}^{\\alpha^{\\text{ANC}}} \\tag{C.6}. \\end{align}\\] Table C.4 provides a description of the terms included in Equation (C.5) and Equation (C.6). Table C.4: Each term in Equations (C.5) and (C.6) together with, where applicable, its prior distribution and a written description of its role. The notation \\(\\theta\\) is used as a stand-in for \\(\\theta \\in \\{\\rho, \\alpha\\}\\).
Term Distribution Description \\(\\beta^{\\theta^{\\text{ANC}}}\\) \\(\\mathcal{N}(0, 5)\\) Intercept giving the average difference between population and ANC outcomes \\(\\mathbf{u}_x^{\\theta^{\\text{ANC}}}\\) \\(\\mathcal{N}(0, \\sigma_X^{\\theta^{\\text{ANC}}})\\) IID district random effects with \\(\\sigma_X^{\\theta^{\\text{ANC}}} \\sim \\mathcal{N}^+(0, 1)\\) \\(\\boldsymbol{\\mathbf{\\eta}}_{R_x, a}^{\\theta^{\\text{ANC}}}\\) \\(-\\) Offsets for the log fertility rate ratios for HIV positive women compared to HIV negative women and for women on ART compared to HIV positive women not on ART, calculated from Spectrum model outputs for region \\(R_x\\) In the full Naomi model, for adult women aged 15-49 the number of ANC clients \\(\\Psi_{x, a} > 0\\) was modelled as \\[\\begin{equation} \\log (\\Psi_{x, a}) = \\log (N_{x, \\text{F}, a}) + \\psi_{R_x, a} + \\beta^\\psi + \\mathbf{u}_x^\\psi, \\end{equation}\\] where \\(N_{x, \\text{F}, a}\\) are the female population sizes, \\(\\psi_{R_x, a}\\) are fixed age-sex fertility ratios in Spectrum region \\(R_x\\), \\(\\beta^\\psi\\) is the log rate ratio for the number of ANC clients relative to the predicted fertility, and \\(\\mathbf{u}_x^\\psi \\sim \\mathcal{N}(0, \\sigma^\\psi)\\) are district random effects. Here these terms are fixed to \\(\\beta^\\psi = 0\\) and \\(\\mathbf{u}_x^\\psi = \\mathbf{0}\\) such that \\(\\Psi_{x, a}\\) are simply constants. C.4.1.5 ART attendance Let \\(\\gamma_{x, x'} \\in [0, 1]\\) be the probability that a person on ART residing in district \\(x\\) receives ART in district \\(x'\\). Assume that \\(\\gamma_{x, x'} = 0\\) for \\(x' \\notin \\{x, \\text{ne}(x)\\}\\) such that individuals seek treatment only in their district of residence or its neighbours \\(\\text{ne}(x) = \\{x': x' \\sim x\\}\\), where \\(\\sim\\) is an adjacency relation, and \\(\\sum_{x' \\in \\{x, \\text{ne}(x)\\}} \\gamma_{x, x'} = 1\\).
The probabilities \\(\\gamma_{x, x'}\\) for \\(x \\sim x'\\) were modelled using a multinomial logistic regression model, based on the log-odds ratios \\[\\begin{equation} \\tilde \\gamma_{x, x'} = \\log \\left( \\frac{\\gamma_{x, x'}}{1 - \\gamma_{x, x'}} \\right) = \\tilde \\gamma_0 + \\mathbf{u}_x^{\\tilde \\gamma}. \\tag{C.7} \\end{equation}\\] Table C.5 provides a description of the terms included in Equation (C.7). Fixing \\(\\tilde \\gamma_{x, x} = 0\\), the multinomial probabilities may be recovered using the softmax \\[\\begin{equation} \\gamma_{x, x'} = \\frac{\\exp(\\tilde \\gamma_{x, x'})}{\\sum_{x^\\star \\in \\{x, \\text{ne}(x)\\}} \\exp(\\tilde \\gamma_{x, x^\\star})}. \\end{equation}\\] Table C.5: Each term in Equation (C.7) together with, where applicable, its prior distribution and a written description of its role. As no terms include \\(x'\\), \\(\\gamma_{x, x'}\\) is only a function of \\(x\\). Term Distribution Description \\(\\tilde \\gamma_0\\) \\(-\\) Fixed intercept \\(\\tilde \\gamma_0 = -4\\). Implies a prior mean on \\(\\gamma_{x, x'}\\) of 1.8%, such that a-priori \\((100 - 1.8 \\times \\text{ne}(x))\\%\\) of ART clients in district \\(x\\) obtain treatment in their home district \\(\\mathbf{u}_x^{\\tilde \\gamma}\\) \\(\\mathcal{N}(0, \\sigma_X^{\\tilde \\gamma})\\) District random effects, with \\(\\sigma_X^{\\tilde \\gamma} \\sim \\mathcal{N}^+(0, 2.5)\\) C.4.2 Additional likelihood specification Though Section 6.3 provides a complete description of Naomi’s likelihood specification, additional useful details are provided here.
C.4.2.1 Household survey data The generalised binomial \\(y \\sim \\text{xBin}(m, p)\\) is defined for \\(y, m \\in \\mathbb{R}^+\\) with \\(y \\leq m\\) such that \\[\\begin{align} \\log p(y) = &\\log \\Gamma(m + 1) - \\log \\Gamma(y + 1) \\\\ &- \\log \\Gamma(m - y + 1) + y \\log p + (m - y) \\log(1 - p), \\end{align}\\] where the gamma function \\(\\Gamma\\) is such that \\(\\forall n \\in \\mathbb{N}\\), \\(\\Gamma(n) = (n - 1)!\\). C.4.3 Identifiability constraints If data are missing, some parameters are fixed to default values to help with identifiability. In particular: If survey data on HIV prevalence or ART coverage by age and sex are not available then \\(\\mathbf{u}_a^\\theta = 0\\) and \\(\\mathbf{u}_{a, s = \\text{M}}^\\theta = 0\\). In this case, the average age-sex pattern from Spectrum is used. For the Malawi case-study (Section 6.5), HIV prevalence and ART coverage data are not available for those aged 65+. As a result, there are \\(|\\{\\text{0-4}, \\ldots, \\text{50-54}\\}| = 13\\) age groups included for the age random effects. If no ART data, either survey or ART programme, are available but data on ART coverage among ANC clients are available, the level of ART coverage is not identifiable, but spatial variation is identifiable. In this instance, overall ART coverage is determined by the Spectrum offset, and only area random effects are estimated such that \\[\\begin{equation} \\text{logit} \\left(\\alpha_{x, s, a} \\right) = \\mathbf{u}_x^\\alpha + \\boldsymbol{\\mathbf{\\eta}}_{R_x, s, a}^\\alpha. \\end{equation}\\] If survey data on recent HIV infection are not included in the model, then \\(\\beta_0^\\lambda = \\beta_S^{\\lambda, s = \\text{M}} = 0\\) and \\(\\mathbf{u}_x^\\lambda = \\mathbf{0}\\).
The sex ratio for HIV incidence is determined by the sex incidence rate ratio from Spectrum, and the incidence rate in all districts is modelled assuming the same average HIV transmission rate for untreated adults, but varies according to district-level estimates of HIV prevalence and ART coverage. C.4.4 Implementation The TMB C++ code for the negative log-posterior of the simplified Naomi model is available from https://github.com/athowes/naomi-aghq. For ease of understanding, Table C.6 provides correspondence between the mathematical notation used in Section C.4 and the variable names used in the TMB code, for all hyperparameters and latent field parameters. For further reference on the TMB software see Kristensen (2021). Table C.6: Correspondence between the variable name used in the Naomi TMB template and the mathematical notation used in Appendix C.4. The parameter type, either a hyperparameter or element of the latent field, is also given. All of the parameters are defined on the real-scale in some dimension. In the final three columns (\\(\\rho\\), \\(\\alpha\\), and \\(\\lambda\\)) indication is given as to which component of the model the parameter is primarily used in. 
Variable name Notation Type Domain \\(\\rho\\) \\(\\alpha\\) \\(\\lambda\\) logit_phi_rho_x \\(\\text{logit}(\\phi_X^\\rho)\\) Hyper \\(\\mathbb{R}\\) Yes log_sigma_rho_x \\(\\log(\\sigma_X^\\rho)\\) Hyper \\(\\mathbb{R}\\) Yes logit_phi_rho_xs \\(\\text{logit}(\\phi_{XS}^\\rho)\\) Hyper \\(\\mathbb{R}\\) Yes log_sigma_rho_xs \\(\\log(\\sigma_{XS}^\\rho)\\) Hyper \\(\\mathbb{R}\\) Yes logit_phi_rho_a \\(\\text{logit}(\\phi_A^\\rho)\\) Hyper \\(\\mathbb{R}\\) Yes log_sigma_rho_a \\(\\log(\\sigma_A^\\rho)\\) Hyper \\(\\mathbb{R}\\) Yes logit_phi_rho_as \\(\\text{logit}(\\phi_{AS}^\\rho)\\) Hyper \\(\\mathbb{R}\\) Yes log_sigma_rho_as \\(\\log(\\sigma_{AS}^\\rho)\\) Hyper \\(\\mathbb{R}\\) Yes log_sigma_rho_xa \\(\\log(\\sigma_{XA}^\\rho)\\) Hyper \\(\\mathbb{R}\\) Yes logit_phi_alpha_x \\(\\text{logit}(\\phi_X^\\alpha)\\) Hyper \\(\\mathbb{R}\\) Yes log_sigma_alpha_x \\(\\log(\\sigma_X^\\alpha)\\) Hyper \\(\\mathbb{R}\\) Yes logit_phi_alpha_xs \\(\\text{logit}(\\phi_{XS}^\\alpha)\\) Hyper \\(\\mathbb{R}\\) Yes log_sigma_alpha_xs \\(\\log(\\sigma_{XS}^\\alpha)\\) Hyper \\(\\mathbb{R}\\) Yes logit_phi_alpha_a \\(\\text{logit}(\\phi_A^\\alpha)\\) Hyper \\(\\mathbb{R}\\) Yes log_sigma_alpha_a \\(\\log(\\sigma_A^\\alpha)\\) Hyper \\(\\mathbb{R}\\) Yes logit_phi_alpha_as \\(\\text{logit}(\\phi_{AS}^\\alpha)\\) Hyper \\(\\mathbb{R}\\) Yes log_sigma_alpha_as \\(\\log(\\sigma_{AS}^\\alpha)\\) Hyper \\(\\mathbb{R}\\) Yes log_sigma_alpha_xa \\(\\log(\\sigma_{XA}^\\alpha)\\) Hyper \\(\\mathbb{R}\\) Yes OmegaT_raw \\(\\Omega_T\\) Hyper \\(\\mathbb{R}\\) Yes log_betaT \\(\\log(\\beta_T)\\) Hyper \\(\\mathbb{R}\\) Yes log_sigma_lambda_x \\(\\log(\\sigma^\\lambda)\\) Hyper \\(\\mathbb{R}\\) Yes log_sigma_ancrho_x \\(\\log(\\sigma_X^{\\rho^{\\text{ANC}}})\\) Hyper \\(\\mathbb{R}\\) Yes log_sigma_ancalpha_x \\(\\log(\\sigma_X^{\\alpha^{\\text{ANC}}})\\) Hyper \\(\\mathbb{R}\\) Yes log_sigma_or_gamma \\(\\log(\\sigma_X^{\\tilde \\gamma})\\) Hyper \\(\\mathbb{R}\\) beta_rho 
\\((\\beta^\\rho_0, \\beta_{s}^{\\rho, s = \\text{M}})\\) Latent \\(\\mathbb{R}^2\\) Yes beta_alpha \\((\\beta^\\alpha_0, \\beta_{S}^{\\alpha, s = \\text{M}})\\) Latent \\(\\mathbb{R}^2\\) Yes beta_lambda \\((\\beta_0^\\lambda, \\beta_S^{\\lambda, s = \\text{M}})\\) Latent \\(\\mathbb{R}^2\\) Yes beta_anc_rho \\(\\beta^{\\rho^{\\text{ANC}}}\\) Latent \\(\\mathbb{R}\\) Yes beta_anc_alpha \\(\\beta^{\\alpha^{\\text{ANC}}}\\) Latent \\(\\mathbb{R}\\) Yes u_rho_x \\(\\mathbf{w}^\\rho_x\\) Latent \\(\\mathbb{R}^{n}\\) Yes us_rho_x \\(\\mathbf{v}^\\rho_x\\) Latent \\(\\mathbb{R}^{n}\\) Yes u_rho_xs \\(\\mathbf{w}_x^{\\rho, s = \\text{M}}\\) Latent \\(\\mathbb{R}^{n}\\) Yes us_rho_xs \\(\\mathbf{v}_x^{\\rho, s = \\text{M}}\\) Latent \\(\\mathbb{R}^{n}\\) Yes u_rho_a \\(\\mathbf{u}^\\rho_a\\) Latent \\(\\mathbb{R}^{10}\\) Yes u_rho_as \\(\\mathbf{u}_a^{\\rho, s = \\text{M}}\\) Latent \\(\\mathbb{R}^{10}\\) Yes u_rho_xa \\(\\mathbf{u}_x^{\\rho, a < 15}\\) Latent \\(\\mathbb{R}^{n}\\) Yes u_alpha_x \\(\\mathbf{w}^\\alpha_x\\) Latent \\(\\mathbb{R}^{n}\\) Yes us_alpha_x \\(\\mathbf{v}^\\alpha_x\\) Latent \\(\\mathbb{R}^{n}\\) Yes u_alpha_xs \\(\\mathbf{w}_x^{\\alpha, s = \\text{M}}\\) Latent \\(\\mathbb{R}^{n}\\) Yes us_alpha_xs \\(\\mathbf{v}_x^{\\alpha, s = \\text{M}}\\) Latent \\(\\mathbb{R}^{n}\\) Yes u_alpha_a \\(\\mathbf{u}^\\alpha_a\\) Latent \\(\\mathbb{R}^{13}\\) Yes u_alpha_as \\(\\mathbf{u}_a^{\\alpha, s = \\text{M}}\\) Latent \\(\\mathbb{R}^{10}\\) Yes u_alpha_xa \\(\\mathbf{u}_x^{\\alpha, a < 15}\\) Latent \\(\\mathbb{R}^{n}\\) Yes ui_lambda_x \\(\\mathbf{u}_x^\\lambda\\) Latent \\(\\mathbb{R}^{n}\\) Yes ui_anc_rho_x \\(\\mathbf{u}_x^{\\rho^{\\text{ANC}}}\\) Latent \\(\\mathbb{R}^{n}\\) Yes ui_anc_alpha_x \\(\\mathbf{u}_x^{\\alpha^{\\text{ANC}}}\\) Latent \\(\\mathbb{R}^{n}\\) Yes log_or_gamma \\(\\mathbf{u}_x^{\\tilde \\gamma}\\) Latent \\(\\mathbb{R}^{n}\\) C.5 NUTS convergence and suitability Figure C.5: For NUTS run on the Naomi ELGM, the maximum potential 
scale reduction factor was 1.021, below the value of 1.05 typically used as a cutoff for acceptable chain mixing, indicating that the results are acceptable to use. Additionally, the vast majority (93.7%) of \\(\\hat R\\) values were less than 1.1. Figure C.6: The efficiency of NUTS, as measured by the ratio of effective sample size to total number of iterations run, was low for most parameters (Panel A). As a result, the number of iterations required for the effective number of samples (mean 1265) to be satisfactory was high (Panel B). Figure C.7: Traceplots for the parameter with the lowest ESS, which was log_sigma_alpha_xs (an \\(\\text{ESS}\\) of 208, Panel A), and the parameter with the highest potential scale reduction factor, which was ui_lambda_x[10] (an \\(\\hat R\\) of 1.021, Panel B). Figure C.8: Pairs plots for the parameters \\(\\log(\\sigma_{A}^\\rho)\\) and \\(\\text{logit}(\\phi_{A}^\\rho)\\), or log_sigma_rho_a and logit_phi_rho_a as implemented in code. These parameters are the log standard deviation and logit lag-one correlation parameter of an AR1 process. In the posterior distribution obtained with NUTS, they have a high degree of correlation. Figure C.9: Pairs plots for the parameters \\(\\log(\\sigma_X^\\alpha)\\) and \\(\\text{logit}(\\phi_X^\\alpha)\\), or log_sigma_alpha_x and logit_phi_alpha_x as implemented in code. These parameters are the log standard deviation and logit BYM2 proportion parameter of a BYM2 process. In the posterior distribution obtained with NUTS, they are close to uncorrelated. Figure C.10: Prior standard deviations were calculated by using NUTS to simulate from the prior distribution. This approach is more convenient than simulating directly from the model, but can lead to inaccuracies. Figure C.11: The posterior contraction for each parameter in the model. Values are averaged for parameters of length greater than one.
The posterior contraction is zero when the prior distribution and posterior distribution have the same standard deviation. This could indicate that the data are not informative about the parameter. The closer the posterior contraction is to one, the more the marginal posterior distribution has concentrated about a single point. C.6 Use of PCA-AGHQ Figure C.12: The standard deviation of the quadrature nodes can be used as a measure of coverage of the posterior marginal distribution. Nodes spaced evenly within the marginal distribution would be expected to have uniformly distributed quantiles, corresponding to a standard deviation of 0.2867, shown as a dashed line. Figure C.13: The estimated posterior marginal standard deviation of each hyperparameter varied substantially based on its scale, either logarithmic or logistic. Figure C.14: The logarithm of the normalising constant estimated using PCA-AGHQ and a range of possible values of \\(k = 2, 3, 5\\) and \\(s \\leq 8\\). Using this range of settings, the estimate of the logarithm of the normalising constant did not converge. The time taken by GPCA-AGHQ increases exponentially with the number of PCA-AGHQ dimensions kept. C.7 Inference comparison C.7.1 Point estimates Figure C.15: Differences in Naomi model output posterior means as estimated by GEB and GPCA-AGHQ compared to NUTS. Each point is an estimate of the indicator for a particular stratum. In all cases, error is reduced by GPCA-AGHQ, most of all for ART coverage. Figure C.16: Differences in Naomi model output posterior standard deviations as estimated by GEB and GPCA-AGHQ compared to NUTS. Each point is an estimate of the indicator for a particular stratum. Error is increased by GPCA-AGHQ for HIV prevalence and HIV incidence, and reduced for ART coverage. C.7.2 Distributional quantities Figure C.17: The Kolmogorov-Smirnov (KS) test statistic for each latent field parameter is correlated with the effective sample size (ESS) from NUTS, for both GEB and GPCA-AGHQ.
This may be because parameters which are harder to estimate with INLA-like methods also have posterior distributions which are more difficult to sample from. Alternatively, it may be that high KS values are caused by inaccurate NUTS estimates generated by limited effective samples. Akaike, Hirotugu. 1973. “Information theory as an extension of the maximum likelihood principle–In: Second International Symposium on Information Theory (Eds) BN Petrov, F.” Csaki. BNPBF Csaki Budapest: Academiai Kiado. Aldor-Noiman, Sivan, Lawrence D Brown, Andreas Buja, Wolfgang Rolke, and Robert A Stine. 2013. “The power to see: A new graphical test of normality.” The American Statistician 67 (4): 249–60. Arambepola, Rohan, Tim CD Lucas, Anita K Nandi, Peter W Gething, and Ewan Cameron. 2022. “A simulation study of disaggregation regression for spatial disease mapping.” Statistics in Medicine 41 (1): 1–16. Auvert, Bertran, Dirk Taljaard, Emmanuel Lagarde, Joelle Sobngwi-Tambekou, Rémi Sitta, and Adrian Puren. 2005. “Randomized, controlled intervention trial of male circumcision for reduction of HIV infection risk: the ANRS 1265 Trial.” PLOS Medicine 2 (11): e298. Bachl, Fabian E, Finn Lindgren, David L Borchers, and Janine B Illian. 2019. “inlabru: an R package for Bayesian spatial modelling from ecological survey data.” Methods in Ecology and Evolution 10 (6): 760–66. Baeten, Jared M, Deborah Donnell, Patrick Ndase, Nelly R Mugo, James D Campbell, Jonathan Wangisi, Jordan W Tappero, et al. 2012. “Antiretroviral Prophylaxis for HIV Prevention in Heterosexual Men and Women.” New England Journal of Medicine 367 (5): 399–410. Bailey, Michael A. 2023. “A New Paradigm for Polling.” Harvard Data Science Review 5 (3). Bailey, Robert C, Stephen Moses, Corette B Parker, Kawango Agot, Ian Maclean, John N Krieger, Carolyn FM Williams, Richard T Campbell, and Jeckoniah O Ndinya-Achola. 2007. 
“Male circumcision for HIV prevention in young men in Kisumu, Kenya: a randomised controlled trial.” The Lancet 369 (9562): 643–56. Baker, Stuart G. 1994. “The multinomial-Poisson transformation.” Journal of the Royal Statistical Society: Series D (The Statistician) 43 (4): 495–504. Baral, Stefan, Chris Beyrer, Kathryn Muessig, Tonia Poteat, Andrea L Wirtz, Michele R Decker, Susan G Sherman, and Deanna Kerrigan. 2012. “Burden of HIV among female sex workers in low-income and middle-income countries: a systematic review and meta-analysis.” The Lancet Infectious Diseases 12 (7): 538–49. Barré-Sinoussi, Françoise, Jean-Claude Chermann, Fran Rey, Marie Therese Nugeyre, Sophie Chamaret, Jacqueline Gruest, Charles Dauguet, et al. 1983. “Isolation of a T-lymphotropic retrovirus from a patient at risk for acquired immune deficiency syndrome (AIDS).” Science 220 (4599): 868–71. Baydin, Atılım Günes, Barak A Pearlmutter, Alexey Andreyevich Radul, and Jeffrey Mark Siskind. 2017. “Automatic differentiation in machine learning: a survey.” The Journal of Machine Learning Research 18 (1): 5595–5637. Bell, Bradley. 2023. “CppAD: a package for C++ algorithmic differentiation.” http://www.coin-or.org/CppAD. Bennett, James E, Helen Tamura-Wicks, Robbie M Parks, Richard T Burnett, C Arden Pope III, Matthew J Bechle, Julian D Marshall, Goodarz Danaei, and Majid Ezzati. 2019. “Particulate matter air pollution and national and county life expectancy loss in the USA: A spatiotemporal analysis.” PLOS Medicine 16 (7): e1002856. Berger, James. 2006. “The Case for objective Bayesian analysis.” Bayesian Analysis 1 (3): 385–402. Berild, Martin Outzen, Sara Martino, Virgilio Gómez-Rubio, and Håvard Rue. 2022. “Importance Sampling with the Integrated Nested Laplace Approximation.” Journal of Computational and Graphical Statistics 31 (4): 1225–37. Bernardo, José M, and Adrian FM Smith. 2001. Bayesian theory. John Wiley & Sons. Besag, Julian, Jeremy York, and Annie Mollié. 1991. 
“Bayesian image restoration, with two applications in spatial statistics.” Annals of the Institute of Statistical Mathematics 43 (1): 1–20. Best, N, N Arnold, A Thomas, L Waller, and E Conlon. 1999. “Bayesian models for spatially correlated disease and exposure data.” In Bayesian Statistics 6: Proceedings of the Sixth Valencia International Meeting, 6:131. Oxford University Press. Best, Nicky, Sylvia Richardson, and Andrew Thomson. 2005. “A comparison of Bayesian spatial models for disease mapping.” Statistical Methods in Medical Research 14 (1): 35–59. Betancourt, Michael. 2017. “Robust Gaussian processes in Stan.” https://betanalpha.github.io/assets/case\\%5Fstudies/gp\\%5Fpart3/part3.html. Bhatt, Samir, DJ Weiss, E Cameron, D Bisanzio, B Mappin, U Dalrymple, KE Battle, et al. 2015. “The effect of malaria control on Plasmodium falciparum in Africa between 2000 and 2015.” Nature 526 (7572): 207–11. Bilodeau, Blair, Alex Stringer, and Yanbo Tang. 2022. “Stochastic convergence rates and applications of adaptive quadrature in Bayesian inference.” Journal of the American Statistical Association, 1–11. Bivand, Roger S, Edzer J Pebesma, Virgilio Gómez-Rubio, and Edzer Jan Pebesma. 2008. Applied spatial data analysis with R. Springer. Blangiardo, Marta, Michela Cameletti, Gianluca Baio, and Håvard Rue. 2013. “Spatial and spatio-temporal models with R-INLA.” Spatial and Spatio-Temporal Epidemiology 4: 33–49. Blei, David M, Alp Kucukelbir, and Jon D McAuliffe. 2017. “Variational inference: A review for statisticians.” Journal of the American Statistical Association 112 (518): 859–77. Bolker, Benjamin M, Beth Gardner, Mark Maunder, Casper W Berg, Mollie Brooks, Liza Comita, Elizabeth Crone, et al. 2013. “Strategies for fitting nonlinear ecological models in R, AD Model Builder, and BUGS.” Methods in Ecology and Evolution 4 (6): 501–12. Bollhöfer, Matthias, Olaf Schenk, Radim Janalik, Steve Hamm, and Kiran Gullapalli. 2020. 
“State-of-the-art sparse direct solvers.” Parallel Algorithms in Computational Science and Engineering, 3–33. Bosse, Nikos I, Sam Abbott, Anne Cori, Edwin van Leeuwen, Johannes Bracher, and Sebastian Funk. 2023. “Scoring epidemiological forecasts on transformed scales.” PLOS Computational Biology 19 (8): e1011393. Bosse, Nikos I., Hugo Gruson, Anne Cori, Edwin van Leeuwen, Sebastian Funk, and Sam Abbott. 2022. “Evaluating Forecasts with scoringutils in R.” arXiv. https://arxiv.org/abs/2205.07090. Box, George EP, and Kenneth B Wilson. 1992. “On the experimental attainment of optimum conditions.” In Breakthroughs in Statistics: Methodology and Distribution, 270–310. Springer. Bradley, Valerie C, Shiro Kuriwaki, Michael Isakov, Dino Sejdinovic, Xiao-Li Meng, and Seth Flaxman. 2021. “Unrepresentative Big Surveys Significantly Overestimated US Vaccine Uptake.” Nature 600 (7890): 695–700. Breslow, Norman E, and David G Clayton. 1993. “Approximate inference in generalized linear mixed models.” Journal of the American Statistical Association 88 (421): 9–25. Brier, Glenn W. 1950. “Verification of forecasts expressed in terms of probability.” Monthly Weather Review 78 (1): 1–3. Brooks, Mollie E, Kasper Kristensen, Koen J Van Benthem, Arni Magnusson, Casper W Berg, Anders Nielsen, Hans J Skaug, Martin Machler, and Benjamin M Bolker. 2017. “glmmTMB balances speed and flexibility among packages for zero-inflated generalized linear mixed modeling.” The R Journal 9 (2): 378–400. Brown, Patrick E. 2015. “Model-based geostatistics the easy way.” Journal of Statistical Software 63: 1–24. Broyles, Laura N, Robert Luo, Debi Boeras, and Lara Vojnov. 2023. “The risk of sexual transmission of HIV in individuals with low-level HIV viraemia: a systematic review.” The Lancet. Brugh, Kristen N, Quinn Lewis, Cameron Haddad, Jon Kumaresan, Timothy Essam, and Michelle S Li. 2021. 
“Characterizing and mapping the spatial variability of HIV risk among adolescent girls and young women: A cross-county analysis of population-based surveys in Eswatini, Haiti, and Mozambique.” PLOS One 16 (12): e0261520. Bürkner, Paul-Christian. 2017. “brms: An R Package for Bayesian Multilevel Models Using Stan.” Journal of Statistical Software 80 (1): 1–28. https://doi.org/10.18637/jss.v080.i01. Bürkner, Paul-Christian, Jonah Gabry, and Aki Vehtari. 2020. “Approximate Leave-Future-Out Cross-Validation for Bayesian Time Series Models.” Journal of Statistical Computation and Simulation 90 (14): 2499–2523. Carpenter, Bob, Andrew Gelman, Matthew D Hoffman, Daniel Lee, Ben Goodrich, Michael Betancourt, Marcus Brubaker, Jiqiang Guo, Peter Li, and Allen Riddell. 2017. “Stan: A probabilistic programming language.” Journal of Statistical Software 76 (1). Casella, George. 1985. “An introduction to empirical Bayes data analysis.” The American Statistician 39 (2): 83–87. CDC. 2014. “Understanding the HIV Care Continuum.” CDC. http://www.cdc.gov/hiv/pdf/dhap_continuum.pdf. Chau, Siu Lun, Shahine Bouabid, and Dino Sejdinovic. 2021. “Deconditional downscaling with Gaussian processes.” Advances in Neural Information Processing Systems 34: 17813–25. Chen, Cici, Jon Wakefield, and Thomas Lumely. 2014. “The use of sampling weights in Bayesian hierarchical models for small area estimation.” Spatial and Spatio-Temporal Epidemiology 11: 33–43. Chiuchiolo, Cristian, Janet van Niekerk, and Håvard Rue. 2023. “Joint Posterior Inference for Latent Gaussian Models with r-INLA.” Journal of Statistical Computation and Simulation 93 (5): 723–52. Chopin, Nicolas, Omiros Papaspiliopoulos, et al. 2020. An introduction to sequential Monte Carlo. Vol. 4. Springer. Cleland, John, J Ties Boerma, Michel Caraël, and Sharon S Weir. 2004. “Monitoring sexual behaviour in general populations: a synthesis of lessons of the past decade.” Sexually Transmitted Infections 80 (suppl 2): ii1–7. 
Cohen, Myron S, Ying Q Chen, Marybeth McCauley, Theresa Gamble, Mina C Hosseinipour, Nagalingeswaran Kumarasamy, James G Hakim, et al. 2011. “Prevention of HIV-1 infection with early antiretroviral therapy.” New England Journal of Medicine 365 (6): 493–505. Cooper, Alex, Dan Simpson, Lauren Kennedy, Catherine Forbes, and Aki Vehtari. 2024. “Cross-Validatory Model Selection for Bayesian Autoregressions with Exogenous Regressors.” Bayesian Analysis 1 (1): 1–25. Cramb, SM, EW Duncan, PD Baade, and KL Mengersen. 2018. “Investigation of Bayesian spatial models.” Cancer Council Queensland; Queensland University of Technology (QUT). Crampin, Amelia C, Albert Dube, Sebastian Mboma, Alison Price, Menard Chihana, Andreas Jahn, Angela Baschieri, et al. 2012. “Profile: the Karonga health and demographic surveillance system.” International Journal of Epidemiology 41 (3): 676–85. Cressie, Noel, and Christopher K Wikle. 2015. Statistics for spatio-temporal data. John Wiley & Sons. Csárdi, Gábor. 2023. cranlogs: Download Logs from the ’RStudio’ ’CRAN’ Mirror. Davis, Philip J, and Philip Rabinowitz. 1975. Methods of numerical integration. Academic Press. Dawid, A Philip. 1984. “Present position and potential developments: Some personal views statistical theory the prequential approach.” Journal of the Royal Statistical Society: Series A (General) 147 (2): 278–90. de Valpine, Perry, Christopher Paciorek, Daniel Turek, Nick Michaud, Cliff Anderson-Bergman, Fritz Obermeyer, Claudia Wehrhahn Cortes, Abel Rodrìguez, Duncan Temple Lang, and Sally Paganin. 2023. NIMBLE User Manual (version 1.0.1). https://doi.org/10.5281/zenodo.1211190. Dean, CB, MD Ugarte, and AF Militino. 2001. “Detecting interaction between random region and fixed age effects in disease mapping.” Biometrics 57 (1): 197–202. Dempster, Arthur P, Nan M Laird, and Donald B Rubin. 1977. 
“Maximum likelihood from incomplete data via the EM algorithm.” Journal of the Royal Statistical Society: Series B (Methodological) 39 (1): 1–22. Dennis Jr, John E, David M Gay, and Roy E Walsh. 1981. “An adaptive nonlinear least-squares algorithm.” ACM Transactions on Mathematical Software (TOMS) 7 (3): 348–68. Diaz, Jose Monsalve, Swaroop Pophale, Oscar Hernandez, David E Bernholdt, and Sunita Chandrasekaran. 2018. “OpenMP 4.5 Validation and Verification Suite for Device Offload.” In Evolving OpenMP for Evolving Architectures: 14th International Workshop on OpenMP, IWOMP 2018, Barcelona, Spain, September 26–28, 2018, Proceedings 14, 82–95. Springer. Diggle, Peter J, and Emanuele Giorgi. 2016. “Model-based geostatistics for prevalence mapping in low-resource settings.” Journal of the American Statistical Association 111 (515): 1096–1120. Diggle, Peter J, Paula Moraga, Barry Rowlingson, Benjamin M Taylor, et al. 2013. “Spatial and spatio-temporal log-Gaussian Cox processes: extending the geostatistical paradigm.” Statistical Science 28 (4): 542–63. Dominguez, Kenneth L., Dawn K. Smith, Vasavi Thomas, Nicole Crepaz, Karen Lang, Walid Heneine, Janet M. McNicholl, et al. 2016. “Updated Guidelines for Antiretroviral Postexposure Prophylaxis After Sexual, Injection Drug Use, or Other Nonoccupational Exposure to HIV—United States, 2016.” https://stacks.cdc.gov/view/cdc/38856. Donegan, Connor. 2022. “geostan: An R package for Bayesian spatial analysis.” The Journal of Open Source Software 7 (79): 4716. https://doi.org/10.21105/joss.04716. Duane, Simon, Anthony D Kennedy, Brian J Pendleton, and Duncan Roweth. 1987. “Hybrid Monte Carlo.” Physics Letters B 195 (2): 216–22. Duncan, Earl W, Nicole M White, and Kerrie Mengersen. 2017. “Spatial smoothing in Bayesian models: a comparison of weights matrix specifications and their impact on inference.” International Journal of Health Geographics 16 (1): 1–16. 
Dwyer-Lindgren, Laura, Michael A Cork, Amber Sligar, Krista M Steuben, Kate F Wilson, Naomi R Provost, Benjamin K Mayala, et al. 2019. “Mapping HIV prevalence in sub-Saharan Africa between 2000 and 2017.” Nature 570 (7760): 189–93. Dwyer-Lindgren, Laura, Abraham D Flaxman, Marie Ng, Gillian M Hansen, Christopher JL Murray, and Ali H Mokdad. 2015. “Drinking patterns in US counties from 2002 to 2012.” American Journal of Public Health 105 (6): 1120–27. Eaton, Jeffrey W, Laura Dwyer-Lindgren, Steve Gutreuter, Megan O’Driscoll, Oliver Stevens, Sumali Bajaj, Rob Ashton, et al. 2021. “Naomi: a new modelling tool for estimating HIV epidemic indicators at the district level in sub-Saharan Africa.” Journal of the International AIDS Society 24: e25788. Economist Impact. 2023. “A triple dividend: the health, social and economic gains from financing the HIV response in Africa.” Esra, Rachel, Mpho Mmelesi, Akeem T. Ketlogetswe, Timothy M. Wolock, Adam Howes, Tlotlo Nong, Matshelo Tina Matlhaga, Siphiwe Ratladi, Dinah Ramaabya, and Jeffrey W. Imai-Eaton. 2024. “Improved Indicators for Subnational Unmet Antiretroviral Therapy Need in the Health System: Updates to the Naomi Model in 2023.” Journal of Acquired Immune Deficiency Syndromes 95 (1S): e24–33. https://doi.org/10.1097/QAI.0000000000003324. Fattah, EA, JV Niekerk, and H Rue. 2022. “Smart gradient-an adaptive technique for improving gradient estimation.” Foundations of Data Science. Fay, Robert E, and Roger A Herriot. 1979. “Estimates of income for small places: an application of James-Stein procedures to census data.” Journal of the American Statistical Association 74 (366a): 269–77. Fisher, Ronald Aylmer. 1936. “Design of experiments.” British Medical Journal 1 (3923): 554. FitzJohn, Rich, Robert Ashton, Alex Hill, Martin Eden, Wes Hinsley, Emma Russell, and James Thompson. 2023. Orderly: Lightweight Reproducible Reporting. Flaxman, Seth R, Yu-Xiang Wang, and Alexander J Smola. 2015. “Who supported Obama in 2012? 
Ecological inference through distribution regression.” In Proceedings of the 21th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 289–98. Follestad, Turid, and Håvard Rue. 2003. “Modelling spatial variation in disease risk using Gaussian Markov random field proxies for Gaussian random fields.” Fournier, David A, Hans J Skaug, Johnoel Ancheta, James Ianelli, Arni Magnusson, Mark N Maunder, Anders Nielsen, and John Sibert. 2012. “AD Model Builder: using automatic differentiation for statistical inference of highly parameterized complex nonlinear models.” Optimization Methods and Software 27 (2): 233–49. Freni-Sterrantino, Anna, Massimo Ventrucci, and Håvard Rue. 2018. “A note on intrinsic conditional autoregressive models for disconnected graphs.” Spatial and Spatio-Temporal Epidemiology 26: 25–34. Fuglstad, Geir-Arne, Daniel Simpson, Finn Lindgren, and Håvard Rue. 2019. “Constructing priors that penalize the complexity of Gaussian random fields.” Journal of the American Statistical Association 114 (525): 445–52. Gaedke-Merzhäuser, Lisa, Janet van Niekerk, Olaf Schenk, and Håvard Rue. 2023. “Parallelized integrated nested Laplace approximations for fast Bayesian inference.” Statistics and Computing 33 (1): 25. Garnier, Simon, Ross, Noam, Rudis, Robert, Camargo, et al. 2023. viridis(Lite) - Colorblind-Friendly Color Maps for R. https://doi.org/10.5281/zenodo.4679423. Gärtner, Thomas, Peter A Flach, Adam Kowalczyk, and Alexander J Smola. 2002. “Multi-instance kernels.” In ICML, 2:7. 3. Gelfand, Alan E, Li Zhu, and Bradley P Carlin. 2001. “On the change of support problem for spatio-temporal data.” Biostatistics 2 (1): 31–45. Gelman, Andrew. 2005. “Analysis of variance—why it is more important than ever.” ———. 2006. “Prior distributions for variance parameters in hierarchical models (comment on article by Browne and Draper).” Bayesian Analysis 1 (3): 515–34. ———. 2007. 
“Struggles with survey weighting and regression modeling.” Gelman, Andrew, John B Carlin, Hal S Stern, David B Dunson, Aki Vehtari, and Donald B Rubin. 2013. Bayesian data analysis. CRC press. Gelman, Andrew, Jessica Hwang, and Aki Vehtari. 2014. “Understanding predictive information criteria for Bayesian models.” Statistics and Computing 24 (6): 997–1016. Gelman, Andrew, and Donald B Rubin. 1992. “Inference from iterative simulation using multiple sequences.” Statistical Science, 457–72. Gelman, Andrew, Daniel Simpson, and Michael Betancourt. 2017. “The prior can often only be understood in the context of the likelihood.” Entropy 19 (10): 555. Gelman, Andrew, Aki Vehtari, Daniel Simpson, Charles C Margossian, Bob Carpenter, Yuling Yao, Lauren Kennedy, Jonah Gabry, Paul-Christian Bürkner, and Martin Modrák. 2020. “Bayesian workflow.” arXiv Preprint arXiv:2011.01808. Geman, Stuart, and Donald Geman. 1984. “Stochastic relaxation, Gibbs distributions, and the Bayesian restoration of images.” IEEE Transactions on Pattern Analysis and Machine Intelligence, no. 6: 721–41. Giordano, Ryan, Tamara Broderick, and Michael I. Jordan. 2018. “Covariances, Robustness, and Variational Bayes.” Journal of Machine Learning Research 19 (51): 1–49. http://jmlr.org/papers/v19/17-670.html. Global Burden of Disease Collaborative Network. 2019. “Global Burden of Disease Study 2019 (GBD 2019) Results.” Institute for Health Metrics and Evaluation (IHME). https://vizhub.healthdata.org/gbd-results/. Glynn, Judith R, Ndoliwe Kayuni, Emmanuel Banda, Fiona Parrott, Sian Floyd, Monica Francis-Chizororo, Misheck Nkhata, et al. 2011. “Assessing the validity of sexual behaviour reports in a whole population survey in rural Malawi.” PLOS One 6 (7): e22840. Gneiting, Tilmann, and Adrian E Raftery. 2007. “Strictly proper scoring rules, prediction, and estimation.” Journal of the American Statistical Association 102 (477): 359–78. 
Godfrey-Faussett, Peter, Luisa Frescura, Quarraisha Abdool Karim, Michaela Clayton, Peter D Ghys, and 2025 prevention targets working group). 2022. “HIV Prevention for the Next Decade: Appropriate, Person-Centred, Prioritised, Effective, Combination Prevention.” PLOS Medicine 19 (9): e1004102. Goldstein, Michael. 2006. “Subjective Bayesian analysis: principles and practice.” Gómez-Rubio, Virgilio. 2020. Bayesian inference with INLA. CRC Press. Gómez-Rubio, Virgilio, and Håvard Rue. 2018. “Markov Chain Monte Carlo with the Integrated Nested Laplace Approximation.” Statistics and Computing 28: 1033–51. Goodrich, Ben, Jonah Gabry, Imad Ali, and Sam Brilleman. 2020. “Rstanarm: Bayesian Applied Regression Modeling via Stan.” https://mc-stan.org/rstanarm. Gössl, Christoff, Dorothee P Auer, and Ludwig Fahrmeir. 2001. “Bayesian spatiotemporal inference in functional magnetic resonance imaging.” Biometrics 57 (2): 554–62. Gottlieb, Michael S, Howard M Schanker, Peng Thim Fan, Andrew Saxon, Joel D Weisman, Irving Pozalski, et al. 1981. “Pneumocystis pneumonia—Los Angeles.” Morbidity and Mortality Weekly Report 30 (21): 1–3. Grabowski, M Kate, David M Serwadda, Ronald H Gray, Gertrude Nakigozi, Godfrey Kigozi, Joseph Kagaayi, Robert Ssekubugu, et al. 2017. “HIV prevention efforts and incidence of HIV in Uganda.” New England Journal of Medicine 377 (22): 2154–66. Gray, Ronald H, Godfrey Kigozi, David Serwadda, Frederick Makumbi, Stephen Watya, Fred Nalugoda, Noah Kiwanuka, et al. 2007. “Male circumcision for HIV prevention in men in Rakai, Uganda: a randomised trial.” The Lancet 369 (9562): 657–66. Gregson, Simon, Geoffrey P Garnett, Constance A Nyamukapa, Timothy B Hallett, James JC Lewis, Peter R Mason, Stephen K Chandiwana, and Roy M Anderson. 2006. “HIV decline associated with behavior change in eastern Zimbabwe.” Science 311 (5761): 664–66. Gretton, Arthur, Karsten Borgwardt, Malte Rasch, Bernhard Schölkopf, and Alex Smola. 2006. 
“A Kernel Method for the Two-Sample-Problem.” Advances in Neural Information Processing Systems 19. Grieve, Richard, Youqi Yang, Sam Abbott, Giridhara R Babu, Malay Bhattacharyya, Natalie Dean, Stephen Evans, et al. 2023. “The Importance of Investing in Data, Models, Experiments, Team Science, and Public Trust to Help Policymakers Prepare for the Next Pandemic.” PLOS Global Public Health 3 (11): e0002601. Haining, Robert P. 2003. Spatial data analysis: theory and practice. Cambridge University Press. Hájek, Jaroslav. 1971. “Discussion of ‘An essay on the logical foundations of survey sampling, part I’.” Foundations of Statistical Inference (Proc. Sympos., Univ. Waterloo, Ontario, 1970), 236. Hamelijnck, O, T Damoulas, K Wang, and MA Girolami. 2019. “Multi-resolution multi-task Gaussian processes.” Advances in Neural Information Processing Systems 32. Hastie, Trevor, and Robert Tibshirani. 1987. “Generalized additive models: some applications.” Journal of the American Statistical Association 82 (398): 371–86. Hastings, W. K. 1970. “Monte Carlo Sampling Methods Using Markov Chains and Their Applications.” Biometrika 57 (1): 97–109. http://www.jstor.org/stable/2334940. Helleringer, Stéphane, Hans-Peter Kohler, Linda Kalilani-Phiri, James Mkandawire, and Benjamin Armbruster. 2011. “The reliability of sexual partnership histories: implications for the measurement of partnership concurrency during surveys.” AIDS (London, England) 25 (4): 503. Hodgins, Caroline, James Stannah, Salome Kuchukhidze, Lycias Zembe, Jeffrey W Eaton, Marie-Claude Boily, and Mathieu Maheu-Giroux. 2022. “Population sizes, HIV prevalence, and HIV prevention among men who paid for sex in sub-Saharan Africa (2000–2020): A meta-analysis of 87 population-based surveys.” PLOS Medicine 19 (1): e1003861. Hoffman, Matthew D, Andrew Gelman, et al. 2014. “The No-U-Turn sampler: adaptively setting path lengths in Hamiltonian Monte Carlo.” J. Mach. Learn. Res. 15 (1): 1593–623. Howes, Adam. 2023a. 
arealutils: Utility functions for beyond-borders. ———. 2023b. multi.utils: Utility functions for multi-agyw. Howes, Adam, Kathryn A. Risher, Van Kính Nguyen, Oliver Stevens, Katherine M. Jia, Timothy M. Wolock, Rachel T. Esra, et al. 2023. “Spatio-temporal estimates of HIV risk group proportions for adolescent girls and young women across 13 priority countries in sub-Saharan Africa.” PLOS Global Public Health 3 (4): 1–14. https://doi.org/10.1371/journal.pgph.0001731. ICAP. 2023. “Population-based HIV impact assessment: guiding the global HIV response.” https://phia.icap.columbia.edu. Jäckel, Peter. 2005. “A note on multivariate Gauss-Hermite quadrature.” London: ABN-Amro. Re. Jia, Katherine M, Hallie Eilerts, Olanrewaju Edun, Kevin Lam, Adam Howes, Matthew L Thomas, and Jeffrey W Eaton. 2022. “Risk scores for predicting HIV incidence among adult heterosexual populations in sub-Saharan Africa: a systematic review and meta-analysis.” Journal of the International AIDS Society 25 (1): e25861. Jin, Harry, Arjee Restar, and Chris Beyrer. 2021. “Overview of the Epidemiological Conditions of HIV Among Key Populations in Africa.” Journal of the International AIDS Society 24: e25716. Johnson, L, and RE Dorrington. 2020. “Thembisa version 4.3: A model for evaluating the impact of HIV/AIDS in South Africa.” View Article. Johnson, Olatunji, Peter Diggle, and Emanuele Giorgi. 2019. “A spatially discrete approximation to log-Gaussian Cox processes for modelling aggregated disease count data.” Statistics in Medicine 38 (24): 4871–87. Karatzoglou, Alexandros, Alex Smola, Kurt Hornik, and Maintainer Alexandros Karatzoglou. 2019. “Package ‘Kernlab’.” CRAN R Project. Kassanjee, Reshma, Thomas A. McWalter, Till Bärnighausen, and Alex Welte. 2012. “A New General Biomarker-Based Incidence Estimator.” Epidemiology 23 (5). Kelsall, Julia, and Jonathan Wakefield. 2002. 
“Modeling spatial variation in disease risk: a geostatistical approach.” Journal of the American Statistical Association 97 (459): 692–701. Khoury, Muin J, Michael F Iademarco, and William T Riley. 2016. “Precision public health for the era of precision medicine.” American Journal of Preventive Medicine 50 (3): 398–401. Kish, Leslie. 1965. Survey sampling. 04; HN29, K5. Knorr-Held, Leonhard. 2000. “Bayesian modelling of inseparable space-time variation in disease risk.” Statistics in Medicine 19 (17-18): 2555–67. Konstantinoudis, Garyfallos, Dominic Schuhmacher, Håvard Rue, and Ben D Spycher. 2020. “Discrete versus continuous domain models for disease mapping.” Spatial and Spatio-Temporal Epidemiology 32: 100319. Kristensen, Kasper. 2021. “The comprehensive TMB documentation.” https://kaskr.github.io/adcomp/_book/Introduction.html. Kristensen, Kasper, Anders Nielsen, Casper W Berg, Hans Skaug, Bradley M Bell, et al. 2016. “TMB: Automatic Differentiation and Laplace Approximation.” Journal of Statistical Software 70 (i05). Laplace, P. S. 1774. “Memoire sur la probabilite de causes par les evenements.” Memoire de l’Academie Royale Des Sciences. Law, Ho Chung, Dino Sejdinovic, Ewan Cameron, Tim Lucas, Seth Flaxman, Katherine Battle, and Kenji Fukumizu. 2018. “Variational learning on aggregate outputs with Gaussian processes.” Advances in Neural Information Processing Systems 31. Lee, Duncan. 2011. “A comparison of conditional autoregressive models used in Bayesian disease mapping.” Spatial and Spatio-Temporal Epidemiology 2 (2): 79–89. Lenth, Russell. 2009. “Response-Surface Methods in R, Using rsm.” Journal of Statistical Software 32 (7): 1–17. https://doi.org/10.18637/jss.v032.i07. Leppik, IE, FE Dreifuss, T Bowman-Cloyd, N Santilli, M Jacobs, C Crosby, J Cloyd, et al. 1985. “A double-blind crossover evaluation of progabide in partial seizures.” Neurology 35 (4): 285. Leroux, Brian G, Xingye Lei, and Norman Breslow. 2000. 
“Estimation of disease rates in small areas: a new mixed model for spatial dependence.” In Statistical Models in Epidemiology, the Environment, and Clinical Trials, 179–91. Springer. Li, Ye, Patrick Brown, Dionne C Gesink, and Håvard Rue. 2012. “Log Gaussian Cox processes and spatially aggregated disease incidence data.” Statistical Methods in Medical Research 21 (5): 479–507. https://doi.org/10.1177/0962280212446326. Lindgren, Finn, Håvard Rue, and Johan Lindström. 2011. “An explicit link between Gaussian fields and Gaussian Markov random fields: the stochastic partial differential equation approach.” Journal of the Royal Statistical Society Series B: Statistical Methodology 73 (4): 423–98. Margossian, Charles C. 2019. “A review of automatic differentiation and its efficient implementation.” Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery 9 (4): e1305. Margossian, Charles C, and Andrew Gelman. 2023. “For How Many Iterations Should We Run Markov Chain Monte Carlo?” arXiv Preprint arXiv:2311.02726. Margossian, Charles, Aki Vehtari, Daniel Simpson, and Raj Agrawal. 2020. “Hamiltonian Monte Carlo using an adjoint-differentiated Laplace approximation: Bayesian inference for latent Gaussian models and beyond.” Advances in Neural Information Processing Systems 33: 9086–97. Martin, Gael M, David T Frazier, and Christian P Robert. 2023. “Computing Bayes: From then ‘til now.” Statistical Science 1 (1): 1–17. Martino, Sara, and Andrea Riebler. 2020. “Integrated Nested Laplace Approximations (INLA).” In Wiley StatsRef: Statistics Reference Online, 1–19. John Wiley & Sons, Ltd. https://doi.org/https://doi.org/10.1002/9781118445112.stat08212. Martino, Sara, and Håvard Rue. 2009. “Implementing approximate Bayesian inference using Integrated Nested Laplace Approximation: A manual for the inla program.” Department of Mathematical Sciences, NTNU, Norway. Martins, Thiago G, Daniel Simpson, Finn Lindgren, and Håvard Rue. 2013. 
“Bayesian computing with INLA: new features.” Computational Statistics & Data Analysis 67: 68–83. Matheson, James E, and Robert L Winkler. 1976. “Scoring rules for continuous probability distributions.” Management Science 22 (10): 1087–96. Mayala, Benjamin K., Samir Bhatt, and Peter Gething. 2020. “Predicting HIV/AIDS at Subnational Levels using DHS Covariates related to HIV.” DHS Spatial Analysis Reports 18. Rockville, Maryland, USA: ICF. McCullagh, Peter, and John A Nelder. 1989. Generalized linear models. Routledge. McElreath, Richard. 2020. Statistical rethinking: A Bayesian course with examples in R and Stan. CRC press. McGillen, Jessica B, John Stover, Daniel J Klein, Sinokuthemba Xaba, Getrude Ncube, Mutsa Mhangara, Geraldine N Chipendo, et al. 2018. “The Emerging Health Impact of Voluntary Medical Male Circumcision in Zimbabwe: An Evaluation Using Three Epidemiological Models.” PLOS One 13 (7): e0199453. Meng, Xiao-Li. 2018. “Statistical paradises and paradoxes in big data (i) law of large populations, big data paradox, and the 2016 US presidential election.” The Annals of Applied Statistics 12 (2): 685–726. Metropolis, Nicholas, Arianna W Rosenbluth, Marshall N Rosenbluth, Augusta H Teller, and Edward Teller. 1953. “Equation of State Calculations by Fast Computing Machines.” J. Chem. Phys 21: 1087. Meyer-Rath, Gesine, Jessica B McGillen, Diego F Cuadros, Timothy B Hallett, Samir Bhatt, Njeri Wabiri, Frank Tanser, and Thomas Rehle. 2018. “Targeting the Right Interventions to the Right People and Places: The Role of Geospatial Analysis in HIV Program Planning.” AIDS (London, England) 32 (8): 957. Minka, Thomas P. 2001. “Expectation Propagation for approximate Bayesian inference.” In Proceedings of the 17th Conference in Uncertainty in Artificial Intelligence, 362–69. Monnahan, Cole C, and Kasper Kristensen. 2018. “No-U-turn sampling for fast Bayesian inference in ADMB and TMB: Introducing the adnuts and tmbstan R packages.” PLOS One 13 (5): e0197954. 
Monod, Mélodie, Andrea Brizzi, Ronald M. Galiwango, Robert Ssekubugu, Yu Chen, Xiaoyue Xi, Edward Nelson Kankaka, et al. 2023. “Longitudinal Population-Level HIV Epidemiologic and Genomic Surveillance Highlights Growing Gender Disparity of HIV Transmission in Uganda.” Nature Microbiology. Morris, Mitzi, Katherine Wheeler-Martin, Dan Simpson, Stephen J. Mooney, Andrew Gelman, and Charles DiMaggio. 2019. “Bayesian hierarchical spatial models: Implementing the Besag York Mollié model in stan.” Spatial and Spatio-Temporal Epidemiology 31: 100301. https://doi.org/https://doi.org/10.1016/j.sste.2019.100301. Nandi, Anita K, Tim CD Lucas, Rohan Arambepola, Peter Gething, and Daniel J Weiss. 2023. “disaggregation: An R Package for Bayesian Spatial Disaggregation Modeling.” Journal of Statistical Software 106: 1–19. Naylor, John C, and Adrian FM Smith. 1982. “Applications of a method for the efficient computation of posterior distributions.” Journal of the Royal Statistical Society Series C: Applied Statistics 31 (3): 214–25. Neal, Radford M. 2003. “Slice sampling.” The Annals of Statistics 31 (3): 705–67. Neal, Radford M et al. 2011. “MCMC using Hamiltonian dynamics.” Handbook of Markov Chain Monte Carlo 2 (11): 2. Nguyen, Van Kính, and Jeffrey W. Eaton. 2022. “Trends and country-level variation in age at first sex in sub-Saharan Africa among birth cohorts entering adulthood between 1985 and 2020.” BMC Public Health 22 (1): 1120. https://doi.org/10.1186/s12889-022-13451-y. Nnko, Soori, J Ties Boerma, Mark Urassa, Gabriel Mwaluko, and Basia Zaba. 2004. “Secretive females or swaggering males?: An assessment of the quality of sexual partnership reporting in rural Tanzania.” Social Science & Medicine 59 (2): 299–310. Noor, Abdisalan Mohamed. 2022. “Country Ownership in Global Health.” PLOS Global Public Health 2 (2): e0000113. Okabe, Masataka, and Kei Ito. 2008. “Color Universal Design (CUD): How to Make Figures and Presentations That Are Friendly to Colorblind People.” 2008. 
http://jfly.iam.u-tokyo.ac.jp/color/. Openshaw, S, and P. J. Taylor. 1979. “A million or so correlation coefficients, three experiments on the modifiable areal unit problem.” Statistical Applications in the Spatial Science, 127–44. Ord, Toby. 2013. “The moral imperative toward cost-effectiveness in global health.” Center for Global Development 12. Organization, World Health et al. 2022. Consolidated Guidelines on HIV, Viral Hepatitis and STI Prevention, Diagnosis, Treatment and Care for Key Populations. World Health Organization. Osgood-Zimmerman, Aaron, and Jon Wakefield. 2023. “A Statistical Review of Template Model Builder: A Flexible Tool for Spatial Modelling.” International Statistical Review 91 (2): 318–42. Paciorek, Christopher J et al. 2013. “Spatial models for point and areal data using Markov random fields on a fine grid.” Electronic Journal of Statistics 7: 946–72. Paciorek, Christopher J., and Mark J. Schervish. 2006. “Spatial modelling using a new class of nonstationary covariance functions.” Environmetrics 17 (5): 483–506. https://doi.org/https://doi.org/10.1002/env.785. Parks, Robbie M, James E Bennett, Helen Tamura-Wicks, Vasilis Kontis, Ralf Toumi, Goodarz Danaei, and Majid Ezzati. 2020. “Anomalously warm temperatures are associated with increased injury deaths.” Nature Medicine 26 (1): 65–70. Pebesma, Edzer. 2018. “Simple Features for R: Standardized Support for Spatial Vector Data.” The R Journal 10 (1): 439–46. https://doi.org/10.32614/RJ-2018-009. Pebesma, Edzer J. 2004. “Multivariable geostatistics in S: the gstat package.” Computers & Geosciences 30: 683–91. Pebesma, Edzer, and Roger Bivand. 2023. Spatial Data Science: With Applications in R. Chapman; Hall/CRC. https://doi.org/10.1201/9780429459016. Pettit, LI. 1990. “The conditional predictive ordinate for the normal distribution.” Journal of the Royal Statistical Society: Series B (Methodological) 52 (1): 175–84. Pfeffermann, Danny et al. 2013. 
“New Important Developments in Small Area Estimation.” Statistical Science 28 (1): 40–68. Pisani, Elizabeth, Stefano Lazzari, Neff Walker, and Bernhard Schwartländer. 2003. “HIV surveillance: a global perspective.” Journal of Acquired Immune Deficiency Syndromes 32: S3–11. Porcu, Emilio, Reinhard Furrer, and Douglas Nychka. 2021. “30 Years of space–time covariance functions.” Wiley Interdisciplinary Reviews: Computational Statistics 13 (2): e1512. Press, William H, Saul A Teukolsky, William T Vetterling, and Brian P Flannery. 2007. Numerical recipes 3rd edition: The art of scientific computing. Cambridge University Press. R Core Team. 2022. R: A Language and Environment for Statistical Computing. Vienna, Austria: R Foundation for Statistical Computing. https://www.R-project.org. Rashid, T, JE Bennett, D Muller, A Cross, J Pearson-Stuttard, H Daby, D Fecht, B Davies, and M Ezzati. 2023. “Inequalities in mortality from leading cancers in districts of England from 2002 to 2019: population-based high-resolution spatiotemporal analysis of vital registration data.” The Lancet Oncology. http://hdl.handle.net/10044/1/107364. Riebler, Andrea, Sigrunn H Sørbye, Daniel Simpson, and Håvard Rue. 2016. “An intuitive Bayesian spatial model for disease mapping that accounts for scaling.” Statistical Methods in Medical Research 25 (4): 1145–65. Risher, Kathryn A, Anne Cori, Georges Reniers, Milly Marston, Clara Calvert, Amelia Crampin, Tawanda Dadirai, et al. 2021. “Age patterns of HIV incidence in eastern and southern Africa: a modelling analysis of observational population-based cohort studies.” The Lancet HIV 8 (7): e429–39. Robert, Christian P, and George Casella. 2005. “Monte Carlo Statistical Methods (Springer Texts in Statistics).” Springer. Roberts, Gareth O., and Jeffrey S. Rosenthal. 2004. “General state space Markov chains and MCMC algorithms.” Probability Surveys 1: 20–71. https://doi.org/10.1214/154957804100000024. Roy, Vivekananda. 2020. 
“Convergence diagnostics for Markov chain Monte Carlo.” Annual Review of Statistics and Its Application 7: 387–412. Rue, Havard. 2023. “‘R-INLA‘ Project - FAQ.” https://www.r-inla.org/faq. Rue, Håvard. 2001. “Fast sampling of Gaussian Markov random fields.” Journal of the Royal Statistical Society: Series B (Statistical Methodology) 63 (2): 325–38. ———. 2020. “Comment on R-INLA Discussion Group thread.” Rue, Håvard, and Turid Follestad. 2001. “GMRFLib: a C-library for fast and exact simulation of Gaussian Markov random fields.” SIS-2002-236. Rue, Havard, and Leonhard Held. 2005. Gaussian Markov random fields: theory and applications. CRC press. Rue, Håvard, and Sara Martino. 2007. “Approximate Bayesian inference for hierarchical Gaussian Markov random field models.” Journal of Statistical Planning and Inference 137 (10): 3177–92. Rue, Håvard, Sara Martino, and Nicolas Chopin. 2009. “Approximate Bayesian inference for latent Gaussian models by using integrated nested Laplace approximations.” Journal of the Royal Statistical Society: Series B (Statistical Methodology) 71 (2): 319–92. Rue, Håvard, Andrea Riebler, Sigrunn H Sørbye, Janine B Illian, Daniel P Simpson, and Finn K Lindgren. 2017. “Bayesian computing with INLA: a review.” Annual Review of Statistics and Its Application 4: 395–421. Säilynoja, Teemu, Paul-Christian Bürkner, and Aki Vehtari. 2022. “Graphical test for discrete uniformity and its applications in goodness-of-fit evaluation and multiple sample comparison.” Statistics and Computing 32 (2): 32. Saracco, James F, J Andrew Royle, David F DeSante, and Beth Gardner. 2010. “Modeling spatial variation in avian survival and residency probabilities.” Ecology 91 (7): 1885–91. Saul, Janet, Gretchen Bachman, Shannon Allen, Nora F Toiv, Caroline Cooney, and Ta’Adhmeeka Beamon. 2018. “The DREAMS core package of interventions: a comprehensive approach to preventing HIV among adolescent girls and young women.” PLOS One 13 (12): e0208167. Saunders, Daniel. 2023. 
“The Besag-York-Mollie Model for Spatial Data.” In PyMC Examples, edited by PyMC Team. https://doi.org/10.5281/zenodo.5654871. Schad, Daniel J, Michael Betancourt, and Shravan Vasishth. 2021. “Toward a Principled Bayesian Workflow in Cognitive Science.” Psychological Methods 26 (1): 103. Schlüter, Daniela K, Martial L Ndeffo-Mbah, Innocent Takougang, Tony Ukety, Samuel Wanji, Alison P Galvani, and Peter J Diggle. 2016. “Using community-level prevalence of Loa loa infection to predict the proportion of highly-infected individuals: statistical modelling to support lymphatic filariasis and onchocerciasis elimination programs.” PLOS Neglected Tropical Diseases 10 (12): e0005157. Schmid, Volker J, Brandon Whitcher, Anwar R Padhani, N Jane Taylor, and Guang-Zhong Yang. 2006. “Bayesian methods for pharmacokinetic models in dynamic contrast-enhanced magnetic resonance imaging.” IEEE Transactions on Medical Imaging 25 (12): 1627–36. Shapley, Lloyd S et al. 1953. “A value for n-person games.” Princeton University Press Princeton. Shumway, Robert H, and David S Stoffer. 2017. Time Series Analysis and Its Applications With R Examples. Springer. Siegfried, Nandi, Lize van der Merwe, Peter Brocklehurst, and Tin Tin Sint. 2011. “Antiretrovirals for reducing the risk of mother-to-child transmission of HIV infection.” Cochrane Database of Systematic Reviews, no. 7. Simpson, Daniel, Håvard Rue, Andrea Riebler, Thiago G Martins, Sigrunn H Sørbye, et al. 2017. “Penalising model component complexity: A principled, practical approach to constructing priors.” Statistical Science 32 (1): 1–28. Sisson, Scott A, Yanan Fan, and Mark Beaumont. 2018. Handbook of approximate Bayesian computation. CRC Press. Skaug, Hans J. 2009. “Discussion of \"Approximate Bayesian inference for latent Gaussian models by using integrated nested Laplace approximations\".” In Journal of the Royal Statistical Society: Series B (Statistical Methodology), 71:319–92. 2. Wiley Online Library. 
Slaymaker, Emma, Kathryn A. Risher, Ramadhani Abdul, Milly Marston, Keith Tomlin, Robert Newton, Anthony Ndyanabo, et al. 2020. “Risk factors for new HIV infections in the general population in sub-Saharan Africa.” Smirnov, N. 1948. “Table for Estimating the Goodness of Fit of Empirical Distributions.” Annals of Mathematical Statistics 19 (2): 279–81. Smith, Nathaniel, and Stéfan van der Walt. 2015. “A Better Default Colormap for Matplotlib.” In Proceedings of the 14th Python in Science Conference (SciPy). Sørbye, Sigrunn Holbek, and Håvard Rue. 2014. “Scaling intrinsic Gaussian Markov random field priors in spatial modelling.” Spatial Statistics 8: 39–51. ———. 2017. “Penalised complexity priors for stationary autoregressive processes.” Journal of Time Series Analysis 38 (6): 923–35. Spiegelhalter, David J, Nicola G Best, Bradley P Carlin, and Angelika Van Der Linde. 2002. “Bayesian measures of model complexity and fit.” Journal of the Royal Statistical Society: Series B (Statistical Methodology) 64 (4): 583–639. Spiegelhalter, David, Andrew Thomas, Nicky Best, and Wally Gilks. 1996. “BUGS 0.5 Examples.” MRC Biostatistics Unit, Institute of Public Health, Cambridge, UK 256. Stan Development Team. 2023. Stan Reference Manual. https://mc-stan.org/docs/reference-manual/index.html. Stein, Michael L. 1999. “Interpolation of spatial data: some theory for kriging.” Stevens, Oliver, Keith Sabin, Rebecca Anderson, Sonia Arias Garcia, Kalai Willis, Amrita Rao, Anne F. McIntyre, et al. 2023. “Population size, HIV prevalence, and antiretroviral therapy coverage among key populations in sub-Saharan Africa: collation and synthesis of survey data 2010-2023.” medRxiv. https://www.medrxiv.org/content/early/2023/11/22/2022.07.27.22278071. Stover, John, Robert Glaubius, Lynne Mofenson, Caitlin M Dugdale, Mary-Ann Davies, Gabriela Patten, and Constantin Yiannoutsos. 2019. 
“Updates to the Spectrum/AIM model for estimating key HIV indicators at national and subnational levels.” AIDS (London, England) 33 (Suppl 3): S227. Stover, John, and Yu Teng. 2021. “The impact of condom use on the HIV epidemic.” Gates Open Research 5. Stringer, Alex. 2021. “Implementing Approximate Bayesian Inference using Adaptive Quadrature: the aghq Package.” arXiv Preprint arXiv:2101.04468. Stringer, Alex, Patrick Brown, and Jamie Stafford. 2022. “Fast, scalable approximations to posterior distributions in extended latent Gaussian models.” Journal of Computational and Graphical Statistics, 1–15. Tanaka, Yusuke, Toshiyuki Tanaka, Tomoharu Iwata, Takeshi Kurashima, Maya Okawa, Yasunori Akagi, and Hiroyuki Toda. 2019. “Spatially aggregated Gaussian processes with multivariate areal outputs.” In Advances in Neural Information Processing Systems, 3005–15. Tanser, Frank, Tulio de Oliveira, Mathieu Maheu-Giroux, and Till Bärnighausen. 2014. “Concentrated HIV sub-epidemics in generalized epidemic settings.” Current Opinion in HIV and AIDS 9 (2): 115. Tatem, Andrew J. 2017. “WorldPop, open data for spatial demography.” Scientific Data 4 (1): 1–4. Teh, Yee Whye, Bryn Elesedy, Bobby He, Michael Hutchinson, Sheheryar Zaidi, Avishkar Bhoopchand, Ulrich Paquet, Nenad Tomasev, Jonathan Read, and Peter J. Diggle. 2022. “Efficient Bayesian inference of Instantaneous Reproduction Numbers at Fine Spatial Scales, with an Application to Mapping and Nowcasting the Covid-19 Epidemic in British Local Authorities.” Journal of the Royal Statistical Society Series A: Statistics in Society 185 (1): S65–85. https://doi.org/10.1111/rssa.12971. Thall, Peter F, and Stephen C Vail. 1990. “Some covariance models for longitudinal count data with overdispersion.” Biometrics, 657–71. The Global Fund. 2018. The Global Fund Measurement Framework for Adolescent Girls and Young Women Programs. 
https://www.theglobalfund.org/media/8076/me%5Fadolescentsgirlsandyoungwomenprograms%5Fframeworkmeasurement%5Fen.pdf. Thigpen, Michael C, Poloko M Kebaabetswe, Lynn A Paxton, Dawn K Smith, Charles E Rose, Tebogo M Segolodi, Faith L Henderson, et al. 2012. “Antiretroviral Preexposure Prophylaxis for Heterosexual HIV Transmission in Botswana.” New England Journal of Medicine 367 (5): 423–34. Thyng, Kristen M, Chad A Greene, Robert D Hetland, Heather M Zimmerle, and Steven F DiMarco. 2016. “True Colors of Oceanography: Guidelines for Effective and Accurate Colormap Selection.” Oceanography 29 (3): 9–13. Tierney, Luke, and Joseph B Kadane. 1986. “Accurate approximations for posterior moments and marginal densities.” Journal of the American Statistical Association 81 (393): 82–86. Tobler, Waldo R. 1970. “A computer movie simulating urban growth in the Detroit region.” Economic Geography 46 (sup1): 234–40. Tokdar, Surya T, and Robert E Kass. 2010. “Importance sampling: a review.” Wiley Interdisciplinary Reviews: Computational Statistics 2 (1): 54–60. UN General Assembly. 2016. “Political Declaration on HIV and AIDS: On the Fast Track to Accelerate the Fight Against HIV and to End the AIDS Epidemic by 2030.” In. UNAIDS. 2014. “90-90-90. An ambitious treatment target to help end the AIDS epidemic.” UNAIDS. 2021a. “2021 UNAIDS Global AIDS Update - Confronting Inequalities - Lessons for pandemic responses from 40 Years of AIDS.” Geneva, Switzerland. UNAIDS. 2021b. “Global AIDS strategy 2021–2026. End inequalities. End AIDS.” UNAIDS. 2022. “In Danger: UNAIDS Global AIDS Update 2022.” https://www.unaids.org/en/resources/documents/2022/in-danger-global-aids-update. ———. 2023a. “AIDSinfo: Global data on HIV epidemiology and response.” https://aidsinfo.unaids.org/. ———. 2023b. “The path that ends AIDS: UNAIDS Global AIDS Update 2023.” https://www.unaids.org/en/resources/documents/2023/global-aids-update-2023. UNAIDS and WHO. 2021. 
“Voluntary Medical Male Circumcision Progress Brief.” UNAIDS. https://hivpreventioncoalition.unaids.org/wp-content/uploads/2021/04/JC3022_VMMC_4-pager_En_v3.pdf. UNAIDS, WHO, et al. 2022. Using Recency Assays for HIV Surveillance: 2022 Technical Guidance. World Health Organization. UNICEF. 2019. “Adolescent & social norms situation in Mozambique.” https://www.unicef.org/mozambique/en/adolescent-social-norms. U.S. Department of State. 2022. “Latest Global Program Results.” https://www.state.gov/wp-content/uploads/2022/11/PEPFAR-Latest-Global-Results_December-2022.pdf. USAID. 2012. “Sampling and Household Listing Manual: Demographic and Health Surveys Methodology.” https://dhsprogram.com/pubs/pdf/DHSM4/DHS6_Sampling_Manual_Sept2012_DHSM4.pdf. Utazi, C Edson, Julia Thorley, VA Alegana, MJ Ferrari, Kristine Nilsen, Saki Takahashi, CJE Metcalf, Justin Lessler, and AJ Tatem. 2019. “A spatial regression model for the disaggregation of areal unit based data to high-resolution grids with application to vaccination coverage mapping.” Statistical Methods in Medical Research 28 (10-11): 3226–41. Valpine, Perry de, Daniel Turek, Christopher J Paciorek, Clifford Anderson-Bergman, Duncan Temple Lang, and Rastislav Bodik. 2017. “Programming with models: writing statistical algorithms for general model structures with NIMBLE.” Journal of Computational and Graphical Statistics 26 (2): 403–13. Van Niekerk, Janet, Elias Krainski, Denis Rustand, and Håvard Rue. 2023. “A new avenue for Bayesian inference with INLA.” Computational Statistics & Data Analysis 181: 107692. Vehtari, Aki, Andrew Gelman, and Jonah Gabry. 2017. “Practical Bayesian Model Evaluation Using Leave-One-Out Cross-Validation and WAIC.” Statistics and Computing 27: 1413–32. Vehtari, Aki, and Janne Ojanen. 2012. “A survey of Bayesian predictive methods for model assessment, selection and comparison.” Statistics Surveys 6: 142–228. https://doi.org/10.1214/12-SS102. Wakefield, J, and S Morris. 1999. 
“Spatial dependence and errors-in-variables in environmental epidemiology.” Bayesian Statistics 6: 657–84. Wakefield, Jonathan, and Hilary Lyons. 2010. “Spatial Aggregation and the Ecological Fallacy.” In Chapman & Hall/CRC Handbooks of Modern Statistical Methods, 2010:541–58. https://doi.org/10.1201/9781420072884-c30. Ward, Brian. 2023. bridgestan: BridgeStan, Accessing Stan Model Functions in R. Watanabe, Sumio. 2013. “A widely applicable Bayesian information criterion.” Journal of Machine Learning Research 14 (Mar): 867–97. Weiser, Constantin. 2016. mvQuad: Methods for Multivariate Quadrature. http://CRAN.R-project.org/package=mvQuad. Weiss, Daniel J, Bonnie Mappin, Ursula Dalrymple, Samir Bhatt, Ewan Cameron, Simon I Hay, and Peter W Gething. 2015. “Re-examining environmental correlates of Plasmodium falciparum malaria endemicity: a data-intensive variable selection approach.” Malaria Journal 14 (1): 1–18. WHO and UNAIDS. 2007. “New Data on Male Circumcision and HIV Prevention: Policy and Programme Implications.” Geneva: World Health Organization. Wilke, Claus O. 2019. Fundamentals of Data Visualization: A Primer on Making Informative and Compelling Figures. O’Reilly Media. Wilson, Katie, and Jon Wakefield. 2018. “Pointless spatial modeling.” Biostatistics 21 (2): e17–32. https://doi.org/10.1093/biostatistics/kxy041. Wolock, Timothy M, Seth Flaxman, Kathryn A Risher, Tawanda Dadirai, Simon Gregson, and Jeffrey W Eaton. 2021. “Evaluating distributional regression strategies for modelling self-reported sexual age-mixing.” Edited by Eduardo Franco, Talía Malagón, and Adam Akullian. eLife 10 (June): e68318. https://doi.org/10.7554/eLife.68318. Wood, Simon N. 2017. Generalized additive models: an introduction with R. CRC press. ———. 2020. “Simplified integrated nested Laplace approximation.” Biometrika 107 (1): 223–30. Wringe, A, I Cremin, J Todd, N McGrath, I Kasamba, K Herbst, P Mushore, B Żaba, and E Slaymaker. 2009. 
“Comparative assessment of the quality of age-at-event reporting in three HIV cohort studies in sub-Saharan Africa.” Sexually Transmitted Infections 85 (Suppl 1): i56–63. Yao, Yuling, Aki Vehtari, Daniel Simpson, and Andrew Gelman. 2018. “Yes, but did it work?: Evaluating variational inference.” In International Conference on Machine Learning, 5581–90. PMLR. Yousefi, Fariba, Michael T Smith, and Mauricio Alvarez. 2019. “Multi-task learning for aggregated data using Gaussian processes.” Advances in Neural Information Processing Systems 32. Zaba, Basia, Elizabeth Pisani, Emma Slaymaker, and J Ties Boerma. 2004. “Age at first sex: understanding recent trends in African demographic surveys.” Sexually Transmitted Infections 80 (suppl 2): ii28–35. References Eaton, Jeffrey W, Laura Dwyer-Lindgren, Steve Gutreuter, Megan O’Driscoll, Oliver Stevens, Sumali Bajaj, Rob Ashton, et al. 2021. “Naomi: a new modelling tool for estimating HIV epidemic indicators at the district level in sub-Saharan Africa.” Journal of the International AIDS Society 24: e25788. Kristensen, Kasper. 2021. “The comprehensive TMB documentation.” https://kaskr.github.io/adcomp/_book/Introduction.html. Stover, John, Robert Glaubius, Lynne Mofenson, Caitlin M Dugdale, Mary-Ann Davies, Gabriela Patten, and Constantin Yiannoutsos. 2019. “Updates to the Spectrum/AIM model for estimating key HIV indicators at national and subnational levels.” AIDS (London, England) 33 (Suppl 3): S227. Stringer, Alex, Patrick Brown, and Jamie Stafford. 2022. “Fast, scalable approximations to posterior distributions in extended latent Gaussian models.” Journal of Computational and Graphical Statistics, 1–15. "],["404.html", "Page not found", " Page not found The page you requested cannot be found (perhaps it was moved or renamed). You may want to try searching to find the page's new location, or use the table of contents to find the page you are looking for. "]]