How to Conduct a Bayesian Network Meta-Analysis


Associated Data

The datasets generated for this study can be found in the GitHub repository [https://github.com/a-oconnor/NETWORK_MA_FRONTIERS_TUTORIAL].

Abstract

Network meta-analysis is a general approach to integrating the results of multiple studies in which multiple treatments are compared, often in a pairwise manner. In this tutorial, we illustrate the procedures for conducting a network meta-analysis for binary outcome data in the Bayesian framework using example data. Our goal is to describe the workflow of such an analysis and to explain how to generate informative results such as ranking plots and treatment risk posterior distribution plots. The R code used to conduct a network meta-analysis in the Bayesian setting is provided on GitHub.

Keywords: network meta-analysis, Bayesian, systematic review, tutorial, veterinary science

1. Introduction

Meta-analysis is a quantitative method commonly used to combine the results of multiple studies in the medical and veterinary sciences. There are several common types of meta-analysis. A pairwise meta-analysis compares two treatments across multiple studies, whereas a network meta-analysis involves the simultaneous synthesis of multiple studies to create pairwise comparisons of more than two treatments. A third type of meta-analysis is multivariate meta-analysis, which is far less common than the other two types (1, 2). Regardless of the type, meta-analyses can be conducted using study-level summary data, which are usually reported in the literature. In the human health sciences, it is also possible to perform meta-analyses using data from individual patients, but meta-analysis using individual-level data is very rare in veterinary science (3).

In this tutorial, we focus on network meta-analysis, which is becoming increasingly common in both human health and the veterinary sciences (4–10). Although frequently used as a synonym for network meta-analysis, a mixed treatment comparisons meta-analysis is a type of network meta-analysis that can be described as “A statistical approach used to analyze a network of evidence with more than two interventions which are being compared indirectly, and at least one pair of interventions compared both directly and indirectly” (1). Direct comparisons of interventions are obtained from trials or observational studies that include both interventions and compare them directly. Indirect comparisons of interventions, on the other hand, are made based on multiple trials that each included one, but not both, of the interventions of interest and therefore did not compare the interventions directly as part of the original study. In general, network meta-analysis offers the advantage of enabling the combined assessment of more than two treatments. A network meta-analysis that includes the mixed treatment comparisons “component” has the additional feature of enabling a formal statistical estimation of indirect treatment comparisons that might not be available in the literature (4, 7). Most network meta-analyses include a mixed treatment comparisons component, so we use the term network meta-analysis to refer to mixed treatment comparisons meta-analyses throughout this manuscript. Several R (11) packages are available for conducting Bayesian network meta-analysis, such as gemtc (12) and BUGSnet (13). The output given by gemtc is limited: for example, gemtc does not have the option to report the summary effect as either the relative risk or the absolute risk, and the output is not available in a table format. BUGSnet, meanwhile, is limited to analyzing arm-level data, which can be a limitation for veterinary data, which are often reported at the contrast level.

1.1. Rationale

Currently, only a few systematic reviews in veterinary science have employed network meta-analysis. However, if the trend in the human health sciences is indicative of what will occur in veterinary science, we can expect to see more network meta-analyses of veterinary studies in the future. For example, in 2010, a PubMed search with the terms “network meta-analysis” OR “mixed treatment comparison” yielded 10 citations, whereas by 2018, the same search returned 618 citations. The rise in the use of network meta-analysis is a function of the value that such an analysis provides to the decision-making community. Instead of limiting comparisons to those that are made across just two interventions and published in the literature, as is the case for pairwise meta-analysis, network meta-analysis allows the simultaneous comparison of multiple treatments, including comparisons that are not directly available in the literature. For many clinical decisions in veterinary medicine, there are multiple interventions that could be used to prevent or treat a specific disease or condition. Therefore, decision-makers are interested in the comparative efficacy of all the options rather than just pairwise comparisons. To illustrate the limitations of pairwise meta-analysis, we can use the choice of which antibiotic to use to treat bovine respiratory disease as an example. The vast majority of publicly available trials involving antibiotics for bovine respiratory disease were conducted in order to register and license a particular product. In those types of trials, the antibiotic of interest is typically compared with a placebo to demonstrate that the antibiotic has a significant beneficial effect. Veterinarians are actually interested in the comparative efficacy of all the available antibiotics, but for a variety of reasons (e.g., economic, marketing, and regulatory), few head-to-head comparisons of antibiotics are available. A network meta-analysis can fill that information gap for veterinarians by providing head-to-head estimates of the comparative efficacy of antibiotics, even though those comparisons are not available in the literature.

1.2. Objectives

Our objective is to provide a tutorial illustrating how to conduct a network meta-analysis of study-level results from multiple sources. Network meta-analysis can be conducted using a frequentist approach or a Bayesian approach. We focus on the Bayesian approach for three reasons:

First, Bayesian approaches to network meta-analysis are currently more common than frequentist approaches (14–16).

Second, the learning curve for the Bayesian approach is steeper than that for the frequentist approach. There are several standard packages that can be used to conduct a frequentist analysis, and the examples provided with the packages are usually sufficient to enable the analysis to be conducted (17, 18). Therefore, a tutorial for the Bayesian approach fills a larger gap.

Third, the Bayesian approach allows for many outputs that enhance understanding of the data. For example, both the point estimate and the full posterior distribution of the absolute risk of each treatment can be obtained from the results of the Bayesian approach. Therefore, a tutorial focused on the Bayesian approach to network meta-analysis has greater utility.

1.3. Target Audience

We describe the step-wise workflow of a network meta-analysis, and we provide R, JAGS (19), and BUGS (20) code for end-users interested in troubleshooting or optimizing their own analyses (see Appendix for link). It is not our intention to teach the statistical foundations of network meta-analysis. We believe that this tutorial will fill a gap between papers that explain the underlying statistical methodology and the “black box” tutorials that typically come with statistical packages. Our tutorial is intended for readers interested in understanding the software-coding and data-management processes that underlie a network meta-analysis. It is our hope that by using our tutorial, a reader will be able to find errors in his or her own network meta-analysis or modify existing code to produce a new output. We assume that the reader is familiar with pairwise meta-analysis [see the companion paper in the Frontiers series (21) and the paper about synthesizing data from intervention studies using meta-analysis (22) for more details].

2. Organization

The tutorial is organized in three parts. First, we provide a basic introduction to Bayesian network meta-analysis and the concepts in the underlying model. Second, we discuss how to conduct the analysis, with a focus on the software processes involved. Third (in the Appendix), we provide actual code that can be used to conduct a Bayesian network meta-analysis. The Appendix contains detailed instructions on how to run the R code that performs the analysis and produces the desired outputs. The code comprises R and JAGS scripts for executing a network meta-analysis, organized as an R project containing several scripts that the reader can run to better understand the processes associated with conducting the analysis and obtaining the output. Not all readers will want to delve into the mechanisms of the Appendix code. For readers who want to conduct a network meta-analysis but are not interested in the mechanics of coding the analysis, we suggest that they read the first two parts of the tutorial and then use an R package that includes functions for running a network meta-analysis, such as gemtc, as sketched below.
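For readers who take the package route, the following sketch shows what a minimal gemtc analysis of arm-level binary data might look like. The data frame is a hypothetical three-treatment example (it is not the tutorial data), and the column names and options follow the gemtc documentation.

```r
library(gemtc)

# Hypothetical arm-level data: one row per study arm
arm_data <- data.frame(
  study      = c("s1", "s1", "s2", "s2", "s3", "s3"),
  treatment  = c("Placebo", "A", "Placebo", "B", "A", "B"),
  responders = c(40, 20, 35, 18, 22, 19),  # number of events per arm
  sampleSize = rep(100, 6)                 # animals per arm
)

network <- mtc.network(data.ab = arm_data)
model   <- mtc.model(network, likelihood = "binom", link = "logit",
                     linearModel = "random")
results <- mtc.run(model)

summary(results)           # posterior summaries of the log odds ratios
rank.probability(results)  # treatment ranking probabilities
```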

3. The Basics of Network Meta-Analysis

3.1. Arm-Level Data and Contrast-Level Data

The first part of a network meta-analysis is data extraction from the primary sources, preferably based on a systematic review conducted using an a priori protocol. The data extracted from the primary sources are study-level summary data (also called aggregated data) in one of two forms: arm-level data, which report the effect measures (i.e., absolute odds or absolute risk) for each arm, or contrast-level data, which show the contrast of effects, or the effect size (23), between treatment arms (i.e., the odds ratio, relative risk, or log odds ratio). Either type of summary data can be used in a network meta-analysis using either the Bayesian or the frequentist approach (7).
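As a concrete illustration, the two formats might be laid out as follows in R. The column names here are our own illustrative choices rather than a required standard (each software package expects its own layout), and the contrast-level values are computed from the arm-level counts shown above them.

```r
# Arm-level data: one row per treatment arm, raw counts reported
arm_level <- data.frame(
  study     = c(1, 1, 2, 2),
  treatment = c("Placebo", "A", "Placebo", "B"),
  events    = c(40, 20, 35, 18),
  total     = c(100, 100, 100, 100)
)

# Contrast-level data: one row per within-study comparison, reported
# as a log odds ratio with its standard error
contrast_level <- data.frame(
  study    = c(1, 2),
  contrast = c("A vs. Placebo", "B vs. Placebo"),
  log_or   = c(-0.98, -0.90),  # log[(20/80)/(40/60)], log[(18/82)/(35/65)]
  se       = c(0.32, 0.33)     # sqrt(1/20 + 1/80 + 1/40 + 1/60), etc.
)
```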

It is essential, however, that the data extracted for a network meta-analysis meet the transitivity assumption, that is, that each enrolled subject in a given study would be eligible for enrollment in the other studies. For example, in a previous network meta-analysis of antibiotic treatments for bovine respiratory disease, data from studies that included antibiotic metaphylaxis were excluded, because animals that received prior antibiotic treatment would have limited eligibility for subsequent antibiotic treatments and would therefore violate the transitivity assumption (5, 6). Animals that received an antibiotic as a metaphylactic treatment would be unlikely to receive the same antibiotic as the first treatment of choice once bovine respiratory disease was diagnosed. Moreover, the effect of an antibiotic might be different if the antibiotic was previously used for metaphylactic treatment in the same animal, so the results from studies with and without metaphylaxis would not be the same. By limiting the network of eligible studies for the meta-analysis to those without metaphylaxis, the transitivity assumption would be more likely to hold.

3.2. The Comparative Effects Model

A key aspect of network meta-analysis is the comparative effects model. The comparative effects model forms the basis for the estimation of the relative treatment effects, which make up the main output of the network meta-analysis. A commonly used approach to network meta-analysis is to directly describe the distributions of the log odds ratio as the measures of the relative treatment effects and then to transform the log odds ratios into more interpretable metrics such as odds ratios or risk ratios. The goal of the comparative effects model is to provide a mechanism to estimate the comparative treatment effects. A critical aspect of the comparative effects model and its relation to network meta-analysis is the consistency assumption. The comparative effects model provides estimates of basic parameters in the form of log odds ratios based on comparisons between each treatment of interest and a baseline treatment. The consistency assumption allows pairwise comparisons between the treatments of interest to be estimated as functions of the basic parameters estimated in the comparative effects model. This consistency assumption is written as:

$$d_{k_1, k_2} = d_{b k_2} - d_{b k_1},$$

where $b$ is the baseline treatment, $k_1$ and $k_2$ are treatments other than the baseline, and $d_{b k_2}$ is the true effect size (log odds ratio in this case) of treatment $k_2$ compared with the baseline $b$. In lay terms, using the example of bovine respiratory disease, the consistency assumption says that we can compare the effect of oxytetracycline ($k_2$) with that of tulathromycin ($k_1$) if we have comparisons of the effects of oxytetracycline ($k_2$) and a placebo ($b$) and of tulathromycin ($k_1$) and a placebo ($b$).
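Numerically, the consistency assumption lets us recover an indirect comparison by simple subtraction on the log odds ratio scale. The values below are invented solely for illustration:

```r
# Hypothetical direct estimates against the placebo baseline (b)
d_b_oxy  <- -0.7  # log OR, oxytetracycline (k2) vs. placebo
d_b_tula <- -1.2  # log OR, tulathromycin (k1) vs. placebo

# Consistency assumption: d_{k1,k2} = d_{b,k2} - d_{b,k1}
d_tula_oxy <- d_b_oxy - d_b_tula  # = 0.5
exp(d_tula_oxy)                   # indirect OR, oxytetracycline vs. tulathromycin, ~1.65
```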

3.2.1. The Fixed Effects Model and the Random Effects Model

The first factor to consider in the comparative effects model is whether the intervention effects are fixed effects or random effects. Suppose there are $N$ studies in a network, which is composed of $K$ treatments. Let $b$ denote the baseline treatment of the whole network, and let $b_i$ denote the trial-specific baseline treatment in trial $i$. It might be the case that $b_i \neq b$. In other words, the baseline treatment of the model is a placebo, because most of the studies include a placebo group, but a few studies lack a placebo arm and therefore use a different treatment as the baseline comparator. Let $y_{i b_i k}$ be the trial-specific log odds ratio of treatment $k$ compared with $b_i$ in trial $i$, and let $V_{i b_i k}$ be its within-trial variance. Assume a normal distribution for $y_{i b_i k}$, such that

$$y_{i b_i k} \sim N(\theta_{i b_i k},\, V_{i b_i k}).$$

The difference between a fixed effects model and a random effects model lies in the assumptions about the nature of the between-trial variability (24). The choice of the fixed effects or random effects model depends on the interpretation of the log odds ratio ($\theta_{i b_i k}$) and the assumptions behind that interpretation. A fixed effects model assumes that there is one true effect size underlying the trials for each comparison. It follows that all of the differences in the observed effect sizes are due to random variation (sampling error) (25), which is akin to assuming that if all the studies were of infinite size, each would result in the same effect size. In that scenario, under the consistency assumption, the model would be:

$$\theta_{i, b_i k} = d_{b_i k} = \begin{cases} d_{bk}, & \text{for } b_i = b, \\ d_{bk} - d_{b b_i}, & \text{for } b_i \neq b. \end{cases}$$

In this model, the $d_{bk}$ ($k \in K$) are called basic parameters, whereas the $d_{b_i k}$ ($k \in K$, $b_i \neq b$) are called functional parameters, because they are a function of the basic parameters (e.g., $d_{b_i k} = d_{bk} - d_{b b_i}$). For example, consider a trial ($i = 1$) that compared treatment A with treatment B. We might designate treatment A as the baseline treatment ($b$) and treatment B as $k$. The model assumes that the log odds ratio observed in study $i = 1$ is $d_{bk}$. Any difference between the observed log odds ratio and $d_{bk}$ is assumed to be due to sampling error. In another trial ($i = 2$) that compared treatment B to treatment C, we might designate treatment C as the baseline treatment ($b_i$). When modeling the data, we would retain treatment B as $k$. The model then assumes that the observed log odds ratio in study $i = 2$ (i.e., treatment B compared with treatment C) is given by $d_{bk} - d_{b b_i}$. Again, any difference between the observed log odds ratio and $d_{b_i k}$ is assumed to be due to sampling error in a fixed effects model.

A random effects model, on the other hand, assumes that the true effect size can differ from trial to trial, because the effect sizes in each trial are derived from a distribution of effect sizes, which is akin to saying that even if the studies were all of infinite size, there would still be different estimates of the effect size due to the distribution of effect sizes in addition to sampling error. Therefore, in a random effects model, there is an additional source of variation that needs to be accounted for, that is, the between-trial variation. The random effects model has been recommended for cases in which there is heterogeneity among the results of multiple trials (26). The common distribution of the between-trial variation is usually assumed to be a normal distribution (7), so that

$$\theta_{i, b_i k} \sim \begin{cases} N(d_{bk},\, \sigma_{b_i k}^2), & \text{for } b_i = b, \\ N(d_{bk} - d_{b b_i},\, \sigma_{b_i k}^2), & \text{for } b_i \neq b, \end{cases}$$

where $\sigma_{b_i k}^2$ is the between-trial variance. In a pairwise meta-analysis, because there is only one effect size of interest, there is inherently only one between-trial variance. By contrast, in a network meta-analysis, there are at least two, and often many more, effect sizes, because we have $k \in K$. It is often assumed, however, that there is still only a single between-trial variance for all the treatments, which is referred to as the homogeneous variance assumption (i.e., $\sigma_{b_i k}^2 = \sigma^2$). In lay terms, this means that if we employ a random effects model that has three treatments and therefore two effect sizes, we assume the same between-trial variance $\sigma^2$ for $d_{b k_1}$ and $d_{b k_2}$. Although models that allow heterogeneous between-trial variances have been proposed (4, 27), we use a random effects model with an assumption of homogeneous variance as our example in this tutorial, because such a model is consistent with our biological understanding of the types of interventions used in veterinary science.
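To make the model concrete, the sketch below gives a minimal JAGS version of a contrast-level random effects model under the homogeneous variance assumption. The variable names (y, V, t1, t2) are our own, the sketch handles two-arm trials only (the multi-arm covariance adjustment is covered in section 3.3), and the full code used in this tutorial is in the Appendix.

```r
library(rjags)

# Minimal sketch of a contrast-level random effects NMA (two-arm trials only)
nma_string <- "
model {
  for (i in 1:N) {
    w[i] <- 1 / V[i]               # within-trial precision
    y[i] ~ dnorm(theta[i], w[i])   # observed log odds ratio
    theta[i] ~ dnorm(md[i], prec)  # trial-specific true effect
    md[i] <- d[t2[i]] - d[t1[i]]   # consistency assumption
  }
  d[1] <- 0                        # network baseline treatment
  for (k in 2:K) {
    d[k] ~ dnorm(0, 0.0001)        # vague priors on the basic parameters
  }
  sigma ~ dunif(0, 2)              # weakly informative prior on the between-trial SD
  prec <- 1 / (sigma * sigma)      # homogeneous between-trial precision
}
"
# jags.model(textConnection(nma_string),
#            data = list(y = y, V = V, t1 = t1, t2 = t2, N = N, K = K))
```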

3.3. Handling Multi-Arm Trials

In a pairwise meta-analysis, only one effect size is obtained from each study, which means that each effect size is independent of the others. However, in a network meta-analysis, there is the potential, and often the desire, to include multi-arm trials, which creates non-independent observations. For example, a single trial might compare treatments A, B, and C, resulting in three comparisons (A to B, A to C, and B to C). If A is the baseline treatment, then the comparisons between A and B and between A and C are basic parameters. When data from such a trial are included in a network meta-analysis, the assumption of independence is no longer valid, and a term to adjust for the co-variance of data from multi-arm trials must be incorporated into the comparative effects model to correctly reflect the data-generating mechanism. For a single multi-arm trial with $k_i$ treatments, there are $(k_i - 1)$ comparisons, $(y_{i,b2}, y_{i,b3}, \ldots, y_{i,b k_i})^T$. The joint distribution of the comparisons is given by

$$\begin{pmatrix} y_{i,b2} \\ y_{i,b3} \\ \vdots \\ y_{i,b k_i} \end{pmatrix} \sim N_{k_i - 1}\left( \begin{pmatrix} \theta_{i,b2} \\ \theta_{i,b3} \\ \vdots \\ \theta_{i,b k_i} \end{pmatrix}, \begin{bmatrix} V_{i,b2} & V_{i,b} & \cdots & V_{i,b} \\ V_{i,b} & V_{i,b3} & \cdots & V_{i,b} \\ \vdots & \vdots & \ddots & \vdots \\ V_{i,b} & V_{i,b} & \cdots & V_{i,b k_i} \end{bmatrix} \right),$$

where $V_{i,b}$ is the observed variance in the baseline arm in trial $i$. The derivation of the value of the co-variance can be found elsewhere (7). For a random effects model, assuming a homogeneous between-trial variance for all trial-specific effects, the joint distribution of $(\theta_{i,b2}, \theta_{i,b3}, \ldots, \theta_{i,b k_i})^T$ is

$$\begin{pmatrix} \theta_{i,b2} \\ \vdots \\ \theta_{i,b k_i} \end{pmatrix} \sim N_{k_i - 1}\left( \begin{pmatrix} d_{b2} \\ \vdots \\ d_{b k_i} \end{pmatrix}, \begin{pmatrix} \sigma^2 & \sigma^2/2 & \cdots & \sigma^2/2 \\ \vdots & \vdots & \ddots & \vdots \\ \sigma^2/2 & \sigma^2/2 & \cdots & \sigma^2 \end{pmatrix} \right).$$

The reason that the off-diagonal values in the variance–covariance matrix are equal to half the diagonal values (28) (i.e., the correlation is 0.5) is that we want to keep the assumption of homogeneous between-trial variance valid. For example,

$$\begin{aligned} \mathrm{Var}(\theta_{i,23}) &= \mathrm{Var}(\theta_{i,b3} - \theta_{i,b2}) \\ &= \mathrm{Var}(\theta_{i,b3}) + \mathrm{Var}(\theta_{i,b2}) - 2\,\mathrm{Cov}(\theta_{i,b3}, \theta_{i,b2}) \\ &= \sigma^2 + \sigma^2 - 2 \cdot \sigma^2/2 = \sigma^2. \end{aligned}$$
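A small R helper (the function name is ours) makes the structure of this variance–covariance matrix explicit:

```r
# Between-trial covariance block contributed by one trial with k arms:
# sigma^2 on the diagonal, sigma^2 / 2 everywhere else
re_cov_block <- function(k, sigma2) {
  m <- matrix(sigma2 / 2, nrow = k - 1, ncol = k - 1)
  diag(m) <- sigma2
  m
}

re_cov_block(k = 3, sigma2 = 0.04)  # a three-arm trial contributes a 2 x 2 block
```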

3.4. Choice of Priors

So far, we have described the comparative effects model, which describes how the data were generated. The next step is to estimate the parameters of the distributions of interest, that is, the basic parameters for each treatment and the between-trial variance. In a frequentist approach, model parameters are regarded as unknown fixed population characteristics (14), and estimation can be performed using a likelihood approach; the frequentist approach does not use prior information to estimate the parameters. By contrast, the Bayesian approach to estimation calculates the posterior distribution of the parameters by using the data (likelihood) to update prior information. In the Bayesian approach, it is therefore necessary to obtain a prior distribution of the parameters, so that the prior distribution can be updated to give the posterior distribution.

Prior distributions must be selected for the basic parameters $d_{bk}$ ($k \in K$) and, if a random effects model is employed, also for the between-trial variance $\sigma^2$. There is no need to select a prior for the correlation of multi-arm trials, because that correlation is constrained to 0.5 by the homogeneous variance assumption. Vague or flat priors such as $N(0, 10{,}000)$ are recommended for the basic parameters (7). The prior this induces on the odds ratio (OR) places a large probability on unrealistic regions of the odds ratio, such that $\Pr(\mathrm{OR} > 1{,}000) \approx 0.47$ and $\Pr(\mathrm{OR} > 10^{29}) \approx 0.25$. However, it provides vague information on the realistic region of the odds ratio, and as a result the posterior distribution depends little on the prior distribution (29). There is no strict rule for selecting a prior for $\sigma^2$. The general practice is to set weakly informative priors, such as $\sigma \sim \mathrm{Unif}(0, 2)$ or $\sigma \sim \mathrm{Unif}(0, 5)$, or non-informative priors such as $1/\sigma^2 \sim \mathrm{Gamma}(0.001, 0.001)$. In cases where the data are insufficient, a non-informative prior for $\sigma^2$ would be likely to make the posterior distribution include extremely large or small values (30, 31). Lambert et al. (31) conducted a simulation study using 13 vague priors and found that the use of different vague prior distributions led to markedly different results, particularly in small studies. On the basis of those results, Lambert et al. (31) suggested that in any Bayesian analysis, researchers should assess the sensitivity of the results to the choice of the prior distribution for $\sigma^2$, because “vague” is not the same in all cases. For example, if the prior chosen for the between-study variance is $\mathrm{Unif}(0, 5)$, a sensitivity analysis for that prior could examine how the posterior estimates of the treatment effects (e.g., the log odds ratios or the absolute risks) in the network meta-analysis would change if the prior were changed to $\mathrm{Unif}(0, 2)$, $\mathrm{Unif}(0, 10)$, or some other distribution. If the posterior estimates do not change substantially, the results can be considered insensitive to the choice of prior parameter values. Informative priors can be considered if there are reasonable estimates of $\sigma^2$ available from another, larger network meta-analysis that has the same context and similar treatments as the analysis under construction (7, 32). Having considered the choice of a random effects or fixed effects model, the handling of multi-arm trials, and the choice of priors, the specification of the comparative effects model is complete. Figure 1 illustrates an example of the coding of a comparative effects model in the general_model.bug code.

Figure 1. An example of the formatting of the BUGS code for the comparative model. This code was modified from code originally published elsewhere (7).
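The probabilities quoted above for the prior induced on the OR by a $N(0, 10{,}000)$ prior on the log odds ratio are easy to check by simulation; the comparisons are made on the log scale to avoid numerical overflow.

```r
# Monte Carlo check of the prior induced on the OR by d ~ N(0, 10000)
set.seed(42)
log_or <- rnorm(1e6, mean = 0, sd = sqrt(10000))  # sd = 100

mean(log_or > log(1000))     # Pr(OR > 1,000), approximately 0.47
mean(log_or > 29 * log(10))  # Pr(OR > 10^29), approximately 0.25
```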

3.5. The Baseline Effects Model: The Log Odds of the Event

After defining the comparative effects model and the priors for the parameters to be estimated, the next step in a Bayesian network meta-analysis is to model the baseline effect. Although it is possible to conduct a Bayesian network meta-analysis without a baseline effects model, the baseline effects model allows for some unique and informative outputs from the analysis. If we are only interested in the estimates of the log odds ratios and the odds ratios, then there is no need for a baseline effects model. The baseline effects model refers to the distribution of the event for the baseline treatment, that is, the log odds of the event for the baseline treatment. For example, if in one study 40 out of 100 animals in the baseline group experienced the event, the trial-specific log odds of the event would be $\log\left(\frac{40/100}{60/100}\right)$. A different trial would have different log odds of the event; however, the log odds of the event are assumed to arise from the same distribution in all trials. The reason for modeling the distribution of the event risk in the baseline group is to enable absolute effects (i.e., absolute risk) and comparative effects to be estimated on a risk scale rather than on an odds scale. If we know the log odds ratio for all treatments compared with the baseline treatment, then, given the absolute risk for any one treatment, we can determine the absolute risk for every treatment. For example, if we have a log odds ratio of 0.9809 for the comparison between treatment A and the baseline treatment, and if the baseline event risk is 0.2 (e.g., 20 out of 100 exposed subjects experienced the event), then we can determine that the absolute risk for treatment A is 0.4, using the formula

$$p = \frac{\mathrm{OR} \times p_b}{1 - p_b + \mathrm{OR} \times p_b},$$

where $p_b$ is the absolute risk for the baseline treatment, and $p$ is the absolute risk for any non-baseline treatment. The absolute effect of the baseline treatment is often selected for baseline effect modeling, because the baseline treatment is usually the most common treatment in the network meta-analysis, which means that it has the most data available for estimation of the posterior distribution of the log odds of the event. Suppose there are $N_b$ studies that have the baseline arm. Let $\theta_{i,b}$ ($i \in N_b$) be the trial-specific baseline effect (the log odds of the event) in trial $i$. We can use the following formulation to model the baseline effect:

$$\theta_{i,b} \sim N(m,\, \sigma_m^2).$$

This means that the trial-specific baseline effects come from a normal distribution with mean $m$ and variance $\sigma_m^2$. As with the comparative effects model, we need to select priors for the baseline effects model. The selection of prior distributions for $m$ and $\sigma_m^2$ follows the same considerations as the selection of priors for the effect parameters, that is, the priors should be weakly informative or non-informative [e.g., $m \sim N(0, 10{,}000)$ and $\sigma_m \sim \mathrm{Unif}(0, 5)$]. From a coding perspective, there are two ways to incorporate a baseline effects model into the comparative effects model. The first approach is to run separate models, beginning with the baseline effects model. The baseline model yields the posterior distribution summaries of $m$ and $\sigma_m$ (or $\sigma_m^2$). The posterior means (denoted by $\hat{m}$ and $\hat{\sigma}_m^2$) are then inserted into the comparative effects model, and the baseline effect can be generated from $N(\hat{m}, \hat{\sigma}_m^2)$ in the comparative effects model. Other quantities of interest (e.g., the absolute risk for the other treatments) can then be estimated. The first approach relies on the assumption that the posterior distribution of the baseline effect is approximately normal. Dias et al. (33) suggest checking that assumption (e.g., with a Q-Q plot or the Kolmogorov–Smirnov test), although the assumption is usually found to hold. The second approach is simultaneous modeling of the baseline effect and the comparative effects. That approach can have a substantial impact on the relative effect estimates, however. For more details on the simultaneous modeling of baseline and comparative effects, refer to Dias et al. (33). Figure 2 shows the incorporation of a baseline effects model, which can be used to obtain $m$ and $\sigma_m$ (or $\sigma_m^2$).
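A minimal JAGS rendering of the first (two-step) approach is sketched below, with our own variable names (r[i] events out of n[i] animals in the baseline arm of study i); the BUGS version used in this tutorial appears in Figure 2.

```r
# Minimal sketch of the baseline effects model (first, two-step approach)
baseline_string <- "
model {
  for (i in 1:Nb) {
    r[i] ~ dbin(p[i], n[i])      # events in the baseline arm of study i
    logit(p[i]) <- theta[i]      # trial-specific log odds of the event
    theta[i] ~ dnorm(m, prec_m)  # common normal distribution
  }
  m ~ dnorm(0, 0.0001)           # vague prior on the mean
  sigma_m ~ dunif(0, 5)          # weakly informative prior on the SD
  prec_m <- 1 / (sigma_m * sigma_m)
}
"
# Under the two-step approach, the posterior means of m and sigma_m are
# then inserted into the comparative effects model, where the baseline
# effect is generated from N(m.hat, sigma.hat^2).
```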

Figure 2. An example of the formatting of the BUGS code for the baseline effects model. This code was modified from code originally published elsewhere (7).
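Once the baseline log odds have been modeled, the risk-scale conversion given earlier in this section can be applied to every treatment. A small R helper (the function name is ours) reproduces the worked example:

```r
# Convert a log odds ratio (vs. the baseline) plus a baseline risk
# into an absolute risk, using p = OR * p_b / (1 - p_b + OR * p_b)
risk_from_log_or <- function(log_or, p_b) {
  or <- exp(log_or)
  or * p_b / (1 - p_b + or * p_b)
}

log((40 / 100) / (60 / 100))   # trial-specific log odds of the 40/100 example, ~ -0.41
risk_from_log_or(0.9809, 0.2)  # ~ 0.40, matching the worked example above
```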