Treatment Effect Heterogeneity

CATE estimation, causal forests, meta-learners, and subgroup analysis.

Evidence briefs

Reviewed claims

Claim-level summaries connect a practical takeaway to the papers that actually support it.

High confidence · Published

Causal forests · positive · Consistency and asymptotic normality of heterogeneous treatment effect estimates

Causal forests provide consistent and asymptotically normal estimates of heterogeneous treatment effects, with valid confidence intervals, whereas classical methods break down due to the curse of dimensionality.

Population: High-dimensional observational studies with unconfoundedness · Comparator: Classical nonparametric methods (nearest-neighbor matching, kernel methods, series estimation)

Primary evidence

Estimation and Inference of Heterogeneous Treatment Effects using Random Forests

High confidence · Published

Local polynomial doubly robust (LP-DR) estimator · positive · Mean squared error of CATE estimates

LP-DR achieves minimax optimal error rates across smooth and sparse CATE classes, outperforming standard DR which is suboptimal in high-dimensional or non-smooth settings.

Population: Heterogeneous treatment effect estimation with smooth or sparse CATE functions · Comparator: Standard doubly robust estimator

Primary evidence

Towards optimal doubly robust estimation of heterogeneous causal effects

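For orientation, the standard doubly robust (DR) estimator that serves as the comparator here is usually described as regressing a pseudo-outcome on covariates. The notation below (treatment A, propensity estimate \hat\pi, outcome-regression estimates \hat\mu_a) is an assumption introduced for illustration and does not appear on this page.

```latex
% DR pseudo-outcome for one observation Z = (X, A, Y)
\hat\varphi(Z) \;=\; \frac{A - \hat\pi(X)}{\hat\pi(X)\,\{1 - \hat\pi(X)\}}
                 \,\bigl\{Y - \hat\mu_{A}(X)\bigr\}
                 \;+\; \hat\mu_{1}(X) - \hat\mu_{0}(X)

% Second stage: regress the pseudo-outcome on X to estimate the CATE
\hat\tau_{\mathrm{dr}}(x) \;=\; \widehat{\mathbb{E}}_n\bigl[\hat\varphi(Z) \mid X = x\bigr]
```

The LP-DR variant in the claim changes how the second-stage regression is carried out; that construction is not reproduced here.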

High confidence · Published

Cross-fitting (sample-splitting) in doubly robust estimation · positive · Oracle efficiency and overfitting bias

Cross-fitting ensures that the estimator achieves oracle efficiency (same error as if nuisance functions were known) and avoids overfitting bias from using the same data for both stages.

Population: Observational studies with estimated nuisance functions · Comparator: No sample-splitting (same-sample estimation)

Primary evidence

Towards optimal doubly robust estimation of heterogeneous causal effects

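The mechanics of cross-fitting can be sketched in a few lines. This is a toy illustration under stated assumptions, not the cited paper's procedure: the data are simulated, the nuisance estimators are deliberately crude per-stratum means, and the names `simulate`, `make_folds`, `fit_nuisances`, and `cross_fit_pseudo_outcomes` are hypothetical.

```python
import random
from statistics import mean

def simulate(n, seed=0):
    """Toy data: binary covariate x, randomized treatment w, outcome y.
    The true CATE is 1 when x == 0 and 3 when x == 1, so the ATE is about 2."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        x = rng.randint(0, 1)
        w = rng.randint(0, 1)
        y = x + (1 + 2 * x) * w + rng.gauss(0, 1)
        rows.append({"x": x, "w": w, "y": y})
    return rows

def make_folds(n, k, seed=0):
    """Shuffle indices 0..n-1 and deal them into k disjoint folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]

def fit_nuisances(rows):
    """Crude stand-in nuisance fits: per-stratum propensity and outcome means."""
    fits = {}
    for x in (0, 1):
        stratum = [r for r in rows if r["x"] == x]
        fits[x] = {
            "e": mean(r["w"] for r in stratum),
            "mu1": mean(r["y"] for r in stratum if r["w"] == 1),
            "mu0": mean(r["y"] for r in stratum if r["w"] == 0),
        }
    return fits

def cross_fit_pseudo_outcomes(rows, k=2, seed=0):
    """DR pseudo-outcomes, each built from nuisances fit on the other folds."""
    pseudo = [None] * len(rows)
    for held_out in make_folds(len(rows), k, seed):
        held = set(held_out)
        train = [r for i, r in enumerate(rows) if i not in held]
        fits = fit_nuisances(train)   # stage 1 never sees the held-out fold
        for i in held_out:            # stage 2 is evaluated on the held-out fold
            r = rows[i]
            f = fits[r["x"]]
            mu_w = f["mu1"] if r["w"] == 1 else f["mu0"]
            weight = (r["w"] - f["e"]) / (f["e"] * (1 - f["e"]))
            pseudo[i] = weight * (r["y"] - mu_w) + f["mu1"] - f["mu0"]
    return pseudo

rows = simulate(4000)
pseudo = cross_fit_pseudo_outcomes(rows, k=2)
ate_hat = mean(pseudo)
print(f"cross-fitted DR estimate of the ATE: {ate_hat:.2f}")
```

Every observation ends up with a pseudo-outcome, but none of them is built from nuisance fits that used that observation, which is the property the claim relies on.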

High confidence · Published

Honest trees in causal forests · positive · Bias reduction in treatment effect estimation

Using separate data for tree splitting and effect estimation (honesty) reduces bias and is critical for the asymptotic theory, enabling valid inference.

Population: Observational studies with unconfoundedness · Comparator: Standard (dishonest) trees in random forests

Primary evidence

Estimation and Inference of Heterogeneous Treatment Effects using Random Forests

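A small simulation can make the bias that honesty removes visible. The sketch below is an assumption-laden toy, not the causal forest algorithm: it uses a single split on one covariate, simulated randomized data with no true treatment effect, and hypothetical helper names. It compares the apparent effect gap between leaves when the splitting data are reused for estimation with the gap measured on a fresh half.

```python
import random
from statistics import mean

THRESHOLDS = [0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8]

def simulate(n, rng):
    """Randomized toy data (x, w, y) with NO true treatment effect anywhere."""
    return [(rng.random(), rng.randint(0, 1), rng.gauss(0, 1)) for _ in range(n)]

def effect(rows):
    """Difference in mean outcomes, treated minus control."""
    treated = [y for _, w, y in rows if w == 1]
    control = [y for _, w, y in rows if w == 0]
    return mean(treated) - mean(control)

def leaf_gap(rows, t):
    """Absolute gap between estimated effects in the leaves x < t and x >= t."""
    left = [r for r in rows if r[0] < t]
    right = [r for r in rows if r[0] >= t]
    return abs(effect(left) - effect(right))

def one_replication(rng, n=400):
    data = simulate(n, rng)
    split_half, est_half = data[: n // 2], data[n // 2:]
    # Pick the split that looks most heterogeneous on the splitting half.
    best_t = max(THRESHOLDS, key=lambda t: leaf_gap(split_half, t))
    adaptive = leaf_gap(split_half, best_t)  # reuses the splitting data
    honest = leaf_gap(est_half, best_t)      # fresh data for estimation
    return adaptive, honest

rng = random.Random(0)
results = [one_replication(rng) for _ in range(200)]
adaptive_gap = mean(a for a, _ in results)
honest_gap = mean(h for _, h in results)
print(f"mean apparent gap, adaptive: {adaptive_gap:.3f}, honest: {honest_gap:.3f}")
```

Because the true gap is zero, any systematic excess in the adaptive number reflects selection on noise; the honest number is only ordinary sampling error at the chosen split.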

High confidence · Published

Causal tree · positive · Mean squared error (MSE) in estimating conditional average treatment effects (CATE)

Causal trees reduce MSE by up to 30% compared to standard regression trees in simulations with known ground truth, due to splitting criteria and cross-validation tailored for causal effects.

Population: Randomized experiments with heterogeneous treatment effects · Comparator: Standard regression tree (CART)

Primary evidence

Recursive Partitioning for Heterogeneous Causal Effects

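To illustrate what a splitting criterion "tailored for causal effects" means, the sketch below contrasts two single-split rules on simulated randomized data. It is a simplification under stated assumptions: the causal-style score here is only the between-leaf spread of difference-in-means effect estimates and omits the variance penalty and honest cross-validation of the cited paper; all function names are hypothetical.

```python
import random
from statistics import mean

def simulate(n, seed=0):
    """x1 shifts the outcome level only; x2 alone drives the treatment effect."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        x1, x2, w = rng.random(), rng.random(), rng.randint(0, 1)
        tau = 2.0 if x2 > 0.5 else 0.0
        y = 5.0 * (x1 > 0.5) + tau * w + rng.gauss(0, 1)
        rows.append(((x1, x2), w, y))
    return rows

def outcome_mean(rows):
    """Leaf statistic for a CART-style split: the mean outcome."""
    return mean(y for _, _, y in rows)

def effect(rows):
    """Leaf statistic for a causal-style split: treated minus control mean."""
    return (mean(y for _, w, y in rows if w == 1)
            - mean(y for _, w, y in rows if w == 0))

def best_split(rows, leaf_stat):
    """Return (feature, threshold) maximizing the between-leaf spread of leaf_stat."""
    n = len(rows)
    best, best_score = None, float("-inf")
    for j in (0, 1):
        for t in (0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8):
            left = [r for r in rows if r[0][j] <= t]
            right = [r for r in rows if r[0][j] > t]
            gap = leaf_stat(left) - leaf_stat(right)
            score = len(left) * len(right) / n * gap ** 2
            if score > best_score:
                best, best_score = (j, t), score
    return best

rows = simulate(2000)
cart_split = best_split(rows, outcome_mean)  # chases variation in outcome level
causal_split = best_split(rows, effect)      # chases variation in the effect
print("CART-style split:", cart_split, "causal-style split:", causal_split)
```

In this setup the outcome-based rule is drawn to the prognostic covariate (feature index 0), while the effect-based rule picks the covariate that actually moderates the treatment effect (feature index 1).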

High confidence · Published

R-learner · positive · Error bounds for CATE estimation

The R-learner achieves oracle-level error bounds that depend only on the complexity of the CATE function, not on the complexity of the nuisance functions, provided the nuisance estimates converge at o(n^{-1/4}) rates.

Population: Observational studies with unconfoundedness and overlap · Comparator: Naive two-step estimation (separately fitting response surfaces)

Primary evidence

Quasi-Oracle Estimation of Heterogeneous Treatment Effects
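The residual-on-residual idea behind the R-learner can be sketched for a CATE that is assumed linear in one covariate. This is a minimal illustration, not the paper's implementation: the simulation design is invented, the true nuisance functions stand in for cross-fitted estimates, and `r_learner_linear` is a hypothetical helper that solves the resulting least-squares problem in closed form.

```python
import math
import random

def e_true(x):
    """Propensity P(W = 1 | X = x) used to simulate confounded treatment."""
    return 0.3 + 0.4 * x

def tau_true(x):
    """True CATE in the simulation."""
    return 1.0 + 2.0 * x

def m_true(x):
    """Conditional mean outcome E[Y | X = x] = baseline + e(x) * tau(x)."""
    return math.sin(3 * x) + e_true(x) * tau_true(x)

def simulate(n, seed=0):
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        x = rng.random()
        w = 1 if rng.random() < e_true(x) else 0
        y = math.sin(3 * x) + tau_true(x) * w + rng.gauss(0, 1)
        data.append((x, w, y))
    return data

def r_learner_linear(data, m_hat, e_hat):
    """Minimize sum of ((y - m_hat(x)) - (w - e_hat(x)) * (a + b * x))^2 over (a, b)."""
    s11 = s12 = s22 = t1 = t2 = 0.0
    for x, w, y in data:
        y_res = y - m_hat(x)        # outcome residual
        w_res = w - e_hat(x)        # treatment residual
        z1, z2 = w_res, w_res * x   # regressors for intercept a and slope b
        s11 += z1 * z1
        s12 += z1 * z2
        s22 += z2 * z2
        t1 += z1 * y_res
        t2 += z2 * y_res
    det = s11 * s22 - s12 * s12     # solve the 2x2 normal equations
    a = (s22 * t1 - s12 * t2) / det
    b = (s11 * t2 - s12 * t1) / det
    return a, b

data = simulate(5000)
# Assumption: oracle nuisances stand in for cross-fitted estimates here.
a_hat, b_hat = r_learner_linear(data, m_true, e_true)
print(f"estimated CATE: {a_hat:.2f} + {b_hat:.2f} * x (truth: 1 + 2 * x)")
```

In practice `m_hat` and `e_hat` would be fitted on held-out folds with any flexible learner; the second stage shown here depends on them only through the two residuals.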

Evidence base


50 papers

Study · Wiki · Canonical · Moderate

Estimation and Inference of Heterogeneous Treatment Effects using Random Forests

Stefan Wager, Susan Athey · Journal of the American Statistical Association · 2017 · 2,737 citations

Many scientific and engineering challenges—ranging from personalized medicine to customized marketing recommendations—require an understanding of treatment effect heterogeneity. In this article, we develop a nonparametric causal forest for estimating heterogeneous treatment effects that extends Breiman’s widely used random forest algorithm. In the potential outcomes framework with unconfoundedness, we show that causal forests are pointwise consistent for the true treatment effect and have an asymptotically Gaussian and centered sampling distribution. We also discuss a practical method for constructing asymptotic confidence intervals for the true treatment effect that are centered at the causal forest estimates. Our theoretical results rely on a generic Gaussian theory for a large family of random forest algorithms. To our knowledge, this is the first set of results that allows any type of random forest, including classification and regression forests, to be used for provably valid statistical inference. In experiments, we find causal forests to be substantially more powerful than classical methods based on nearest-neighbor matching, especially in the presence of irrelevant covariates.


Study · Wiki · Canonical · Moderate

Causal inference in statistics: An overview

Judea Pearl · Statistics Surveys · 2009 · 2,309 citations

This review presents empirical researchers with recent advances in causal inference, and stresses the paradigmatic shifts that must be undertaken in moving from traditional statistical analysis to causal analysis of multivariate data. Special emphasis is placed on the assumptions that underlie all causal inferences, the languages used in formulating those assumptions, the conditional nature of all causal and counterfactual claims, and the methods that have been developed for the assessment of such claims. These advances are illustrated using a general theory of causation based on the Structural Causal Model (SCM) described in Pearl (2000a), which subsumes and unifies other approaches to causation, and provides a coherent mathematical foundation for the analysis of causes and counterfactuals. In particular, the paper surveys the development of mathematical tools for inferring (from a combination of data and assumptions) answers to three types of causal queries: (1) queries about the effects of potential interventions (also called “causal effects” or “policy evaluation”), (2) queries about probabilities of counterfactuals (including assessment of “regret,” “attribution” or “causes of effects”), and (3) queries about direct and indirect effects (also known as “mediation”). Finally, the paper defines the formal and conceptual relationships between the structural and potential-outcome frameworks and presents tools for a symbiotic analysis that uses the strong features of both.


Study · Top journal · Wiki · Canonical · High confidence

Heterogeneous Treatment Effects Analysis for Social Scientists: A Review

Anning Hu · Social Science Research · 2023 · 29 citations

An accessible review of heterogeneous treatment effect methods including interactions, GAMs, propensity scores, causal trees, causal forests, BART, and meta-learners.


Study · Top journal · Wiki · Canonical · High confidence

Metalearners for Estimating Heterogeneous Treatment Effects using Machine Learning

Sören R. Künzel, Jasjeet S. Sekhon, Peter J. Bickel, Bin Yu · Proceedings of the National Academy of Sciences · 2019 · 1,243 citations

The standard S-learner, T-learner, and X-learner framing for estimating heterogeneous treatment effects with machine learning.


Study · Preprint · Wiki · Canonical · Moderate

Quasi-Oracle Estimation of Heterogeneous Treatment Effects

Xinkun Nie, Stefan Wager · 2017

Flexible estimation of heterogeneous treatment effects lies at the heart of many statistical challenges, such as personalized medicine and optimal resource allocation. In this paper, we develop a general class of two-step algorithms for heterogeneous treatment effect estimation in observational studies. We first estimate marginal effects and treatment propensities in order to form an objective function that isolates the causal component of the signal. Then, we optimize this data-adaptive objective function. Our approach has several advantages over existing methods. From a practical perspective, our method is flexible and easy to use: In both steps, we can use any loss-minimization method, e.g., penalized regression, deep neural networks, or boosting; moreover, these methods can be fine-tuned by cross validation. Meanwhile, in the case of penalized kernel regression, we show that our method has a quasi-oracle property: Even if the pilot estimates for marginal effects and treatment propensities are not particularly accurate, we achieve the same error bounds as an oracle who has a priori knowledge of these two nuisance components. We implement variants of our approach based on penalized regression, kernel ridge regression, and boosting in a variety of simulation setups, and find promising performance relative to existing baselines.


Study · Preprint · Wiki · Canonical · Moderate

Estimation and Inference of Heterogeneous Treatment Effects using Random Forests

Stefan Wager, Susan Athey · 2015

Many scientific and engineering challenges -- ranging from personalized medicine to customized marketing recommendations -- require an understanding of treatment effect heterogeneity. In this paper, we develop a non-parametric causal forest for estimating heterogeneous treatment effects that extends Breiman's widely used random forest algorithm. In the potential outcomes framework with unconfoundedness, we show that causal forests are pointwise consistent for the true treatment effect, and have an asymptotically Gaussian and centered sampling distribution. We also discuss a practical method for constructing asymptotic confidence intervals for the true treatment effect that are centered at the causal forest estimates. Our theoretical results rely on a generic Gaussian theory for a large family of random forest algorithms. To our knowledge, this is the first set of results that allows any type of random forest, including classification and regression forests, to be used for provably valid statistical inference. In experiments, we find causal forests to be substantially more powerful than classical methods based on nearest-neighbor matching, especially in the presence of irrelevant covariates.


Meta-analysis · High evidence score

Post hoc analysis of SUSTAIN 6 and PIONEER 6 trials suggests that people with type 2 diabetes at high cardiovascular risk treated with semaglutide experience more stable kidney function compared with placebo

Katherine R. Tuttle, Heidrun Bosch‐Traberg, David Z.I. Cherney +5 more · Kidney International · 2023 · 125 citations

Glucagon-like peptide-1 receptor agonists reduce albuminuria and may stabilize the estimated glomerular filtration rate (eGFR) in people with type 2 diabetes (T2D). In this post hoc analysis of the SUSTAIN 6/PIONEER 6 trials encompassing 6480 participants at high cardiovascular risk (semaglutide, 3239 participants; placebo, 3241 participants), we investigated the effects of semaglutide versus placebo on eGFR decline. Pooled data by treatment were evaluated for annual eGFR change (total annual eGFR slope in ml/min per 1.73 m²) from baseline to end of treatment and time to persistent eGFR reductions of 30%, 40%, 50% and 57% or more, including subgroup analyses by baseline eGFR (30 to under 60 or 60 and over ml/min per 1.73 m²). In the overall population, the estimated treatment difference (ETD; semaglutide versus placebo) in annual eGFR slope was significant at 0.59 ml/min per 1.73 m² (95% confidence interval 0.29; 0.89). The ETD was numerically largest in the 30 to under 60 ml/min per 1.73 m² eGFR subgroup, 1.06 ml/min per 1.73 m² (0.45; 1.67), but no significant interaction was observed for treatment effect by subgroup. Hazard ratios (semaglutide versus placebo) for time to persistent eGFR decline were under 1.0 for all eGFR thresholds in the overall population and were numerically lower in the baseline eGFR 30 to under 60 ml/min per 1.73 m² subgroup versus the overall population, although no significant interaction was observed for treatment effect by subgroup. Thus, pooled analyses of clinical trial data in patients with T2D suggest that semaglutide may reduce the rate of eGFR decline.

Lay Summary

Patients with type 2 diabetes (T2D) often develop chronic kidney disease. Semaglutide is a medicine used to treat T2D; previous studies have shown these medicines may also reduce the decline of kidney function. However, more studies are needed to confirm the kidney benefits with semaglutide. In 2 clinical trials, 6480 patients with T2D and at high risk of a cardiovascular event were treated with semaglutide or placebo. We used the results of kidney function tests from these studies to assess how fast kidney function declined in those treated with semaglutide or placebo. Our analysis showed that semaglutide significantly slowed the rate of kidney function decline and non-significantly extended the time taken to reach specified estimated glomerular filtration rate thresholds. We also saw that kidney function at the start of the trial did not impact these findings.

Type 2 diabetes (T2D) markedly increases the risk of both cardiovascular (CV) disease and chronic kidney disease (CKD). Approximately 40% of patients with T2D will develop CKD, and T2D is now the most common cause of progression to kidney failure worldwide.
glomerular filtration rate (eGFR) over time for by baseline eGFR ml/min per 1.73 m2 and ml/min per 1.73 glomerular filtration rate (eGFR) over time for in the SUSTAIN 6 for the overall and by baseline eGFR ml/min per 1.73 m2 and ml/min per 1.73 glomerular filtration rate (eGFR) over time for in the 6 for the overall and by baseline eGFR ml/min per 1.73 m2 and ml/min per 1.73 change in estimated glomerular filtration rate annual eGFR from baseline to end of treatment for change in in the SUSTAIN 6 and 6 trials in the overall population, and by baseline eGFR ml/min per 1.73 m2 and ml/min per 1.73 change in estimated glomerular filtration rate annual eGFR from baseline to end of treatment for change in change in change in blood and baseline in the SUSTAIN 6 and 6 trials in the overall population, and by baseline eGFR ml/min per 1.73 m2 and ml/min per 1.73 change in estimated glomerular filtration rate annual eGFR data to end of treatment pooled in the overall population, and by baseline eGFR ml/min per 1.73 m2 and ml/min per 1.73 change in estimated glomerular filtration rate annual eGFR over time and from baseline to end of treatment in the SUSTAIN 6 in the overall population, and by baseline eGFR ml/min per 1.73 m2 and ml/min per 1.73 change in estimated glomerular filtration rate annual eGFR over time and from baseline to end of treatment in the 6 in the overall population, and by baseline eGFR ml/min per 1.73 m2 and ml/min per 1.73 to of persistent reductions in estimated glomerular filtration rate (eGFR) in the overall and by baseline eGFR in the SUSTAIN 6 to of persistent reductions in estimated glomerular filtration rate (eGFR) in the overall and by baseline eGFR in the 6 over time and from baseline to end of treatment in the SUSTAIN 6 in the overall population, and by baseline estimated glomerular filtration rate (eGFR) ml/min per 1.73 m2 and ml/min per 1.73 in this is the most used to of glucose with to the of 2 is in patients with chronic kidney et the with 
that of the by did in more PDF The et for on the benefits of peptide-1 receptor agonists for kidney function in type 2 clinical are and of clinical trial the effects of a receptor to of estimated glomerular filtration rate studies patients for type 2 most of did not have chronic kidney disease these results the that receptor agonists have a eGFR decline is a to to kidney PDF data and the benefits of receptor agonists on kidney function the analysis of to Cardiovascular and Semaglutide in Type 2 the Cardiovascular of Semaglutide in Type 2 trials by Tuttle et a peptide-1 receptor was associated with a estimated glomerular filtration rate (eGFR) decline versus placebo, with a difference in eGFR slope of 0.59 ml/min per 1.73 m2 per PDF

RCT · High evidence score

A Survey on Causal Inference

Liuyi Yao, Zhixuan Chu, Sheng Li +3 more · ACM Transactions on Knowledge Discovery from Data · 2021 · 437 citations

Causal inference has been a critical research topic for decades across many domains, such as statistics, computer science, education, public policy, and economics. Estimating causal effects from observational data has become an appealing research direction because, compared with randomized controlled trials, such data are available in large amounts and are cheap to obtain. With the rapid development of machine learning, a variety of causal effect estimation methods for observational data have emerged. In this survey, we provide a comprehensive review of causal inference methods under the potential outcome framework, one of the well-known causal inference frameworks. The methods are divided into two categories depending on whether or not they require all three assumptions of the potential outcome framework. For each category, both the traditional statistical methods and the recent machine learning enhanced methods are discussed and compared. Plausible applications of these methods are also presented, including applications in advertising, recommendation, medicine, and so on. Moreover, commonly used benchmark datasets and open-source codes are summarized, which helps researchers and practitioners explore, evaluate, and apply the causal inference methods.

Meta-analysis · High evidence score

Individual participant data meta‐analysis to examine interactions between treatment effect and participant‐level covariates: Statistical recommendations for conduct and planning

Richard D Riley, Thomas P. A. Debray, David J. Fisher +9 more · Statistics in Medicine · 2020 · 161 citations

Precision medicine research often searches for treatment-covariate interactions, which refers to when a treatment effect (eg, measured as a mean difference, odds ratio, hazard ratio) changes across values of a participant-level covariate (eg, age, gender, biomarker). Single trials do not usually have sufficient power to detect genuine treatment-covariate interactions, which motivate the sharing of individual participant data (IPD) from multiple trials for meta-analysis. Here, we provide statistical recommendations for conducting and planning an IPD meta-analysis of randomized trials to examine treatment-covariate interactions. For conduct, two-stage and one-stage statistical models are described, and we recommend: (i) interactions should be estimated directly, and not by calculating differences in meta-analysis results for subgroups; (ii) interaction estimates should be based solely on within-study information; (iii) continuous covariates and outcomes should be analyzed on their continuous scale; (iv) nonlinear relationships should be examined for continuous covariates, using a multivariate meta-analysis of the trend (eg, using restricted cubic spline functions); and (v) translation of interactions into clinical practice is nontrivial, requiring individualized treatment effect prediction. For planning, we describe first why the decision to initiate an IPD meta-analysis project should not be based on between-study heterogeneity in the overall treatment effect; and second, how to calculate the power of a potential IPD meta-analysis project in advance of IPD collection, conditional on characteristics (eg, number of participants, standard deviation of covariates) of the trials (potentially) promising their IPD. Real IPD meta-analysis projects are used for illustration throughout.
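The two-stage approach described in this abstract can be sketched as follows. This is a minimal illustration only, assuming a binary covariate, a continuous outcome, and fixed-effect pooling in the second stage; the function names and the toy data are hypothetical and do not come from the paper. The interaction is estimated directly inside each trial, so the pooled estimate uses within-study information only.

```python
from statistics import mean, variance


def within_trial_interaction(rows):
    """Stage 1: interaction estimate and variance from one trial.

    rows: iterable of (treated, covariate, outcome), with treated and
    covariate coded 0/1. Every treatment-by-covariate cell needs at
    least two participants.
    """
    cells = {(t, x): [] for t in (0, 1) for x in (0, 1)}
    for t, x, y in rows:
        cells[(t, x)].append(y)
    # Treatment effect (mean difference) within each covariate level.
    effect = {x: mean(cells[(1, x)]) - mean(cells[(0, x)]) for x in (0, 1)}
    estimate = effect[1] - effect[0]
    # Variance of a difference of four independent cell means.
    var = sum(variance(ys) / len(ys) for ys in cells.values())
    return estimate, var


def pool_fixed_effect(stage1):
    """Stage 2: inverse-variance pooling of (estimate, variance) pairs."""
    weights = [1.0 / v for _, v in stage1]
    pooled = sum(w * e for w, (e, _) in zip(weights, stage1)) / sum(weights)
    return pooled, 1.0 / sum(weights)


# Hypothetical toy data: two trials, rows are (treated, covariate, outcome).
trial_a = [(0, 0, 1.0), (0, 0, 3.0), (1, 0, 2.0), (1, 0, 4.0),
           (0, 1, 1.0), (0, 1, 3.0), (1, 1, 4.0), (1, 1, 6.0)]
trial_b = [(0, 0, 1.0), (0, 0, 3.0), (1, 0, 2.0), (1, 0, 4.0),
           (0, 1, 1.0), (0, 1, 3.0), (1, 1, 5.0), (1, 1, 7.0)]
pooled, pooled_var = pool_fixed_effect(
    [within_trial_interaction(t) for t in (trial_a, trial_b)])
```

A random-effects second stage, or a continuous covariate modelled with a regression interaction term, would follow the same pattern: estimate the interaction inside each trial first, then combine.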

Meta-analysis · Top journal · High evidence score

Evaluation of pathological complete response as surrogate endpoint in neoadjuvant randomised clinical trials of early stage breast cancer: systematic review and meta-analysis

Fabio Conforti, Laura Pala, Isabella Sala +11 more · BMJ · 2021 · 140 citations

OBJECTIVE: To evaluate pathological complete response as a surrogate endpoint for disease-free survival and overall survival in regulatory neoadjuvant trials of early stage breast cancer. DESIGN: Systematic review and meta-analysis. DATA SOURCES: Medline, Embase, and Scopus to 1 December 2020. ELIGIBILITY CRITERIA FOR STUDY SELECTION: Randomised clinical trials that tested neoadjuvant chemotherapy given alone or combined with other treatments, including anti-human epidermal growth factor 2 (anti-HER2) drugs, targeted treatments, antivascular agents, bisphosphonates, and immune checkpoint inhibitors. DATA EXTRACTION AND SYNTHESIS: Trial level associations between the surrogate endpoint pathological complete response and disease-free survival and overall survival. METHODS: A weighted regression analysis was performed on log transformed treatment effect estimates (hazard ratio for disease-free survival and overall survival and relative risk for pathological complete response), and the coefficient of determination (R²) was used to quantify the association. The secondary objective was to explore heterogeneity of results in preplanned subgroup analyses, stratifying trials according to treatment type in the experimental arm, definition used for pathological complete response (breast and lymph nodes v breast only), and biological features of the disease (HER2 positive or triple negative breast cancer). The surrogate threshold effect was also evaluated, indicating the minimum value of the relative risk for pathological complete response necessary to confidently predict a non-null effect on hazard ratio for disease-free survival or overall survival. RESULTS: 54 randomised clinical trials comprising a total of 32 611 patients were included in the analysis.
A weak association was observed between the log(relative risk) for pathological complete response and log(hazard ratio) for both disease-free survival (R²=0.14, 95% confidence interval 0.00 to 0.29) and overall survival (R²=0.08, 0.00 to 0.22). Similar results were found across all subgroups evaluated, independently of the definition used for pathological complete response, treatment type in the experimental arm, and biological features of the disease. The surrogate threshold effect was 5.19 for disease-free survival but was not estimable for overall survival. Consistent results were confirmed in three sensitivity analyses: excluding small trials (<200 patients enrolled), excluding trials with short median follow-up (<24 months), and replacing the relative risk for pathological complete response with the absolute difference of pathological complete response rates between treatment arms. CONCLUSION: A lack of surrogacy of pathological complete response was identified at trial level for both disease-free survival and overall survival. The findings suggest that pathological complete response should not be used as primary endpoint in regulatory neoadjuvant trials of early stage breast cancer.
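The trial-level surrogacy analysis described here (a weighted regression of log hazard ratios on log relative risks, summarised by the coefficient of determination) can be sketched roughly as below. The abstract does not state which weights were used, so weighting by trial sample size is an assumption, and the function name and input numbers are hypothetical rather than data from the review.

```python
from math import log


def weighted_fit(x, y, w):
    """Weighted least squares of y on x; returns (slope, intercept, R^2)."""
    sw = sum(w)
    mx = sum(wi * xi for wi, xi in zip(w, x)) / sw
    my = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxx = sum(wi * (xi - mx) ** 2 for wi, xi in zip(w, x))
    syy = sum(wi * (yi - my) ** 2 for wi, yi in zip(w, y))
    sxy = sum(wi * (xi - mx) * (yi - my) for wi, xi, yi in zip(w, x, y))
    slope = sxy / sxx
    # With one predictor and an intercept, R^2 is the squared weighted correlation.
    return slope, my - slope * mx, sxy ** 2 / (sxx * syy)


# Hypothetical trial-level inputs (not data from the review).
rr_pcr = [1.2, 1.5, 2.0, 1.1]      # relative risk for pathological complete response
hr_dfs = [0.95, 0.90, 0.85, 1.02]  # hazard ratio for disease-free survival
n_patients = [300, 450, 250, 600]  # assumed weights: trial sample size

slope, intercept, r2 = weighted_fit(
    [log(v) for v in rr_pcr], [log(v) for v in hr_dfs], n_patients)
```

An R² close to 1 would indicate that trial-level effects on the surrogate predict trial-level effects on survival well; values near 0, as reported in the abstract, indicate weak surrogacy.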

RCT · Top journal · High evidence score

The Predictive Approaches to Treatment effect Heterogeneity (PATH) Statement

David M. Kent, Jessica K. Paulus, David van Klaveren +14 more · Annals of Internal Medicine · 2019 · 426 citations

Heterogeneity of treatment effect (HTE) refers to the nonrandom variation in the magnitude or direction of a treatment effect across levels of a covariate, as measured on a selected scale, against a clinical outcome. In randomized controlled trials (RCTs), HTE is typically examined through a subgroup analysis that contrasts effects in groups of patients defined "1 variable at a time" (for example, male vs. female or old vs. young). The authors of this statement present guidance on an alternative approach to HTE analysis, "predictive HTE analysis." The goal of predictive HTE analysis is to provide patient-centered estimates of outcome risks with versus without the intervention, taking into account all relevant patient attributes simultaneously. The PATH (Predictive Approaches to Treatment effect Heterogeneity) Statement was developed using a multidisciplinary technical expert panel, targeted literature reviews, simulations to characterize potential problems with predictive approaches, and a deliberative process engaging the expert panel. The authors distinguish 2 categories of predictive HTE approaches: a "risk-modeling" approach, wherein a multivariable model predicts the risk for an outcome and is applied to disaggregate patients within RCTs to define risk-based variation in benefit, and an "effect-modeling" approach, wherein a model is developed on RCT data by incorporating a term for treatment assignment and interactions between treatment and baseline covariates. Both approaches can be used to predict differential absolute treatment effects, the most relevant scale for clinical decision making. The authors developed 4 sets of guidance: criteria to determine when risk-modeling approaches are likely to identify clinically important HTE, methodological aspects of risk-modeling methods, considerations for translation to clinical practice, and considerations and caveats in the use of effect-modeling approaches. 
The PATH Statement, together with its explanation and elaboration document, may guide future analyses and reporting of RCTs.
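The "risk-modeling" approach described in the statement can be illustrated with a short sketch: patients in a trial are ordered by a predicted baseline risk, split into strata, and the absolute treatment effect is computed within each stratum. The code assumes the predicted risk comes from a separate multivariable model supplied by the analyst; the function name, the quantile-based strata, and the toy data are hypothetical.

```python
def risk_stratified_effects(patients, n_strata=4):
    """Absolute risk reduction (control minus treated) per risk stratum.

    patients: list of (predicted_risk, treated, event) with treated and
    event coded 0/1. Each stratum must contain patients from both arms.
    """
    ordered = sorted(patients, key=lambda p: p[0])
    n = len(ordered)
    results = []
    for k in range(n_strata):
        # Equal-sized strata by rank of predicted risk.
        stratum = ordered[k * n // n_strata:(k + 1) * n // n_strata]
        rate = {}
        for arm in (0, 1):
            events = [e for _, t, e in stratum if t == arm]
            rate[arm] = sum(events) / len(events)
        results.append(rate[0] - rate[1])
    return results


# Hypothetical toy trial: four low-risk and four high-risk patients.
toy_trial = [(0.1, 0, 0), (0.1, 0, 0), (0.1, 1, 0), (0.1, 1, 0),
             (0.9, 0, 1), (0.9, 0, 1), (0.9, 1, 0), (0.9, 1, 1)]
arr_by_stratum = risk_stratified_effects(toy_trial, n_strata=2)
```

Reporting the effect on the absolute scale per stratum matches the statement's emphasis on absolute treatment effects as the most relevant scale for clinical decisions.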

RCT · High evidence score

The use of propensity score methods with survival or time‐to‐event outcomes: reporting measures of effect similar to those used in randomized experiments

Peter C. Austin · Statistics in Medicine · 2013 · 1,433 citations

Propensity score methods are increasingly being used to estimate causal treatment effects in observational studies. In medical and epidemiological studies, outcomes are frequently time-to-event in nature. Propensity-score methods are often applied incorrectly when estimating the effect of treatment on time-to-event outcomes. This article describes how two different propensity score methods (matching and inverse probability of treatment weighting) can be used to estimate the measures of effect that are frequently reported in randomized controlled trials: (i) marginal survival curves, which describe survival in the population if all subjects were treated or if all subjects were untreated; and (ii) marginal hazard ratios. The use of these propensity score methods allows one to replicate the measures of effect that are commonly reported in randomized controlled trials with time-to-event outcomes: both absolute and relative reductions in the probability of an event occurring can be determined. We also provide guidance on variable selection for the propensity score model, highlight methods for assessing the balance of baseline covariates between treated and untreated subjects, and describe the implementation of a sensitivity analysis to assess the effect of unmeasured confounding variables on the estimated treatment effect when outcomes are time-to-event in nature. The methods in the paper are illustrated by estimating the effect of discharge statin prescribing on the risk of death in a sample of patients hospitalized with acute myocardial infarction. In this tutorial article, we describe and illustrate all the steps necessary to conduct a comprehensive analysis of the effect of treatment on time-to-event outcomes.
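One of the two methods named in this abstract, inverse probability of treatment weighting, can be sketched as a weighted Kaplan-Meier estimator that yields marginal survival curves. The sketch assumes propensity scores have already been estimated elsewhere; the function names and example values are hypothetical, and variance estimation (which the tutorial treats separately) is left out.

```python
def iptw_weights(treated, propensity):
    """Weights targeting the average treatment effect: 1/e if treated, 1/(1-e) otherwise."""
    return [1.0 / e if z else 1.0 / (1.0 - e)
            for z, e in zip(treated, propensity)]


def weighted_km(times, events, weights):
    """Weighted Kaplan-Meier curve as a list of (event_time, survival).

    times: follow-up times; events: 1 = event observed, 0 = censored.
    """
    surv, curve = 1.0, []
    for t in sorted({ti for ti, d in zip(times, events) if d}):
        at_risk = sum(w for ti, w in zip(times, weights) if ti >= t)
        failed = sum(w for ti, d, w in zip(times, events, weights)
                     if ti == t and d)
        surv *= 1.0 - failed / at_risk
        curve.append((t, surv))
    return curve


# Hypothetical example: weighted survival curve for the treated arm only.
treated = [1, 1, 1, 0, 0]
propensity = [0.8, 0.5, 0.4, 0.5, 0.2]   # assumed, estimated elsewhere
times = [2.0, 5.0, 7.0, 3.0, 4.0]
events = [1, 0, 1, 1, 0]
w = iptw_weights(treated, propensity)
idx = [i for i, z in enumerate(treated) if z == 1]
curve_treated = weighted_km([times[i] for i in idx],
                            [events[i] for i in idx],
                            [w[i] for i in idx])
```

Repeating the last step for the untreated arm gives the second marginal curve, and the difference between the two curves at a given time is an absolute effect of the kind the abstract describes.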

Observational · Top journal · High evidence score

Network analysis of multivariate data in psychological science

Denny Borsboom, Marie K. Deserno, Mijke Rhemtulla +12 more · Nature Reviews Methods Primers · 2021 · 1,219 citations

In recent years, network analysis has been applied to identify and analyse patterns of statistical association in multivariate psychological data. In these approaches, network nodes represent variables in a data set, and edges represent pairwise conditional associations between variables in the data, while conditioning on the remaining variables. This Primer provides an anatomy of these techniques, describes the current state of the art and discusses open problems. We identify relevant data structures in which network analysis may be applied: cross-sectional data, repeated measures and intensive longitudinal data. We then discuss the estimation of network structures in each of these cases, as well as assessment techniques to evaluate network robustness and replicability. Successful applications of the technique in different research areas are highlighted. Finally, we discuss limitations and challenges for future research. Network analysis allows the investigation of complex patterns and relationships by examining nodes and the edges connecting them. Borsboom et al. discuss the adoption of network analysis in psychological research.

Meta-analysis · High evidence score

Psychotherapies for depression in low‐ and middle‐income countries: a meta‐analysis

Pim Cuijpers, Eirini Karyotaki, Mirjam Reijnders +2 more · World Psychiatry · 2018 · 165 citations

Most psychotherapies for depression have been developed in high‐income Western countries of North America, Europe and Australia. A growing number of randomized trials have examined the effects of these treatments in non‐Western countries. We conducted a meta‐analysis of these studies to examine whether these psychotherapies are effective and to compare their effects between studies from Western and non‐Western countries. We conducted systematic searches in bibliographical databases and included 253 randomized controlled trials, of which 32 were conducted in non‐Western countries. The effects of psychotherapies in non‐Western countries were large (g=1.10; 95% CI: 0.91‐1.30), with high heterogeneity (I²=90; 95% CI: 87‐92). After adjustment for publication bias, the effect size dropped to g=0.73 (95% CI: 0.51‐0.96). Subgroup analyses did not indicate that adaptation to the local situation was associated with the effect size. Comparisons with the studies in Western countries showed that the effects of the therapies were significantly larger in non‐Western countries, also after adjusting for characteristics of the participants, the treatments and the studies. These larger effect sizes in non‐Western countries may reflect true differences indicating that therapies are indeed more effective; or may be explained by the care‐as‐usual control conditions in non‐Western countries, often indicating that no care was available; or may be the result of the relative low quality of many trials in the field. This study suggests that psychotherapies that were developed in Western countries may or may not be more effective in non‐Western countries, but they are probably no less effective and can therefore also be used in these latter countries.

Meta-analysis · High evidence score

Corn Response to Nitrogen is Influenced by Soil Texture and Weather

Nicolas Tremblay, Yacine Bouroubi, C. Bélec +9 more · Agronomy Journal · 2012 · 252 citations

Soil properties and weather conditions are known to affect soil N availability and plant N uptake; however, studies examining N response as affected by soil and weather sometimes give conflicting results. Meta‐analysis is a statistical method for estimating treatment effects in a series of experiments to explain the sources of heterogeneity. In this study, the technique was used to examine the influence of soil and weather parameters on N response of corn (Zea mays L.) across 51 studies involving the same N rate treatments that were performed in a diversity of North American locations between 2006 and 2009. Results showed that corn response to added N was significantly greater in fine‐textured soils than in medium‐textured soils. Abundant and well‐distributed rainfall and, to a lesser extent, accumulated corn heat units enhanced N response. Corn yields increased by a factor of 1.6 (over the unfertilized control) in medium‐textured soils and 2.7 in fine‐textured soils at high N rates. Subgroup analyses were performed on the fine‐textured soil class based on weather parameters. Rainfall patterns had an important effect on N response in this soil texture class, with yields being increased 4.5‐fold by in‐season N fertilization under conditions of “abundant and well‐distributed rainfall.” These findings could be useful for developing N fertilization algorithms that would prescribe N application at optimal rates taking into account rainfall pattern and soil texture, which would lead to improved crop profitability and reduced environmental impacts.

RCT · High evidence score

Bayesian Regression Tree Models for Causal Inference: Regularization, Confounding, and Heterogeneous Effects (with Discussion)

P. Richard Hahn, Jared S. Murray, Carlos M. Carvalho · Bayesian Analysis · 2020 · 303 citations

This paper presents a novel nonlinear regression model for estimating heterogeneous treatment effects, geared specifically towards situations with small effect sizes, heterogeneous effects, and strong confounding by observables. Standard nonlinear regression models, which may work quite well for prediction, have two notable weaknesses when used to estimate heterogeneous treatment effects. First, they can yield badly biased estimates of treatment effects when fit to data with strong confounding. The Bayesian causal forest model presented in this paper avoids this problem by directly incorporating an estimate of the propensity function in the specification of the response model, implicitly inducing a covariate-dependent prior on the regression function. Second, standard approaches to response surface modeling do not provide adequate control over the strength of regularization over effect heterogeneity. The Bayesian causal forest model permits treatment effect heterogeneity to be regularized separately from the prognostic effect of control variables, making it possible to informatively “shrink to homogeneity”. While we focus on observational data, our methods are equally useful for inferring heterogeneous treatment effects from randomized controlled experiments where careful regularization is somewhat less complicated but no less important. We illustrate these benefits via the reanalysis of an observational study assessing the causal effects of smoking on medical expenditures as well as extensive simulation studies.
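The structure described in this abstract can be written compactly. The notation below is an assumption chosen for illustration (the abstract itself gives no symbols): x_i are covariates, z_i is the binary treatment indicator, and \hat{\pi}(x_i) is an estimate of the propensity function.

```latex
\begin{align*}
  y_i &= \mu\bigl(x_i, \hat{\pi}(x_i)\bigr) + \tau(x_i)\, z_i + \varepsilon_i,
  \qquad \varepsilon_i \sim \mathcal{N}(0, \sigma^2), \\
  \tau(x) &= \mathbb{E}\bigl[\, Y \mid x, Z = 1 \,\bigr] - \mathbb{E}\bigl[\, Y \mid x, Z = 0 \,\bigr].
\end{align*}
```

Here \mu is the prognostic function, which receives the estimated propensity score as an extra input, and \tau is the heterogeneous treatment effect. Placing separate priors on \mu and \tau is what allows the heterogeneity in \tau to be regularized independently and shrunk toward a homogeneous effect.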

Meta-analysis · High evidence score

A graphical method for exploring heterogeneity in meta‐analyses: application to a meta‐analysis of 65 trials

B. Baujat, Cédric Mahé, Jean‐Pierre Pignon +1 more · Statistics in Medicine · 2002 · 695 citations

Heterogeneity can be a major component of meta-analyses and by virtue of that fact warrants investigation. Classic analysis methods, such as meta-regression, are used to explore the sources of heterogeneity. However, it may be difficult to apply such a method in complex cases or in the absence of an a priori hypothesis. This paper presents a graphical method to identify trials, groups of trials or groups of patients that are sources of heterogeneity. The contribution of these trials to the overall result can also be evaluated with this method. Each trial is represented by a dot on a 2D graph. The X-axis represents the contribution of the trial to the overall Cochran Q-test for heterogeneity. The Y-axis represents the influence of the trial, defined as the standardized squared difference between the treatment effects estimated with and without the trial. This approach has been applied to data from the Meta-Analysis of Chemotherapy in Head and Neck Cancer (MACH-NC) comprising 10,850 patients in 65 randomized trials. The graphical method allowed us to identify trials that contributed considerably to the overall heterogeneity and had a strong influence on the overall result. It also provided useful information for the interpretation of heterogeneity in this meta-analysis. The proposed graphical method identifies trials that account for most of the heterogeneity without having to explore all possible sources of heterogeneity by subgroup analyses. This method can also be applied to identify types of patients that explain heterogeneity in the treatment effect.
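The two plot coordinates described above can be computed with a few lines of code. This is a sketch under stated assumptions: a fixed-effect (inverse-variance) pooled estimate, and the influence standardized by the variance of the pooled estimate obtained without the trial. The function name and the example numbers are hypothetical.

```python
def baujat_coordinates(effects, variances):
    """Return one (x, y) point per trial.

    x: the trial's contribution to Cochran's Q statistic.
    y: squared difference between the pooled effect with and without
       the trial, divided by the variance of the pooled effect
       computed without the trial (assumed standardization).
    """
    weights = [1.0 / v for v in variances]
    total_w = sum(weights)
    total_we = sum(w * e for w, e in zip(weights, effects))
    pooled = total_we / total_w
    points = []
    for e, v, w in zip(effects, variances, weights):
        x = (e - pooled) ** 2 / v
        w_rest = total_w - w
        pooled_rest = (total_we - w * e) / w_rest
        y = (pooled_rest - pooled) ** 2 * w_rest  # variance of pooled_rest is 1 / w_rest
        points.append((x, y))
    return points


# Hypothetical example: the third trial disagrees with the other two.
points = baujat_coordinates([0.0, 0.0, 3.0], [1.0, 1.0, 1.0])
```

Trials that land in the upper right of the resulting scatter plot both contribute heavily to heterogeneity and strongly influence the overall result.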

Study · Moderate

Standards and guidelines for the interpretation of sequence variants: a joint consensus recommendation of the American College of Medical Genetics and Genomics and the Association for Molecular Pathology

Sue Richards, Nazneen Aziz, Sherri J. Bale +9 more · Genetics in Medicine · 2015 · 31,840 citations

Meta-analysis · High evidence score

A process for assessing the feasibility of a network meta-analysis: a case study of everolimus in combination with hormonal therapy versus chemotherapy for advanced breast cancer

Shannon Cope, Jie Zhang, Stephen Saletan +3 more · BMC Medicine · 2014 · 116 citations

BACKGROUND: The aim of this study is to outline a general process for assessing the feasibility of performing a valid network meta-analysis (NMA) of randomized controlled trials (RCTs) to synthesize direct and indirect evidence for alternative treatments for a specific disease population. METHODS: Several steps to assess the feasibility of an NMA are proposed based on existing recommendations. Next, a case study is used to illustrate this NMA feasibility assessment process in order to compare everolimus in combination with hormonal therapy to alternative chemotherapies in terms of progression-free survival for women with advanced breast cancer. RESULTS: A general process for assessing the feasibility of an NMA is outlined that incorporates explicit steps to visualize the heterogeneity in terms of treatment and outcome characteristics (Part A) as well as the study and patient characteristics (Part B). Additionally, steps are performed to illustrate differences within and across different types of direct comparisons in terms of baseline risk (Part C) and observed treatment effects (Part D) since there is a risk that the treatment effect modifiers identified may not explain the observed heterogeneity or inconsistency in the results due to unexpected, unreported or unmeasured differences. Depending on the data available, alternative approaches are suggested: list assumptions, perform a meta-regression analysis, subgroup analysis, sensitivity analyses, or summarize why an NMA is not feasible. CONCLUSIONS: The process outlined to assess the feasibility of an NMA provides a stepwise framework that will help to ensure that the underlying assumptions are systematically explored and that the risks (and benefits) of pooling and indirectly comparing treatment effects from RCTs for a particular research question are transparent.

Study · Top journal · Moderate

Habitat fragmentation and its lasting impact on Earth’s ecosystems

Nick M. Haddad, Lars A. Brudvig, Jean Clobert +21 more · Science Advances · 2015 · 4,475 citations

We conducted an analysis of global forest cover to reveal that 70% of remaining forest is within 1 km of the forest's edge, subject to the degrading effects of fragmentation. A synthesis of fragmentation experiments spanning multiple biomes and scales, five continents, and 35 years demonstrates that habitat fragmentation reduces biodiversity by 13 to 75% and impairs key ecosystem functions by decreasing biomass and altering nutrient cycles. Effects are greatest in the smallest and most isolated fragments, and they magnify with the passage of time. These findings indicate an urgent need for conservation and restoration measures to improve landscape connectivity, which will reduce extinction rates and help maintain ecosystem services.

Study · Moderate

A Quantile-Based g-Computation Approach to Addressing the Effects of Exposure Mixtures

Alexander P. Keil, Jessie P. Buckley, Katie M. O’Brien +3 more · Environmental Health Perspectives · 2020 · 1,597 citations

BACKGROUND: Exposure mixtures frequently occur in data across many domains, particularly in the fields of environmental and nutritional epidemiology. Various strategies have arisen to answer questions about exposure mixtures, including methods such as weighted quantile sum (WQS) regression that estimate a joint effect of the mixture components. OBJECTIVES: We demonstrate a new approach to estimating the joint effects of a mixture: quantile g-computation. This approach combines the inferential simplicity of WQS regression with the flexibility of g-computation, a method of causal effect estimation. We use simulations to examine whether quantile g-computation and WQS regression can accurately and precisely estimate the effects of mixtures in a variety of common scenarios. METHODS: We examine the bias, confidence interval (CI) coverage, and bias-variance tradeoff of quantile g-computation and WQS regression and how these quantities are impacted by the presence of noncausal exposures, exposure correlation, unmeasured confounding, and nonlinearity of exposure effects. RESULTS: Quantile g-computation, unlike WQS regression, allows inference on mixture effects that is unbiased with appropriate CI coverage at sample sizes typically encountered in epidemiologic studies and when the assumptions of WQS regression are not met. Further, WQS regression can magnify bias from unmeasured confounding that might occur if important components of the mixture are omitted from the analysis. DISCUSSION: Unlike inferential approaches that examine the effects of individual exposures while holding other exposures constant, methods like quantile g-computation that can estimate the effect of a mixture are essential for understanding the effects of potential public health actions that act on exposure sources. 
Our approach may serve to help bridge gaps between epidemiologic analysis and interventions such as regulations on industrial emissions or mining processes, dietary changes, or consumer behavioral changes that act on multiple exposures simultaneously. https://doi.org/10.1289/EHP5838.

Study · Moderate

The Effect of Minimum Wages on Low-Wage Jobs

Doruk Cengiz, Arindrajit Dubé, Attila Lindner +1 more · The Quarterly Journal of Economics · 2019 · 2,335 citations

We estimate the effect of minimum wages on low-wage jobs using 138 prominent state-level minimum wage changes between 1979 and 2016 in the United States using a difference-in-differences approach. We first estimate the effect of the minimum wage increase on employment changes by wage bins throughout the hourly wage distribution. We then focus on the bottom part of the wage distribution and compare the number of excess jobs paying at or slightly above the new minimum wage to the missing jobs paying below it to infer the employment effect. We find that the overall number of low-wage jobs remained essentially unchanged over the five years following the increase. At the same time, the direct effect of the minimum wage on average earnings was amplified by modest wage spillovers at the bottom of the wage distribution. Our estimates by detailed demographic groups show that the lack of job loss is not explained by labor-labor substitution at the bottom of the wage distribution. We also find no evidence of disemployment when we consider higher levels of minimum wages. However, we do find some evidence of reduced employment in tradeable sectors. We also show how decomposing the overall employment effect by wage bins allows a transparent way of assessing the plausibility of estimates.

Observational · Moderate

A tutorial on propensity score estimation for multiple treatments using generalized boosted models

Daniel F. McCaffrey, Beth Ann Griffin, Daniel Almirall +3 more · Statistics in Medicine · 2013 · 1,479 citations

The use of propensity scores to control for pretreatment imbalances on observed variables in non-randomized or observational studies examining the causal effects of treatments or interventions has become widespread over the past decade. For settings with two conditions of interest such as a treatment and a control, inverse probability of treatment weighted estimation with propensity scores estimated via boosted models has been shown in simulation studies to yield causal effect estimates with desirable properties. There are tools (e.g., the twang package in R) and guidance for implementing this method with two treatments. However, there is not such guidance for analyses of three or more treatments. The goals of this paper are twofold: (1) to provide step-by-step guidance for researchers who want to implement propensity score weighting for multiple treatments and (2) to propose the use of generalized boosted models (GBM) for estimation of the necessary propensity score weights. We define the causal quantities that may be of interest to studies of multiple treatments and derive weighted estimators of those quantities. We present a detailed plan for using GBM to estimate propensity scores and using those scores to estimate weights and causal effects. We also provide tools for assessing balance and overlap of pretreatment variables among treatment groups in the context of multiple treatments. A case study examining the effects of three treatment programs for adolescent substance abuse demonstrates the methods.

RCT · Top journal · High evidence score

Symptom-based stratification of patients with primary Sjögren's syndrome: multi-dimensional characterisation of international observational cohorts and reanalyses of randomised clinical trials

Jessica Tarn, Nadia Howard-Tripp, Dennis Lendrem +97 more · The Lancet Rheumatology · 2019 · 134 citations

Background: Heterogeneity is a major obstacle to developing effective treatments for patients with primary Sjögren's syndrome. We aimed to develop a robust method for stratification, exploiting heterogeneity in patient-reported symptoms, and to relate these differences to pathobiology and therapeutic response. Methods: We did hierarchical cluster analysis using five common symptoms associated with primary Sjögren's syndrome (pain, fatigue, dryness, anxiety, and depression), followed by multinomial logistic regression to identify subgroups in the UK Primary Sjögren's Syndrome Registry (UKPSSR). We assessed clinical and biological differences between these subgroups, including transcriptional differences in peripheral blood. Patients from two independent validation cohorts in Norway and France were used to confirm patient stratification. Data from two phase 3 clinical trials were similarly stratified to assess the differences between subgroups in treatment response to hydroxychloroquine and rituximab. Findings: In the UKPSSR cohort (n=608), we identified four subgroups: low symptom burden (LSB), high symptom burden (HSB), dryness dominant with fatigue (DDF), and pain dominant with fatigue (PDF). Significant differences in peripheral blood lymphocyte counts, anti-SSA and anti-SSB antibody positivity, as well as serum IgG, κ-free light chain, β2-microglobulin, and CXCL13 concentrations were observed between these subgroups, along with differentially expressed transcriptomic modules in peripheral blood. Similar findings were observed in the independent validation cohorts (n=396). Reanalysis of trial data stratifying patients into these subgroups suggested a treatment effect with hydroxychloroquine in the HSB subgroup and with rituximab in the DDF subgroup compared with placebo. Interpretation: Stratification on the basis of patient-reported symptoms of patients with primary Sjögren's syndrome revealed distinct pathobiological endotypes with distinct responses to immunomodulatory treatments. Our data have important implications for clinical management, trial design, and therapeutic development. Similar stratification approaches might be useful for patients with other chronic immune-mediated diseases. Funding: UK Medical Research Council, British Sjogren's Syndrome Association, French Ministry of Health, Arthritis Research UK, Foundation for Research in Rheumatology.

Observational · Moderate

The Gaussian Graphical Model in Cross-Sectional and Time-Series Data

Sacha Epskamp, Lourens Waldorp, René Mõttus +1 more · Multivariate Behavioral Research · 2018 · 1,145 citations

We discuss the Gaussian graphical model (GGM; an undirected network of partial correlation coefficients) and detail its utility as an exploratory data analysis tool. The GGM shows which variables predict one-another, allows for sparse modeling of covariance structures, and may highlight potential causal relationships between observed variables. We describe the utility in three kinds of psychological data sets: data sets in which consecutive cases are assumed independent (e.g., cross-sectional data), temporally ordered data sets (e.g., n = 1 time series), and a mixture of the 2 (e.g., n > 1 time series). In time-series analysis, the GGM can be used to model the residual structure of a vector-autoregression analysis (VAR), also termed graphical VAR. Two network models can then be obtained: a temporal network and a contemporaneous network. When analyzing data from multiple subjects, a GGM can also be formed on the covariance structure of stationary means (the between-subjects network). We discuss the interpretation of these models and propose estimation methods to obtain these networks, which we implement in the R packages graphicalVAR and mlVAR. The methods are showcased in two empirical examples, and simulation studies on these methods are included in the supplementary materials.

RCT · High evidence score

Tutorial in biostatistics: data‐driven subgroup identification and analysis in clinical trials

Ilya Lipkovich, Alex Dmitrienko, Benjamin James Ralph · Statistics in Medicine · 2016 · 293 citations

It is well known that both the direction and magnitude of the treatment effect in clinical trials are often affected by baseline patient characteristics (generally referred to as biomarkers). Characterization of treatment effect heterogeneity plays a central role in the field of personalized medicine and facilitates the development of tailored therapies. This tutorial focuses on a general class of problems arising in data-driven subgroup analysis, namely, identification of biomarkers with strong predictive properties and patient subgroups with desirable characteristics such as improved benefit and/or safety. Limitations of ad-hoc approaches to biomarker exploration and subgroup identification in clinical trials are discussed, and the ad-hoc approaches are contrasted with principled approaches to exploratory subgroup analysis based on recent advances in machine learning and data mining. A general framework for evaluating predictive biomarkers and identification of associated subgroups is introduced. The tutorial provides a review of a broad class of statistical methods used in subgroup discovery, including global outcome modeling methods, global treatment effect modeling methods, optimal treatment regimes, and local modeling methods. Commonly used subgroup identification methods are illustrated using two case studies based on clinical trials with binary and survival endpoints. Copyright © 2016 John Wiley & Sons, Ltd.

Study · Top journal · Moderate

Bayesian statistics and modelling

Rens van de Schoot, Sarah Depaoli, Ruth King +8 more · Nature Reviews Methods Primers · 2021 · 1,152 citations

Meta-analysis · High evidence score

A Guide to Understanding Meta-analysis

Heidi Israel, Randy R. Richter · Journal of Orthopaedic and Sports Physical Therapy · 2011 · 210 citations

With the focus on evidence-based practice in healthcare, a well-conducted systematic review that includes a meta-analysis where indicated represents a high level of evidence for treatment effectiveness. The purpose of this commentary is to assist clinicians in understanding meta-analysis as a statistical tool using both published articles and explanations of components of the technique. We describe what meta-analysis is, what heterogeneity is, and how it affects meta-analysis, effect size, the modeling techniques of meta-analysis, and strengths and weaknesses of meta-analysis. Common components like forest plot interpretation, software that may be used, special cases for meta-analysis, such as subgroup analysis, individual patient data, and meta-regression, and a discussion of criticisms, are included.

RCT · High evidence score

Assessing and reporting heterogeneity in treatment effects in clinical trials: a proposal

David M. Kent, Peter M. Rothwell, John P. A. Ioannidis +2 more · Trials · 2010 · 502 citations

Mounting evidence suggests that there is frequently considerable variation in the risk of the outcome of interest in clinical trial populations. These differences in risk will often cause clinically important heterogeneity in treatment effects (HTE) across the trial population, such that the balance between treatment risks and benefits may differ substantially between large identifiable patient subgroups; the "average" benefit observed in the summary result may even be non-representative of the treatment effect for a typical patient in the trial. Conventional subgroup analyses, which examine whether specific patient characteristics modify the effects of treatment, are usually unable to detect even large variations in treatment benefit (and harm) across risk groups because they do not account for the fact that patients have multiple characteristics simultaneously that affect the likelihood of treatment benefit. Based upon recent evidence on optimal statistical approaches to assessing HTE, we propose a framework that prioritizes the analysis and reporting of multivariate risk-based HTE and suggests that other subgroup analyses should be explicitly labeled either as primary subgroup analyses (well-motivated by prior evidence and intended to produce clinically actionable results) or secondary (exploratory) subgroup analyses (performed to inform future research). A standardized and transparent approach to HTE assessment and reporting could substantially improve clinical trial utility and interpretability.

Observational · Top journal · Moderate

Recursive partitioning for heterogeneous causal effects

Susan Athey, Guido W. Imbens · Proceedings of the National Academy of Sciences · 2016 · 1,523 citations

In this paper we propose methods for estimating heterogeneity in causal effects in experimental and observational studies and for conducting hypothesis tests about the magnitude of differences in treatment effects across subsets of the population. We provide a data-driven approach to partition the data into subpopulations that differ in the magnitude of their treatment effects. The approach enables the construction of valid confidence intervals for treatment effects, even with many covariates relative to the sample size, and without "sparsity" assumptions. We propose an "honest" approach to estimation, whereby one sample is used to construct the partition and another to estimate treatment effects for each subpopulation. Our approach builds on regression tree methods, modified to optimize for goodness of fit in treatment effects and to account for honest estimation. Our model selection criterion anticipates that bias will be eliminated by honest estimation and also accounts for the effect of making additional splits on the variance of treatment effect estimates within each subpopulation. We address the challenge that the "ground truth" for a causal effect is not observed for any individual unit, so that standard approaches to cross-validation must be modified. Through a simulation study, we show that for our preferred method honest estimation results in nominal coverage for 90% confidence intervals, whereas coverage ranges between 74% and 84% for nonhonest approaches. Honest estimation requires estimating the model with a smaller sample size; the cost in terms of mean squared error of treatment effects for our preferred method ranges between 7-22%.
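The "honest" idea in the abstract above (one sample builds the partition, a separate sample estimates the effect in each subpopulation) can be sketched on simulated data. This is a minimal illustration and not the authors' algorithm: the single-covariate split search, the helper names `leaf_effect` and `best_split`, the gap-based splitting score, and the simulated randomized data are all assumptions made for the example. The paper's own criterion additionally accounts for the variance of within-leaf estimates.

```python
import numpy as np

def leaf_effect(y, w):
    """Difference in mean outcomes between treated (w == 1) and control (w == 0)."""
    return y[w == 1].mean() - y[w == 0].mean()

def best_split(x, y, w, candidates, min_arm=5):
    """Pick the cut on x that maximizes the gap between leaf-level effects.

    Simplified stand-in for a causal-tree splitting rule (hypothetical helper).
    """
    best_cut, best_gap = None, -np.inf
    for cut in candidates:
        left = x <= cut
        arm_sizes = [
            (w[left] == 1).sum(), (w[left] == 0).sum(),
            (w[~left] == 1).sum(), (w[~left] == 0).sum(),
        ]
        if min(arm_sizes) < min_arm:
            continue  # skip cuts that leave a leaf without enough units per arm
        gap = abs(leaf_effect(y[left], w[left]) - leaf_effect(y[~left], w[~left]))
        if gap > best_gap:
            best_cut, best_gap = cut, gap
    return best_cut

# Simulated randomized experiment: the true effect is 2 when x > 0, else 0.
rng = np.random.default_rng(0)
n = 4000
x = rng.uniform(-1, 1, n)
w = rng.integers(0, 2, n)
y = np.where(x > 0, 2.0, 0.0) * w + rng.normal(size=n)

# Honest split: one half chooses the partition, the other half estimates effects.
idx = rng.permutation(n)
build, est = idx[: n // 2], idx[n // 2:]
cut = best_split(x[build], y[build], w[build], np.linspace(-0.8, 0.8, 33))

left = x[est] <= cut
effect_left = leaf_effect(y[est][left], w[est][left])
effect_right = leaf_effect(y[est][~left], w[est][~left])
print(f"cut={cut:.2f}, left effect={effect_left:.2f}, right effect={effect_right:.2f}")
```

Because the estimation half plays no role in choosing the cut, the leaf estimates are not biased by the search over candidate splits. The price is that each stage uses only half the data, which is the sample-size cost the abstract mentions.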

Study · Moderate

Toward Causal Representation Learning

Bernhard Schölkopf, Francesco Locatello, Stefan Bauer +4 more · Proceedings of the IEEE · 2021 · 998 citations

The two fields of machine learning and graphical causality arose and are developed separately. However, there is, now, cross-pollination and increasing interest in both fields to benefit from the advances of the other. In this article, we review fundamental concepts of causal inference and relate them to crucial open problems of machine learning, including transfer and generalization, thereby assaying how causality can contribute to modern machine learning research. This also applies in the opposite direction: we note that most work in causality starts from the premise that the causal variables are given. A central problem for AI and causality is, thus, causal representation learning, that is, the discovery of high-level causal variables from low-level observations. Finally, we delineate some implications of causality for machine learning and propose key research areas at the intersection of both communities.

Study · Moderate

The Welfare Effects of Social Media

Hunt Allcott, Luca Braghieri, Sarah Eichmeyer +1 more · American Economic Review · 2020 · 787 citations

The rise of social media has provoked both optimism about potential societal benefits and concern about harms such as addiction, depression, and political polarization. In a randomized experiment, we find that deactivating Facebook for the four weeks before the 2018 US midterm election (i) reduced online activity, while increasing offline activities such as watching TV alone and socializing with family and friends; (ii) reduced both factual news knowledge and political polarization; (iii) increased subjective well-being; and (iv) caused a large persistent reduction in post-experiment Facebook use. Deactivation reduced post-experiment valuations of Facebook, suggesting that traditional metrics may overstate consumer surplus. (JEL D12, D72, D90, I31, L82, L86, Z13)

Study · Moderate

Interpretable machine learning: Fundamental principles and 10 grand challenges

Cynthia Rudin, Chaofan Chen, Zhi Chen +3 more · Statistics Surveys · 2022 · 828 citations

Interpretability in machine learning (ML) is crucial for high stakes decisions and troubleshooting. In this work, we provide fundamental principles for interpretable ML, and dispel common misunderstandings that dilute the importance of this crucial topic. We also identify 10 technical challenge areas in interpretable machine learning and provide history and background on each problem. Some of these problems are classically important, and some are recent problems that have arisen in the last few years. These problems are: (1) Optimizing sparse logical models such as decision trees; (2) Optimization of scoring systems; (3) Placing constraints into generalized additive models to encourage sparsity and better interpretability; (4) Modern case-based reasoning, including neural networks and matching for causal inference; (5) Complete supervised disentanglement of neural networks; (6) Complete or even partial unsupervised disentanglement of neural networks; (7) Dimensionality reduction for data visualization; (8) Machine learning models that can incorporate physics and other generative or causal constraints; (9) Characterization of the “Rashomon set” of good models; and (10) Interpretable reinforcement learning. This survey is suitable as a starting point for statisticians and computer scientists interested in working in interpretable machine learning.

Study · Moderate

Additive logistic regression: a statistical view of boosting (With discussion and a rejoinder by the authors)

Jerome H. Friedman, Trevor Hastie, Robert Tibshirani · The Annals of Statistics · 2000 · 6,928 citations

Boosting is one of the most important recent developments in classification methodology. Boosting works by sequentially applying a classification algorithm to reweighted versions of the training data and then taking a weighted majority vote of the sequence of classifiers thus produced. For many classification algorithms, this simple strategy results in dramatic improvements in performance. We show that this seemingly mysterious phenomenon can be understood in terms of well-known statistical principles, namely additive modeling and maximum likelihood. For the two-class problem, boosting can be viewed as an approximation to additive modeling on the logistic scale using maximum Bernoulli likelihood as a criterion. We develop more direct approximations and show that they exhibit nearly identical results to boosting. Direct multiclass generalizations based on multinomial likelihood are derived that exhibit performance comparable to other recently proposed multiclass generalizations of boosting in most situations, and far superior in some. We suggest a minor modification to boosting that can reduce computation, often by factors of 10 to 50. Finally, we apply these insights to produce an alternative formulation of boosting decision trees. This approach, based on best-first truncated tree induction, often leads to better performance, and can provide interpretable descriptions of the aggregate decision rule. It is also much faster computationally, making it more suitable to large-scale data mining applications.

Study · Moderate

Image processing and Quality Control for the first 10,000 brain imaging datasets from UK Biobank

Fidel Alfaro‐Almagro, Mark Jenkinson, Neal K. Bangerter +18 more · NeuroImage · 2017 · 1,742 citations

UK Biobank is a large-scale prospective epidemiological study with all data accessible to researchers worldwide. It is currently in the process of bringing back 100,000 of the original participants for brain, heart and body MRI, carotid ultrasound and low-dose bone/fat x-ray. The brain imaging component covers 6 modalities (T1, T2 FLAIR, susceptibility weighted MRI, Resting fMRI, Task fMRI and Diffusion MRI). Raw and processed data from the first 10,000 imaged subjects has recently been released for general research access. To help convert this data into useful summary information we have developed an automated processing and QC (Quality Control) pipeline that is available for use by other researchers. In this paper we describe the pipeline in detail, following a brief overview of UK Biobank brain imaging and the acquisition protocol. We also describe several quantitative investigations carried out as part of the development of both the imaging protocol and the processing pipeline.

Meta-analysis · High evidence score

A META-ANALYSIS OF THE EFFECTS OF ORGANIZATIONAL BEHAVIOR MODIFICATION ON TASK PERFORMANCE, 1975-95.

Alex Stajkovic, Fred Luthans · Academy of Management Journal · 1997 · 388 citations

Results of a primary meta-analysis indicated a significant main effect of the organizational behavior modification (O.B. Mod.) approach on task performance (d. = .51; a 17 percent increase) and a significant treatment-by-study interaction. To account for within-group heterogeneity of effect sizes, we conducted a two-level theory-driven moderator analysis by partitioning the sample of studies first into manufacturing and service groups and then into seven classes of reinforcement interventions. Results indicated a stronger average effect of O.B. Mod. in manufacturing organizations, moderation by the type of contingent interventions, and "pairwise" differences among average effect sizes in both organizational types. The practical implications of these findings for solving

RCT · High evidence score

Comparing methods for estimation of heterogeneous treatment effects using observational data from health care databases

Thierry Wendling, Kenneth Jung, Alison Callahan +3 more · Statistics in Medicine · 2018 · 117 citations

There is growing interest in using routinely collected data from health care databases to study the safety and effectiveness of therapies in "real-world" conditions, as it can provide complementary evidence to that of randomized controlled trials. Causal inference from health care databases is challenging because the data are typically noisy, high dimensional, and most importantly, observational. It requires methods that can estimate heterogeneous treatment effects while controlling for confounding in high dimensions. Bayesian additive regression trees, causal forests, causal boosting, and causal multivariate adaptive regression splines are off-the-shelf methods that have shown good performance for estimation of heterogeneous treatment effects in observational studies of continuous outcomes. However, it is not clear how these methods would perform in health care database studies where outcomes are often binary and rare and data structures are complex. In this study, we evaluate these methods in simulation studies that recapitulate key characteristics of comparative effectiveness studies. We focus on the conditional average effect of a binary treatment on a binary outcome using the conditional risk difference as an estimand. To emulate health care database studies, we propose a simulation design where real covariate and treatment assignment data are used and only outcomes are simulated based on nonparametric models of the real outcomes. We apply this design to 4 published observational studies that used records from 2 major health care databases in the United States. Our results suggest that Bayesian additive regression trees and causal boosting consistently provide low bias in conditional risk difference estimates in the context of health care database studies.

Study · Top journal · Moderate

Bayesian model averaging: a tutorial (with comments by M. Clyde, David Draper and E. I. George, and a rejoinder by the authors)

Jennifer A. Hoeting, David Madigan, Adrian E. Raftery +1 more · Statistical Science · 1999 · 4,164 citations

Standard statistical practice ignores model uncertainty. Data analysts typically select a model from some class of models and then proceed as if the selected model had generated the data. This approach ignores the uncertainty in model selection, leading to over-confident inferences and decisions that are more risky than one thinks they are. Bayesian model averaging (BMA) provides a coherent mechanism for accounting for this model uncertainty. Several methods for implementing BMA have recently emerged. We discuss these methods and present a number of examples. In these examples, BMA provides improved out-of-sample predictive performance. We also provide a catalogue of currently available BMA software.

RCT · Top journal · High evidence score

Subgroup analysis in randomised controlled trials: importance, indications, and interpretation

Peter M. Rothwell · The Lancet · 2005 · 924 citations

Study · Moderate

A kernel two-sample test

Arthur Gretton, Karsten Borgwardt, Malte J. Rasch +2 more · MPG.PuRe (Max Planck Society) · 2012 · 2,230 citations

We propose a framework for analyzing and comparing distributions, which we use to construct statistical tests to determine if two samples are drawn from different distributions. Our test statistic is the largest difference in expectations over functions in the unit ball of a reproducing kernel Hilbert space (RKHS), and is called the maximum mean discrepancy (MMD). We present two distribution-free tests based on large deviation bounds for the MMD, and a third test based on the asymptotic distribution of this statistic. The MMD can be computed in quadratic time, although efficient linear time approximations are available. Our statistic is an instance of an integral probability metric, and various classical metrics on distributions are obtained when alternative function classes are used in place of an RKHS. We apply our two-sample tests to a variety of problems, including attribute matching for databases using the Hungarian marriage method, where they perform strongly. Excellent performance is also obtained when comparing distributions over graphs, for which these are the first such tests.
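The quadratic-time statistic mentioned in the abstract can be illustrated with a short sketch of the unbiased squared-MMD estimate. The Gaussian kernel, the fixed bandwidth of 1.0, the function names, and the simulated samples are assumptions chosen for the example. The sketch computes only the statistic and does not implement the paper's test thresholds.

```python
import numpy as np

def gaussian_kernel(a, b, bandwidth):
    """Gaussian (RBF) kernel matrix between the rows of a and the rows of b."""
    sq_dist = (
        np.sum(a**2, axis=1)[:, None]
        + np.sum(b**2, axis=1)[None, :]
        - 2.0 * a @ b.T
    )
    return np.exp(-np.maximum(sq_dist, 0.0) / (2.0 * bandwidth**2))

def mmd2_unbiased(x, y, bandwidth=1.0):
    """Unbiased estimate of squared MMD; costs O((m + n)^2) kernel evaluations."""
    m, n = len(x), len(y)
    kxx = gaussian_kernel(x, x, bandwidth)
    kyy = gaussian_kernel(y, y, bandwidth)
    kxy = gaussian_kernel(x, y, bandwidth)
    # Within-sample terms exclude the diagonal (i == j) to remove bias.
    term_x = (kxx.sum() - np.trace(kxx)) / (m * (m - 1))
    term_y = (kyy.sum() - np.trace(kyy)) / (n * (n - 1))
    return term_x + term_y - 2.0 * kxy.mean()

rng = np.random.default_rng(0)
# Two samples from the same distribution versus a mean-shifted pair.
same = mmd2_unbiased(rng.normal(size=(200, 2)), rng.normal(size=(200, 2)))
shifted = mmd2_unbiased(rng.normal(size=(200, 2)), rng.normal(loc=1.0, size=(200, 2)))
print(f"same distribution: {same:.4f}, shifted distribution: {shifted:.4f}")
```

The estimate should sit near zero when both samples come from the same distribution and be clearly positive under the shift. In practice the bandwidth is often set by a data-driven heuristic such as the median pairwise distance rather than fixed in advance.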

RCT · Top journal · High evidence score

Individualized Treatment Effect Prediction with Machine Learning — Salient Considerations

Rishi Desai, Robert J. Glynn, Scott D. Solomon +3 more · NEJM Evidence · 2024 · 32 citations

BACKGROUND: Machine learning-based approaches that seek to accomplish individualized treatment effect prediction have gained traction; however, some salient challenges lack wider recognition. METHODS: We describe key methodologic considerations for individualized treatment effect prediction models using data from the Treatment of Preserved Cardiac Function Heart Failure with an Aldosterone Antagonist Trial for spironolactone in heart failure with preserved ejection fraction. The causal survival forest algorithm was used for model development. Calibration and discrimination were evaluated using a bootstrapping-based internal validation procedure. Observed benefits were described for predicted benefit quartiles and quartiles of a known effect modifier: ejection fraction. A negative control analysis with noncardiovascular death as the outcome was implemented to detect confounding. RESULTS: Among 3445 participants, 671 events occurred over a median of 3.3 years of follow-up. In internal validation, a higher average observed benefit was noted among patients in the highest quartile of predicted benefit. The median (interquartile range) of the observed restricted mean survival time difference at 3.3 years at the highest quartile of model-predicted benefit was 62 days (32 to 83) and was 47 days (26 to 67) at the lowest quartile of ejection fraction. Body-mass index had higher contribution to prediction of benefit relative to other included measures (33.7% vs. glomerular filtration rate [27.3%], ejection fraction [15.1%], and younger age [12.8%]). No benefit was observed for noncardiovascular death at higher model-predicted benefit quartiles, although benefit for noncardiovascular death was observed at lower quartiles. 
CONCLUSIONS: Carefully applied and validated predictive models hold promise in identifying heterogeneous treatment effects and are useful for hypothesis generation regarding the role of phenotypic characteristics in modifying the benefit of experimental interventions in clinical trials. (Funded by the National Heart, Lung, and Blood Institute; ClinicalTrials.gov number, NCT00094302.).

RCT · High evidence score

Using Machine Learning to Individualize Treatment Effect Estimation: Challenges and Opportunities

Alicia Curth, Richard Peck, Eoin McKinney +2 more · Clinical Pharmacology & Therapeutics · 2023 · 34 citations

The use of data from randomized clinical trials to justify treatment decisions for real-world patients is the current state of the art. It relies on the assumption that average treatment effects from the trial can be extrapolated to patients with personal and/or disease characteristics different from those treated in the trial. Yet, because of heterogeneity of treatment effects between patients and between the trial population and real-world patients, this assumption may not be correct for many patients. Using machine learning to estimate the expected conditional average treatment effect (CATE) in individual patients from observational data offers the potential for more accurate estimation of the expected treatment effects in each patient based on their observed characteristics. In this review, we discuss some of the challenges and opportunities for machine learning to estimate CATE, including ensuring identification assumptions are met, managing covariate shift, and learning without access to the true label of interest. We also discuss the potential applications as well as future work and collaborations needed to further improve identification and utilization of CATE estimates to increase patient benefit.

Study · Moderate

Principles of confounder selection

Tyler J. VanderWeele · European Journal of Epidemiology · 2019 · 1,447 citations

Selecting an appropriate set of confounders for which to control is critical for reliable causal inference. Recent theoretical and methodological developments have helped clarify a number of principles of confounder selection. When complete knowledge of a causal diagram relating all covariates to each other is available, graphical rules can be used to make decisions about covariate control. Unfortunately, such complete knowledge is often unavailable. This paper puts forward a practical approach to confounder selection decisions when the somewhat less stringent assumption is made that knowledge is available for each covariate whether it is a cause of the exposure, and whether it is a cause of the outcome. Based on recent theoretically justified developments in the causal inference literature, the following proposal is made for covariate control decisions: control for each covariate that is a cause of the exposure, or of the outcome, or of both; exclude from this set any variable known to be an instrumental variable; and include as a covariate any proxy for an unmeasured variable that is a common cause of both the exposure and the outcome. Various principles of confounder selection are then further related to statistical covariate selection methods.

Study · Moderate

What Do We Learn from the Weather? The New Climate-Economy Literature

Melissa Dell, Benjamin F. Jones, Benjamin Olken · Journal of Economic Literature · 2014 · 2,196 citations

A rapidly growing body of research applies panel methods to examine how temperature, precipitation, and windstorms influence economic outcomes. These studies focus on changes in weather realizations over time within a given spatial area and demonstrate impacts on agricultural output, industrial output, labor productivity, energy demand, health, conflict, and economic growth, among other outcomes. By harnessing exogenous variation over time within a given spatial unit, these studies help credibly identify (i) the breadth of channels linking weather and the economy, (ii) heterogeneous treatment effects across different types of locations, and (iii) nonlinear effects of weather variables. This paper reviews the new literature with two purposes. First, we summarize recent work, providing a guide to its methodologies, datasets, and findings. Second, we consider applications of the new literature, including insights for the “damage function” within models that seek to assess the potential economic effects of future climate change. (JEL C51, D72, O13, Q51, Q54)

Study · Moderate

A review of machine learning applications in wildfire science and management

Piyush Jain, Sean C.P. Coogan, Sriram Ganapathi Subramanian +3 more · Environmental Reviews · 2020 · 680 citations

Artificial intelligence has been applied in wildfire science and management since the 1990s, with early applications including neural networks and expert systems. Since then, the field has rapidly progressed congruently with the wide adoption of machine learning (ML) methods in the environmental sciences. Here, we present a scoping review of ML applications in wildfire science and management. Our overall objective is to improve awareness of ML methods among wildfire researchers and managers, as well as illustrate the diverse and challenging range of problems in wildfire science available to ML data scientists. To that end, we first present an overview of popular ML approaches used in wildfire science to date and then review the use of ML in wildfire science as broadly categorized into six problem domains, including (i) fuels characterization, fire detection, and mapping; (ii) fire weather and climate change; (iii) fire occurrence, susceptibility, and risk; (iv) fire behavior prediction; (v) fire effects; and (vi) fire management. Furthermore, we discuss the advantages and limitations of various ML approaches relating to data size, computational requirements, generalizability, and interpretability, as well as identify opportunities for future advances in the science and management of wildfires within a data science context. In total, to the end of 2019, we identified 300 relevant publications in which the most frequently used ML methods across problem domains included random forests, MaxEnt, artificial neural networks, decision trees, support vector machines, and genetic algorithms. As such, there exist opportunities to apply more current ML methods, including deep learning and agent-based learning, in the wildfire sciences, especially in instances involving very large multivariate datasets. We must recognize, however, that despite the ability of ML models to learn on their own, expertise in wildfire science is necessary to ensure realistic modelling of fire processes across multiple scales, while the complexity of some ML methods such as deep learning requires a dedicated and sophisticated knowledge of their application. Finally, we stress that the wildfire research and management communities play an active role in providing relevant, high-quality, and freely available wildfire data for use by practitioners of ML methods.

RCT · High evidence score

Machine learning approaches to evaluate heterogeneous treatment effects in randomized controlled trials: a scoping review

Kosuke Inoue, Motohiko Adomi, Orestis Efthimiou +7 more · Journal of Clinical Epidemiology · 2024 · 28 citations

RCT · High evidence score

From Sample Average Treatment Effect to Population Average Treatment Effect on the Treated: Combining Experimental with Observational Studies to Estimate Population Treatment Effects

Erin Hartman, Richard Grieve, Roland R. Ramsahai +1 more · Journal of the Royal Statistical Society Series A (Statistics in Society) · 2015 · 152 citations

Randomized controlled trials (RCTs) can provide unbiased estimates of sample average treatment effects. However, a common concern is that RCTs may fail to provide unbiased estimates of population average treatment effects. We derive the assumptions that are required to identify population average treatment effects from RCTs. We provide placebo tests, which formally follow from the identifying assumptions and can assess whether they hold. We offer new research designs for estimating population effects that use non-randomized studies to adjust the RCT data. This approach is considered in a cost-effectiveness analysis of a clinical intervention: pulmonary artery catheterization.

Study · Moderate

Recent Developments in the Econometrics of Program Evaluation

Guido W. Imbens, Jeffrey M. Wooldridge · Journal of Economic Literature · 2009 · 4,835 citations

Many empirical questions in economics and other social sciences depend on causal effects of programs or policies. In the last two decades, much research has been done on the econometric and statistical analysis of such causal effects. This recent theoretical literature has built on, and combined features of, earlier work in both the statistics and econometrics literatures. It has by now reached a level of maturity that makes it an important tool in many areas of empirical research in economics, including labor economics, public finance, development economics, industrial organization, and other areas of empirical microeconomics. In this review, we discuss some of the recent developments. We focus primarily on practical issues for empirical researchers, as well as provide a historical overview of the area and give references to more technical research.

Meta-analysis · High evidence score

Network meta‐analysis of individual and aggregate level data

Jeroen P. Jansen · Research Synthesis Methods · 2012 · 94 citations

Network meta-analysis is often performed with aggregate-level data (AgD). A challenge in using AgD is that the association between a patient-level covariate and treatment effects at the study level may not reflect the individual-level effect modification. In this paper, non-linear network meta-analysis models for combining individual patient data (IPD) and AgD are presented to reduce bias and uncertainty of direct and indirect treatment effects in the presence of heterogeneity. The first method uses the same model form for IPD and AgD. With the second method, the model for AgD is obtained by integrating an underlying IPD model over the joint within-study distribution of covariates, in line with the method by Jackson et al. for ecological inferences. With simulated examples, the models are illustrated. Having IPD for a subset of studies improves estimation of treatment effects in the presence of patient-level heterogeneity. Of the two proposed non-linear models for combining IPD and AgD, the second seems less affected by bias in situations with large treatment-by-patient-level-covariate interactions, probably at the cost of greater uncertainty. Additional studies are needed to better understand when one model is favorable over the other. For network meta-analysis, it is recommended to use IPD when available.

Study · Moderate

Effects of Biodiversity on Ecosystem Functioning: A Consensus of Current Knowledge

David U. Hooper, F. Stuart Chapin, John J. Ewel +12 more · Ecological Monographs · 2005 · 7,899 citations

Humans are altering the composition of biological communities through a variety of activities that increase rates of species invasions and species extinctions, at all scales, from local to global. These changes in components of the Earth's biodiversity cause concern for ethical and aesthetic reasons, but they also have a strong potential to alter ecosystem properties and the goods and services they provide to humanity. Ecological experiments, observations, and theoretical developments show that ecosystem properties depend greatly on biodiversity in terms of the functional characteristics of organisms present in the ecosystem and the distribution and abundance of those organisms over space and time. Species effects act in concert with the effects of climate, resource availability, and disturbance regimes in influencing ecosystem properties. Human activities can modify all of the above factors; here we focus on modification of these biotic controls. The scientific community has come to a broad consensus on many aspects of the relationship between biodiversity and ecosystem functioning, including many points relevant to management of ecosystems. Further progress will require integration of knowledge about biotic and abiotic controls on ecosystem properties, how ecological communities are structured, and the forces driving species extinctions and invasions. To strengthen links to policy and management, we also need to integrate our ecological knowledge with understanding of the social and economic constraints of potential management practices. Understanding this complexity, while taking strong steps to minimize current losses of species, is necessary for responsible management of Earth's ecosystems and the diverse biota they contain.

Based on our review of the scientific literature, we are certain of the following conclusions:
1) Species' functional characteristics strongly influence ecosystem properties. Functional characteristics operate in a variety of contexts, including effects of dominant species, keystone species, ecological engineers, and interactions among species (e.g., competition, facilitation, mutualism, disease, and predation). Relative abundance alone is not always a good predictor of the ecosystem-level importance of a species, as even relatively rare species (e.g., a keystone predator) can strongly influence pathways of energy and material flows.
2) Alteration of biota in ecosystems via species invasions and extinctions caused by human activities has altered ecosystem goods and services in many well-documented cases. Many of these changes are difficult, expensive, or impossible to reverse or fix with technological solutions.
3) The effects of species loss or changes in composition, and the mechanisms by which the effects manifest themselves, can differ among ecosystem properties, ecosystem types, and pathways of potential community change.
4) Some ecosystem properties are initially insensitive to species loss because (a) ecosystems may have multiple species that carry out similar functional roles, (b) some species may contribute relatively little to ecosystem properties, or (c) properties may be primarily controlled by abiotic environmental conditions.
5) More species are needed to insure a stable supply of ecosystem goods and services as spatial and temporal variability increases, which typically occurs as longer time periods and larger areas are considered.

We have high confidence in the following conclusions:
1) Certain combinations of species are complementary in their patterns of resource use and can increase average rates of productivity and nutrient retention. At the same time, environmental conditions can influence the importance of complementarity in structuring communities. Identification of which and how many species act in a complementary way in complex communities is just beginning.
2) Susceptibility to invasion by exotic species is strongly influenced by species composition and, under similar environmental conditions, generally decreases with increasing species richness. However, several other factors, such as propagule pressure, disturbance regime, and resource availability also strongly influence invasion success and often override effects of species richness in comparisons across different sites or ecosystems.
3) Having a range of species that respond differently to different environmental perturbations can stabilize ecosystem process rates in response to disturbances and variation in abiotic conditions. Using practices that maintain a diversity of organisms of different functional effect and functional response types will help preserve a range of management options.

Uncertainties remain and further research is necessary in the following areas:
1) Further resolution of the relationships among taxonomic diversity, functional diversity, and community structure is important for identifying mechanisms of biodiversity effects.
2) Multiple trophic levels are common to ecosystems but have been understudied in biodiversity/ecosystem functioning research. The response of ecosystem properties to varying composition and diversity of consumer organisms is much more complex than responses seen in experiments that vary only the diversity of primary producers.
3) Theoretical work on stability has outpaced experimental work, especially field research. We need long-term experiments to be able to assess temporal stability, as well as experimental perturbations to assess response to and recovery from a variety of disturbances. Design and analysis of such experiments must account for several factors that covary with species diversity.
4) Because biodiversity both responds to and influences ecosystem properties, understanding the feedbacks involved is necessary to integrate results from experimental communities with patterns seen at broader scales. Likely patterns of extinction and invasion need to be linked to different drivers of global change, the forces that structure communities, and controls on ecosystem properties for the development of effective management and conservation strategies.
5) This paper focuses primarily on terrestrial systems, with some coverage of freshwater systems, because that is where most empirical and theoretical study has focused. While the fundamental principles described here should apply to marine systems, further study of that realm is necessary.

Despite some uncertainties about the mechanisms and circumstances under which diversity influences ecosystem properties, incorporating diversity effects into policy and management is essential, especially in making decisions involving large temporal and spatial scales. Sacrificing those aspects of ecosystems that are difficult or impossible to reconstruct, such as diversity, simply because we are not yet certain about the extent and mechanisms by which they affect ecosystem properties, will restrict future management options even further. It is incumbent upon ecologists to communicate this need, and the values that can derive from such a perspective, to those charged with economic and policy decision-making.