
Causal Estimation

Propensity scores, matching, instrumental variables (IV), difference-in-differences (DiD), regression discontinuity (RD), synthetic control, and doubly robust estimators.

Evidence briefs

Reviewed claims

Claim-level summaries connect a practical takeaway to the papers that actually support it.

High confidence · Published

Differences-in-differences (DiD) · positive · Bias in treatment effect estimates

DiD removes bias from time-invariant unobserved confounders by comparing changes over time between treated and control groups, whereas naive before-after or cross-sectional comparisons confound the treatment effect with common time trends or with fixed group differences.

Population: Panel data settings with a treated and an untreated group · Comparator: Simple before-after or cross-sectional comparisons
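A minimal sketch of the canonical two-group, two-period DiD computed from four group means (the numbers are hypothetical). The control group's change stands in for the trend the treated group would have followed without treatment, which is exactly what the parallel-trends assumption asserts.

```python
def did_2x2(pre_treated, post_treated, pre_control, post_control):
    """Difference-in-differences from four group means."""
    change_treated = post_treated - pre_treated   # treatment effect + common trend
    change_control = post_control - pre_control   # common trend only (under parallel trends)
    return change_treated - change_control


def naive_before_after(pre_treated, post_treated):
    """Before-after change in the treated group; it absorbs the common trend."""
    return post_treated - pre_treated


# Hypothetical group means: the control group rises by 3 without treatment,
# so under parallel trends the treated group's effect is 5 - 3 = 2.
print(did_2x2(10.0, 15.0, 8.0, 11.0))    # 2.0
print(naive_before_after(10.0, 15.0))    # 5.0
```

The naive before-after number (5.0) attributes the whole common trend (3.0) to treatment; differencing against the control group removes it.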

Primary evidence

Mostly Harmless Econometrics: An Empiricist's Companion


High confidence · Published

Regression discontinuity (RD) · positive · Causal effect estimate validity

RD provides credible causal estimates by comparing outcomes just above and below the cutoff, mimicking a local randomized experiment, whereas naive comparisons suffer from confounding due to the assignment rule.

Population: Settings where treatment is assigned by a cutoff on a continuous variable · Comparator: Naive comparison of treated and untreated units far from cutoff
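One common way to implement the "just above versus just below" comparison is a local linear regression on each side of the cutoff. A minimal sharp-RD sketch, assuming a uniform kernel, a fixed bandwidth of 0.5, and noiseless simulated data; real applications need data-driven bandwidth selection and standard errors, which this omits.

```python
import numpy as np


def rd_local_linear(x, y, cutoff=0.0, bandwidth=0.5):
    """Sharp RD: fit a separate line on each side of the cutoff within the
    bandwidth (uniform kernel) and return the jump in intercepts at the cutoff."""
    def intercept_at_cutoff(mask):
        xc = x[mask] - cutoff                      # center the running variable
        X = np.column_stack([np.ones(xc.size), xc])
        coef, *_ = np.linalg.lstsq(X, y[mask], rcond=None)
        return coef[0]                             # fitted value at the cutoff

    right = (x >= cutoff) & (x < cutoff + bandwidth)
    left = (x < cutoff) & (x >= cutoff - bandwidth)
    return intercept_at_cutoff(right) - intercept_at_cutoff(left)


# Hypothetical data: the outcome rises smoothly with the running variable
# and jumps by 2 at the cutoff.
x = np.linspace(-1.0, 1.0, 201)
y = 1.0 + 0.5 * x + 2.0 * (x >= 0.0)
print(round(rd_local_linear(x, y), 6))   # 2.0
```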

Primary evidence

Mostly Harmless Econometrics: An Empiricist's Companion


High confidence · Published

Difference-in-Differences with multiple time periods (new estimator) · positive · Bias in treatment effect estimates

The proposed estimator avoids a bias that standard DiD suffers when treatment timing varies: already-treated units end up serving as controls, and because treatment effects may change over time, their outcome changes are not a valid counterfactual. The new method instead constructs comparison groups from never-treated or not-yet-treated units.

Population: Panel data settings with staggered treatment adoption (groups treated at different times) and multiple time periods · Comparator: Standard Difference-in-Differences (two-period, two-group) or older methods that compare treated vs untreated across all periods
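The building block can be sketched without covariates: each group-time effect ATT(g, t) is a 2x2 DiD between cohort g (units first treated in period g) and a comparison group that is still untreated at t, using period g - 1 as the baseline. This is a simplified, unconditional sketch for post-treatment periods only; the function name, data layout, and toy panel (effects plus a common trend of 1 per period) are assumptions for illustration, and the paper's estimator additionally handles covariates and aggregation.

```python
import numpy as np


def att_gt(Y, G, g, t, control="never"):
    """Unconditional group-time ATT(g, t) for a post-treatment period t >= g.

    Y: (units, periods) outcomes; periods are numbered 1..T, so column t-1 is period t.
    G: first treated period per unit, 0 for never-treated units.
    """
    if t < g:
        raise ValueError("this sketch only covers post-treatment periods t >= g")
    treated = G == g
    if control == "never":
        comparison = G == 0
    else:  # "notyet": never-treated plus units first treated after period t
        comparison = (G == 0) | (G > t)
    change = Y[:, t - 1] - Y[:, g - 2]   # Y_t - Y_{g-1} for every unit
    return change[treated].mean() - change[comparison].mean()


# Hypothetical panel: treatment effects plus a common trend of +1 per period.
effects = np.array([[0.0, 1.0, 4.0],    # early cohort, first treated in period 2
                    [0.0, 0.0, 1.0],    # late cohort, first treated in period 3
                    [0.0, 0.0, 0.0]])   # never treated
Y = effects + np.array([1.0, 2.0, 3.0])
G = np.array([2, 3, 0])

print(att_gt(Y, G, 2, 2), att_gt(Y, G, 2, 3), att_gt(Y, G, 3, 3))  # 1.0 4.0 1.0
print(att_gt(Y, G, 2, 2, control="notyet"))                        # 1.0
```

Because the already-treated early cohort never enters a comparison group, its growing effect (1 then 4) cannot contaminate the late cohort's estimate.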

Primary evidence

Difference-in-Differences with multiple time periods


High confidence · Published

Two-step estimation strategy (group-time average treatment effects) · positive · Validity of parallel trends assumption and effect estimate consistency

The two-step approach allows parallel trends to hold conditional on covariates and does not impose constant treatment effects over time. By contrast, two-way fixed effects returns a weighted average of treatment effects in which some weights can be negative, so the estimate may be negative even when every individual effect is positive.

Population: Staggered adoption designs with covariates · Comparator: Single-step regression with unit and time fixed effects (two-way fixed effects)
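A toy calculation showing how two-way fixed effects can turn all-positive effects into a negative coefficient. Assumptions for illustration: two units, three periods, no never-treated unit, untreated outcomes of zero, an early unit whose effect grows from a = 1 to b = 4, and a late unit with effect c = 1. Working through Frisch-Waugh-Lovell for this balanced panel, the TWFE coefficient is a + (c - b)/2 = -0.5: the early unit's period-3 outcome acts as a "control" for the late unit, so its effect growth is subtracted.

```python
import numpy as np


def twfe_coefficient(Y, D):
    """OLS of Y on unit dummies, period dummies (first period dropped), and D."""
    n_units, n_periods = Y.shape
    rows = []
    for i in range(n_units):
        for t in range(n_periods):
            unit_dummies = [1.0 if j == i else 0.0 for j in range(n_units)]
            period_dummies = [1.0 if s == t else 0.0 for s in range(1, n_periods)]
            rows.append(unit_dummies + period_dummies + [D[i, t]])
    X = np.array(rows)
    coef, *_ = np.linalg.lstsq(X, Y.ravel(), rcond=None)  # rows ordered unit, then period
    return coef[-1]  # coefficient on the treatment indicator


# Rows: early unit (treated from period 2), late unit (treated from period 3).
D = np.array([[0.0, 1.0, 1.0],
              [0.0, 0.0, 1.0]])
# Untreated outcomes are zero, so each treated cell equals that cell's effect:
# early unit 1 -> 4, late unit 1. Every effect is positive.
Y = np.array([[0.0, 1.0, 4.0],
              [0.0, 0.0, 1.0]])
print(round(twfe_coefficient(Y, D), 6))   # -0.5
```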

Primary evidence

Difference-in-Differences with multiple time periods


High confidence · Published

Differences-in-differences (DiD) · positive · Causal effect estimate validity under parallel trends

DiD removes bias from time-invariant unobserved confounders by comparing changes over time between treated and untreated groups, yielding credible causal estimates when parallel trends hold.

Population: Panel data settings with treatment and control groups · Comparator: Simple before-after or cross-sectional comparisons

Primary evidence

Mostly Harmless Econometrics: An Empiricist's Companion


High confidence · Published

Regression discontinuity (RD) · positive · Causal effect estimate validity near the cutoff

RD provides credible causal estimates for units at the cutoff by exploiting the discontinuity in treatment assignment, which mimics a local randomized experiment.

Population: Settings where treatment is assigned by a cutoff on a continuous variable · Comparator: Global regression or naive comparison of means

Primary evidence

Mostly Harmless Econometrics: An Empiricist's Companion


Evidence base


50 papers

Study · Preprint · Wiki · Canonical · Moderate

Estimation and Inference of Heterogeneous Treatment Effects using Random Forests

Stefan Wager, Susan Athey · 2015

Many scientific and engineering challenges -- ranging from personalized medicine to customized marketing recommendations -- require an understanding of treatment effect heterogeneity. In this paper, we develop a non-parametric causal forest for estimating heterogeneous treatment effects that extends Breiman's widely used random forest algorithm. In the potential outcomes framework with unconfoundedness, we show that causal forests are pointwise consistent for the true treatment effect, and have an asymptotically Gaussian and centered sampling distribution. We also discuss a practical method for constructing asymptotic confidence intervals for the true treatment effect that are centered at the causal forest estimates. Our theoretical results rely on a generic Gaussian theory for a large family of random forest algorithms. To our knowledge, this is the first set of results that allows any type of random forest, including classification and regression forests, to be used for provably valid statistical inference. In experiments, we find causal forests to be substantially more powerful than classical methods based on nearest-neighbor matching, especially in the presence of irrelevant covariates.

Study · Preprint · Wiki · Canonical · Moderate

Double/Debiased Machine Learning for Treatment and Causal Parameters

Victor Chernozhukov, Denis Chetverikov, Mert Demirer +4 more · 2016 · 154 citations

Most modern supervised statistical/machine learning (ML) methods are explicitly designed to solve prediction problems very well. Achieving this goal does not imply that these methods automatically deliver good estimators of causal parameters. Examples of such parameters include individual regression coefficients, average treatment effects, average lifts, and demand or supply elasticities. In fact, estimates of such causal parameters obtained via naively plugging ML estimators into estimating equations for such parameters can behave very poorly due to the regularization bias. Fortunately, this regularization bias can be removed by solving auxiliary prediction problems via ML tools. Specifically, we can form an orthogonal score for the target low-dimensional parameter by combining auxiliary and main ML predictions. The score is then used to build a de-biased estimator of the target parameter which typically will converge at the fastest possible $1/\sqrt{n}$ rate and be approximately unbiased and normal, and from which valid confidence intervals for these parameters of interest may be constructed. The resulting method thus could be called a "double ML" method because it relies on estimating primary and auxiliary predictive models. In order to avoid overfitting, our construction also makes use of K-fold sample splitting, which we call cross-fitting. This allows us to use a very broad set of ML predictive methods in solving the auxiliary and main prediction problems, such as random forest, lasso, ridge, deep neural nets, boosted trees, as well as various hybrids and aggregators of these methods.
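A minimal sketch of cross-fitted "partialling out" for a partially linear model y = θ·d + g(X) + e, one standard instance of the orthogonal-score idea described above. Assumptions: plain OLS stands in for the ML nuisance learners (any regressor could be swapped in), two folds, residuals pooled across folds before the final slope, and simulated data with true θ = 2. No standard errors are computed.

```python
import numpy as np


def _ols_predictor(X, y):
    """Fit y ~ 1 + X by least squares and return a prediction function."""
    Xc = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(Xc, y, rcond=None)
    return lambda X_new: np.column_stack([np.ones(len(X_new)), X_new]) @ coef


def dml_partially_linear(y, d, X, n_folds=2, seed=0):
    """Cross-fitted estimate of theta: nuisances E[y|X] and E[d|X] are fit on
    the other folds, then theta is the residual-on-residual slope."""
    n = len(y)
    folds = np.random.default_rng(seed).permutation(n) % n_folds
    y_res, d_res = np.empty(n), np.empty(n)
    for k in range(n_folds):
        train, test = folds != k, folds == k
        ell = _ols_predictor(X[train], y[train])   # estimates E[y | X]
        m = _ols_predictor(X[train], d[train])     # estimates E[d | X]
        y_res[test] = y[test] - ell(X[test])       # out-of-fold residuals only
        d_res[test] = d[test] - m(X[test])
    return np.sum(d_res * y_res) / np.sum(d_res ** 2)


# Hypothetical simulated data with true theta = 2; d depends on X (confounding).
rng = np.random.default_rng(1)
n = 2000
X = rng.normal(size=(n, 3))
d = X @ np.array([1.0, -0.5, 0.25]) + rng.normal(size=n)
y = 2.0 * d + X @ np.array([0.5, 1.0, -1.0]) + rng.normal(size=n)
print(dml_partially_linear(y, d, X))  # should be close to 2
```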

Study · Preprint · Wiki · Moderate

CausalGuard: Conformal Inference under Graph Uncertainty

Vikash Singh, Weicong Chen, Debargha Ganguly +12 more · 2026

Estimating treatment effects from observational data requires choosing an adjustment set, but valid adjustment depends on an unknown causal graph. Graph misspecification can cause under-coverage, while graph-agnostic conformal wrappers may regain nominal coverage only through large padding. We introduce CausalGuard, a structure-weighted conformal framework that calibrates after aggregating graph-conditional doubly robust pseudo-outcomes. Candidate DAGs are proposed from an LLM-derived edge prior, pruned by conditional-independence tests, and reweighted by Bayesian Information Criterion. A composite nonconformity score then calibrates the posterior-weighted pseudo-outcome. CausalGuard provides distribution-free finite-sample marginal coverage for this aggregated pseudo-outcome; under causal identification, overlap, conditional-mean nuisance stability, and concentration on target-aligned valid adjustment strategies, its conditional mean converges to the true Conditional Average Treatment Effect. Across five benchmarks, CausalGuard attains mean coverage above the nominal 90% level for the directly evaluable target and reduces width when graph-agnostic conformal baselines require large padding. Stress tests show that CausalGuard suppresses invalid collider adjustment and remains stable under misspecified priors when the retained candidate set is data-supported.

RCT · Preprint · Wiki · Moderate

Meta-Learners for Partially-Identified Treatment Effects Across Multiple Environments

Jonas Schweisthal, Dennis Frauen, Mihaela van der Schaar +1 more · 2024

Estimating the conditional average treatment effect (CATE) from observational data is relevant for many applications such as personalized medicine. Here, we focus on the widespread setting where the observational data come from multiple environments, such as different hospitals, physicians, or countries. Furthermore, we allow for violations of standard causal assumptions, namely, overlap within the environments and unconfoundedness. To this end, we move away from point identification and focus on partial identification. Specifically, we show that current assumptions from the literature on multiple environments allow us to interpret the environment as an instrumental variable (IV). This allows us to adapt bounds from the IV literature for partial identification of CATE by leveraging treatment assignment mechanisms across environments. Then, we propose different model-agnostic learners (so-called meta-learners) to estimate the bounds that can be used in combination with arbitrary machine learning models. We further demonstrate the effectiveness of our meta-learners across various experiments using both simulated and real-world data. Finally, we discuss the applicability of our meta-learners to partial identification in instrumental variable settings, such as randomized controlled trials with non-compliance.

Study · Preprint · Wiki · Moderate

Individualized Causal Effects under Network Interference with Combinatorial Treatments

Yunping Lu, Haoang Chi, Qirui Hu +1 more · 2026

Modern causal decision-making increasingly demands individualized treatment-effect estimation in networks where interventions are high-dimensional, combinatorial vectors. While network interference, effect heterogeneity, and multi-dimensional treatments have been studied separately, their intersection yields an exponentially large intervention space that makes standard identification tools and low-dimensional exposure mappings untenable. We bridge this gap with a unified framework that constructs a global potential-outcome emulator for unit-level inference. Our method combines (1) rooted network configurations to leverage local smoothness, (2) doubly robust orthogonalization to mitigate confounding from network position and covariates, and (3) sparse spectral learning to efficiently estimate response surfaces over the $2^p$-dimensional treatment space. We also decompose networked effects into own-treatment, structural, and interaction components, and provide finite-sample error bounds and asymptotic consistency guarantees. Overall, we show that individualized causal inference remains feasible in high-dimensional networked settings without collapsing the intervention space.

Study · Preprint · Wiki · Moderate

Causal EpiNets: Precision-corrected Bounds on Individual Treatment Effects using Epistemic Neural Networks

Gandharv Patil, Keyi Tang, Raquel Aoki +1 more · 2026

Individual treatment effects are not point-identified from data. The Probability of Necessity and Sufficiency (PNS) circumvents this limitation by characterizing individual-level causality through intersection bounds derived from combined experimental and observational data. In finite samples, however, standard plug-in estimators systematically fail: they violate structural probability constraints and suffer from extremum bias induced by max-min operators, yielding spuriously narrow intervals. We propose a neural framework for finite-sample PNS estimation that resolves both pathologies. We introduce an anchored neural architecture that guarantees structural constraint satisfaction by construction. To correct extremum bias, we employ precision-corrected intersection-bound inference, leveraging Epistemic Neural Networks for scalable, high-dimensional uncertainty quantification. Empirical evaluations confirm that this approach maintains nominal coverage and exact constraint validity in high-dimensional regimes where standard estimators systematically undercover.

Study · Preprint · Wiki · Moderate

Causal Discovery in Structural VAR Models Under Equal Noise Variance

SeyedSina Seyedi HasanAbadi, Fahimeh Arab, Erfan Nozari +1 more · 2026 · 5 citations

Causal discovery from multivariate time series is challenging when causal effects may occur both across time and within the same sampling interval. This issue is especially important in applications such as neuroscience, where the sampling rate may be coarse relative to the underlying dynamics and contemporaneous effects need not form an acyclic graph. We study causal discovery in linear Gaussian structural VAR models under an equal noise variance assumption, meaning that the structural noise terms have a common variance. Unlike the DAG-based cross-sectional equal noise variance setting, the time-series setting considered here does not generally yield point identification of a unique causal graph. Instead, multiple structural VAR parameterizations can induce the same stationary observed process law. We introduce a notion of observational equivalence tailored to this setting and show that the corresponding equivalence class is characterized by orthogonal transformations of the structural equations together with a global positive scale. This characterization leads to an equivalence-aware model discrepancy, the observational alignment discrepancy, which compares structural models modulo transformations that preserve the observed law. Building on this theory, we propose ENVAR, a sparsity-based procedure that searches over the induced observational equivalence class for a sparse normalized structural representative. We evaluate the proposed methodology on synthetic structural VAR data and on an fMRI dataset.

Study · Preprint · Wiki · Moderate

Causal Identification under Interference: The Role of Treatment Assignment Independence

Julius Owusu, Monika Avila Márquez · 2026 · 0 citations

Empirical researchers routinely invoke the no-interference or individualistic treatment response (ITR) assumption to identify causal effects in observational studies, despite concerns that interference across units may arise in many economic settings. This paper studies the causal content of standard ITR-based identification formulas when arbitrary interference is present. We show that, under restrictions on dependence between treatment assignments across units, conventional ITR-based identification formulas -- including those underlying selection-on-observables, instrumental variables, regression discontinuity designs, and difference-in-differences -- identify well-defined causal objects: types of average direct effects (ADEs). These results do not require knowledge of the interference structure or specification of exposure mappings. We also propose a sensitivity analysis framework that quantifies the robustness of statistical inference to violations of treatment-assignment independence under arbitrary interference.

Study · Preprint · Wiki · Moderate

Application of Propensity Score Models and Causal Estimators in Observational Studies under Model Misspecification

Apu Chandra Das, Sakib Salam, Md Robiul Islam Talukder +3 more · 2026

Propensity score (PS) methods are widely used in observational studies to reduce confounding and estimate causal treatment effects. However, the validity of PS-based causal estimators depends heavily on correct model specification, and model misspecification may lead to substantial bias and instability. In this study, we systematically evaluate the performance of commonly used causal estimators, including response surface modeling (RSM), inverse probability weighting (IPW), and augmented inverse probability weighting (AIPW), under varying levels of PS and outcome model misspecification. We compare classical logistic regression with several machine learning approaches for PS estimation, including random forests (RF), support vector machines (SVM), and linear discriminant analysis (LDA). Extensive simulation studies were conducted under multiple scenarios defined by combinations of correctly specified and misspecified PS and outcome models, varying sample sizes, and different covariate correlation structures. Estimator performance was assessed using bias, absolute bias, root mean squared error, empirical standard error, and confidence interval width. Results demonstrate that AIPW consistently provides robust and stable estimates across most scenarios due to its doubly robust property, whereas IPW is highly sensitive to PS misspecification and unstable PS estimates produced by flexible machine learning methods. RSM performs well only when the outcome model is correctly specified. Real-world applications using the ACTG175 clinical trial and the Alzheimer's Disease Neuroimaging Initiative (ADNI) dataset further illustrate the practical implications of estimator choice and PS modeling strategy. Overall, our findings highlight the importance of integrating flexible machine learning approaches within doubly robust frameworks to improve causal effect estimation in observational studies.
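A minimal sketch of the two weighting estimators compared in this study, assuming a logistic propensity model fit by Newton's method and a linear outcome model per treatment arm; clipping propensities to [0.01, 0.99] is an illustrative choice, not the study's. AIPW adds an outcome-model correction to IPW, which is why it remains consistent if either the propensity or the outcome model is correctly specified. The data are simulated with a true average treatment effect of 1.

```python
import numpy as np


def fit_propensity(X, t, n_iter=25):
    """Logistic regression of treatment on covariates via Newton's method."""
    Xc = np.column_stack([np.ones(len(X)), X])
    w = np.zeros(Xc.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-Xc @ w))
        grad = Xc.T @ (t - p)                               # score of the log-likelihood
        hess = Xc.T @ (Xc * (p * (1.0 - p))[:, None])       # observed information
        w += np.linalg.solve(hess, grad)
    return 1.0 / (1.0 + np.exp(-Xc @ w))


def fit_arm_outcome(X, y, in_arm):
    """Linear outcome model fit on one treatment arm, predicted for all units."""
    Xc = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(Xc[in_arm], y[in_arm], rcond=None)
    return Xc @ coef


def ipw_and_aipw(X, t, y):
    e = np.clip(fit_propensity(X, t), 0.01, 0.99)  # illustrative clipping
    mu1 = fit_arm_outcome(X, y, t == 1)
    mu0 = fit_arm_outcome(X, y, t == 0)
    ipw = np.mean(t * y / e - (1 - t) * y / (1 - e))
    aipw = np.mean(mu1 - mu0
                   + t * (y - mu1) / e
                   - (1 - t) * (y - mu0) / (1 - e))
    return ipw, aipw


# Hypothetical simulation with a true average treatment effect of 1.
rng = np.random.default_rng(0)
n = 5000
X = rng.normal(size=(n, 2))
p_true = 1.0 / (1.0 + np.exp(-(0.5 * X[:, 0] - 0.5 * X[:, 1])))
t = (rng.random(n) < p_true).astype(float)
y = 1.0 * t + X[:, 0] + X[:, 1] + rng.normal(size=n)
print(ipw_and_aipw(X, t, y))  # both should be near 1 here, since both models are correct
```

Misspecifying one model (for example, dropping a covariate from `fit_propensity`) is a simple way to see IPW drift while AIPW stays close, mirroring the pattern the study reports.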

Study · Preprint · Wiki · Moderate

Sensitivity analysis for causal mediation: bridge score, sharp sensitivity bounds, and calibration

Yuki Ohnishi, Fan Li · 2026

Causal mediation analysis decomposes the total treatment effect into a portion operating through a hypothesized mediator and a residual direct portion. Identification of natural direct and indirect effects typically rests on the mediator stage of sequential ignorability, which cannot be empirically verified and requires explicit sensitivity analysis. We introduce the bridge score, a low-dimensional vector formed from the two treatment-specific mediator densities at a common mediator value, and show that it is a balancing score for the mediator stage of sequential ignorability. Conditional on the bridge score, we then derive a sharp pointwise envelope on the unidentified mediator-outcome confounding function in terms of two interpretable latent confounding parameters. To make the bound operational for sensitivity analysis, we further introduce two calibration approaches. The first is benchmark calibration against an observed covariate, including a rank-based version that is invariant to monotone re-expressions of the benchmark; the second is residual budget calibration based on residual outcome variation. Finally, we show how the pointwise bound can be operationalized for inference through a scalar functional reduction and a Bayesian g-computation algorithm that propagates all sources of uncertainty into posterior draws of the mediation effect estimates.

Study · Preprint · Wiki · Moderate

A formal approach to variable selection in difference-in-differences

Daniela Rodrigues, Laura A. Hatfield · 2026 · 0 citations

Difference-in-differences (DiD) identification relies mainly on a parallel trends assumption about untreated potential outcomes. Researchers often relax this assumption by assuming conditional parallel trends within units with the same covariate values. However, the process of selecting which covariates to include in this assumption is often ad hoc. We propose a formal approach to select the variables that support conditional parallel trends based on graphical criteria. We show that the parallel trends assumption is rarely justified without conditioning on covariates, and that unconditional and conditional parallel trends can conflict with one another. We also demonstrate that a time-invariant covariate with a time-invariant effect on the outcome, which might not ordinarily be considered a confounder in DiD, may be a useful conditioning variable. We clarify that adjustment for a post-treatment covariate depends on what causes that covariate to change. Extending our framework to multiple time periods, we distinguish between treatment type and rollout strategy and examine the problem of treatment-confounder feedback. On the estimation side, we argue that the difficulty of incorporating covariates in DiD, often framed as an estimator problem, is more accurately understood as a misalignment between the adjustment set used by the estimator and the adjustment set required for identification. This misalignment affects several popular estimation procedures, and resolving it requires not a change of estimator, but a change in how covariates enter the estimation procedure. We show how to achieve this alignment for all estimators we evaluate.

Study · Preprint · Wiki · Moderate

Low-rank Covariate Balancing Estimators under Interference

Souhardya Sengupta, Kosuke Imai, Georgia Papadogeorgou · 2025 · 1 citation

A key methodological challenge in observational studies with interference between units is twofold: (1) each unit's outcome may depend on many others' treatments, and (2) treatment assignments may exhibit complex dependencies across units. We develop a general statistical framework for constructing robust causal effect estimators to address these challenges. We first show that, without restricting the patterns of interference, the standard inverse probability weighting (IPW) estimator is the only uniformly unbiased estimator when the propensity score is known. In contrast, no estimator has such a property if the propensity score is unknown. We then introduce a low-rank structure of potential outcomes as a broad class of structural assumptions about interference. This framework encompasses common assumptions such as anonymous, nearest-neighbor, and additive interference, while flexibly allowing for more complex study-specific interference assumptions. Under this low-rank assumption, we show how to construct an unbiased weighting estimator for a large class of causal estimands. The proposed weighting estimator does not require knowledge of true propensity scores and is therefore robust to unknown treatment assignment dependencies that often exist in observational studies. If the true propensity score is known, we can obtain an unbiased estimator that is more efficient than the IPW estimator by leveraging a low-rank structure. We establish the finite sample and asymptotic properties of the proposed weighting estimator, develop a data-driven procedure to select among candidate low-rank structures, and validate our approach through simulation and empirical studies.

Study · Preprint · Wiki · Moderate

Evaluating causal indirect effects when mediators are left-censored by assay limit of quantification

Cong Jiang, Michael D. Hughes, Nima S. Hejazi · 2026

Causal mediation analysis is essential for disentangling the mechanisms by which investigational therapeutic and preventive agents impact clinical outcomes. However, the measurement of biological mediators is often subject to left-censoring by technical measurement limitations, most commonly an assay's limit of quantification. This form of censoring can pose severe challenges for both identification and estimation of causal mediation estimands, particularly when the censoring mechanism is deterministic and the resulting missingness is missing not at random (MNAR) or nonignorable. Motivated by the question of assessing the role of viral RNA in the action mechanism of monoclonal antibody therapies for COVID-19 in the Accelerating COVID-19 Therapeutics and Vaccine (ACTIV)-2 platform trial, we develop a semi-parametric framework for estimation of the natural direct and indirect effects when the mediator of interest is partially subject to this form of left-censoring. Our proposed strategy combines fractional imputation with a semi-parametric EM algorithm to flexibly estimate key components of the factorized data likelihood. Applying the proposed strategy to circumvent the left-censoring, we discuss both traditional plug-in and asymptotically efficient estimators of the direct and indirect effect estimands, introducing a data-adaptive $m$-out-of-$n$ bootstrap for robust inference under the imputation procedure. We demonstrate in numerical experiments that our approach significantly reduces bias and allows for reliable inference. An application to data from the ACTIV-2 platform trial confirms that monoclonal antibody therapies reduce the risk of hospitalization and death due to COVID-19, while suggesting that changes in viral RNA mediate only a modest proportion of the overall treatment effect.

Study · Preprint · Wiki · Moderate

Conformal Convolution and Monte Carlo Meta-learners for Predictive Inference of Individual Treatment Effects

Jef Jonkers, Jarne Verhaeghe, Glenn Van Wallendael +2 more · 2024 · 6 citations

Generating probabilistic forecasts of potential outcomes and individual treatment effects (ITE) is essential for risk-aware decision-making in domains such as healthcare, policy, marketing, and finance. We propose two novel methods: the conformal convolution T-learner (CCT) and the conformal Monte Carlo (CMC) meta-learner, that generate full predictive distributions of both potential outcomes and ITEs. Our approaches combine weighted conformal predictive systems with either analytic convolution of potential outcome distributions or Monte Carlo sampling, addressing covariate shift through propensity score weighting. In contrast to other approaches that allow the generation of potential outcome predictive distributions, our approaches are model agnostic, universal, and come with finite-sample guarantees of probabilistic calibration under knowledge of the propensity score. Regarding estimating the ITE distribution, we formally characterize how assumptions about potential outcomes' noise dependency impact distribution validity and establish universal consistency under independence noise assumptions. Experiments on synthetic and semi-synthetic datasets demonstrate that the proposed methods achieve probabilistically calibrated predictive distributions while maintaining narrow prediction intervals and having performant continuous ranked probability scores. Besides probabilistic forecasting performance, we observe significant efficiency gains for the CCT- and CMC meta-learners compared to other conformal approaches that produce prediction intervals for ITE with coverage guarantees.

Study · Preprint · Wiki · Moderate

Quasi-Bayesian Local Projection Instrumental-Variables Method: Application to Renewable Energy and Electricity Prices

Masahiro Tanaka · 2026 · 0 citations

This paper introduces a quasi-Bayesian approach for local projection instrumental-variables (LP-IV) estimation. It builds a moment-based quasi-posterior using the generalized method of moments (GMM) objective and applies a roughness-penalty prior to smooth impulse responses over different horizons. The approach maintains the key first-order features of traditional LP-IV methods, while enhancing stability in finite samples and allowing for joint inference through simultaneous bands. Simulations indicate that this regularization decreases root mean squared error compared to standard GMM, especially at medium and longer horizons. An application to Danish electricity markets highlights the method's practical usefulness.

Observational · Preprint · Wiki · Moderate

Targeted maximum likelihood estimation of vaccine effectiveness and immune correlates in test-negative design studies with missing data

Leah I. B. Andrews, Lars van der Laan, Peter B. Gilbert · 2026

The test-negative design (TND) is a resource-efficient observational study design that can assess vaccine effectiveness and exposure-proximal immune correlates of disease. The TND enrolls symptomatic individuals seeking diagnostic testing and compares case status by an exposure variable, such as vaccination status or immune marker level, that is measured at testing. While the TND reduces confounding by healthcare-seeking behavior, other sources of confounding may remain. TND studies may also have missing data in the exposure variable due to incomplete records or two-phase sampling designs. We present a targeted maximum likelihood estimation approach involving a semiparametric logistic regression model that targets a causal conditional risk ratio of symptomatic disease in the healthcare-seeking population. Under causal and missing at random assumptions, our method produces an efficient, asymptotically linear estimator that provides flexible, data-driven confounding control and valid causal inference when analyzing TND studies with missing exposure variable data. We evaluate our method's finite sample properties using plasmode simulations of a two-phase TND immune correlates study. We also apply our method to assess COVID-19 vaccine effectiveness and antibody marker correlates of COVID-19 from TND study cohorts derived from the Moderna Coronavirus Efficacy phase 3 trial.

Study · Preprint · Wiki · Moderate

PPI is the Difference Estimator: Recognizing the Survey Sampling Roots of Prediction-Powered Inference

Reagan Mozer · 2026

Prediction-powered inference (PPI) is a rapidly growing framework for combining machine learning predictions with a small set of gold-standard labels to conduct valid statistical inference. In this article, I argue that the core estimators underlying PPI are equivalent to well-established estimators from the survey sampling literature dating back to the 1970s. Specifically, the PPI estimator for a population mean is algebraically equivalent to the difference estimator of Cassel et al. (1976), and PPI plus corresponds to the generalized regression (GREG) estimator of Sarndal et al. (2003). Recognizing this equivalence, I consider what part of PPI is inherited from a long-standing literature in statistics, what part is genuinely new, and where inferential claims require care. After introducing the two frameworks and establishing their equivalence, I break down where PPI diverges from model-assisted estimation, including differences in the mode of inference, the role of the unlabeled data pool, and the consequences of differential prediction error for subgroup estimands such as the average treatment effect. I then identify what each framework offers the other: PPI researchers can draw on the survey sampling literature's well-developed theory of calibration, optimal allocation, and design-based diagnostics, while survey sampling researchers can benefit from PPI's extensions to non-standard estimands and its accessible software ecosystem. The article closes with a call for integration between these two communities, motivated by the growing use of large language models as measurement instruments in applied research.
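To make the claimed equivalence concrete: the basic PPI point estimate of a mean is the average prediction over the unlabeled pool plus the average residual (label minus prediction) over the labeled sample, which has the form of a difference estimator. A minimal sketch with hypothetical numbers; the function name is illustrative, and it computes only the point estimate, not the variance or confidence interval.

```python
import numpy as np


def ppi_mean(preds_unlabeled, y_labeled, preds_labeled):
    """Point estimate of a population mean in difference-estimator form:
    mean prediction on the unlabeled pool plus the mean prediction error
    (the 'rectifier') on the labeled sample."""
    rectifier = np.mean(np.asarray(y_labeled) - np.asarray(preds_labeled))
    return np.mean(preds_unlabeled) + rectifier


# Hypothetical data: predictions average 4.0 on the unlabeled pool and
# under-predict the labeled outcomes by 0.5 on average.
print(ppi_mean([2.0, 4.0, 6.0], y_labeled=[3.0, 5.0], preds_labeled=[2.0, 5.0]))  # 4.5
```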

Study · Preprint · Wiki · Moderate

Endogenous Quantile Regression with Measurement Error in Dependent Variable

Xuanjing Su · 2026

This paper studies quantile regression with an endogenous regressor and measurement error in the dependent variable. Standard quantile regression estimators ignoring these two elements can induce substantial bias. We adopt a control-function approach in a triangular system and show that the conditional quantile coefficient functions, together with all other distributional parameters, are nonparametrically identifiable. Building on this constructive identification result, we propose a two-step sieve ML estimator. The first step estimates the control function. The second step performs a sieve likelihood maximization that incorporates the generated control variable through copula weights. When the number of quantile grid knots grows at an appropriate speed, the estimator is consistent and asymptotically normal, permitting inference via bootstrap. Monte Carlo simulations demonstrate that the estimator markedly reduces bias relative to existing methods, confirming its effectiveness in settings with endogeneity and additive measurement error in the outcome.

RCT · Preprint · Wiki · Moderate

Assessing Estimate of CATE from Observational Data via an RCT Study

Bosen Cui, Yuhong Yang · 2026

Conditional average treatment effects (CATEs) are increasingly estimated from observational data and used to guide policy and individualized treatment decisions. Before such estimates can be trusted in practice, their predictive fitness needs to be assessed, yet observational data alone offer limited opportunities for doing so. We propose CATE Assessment via Fitness Evaluation (CAFE), a formal framework for directly assessing the goodness-of-fit of a CATE estimate learned from observational data, rather than the full underlying outcome model, using evidence from a randomized trial. CAFE partitions the trial covariate space according to estimated propensity scores (or the like) and compares observationally derived conditional treatment effects with group-level experimental averages. The framework accommodates a broad class of CATE learners, including parametric models and flexible machine learning methods such as causal forests and boosting. We establish theoretical guarantees under both the null and alternative hypotheses, and introduce a maximum-type extension to improve sensitivity to localized lack of fit. When both randomized trial and observational data are available, we further develop a two-stage procedure to detect the existence of unobserved confounders. Extensive numerical studies show the utility of the CAFE approach when assessing observationally derived CATE estimates.

Study · Preprint · Wiki · Moderate

Difference-in-differences with a mediator

Yuhao Deng, Haoyu Wei, Zhongzhe Ouyang · 2026 · 0 citations

Causal mediation analysis is a powerful tool for disentangling the total effect of a treatment into its direct effect on the outcome and its indirect effect mediated through an intermediate variable. However, in observational studies, confounding between treatment and potential outcomes typically renders the total and natural effects non-identifiable. In this work, we advance mediation analysis within the difference-in-differences framework. Under a mediator-adjusted parallel trends assumption and additional conditions, we demonstrate that natural indirect, direct, and total effects are identifiable in the treated group. We further derive efficient influence functions for these estimands, enabling the construction of multiply robust and nonparametrically efficient estimators. We establish the asymptotic properties of these estimators. Applying our methodology to data from the Job Corps Study, we find that job training significantly increases both short-term and long-term earnings, after controlling for the indirect effect through the proportion of weeks employed.

RCT · Preprint · Wiki · Moderate

Multi-Study R-Learner for Estimating Heterogeneous Treatment Effects Across Studies Using Statistical Machine Learning

Cathy Shyr, Boyu Ren, Prasad Patil +1 more · 2023 · 3 citations

Estimating heterogeneous treatment effects (HTEs) is crucial for precision medicine. While multiple studies can improve the generalizability of results, leveraging them for estimation is statistically challenging. Existing approaches often assume identical HTEs across studies, but this may be violated due to various sources of between-study heterogeneity, including differences in study design, study populations, and data collection protocols, among others. To this end, we propose a framework for multi-study HTE estimation that accounts for between-study heterogeneity in the nuisance functions and treatment effects. Our approach, the multi-study R-learner, extends the R-learner to obtain principled statistical estimation with machine learning (ML) in the multi-study setting. It involves a data-adaptive objective function that links study-specific treatment effects with nuisance functions through membership probabilities, which enable information to be borrowed across potentially heterogeneous studies. The multi-study R-learner framework can combine data from randomized controlled trials, observational studies, or a combination of both. It's easy to implement and flexible in its ability to incorporate ML for estimating HTEs, nuisance functions, and membership probabilities. In the series estimation framework, we show that the multi-study R-learner is asymptotically normal and more efficient than the R-learner when there is between-study heterogeneity in the propensity score model under homoscedasticity. We illustrate using cancer data that the proposed method performs favorably compared to existing approaches in the presence of between-study heterogeneity.
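To make the objective concrete, here is the single-study R-learner of Nie and Wager that the multi-study version extends, assuming a linear CATE τ(x) = [1, x]·β and with the nuisances m(x) = E[Y | X] and e(x) = P(D = 1 | X) supplied directly rather than estimated with cross-fitting. It omits the study-membership probabilities and between-study heterogeneity that define the multi-study R-learner; the function name and the linear form of τ are assumptions for illustration.

```python
import numpy as np

def r_learner_linear(X, D, Y, m_hat, e_hat):
    """Single-study R-learner with a linear CATE tau(x) = [1, x] @ beta.

    Minimizes sum_i [(Y_i - m_hat_i) - (D_i - e_hat_i) * tau(X_i)]^2, where
    m_hat estimates E[Y | X] and e_hat estimates P(D = 1 | X).
    """
    X = np.asarray(X, dtype=float).reshape(len(Y), -1)
    design = np.column_stack([np.ones(len(Y)), X])            # basis for tau(x)
    y_res = np.asarray(Y, float) - np.asarray(m_hat, float)   # outcome residual
    d_res = np.asarray(D, float) - np.asarray(e_hat, float)   # treatment residual
    # Least squares of y_res on d_res * basis is the R-loss minimizer.
    beta, *_ = np.linalg.lstsq(d_res[:, None] * design, y_res, rcond=None)
    return beta


# Toy check with oracle nuisances and no noise: tau(x) = 1 + 2x.
rng = np.random.default_rng(0)
x = rng.normal(size=200)
d = rng.binomial(1, 0.5, size=200)
tau = 1 + 2 * x
y = x + d * tau                       # baseline outcome mu0(x) = x
beta = r_learner_linear(x, d, y, m_hat=x + 0.5 * tau, e_hat=np.full(200, 0.5))
print(beta)
```

Residualizing both the outcome and the treatment is what makes the loss insensitive to small errors in the nuisance fits; in practice m_hat and e_hat would come from cross-fitted machine learning models.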

Study · Preprint · Wiki · Moderate

Explainable Outlier Detection for Multivariate Functional Data

Marcus Mayrhofer, Una Radojičić, Horst Lewitschnig +1 more · 2026

This work addresses the challenges of robust covariance estimation and interpretable outlier detection for multivariate functional data with separable covariance structure. We develop a method that simultaneously improves robustness and interpretability in this context by establishing a connection between stochastic processes with separable covariance structures and the corresponding matrix-variate distribution of their basis representations. Leveraging this connection, we employ the recently developed matrix-variate counterpart of the Minimum Covariance Determinant estimator (MMCD) in conjunction with a truncated multivariate functional Mahalanobis semi-distance to robustly estimate mean and covariance for multivariate functional data. For interpretable outlier detection, we generalize multivariate outlier explanations based on Shapley values to decompose overall multivariate functional outlyingness into time-coordinate-specific contributions. Importantly, we reduce the otherwise exponential computational complexity (relative to the number of components) to linear complexity, while retaining the key properties of the Shapley value. This integrated framework combines robust Mahalanobis distances, MMCD estimators, and Shapley value-based outlyingness decomposition to provide a robust and interpretable approach for analyzing multivariate functional data with separable covariance structures. The effectiveness of this approach is demonstrated through both theoretical analysis and practical applications, including simulations and real-world examples.

Study · Preprint · Wiki · Moderate

Stable Causal Discovery via Directed Acyclic Graph Aggregation

Yunan Wu, Yue Wang, Chunlin Li +1 more · 2026

Directed Acyclic Graphs (DAGs) are central to uncovering causal structure in complex systems, yet learning a single DAG from data is often challenging: model uncertainty, finite samples, and a combinatorially large search space frequently yield unstable estimates. We propose DAGgr, a model averaging framework that aggregates multiple candidate DAGs into a single stable representation. Candidate graphs are weighted by their out-of-sample predictive likelihood across repeated data splits, and a thresholding rule on the resulting edge-importance scores guarantees that the aggregated graph is itself acyclic. We establish a finite-sample risk bound, prove that the procedure preserves acyclicity, and show that edge selection is consistent under mild conditions on the weights. Simulations across random, hub, and chain structures, together with an analysis of the Sachs et al. (2005) protein-signaling network, show that DAGgr matches or exceeds the best individual candidate while consistently outperforming bootstrap-aggregation baselines across structural recovery metrics.

Study · Preprint · Wiki · Moderate

Difference-Based High-Dimensional Long-Run Covariance Matrix Estimation for Mean-shift Time Series

Yanhong Liu, Fengyi Song, Long Feng · 2026

We consider estimation of high-dimensional long-run covariance matrices for time series with nonconstant means, a setting in which conventional estimators can be severely biased. To address this difficulty, we propose a difference-based initial estimator that is robust to a broad class of mean variations, and combine it with hard thresholding, soft thresholding, and tapering to obtain sparse long-run covariance estimators for high-dimensional data. We derive convergence rates for the resulting estimators under general temporal dependence and time-varying mean structures, showing explicitly how the rates depend on covariance sparsity, mean variation, dimension, and sample size. Numerical experiments show that the proposed methods perform favorably in high dimensions, especially when the mean evolves over time.
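The intuition behind a difference-based initial estimator can be seen in a deliberately simplified setting: for an i.i.d. series, each first difference ΔX_t = X_t − X_{t−1} has covariance 2Σ, and differencing removes a piecewise-constant mean except at the break points. The sketch below uses only this lag-0 quantity (no long-run autocovariance terms, no tapering) followed by hard thresholding, so it illustrates the idea under an i.i.d. assumption rather than reproducing the authors' estimator; the function names and the fixed threshold are hypothetical.

```python
import numpy as np

def difference_based_cov(X):
    """Lag-0 difference-based covariance: sum_t dX_t dX_t' / (2 (T - 1)).

    For i.i.d. rows, each first difference has covariance 2 * Sigma, and a
    mean shift only contaminates the differences that straddle the break.
    """
    dX = np.diff(np.asarray(X, dtype=float), axis=0)
    return dX.T @ dX / (2 * dX.shape[0])


def hard_threshold(S, lam):
    """Zero out off-diagonal entries with |S_ij| < lam; keep the variances."""
    out = np.where(np.abs(S) >= lam, S, 0.0)
    np.fill_diagonal(out, np.diag(S))
    return out


# Toy series: 3 independent N(0, 1) coordinates; the mean of coordinate 0 jumps by 5.
rng = np.random.default_rng(1)
T = 2000
X = rng.normal(size=(T, 3))
X[T // 2:, 0] += 5.0
naive = np.cov(X, rowvar=False)           # inflated by the mean shift
robust = hard_threshold(difference_based_cov(X), lam=0.2)
print(np.round(naive, 2))
print(np.round(robust, 2))
```

The naive variance of coordinate 0 absorbs the squared jump, while the differenced version stays close to 1; under serial dependence a genuine long-run estimator would also need weighted autocovariance terms.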

Study · Preprint · Wiki · Moderate

A Meta-learner for Heterogeneous Effects in Difference-in-Differences

Hui Lan, Haoge Chang, Eleanor Dillon +1 more · 2025 · 3 citations

We address the problem of estimating heterogeneous treatment effects in panel data, adopting the popular Difference-in-Differences (DiD) framework under the conditional parallel trends assumption. We propose a novel doubly robust meta-learner for the Conditional Average Treatment Effect on the Treated (CATT), reducing the estimation to a convex risk minimization problem involving a set of auxiliary models. Our framework allows flexible estimation of the CATT when conditioning on any subset of variables of interest using generic machine learning. Leveraging Neyman orthogonality, our proposed approach is robust to estimation errors in the auxiliary models. As a generalization of our main result, we develop a meta-learning approach for the estimation of general conditional functionals under covariate shift. We also provide an extension to the instrumented DiD setting with non-compliance. Empirical results demonstrate the superiority of our approach over existing baselines.

Study · Preprint · Wiki · Moderate

Distribution-free root cause analysis

Rohan Hore, Aaditya Ramdas · 2026

We study distribution-free root cause analysis in multi-stream data, where an evolving underlying system is observed through multiple data streams that may each undergo distributional changes at unknown timepoints. In such settings, the stream exhibiting the earliest change provides a natural starting point for investigating the underlying cause, which we refer to as the root-cause index. Leveraging conformal $p$-values, we propose a novel framework, Conformal Root Cause Analysis (CROC), which constructs finite-sample valid confidence sets for the root-cause index under minimal assumptions: the data streams are independent, and within each stream the pre- and post-change observations are sampled exchangeably from arbitrary and unknown distributions. We further establish a universality property, showing that any distribution-free method for root cause localization can be represented within the CROC framework. In addition, under mild regularity conditions and principled score design, our method yields asymptotically sharp confidence sets that efficiently isolate the root cause. We further extend CROC to efficiently handle cross-stream dependence when present. Extensive simulations demonstrate accurate localization of the root stream, supporting our theoretical guarantees.
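CROC is built from conformal p-values. The basic building block is sketched below: the score of a new observation is ranked against scores of observations assumed exchangeable with it (for instance, a stream's presumed pre-change window). How CROC designs scores and combines p-values across candidate change points and streams is not reproduced here; the function name and the absolute-deviation score are illustrative choices.

```python
import numpy as np

def conformal_p_value(calibration_scores, test_score):
    """Conformal p-value (1 + #{i : s_i >= s_test}) / (n + 1).

    If the test point is exchangeable with the n calibration points, then
    P(p <= alpha) <= alpha for every alpha in [0, 1]. Larger scores mean
    "more unusual", so a small p-value flags a likely change.
    """
    scores = np.asarray(calibration_scores, dtype=float)
    return (1 + np.count_nonzero(scores >= test_score)) / (scores.size + 1)


rng = np.random.default_rng(2)
pre = rng.normal(size=149)                 # presumed pre-change observations
center = np.median(pre[:50])               # score fitted on a separate training split
cal_scores = np.abs(pre[50:] - center)     # 99 calibration scores
print(conformal_p_value(cal_scores, abs(0.1 - center)))   # typical point
print(conformal_p_value(cal_scores, abs(6.0 - center)))   # far outlier
```

The score is fitted on a split disjoint from the calibration points so that calibration and test scores are computed the same way, which is what the exchangeability guarantee requires.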

Study · Preprint · Wiki · Moderate

Covariate Balancing and Riesz Regression Should Be Guided by the Neyman Orthogonal Score in Debiased Machine Learning

Masahiro Kato · 2026

This position paper argues that, in debiased machine learning, balancing functions should be derived from the Neyman orthogonal score, not chosen only as functions of covariates. Covariate balancing is effective when the regression error entering the score can be represented by functions of covariates alone, and it is the natural finite-dimensional approximation for targets such as ATT counterfactual means. For ATE estimation under treatment effect heterogeneity, however, the score error generally contains treatment-specific components because the outcome regression is a function of the full regressor $X=(D,Z)$. In that case, balancing common functions of $Z$ can leave the treatment-specific component unbalanced. We therefore advocate regressor balancing, implemented by Riesz regression with basis functions of $X$, as the general balancing principle for DML. The position is not that covariate balancing is invalid, but that covariate balancing should be understood as the special case that is appropriate when the score-relevant regression error is a function of covariates alone.
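A minimal sketch of linear Riesz regression for the ATE with basis functions of the full regressor X = (D, Z), assuming binary D and a binary covariate Z so that a saturated basis is available. The fitted representer solves the empirical first-order condition mean(b(X) α(X)) = mean(b(1, Z) − b(0, Z)), so it balances every basis function of X, including the treatment-specific ones; with this saturated basis it reduces to the inverse-propensity weights D/ê(Z) − (1 − D)/(1 − ê(Z)) built from empirical cell frequencies. The function names and the basis are illustrative assumptions.

```python
import numpy as np

def riesz_regression_ate(D, Z, basis):
    """Linear Riesz regression for the ATE functional g -> g(1, Z) - g(0, Z).

    Fits alpha(X) = basis(D, Z) @ beta by minimizing the empirical Riesz loss
        mean(alpha(D, Z)**2) - 2 * mean(alpha(1, Z) - alpha(0, Z)),
    whose minimizer is beta = G^{-1} M with G = mean(b b') and
    M = mean(b(1, Z) - b(0, Z)).
    """
    B = basis(D, Z)
    B1 = basis(np.ones_like(D), Z)
    B0 = basis(np.zeros_like(D), Z)
    G = B.T @ B / len(D)
    M = (B1 - B0).mean(axis=0)
    beta = np.linalg.solve(G, M)
    return B @ beta, B, M


def cell_basis(D, Z):
    """Saturated basis of the full regressor X = (D, Z) for binary D and Z."""
    return np.column_stack(
        [D * Z, D * (1 - Z), (1 - D) * Z, (1 - D) * (1 - Z)]
    ).astype(float)


rng = np.random.default_rng(3)
n = 500
Z = rng.binomial(1, 0.4, size=n)
D = rng.binomial(1, np.where(Z == 1, 0.7, 0.3))
alpha, B, M = riesz_regression_ate(D, Z, cell_basis)

# Compare with inverse-propensity weights built from empirical propensities.
e_hat = np.array([D[Z == 0].mean(), D[Z == 1].mean()])[Z]
ipw = D / e_hat - (1 - D) / (1 - e_hat)
print(np.max(np.abs(alpha - ipw)))
```

A basis containing only functions of Z would leave the D-specific cells unconstrained, which is the gap the position paper attributes to covariate-only balancing under effect heterogeneity.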

Study · Preprint · Wiki · Moderate

Estimation of MIDAS Regressions with Errors-in-the-Variables

Sukhbir Kaur, Sukhbir Singh, Kanchan Jain +1 more · 2026

In this paper, a Mixed Data Sampling (MIDAS) model is studied when both low- and high-frequency variables are contaminated with measurement error. It is shown that the profile likelihood estimator becomes inconsistent in the presence of measurement error. Using the corrected score approach together with the profile likelihood approach, a consistent estimator for the parameters of the MIDAS Measurement Error model is proposed. Small- and large-sample properties of the estimator are examined through a Monte Carlo simulation study that considers the effect of sample size, the number of lags, and the profiling parameter.

Study · Preprint · Wiki · Moderate

Higher-Order Neyman Orthogonality in Moment-Condition Models

Stéphane Bonhomme, Koen Jochmans, Whitney K. Newey +1 more · 2026

We construct moment functions that are Neyman-orthogonal to a chosen order in parametric moment condition models. These moment functions reduce sensitivity to nuisance estimation error and, as such, offer a unified and tractable route to higher-order debiasing in a wide range of econometric models. The number of additional nuisance parameters required by our construction, beyond those already present in the original moment conditions, is independent of the order of orthogonalization and can be reduced to a single scalar if desired.

Study · Preprint · Wiki · Moderate

Minimax unbiased estimation for finite populations with bounded outcomes

P. M. Aronow, Patrick Lopatto · 2026

We study design-unbiased estimation of the finite-population total $\sum_{i=1}^N y_i$ when each outcome satisfies known bounds $y_i\in[a_i,b_i]$. For any sampling design with inclusion probabilities $\pi_i>0$, we prove a sharp lower bound on the worst-case squared error over the rectangular parameter space. This bound is attained if and only if the unit inclusion indicators are pairwise independent, in which case the minimax estimator is the midpoint-differenced Horvitz-Thompson estimator $\sum_{i=1}^N m_i+\sum_{i\in S}(y_i-m_i)/\pi_i$, with $m_i=(a_i+b_i)/2$. We then solve the joint design-and-estimation problem under the constraint $\sum_i \pi_i\le n$. We find that a minimax strategy samples units independently with probabilities $\pi_i^\ast=\min(1,c(b_i-a_i))$, where $c>0$ is chosen so that $\sum_i \pi_i^\ast=n$, and uses the midpoint-differenced estimator. This extends Gabler (1990)'s linear minimax result to the full class of design-unbiased estimators. We also show that the estimator is admissible among unbiased estimators and affine equivariant.
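The minimax strategy described above is short enough to compute directly. The sketch below finds c by bisection so that the inclusion probabilities π_i* = min(1, c(b_i − a_i)) sum to n, draws a Poisson sample (independent inclusion, hence pairwise-independent indicators), and evaluates the midpoint-differenced Horvitz-Thompson estimator. It assumes every width b_i − a_i is positive and n ≤ N; the function names and toy numbers are illustrative.

```python
import numpy as np

def minimax_inclusion_probs(a, b, n, iters=200):
    """pi_i = min(1, c * (b_i - a_i)) with c > 0 chosen so that sum(pi) = n."""
    width = np.asarray(b, dtype=float) - np.asarray(a, dtype=float)
    lo, hi = 0.0, 1.0 / width.min()       # at c = hi every pi_i equals 1
    for _ in range(iters):
        c = 0.5 * (lo + hi)
        if np.minimum(1.0, c * width).sum() < n:
            lo = c
        else:
            hi = c
    return np.minimum(1.0, hi * width)


def midpoint_ht_total(y, a, b, pi, sampled):
    """sum_i m_i + sum_{i in S} (y_i - m_i) / pi_i with m_i = (a_i + b_i) / 2."""
    m = (np.asarray(a, dtype=float) + np.asarray(b, dtype=float)) / 2.0
    s = np.asarray(sampled, dtype=bool)
    y = np.asarray(y, dtype=float)
    return m.sum() + np.sum((y[s] - m[s]) / np.asarray(pi)[s])


a = np.zeros(4)
b = np.array([1.0, 1.0, 2.0, 4.0])
pi = minimax_inclusion_probs(a, b, n=2)           # c = 1/4 for these bounds
rng = np.random.default_rng(4)
sampled = rng.random(4) < pi                      # independent (Poisson) sampling
y = np.array([0.5, 0.2, 1.0, 3.0])                # only sampled entries are used
print(pi, midpoint_ht_total(y, a, b, pi, sampled))
```

Units with wide bounds get higher inclusion probability (here the unit with bounds [0, 4] is sampled with certainty), since they contribute the most worst-case error.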

Study · Preprint · Wiki · Moderate

Policy Learning with Observational Data: The Case of Hepatitis C Treatment for HIV/HCV Co-Infected Patients

Raphaël Langevin · 2026

Decision-makers frequently must choose a single action from a finite set of alternatives -- for example, physicians selecting a treatment, investors choosing a portfolio risk level, or judges determining sentences. To improve outcomes, policymakers often issue policy rules or guidelines to inform such choices. In this paper, I show how to generally derive policy rules from observational data in a multi-action framework under relatively weak assumptions about the underlying structure of the heterogeneous sampled population. Conditional average treatment effects (CATEs) are consistently estimated via a weighted K-means algorithm, assuming the outcome model is correctly specified within each homogeneous subgroup. Feasible policy rules are then implemented via a standard decision tree, allowing for both perfect and imperfect adherence to treatment. The methodology is applied to treatment options for Hepatitis C (HCV) among patients co-infected with human immunodeficiency virus (HIV), a setting in which no uniform guideline exists for modern pharmaceutical therapies. The results identify a subgroup of patients with approximately an 80% probability of spontaneous HCV clearance without treatment. Estimation results also show that reallocating treatments among treated individuals could have reduced total treatment costs by CAN$3.6-4.9 million while still increasing aggregate health benefits relative to the status quo. These findings demonstrate that the proposed approach can generate improved, data-driven treatment guidelines for the management of HIV/HCV co-infected patients.

Study · Preprint · Wiki · Moderate

Dynamic Linear Panel Regression Models with Interactive Fixed Effects

Hyungsik Roger Moon, Martin Weidner · 2026 · 191 citations

We analyze linear panel regression models with interactive fixed effects and predetermined regressors, for example lagged dependent variables. The first-order asymptotic theory of the least squares (LS) estimator of the regression coefficients is worked out in the limit where both the cross-sectional dimension and the number of time periods become large. We find two sources of asymptotic bias of the LS estimator: bias due to correlation or heteroscedasticity of the idiosyncratic error term, and bias due to predetermined (as opposed to strictly exogenous) regressors. We provide a bias-corrected LS estimator. We also present bias-corrected versions of the three classical test statistics (Wald, LR, and LM tests) and show that their asymptotic distribution is chi-squared. Monte Carlo simulations show that the bias correction of the LS estimator and of the test statistics also works well in finite samples.

Study · Preprint · Wiki · Moderate

An Old Look at Empirical Bayes

Nicholas G. Polson, Vadim O. Sokolov, Daniel Zantedeschi · 2026 · 59 citations

Dennis Lindley once said that there is only one thing worse than a frequentist, and that is an empirical Bayesian. The quip has the air of caricature, but its technical content is serious: empirical Bayes uses the same data twice, conflates levels of a hierarchy, and produces posterior-shaped summaries whose uncertainty quantification differs from what a fully hierarchical model delivers. David Blei's 2026 IMS Medallion Lecture, "A Fresh Look at Empirical Bayes," revives the program under three new banners: empirical Bayes via probabilistic symmetries (rebranded "Bayesian empirical Bayes"), empirical Bayes with implicit likelihoods through simulation-based inference, and empirical Bayes for combining experimental and observational data through calibration studies. This is a continuation of Blei and Kucukelbir's earlier "population empirical Bayes" (PopEB, 2015). We argue, in the spirit of Lindley, I. J. Good, William DuMouchel, Thomas Louis, and our own recent work with Datta, that Blei's machinery targets inferential objects distinct from the posterior conditional on the realized data, and that the cost of maintaining the full hierarchical discipline has fallen low enough that the computational trade-off no longer favors the shortcut. The case study is the Tweedie formula. Efron's f-modeling empirical Bayes plugs an estimated score function into a posterior-mean identity, but a smoothed score need not arise from any prior. The horseshoe Tweedie formula does. We conclude by recommending that the impressive computational machinery of modern empirical Bayes (variational inference, neural amortization, simulation-based inference) be redeployed in service of properly hierarchical Bayes.
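For a normal likelihood x | θ ~ N(θ, σ²), Tweedie's formula gives the posterior mean as E[θ | x] = x + σ² d/dx log f(x), where f is the marginal density of x. The sketch below evaluates it with a finite-difference score. Using the marginal implied by a genuine prior, θ ~ N(0, τ²), it reproduces the conjugate answer τ²x/(σ² + τ²); plugging an arbitrary smoothed log-density into the same function is the f-modeling step whose prior-compatibility the abstract questions. The function names and parameter values are illustrative.

```python
import numpy as np

def tweedie_posterior_mean(x, log_marginal, sigma, h=1e-4):
    """E[theta | x] = x + sigma^2 * d/dx log f(x), with a central-difference score."""
    score = (log_marginal(x + h) - log_marginal(x - h)) / (2.0 * h)
    return x + sigma**2 * score


sigma, tau = 1.0, 2.0

def log_normal_marginal(x):
    # Marginal of x when theta ~ N(0, tau^2) and x | theta ~ N(theta, sigma^2),
    # up to an additive constant (which the score ignores).
    return -0.5 * x**2 / (sigma**2 + tau**2)

x = np.array([-1.0, 0.0, 3.0])
print(tweedie_posterior_mean(x, log_normal_marginal, sigma))
print(tau**2 * x / (sigma**2 + tau**2))   # conjugate posterior mean
```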

Study · Preprint · Wiki · Moderate

CASCADE Conformal Prediction: Uncertainty-Adaptive Prediction Intervals for Two-Stage Clinical Decision Support

Ricardo Diaz-Rincon, Muxuan Liang, Adolfo Ramirez-Zamora +1 more · 2026

Effective medication management in Parkinson's Disease (PD) is challenging due to heterogeneous disease progression, variable patient response, and medication side effects. While AI models can forecast levodopa equivalent daily dose (LEDD) as a measure of medication needs, standard uncertainty quantification often fails to communicate the reliability of these predictions, treating high- and low-confidence clinical decisions identically. We introduce CASCADE (Calibrated Adaptive Scaling via Conformal And Distributional Estimation), a novel conformal prediction framework that propagates epistemic uncertainty from a screening classifier to adapt downstream predictions. Unlike standard conformal methods that rely on auxiliary residual regression, we leverage epistemic uncertainty from a primary classification task (identifying whether a medication change is needed) to dynamically scale the prediction intervals of a secondary regression task (predicting how much to change). By mapping Venn-Abers multi-probabilistic uncertainty directly to non-conformity scores, our framework achieves continuous risk adaptation. We demonstrate that this "cascade effect" produces highly efficient intervals for confident patients (38.9% narrower than standard conformal baselines) while automatically expanding intervals to ensure robust coverage for uncertain cases, bridging the gap between discrete clinical decision-making and continuous dose forecasting in PD.
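The core mechanism, a per-patient uncertainty that stretches or shrinks a conformal interval, can be sketched with the standard normalized split-conformal recipe: divide absolute residuals by an uncertainty scale u(x), take the ⌈(n + 1)(1 − α)⌉-th smallest calibration score, and multiply back by u(x). Here u(x) is a stand-in for whatever uncertainty the screening classifier supplies; CASCADE's Venn-Abers mapping is not reproduced, and the function name and toy values are illustrative.

```python
import numpy as np

def scaled_split_conformal(y_cal, yhat_cal, u_cal, yhat_test, u_test, alpha=0.1):
    """Normalized split-conformal intervals yhat_test +/- q * u_test.

    Scores are |y - yhat| / u on the calibration set; q is their
    ceil((n + 1) * (1 - alpha))-th smallest value (infinite if that exceeds n).
    """
    scores = np.abs(np.asarray(y_cal, float) - np.asarray(yhat_cal, float))
    scores = scores / np.asarray(u_cal, float)
    n = scores.size
    k = int(np.ceil((n + 1) * (1 - alpha)))
    q = np.inf if k > n else np.sort(scores)[k - 1]
    half_width = q * np.asarray(u_test, float)
    center = np.asarray(yhat_test, float)
    return center - half_width, center + half_width


# Two test patients: one the classifier is confident about (u = 0.5), one it is not (u = 2).
y_cal = np.arange(1.0, 10.0)          # residuals 1..9 against yhat_cal = 0
lo, hi = scaled_split_conformal(y_cal, np.zeros(9), np.ones(9),
                                yhat_test=[0.0, 0.0], u_test=[0.5, 2.0], alpha=0.1)
print(lo, hi)
```

Marginal coverage of at least 1 − α holds under exchangeability for any fixed choice of u; the choice of u only changes how the interval width is distributed across patients.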

Study · Preprint · Wiki · Moderate

Realized Regularized Regressions

Aleksey Kolokolov, Shifan Yu · 2026

We develop a continuous-time penalized regression framework for the estimation of time-varying coefficients and variable selection when both the response and covariates are Itô semimartingales with jumps. The coefficient paths are approximated by spline basis expansions and estimated via least squares from truncated high-frequency increments. In a finite-dimensional setting, we establish consistency and derive a feasible asymptotic distribution for the integrated coefficient estimator under infill asymptotics. We then extend the framework to high-dimensional settings in which the number of candidate covariates diverges, and show that a group-wise penalized estimator with a truncated $\ell_1$-penalty attains the oracle property, which delivers both consistent model selection and coefficient estimation. An empirical application to a large panel of more than two hundred high-frequency factors documents sparse factor structure across a large cross-section of stocks and industry portfolios.

Study · Preprint · Wiki · Moderate

A Unified Framework for Structure-Aware Clustering and Heterogeneous Causal Graph Learning

Honglin Du, Muxuan Liang, Xiang Zhong · 2026 · 1 citation

In complex multivariate systems, interactions among variables are defined by dependency structures, often encoded as directed acyclic graphs (DAGs). However, dependency structures can vary across subjects, and ignoring this structural heterogeneity introduces bias and obscures subpopulation-specific dependencies. To address this, we propose Directed Acyclic Graph-based Dependency Clustering via Alternating Direction Method of Multipliers (DAG-DC-ADMM), a unified framework built upon Structural Equation Modeling (SEM) that jointly learns cluster assignments and cluster-specific dependency structures. We encode acyclicity via a smooth constraint and integrate a groupwise truncated Lasso fusion penalty (gTLP) to cluster subjects based on their structural similarity. This yields a nonconvex optimization problem that incorporates sparsity, acyclicity, and structural consensus constraints. We address the nonconvexity by using the augmented Lagrangian method and solve it with an adapted version of the Alternating Direction Method of Multipliers (ADMM) for difference-of-convex programs. For certain graph structures, such as upper-triangular adjacency matrices, our algorithm is guaranteed to converge to a Karush-Kuhn-Tucker (KKT) point. Experiments demonstrate that our method recovers cluster-specific causal dependency structures with a high true positive rate and a low false discovery rate. This capability enables the robust discovery of heterogeneous dependencies across subjects where the subpopulation label is unknown.

Study · Preprint · Wiki · Moderate

Doubly Robust Estimation of Treatment Effects in Staggered Difference-in-Differences with Time-Varying Covariates

Yuhao Deng, Le Kang · 2026 · 0 citations

The difference-in-differences (DiD) design is a quasi-experimental method for estimating treatment effects. In staggered DiD with multiple treatment groups and periods, estimation based on the two-way fixed effects model yields negative weights when averaging heterogeneous group-period treatment effects into an overall effect. To address this issue, we first define group-period average treatment effects on the treated (ATT), and then define groupwise, periodwise, dynamic, and overall ATTs nonparametrically, so that the estimands are model-free. We propose doubly robust estimators for these types of ATTs in the form of augmented inverse variance weighting (AIVW). The proposed framework allows time-varying covariates that partially explain the time trends in outcomes. Even if part of the working models is misspecified, the proposed estimators still consistently estimate the parameter of interest. The asymptotic variance can be explicitly computed from influence functions. Under a homoskedastic working model, the AIVW estimator is simplified to an augmented inverse probability weighting (AIPW) estimator. We demonstrate the desirable properties of the proposed estimators through simulation and an application that compares the effects of a parallel admission mechanism with immediate admission on the China National College Entrance Examination.
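The group-period ATT(g, t) is the building block that the groupwise, periodwise, dynamic, and overall ATTs aggregate. In its simplest unconditional form (no covariates, never-treated units as the comparison group, period g − 1 as the base) it is a two-group, two-period difference in mean changes. This is a sketch of that building block only, not the paper's doubly robust AIVW estimator; the panel layout, the coding of never-treated units as -1, and the function name are assumptions.

```python
import numpy as np

def att_gt(Y, cohort, g, t, never=-1):
    """Unconditional ATT(g, t) with never-treated controls and base period g - 1.

    Y      : (n_units, n_periods) outcome panel, periods indexed 0..T-1
    cohort : first treated period of each unit, or `never` if never treated
    """
    Y = np.asarray(Y, dtype=float)
    cohort = np.asarray(cohort)
    base = g - 1
    treated = cohort == g
    control = cohort == never
    change_treated = (Y[treated, t] - Y[treated, base]).mean()
    change_control = (Y[control, t] - Y[control, base]).mean()
    return change_treated - change_control


# Noise-free panel: unit effects + period effects + a dynamic effect (t - g + 1) once treated.
rng = np.random.default_rng(5)
cohort = np.array([2, 2, 2, 3, 3, 3, -1, -1, -1])
unit_fe = rng.normal(size=cohort.size)
period_fe = np.array([0.0, 1.0, 3.0, 2.0, 5.0])
periods = np.arange(period_fe.size)
effect = np.where((cohort[:, None] != -1) & (periods[None, :] >= cohort[:, None]),
                  periods[None, :] - cohort[:, None] + 1, 0)
Y = unit_fe[:, None] + period_fe[None, :] + effect

for g, t in [(2, 2), (2, 3), (2, 4), (3, 3), (3, 4)]:
    print(g, t, att_gt(Y, cohort, g, t))
```

Because each ATT(g, t) compares a cohort only with never-treated units, already-treated cohorts never serve as controls, which avoids the negative weighting that two-way fixed effects can produce.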

Study · Preprint · Wiki · Moderate

Nonparametric Bayesian Policy Learning

Haonan Ye · 2026 · 41 citations

I propose Nonparametric Bayesian Policy Learning (NBPL) as a framework for uncertainty-aware treatment choice. I consider a decision-maker (DM) seeking to select an expected welfare-maximizing treatment rule using observable characteristics. A key observation is that, for a given welfare criterion and policy class, uncertainty about welfare-relevant objects is entirely induced by uncertainty about a reduced-form distribution. I assume the DM places a nonparametric Dirichlet process prior on this reduced-form parameter and uses the resulting posterior to conduct inference on optimal treatment assignments, optimal welfare, and comparisons across policy classes. The NBPL framework is flexible, and its implementation via the Bayesian bootstrap is highly tractable. I establish two main theoretical properties of NBPL. First, posterior welfare regret under NBPL converges at the minimax-optimal rate. Second, posterior model comparison across policy classes is pointwise consistent. I illustrate NBPL in two empirical applications: the bednet subsidy experiment of Bhattacharya and Dupas (2012) and the JTPA experiment studied by Kitagawa and Tetenov (2018).
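A minimal sketch of the Bayesian-bootstrap step, assuming a randomized experiment with a known treatment probability, a binary covariate, and the four deterministic rules mapping X ∈ {0, 1} to a treatment. Each posterior draw reweights the sample with Dirichlet(1, ..., 1) weights (Rubin's Bayesian bootstrap), evaluates every rule's welfare with an inverse-probability-weighted mean (an assumed welfare estimator, not necessarily the paper's), and records which rule wins. The data-generating process and function name are hypothetical.

```python
import numpy as np
from itertools import product

def bayesian_bootstrap_policy(X, D, Y, p_treat, n_draws=1000, seed=0):
    """Posterior draws of IPW welfare for every rule X in {0, 1} -> {0, 1}."""
    rng = np.random.default_rng(seed)
    X, D, Y = (np.asarray(v) for v in (X, D, Y))
    rules = list(product([0, 1], repeat=2))     # (action if X = 0, action if X = 1)

    # contrib[k, i]: unit i's IPW contribution to the welfare of rule k.
    contrib = np.empty((len(rules), Y.size))
    for k, rule in enumerate(rules):
        action = np.where(X == 1, rule[1], rule[0])
        prob = np.where(action == 1, p_treat, 1.0 - p_treat)
        contrib[k] = Y * (D == action) / prob

    weights = rng.dirichlet(np.ones(Y.size), size=n_draws)   # (n_draws, n)
    welfare = weights @ contrib.T                             # (n_draws, n_rules)
    prob_optimal = np.bincount(welfare.argmax(axis=1), minlength=len(rules)) / n_draws
    return rules, welfare, prob_optimal


# Hypothetical experiment: treatment helps when X = 1 and hurts when X = 0.
rng = np.random.default_rng(6)
n = 1000
X = rng.binomial(1, 0.5, size=n)
D = rng.binomial(1, 0.5, size=n)
Y = D * np.where(X == 1, 1.0, -1.0) + rng.normal(size=n)
rules, welfare, prob_optimal = bayesian_bootstrap_policy(X, D, Y, p_treat=0.5)
for rule, p in zip(rules, prob_optimal):
    print(rule, p)
```

The same posterior draws support other summaries, such as the posterior distribution of welfare regret of a chosen rule relative to the draw-wise best rule.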

Study · Preprint · Wiki · Moderate

Conditioning Gaussian Processes on Almost Anything

Henry Moss, Lachlan Astfalck, Thomas Cowperthwaite +5 more · 2026

Gaussian processes (GPs) offer a principled probabilistic model over functions, but exact inference is restricted to the linear-Gaussian regime. We establish an explicit equivalence between GPs and a class of linear diffusion models, recasting predictive sampling as an ODE with closed-form Gaussian dynamics and a likelihood-dependent guidance term that admits a simple Monte Carlo approximation. In the linear-Gaussian setting, we recover standard GP conditioning exactly; beyond conjugacy, the same machinery handles any conditioning statement admitting point-wise likelihood evaluation -- including non-linear physics, and, for the first time, natural language via large language models. Whitening isolates the irreducible non-Gaussian dynamics, minimising Wasserstein-2 transport cost and eliminating numerical stiffness. The result is a general-purpose GP inference scheme requiring no bespoke derivations. Together, these results provide a general mechanism for incorporating the full richness of real-world knowledge as conditioning information, opening a new frontier for the probabilistic modelling of real-world problems.

Study · Preprint · Wiki · Moderate

Recent Advances in Causal Analysis of the Stochastic Frontier Model

Samuele Centorrino, Christopher F. Parmeter · 2026

Causal inference methods (instrumental variables, difference-in-differences, regression discontinuity, etc.) are primary tools used across many social science milieus. One area where their application has lagged, however, is the study of productivity and efficiency. A main reason for this is that the nature of the stochastic frontier model does not immediately lend itself to a causal framework when interest hinges on an error component of the model. This paper reviews the nascent literature on attempts to merge the stochastic frontier literature with causal inference methods. We discuss modeling approaches and empirical issues that are likely to be relevant for applied researchers in this area. This review shows how this model can be easily put within the confines of causal analysis, reviews existing work that has already made inroads in this area, addresses challenges that have yet to be met, and discusses core findings.

Study · Preprint · Wiki · Moderate

Double/Debiased Machine Learning for Continuous Treatment Effects in Panel Data with Endogeneity

Peikai Wu, Kuan Sun, Zhiguo Xiao · 2026

We propose a double/debiased machine learning framework to estimate average derivative effects in nonparametric panel models with two-way fixed effects. It extends instrumental variable methods to panel settings, handles continuous treatments and various forms of endogeneity, and introduces a cross-fitting scheme to restore independence after eliminating time fixed effects. A penalized GMM debiasing term enables automatic debiased machine learning with endogeneity. Our estimators for contemporaneous, dynamic, and aggregated effects are consistent and asymptotically normal with a valid variance estimator. Simulations show reduced regularization bias and accurate confidence intervals. An application to ECLS-K data reveals rich dynamics in the effect of family SES on childhood BMI.

Study · Preprint · Wiki · Moderate

Selecting Informative Conformal Prediction Sets with an Optimized FCR-Controlled Approach

Israela Solomon, Etienne Roquain, Saharon Rosset +1 more · 2026

Conformal methods provide prediction sets for outcomes with confidence guarantees. We study their use in a selective inference setting, where inference is performed only when the prediction set is informative. The analyst may consider as informative, for example, cases with prediction sets that are sufficiently small, exclude null values, or satisfy other appropriate monotone constraints. Because inference is typically restricted to informative cases in practical applications, accounting for the resulting selection bias is crucial to maintaining false coverage rate (FCR) control. A general framework for constructing such informative conformal prediction sets while controlling the FCR on the selected sample was suggested in Gazin et al. (2025). In this work we focus on oracle-guided procedures. We derive the optimal decision policy under a suitable power objective in the oracle setting where the probability of belonging to each prediction set can be computed. In practice, of course, only estimated probabilities are available. We therefore introduce a calibration procedure that adjusts the oracle policy to maintain finite sample FCR control. We show that this approach can achieve substantially higher power than available alternatives. We demonstrate the effectiveness of our new methods for classification outcomes on both real and simulated data.
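As background for the selection problem described here, the sketch below builds ordinary split-conformal classification sets and then naively keeps only the "informative" singleton sets. It shows the marginal guarantee that holds before selection; it does not implement the FCR-controlling calibration of the paper. The score 1 - p(y), the simulated oracle probabilities, and the singleton rule are illustrative assumptions (the `method=` argument of `np.quantile` needs NumPy 1.22 or later).

```python
import numpy as np

rng = np.random.default_rng(0)
K, alpha = 3, 0.1

def simulate(n):
    # Oracle class probabilities from random logits; labels drawn from them
    logits = 2.0 * rng.normal(size=(n, K))
    probs = np.exp(logits)
    probs /= probs.sum(axis=1, keepdims=True)
    u = rng.random(n)[:, None]
    labels = np.minimum((probs.cumsum(axis=1) < u).sum(axis=1), K - 1)
    return probs, labels

probs_cal, y_cal = simulate(2000)
probs_test, y_test = simulate(2000)

# Split-conformal calibration with nonconformity score 1 - p(true label)
scores = 1.0 - probs_cal[np.arange(len(y_cal)), y_cal]
n_cal = len(scores)
qhat = np.quantile(scores, np.ceil((n_cal + 1) * (1 - alpha)) / n_cal, method="higher")

# Prediction set: every label whose score is within the threshold
sets = (1.0 - probs_test) <= qhat
covered = sets[np.arange(len(y_test)), y_test]
print("marginal coverage:", covered.mean())

# Naive selection of informative cases (singleton sets), with no selection adjustment
selected = sets.sum(axis=1) == 1
print("selected:", selected.sum(), "coverage among selected:", covered[selected].mean())
```

The marginal coverage is guaranteed to be at least 1 - alpha in expectation, but nothing in this construction guarantees the coverage reported on the selected subset, which is the gap the abstract's FCR framework addresses.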

Study · Preprint · Wiki · Moderate

Uncertainty Quantification in Forecast Comparisons

Marc-Oliver Pohle, Tanja Zahn, Sebastian Lerch · 2026 · 0 citations

Skill scores, which measure the relative improvement of a forecasting method over a benchmark via consistent scoring functions and proper scoring rules, are a standard tool in forecast evaluation, yet their sampling uncertainty is rarely rigorously quantified. With modern forecasting applications being increasingly multivariate and involving evaluations across multiple horizons, variables, spatial locations, and forecasting methods, standard tools like the pairwise Diebold-Mariano forecast accuracy test or pointwise confidence intervals fail to account for the multiple comparison problem, leading to inflated Type I error rates and invalid joint inference. To address the lack of a coherent, statistically rigorous framework for quantifying uncertainty across these multi-dimensional evaluation problems, we introduce simultaneous confidence bands for expected scores and skill scores. Our framework provides a versatile tool for joint inference that is applicable to any forecast type from mean and quantile to full distributional forecasts. We develop a bootstrap implementation and show that our bands are valid under multivariate extensions of the classical Diebold-Mariano assumptions. We demonstrate the practical utility of the approach in two case studies by quantifying the benefits of time-varying parameter models for macroeconomic forecasting, and by comparing data-driven and physics-based models in probabilistic weather forecasting.
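The contrast between pointwise intervals and simultaneous bands can be sketched for mean score differences across several horizons. The critical value comes from the bootstrap distribution of the largest standardized deviation over all horizons, so the band covers every horizon jointly. The moving-block bootstrap, the block length, and the name `sup_t_band` are assumptions; the paper's bands also handle skill scores (ratios), which this sketch omits.

```python
import numpy as np

def sup_t_band(d, level=0.95, block=10, n_boot=2000, seed=0):
    """Simultaneous band for the column means of d (T x H).

    d[t, h] is a score difference (e.g. benchmark loss minus method loss)
    at time t and horizon h; blocks of consecutive rows keep serial dependence.
    """
    rng = np.random.default_rng(seed)
    T, H = d.shape
    mean = d.mean(axis=0)
    n_blocks = int(np.ceil(T / block))
    boot = np.empty((n_boot, H))
    for b in range(n_boot):
        starts = rng.integers(0, T - block + 1, size=n_blocks)
        idx = (starts[:, None] + np.arange(block)).ravel()[:T]
        boot[b] = d[idx].mean(axis=0)
    se = boot.std(axis=0, ddof=1)
    # Critical value: quantile of the maximum |t| across horizons
    t_max = np.max(np.abs(boot - mean) / se, axis=1)
    crit = np.quantile(t_max, level)
    return mean - crit * se, mean + crit * se, crit

rng = np.random.default_rng(1)
T, H = 500, 4
common = rng.normal(size=(T, 1))
d = 0.1 + 0.5 * common + rng.normal(size=(T, H))  # differences correlated across horizons
lower, upper, crit = sup_t_band(d)
print(crit)  # exceeds the pointwise value of roughly 1.96
```

Because the maximum over horizons is at least as large as each individual |t|, the simultaneous critical value is never smaller than the pointwise one; that widening is what controls the joint error rate.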

Study · Preprint · Wiki · Moderate

Block-Independent Likelihood Ratio Testing for High-Dimensional Mean Vectors with Applications to Matrix-Variate Data

Minsub Shin, Kwangok Seo, Sang Han Lee +1 more · 2026

Testing the equality of two high-dimensional mean vectors is a fundamental problem in multivariate analysis. While the classical Hotelling's $T^2$ test is optimal in low-dimensional settings, it fails when the dimension $p$ is comparable to or exceeds the sample size $n$. Several extensions, including the Diagonal Likelihood Ratio Test (DLRT), have been proposed under the working independence assumption among variables. However, such an assumption can lead to a substantial loss of power when correlations are present. In this paper, we propose a new test, the Block Independent Likelihood Ratio Test (BILT), which generalizes DLRT by relaxing the working independence assumption to a block independence assumption. We establish the asymptotic normality of the null distribution of the BILT statistic for 'increasing $p$ with small $n$' under mild regularity conditions. We further analyze the asymptotic power of BILT under local alternatives. Extensive simulation studies show that BILT maintains Type I error control and achieves substantially higher power than DLRT across a wide range of covariance structures. An application to the Alzheimer's Disease Neuroimaging Initiative (ADNI) dataset further demonstrates the application of BILT to testing mean differences between two matrix-variate populations.
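A rough illustration of the block-independence idea: a two-sample statistic that uses the pooled covariance within each block and ignores correlation across blocks, so only small block covariances need to be inverted even when p exceeds n. It is calibrated by permutation here. This is not the BILT statistic or its asymptotic normal calibration; the block layout, the simulation design, and all names are assumptions.

```python
import numpy as np

def block_stat(X1, X2, blocks):
    """Sum over blocks of diff_b' S_b^{-1} diff_b with a pooled block covariance S_b."""
    n1, n2 = len(X1), len(X2)
    diff = X1.mean(axis=0) - X2.mean(axis=0)
    stat = 0.0
    for b in blocks:
        S1 = np.atleast_2d(np.cov(X1[:, b], rowvar=False))
        S2 = np.atleast_2d(np.cov(X2[:, b], rowvar=False))
        S = ((n1 - 1) * S1 + (n2 - 1) * S2) / (n1 + n2 - 2)
        stat += diff[b] @ np.linalg.solve(S, diff[b])
    return stat

def permutation_pvalue(X1, X2, blocks, n_perm=199, seed=0):
    rng = np.random.default_rng(seed)
    pooled = np.vstack([X1, X2])
    n1 = len(X1)
    observed = block_stat(X1, X2, blocks)
    count = 0
    for _ in range(n_perm):
        perm = rng.permutation(len(pooled))
        if block_stat(pooled[perm[:n1]], pooled[perm[n1:]], blocks) >= observed:
            count += 1
    return (count + 1) / (n_perm + 1)

rng = np.random.default_rng(0)
p, size, n1, n2 = 100, 5, 25, 25  # p exceeds n1 + n2, so full Hotelling's T^2 is unavailable
blocks = [np.arange(j, j + size) for j in range(0, p, size)]

def sample(n, shift):
    # Equicorrelated (rho = 0.5) variables within each block, independent across blocks
    shared = rng.normal(size=(n, p // size)).repeat(size, axis=1)
    return shift + np.sqrt(0.5) * shared + np.sqrt(0.5) * rng.normal(size=(n, p))

X1, X2 = sample(n1, 0.0), sample(n2, 0.5)
pval = permutation_pvalue(X1, X2, blocks)
print(pval)
```

With one-variable blocks the statistic collapses to a sum of squared per-variable standardized mean differences, i.e. a working-independence statistic; larger blocks let within-block correlation enter the test.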

Study · Preprint · Wiki · Moderate

Inference on Linear Regressions with Two-Way Unobserved Heterogeneity

Hugo Freeman, Dennis Kristensen · 2026

We develop a general estimation and inference procedure for the common parameters in linear panel data regression models with nonparametric two-way specification of unobserved heterogeneity. The procedure takes as input any first-step estimators of the nonparametric regression function and the fixed effects and relies on two key ingredients: First, we develop moment conditions for the common parameters that are Neyman orthogonal with respect to the nonparametric regression function. Second, we employ a novel adjustment of the nonparametric regression estimator so the estimated fixed effects do not generate incidental parameter biases. Together, these ensure that the resulting estimator of the common parameters is asymptotically normally distributed at the root-NT rate under weak conditions on the estimators of fixed effects and regression function. Next, we propose a novel two-step estimator of the nonparametric regression function and the fixed effects and verify that this particular estimator satisfies the conditions of our general theory. A numerical study shows that the proposed estimators perform well in finite samples.
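For contrast with the nonparametric two-way heterogeneity studied here, consider the familiar special case where unit and time effects enter additively, y_it = beta*x_it + a_i + g_t + e_it. In a balanced panel this case is handled by the two-way within (double-demeaning) transformation sketched below. This baseline is not the paper's procedure, and the simulated design is illustrative.

```python
import numpy as np

def twoway_within(Y, X):
    """Two-way within estimator of beta for a balanced N x T panel, one regressor."""
    def demean(M):
        # Subtract unit means and time means, then add back the grand mean
        return M - M.mean(axis=1, keepdims=True) - M.mean(axis=0, keepdims=True) + M.mean()
    Yd, Xd = demean(Y), demean(X)
    return np.sum(Xd * Yd) / np.sum(Xd ** 2)

rng = np.random.default_rng(0)
N, T, beta = 200, 30, 1.5
a = rng.normal(size=(N, 1))   # unit effects
g = rng.normal(size=(1, T))   # time effects
X = a + g + rng.normal(size=(N, T))  # regressor correlated with both effects
Y = beta * X + 2 * a + g + rng.normal(size=(N, T))

# Pooled OLS slope ignores the effects and is biased upward in this design
pooled = np.sum((X - X.mean()) * (Y - Y.mean())) / np.sum((X - X.mean()) ** 2)
within = twoway_within(Y, X)
print(pooled, within)
```

Double demeaning removes a_i and g_t exactly only because they are additive; once heterogeneity interacts with the regressors nonparametrically, as in the paper, this transformation no longer suffices.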

Study · Preprint · Wiki · Moderate

Compensator-Based Inference for Signal Detection Under Unknown Background

Aritra Banerjee, Sara Algeri · 2026

The problem of detecting new signals in the presence of an unknown background is ubiquitous in scientific discoveries and is especially prominent in the physical sciences. Most solutions proposed thus far to address the problem focus on estimating the background distribution and using that estimate to infer the signal. By studying the geometry of the problem, this article demonstrates that estimating the background distribution is somewhat unnecessary for inferring the signal intensity. Instead, it suffices to estimate a single parameter, referred to as the compensator, to account for the incomplete knowledge of the background, substantially reducing the problem's complexity and enabling proper uncertainty propagation. Such a compensator is shown to govern the conservativeness of the inference, both in the proposed setup and in likelihood-based approaches.

Study · Preprint · Wiki · Moderate

Conditional regularized halfspace depth for sparse functional data and its applications

Hyemin Yeon, Xiongtao Dai, Sara Lopez-Pintado · 2026

Many functional datasets are observed sparsely and irregularly. Ordering such data is challenging because only limited information is available from each observation, while the underlying trajectories remain infinite-dimensional. This paper develops a novel depth notion for sparse functional data, called the conditional regularized halfspace depth (CRHD). CRHD is defined as the infimum of conditional halfspace probabilities of the underlying trajectory given the observed sparse measurements, thereby enabling depth evaluation directly at sparse observations without requiring trajectory reconstruction. We study several basic theoretical properties of CRHD that clarify its behavior as a depth measure. The proposed depth is applicable even to extremely sparsely observed functional data, overcoming key limitations of existing sparse functional depths that often rely on reconstructed curves. In addition, CRHD induces meaningful rankings for complex functional data. Its numerical performance is demonstrated through rank-based tests, and its practical utility is illustrated using an infant growth dataset.
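CRHD builds on halfspace (Tukey) depth: the smallest probability of a closed halfspace that contains the point. The sketch below computes an approximate empirical halfspace depth for fully observed multivariate points by taking the minimum over random directions. It does not implement the conditioning on sparse measurements or the regularization that define CRHD; the name `halfspace_depth` and the number of directions are assumptions.

```python
import numpy as np

def halfspace_depth(points, data, n_dir=500, seed=0):
    """Random-direction approximation to empirical Tukey halfspace depth.

    depth(x) = min over directions u of the fraction of data with u'X >= u'x.
    A finite set of directions gives an upper bound on the exact depth.
    """
    rng = np.random.default_rng(seed)
    U = rng.normal(size=(n_dir, data.shape[1]))
    U /= np.linalg.norm(U, axis=1, keepdims=True)
    U = np.vstack([U, -U])            # use both orientations of each direction
    proj_data = data @ U.T            # shape (n, 2 * n_dir)
    proj_pts = points @ U.T           # shape (m, 2 * n_dir)
    # Fraction of data in the closed halfspace {z : u'z >= u'x}, per point and direction
    frac = (proj_data[None, :, :] >= proj_pts[:, None, :]).mean(axis=1)
    return frac.min(axis=1)

rng = np.random.default_rng(1)
data = rng.normal(size=(1000, 2))
depth = halfspace_depth(np.array([[0.0, 0.0], [3.0, 3.0]]), data)
print(depth)  # central point near 0.5, outlying point near 0
```

Because depth is an infimum over halfspaces, it yields a center-outward ordering; the abstract's contribution is to evaluate such probabilities conditionally on sparse observations instead of on reconstructed curves.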

Study · Preprint · Wiki · Moderate

Linear Regression for Panel With Unknown Number of Factors as Interactive Fixed Effects

Hyungsik Roger Moon, Martin Weidner · 2026 · 280 citations

In this paper we study the least squares (LS) estimator in a linear panel regression model with an unknown number of factors appearing as interactive fixed effects. Assuming that the number of factors used in estimation is larger than the true number of factors in the data, we establish the limiting distribution of the LS estimator for the regression coefficients as the number of time periods and the number of cross-sectional units jointly go to infinity. The main result of the paper is that under certain assumptions the limiting distribution of the LS estimator is independent of the number of factors used in the estimation, as long as this number is not underestimated. The important practical implication of this result is that for inference on the regression coefficients one does not necessarily need to estimate the number of interactive fixed effects consistently.
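The LS estimator studied here minimizes the sum of squared residuals jointly over beta and an R-factor structure. A common way to compute it, chosen here as an assumption rather than taken from this abstract, alternates between a rank-R principal-components fit of Y - beta*X and an OLS update of beta. The sketch lets you compare estimates when R equals or exceeds the true number of factors; the simulated design and the name `ife_ls` are hypothetical, and the iteration can in general stop at a local minimum.

```python
import numpy as np

def ife_ls(Y, X, R, n_iter=1000, tol=1e-10):
    """LS estimate of beta in Y = beta*X + Lambda F' + E using R estimated factors."""
    beta = np.sum(X * Y) / np.sum(X * X)  # pooled OLS starting value
    for _ in range(n_iter):
        # Best rank-R approximation of the current residual matrix
        U, s, Vt = np.linalg.svd(Y - beta * X, full_matrices=False)
        low_rank = (U[:, :R] * s[:R]) @ Vt[:R]
        # OLS of the defactored outcome on X
        new_beta = np.sum(X * (Y - low_rank)) / np.sum(X * X)
        if abs(new_beta - beta) < tol:
            return new_beta
        beta = new_beta
    return beta

rng = np.random.default_rng(0)
N, T, beta_true = 100, 50, 1.5
lam = rng.normal(size=(N, 1))
f = rng.normal(size=(T, 1))
X = 0.5 * lam @ f.T + rng.normal(size=(N, T))  # regressor loads on the single true factor
Y = beta_true * X + lam @ f.T + rng.normal(size=(N, T))
estimates = {R: ife_ls(Y, X, R) for R in (1, 2, 3)}
print(estimates)  # similar values for R = 1, 2, 3
```

In this design the estimates for R = 2 and R = 3 stay close to the one for R = 1, in line with the paper's message that overestimating the number of factors need not hurt inference on beta.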

Study · Preprint · Wiki · Moderate

Factor-Augmented Panel Regressions and Variance-Weighted Treatment Effects

Artūras Juodis, Martin Weidner · 2026

We revisit panel regressions with unobserved heterogeneity through the lens of variance-weighted average treatment effects. Building on established results for cross-sectional OLS and one-way fixed effects panels, we show that two-way panel estimators with latent factors, specifically the principal components estimator of Greenaway-McGrevy, Han and Sul (2012) and the interactive fixed effects estimator of Bai (2009), also converge to interpretable estimands under fully nonparametric assumptions. Both estimators consistently estimate the same variance-weighted average of unit-time-specific treatment effects, where the weights are proportional to the conditional variance of the regressor given the unobserved heterogeneity. The result requires the number of estimated factors to grow with the sample size and applies to the single-regressor case. We discuss the challenges that arise when extending to multiple regressors and to inference.
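The cross-sectional result this paper builds on can be checked numerically. With a binary treatment x and a discrete control g entered as a full set of dummies, the OLS coefficient on x estimates a weighted average of group-specific effects with weights proportional to P(g) * Var(x | g), not the unweighted average effect. The simulation design is illustrative and does not cover the factor-augmented panel estimators that are the paper's subject.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200_000
g = rng.integers(0, 2, size=n)               # observed group (binary control)
p = np.where(g == 0, 0.5, 0.9)               # P(x = 1 | g)
x = (rng.random(n) < p).astype(float)        # binary treatment
tau = np.where(g == 0, 1.0, 3.0)             # group-specific treatment effects
y = 0.5 * g + tau * x + rng.normal(size=n)

# OLS of y on x plus a full set of group dummies (no separate intercept)
Z = np.column_stack([x, g == 0, g == 1]).astype(float)
beta_x = np.linalg.lstsq(Z, y, rcond=None)[0][0]

# Variance-weighted average: weights proportional to P(g) * Var(x | g)
w = np.array([np.mean(g == k) * np.var(x[g == k]) for k in (0, 1)])
tau_vw = np.sum(w * np.array([1.0, 3.0])) / w.sum()
ate = np.mean(tau)
print(beta_x, tau_vw, ate)  # beta_x tracks tau_vw (about 1.53), not the ATE (about 2)
```

Group 1 has more units treated but less treatment variation (0.9 * 0.1 versus 0.5 * 0.5), so it receives less weight, pulling the OLS estimand toward the smaller effect in group 0.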
