Recurrent individual treatment assignment: a treatment policy approach to account for heterogeneous treatment effects

Psychology


I. Cornelisz and C. van Klaveren

This study introduces Recurrent Individual Treatment Assignment (RITA), a groundbreaking method that tackles heterogeneous treatment effects in longitudinal studies by focusing on individual treatment responses. Developed by Ilja Cornelisz and Chris van Klaveren, RITA outperforms traditional strategies in scenarios with unobserved heterogeneity, adapting over time to optimize individual treatment assignments.

Introduction
Intervention studies assess the effectiveness of an intervention relative to a control or alternative intervention, with randomization ensuring independence between individual characteristics and treatment status. Conventional RCTs estimate an average treatment effect (ATE), which is informative about average effectiveness but not about which intervention works best for which individual. In the presence of heterogeneous treatment effects (HTE), i.e., systematic variability in individual treatment effects (ITE), an unbiased ATE can be uninformative or even misleading for individual assignment.

Prior work in education and clinical domains notes the importance of personalization and the limitations of relying solely on ATEs, including risks such as reference class problems, overfitting, and biased predictions for new populations. Observational designs lack the internal validity needed to justify assignment rules. Longitudinal settings, by contrast, allow learning over time via an explore-exploit trade-off, as in bandit algorithms. However, existing frequentist and stochastic (Bayesian/bandit) approaches typically condition on observed covariates, raising concerns about unmeasured confounding and generalization.

This paper argues that the objective in longitudinal assignment is not to estimate ITEs or conditional probabilities, but to learn over time which intervention is optimal for each individual using observed treatment responses. The Recurrent Individual Treatment Assignment (RITA) algorithm updates assignment decisions using sequential randomization and observed response variation, thereby accommodating unobserved heterogeneity and avoiding reference class issues. The study simulates multi-period settings to compare RITA against a baseline ATE-based assignment and outlines expectations across environments with varying heterogeneity and ATE patterns. It also discusses applicability in digital education and clinical contexts and previews the simulation design, comparisons, and methods.
Methodology
Design: A simulation study compares RITA to a baseline RCT-derived ATE assignment policy across 60 periods in a population of 1000 individuals. Initial conditions: P(A) = P(B) = 0.5 in period 1; the baseline outcome is y0 ~ N(10, 1), and each period's observed outcome change includes noise ε ~ N(0, 0.1).

Four treatment-effect worlds are simulated:
- World 1: No heterogeneity; A has a constant mean effect of 0.8 and B of 0.6 (ATE difference 0.2).
- World 2: Same means as World 1, with unobserved heterogeneity: individual effects are drawn as ε_Ai ~ N(0.8, 0.4) and ε_Bi ~ N(0.6, 0.3).
- World 3: Both unobserved and observed heterogeneity via a grouping variable G ∈ {0, 1}, equally distributed. For A, group-specific ATEs are 0.9 (G = 1) and 0.7 (G = 0), 0.8 overall; for B, 0.5 (G = 1) and 0.7 (G = 0), 0.6 overall. Additional unobserved variance enters through individual-level terms as specified (e.g., N(0.7, 0.35) plus an individual term ε_μ for A; N(0.5, 0.25) plus ε_μ for B).
- World 4: ATE equivalence; A and B both average 0.8, with heterogeneity such that A is better on average for one group and B for the other (same variance structures as World 3).

Baseline model: An RCT is conducted and the differential ATE is estimated via OLS:

y_i = α_0 + α_1 I_Ai + X_i β + ε_i,

where α_1 captures the differential ATE of A relative to B. In longitudinal repeated measures, a period-specific α_1t can be estimated. Assignment policy: if the differential ATE favors A (a significantly positive α_1), all individuals are deterministically assigned to A in subsequent periods; when the ATEs are equal (World 4), assignment follows the larger point estimate.

RITA algorithm: A sequential assignment policy that uses response ranks rather than explicit treatment-effect models. Steps per period t for each individual i (a minimal illustrative sketch follows below):
1) Randomly assign A with probability T_At and B with probability T_Bt = 1 − T_At.
2) Observe the treatment response Δy_it = y_it − y_i,t−1.
3) Compute rank(Δy_it) within the period's distribution of outcome changes (higher rank = larger gain).
4) Update the Individual Mean Rank (IMR) separately for A and B, conditional on the cumulative number of times the individual has been assigned to each treatment. Only the IMR of the treatment received in period t is updated with the current rank; the other remains unchanged.
5) Update the next-period assignment probability using a function of the individual's current rank, the mean rank under the alternative treatment, and a learning parameter that increases with repeated assignments to the same treatment (placing increasing weight on relative rank). This drives probabilities toward the intervention showing higher relative effectiveness for that individual.
6) Impose exploration bounds λ_l and λ_u to maintain a minimum exploration rate and keep probabilities within [0, 1]. In the simulations, λ_l = 0.05 and λ_u = 0.95, ensuring persistent exploration to accommodate potentially dynamic treatment effects.

Initialization: At t = 1, IMR_A0 = IMR_B0 = 500 (for 1000 individuals), yielding T_A1 = T_B1 = 0.5. The learning parameter starts with zero influence and grows with repeated observations of an individual-treatment pair, acting as a drift toward the currently superior intervention while still allowing rapid reallocation if future responses favor the alternative.

Evaluation metrics: For each world, the comparison covers (a) treatment assignment proportions over time, (b) average cumulative outcomes over periods, and (c) individual cumulative outcomes (ranked by final RITA-based outcome).
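To make the per-period mechanics concrete, here is a minimal Python sketch of a RITA-style loop under the World 2 setup (unobserved heterogeneity only). The summary lists the ingredients of the probability update (the current rank, the alternative arm's mean rank, a count-increasing learning parameter, and the bounds λ_l = 0.05, λ_u = 0.95) but not the paper's exact formula, so the rank-difference step, the learning-rate schedule n/(n + 5), and the treatment of the initial IMR of 500 as a prior observation are illustrative assumptions, not the authors' specification.

```python
# Minimal sketch of a RITA-style update loop (illustrative only; the
# exact update formula is in Cornelisz & van Klaveren).
import numpy as np

rng = np.random.default_rng(0)

N, T = 1000, 60                     # population size, number of periods
LAMBDA_L, LAMBDA_U = 0.05, 0.95     # exploration bounds from the paper

# World-2-style individual (unobserved) treatment effects.
effect_A = rng.normal(0.8, 0.4, N)
effect_B = rng.normal(0.6, 0.3, N)

p_A = np.full(N, 0.5)               # period-1 assignment probability T_A1
# Initial IMRs of 500 (mid-rank for 1000 individuals); treated below as
# one prior pseudo-observation, which is an assumption.
imr = {"A": np.full(N, 500.0), "B": np.full(N, 500.0)}
count = {"A": np.zeros(N), "B": np.zeros(N)}

for t in range(T):
    assigned_A = rng.random(N) < p_A
    # Observed response Δy_it: individual effect plus N(0, 0.1) noise.
    delta = np.where(assigned_A, effect_A, effect_B) + rng.normal(0, 0.1, N)
    # Within-period ranks: 1 = smallest gain ... N = largest gain.
    ranks = delta.argsort().argsort() + 1.0

    for arm, mask in (("A", assigned_A), ("B", ~assigned_A)):
        other = "B" if arm == "A" else "A"
        n = count[arm][mask]                  # prior assignments to this arm
        # Running mean rank; the initial 500 counts as one observation.
        imr[arm][mask] = (imr[arm][mask] * (n + 1) + ranks[mask]) / (n + 2)
        count[arm][mask] += 1
        # Assumed learning rate: zero at first, growing with repetition.
        gamma = n / (n + 5.0)
        # Shift p(A) toward the arm whose current rank beats the
        # alternative arm's mean rank (sign flipped for arm B).
        step = gamma * (ranks[mask] - imr[other][mask]) / N
        p_A[mask] += step if arm == "A" else -step

    p_A = np.clip(p_A, LAMBDA_L, LAMBDA_U)    # persistent exploration

print(f"share leaning toward A after {T} periods: {(p_A > 0.5).mean():.2f}")
```

Because probabilities are clipped to [λ_l, λ_u], every individual retains at least a 5% chance of receiving the currently disfavored intervention, which is what lets the policy detect dynamic changes in relative effectiveness over time.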
Key Findings
- Assignment dynamics: The baseline quickly converges to deterministic assignment to A in Worlds 1-3, where A has the higher ATE, and falls back on a tie-breaking rule in World 4, where the ATEs are equal. RITA maintains probabilistic assignment, with adaptation increasing when heterogeneity is present.
- Average cumulative outcomes: RITA outperforms the baseline in all worlds with heterogeneity (Worlds 2-4) but underperforms in World 1 (no heterogeneity). In Worlds 2-3, RITA's average cumulative gains catch up with the baseline after about 7-8 periods and then surpass it. In World 4 (ATE equivalence with heterogeneity), the average increase in cumulative gains with RITA over the baseline is 11.45 (SD = 21.08) over 60 periods.
- Individual outcomes and asymmetry: RITA yields broadly higher cumulative outcomes for many individuals in heterogeneous worlds, with small losses for some due to exploration. In World 1, where no heterogeneity exists, individuals experience small median losses with RITA (median −1.00 over 60 periods), reflecting roughly 5 exploratory assignments to the inferior treatment. In World 4, among individuals for whom A is truly better (n = 520), the median loss is −0.71 over 60 periods; among those for whom B is truly better (n = 480), the median gain is 21.91. This asymmetry (small losses for those who do not benefit from exploration versus large gains for those who do) drives RITA's superior aggregate performance under heterogeneity.
- Learning speed: RITA rapidly updates assignment probabilities and identifies preferable treatments within the first several periods, then continues to explore at a low rate, enabling detection of potential dynamic changes in relative effectiveness.
Discussion
The study addresses the challenge that ATE-based policies can be uninformative for personalized treatment assignment when heterogeneous treatment effects exist. By focusing on observed treatment response rather than explicit ITE estimation, RITA accommodates both observed and unobserved heterogeneity and learns individualized assignment policies over time. Results show that RITA improves average and many individual outcomes in heterogeneous environments, outperforming a standard ATE-based deterministic policy. However, in homogeneous settings without heterogeneity, RITA’s mandated exploration can slightly harm outcomes relative to immediate deterministic assignment to the superior intervention. The explore–exploit balance is central: maintaining exploration supports detection of dynamic changes and unobserved heterogeneity but entails small costs for some individuals. The observed asymmetry—large gains for those matched to their better treatment versus small losses for others—explains net improvements. The findings suggest that longitudinal, iterative assignment policies like RITA are well-suited to digital education and clinical contexts where repeated measurements allow learning optimal individualized assignments, particularly when heterogeneity is substantial and partially unobserved.
Conclusion
This paper introduces RITA, a sequential assignment algorithm that learns individualized treatment policies from longitudinal treatment responses without modeling ITEs directly. In simulations with varying degrees and forms of heterogeneity, RITA outperforms a conventional ATE-based deterministic assignment, except in a purely homogeneous world. Contributions include: (1) reframing longitudinal assignment as a learning problem centered on response ranks, (2) a practical algorithm balancing exploration and exploitation with explicit probability bounds, and (3) evidence that accounting for unobserved heterogeneity can materially improve outcomes. Future research should analyze theoretical properties (asymptotics, convergence, optimality), evaluate RITA on real-world datasets, study tuning of exploration and learning parameters, address potential switching costs (e.g., with calipers to reduce oscillation), and extend to dynamic treatment regimes where individual effects evolve over time. RITA may also help reveal associations between optimal assignments and observed characteristics, informing diagnostic insights.
Limitations
- The simulation parameters (effect sizes, variances, number of periods, population size, bounds λ_l and λ_u) are chosen for illustration and may affect outcomes; external validity to real-world settings remains to be tested.
- RITA's persistent exploration can reduce performance when no heterogeneity or dynamics exist, causing small but nonzero harm relative to deterministic assignment.
- Potential oscillating behavior (switching between treatments) may be undesirable in settings with switching costs; mitigation (e.g., calipers) is suggested but not evaluated here.
- The study does not provide formal proofs of RITA's asymptotic properties, convergence, or optimality.
- Comparisons to more advanced frequentist or Bayesian models are discussed conceptually; empirical head-to-head performance depends on model tuning, data richness, and the extent of unobserved versus observed heterogeneity.
- Identification of which individuals will benefit ex ante remains challenging; benefits are shown on average and across many individuals but are not guaranteed for all.