Psychology
Neuro-computational mechanisms and individual biases in action-outcome learning under moral conflict
L. Fornari, K. Loumpa, et al.
The study addresses how humans learn action–outcome contingencies under moral conflict, where actions that benefit oneself may harm others, and alternatives reduce others’ harm at personal cost. While neural substrates of moral choice with known contingencies are documented, the learning processes and their neural implementation in conflicting self–other outcomes remain unclear. Reinforcement Learning Theory (RLT) posits updating expected values (EVs) via prediction errors (PEs). Key questions include whether self-benefit and other-harm are combined into a single value or tracked as separable expectations; how individual differences in weighting self vs other outcomes are represented computationally; and whether learning signals for others’ outcomes depend on personal preferences. Understanding these mechanisms informs theories of empathy, social decision-making, and how preferences shape learning and neural signals, with implications for prosocial behavior and computational psychiatry.
Prior work shows RLT accounts for learning to benefit oneself and, more recently, to benefit others. The vmPFC encodes current value of multiple outcomes, including social outcomes, and reward circuits encode monetary prediction errors. Empathy-for-pain literature highlights a distributed pain-observation network responsive to others’ painful expressions (e.g., affective vicarious pain signature, AVPS). Theories of motivated empathy propose that self-serving motives can downregulate empathy. However, it is unclear whether learning signals for others’ pain depend on such preferences, and whether learners combine outcomes into a common currency or maintain separable EVs under moral conflict. Devaluation paradigms suggest goal-directed representations but findings vary across tasks (e.g., model-based vs model-free control in harm-avoidance for others).
Two independent experiments were conducted: (1) an online behavioral study (N=79; 25±7 years; 39 female) and (2) an fMRI study (N=27 for behavior; fMRI data from 25 right-handed participants; 37±17 years; all female for fMRI data acquisition). The core task was a probabilistic two-armed bandit with moral conflict.
- Task conditions: Conflict and NoConflict. In Conflict blocks, one symbol (lucrative) led to high money for self and painful shocks to the other person 80% of the time; the other symbol (considerate) led to low money and low-intensity, non-painful shocks 80% of the time. Outcomes for money and shock were drawn independently on each trial to partially decorrelate them. In NoConflict blocks, high money coincided with low shock and low money with high shock, both at 80%.
- Block types (Online): NoDropout (10 trials) followed by explicit probability reports; and Dropout (20 trials): the first 10 trials were identical to NoDropout, followed by an announcement that either money (MoneyDropout) or shock (ShockDropout) outcomes would be withheld for trials 11–20. Dropout was also applied in NoConflict blocks. The fMRI study included six Conflict NoDropout blocks (10 trials each) and a separate Helping task.
- Stimuli: Monetary outcomes displayed numerically (€ amounts). Other’s outcome shown via pre-recorded videos of a confederate’s facial responses to electrical stimulation, enabling inference of pain/non-pain from expressions. Participants were led to believe outcomes had real implications (online: subset delivered later; fMRI: real-time live-feed cover story). Participants knew the video model and the shock recipient were different individuals.
- Explicit learning assessment (Online): After NoDropout blocks, participants reported, for each symbol, the probability of high money and high shock (0–100%).
- Devaluation (Dropout): Trial 11 choices were analyzed to test separable vs combined EVs: if separable, choices should change selectively when the preferred outcome is removed in Conflict blocks; no change expected in NoConflict Dropout.
- Preference classification: Based on choices in the first 10 Conflict trials per block, evaluated against the cumulative binomial distribution: Considerate (above the 97.5% tail), Lucrative (below the 2.5% tail), Ambiguous (in between).
- Computational modeling: Hierarchical Bayesian RLT models in RStan, adapted from hBayesDM. Outcomes coded as +1/−1 (withheld outcome = 0).
  • M0: random choice.
  • M1: single combined EV updated by a composite PE = (Out_M × wf + Out_S × (1 − wf)) − EV, with a single learning rate (LR).
  • M2Dec: separate EV_M and EV_S updated independently (PE_M = Out_M − EV_M; PE_S = Out_S − EV_S; separate LR_M, LR_S); weighting factor wf applied only at decision (softmax on wf × EV_M + (1 − wf) × EV_S).
  • M2Out: separate EV_M and EV_S updated with outcomes scaled by wf (PE_M = Out_M × wf − EV_M; PE_S = Out_S × (1 − wf) − EV_S; separate LR_M, LR_S); decision based on EV_M + EV_S.
  • Parameters: wf ∈ [0,1] (weight for money vs shock), LR ∈ [0,1] (or LR_M, LR_S), inverse temperature τ ∈ [0,5]. Priors mapped via probit to parameter bounds.
  • Fitting: Models were fit to the first 10 trials of Conflict blocks. Predictive performance in those trials was assessed via LOOIC.
  • Dropout prediction: For Dropout trial 11 (not fit), models predicted choice likelihoods under Dropout rules: for M2 models, the EV of the removed outcome was set to 0 and choice was based only on the remaining EV, without wf; for M1, the EV was unchanged but wf was set to value only the remaining outcome. Group-level posterior predictive comparisons used summed log-likelihoods over 11th trials (4,000 posterior draws).
- Helping task (fMRI sample): Participants endowed with €6 after watching a painful stimulation; donations reduced intensity of a second stimulation (1 point per €1 on a 10-point scale). Average donations (facial-expression trials) were related to wf from the learning task.
- fMRI acquisition: 3T Philips Ingenia CX, EPI TR=1.7 s, TE=27.6 ms, voxel 3×3×3 mm; T1 structural 1 mm isotropic. GLMs modeled the outcome phase with parametric modulators for PE signals (β_PE_S for shocks, β_PE_M for money). Signature analyses: voxelwise dot-product of participant β maps with the AVPS (affective vicarious pain signature) and the Reward Signature (RS) to test loading of PE signals on these distributed systems. Voxelwise regressions examined PE_S and PE_M associations with wf and valuation signals (cluster-level FWE-corrected). Value-updating analyses examined signals covarying with PE_S × LR_S and PE_M × LR_M.
- Statistical analyses: Both Bayesian and frequentist tests are reported; Bayes factors were used to infer evidence for an effect (BF10>3) or evidence of absence (BF10<1/3), following conventional thresholds.
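The preference classification step in the list above can be illustrated with a minimal Python sketch. It assumes a two-sided test of the number of considerate choices against chance (p = 0.5), with 2.5% in each tail; the function names, the exact tail convention, and the way trials are pooled are illustrative assumptions, not the authors' implementation.

```python
from math import comb


def binom_cdf(k, n, p=0.5):
    """P(X <= k) for X ~ Binomial(n, p); returns 0.0 when k < 0."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


def classify_preference(n_considerate, n_trials, tail=0.025):
    """Label a participant from their count of considerate choices.

    Assumption: 'Considerate' if the count falls in the upper 2.5% tail of
    the chance distribution, 'Lucrative' if in the lower 2.5% tail,
    'Ambiguous' otherwise.
    """
    upper_tail = 1.0 - binom_cdf(n_considerate - 1, n_trials)  # P(X >= k)
    lower_tail = binom_cdf(n_considerate, n_trials)            # P(X <= k)
    if upper_tail < tail:
        return "Considerate"
    if lower_tail < tail:
        return "Lucrative"
    return "Ambiguous"


# Hypothetical counts out of 10 Conflict trials
for k in (9, 8, 1):
    print(k, classify_preference(k, 10))
```

With 10 trials, 9 considerate choices fall in the upper tail (P(X ≥ 9) = 11/1024 ≈ 0.011), whereas 8 do not (56/1024 ≈ 0.055), so a participant needs a strongly one-sided choice pattern to leave the Ambiguous group under these assumptions.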
- Behavioral preferences and learning: In Conflict blocks, participants clustered into Considerate, Lucrative, and Ambiguous subgroups (Online: 29/24/26; fMRI: 13/3/11). Considerate and Lucrative groups showed clear learning curves toward their preferred option (~80% by last trials), while Ambiguous hovered near 50%.
- Explicit reports (Online): Participants reported above-chance, correctly directed symbol–outcome probabilities in both Conflict and NoConflict blocks across preference groups. Reports were biased toward the participant’s valued outcome: Considerate participants showed larger and more accurate shock-probability differentiation than money; Lucrative participants showed the reverse. Bias correlated with proportion of considerate choices (r=0.51, p=1.6×10^−5).
- Dropout (devaluation) test (Online): On trial 11, removing the preferred outcome in Conflict blocks led to significant shifts toward the alternative, whereas removing the less-preferred outcome did not change choices. No comparable changes occurred in NoConflict Dropout blocks. Shifts were not mirror-symmetric: rather than reversing fully, preferences moved to around 50%, consistent with less differentiated EVs for the remaining outcome in biased learners.
- Model comparison: Over the first 10 Conflict trials, M1, M2Dec, and M2Out fit similarly (LOOIC within SE), all outperforming the random model M0. Critically, for devaluation (trial 11), M2Out best predicted observed choices at the group level and for the majority of individuals, outperforming M1 and M2Dec; its posterior predictive log-likelihood distribution did not overlap with those of the alternatives. Mechanistically, M2Out scales outcomes by wf during learning, yielding a near-zero EV for the less-valued outcome and thus ~50% choices when the favored outcome is removed.
- Parameter recovery and external validity: wf estimates spanned a wide range and strongly correlated with considerate choice proportions. Simulations showed good recovery (r=0.69, p<10^−5, BF10>10^5). In the fMRI sample, wf predicted costly helping: higher consideration for others (lower wf) was associated with higher donations (Kendall’s tau = −0.47, BF10=76, p<0.001). Including wf improved prediction of helping beyond IRI empathy subscales and Money Attitude (BF_incl=11.46, p=0.009), while questionnaires did not add predictive power (BF_incl<0.7, p>0.09).
- fMRI outcomes and signatures: The outcome phase activated a broad network. AVPS loaded significantly and negatively on shock PEs (PE_S), with the negative sign reflecting the coding of non-painful as +1 and painful as −1, but not on money PEs (PE_M); AVPS loading did not depend on wf (evidence for absence). RS loaded positively on both PE_S and PE_M; dependence on wf was inconclusive, leaning toward absence.
- vmPFC and valuation: Voxelwise analyses showed vmPFC signals covarying positively with PE_S; a more ventral vmPFC cluster’s PE association depended on wf, while a more dorsal cluster reflected PE independent of wf. Left somatomotor cortex also showed wf-dependent PE associations. PE_M effects emerged at a more liberal threshold in striatum and ventral PFC (classic reward PE regions), with some wf-dependent clusters.
- Value updating: Signals covarying with PE_S × LR_S revealed a robust (medial) prefrontal network associated with shock value updating; analogous PE_M × LR_M effects did not survive correction.
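The mechanistic account of the devaluation result (M2Out predicting choices near 50% where M2Dec does not) can be made concrete with a small Python sketch. It is a toy illustration under loudly simplified assumptions, not the authors' hierarchical Bayesian fit: outcomes are deterministic (each symbol always yields its dominant outcome), choices alternate rather than being sampled, both learning rates are equal, and the parameter values (wf = 0.05, LR = 0.5, τ = 3) are hypothetical.

```python
import math

LUCRATIVE, CONSIDERATE = 0, 1
# Outcome coding (+1/-1) as (money, shock); non-painful shock = +1, painful = -1.
# Simplification: each symbol always yields its dominant (80%) outcome.
OUTCOMES = {LUCRATIVE: (1.0, -1.0), CONSIDERATE: (-1.0, 1.0)}


def learn(model, wf, lr_m, lr_s, choices):
    """Return (EV_M, EV_S) after learning from a sequence of choices."""
    ev_m, ev_s = [0.0, 0.0], [0.0, 0.0]
    for a in choices:
        out_m, out_s = OUTCOMES[a]
        if model == "M2Out":  # outcomes scaled by wf before the PE is computed
            out_m, out_s = out_m * wf, out_s * (1.0 - wf)
        ev_m[a] += lr_m * (out_m - ev_m[a])  # PE_M update
        ev_s[a] += lr_s * (out_s - ev_s[a])  # PE_S update
    return ev_m, ev_s


def p_lucrative(values, tau):
    """Two-option softmax: probability of choosing the lucrative symbol."""
    diff = values[LUCRATIVE] - values[CONSIDERATE]
    return 1.0 / (1.0 + math.exp(-tau * diff))


def decision_values(model, wf, ev_m, ev_s):
    """Pre-dropout decision value of each symbol."""
    if model == "M2Dec":  # wf applied only at decision time
        return [wf * m + (1.0 - wf) * s for m, s in zip(ev_m, ev_s)]
    return [m + s for m, s in zip(ev_m, ev_s)]  # M2Out: EV_M + EV_S


wf, lr, tau = 0.05, 0.5, 3.0            # hypothetical considerate learner
choices = [LUCRATIVE, CONSIDERATE] * 5  # 10 trials, forced alternation

for model in ("M2Dec", "M2Out"):
    ev_m, ev_s = learn(model, wf, lr, lr, choices)
    before = p_lucrative(decision_values(model, wf, ev_m, ev_s), tau)
    # ShockDropout rule: shock EV set to 0, choice based on EV_M alone, no wf.
    after = p_lucrative(ev_m, tau)
    print(f"{model}: P(lucrative) before={before:.3f}, trial 11={after:.3f}")
```

In this toy setting both models give the same choice probability before the dropout, which mirrors their similar fit over the first 10 trials. After the shock outcome is removed, M2Dec predicts an almost certain switch to the lucrative symbol (above 0.99), because it has learned unscaled money values, whereas M2Out predicts a choice probability of about 0.57, because its money EVs were shrunk by the small wf during learning.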
The findings demonstrate that during moral conflict, humans learn separable expected values for self-money and others’ shocks, but these representations are biased toward personally prioritized outcomes. Explicit reports and devaluation behavior both indicate separable EVs: participants adapt flexibly when the preferred outcome is removed, yet show attenuated shifts consistent with reduced differentiation of the less-valued outcome. Bayesian model comparison formalizes this: M2Out, which scales outcomes by a personal valuation parameter during learning, uniquely predicts behavior under devaluation, unlike models that combine outcomes during learning (M1) or apply weighting only at decision time (M2Dec). The individual weighting factor wf captures stable preferences that generalize beyond the learning task to costly helping decisions. Neurally, prediction error signals for others’ pain within the pain-observation network (AVPS) appear independent of personal preferences, suggesting that early empathic processing encodes PE-like signals regardless of motives. In contrast, valuation-related regions, notably ventral vmPFC, reflect wf-dependent PE signals, indicating that preference biases manifest at valuation stages. Reward circuitry (RS) encodes PE signals for both lower-than-expected shocks and better-than-expected money, consistent with a common reward-like response to favorable outcomes. These results refine accounts of moral learning by showing that learners retain information about the nature of outcomes (self vs other) rather than reducing to a single composite value, yet allow preferences to bias the learning of those outcomes. The work bridges motivated empathy theories with computational mechanisms, pinpointing where in the brain preference-related biases emerge (vmPFC), while early pain-observation processes remain relatively preference-invariant. 
The task and computational framework provide tools for investigating atypical social decision-making within computational psychiatry, including antisocial profiles.
This study introduces and validates a computational framework for learning under moral conflict in which individuals maintain separable expected values for self-benefit and other-harm, with an individual valuation parameter (wf) biasing learning toward the prioritized outcome. Behaviorally, devaluation responses and explicit reports support separable but biased representations; computationally, M2Out best accounts for choices, particularly under devaluation. Neurally, PE signals for others’ pain in the pain-observation network are preference-independent, while vmPFC valuation signals reflect individual biases. The wf parameter demonstrates external validity by predicting costly helping. Future directions include: testing alternative value combination rules (e.g., ratio/log-ratio), examining heuristic strategies versus RL, assessing stability and domain-generality of wf across contexts and over time, fitting more general models that distribute preference influence across outcome and decision phases (e.g., participant-specific α), and applying the paradigm to clinical or antisocial populations to probe neurocomputational differences.
- Model space: Focused on RLT formulations; did not test ratio or logarithmic ratio value structures that may apply in some moral contexts.
- Strategy heterogeneity: Some participants may rely on heuristics (e.g., win–stay/lose–shift) rather than RL; not explicitly modeled here.
- Generalizability of wf: Unclear whether wf reflects stable moral preferences versus context-specific tendencies; further longitudinal and cross-task validation is needed.
- Task design influences: Dropout and explicit-report components, while critical for adjudicating models, may themselves encourage separable representations; designs with minimal devaluation/reporting could test this.
- fMRI power: Sample size adequate for moderate effects; underpowered for detecting weaker dependencies of signature loadings on wf; PEM-based updating signals did not survive stringent corrections.
- Group differences: Online vs fMRI sample differences (age, context, experimenter presence) may have influenced preference distributions.