Neuro-computational mechanisms and individual biases in action-outcome learning under moral conflict



L. Fornari, K. Loumpa, et al.

Explore how people navigate morally challenging decisions where personal gain might inflict harm on others. This intriguing study reveals that choices in these situations can be predicted by a reinforcement learning model that values self-benefit and other-harm separately. Key insights come from authors at the Netherlands Institute for Neuroscience and other institutions, making this research a must-listen for those curious about the psychology of decision-making.

Introduction
Predicting the consequences of actions in morally ambiguous scenarios is crucial for effective social interaction, yet the underlying cognitive and neural mechanisms remain poorly understood. This study addresses that gap by examining how individuals learn to balance self-interest against potential harm to others. Existing reinforcement learning theory (RLT) has successfully modeled self-benefit learning and, more recently, prosocial learning, but its applicability to situations with conflicting outcomes for self and others needs further investigation. The study focuses on whether conflicting outcomes are integrated into a single value representation or processed separately, and on how individual differences in moral preferences shape the learning process. The researchers hypothesized that individuals might track separate expectations for self-benefit and other-harm, and that individual differences in preference would be reflected in a valuation parameter. The study also explored the involvement of brain networks related to empathy for pain and reward processing in this learning process.
Literature Review
The study draws upon existing research on reinforcement learning theory (RLT) and its application to both self-benefit and prosocial learning. It cites studies demonstrating successful application of RLT in understanding how individuals learn to maximize their own rewards and how they learn to benefit others. However, the study notes that the application of RLT to situations involving conflicting outcomes for self and others requires further investigation. The study also builds upon neuroimaging research related to moral decision-making and empathy for pain, referencing brain regions like the vmPFC and pain-observation networks involved in processing moral choices and observing others' pain. The existing literature highlights the complexity of moral decisions and the interplay between cognitive and emotional processes.
Methodology
The research employed two independent studies: an online behavioral study and an fMRI study. The core task in both involved a reinforcement learning paradigm where participants learned to associate symbols with outcomes. One symbol consistently led to higher monetary gains for the participant but also a painful shock to a confederate, while the other symbol offered lower monetary rewards but avoided the painful shock. This created a moral conflict. The online study included additional conditions to assess explicit learning of symbol-outcome associations and to investigate the impact of outcome devaluation (removing either the monetary reward or the shock after an initial learning phase) on subsequent choices. Explicit learning was assessed through post-block probability reports. The fMRI study focused on neural activity during the outcome phase, using neural signatures (AVPS for pain observation and a novel reward signature, RS) to analyze responses. A helping task was included in the fMRI study to assess the external validity of the valuation parameter derived from the moral conflict task. The researchers used Hierarchical Bayesian Model comparisons to evaluate several computational formulations of RLT, comparing models with combined vs. separate outcome representations and exploring the role of individual valuation parameters in shaping decisions. fMRI data were analyzed using voxelwise linear regressions and neural signatures to investigate brain regions involved in prediction error coding and value updating. The behavioral data were analyzed using various statistical methods, including binomial distributions for categorizing participant preferences, Wilcoxon signed-rank tests, and Bayesian methods for model comparison.
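The winning model class described above can be illustrated with a minimal sketch. The class name, parameter names (`alpha`, `wf`, `beta`), and the exact value-combination rule are assumptions for illustration, not the authors' published formulation; the essential idea is two independent Rescorla-Wagner expectation streams (self-money and other-shock) combined through a single valuation weight:

```python
import math
import random

def softmax_choice(values, beta=3.0, rng=random):
    """Pick an option index with probability proportional to exp(beta * value)."""
    mx = max(values)
    exps = [math.exp(beta * (v - mx)) for v in values]
    r = rng.random() * sum(exps)
    acc = 0.0
    for i, e in enumerate(exps):
        acc += e
        if r <= acc:
            return i
    return len(values) - 1

class SeparateOutcomeLearner:
    """Tracks expected self-money and expected other-shock for each option
    with independent Rescorla-Wagner updates, then combines them through a
    valuation weight wf (an illustrative stand-in for the paper's parameter)."""

    def __init__(self, n_options=2, alpha=0.3, wf=0.5):
        self.alpha = alpha              # learning rate for both outcome streams
        self.wf = wf                    # 1.0 = purely self-interested, 0.0 = purely harm-averse
        self.q_money = [0.0] * n_options
        self.q_shock = [0.0] * n_options

    def values(self):
        # Weighted combination: money is a gain for self, shock a cost to the other.
        return [self.wf * m - (1.0 - self.wf) * s
                for m, s in zip(self.q_money, self.q_shock)]

    def update(self, option, money, shock):
        # Independent prediction errors for the two outcome streams.
        self.q_money[option] += self.alpha * (money - self.q_money[option])
        self.q_shock[option] += self.alpha * (shock - self.q_shock[option])
```

Keeping the two streams separate also makes outcome devaluation trivial to simulate: after learning, feed `money=0` (or `shock=0`) for the devalued outcome and the choice values shift in a way a single combined value could not capture.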
Key Findings
The study found substantial individual variability in preferences, with participants exhibiting either a 'Considerate' (prioritizing minimizing other-harm), 'Lucrative' (prioritizing self-gain), or 'Ambiguous' preference. Choices were best described by a reinforcement learning model (M2Out) that tracked self-money and other-shocks separately. Individual differences in preferences were captured by a valuation parameter (*wf*) that weighted the relative importance of self-money and other-shocks. This *wf* parameter showed external validity, predicting choices in an independent costly helping task and outperforming traditional empathy and money attitude measures as predictors of helping behavior. fMRI results revealed that the vmPFC reflected the bias in expected values towards the favored outcome (correlated with *wf*), indicating its role in valuation. The pain-observation network (AVPS) showed activity related to pain prediction errors, but this activity was independent of individual preferences (*wf*). The reward system (RS) responded to both monetary rewards and shock outcomes.
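The three-way categorization of participants can be sketched with a standard exact binomial test on choice counts. The threshold, labels, and function names here are illustrative assumptions, not the authors' exact procedure; the idea is that a participant whose rate of choosing the high-pay/harmful option departs reliably from chance is labeled by the direction of that departure, and otherwise remains 'Ambiguous':

```python
from math import comb

def binom_two_sided_p(k, n, p=0.5):
    """Exact two-sided binomial p-value: sum the probabilities of all
    outcomes no more likely than the observed count k out of n."""
    pmf = [comb(n, i) * p**i * (1 - p)**(n - i) for i in range(n + 1)]
    obs = pmf[k]
    return sum(q for q in pmf if q <= obs + 1e-12)

def classify_preference(harmful_choices, n_trials, alpha=0.05):
    """Label a participant by whether their rate of choosing the
    high-pay/harmful option departs from chance (50%)."""
    p = binom_two_sided_p(harmful_choices, n_trials)
    if p >= alpha:
        return "Ambiguous"
    return "Lucrative" if harmful_choices > n_trials / 2 else "Considerate"
```

For example, 28 harmful choices out of 30 trials would come out 'Lucrative', 2 out of 30 'Considerate', and 15 out of 30 'Ambiguous' under this scheme.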
Discussion
The findings support the hypothesis that individuals represent self-money and other-shocks separately, with individual preferences influencing the weighting of these outcomes during decision-making. The model M2Out, which incorporated separate outcome representations and an individual valuation parameter, best predicted participants’ choices, especially in devaluation conditions. The external validity of the valuation parameter (*wf*) highlights its importance in capturing individual differences in moral decision-making beyond traditional trait measures. The dissociation between vmPFC activity (reflecting preference-dependent valuation) and pain-observation network activity (representing preference-independent pain prediction errors) provides insights into the neural mechanisms underlying moral conflict. The study’s findings contribute to a better understanding of how individuals learn and make decisions in morally challenging situations.
Conclusion
This study demonstrates that individuals process self-benefit and other-harm separately when learning in moral conflict situations. Individual preferences are crucial and captured by a weighting parameter that predicts behavior in other similar tasks. The vmPFC's role in valuation and the pain-observation network's role in representing pain prediction errors independently of preferences were identified. Further research should explore alternative computational models, examine the stability of individual preferences, and investigate these processes in atypical populations.
Limitations
The study focused on a specific type of moral conflict and a limited set of computational models. The devaluation trials and probability reports might have influenced participants’ strategies. The study's reliance on a deception-based paradigm raises potential concerns about ecological validity. Future research should explore other forms of moral conflicts, consider alternative modeling approaches, and investigate the influence of task design on decision strategies.