The impact of legal expertise on moral decision-making biases
S. Baez, M. Patiño-Sáenz, et al.
This study by Sandra Baez and colleagues examines how legal expertise influences biases in moral decision-making. Comparing criminal judges, criminal attorneys, and community controls, the researchers found that legal professionals were less swayed by gruesome language and physiological arousal when judging harmful actions, and assigned less punishment and lower harm severity to accidental harms.
Introduction
The study addresses whether legal expertise reduces common cognitive, emotional, and physiological biases in moral decision-making. Although legal decision-making is traditionally framed as rational and unbiased, prior work shows that judgments can be swayed by intentionality, emotional content (e.g., gruesome evidence), and implicit biases. Moral and legal judgments also share neural substrates, yet moral decision-making in criminal judges and attorneys had not been empirically compared with non-experts. The authors hypothesized that, relative to controls, criminal judges and attorneys would (1) weight the perpetrator’s mental state more appropriately, (2) be less influenced by emotionally arousing language, and (3) rely less on peripheral physiological signals when evaluating transgressions.
Literature Review
Prior research shows that people punish intentional harms more severely and judge them as morally worse than accidental harms, and often overestimate damage in intentional cases (the harm-magnification effect). Emotionally arousing elements, such as gruesome language or images, can increase moral outrage and punishment, and engage limbic regions like the amygdala. Legal decision-makers are susceptible to biases including anchoring, framing, egocentric and implicit racial biases, though judges can sometimes monitor and suppress them. Moral and legal judgments engage overlapping brain networks related to theory of mind and control, with legal judgments showing relatively greater dorsolateral prefrontal involvement. Executive functions have been linked to aspects of moral reasoning and control over automatic responses, and physiological arousal can shape affect and judgments. Despite this background, effects of gruesome language and physiological arousal on legal experts’ decisions had not been tested.
Methodology
Design: A 2 (language: gruesome language, GL, vs. plain language, PL; between-subjects) × 2 (intentionality: intentional vs. accidental; within-subjects) × 3 (group: judges, attorneys, controls; between-subjects) design assessed three decision dimensions: morality, punishment, and harm severity.
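The factorial structure described above can be enumerated in a few lines. This is only an illustrative sketch: the label strings and variable names are chosen here for readability and do not come from the study's materials.

```python
from itertools import product

# Factor levels as described in the design; label strings are illustrative.
GROUP = ("judges", "attorneys", "controls")       # between-subjects
LANGUAGE = ("gruesome", "plain")                  # between-subjects
INTENTIONALITY = ("intentional", "accidental")    # within-subjects

# Each participant belongs to exactly one group x language cell...
between_cells = list(product(GROUP, LANGUAGE))
# ...and rates scenarios at both levels of intentionality.
all_cells = list(product(GROUP, LANGUAGE, INTENTIONALITY))

print(len(between_cells), "between-subjects cells")
print(len(all_cells), "design cells in total")
```

Every one of the six between-subjects cells contributes ratings to both intentionality levels, which is why intentionality effects can be tested within participants.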
Participants: N=169 adults: 45 criminal judges (mean age 44.17, SD 8.98; mean 19.09 years criminal law experience, SD 9.81), 60 criminal attorneys (mean age 37.06, SD 9.98; mean 13.00 years experience, SD 11.17), and 64 community controls (mean age 41.39, SD 11.84) without law degrees or legal work. Groups did not differ in sex; controls had fewer years of education than legal experts; attorneys were younger than judges. All were native Spanish speakers; exclusion criteria included visual disabilities, substance abuse, neurological or psychiatric disorders. A subsample from Colombia (n=86; 29 judges, 30 attorneys, 27 controls) completed executive function testing and ECG recordings; the remaining n=83 completed the task online.
Task and stimuli: Participants read 24 text-based scenarios (12 intentional, 12 accidental) in which a protagonist harms a victim, spanning property damage, physical harm, and death. Language manipulation: GL versions used highly graphic descriptions; PL versions used neutral descriptions while keeping consequences equivalent. Each participant saw one language condition only. The order of intentional and accidental scenarios was counterbalanced, and the range of harm severity was matched across intentionality conditions.
Measures: After each scenario, participants rated: (a) morality (1=entirely good to 9=entirely wrong; analyzed as inverted for interpretability), (b) punishment (1=no punishment to 9=severe punishment), and (c) harm severity (1=not harmful to 9=very harmful). Executive functions (EFs) were assessed with the INECO Frontal Screening (IFS). Physiological arousal was indexed via ECG-derived heart rate variability (HRV), focusing on low-frequency (LF, 0.04–0.15 Hz) power; percentage change from baseline to task was computed. LF power was log-transformed due to skewness; task recording lengths did not differ by group.
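The two LF power transformations mentioned above (percentage change from baseline to task, and a log transform to reduce skew) can be sketched as follows. The function names are hypothetical, the natural log is an assumption (the summary does not state the base), and the summary does not say in which order the two steps were applied, so they are shown as independent helpers.

```python
import math

def percent_change(baseline: float, task: float) -> float:
    """Percentage change from a baseline value to a task value."""
    if baseline == 0:
        raise ValueError("baseline must be non-zero")
    return 100.0 * (task - baseline) / baseline

def log_power(power: float) -> float:
    """Natural log of a strictly positive power value (base is an assumption)."""
    if power <= 0:
        raise ValueError("power must be strictly positive")
    return math.log(power)

# Hypothetical LF power values (arbitrary units), not data from the study.
baseline_lf, task_lf = 200.0, 250.0
print(percent_change(baseline_lf, task_lf))
print(log_power(baseline_lf), log_power(task_lf))
```

Because a percentage change can be zero or negative, a log transform cannot be applied to it directly; this is one reason to keep the two steps separate in a sketch like this.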
Statistical analysis: Behavioral data were analyzed in R using mixed ANOVAs/ANCOVAs with group, language, and intentionality as factors; age and years of education were included as covariates given group differences. Box–Cox transformations were applied to improve normality and homoscedasticity; robustness checks used Welch–James tests with trimming, Winsorized variances, and bootstrapping. Post-hoc tests used Tukey-adjusted comparisons; planned Wilcoxon tests with Holm–Bonferroni corrections corroborated results. Power analysis indicated N=158 sufficed for 80% power (actual N=169, power=0.83). For the subsample (n=86), multiple regressions examined associations of group, language, EFs, LF power change, and age with average morality (across intentionality), punishment to accidental harms, and harm severity to accidental harms. A separate power analysis indicated N=79 for 80% power (actual n=86, power=0.95).
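As an illustration of the Holm–Bonferroni correction named above, the following sketch computes step-down adjusted p-values. The study's analyses were run in R; this Python version, the function name, and the example p-values are all hypothetical and are not taken from the paper.

```python
def holm_bonferroni(p_values):
    """Return Holm-Bonferroni adjusted p-values, in the input order."""
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, idx in enumerate(order):
        # The k-th smallest p-value (k = rank + 1) is multiplied by m - k + 1.
        candidate = min(1.0, (m - rank) * p_values[idx])
        # Enforce monotonicity: adjusted values never decrease along the order.
        running_max = max(running_max, candidate)
        adjusted[idx] = running_max
    return adjusted

# Illustrative p-values only.
raw = [0.01, 0.04, 0.03]
print(holm_bonferroni(raw))
```

A comparison is declared significant at level alpha when its adjusted p-value is at most alpha, which is less conservative than multiplying every p-value by the number of tests.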
Key Findings
- Morality ratings: Intentional > accidental across groups (F(1,163)=606.82, p<0.0001, η²=0.78). GL increased moral wrongness overall (F(1,163)=4.77, p=0.03, η²=0.02). Group × language interaction (F(2,163)=7.16, p=0.002, η²=0.07): judges (p=0.77) and attorneys (p=0.50) were unaffected by GL; controls rated actions as more morally wrong under GL vs. PL (p=0.0002).
- Punishment ratings: Intentional > accidental across groups (F(1,163)=1107.60, p<0.0001, η²=0.87). Group differences (F(2,163)=37.85, p<0.0001, η²=0.31): controls punished more than judges (p=0.00002) and attorneys (p=0.00002); judges and attorneys did not differ (p=0.98). Group × intentionality interaction (F(2,163)=17.94, p<0.0001, η²=0.18): for accidental harms, judges (p=0.0002) and attorneys (p=0.00002) assigned less punishment than controls; no group differences for intentional harms (judges vs. controls p=0.96; attorneys vs. controls p=0.48).
- Harm severity ratings: Intentional > accidental across groups (F(1,163)=170.37, p<0.0001, η²=0.51) despite matched harm ranges. Group differences (F(2,161)=10.59, p=0.0004, η²=0.11): judges rated harms as less severe than controls (p=0.00003) and attorneys (p=0.040); attorneys vs. controls not significant (p=0.10). Group × intentionality interaction (F(2,163)=23.42, p<0.0001, η²=0.20): for accidental harms, judges (p=0.0002) and attorneys (p=0.00002) rated lower severity than controls; no group differences for intentional harms (judges vs. controls p=0.98; attorneys vs. controls p=0.29). Within each group, intentional > accidental (judges p=0.00002; attorneys p=0.00002; controls p=0.005).
- Physiological and cognitive predictors (subsample n=86): In a regression on average morality, significant group × language effects replicated; LF power change predicted morality overall (t=2.70, p=0.0086, β=−0.06). Simple regressions by group/condition: under GL, LF power predicted morality in controls (t=−2.48, p=0.03, β=−0.61) but not in judges (t=0.47, p=0.64) or attorneys (t=2.06, p=0.06). No associations under PL. For punishment to accidental harms, age, EFs, and LF power were not significant; only judges differed from controls (t=−4.04, p=0.00012, β=−1.03). For harm severity to accidental harms, only EFs predicted ratings (t=2.37, p=0.01, β=0.25).
Discussion
Information about the transgressor’s mental state robustly shaped moral judgments across all groups: intentional harms were judged as more wrong, more deserving of punishment, and more harmful than accidental harms. However, legal experts (judges and attorneys) showed attenuated bias for accidental harms, assigning less punishment and lower harm severity than controls, consistent with more accurate integration of mental state information and a potential ability to override outcome-driven responses. Despite this, all groups exhibited the harm-magnification effect, overestimating harm in intentional vs. accidental scenarios with equivalent outcomes, indicating persistence of an intent-driven damage bias.
Emotionally arousing language biased morality judgments only in controls; judges and attorneys were insensitive to GL, suggesting legal expertise buffers language-driven emotional influences. In line with this, peripheral physiological arousal (LF HRV change) predicted morality ratings only in controls exposed to GL, not in judges and only marginally in attorneys, indicating reduced reliance on bodily signals among legal experts, particularly judges. Executive functions predicted harm severity ratings, supporting a role for domain-general control processes in moral assessment.
These findings suggest that legal training and professional role can reduce susceptibility to certain cognitive-emotional and physiological biases in third-party moral decision-making, with implications for legal procedures (e.g., juror exposure to gruesome evidence) and for understanding overlaps and distinctions between moral and legal judgments.
Conclusion
This first direct comparison of criminal judges, criminal attorneys, and non-experts shows that legal expertise attenuates specific moral decision-making biases. Judges and attorneys were less influenced by gruesome language and peripheral physiological arousal and assigned less punishment and harm severity for accidental harms, indicating better utilization of mental state information. Nonetheless, all groups showed harm magnification for intentional harms. These results highlight how expertise shapes legal decision-making and support bias-reduction approaches in legal settings. Future research should include direct measures of intentionality detection skills, quantify exposure/desensitization to graphic materials, employ more ecological designs, and further dissect executive function contributions across moral judgment components and expert roles.
Limitations
- Group differences in age and education required covariate control; residual confounding cannot be fully excluded.
- Only years of professional experience were measured; specific aspects of expertise (e.g., exposure to gruesome materials, desensitization, explicit training on bias) were not assessed.
- No direct measures of intentionality detection ability were included, limiting mechanistic inference.
- Physiological analyses were conducted in a subsample (n=86), potentially limiting generalizability.
- Task employed text-based vignettes; ecological validity relative to real courtroom decision-making is limited.
- Participants were from Colombia and Argentina; cross-cultural generalizability may be constrained.