Psychology
How human–AI feedback loops alter human perceptual, emotional and social judgements
M. Glickman and T. Sharot
The study investigates how human–AI interactions modify human beliefs and judgements, particularly whether and how biases can be amplified via feedback loops between humans and AI systems. Modern AI (e.g., CNNs and transformers) can surpass humans in consistency and sensitivity to subtle patterns but may also inherit and amplify biases from training data or data imbalance. As AI increasingly assists humans (e.g., in diagnosis, recommendations, hiring), there exists a mechanism through which biased AI can influence and increase human bias, beyond the well-documented case of humans producing biased AI. The central hypothesis is that repeated interaction with biased AI leads humans to learn and internalize that bias over time, in contrast to human–human interactions, and that the magnitude of amplification depends both on AI output characteristics (e.g., lower noise, leveraging subtle signals) and on how humans perceive AI (e.g., as superior or more reliable). The work tests these predictions across perceptual (motion discrimination), emotional (emotion aggregation), and social judgement tasks, including a real-world generative AI (Stable Diffusion).
Prior work shows AI systems can inherit and perpetuate human biases due to biased datasets or class imbalance ("bias in, bias out"). Generative models trained on Internet-scale data reflect cognitive, racial, and gender biases; other AI systems (facial recognition, recommender systems, hiring and credit tools) exhibit social bias and may amplify it. Bias amplification has been documented algorithmically in multiple domains, yet its impact on human belief formation via interaction with AI outputs has been underexplored. Ensemble perception literature indicates humans can show valence biases under brief encoding. Humans may trust or appreciate algorithmic judgments, especially as tasks become difficult, though algorithm aversion can occur after observing errors. These strands suggest a plausible pathway for AI-to-human bias contagion via feedback learning, motivating empirical tests of human–AI versus human–human influence and the roles of AI output characteristics and human perceptions.
Ethics and participants: Approved by UCL Ethics Committee (3990/003, EP_2023_013). Total n = 1,401 recruited via Prolific; compensated; normal or corrected vision. Experiments were programmed in PsychoPy3 and hosted on Pavlovia. Analyses used Matlab, Python (Colab), and SPSS.
Experiment 1: Emotion aggregation and AI/human interaction
- Level 1 (baseline human bias): n = 50. Task: 100 trials of arrays of 12 faces, each array shown for 500 ms. Faces drawn from 50 morphed grayscale faces ranging from 100% sad (rank 1) to 100% happy (rank 50), created by linear interpolation between Ekman gallery expressions. Arrays constructed so half had mean rank < 25.5 (more sad) and half > 25.5 (more happy). Bias defined as mean proportion of "more sad" minus 0.5.
- Level 2 (training): A CNN was trained on 5,000 arrays (50 participants × 100 arrays). Architectures tested included a CNN with five convolutional layers (filters 32, 64, 128, 256, 512; ReLU activations), three dense layers, dropout 0.5, and also ResNet50 variants. Models were evaluated on a 300-array out-of-sample test set. Training label manipulations: objective labels; objective labels with 3% bias; participant classifications (noisy, ~63% accuracy; 3% bias); random labels with 3% induced bias.
- Level 3 (interaction): New participants (n = 50 per condition, multiple conditions) completed 150 baseline trials, then 300 interaction trials in six blocks. On each trial, the participant classified the array, then saw an associate's response (AI or human) and chose whether to change their answer. Conditions: human–AI; human–human; human–AI perceived as human (AI input labeled human); human–human perceived as AI (human input labeled AI). Engagement checks and exclusions were applied to the Level 2 human responses used in the human–human conditions.
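The Level 1 bias measure (mean proportion of "more sad" classifications minus 0.5) can be sketched in a few lines of Python; the function name and example data are illustrative, with the 53% figure matching the baseline reported for Level 1:

```python
def emotion_aggregation_bias(responses):
    """Bias = proportion of 'more sad' classifications minus 0.5.

    `responses` is a list of per-trial classifications, where True
    means the array was judged 'more sad'. A positive value indicates
    a sad bias; 0 indicates unbiased responding.
    """
    if not responses:
        raise ValueError("no trials")
    return sum(responses) / len(responses) - 0.5

# Hypothetical example: 53 of 100 arrays judged 'more sad'
# yields a bias of +0.03 over the 0.5 chance level.
trials = [True] * 53 + [False] * 47
print(round(emotion_aggregation_bias(trials), 3))  # 0.03
```

The same measure applies unchanged to model outputs, which is how human bias (≈3%) and CNN bias (≈15%) are put on a common scale.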
Experiment 2: Random dot kinematogram (RDK) with algorithm interaction
- Baseline: 30 trials with 100 white moving dots (1 s display). Rightward motion percentages spanned 6–96%. Participants estimate % moving right (0–100%) and confidence.
- Interaction blocks: Three algorithms (order counterbalanced via Latin squares) labeled A/B/C: Accurate (truthful), Biased (systematic upward bias: +0–49%, mean ~24.96), and Noisy (accurate plus Gaussian noise, s.d. ~28.46), with biased and noisy having matched absolute error. On each trial, before seeing algorithm output, participants assign a weight w between "100% you" and "100% AI" to form a joint decision: w × participant response + (1−w) × algorithm response. Algorithm output then revealed for 2 s. Dependent measures: AI-induced bias (interaction bias − baseline bias), AI-induced accuracy change (baseline error − interaction error), and average AI weight. Follow-up studies: exclusive interaction across five blocks with biased AI (n = 50) and with accurate AI (n = 50).
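The joint-decision rule and the three algorithm types can be sketched as follows. The noise parameters come from the description above, but the function names and the uniform form of the upward bias are illustrative assumptions:

```python
import random

def joint_estimate(w, human, algorithm):
    """Weighted joint decision: w weights the participant's own estimate."""
    return w * human + (1 - w) * algorithm

def algorithm_response(kind, truth, rng):
    """Sketch of the three algorithm types, clipped to the 0-100% scale."""
    if kind == "accurate":
        out = truth  # truthful report of % rightward motion
    elif kind == "biased":
        # Systematic upward shift of 0-49% (mean ~24.5, close to the
        # reported ~24.96); the uniform distribution is an assumption.
        out = truth + rng.uniform(0, 49)
    elif kind == "noisy":
        # Zero-mean Gaussian noise, s.d. ~28.46 as reported.
        out = truth + rng.gauss(0, 28.46)
    else:
        raise ValueError(kind)
    return min(max(out, 0.0), 100.0)

rng = random.Random(0)
print(joint_estimate(0.7, human=40, algorithm=60))  # ~46 (0.7 x 40 + 0.3 x 60)
```

The biased and noisy perturbations were calibrated so their mean absolute errors matched, isolating the effect of systematic versus random error.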
Experiment 3: Stable Diffusion exposure and social judgements
- Design: Participants (n = 100; 50 AI exposure, 50 control) completed three stages each with 100 trials. Stage 1 baseline: shown six headshots (White man, White woman, Asian man, Asian woman, Black man, Black woman) from the Chicago Face Database (ages 30–40; balanced for age, attractiveness, racial prototypicality). Task: select who is most likely a financial manager (definition provided).
- Stage 2 exposure: AI group viewed three Stable Diffusion v2.1 images per trial (prompt: "A color photo of a financial manager, headshot, high-quality"), each for 1.5 s; images curated to avoid artifacts; categorization by 30 raters (Cohen’s κ = 0.611) yielded 85% White men, 11% White women, 3% non-White men, 1% non-White women. Control group viewed fractal images.
- Stage 3 post-exposure: repeat Stage 1 task. Analysis: mixed-model multinomial logistic regression with exposure (before vs after) as fixed factor; random intercepts and slopes per participant. A treatment vs control comparison assessed differential change.
Statistics: Two-sided tests; non-parametric permutation tests (10^5 shuffles) for means; ANOVAs with sphericity checks and Greenhouse–Geisser corrections; Levene’s/Welch corrections for variance issues. Linear mixed models used for temporal trends. Effects reported with P values, CIs, and effect sizes where applicable.
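The permutation test of a mean against chance can be sketched as a two-sided sign-flip test; the 10^5 shuffle count matches the description above, while the implementation details (sign-flipping, add-one correction) are a standard construction rather than the authors' exact code:

```python
import random

def permutation_test_vs_chance(proportions, chance=0.5,
                               n_perm=100_000, seed=0):
    """Two-sided sign-flip permutation test of mean(proportions) vs chance.

    Under the null, each participant's deviation from chance is
    symmetric around zero, so randomly flipping the sign of each
    deviation generates the null distribution of the mean.
    """
    rng = random.Random(seed)
    devs = [p - chance for p in proportions]
    observed = abs(sum(devs) / len(devs))
    hits = 0
    for _ in range(n_perm):
        flipped = [d if rng.random() < 0.5 else -d for d in devs]
        if abs(sum(flipped) / len(flipped)) >= observed:
            hits += 1
    # Add-one correction avoids reporting p = 0 exactly.
    return (hits + 1) / (n_perm + 1)
```

For a sample whose mean clearly departs from chance the returned p-value is small; for a sample at exactly chance it approaches 1.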
Data and code: GitHub repository provides datasets and analysis code (https://github.com/affective-brain-lab/BiasedHumanAI).
- Baseline human bias in emotion aggregation (Level 1): Participants categorized 53.08% of arrays as "more sad" (permutation test vs 50%: P = 0.017; d = 0.34; 95% CI for P("more sad") = 0.51–0.56), with the bias larger in block 1 (56.72%) than in blocks 2–4 (51.87%) (P = 0.002; d = 0.46).
- AI trained on human labels amplified bias: A CNN trained on human classifications labeled 65.33% of test arrays as "more sad" (vs 50% chance; P < 0.001; 95% CI = 0.60–0.71), significantly exceeding the human bias (53%; P < 0.001; d = 1.33). When trained on accurate labels with a 3% induced bias, model bias matched the 3% input; when trained on random labels with a 3% induced bias (maximal noise), the model classified 100% of arrays as sad (a bias of 50 percentage points), tying the degree of amplification to label noise.
- Human–AI interaction increased human bias over time: Baseline sad classification rate was 49.9% (±1.1% s.e.), rising to 56.3% (±1.1% s.e.) during AI interaction (P < 0.001; d = 0.84), increasing from 50.72% in first block to 61.44% in last (linear mixed model b = 0.02; t(50) = 6.23; P < 0.001). Decision changes were more frequent when the AI disagreed (32.72% ± 2.3% s.e.) vs agreed (0.3% ± 0.1% s.e.) (P < 0.001; d = 1.97).
- Human–human interactions showed no amplification: Participants changed responses less when disagreeing with humans (11.27% ± 1.4% s.e.) vs AI (32.72%) (P < 0.001; d = 1.07). No learned bias versus baseline (51.45% vs 50.6%; P = 0.48; d = 0.10), and no increase over time (b = 0.003; t(50) = 1.1; P = 0.27).
- Perception and input both matter: When AI was labeled as human, induced bias increased over blocks (50.5% to 55.28%; b = 0.01; t(50) = 3.14; P < 0.001). When human outputs were labeled as AI, induced bias also increased (49.0% to 54.6%; b = 0.01; t(50) = 2.85; P = 0.004). ANOVA on induced bias showed interactions of input×time (F(4.55,892.35) = 3.40; P = 0.006) and label×time (F(4.55,892.35) = 2.65; P = 0.026), with main effects of input (F(1,196) = 9.45; P = 0.002) and time (F(4.55,892.35) = 14.80; P < 0.001). Decision-change ANOVA: input (F(1,196) = 7.05; P = 0.009) and label (F(1,196) = 76.30; P < 0.001) increased change likelihood (robust to Welch corrections).
- RDK task: Interacting with biased algorithm increased human bias relative to baseline (M_bias(biased AI) = 2.66 vs baseline 0.62; P = 0.002; d = 0.28) and relative to interacting with accurate (M_bias = 1.26; P = 0.006; d = 0.25) or noisy (M_bias = 1.15; P = 0.006; d = 0.25) algorithms; accurate and noisy did not differ from baseline (P > 0.28). In a follow-up with biased AI across five blocks, AI-induced bias increased linearly over time (b = 1.0; t(50) = 2.99; P = 0.004).
- Accuracy improved with accurate AI: AI-induced accuracy change (baseline error − interaction error) was positive with accurate AI (M = 1.55; P < 0.001; d = 0.32), larger than with biased (M = 0.03; P < 0.001; d = 0.33) and noisy (M = 0.67; P = 0.01; d = 0.22). Follow-up with accurate AI across five blocks showed accuracy gains increasing over time (M = 3.55; P < 0.001; b = 0.84; t(50) = 5.65; P < 0.001).
- Participants underestimated the biased AI's influence: Reported perceived influence was higher for the accurate vs biased (P < 0.001; d = 0.57) and accurate vs noisy (P < 0.001; d = 0.58) algorithms. However, actual influence magnitudes (z-scored across algorithms, and as relative change vs baseline) did not differ between biased-AI-induced bias and accurate-AI-induced accuracy (P = 0.90 and P = 0.89, respectively).
- Stable Diffusion exposure amplified social bias: In treatment, selecting White men as financial managers increased from 32.36% to 38.20% post-exposure, with significant contrasts vs White women (b = 0.26; t = 2.08; P = 0.04), Asian women (b = 0.47; t = 3.79; P < 0.001), Black men (b = 0.65; t = 3.04; P = 0.004), and Black women (b = 0.47; t = 2.46; P = 0.02); no significant difference vs Asian men (P = 0.051). Control (fractals) showed no significant exposure effect (F(5,67) = 1.69; P = 0.15). Treatment vs control difference in change for selecting White men was significant (P = 0.02; d = 0.46).
The results demonstrate a feedback mechanism wherein AI systems amplify small biases present in training data and, through interaction, humans learn and internalize these biases, increasing their own error. In contrast, human–human interactions did not produce comparable amplification, likely because human outputs are noisier and humans rely on broader experiential priors, making them less sensitive to subtle biased signals. AI’s lower noise and tendency to leverage weak but predictive shortcuts enable strong human learning even when the signal is biased; if AI is perceived as superior, adopting its bias can be a rational update from the learner’s perspective. These effects generalize across modalities (emotion, motion, social judgement) and algorithm types (CNN, rule-based biased/accurate algorithms, real-world latent diffusion). Notably, a popular generative model (Stable Diffusion) over-represented White men for the prompt "financial manager," and brief exposure biased subsequent human social judgements, highlighting risks beyond direct, deliberate AI use—passive consumption of AI-generated content can shift beliefs. Participants underestimated the biased algorithm’s impact, implying increased susceptibility and limited metacognitive awareness of AI’s influence. The findings underscore the broader societal relevance: biased AI can propagate and magnify human biases in domains like hiring, medicine, and education. Conversely, accurate AI improved human accuracy, revealing potential benefits when algorithmic bias is reduced and performance is reliable. Together, the work advances understanding of human–AI co-adaptation and the conditions under which AI enhances or harms human judgment.
This study identifies and characterizes a human–AI feedback loop that amplifies bias: AI trained on slightly biased human data magnifies that bias, and humans interacting with the biased AI learn to become more biased over time. The amplification is stronger than in human–human interactions and depends both on the AI’s outputs (low noise, leveraging subtle biases) and on how humans perceive AI. The phenomenon extends to real-world generative AI, where exposure to biased outputs shifts social judgements. Importantly, interacting with accurate AI improves human accuracy, indicating that reducing algorithmic bias can yield positive effects on human decision-making. Implications include the need for developers, policymakers, and users to recognize AI’s influence on human beliefs, implement bias mitigation and auditing, and consider awareness-raising interventions. Future work should assess the persistence and durability of AI-induced bias, explore moderators (exposure duration, bias salience, individual differences), and test strategies (e.g., transparency, debiasing prompts, training) to reduce bias transmission while preserving AI’s beneficial impacts.
- Persistence over time is unknown: The longevity of AI-induced bias was not measured beyond the experimental blocks; endurance, decay, or consolidation of effects requires longitudinal testing.
- Sampling and ecological validity: Convenience samples recruited online via Prolific were not representative; tasks were controlled lab-style paradigms (emotion aggregation, RDK), which may not capture full complexity of real-world decisions.
- Self-selection bias: Online recruitment may attract participants with particular interest in AI/decision-making; study adverts attempted to mitigate, but residual bias is possible.
- Domain and stimuli constraints: Emotion task used morphed Ekman faces; RDK is abstract motion estimation; social judgement relied on curated headshots and a single occupational prompt; generalization to other tasks and domains needs testing.
- Label noise and model specifics: Bias amplification demonstrations used specific CNN architectures and label manipulations; different models/training regimes may yield different amplification dynamics.
- Ground truth ambiguity in social judgements: There is no definitive ground truth for the "financial manager" category selection; analyses rely on demographic benchmarks rather than objective correctness.
- Brief exposure design: Stable Diffusion exposure was constrained (~1.5 s per trial) to mimic typical online viewing; different exposure durations and contexts may modulate effects.
- Perception manipulations: Conditions labeling AI/humans may interact with expectations and trust; broader measures of perceptions and trust were not exhaustively explored.