Psychology
A computational reward learning account of social media engagement
B. Lindström, M. Bellander, et al.
The study investigates whether engagement on social media platforms can be explained by reinforcement learning mechanisms driven by social rewards (likes). Motivated by the widespread use of social media and its comparison to a Skinner Box, the authors test the hypothesis that users learn to maximize social rewards by adapting the timing of their posts according to the average reward rate, balancing effort costs and the opportunity cost of time. Drawing on theory that links average reward rate to response vigor, they predict that higher subjective reward rates should lead to shorter latencies between posts, extending classic learning principles from animal behavior to real-world human social media activity across much longer timescales.
Prior work suggests that likes act as social rewards that engage neural systems similar to those processing primary and monetary rewards. Neuroimaging studies show overlapping circuitry for social and non-social rewards, and behavioral research indicates that receiving likes increases satisfaction and activity, consistent with reward anticipation. Social comparison affects the subjective value of likes, paralleling relative valuation in non-social rewards. However, much prior research relies on self-report and laboratory studies, leaving a gap in direct behavioral evidence for reward learning in real-world social media use. Previous quantitative analyses have yielded mixed results, and RL methods have often been used to optimize platforms rather than to model human psychological mechanisms.
The research comprises three studies combining large-scale behavioral data and an experiment with computational modeling based on reinforcement learning (RL) theory.
- Study 1 (Instagram): Anonymized dataset from a prior study of users in a 2014 Instagram photography contest; users with fewer than 10 posts excluded. Final sample: 851,946 posts from 2,039 users. The setting involves posting images and receiving likes.
- Study 2 (three forums): Web-scraped, anonymized data from topic-focused forums with prolific image-based threads: Men’s fashion (styleforum.net), Women’s fashion (forum.purseblog.com), and Gardening (garden.org). Users with fewer than 10 image posts excluded. Final sample: 190,721 posts from 2,127 users (Men’s fashion: N=541; Women’s fashion: N=773; Gardening: N=813). Analyses were also checked including text-based posts (qualitatively similar results).
- Study 3 (experiment): Online experiment on Amazon Mechanical Turk (n=176). Participants freely posted memes during a 25-minute session (total posts=2,206). Likes (0–19) served as social rewards, ostensibly from other users. The average number of likes was experimentally manipulated between session halves (low=0–9 vs high=10–19 likes per post; direction of change counterbalanced). Participants could also like others’ posts.
Computational modeling and analyses:
- Conceptualization: Posting is treated as free-operant behavior with response latency T_post (time between posts). RL theory predicts response vigor relates to the average reward rate due to opportunity costs.
- Model-independent tests: (1) Response rates vs reward rates evaluated with hyperbolic (quantitative law of effect) vs linear fits. (2) Granger causality tests whether past likes predict T_post beyond autoregressive structure; parameters tuned using simulations to detect learning-driven causality and avoid false positives from non-learning models.
- RL model: A policy-gradient variant of R-learning with continuous action space (response latency). The model updates a posting policy (mean of an exponential distribution for latency) to maximize average net reward rate, balancing effort cost of responding and opportunity cost of time. Key elements: prediction error δ = experienced reward − (effort cost depending on latency + opportunity cost proportional to average reward rate R). The policy and R are updated via gradient ascent using step size α. Three free parameters per user: learning rate (α), initial policy (P), and effort cost sensitivity (C). A small “Pavlovian” term incorporates the momentary effect of average reward rate on latency. Parameters are estimated individually; model fit compared to a null model with a constant average latency (no learning) using Akaike weights and Bayesian model selection.
- Simulations: Generative simulations using median best-fitting parameters assessed whether the model reproduces observed effects, with likes generated from simple Poisson processes.
- Statistical analysis: Mixed-effects models (log-linear for T_post), with random intercepts per user; predictors standardized within individuals. Covariates included likes at previous post, post number, and weekday. Additional robustness checks used cluster-corrected SEs.
- Individual differences: k-means clustering on standardized, log-transformed individual parameter estimates (α, P, C) determined computational phenotypes; optimal cluster number selected via NbClust.
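The RL model described in the list above can be illustrated with a minimal sketch. It is not the authors' implementation. The functional forms are assumptions: effort cost is taken as C / latency, opportunity cost as R × latency, and the policy is parameterized by the log of the mean latency so that the mean stays positive. The "Pavlovian" term is omitted, and the names `rl_update`, `sample_poisson`, and `simulate_posting` are hypothetical.

```python
import math
import random


def rl_update(log_mean, avg_reward, latency, likes, alpha, effort_cost):
    """One learning step after a post (assumed functional forms, see lead-in)."""
    mean_latency = math.exp(log_mean)
    # Prediction error: reward minus (effort cost + opportunity cost of time).
    delta = likes - (effort_cost / latency + avg_reward * latency)
    # Gradient of log p(latency) for an exponential policy, w.r.t. log(mean).
    grad = latency / mean_latency - 1.0
    new_log_mean = log_mean + alpha * delta * grad
    # Numerical safeguard (an assumption of this sketch, not from the source).
    new_log_mean = max(-5.0, min(5.0, new_log_mean))
    # The average reward rate R is updated with the same step size.
    new_avg_reward = avg_reward + alpha * delta
    return new_log_mean, new_avg_reward, delta


def sample_poisson(rng, rate):
    """Draw a Poisson count (Knuth's method); suitable for small rates."""
    threshold = math.exp(-rate)
    count, product = 0, rng.random()
    while product > threshold:
        count += 1
        product *= rng.random()
    return count


def simulate_posting(alpha, initial_policy, effort_cost, like_rate,
                     n_posts=200, seed=0):
    """Simulate posting latencies with likes drawn from a Poisson process."""
    rng = random.Random(seed)
    log_mean, avg_reward = math.log(initial_policy), 0.0
    latencies = []
    for _ in range(n_posts):
        # Sample the next latency from the current exponential policy.
        latency = max(rng.expovariate(1.0 / math.exp(log_mean)), 1e-3)
        likes = sample_poisson(rng, like_rate)
        log_mean, avg_reward, _ = rl_update(
            log_mean, avg_reward, latency, likes, alpha, effort_cost)
        latencies.append(latency)
    return latencies
```

Calling `simulate_posting(alpha=0.01, initial_policy=1.0, effort_cost=0.1, like_rate=5.0)` returns a list of simulated latencies in arbitrary time units. Runs with different `like_rate` values can then be compared, in the spirit of the generative simulations described above.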
- Reward sensitivity signatures:
  - Quantitative law of effect: Response rates followed a hyperbolic function of reward rate better than a linear function across datasets (mean R^2: Study 1=0.43; Study 2=0.37).
  - Granger causality: Likes Granger-caused posting latencies in all datasets (Study 1 Instagram: Z=23.65, p<0.001; Study 2 Men’s fashion: Z=3.94, p<0.001; Women’s fashion: Z=14.16, p<0.001; Gardening: Z=6.78, p<0.001).
- Model comparison and fit:
  - Study 1 (Instagram): RL model outperformed no-learning model for ~70% of users (mean AIC weight=0.70, 99% CI [0.68, 0.81]; t(2038)=23.1, p<0.0001; Bayesian exceedance probability xp=1). Robust to outliers and dataset partitioning. Individuals with more followers showed diminishing marginal utility of likes (habituation; Supplementary).
  - Study 2 (forums): RL model favored across all three platforms (pooled mean AIC weight=0.77, 99% CI [0.76, 0.79]; t(2126)=38.84, p<0.0001; xp=1). Robustness checks confirmed results.
  - Alternative models lacking key RL components, fixed/altered cost structures, foraging-based models, or models without instrumental policy adjustments fit worse than the RL model.
- Effect of subjective average reward rate (R) on posting latency (T_post):
  - Study 1 (Instagram; N_obs=851,946; N_users=2,039): Higher R predicted shorter T_post (β=-0.18, SE=0.003, t=-54.59, p<0.0001), implying ~18% shorter inter-post intervals under high vs low R (~8 hours reduction). Each 1% increase in R corresponded to ~0.34% (~5 min) shorter T_post. Effect stronger among users better fit by the RL model (interaction with AIC weight β=-0.04, SE=0.008, t=-5.5, p<0.0001).
  - Study 2: Men’s fashion (N_obs=36,139; N_users=541): β=-0.08, SE=0.016, t=-5.1, p<0.0001 (~8% shorter). Women’s fashion (N_obs=36,434; N_users=773): β=-0.16, SE=0.02, t=-7.1, p<0.0001 (~16% shorter). Gardening (N_obs=118,148; N_users=813): β=-0.18, SE=0.02, t=-12.09, p<0.0001 (~18% shorter). Per 1% increase in R: reductions of ~0.18%, 0.41%, and 0.38%, respectively.
  - Generative simulations using fitted parameters reproduced the observed R effects on T_post across platforms.
- Individual differences (computational phenotypes): Four clusters emerged from k-means on (α, P, C), ranging from 41% to 7% of users. Cluster 1 featured low learning rate (α) and weakest RL model fit (mean AICw=0.11), indicating relative insensitivity to rewards. Clusters 2 and 4 showed high responsiveness to rewards via different mechanisms (e.g., low effort cost with average α vs higher effort cost with higher α). Cluster assignments were not strongly tied to dataset (Cramér’s V=0.3).
- Experimental causality (Study 3; n=176): Manipulating likes caused changes in posting latencies: lower reward rate (0–9 likes/post) produced longer T_post than higher reward rate (10–19 likes/post) (β=0.109, SE=0.044, z=2.47, p=0.013; ~10.9% longer). Modeling subjective R for a subset (n=156) replicated the core pattern (β=0.28, SE=0.045, z=6.24, p<0.0001). Participants with more Instagram followers showed weaker like effects, paralleling diminished marginal utility observed in field data.
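The model comparisons reported above are expressed as Akaike weights, which convert the AIC values of competing models into relative support that sums to 1. The following is a minimal sketch of the standard formula, w_i = exp(-Δ_i / 2) / Σ_j exp(-Δ_j / 2), where Δ_i is each model's AIC minus the lowest AIC. The function name `akaike_weights` and the AIC values in the usage note are hypothetical and are not taken from the study.

```python
import math


def akaike_weights(aic_values):
    """Convert a list of AIC values into Akaike weights that sum to 1."""
    best = min(aic_values)
    # Relative likelihood of each model given its AIC difference from the best.
    relative = [math.exp(-0.5 * (aic - best)) for aic in aic_values]
    total = sum(relative)
    return [r / total for r in relative]
```

For example, `akaike_weights([100.0, 102.0])` assigns the larger weight to the first (lower-AIC) model. For a single user, a weight above 0.5 for the RL model would mean it is favored over the no-learning model in a two-model comparison.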
The findings demonstrate that social media posting behavior conforms to reinforcement learning principles: users adapt the timing of posts to maximize the average social reward rate, balancing effort and opportunity costs. This extends classic RL theories of response vigor from animal laboratory settings to real-world human behavior over much longer timescales. The consistency across platforms and the causal experiment strengthen the interpretation that social rewards drive posting latencies. The results align with dopamine-inspired theories linking tonic dopamine to average reward rate and response vigor, suggesting a potential neurobiological substrate for online engagement patterns. Practically, understanding social media engagement through RL offers predictive power for diverse online behaviors (e.g., moral outrage diffusion, norm expression), and indicates the potential for targeted interventions or design changes (e.g., altering effort costs) tailored to individual computational phenotypes.
Across large-scale observational datasets and an online experiment, the study shows that social media engagement reflects reward learning: higher average social reward rates lead to shorter posting intervals, and RL models quantitatively account for users’ posting dynamics. The work introduces computational phenotyping of social reward learning on social media and provides a mechanistic framework linking basic RL processes to complex online behaviors. Future research should incorporate user demographics and development, integrate negative feedback and punishments, extend models to content selection and action choice, and experimentally probe social comparison effects and other social factors in online environments.
- Observational datasets are correlational; while the experiment establishes causality for reward rate effects, some field inferences may still be subject to confounds.
- Anonymized real-world data precluded demographic analyses (e.g., age), limiting assessment of moderators of reward learning.
- Focus was on timing (latency) rather than content or type of actions; content-related learning and action selection were not modeled.
- Potential platform-specific issues (e.g., fake accounts/likes, economic motives on Instagram) could introduce noise or bias.
- Negative feedback (e.g., downvotes) and broader social network factors (reciprocity, social proximity) were not modeled in the main analyses.