Simple autonomous agents can enhance creative semantic discovery by human groups

A. Ueshima, M. I. Jones, et al.

To explore how groups discover ideas, this study had 1,875 participants search a 20,000-noun semantic space. Groups outperformed isolated individuals, and simple bots that shared the most similar noun among their neighbors' guesses boosted group performance in easy-to-navigate spaces. Research conducted by Atsushi Ueshima, Matthew I. Jones, and Nicholas A. Christakis.

Introduction
The study investigates how human groups navigate large semantic spaces to discover high-value ideas and whether simple autonomous agents can enhance this process. Prior work highlights tensions between independence and interdependence in collective intelligence: too much interdependence can cause groupthink and premature convergence, while too little coordination hampers exploitation of promising areas. Traditional experimental research has often neglected relationships among candidate ideas (e.g., semantic similarity), even though in real life similar ideas tend to have similar value and are easier to discover via marginal improvements. The authors develop a word-search game mimicking realistic idea exploration, examine performance of isolated individuals versus networked groups, and test interventions using simple bots with interpretable strategies. They also manipulate problem difficulty via decoy peaks that distort the fitness landscape to study how AI assistance interacts with landscape ruggedness.
Literature Review
The paper situates its work within research on collective intelligence, social learning, and networked decision-making. It references studies on the balance of exploration and exploitation in groups, the effects of network structure, learning strategies, and group size on collective performance, and phenomena like social herding and groupthink that can undermine crowd wisdom. Prior work shows more connected networks can aid convergence on easy problems and that collective intelligence emerges from group-level properties not reducible to individual traits. The authors also draw on natural language processing for measuring creativity and vector-space models of semantic representation, proposing that maintaining naturalistic semantic correlations in experimental tasks can improve generalizability.
Methodology
The authors created a word-search game in which participants searched for high-reward nouns from a curated set of 20,000 frequently used English nouns. Word similarities were computed using word2vec embeddings (300-dimensional vectors), and point values were assigned so that semantically similar nouns received similar rewards. A single target noun per game had the highest value (20,000), and other nouns were assigned points as a function of cosine similarity to the target; values shown to participants were multiplied by a random factor between 1 and 3 to discourage meta-strategies.
Participants: 1,875 participants recruited via Amazon Mechanical Turk were organized into 125 groups of 15. Each group played five sequential games (conditions), each lasting 25 rounds. In each round, participants submitted one lowercase, singular noun; invalid nouns received zero points. After each round, participants saw their noun’s point value and the latest answers and values from their immediate network neighbors. They also saw the highest-point-value noun observed so far among themselves and neighbors. Monetary payoffs were based on the group’s highest point value per game to align incentives and reduce producer–scrounger dynamics. Groups were assembled on a website implemented using Breadboard, with tutorial filters, bot detection, and attention checks; participants were compensated $3 plus up to $11 in bonuses; experiments lasted ~40 minutes.
Networks and bots: Baseline social networks among the 15 humans were generated using an Erdős–Rényi model with ~20% tie saturation. In bot-embedded conditions, two bot nodes were added, each connected to disjoint subsets of human nodes (7 and 4 neighbors, respectively, selected to match expected degree distributions). Participants were not told whether neighbors were human or bot.
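The network construction described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' Breadboard implementation: the function name `build_network`, the node labels, and the uniform random choice of each bot's neighbors are all hypothetical. The study selected bot neighbors to match expected degree distributions, which this sketch does not reproduce.

```python
import random
from itertools import combinations


def build_network(n_humans=15, p_tie=0.2, bot_degrees=(7, 4), seed=None):
    """Return an adjacency dict for humans 0..n_humans-1 plus bots 'bot0', 'bot1', ...

    Assumption: bot neighbors are drawn uniformly at random without replacement,
    which guarantees the disjoint neighbor sets described in the text.
    """
    rng = random.Random(seed)
    adj = {h: set() for h in range(n_humans)}

    # Erdos-Renyi G(n, p): each human pair is tied independently with probability p_tie.
    for a, b in combinations(range(n_humans), 2):
        if rng.random() < p_tie:
            adj[a].add(b)
            adj[b].add(a)

    # Shuffle the humans once, then hand each bot its own slice of the shuffled list.
    pool = list(range(n_humans))
    rng.shuffle(pool)
    start = 0
    for i, degree in enumerate(bot_degrees):
        bot = f"bot{i}"
        adj[bot] = set(pool[start:start + degree])
        for h in adj[bot]:
            adj[h].add(bot)  # keep the graph undirected
        start += degree
    return adj


network = build_network(seed=1)
print(sorted(network["bot0"]), sorted(network["bot1"]))
```

Removing the bot entries and their edges from `network` gives the no-bot condition, and removing every edge gives the solo condition.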
Three bot strategies were tested: (1) the most-similar bot chose, from its neighbors’ latest guesses, the noun with the highest average cosine similarity to the other guesses; (2) the least-similar bot chose the noun with the lowest average similarity; (3) the random bot chose uniformly at random from its neighbors’ latest nouns. Bots immediately propagated their selected noun to the other bot, which could then share it with its own neighbors in the same round, enabling rapid dissemination across distant network regions. Bots did not use target information; selections were based solely on human-provided nouns and their semantic relations.
Landscape manipulation (decoys): To simulate ruggedness and local optima, the authors introduced decoy nouns whose neighborhoods were artificially boosted in value while keeping the target as the global maximum. Two parameters varied: decoy height (tall vs short) and width (number of boosted neighbors: wide vs narrow), yielding five landscapes: tall/wide, tall/narrow, short/wide, short/narrow, and no decoy. The tall/narrow and short/wide landscapes were tuned to have comparable early-stage misleading potential.
Decoy treatments were between-groups: each of the five landscapes had 25 unique groups. Bot treatments were within-groups: each group experienced five games across conditions (least-similar bot, most-similar bot, random bot, no bot, solo). In solo, all edges were removed; in no-bot, the two bots and their edges were removed; in bot conditions, humans remained networked and bots added edges.
Targets: Eighteen target nouns were chosen to be spread out and similarly obscure in the embedding space (e.g., recce, cartography, investiture, comedown, hesitance, decile, shoehorn, edutainment, narrowness, activewear, epee, doyenne, actuation, sarcoma, braggadocio, jowl, fratricide, translocation). These also served as decoys in other games, with target–decoy pairs chosen to be semantically distant.
Statistical analysis: Primary dependent variables were normalized (mean 0, SD 1). The main measure of group creativity was average cosine similarity to the target across all guesses in a game. Bayesian multilevel regression models (via brms/RStan) included varying intercepts for groups and varying intercepts and slopes for target nouns, with bot condition and landscape type (and their interaction) as key predictors. Highest density intervals (HDIs) were reported; MCMC used 10 chains, 1000 warm-up, 1500 post-warm-up iterations; Rhat < 1.05. Additional analyses examined correlations between round t point values and similarity of successive guesses, and compared noun quality shared by different bot types.
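The three bot selection rules and the main creativity measure described above can be illustrated with a small pure-Python sketch. It assumes that "average cosine similarity" means each candidate noun's mean similarity to the other neighbors' latest nouns. The function names (`bot_choice`, `mean_similarity_to_target`) and the 2-dimensional toy vectors are hypothetical stand-ins for the study's 300-dimensional word2vec embeddings.

```python
import math
import random


def cosine(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def bot_choice(nouns, vectors, strategy, rng=random):
    """Pick one noun from a bot's neighbors' latest guesses.

    nouns: list of neighbor nouns; vectors: dict mapping noun -> embedding.
    The bot never sees the target, only the nouns humans submitted.
    """
    if strategy not in ("most_similar", "least_similar", "random"):
        raise ValueError(f"unknown strategy: {strategy}")
    if strategy == "random":
        return rng.choice(nouns)
    unique = list(dict.fromkeys(nouns))  # drop duplicate guesses, keep order
    if len(unique) < 2:
        return unique[0]

    def avg_sim(noun):
        # Mean similarity of this noun to every other distinct neighbor noun.
        others = [m for m in unique if m != noun]
        return sum(cosine(vectors[noun], vectors[m]) for m in others) / len(others)

    pick = max if strategy == "most_similar" else min
    return pick(unique, key=avg_sim)


def mean_similarity_to_target(guesses, target, vectors):
    """Average cosine similarity to the target across a list of guesses."""
    return sum(cosine(vectors[g], vectors[target]) for g in guesses) / len(guesses)


# Made-up toy embeddings, for illustration only.
toy = {"cat": (1.0, 0.0), "dog": (0.8, 0.6), "car": (0.0, 1.0)}
latest = ["cat", "dog", "car"]
print(bot_choice(latest, toy, "most_similar"))   # dog
print(bot_choice(latest, toy, "least_similar"))  # car
```

In the toy data, "dog" sits between the other two nouns, so it has the highest average similarity and is what a most-similar bot would pass along.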
Key Findings
• Groups outperformed isolated individuals: using mean cosine similarity to the target as the dependent variable, solo performance was significantly worse than the no-bot group baseline (β_solo = −0.70; 95% HDI [−1.10, −0.33]).
• Bots showed no overall main effects versus no-bot groups, but interactions with landscape mattered. The most-similar bot improved performance in easier landscapes: no-decoy (β_Most:no decoy = 0.56; 95% HDI [0.05, 1.07]) and tall/narrow (β_Most:tall/narrow = 0.50; 95% HDI [0.00, 1.03]), with a positive trend in short/narrow (β_Most:short/narrow = 0.44; 95% HDI [−0.08, 0.97]; 90% HDI [0.00, 0.87]).
• Direct connection to a bot did not yield significant performance differences relative to non-connected participants, suggesting network-wide effects rather than localized benefits.
• Nouns propagated by most-similar bots were more similar to the target than those propagated by least-similar bots in easier landscapes: no-decoy (β_Least:no decoy = −0.85; 95% HDI [−1.42, −0.25]), short/narrow (β_Least:short/narrow = −0.81; 95% HDI [−1.45, −0.20]), tall/narrow (β_Least:tall/narrow = −0.81; 95% HDI [−1.40, −0.20]). Random bots underperformed most-similar bots in tall/narrow (β_random:tall/narrow = −0.85; 95% HDI [−1.44, −0.23]) and showed a weaker trend in no-decoy (β_random:no decoy = −0.47; 95% HDI [−1.04, 0.14]; 90% HDI [−0.98, 0.01]).
• Most-similar bots did not pull groups toward decoys; in tall/narrow landscapes, least-similar and random bots increased participants’ similarity to the decoy relative to most-similar bots (β_Least:tall/narrow = 0.67; 95% HDI [0.05, 1.31]; β_random:tall/narrow = 0.65; 95% HDI [0.02, 1.26]).
• Wide decoy peaks hindered semantic alignment and slowed progress: correlation between round t point value and similarity of successive guesses was lower in wide vs narrow landscapes (β_wide = −0.34; 95% HDI [−0.62, −0.05]); no significant difference between narrow and no-decoy (β_No decoy = −0.06; 95% HDI [−0.45, 0.35]). Social information improved alignment versus solo (β_solo = −0.33; 95% HDI [−0.52, −0.16]).
• Despite landscape difficulty, maximum achieved similarities (best guesses) did not differ meaningfully across landscapes; groups were less able to capitalize on occasional high-value discoveries in wide landscapes.
• Solo vs group creativity: individuals’ performance in solo modestly correlated with their performance in social conditions (Pearson r = 0.18–0.23), much weaker than correlations among social conditions (r = 0.38–0.54), suggesting partially distinct traits for solo and group creativity. Solo participants explored less and submitted less semantically distant nouns.
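The alignment measure behind the wide-decoy finding, the correlation between a round's point value and the similarity of successive guesses, might be computed along the following lines. This is a sketch under an assumption the summary does not spell out: the point value in round t is paired with the cosine similarity between the guesses in rounds t and t+1, so that a high correlation means a participant stays semantically close after a rewarding guess and moves away after a poor one. The function names and toy numbers are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def pearson(xs, ys):
    """Pearson correlation; assumes neither sequence is constant."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (sd_x * sd_y)


def alignment(points, guess_vectors):
    """Correlate points[t] with the similarity between guess t and guess t+1.

    points: point value earned in each round.
    guess_vectors: embedding of the noun guessed in each round.
    The final round has no successor, so its point value is not used.
    """
    sims = [
        cosine(guess_vectors[t], guess_vectors[t + 1])
        for t in range(len(guess_vectors) - 1)
    ]
    return pearson(points[: len(sims)], sims)


# Made-up example: the player jumps far after a low score and stays put after high ones.
toy_points = [10, 50, 90, 95]
toy_vectors = [(1.0, 0.0), (0.0, 1.0), (0.6, 0.8), (0.6, 0.8)]
print(round(alignment(toy_points, toy_vectors), 3))
```

In a landscape where wide decoy peaks weaken the link between semantics and reward, one would expect this correlation to drop, which is the pattern the finding reports.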
Discussion
The findings show that simple, transparent autonomous agents can enhance collective creative discovery by shaping information flow in human networks. Specifically, bots that disseminate the most semantically similar neighbor ideas help groups converge on rewarding regions when the semantic landscape is coherent (no decoys or narrow decoy peaks). This suggests that humans have an intrinsic ability to navigate such landscapes, and most-similar bots amplify this ability by propagating high-quality ideas and reducing noise. In contrast, wide decoy peaks disrupt the mapping between semantics and rewards, impeding the formation of coherent mental models and nullifying the bot’s advantage. The results emphasize that social information itself improves alignment and exploration compared to solitary search. Importantly, most-similar bots did not mislead groups toward decoys, whereas least-similar and random bots sometimes did in certain landscapes. The study also highlights that individual creativity in solo contexts only weakly predicts performance in groups, reinforcing that collective intelligence emerges from group-level dynamics and information structure. These insights inform design of human–AI hybrid systems: low-cost, local, interpretable agents can boost group creativity without sophisticated global knowledge, but their effectiveness depends on problem structure.
Conclusion
Embedding simple, interpretable bots in human social networks can improve collective semantic discovery, particularly in easier, coherent landscapes where semantic similarity corresponds to reward structure. The most-similar bot strategy effectively propagates high-value ideas and supports group convergence without increasing susceptibility to local optima. The work contributes a naturalistic, NLP-based experimental paradigm for studying group creativity and social learning and clarifies conditions under which AI assistance is beneficial. Future research should examine varying group sizes, alternative network structures, richer idea representations (e.g., sentences and non-English languages), and more sophisticated AI agents (e.g., large language models) to assess broader impacts, as well as explore design choices like bot–bot and human–human cross-region sharing and mechanisms to mitigate potential downsides such as ideological echo amplification.
Limitations
Although the sample was relatively large, the mixed design and finite sample sizes limit power and raise the possibility of false discoveries, particularly in exploratory analyses. Bots necessarily altered network connectivity by adding edges, so observed effects may partially reflect changes in network structure rather than bot behavior alone. The task restricted responses to nouns and used a specific embedding (word2vec), potentially limiting generalizability to other semantic forms or domains. The decoy manipulation simplifies ruggedness to one or two peaks and may not capture the complexity of real-world ideation landscapes. Participants were recruited from an online platform with brief rounds and constrained incentives, and strategic or free-riding behavior may have influenced group dynamics. Some preregistered analyses focused on averages rather than best solutions; effects on best guesses were not significant.