ChatGPT decreases idea diversity in brainstorming


L. Meincke, G. Nave, et al.

This research by Lennart Meincke, Gideon Nave, and Christian Terwiesch uncovers a fascinating paradox in brainstorming: while ChatGPT fuels individual creativity, it simultaneously dampens the diversity of ideas generated in group settings. Discover the implications of this trade-off in their in-depth study.

Introduction
This Matters Arising examines whether reliance on ChatGPT, while improving the creativity of individual ideas in brainstorming, reduces the diversity of ideas at the set level, a critical element of effective brainstorming. Building on Lee and Chung (2024), who found that ChatGPT enhances the average creativity (originality and appropriateness) of individually submitted ideas across multiple creative tasks, the authors argue that originality is also a property of an idea set, often referred to as variety or diversity. They illustrate the distinction with hypothetical idea sets and with empirical observations from Lee and Chung’s experiment 2a, noting clustered themes (for example, many “sprinkler” ideas), and pose the central question: does using ChatGPT inadvertently diminish the diversity of the overall set of ideas?
Literature Review
The article situates its argument within a growing literature on AI and creativity. Lee and Chung (2024) report that ChatGPT augments individual-level creativity across tasks (gift ideas, toy designs, repurposing household items). Torrance (1968) frames originality as the frequency of statistically infrequent responses, emphasizing set-level novelty. Recent work (e.g., Doshi & Hauser, 2024) shows that generative AI can enhance individual creativity while reducing collective diversity of novel content. Additional related studies discuss LLMs for idea generation, directed diversity in crowd ideation, and methods to increase AI idea variance, indicating the importance of measuring diversity beyond individual idea ratings.
Methodology
The authors conducted additional analyses of the publicly available data from Lee and Chung’s experiments. They operationalized diversity at the set level as the ratio of unique idea concepts to the total number of ideas in a condition. Ideas were embedded using Google’s Universal Sentence Encoder, and pairwise cosine similarities were computed. Two ideas were defined as overlapping if their cosine similarity exceeded 0.8; unique concepts are groups of one or more ideas that do not overlap with any other ideas in the set. The primary metric was the percentage of unique concepts per condition. Robustness checks included varying the overlap threshold (0.75 and 0.85) and employing alternative diversity metrics: aggregated comparisons of similarity scores, pairwise similarity comparisons, and mean edge distance in minimum spanning trees. Results were summarized across Lee and Chung’s experiments (1, 2a, 2b, 3, 4, 5). Figure 1 reports percentages of unique ideas with 95% confidence intervals; statistical comparisons, including Bonferroni-corrected confidence intervals for differences between conditions, are provided in Supplementary Table 1. Data and analysis code are available via the OSF repositories referenced in the paper.
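The primary metric described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' actual analysis code: it assumes idea embeddings have already been computed (e.g., with a sentence encoder), treats ideas whose cosine similarity exceeds the threshold as overlapping, groups overlapping ideas into concepts via connected components, and reports the share of distinct concepts. The function name `unique_concept_share` is hypothetical.

```python
import numpy as np

def unique_concept_share(embeddings, threshold=0.8):
    """Illustrative sketch of the set-level diversity metric: ideas whose
    pairwise cosine similarity exceeds `threshold` are merged into one
    concept (connected components of the overlap graph), and the metric is
    the number of distinct concepts divided by the number of ideas."""
    X = np.asarray(embeddings, dtype=float)
    X = X / np.linalg.norm(X, axis=1, keepdims=True)  # unit-normalise rows
    sim = X @ X.T                                     # pairwise cosine similarities
    n = len(X)

    # Union-find over the "overlaps" graph to form concept groups.
    parent = list(range(n))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path compression
            i = parent[i]
        return i

    for i in range(n):
        for j in range(i + 1, n):
            if sim[i, j] > threshold:
                parent[find(j)] = find(i)

    concepts = {find(i) for i in range(n)}
    return len(concepts) / n

# Toy example: two identical idea embeddings plus one orthogonal idea
# collapse into 2 concepts out of 3 ideas.
share = unique_concept_share([[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
```

Lowering the threshold to 0.75 or raising it to 0.85, as in the paper's robustness checks, simply changes which idea pairs count as overlapping.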
Key Findings
Across all five experiments, ChatGPT-aided idea sets exhibited lower diversity than comparison conditions without AI assistance. In the primary analysis (threshold 0.8), AI-aided sets were less diverse in all five experiments, with statistically significant differences in four of five comparisons; the non-significant result in experiment 4 is attributed to a likely ceiling effect. These effects persisted under alternative overlap thresholds (0.75, 0.85) and across alternative diversity metrics, with 37 of 45 cross-metric comparisons showing statistically significant differences. An extreme case appears in experiment 2b (toy ideas combining a brick and a fan): only 6% of ChatGPT-only ideas were unique versus 100% of human ideas, with many AI ideas overlapping and nine sharing the identical name (“Build-a-Breeze Castle”). In Lee and Chung’s experiment 2a, 20 of 96 ChatGPT-aided ideas included the word “sprinkler,” compared with 7 in the web-search condition and 12 in the human-only condition, illustrating AI-driven thematic clustering.
Discussion
The analyses demonstrate that while ChatGPT boosts the creativity of individual ideas, its widespread use in brainstorming narrows the collective semantic space explored, yielding less diverse sets of ideas. This addresses the key question by showing that AI can promote originality at the individual level yet reduce variety at the set level—a crucial attribute for effective brainstorming and problem exploration. The findings align with prior evidence in creative writing and extend it to brainstorming and concept generation tasks. Practically, the value of brainstorming derives from generating a mosaic of distinct, non-overlapping ideas; AI-enabled ideation, without safeguards, risks producing clustered or repetitive concepts. The results suggest that maximizing the value of AI in creative problem-solving requires techniques that deliberately foster diversity across outputs.
Conclusion
This Matters Arising contributes evidence that ChatGPT reduces idea diversity at the set level across multiple brainstorming experiments, even as it enhances individual idea creativity. The authors’ reanalysis, robust across thresholds and metrics, generalizes concerns about collective diversity to brainstorming contexts. Future work should develop and test prompting strategies and system designs that increase AI idea variance and direct exploration toward more heterogeneous regions of the solution space, improving the balance between individual creativity gains and collective diversity needs.
Limitations
The paper does not present a formal limitations section. Noted constraints include a likely ceiling effect in experiment 4 that may have reduced detectable differences. The analyses rely on secondary data from Lee and Chung (2024). Diversity measurement depends on semantic similarity thresholds and embedding choices (Universal Sentence Encoder), though the authors conduct sensitivity analyses with alternative thresholds and metrics to assess robustness.