Generative AI enhances individual creativity but reduces the collective diversity of novel content

Computer Science

A. R. Doshi and O. P. Hauser

Generative AI can boost individual creativity but risks narrowing collective novelty. In an online experiment where some writers received story ideas from an LLM, AI-assisted stories were rated more creative, better written, and more enjoyable—especially for less creative writers—yet showed greater similarity across stories. Research conducted by Anil R. Doshi and Oliver P. Hauser.
Introduction

Creativity underpins innovation and human expression across literature, art, and music. Recent advances in generative AI, particularly large language models (LLMs), challenge assumptions about the uniqueness and superiority of human-generated content. Prior work shows that generative AI can assist in joint storyline development, improve quality and efficiency in white-collar tasks, boost customer-support productivity, speed up programming, and enhance persuasion. Yet its impact on creativity, a fundamental human capacity, remains unclear.

This study investigates how access to generative AI affects creative written output, focusing on short (micro) fiction. Creativity is commonly assessed along two dimensions: novelty (departure from expectations) and usefulness (practicality and relevance). Adapting these constructs to short fiction, the novelty index captures how novel, original, and rare a story is; the usefulness index captures its appropriateness for the target audience, the feasibility of developing it into a complete book, and the likelihood that a publisher would develop it.

Generative AI could enhance creativity by providing springboard ideas, overcoming writer's block, and offering multiple starting points, potentially increasing creative output. Conversely, it could hamper creativity by anchoring writers to specific ideas and reducing variability, yielding more similar and possibly derivative outputs. These pathways need not be mutually exclusive.

To provide causal evidence, the authors conducted a preregistered, two-phase online experiment. In phase 1, 293 writers produced eight-sentence stories under randomized conditions, with or without access to GPT-4-generated three-sentence story ideas (one idea or up to five). In phase 2, 600 evaluators assessed the stories' novelty, usefulness, and emotional characteristics. The central question is how generative AI affects individual creativity and the collective diversity of outputs.

Literature Review

The paper situates its contribution within several streams: (1) generative AI’s role in collaborative content creation and joint AI-human storytelling; (2) productivity gains in routine and knowledge work, including customer support and software development; (3) enhanced persuasive messaging; and (4) creativity assessment traditions emphasizing novelty and usefulness (e.g., consensual assessment techniques). It also references work on micro-fiction and micro-videos, and theoretical perspectives on how novel works shift readers’ horizons of expectations. The study extends this literature by experimentally isolating the causal impact of exposure to LLM-generated ideas on human creative writing, including heterogeneity by inherent creativity and implications for content diversity and social dilemmas.

Methodology

Design: Preregistered, two-phase online experiment.

Participants (writers): Recruited via Prolific (UK-based, ≥95% approval rate, 100–1,000,000 prior submissions). Of 500 starters, after consent checks, dropouts, and preregistered exclusions (including 3 Human-only participants who admitted using AI), N=293 writers remained.

Creativity trait measure: Before writing, all writers completed the Divergent Association Task (DAT), providing 10 unrelated words. A DAT score (based on cosine distance between word embeddings, scaled 0–100) was computed for the 291 of 293 writers who supplied ≥7 valid words (mean 77.24, SD 6.48), serving as a measure of inherent creativity.

Story task: Writers were randomized to one of three topics: (i) an adventure on the open seas, (ii) an adventure in the jungle, or (iii) an adventure on a different planet. Instructions required exactly eight sentences, written in English, suitable for readers aged 15–24.

Experimental conditions (random assignment):

  • Human-only: No mention of or access to AI; text box with automatic length checks.
  • Human with one GenAI idea: Option to receive one GPT-4–generated three-sentence idea (prompt: “Write a three-sentence summary of a story about an adventure on the [topic].”).
  • Human with five GenAI ideas: Option to receive up to five distinct GPT-4–generated ideas, all visible at once; copy-pasting of AI text was disabled.

Writer self-evaluations: After writing, writers rated their story's novelty and usefulness (nine-point scales) and stylistic/emotional characteristics (enjoyment, well written, boring, funny, plot twist, effect on expectations), plus their views on profit shares and whether the story reflected their own ideas. Human-only participants were asked whether they had used AI; those answering "yes" were excluded per the preregistration.

Evaluators: A separate sample of N=600 Prolific participants (UK-based, ≥95% approval, non-overlapping with writers) each evaluated six stories (two per topic) in randomized order: 3,519 evaluations across 293 stories in total, with each story receiving 9–14 reviews (most 11–13). Evaluators rated novelty, usefulness, and emotional characteristics on nine-point scales. In a subsequent stage, they estimated the likelihood of human vs AI authorship (0–100%). Evaluators were then informed whether the writer had access to and used AI and, if so, were shown the AI idea text. They then rated ownership (the extent to which the story reflects the author's ideas and the strength of the author's ownership claim, averaged into an ownership index, α=0.92), indicated hypothetical profit shares if AI was used, and answered ethicality and disclosure questions.

Creativity indices: The novelty index averages ratings of novel, original, and rare (α=0.92); the usefulness index averages ratings of appropriate, feasible, and publishable (α=0.89). Emotional outcomes: enjoyable, well written, boring, funny, plot twist, and changed expectations (nine-point scales).

Similarity analyses: OpenAI embeddings were used to compute cosine similarity (×100; range 0–100). For each story, two measures were computed: (i) similarity to the average embedding of all other stories within the same condition, and (ii) similarity to a generative AI idea.
For Human-only writers and writers who did not request an idea, a random AI idea from the same topic pool was assigned to enable comparison; for writers who used AI, the first available idea was used.

Statistical analysis: OLS regressions, with robust SEs for writer-derived outcomes and robust SEs clustered at the evaluator level for evaluator-derived outcomes. Key independent variables: condition dummies with Human-only as the reference. Additional specifications included evaluator fixed effects, story-order fixed effects, story-topic fixed effects, and an indicator for accessing at least one AI idea. Intention-to-treat analyses were primary. Heterogeneity was assessed via interactions of the continuous DAT score with the condition dummies.

Preregistration and ethics: Preregistered on AsPredicted (ID 136723). Ethics approvals: UCL School of Management (UCLSOM-2023-002) and University of Exeter (1642263). Informed consent was obtained for both studies. Data and code: Dryad (https://doi.org/10.5061/dryad.qfttdz0pm).

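The similarity measures above can be illustrated with a minimal sketch. This is not the paper's analysis code: the toy 3-dimensional vectors stand in for the much higher-dimensional OpenAI embeddings, and `cosine_similarity_100` simply scales cosine similarity by 100 as described above.

```python
import math

def cosine_similarity_100(u, v):
    """Cosine similarity between two vectors, scaled by 100
    (roughly 0-100 for typical text-embedding vectors)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 100 * dot / (norm_u * norm_v)

def mean_vector(vectors):
    """Element-wise average of a list of equal-length vectors."""
    n = len(vectors)
    return [sum(vals) / n for vals in zip(*vectors)]

def similarity_to_others(story_idx, embeddings):
    """Measure (i): similarity of one story's embedding to the average
    embedding of all *other* stories in the same condition."""
    others = [e for i, e in enumerate(embeddings) if i != story_idx]
    return cosine_similarity_100(embeddings[story_idx], mean_vector(others))

# Toy 3-d "embeddings" for four stories in one condition (illustrative only).
condition_embeddings = [
    [0.9, 0.1, 0.2],
    [0.8, 0.2, 0.1],
    [0.7, 0.3, 0.2],
    [0.2, 0.9, 0.4],
]
score = similarity_to_others(0, condition_embeddings)
```

The DAT score relies on the complementary notion of cosine distance between word embeddings (roughly `100 - cosine_similarity_100(u, v)` in this scaling), averaged over word pairs.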
Key Findings

Adoption and use of AI ideas: 88.4% of writers in the AI-access conditions requested at least one idea. In the one-idea condition, 82 of 100 writers requested an idea; in the five-ideas condition, 93 of 98 requested at least one (mean 2.55 requests; 24.5% requested all five).

Creativity indices (evaluator ratings): Relative to Human-only, AI access increased novelty and usefulness, with larger effects when more ideas were available.

  • Novelty: One idea +5.4% (b=0.207, P=0.021). Five ideas +8.1% (b=0.311, P<0.001).
  • Usefulness: One idea +3.7% (b=0.185, P=0.039). Five ideas +9.0% (b=0.453, P<0.001). Five-ideas vs one-idea difference +5.1% (P=0.0012; one-idea mean 5.21).

Results were robust to evaluator FE, order FE, topic FE, and the AI-access indicator.

Writers' self-assessments: No statistically significant differences in novelty or usefulness across conditions.

Emotional characteristics (evaluator ratings): AI access increased several positive attributes.
  • Enjoyable: One idea b=0.216 (P=0.028); five ideas b=0.375 (P<0.001).
  • Plot twist: One idea b=0.384 (P<0.001); five ideas b=0.468 (P<0.001).
  • Better written: Five ideas b=0.372 (P<0.001).
  • Changed expectations: Five ideas b=0.251 (P=0.005).
  • Less boring: Five ideas b=−0.200 (P=0.049).
  • Funny: No significant change in the five-ideas condition (b=−0.106, P=0.115).

Self-assessed stylistic characteristics: No significant differences across conditions.

Heterogeneity by inherent creativity (DAT): The frequency of AI access did not differ by DAT score. Among high-DAT writers, AI access had little effect; their stories were already rated highly. Among low-DAT writers, AI access substantially improved outcomes:
  • Novelty: +6.3% (one idea), +10.7% (five ideas).
  • Usefulness: +5.5% (one idea), +11.5% (five ideas).
  • For five ideas, improvements for low-DAT writers reached +26.6% (well written), +22.6% (enjoyment), and −15.2% (boring).

The five-ideas condition effectively equalized creativity scores across low- and high-DAT writers.

Similarity analyses: AI access increased the similarity of stories to one another and to the AI ideas, indicating anchoring.
  • Similarity to average of stories in same condition: One idea b=0.871 (P<0.001); five ideas b=0.718 (P=0.003). In Human-only, similarity range spanned 8.10 points; increases represent ~10.7% (one idea) and ~8.9% (five ideas) of this range.
  • Similarity to AI idea: One idea +5.2% (b=4.29, P<0.001; Human-only mean 82.85). Five ideas +5.0% (b=4.11, P<0.001).

Exploratory ownership/ethics (post-disclosure): Evaluators imposed an ownership penalty of at least 25% on writers who used AI ideas relative to human-only stories. Many indicated that the content creators whose work underlies such models should be compensated. Most considered AI use ethical and still a "creative act," while preferring disclosure and crediting of AI.

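The percentage effects reported above are regression coefficients expressed relative to a comparison group's mean rating. A quick arithmetic sketch reproduces the +5.1% five-vs-one-idea usefulness difference from the two coefficients and the one-idea mean given in the text:

```python
def percent_effect(coefficient, baseline_mean):
    """Express a regression coefficient as a percentage of a baseline mean."""
    return 100 * coefficient / baseline_mean

# Five-ideas vs one-idea usefulness: difference of the two reported
# coefficients (0.453 and 0.185), scaled by the one-idea condition mean (5.21).
diff = 0.453 - 0.185
print(round(percent_effect(diff, 5.21), 1))  # → 5.1
```

The other reported percentages follow the same pattern, with each coefficient scaled by the relevant comparison group's mean.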
Discussion

The study directly addresses whether generative AI enhances human creativity and how it affects the diversity of outputs. Access to GPT-4-generated ideas causally increased third-party evaluations of story novelty and usefulness, especially when writers could request up to five ideas, suggesting that multiple AI-generated starting points enable branching ideation and overcome blank-page barriers. Emotional qualities (enjoyability, writing quality, plot twists) also improved. Importantly, gains were heterogeneous: less inherently creative writers benefited most, with substantial increases that brought their evaluated creativity on par with that of highly creative writers. This suggests a democratizing or equalizing effect, consistent with findings from other domains in which AI supports lower-performing individuals.

However, AI access also increased the similarity among stories and their resemblance to the AI-provided ideas, evidencing anchoring and reduced variance across outputs. This yields a social dilemma: individually, writers are better off using AI because their stories are evaluated as more creative and enjoyable, but collectively, the diversity of novel content declines as stories converge toward AI-inspired patterns.

Post-disclosure perceptions further suggest that while audiences consider AI-assisted writing ethical and creative, they may reassign partial ownership and credit, preferring disclosure and compensation of upstream creators. Overall, the findings highlight complementarities between human writers and generative AI for enhancing individual creativity, while cautioning about systemic effects on collective novelty and ownership norms.

Conclusion

This preregistered experiment shows that generative AI can enhance individual creative writing performance—raising novelty, usefulness, and several positive emotional characteristics—with the largest benefits for less inherently creative writers and when multiple AI ideas are available. At the aggregate level, AI use leads to more similar outputs and anchoring on AI ideas, implying reduced collective diversity and a potential social dilemma if AI-assisted writing becomes widespread. The results provide early causal evidence on AI's dual role: professionalizing individual output while narrowing variation across creators' works.

Future research should expand to longer and more complex writing tasks, different creative media (images, music), and varied contexts (e.g., product innovation). Experiments could enable open-ended, interactive AI-human collaboration, customize prompts, and introduce incentives targeting specific outcomes (e.g., extreme novelty). Studies of self-selection into AI use, personalization of AI to writer profiles, and refined, domain-general measures of usefulness would deepen understanding of when and for whom AI most effectively augments creativity without eroding diversity.

Limitations

The task was constrained to eight-sentence stories in one medium (writing) and specific adventure topics, which may limit generalizability to other creative domains or longer-form works. Writers could not customize prompts or interact iteratively with the LLM; effects might be lower bounds relative to richer AI-human interactions. The usefulness construct was adapted for this context and may not transfer uniformly to other creativity domains. The participant pool comprised typical online participants rather than professional writers, limiting external validity. Although most AI-eligible participants chose to receive ideas, optional uptake introduces some self-selection (addressed via intention-to-treat). Similarity analyses rely on embeddings-based cosine similarity, which captures textual proximity but may not fully reflect conceptual uniqueness. Finally, findings are tied to the specific model (GPT-4) and time; rapid AI advances could alter effects.
