
Interdisciplinary Studies

Establishing the importance of co-creation and self-efficacy in creative collaboration with artificial intelligence

J. McGuire, D. De Cremer, and T. Van de Cruys

Across two experiments with advanced human–AI interfaces, the authors found people were most creative writing poetry on their own rather than editing AI‑generated drafts; however, the deficit vanished when people co‑created with AI, with creative self‑efficacy identified as a key mechanism. This research was conducted by Jack McGuire, David De Cremer, and Tim Van de Cruys.

Introduction
Generative AI and large language models have spurred interest in human–AI collaboration for creative tasks, with several studies suggesting human-like creativity in narrow domains and potential augmentation of human creativity. However, limitations persist: expert writers outperform current LLMs; AI-generated writing is often repetitive and stylistically homogeneous, reducing originality. Prior work frequently relies on lay evaluations rather than expert (domain) judges, and often lacks rigorous control over experimental procedures when using public LLMs, compromising internal validity due to evolving model behavior. The authors propose that the role humans occupy when collaborating with AI—editor versus co-creator—shapes creative outcomes via effects on creative self-efficacy. Editing AI defaults may anchor and constrain human input, reduce autonomy and intrinsic motivation, and weaken self-efficacy. In contrast, co-creation can preserve improvisation, autonomy, and motivation, thereby bolstering self-efficacy and creativity. The paper focuses on poetry, a prototypical creative domain, to test these role-based hypotheses.
Literature Review
Prior literature documents both capabilities and shortcomings of generative AI in creative domains. Studies report human-like creativity in specific contexts, but experts note AI outputs can be arbitrary, repetitive, and stylistically homogeneous, undermining novelty and originality. Methodological issues include reliance on lay evaluations rather than expert judges (with the Consensual Assessment Technique, CAT, regarded as the gold standard), limited examination of human–AI dyads and collaborative processes, and threats to internal validity when using public LLMs (uncertain prompt adherence, model drift over time). The theoretical framing draws on role theory, self-determination theory, and creativity research: editing defaults can anchor individuals, reduce autonomy and intrinsic motivation, and impede divergent thinking, whereas co-creation aligns with cyborg collaboration archetypes, fostering intrinsic motivation and self-efficacy and enabling iterative, open-ended exploration.
Methodology
Two controlled experiments evaluated how human–AI interface design and role assignment (editor vs. co-creator) affect creativity in poetry writing, with expert evaluations via the Consensual Assessment Technique (CAT). Both studies used custom-built interfaces integrating a state-of-the-art neural poetry generation system (a recurrent encoder-decoder) trained on filtered non-poetic CommonCrawl text (500 million words; ~15k vocabulary). The system enforces topical coherence via non-negative matrix factorization (NMF) topic models and an ABAB rhyme scheme, sampling 2048 candidate lines per position and selecting among them via a global optimization that scores rhyme compliance, topical alignment, syllable count, and log-probability.

Study 1: Participants (N=101; 6 failed an attention check and were excluded; final N=96 across the human and human–AI conditions, plus an AI-only benchmark of 50 poems) were recruited via Prolific (mean age 27.23, SD=8.97; 76% female) and randomly assigned to: (a) Human condition (n=48): write an 8-line poem (two 4-line stanzas) with no time limit; (b) Human–AI editor condition (n=48): receive a system-generated 8-line poem and edit it freely in an advanced interface (direct line edits plus dropdown alternative lines; when lines 1 or 2 are edited, suggestions for lines 3 and 4 update in real time to preserve rhyme coherence); (c) AI condition (n=50): poems generated autonomously by the system as a benchmark. Measures: self-reported creativity (1–40 scale) and expert creativity via CAT (10 professional poets, blinded to condition, each rated all 146 poems on a 1–40 scale in randomized order and were instructed to use the full range).

Study 2: Participants (N=152; no attention-check failures; mean age 35.11, SD=12.06; 34.9% female) were recruited via Prolific and randomly assigned to: (a) Human condition (n=51); (b) Human–AI editor condition (original interface; n=50); (c) Co-creator human–AI condition (redesigned interface; n=51), in which the user selects a topical theme (one of 100 triads) and alternates line-by-line with the AI in an iterative turn-taking process; users can edit any content at any stage, and the ABAB rhyme scheme is maintained (line 4 rhymes with line 2; line 8 with line 6). Measures: creative self-efficacy (Tierney & Farmer 3-item scale, α=0.92; 1–7), self-reported creativity (1–40), and expert creativity via CAT (10 different professional poets, paid and blinded).

Ethical approval was obtained from the NUS IRB (BIZ-MNO-20-0213), and participants gave informed consent. Eligibility required English as a first language and a Prolific approval rating ≥97%; attention checks were used, and randomization was implemented in the front-end code. Technical implementation: a Flask (Python) backend with Firebase storage, deployed on Google Cloud Platform with Gunicorn; the front end used Jinja2, Bulma, jQuery, and AJAX, with HTML/CSS/JavaScript for the interface and its functions.
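To make the generation system's selection step concrete, the following Python is a minimal, hypothetical sketch of scoring sampled candidate lines on the four criteria named above and keeping the best one. The class, function names, weights, and scoring details are illustrative assumptions, not the authors' actual implementation.

```python
# Hypothetical sketch of the candidate-line selection step: score each of the
# ~2048 sampled lines on rhyme compliance, topical alignment, syllable count,
# and log-probability, then keep the highest-scoring line for this position.
# All names and weights below are assumptions for illustration.
from dataclasses import dataclass

@dataclass
class Candidate:
    text: str
    log_prob: float     # sequence log-probability from the encoder-decoder
    topic_score: float  # similarity to the NMF topic vector for the chosen theme
    rhyme_ok: bool      # satisfies the ABAB rhyme constraint at this position
    syllables: int      # syllable count of the candidate line

def score(c: Candidate, target_syllables: int = 10,
          w_topic: float = 1.0, w_rhyme: float = 2.0,
          w_syll: float = 0.5, w_lp: float = 0.1) -> float:
    """Combine the four criteria into one score (weights are assumptions)."""
    syllable_penalty = abs(c.syllables - target_syllables)
    return (w_topic * c.topic_score
            + w_rhyme * (1.0 if c.rhyme_ok else 0.0)
            - w_syll * syllable_penalty
            + w_lp * c.log_prob)

def select_line(candidates: list[Candidate]) -> Candidate:
    """Pick the best of the sampled candidates for the current line position."""
    return max(candidates, key=score)
```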
Key Findings
Study 1: Self-reported creativity was higher in the human condition (M=23.88, SD=8.40) than in the human–AI editor condition (M=19.13, SD=10.40), p=0.016, Cohen's d=0.47. Expert CAT ratings differed across conditions, ANOVA F(2,145)=63.48, p<0.001, partial η²=0.47. Post hoc tests: Human (M=18.23, SD=4.94) > Human–AI editor (M=12.55, SD=4.32), p<0.001; Human > AI-only (M=9.45, SD=2.00), p<0.001; Human–AI editor > AI-only, p<0.001.

Study 2: Creative self-efficacy differed across conditions, ANOVA F(2,149)=7.01, p=0.001, partial η²=0.09. Pairwise: Human (M=4.71, SD=1.45) > Human–AI editor (M=3.74, SD=1.43), p=0.001; Co-creator human–AI (M=4.62, SD=1.43) > Human–AI editor, p=0.003; Co-creator human–AI did not differ from Human, p=0.755. Self-reported creativity differed, ANOVA F(2,149)=6.08, p=0.003, partial η²=0.08. Pairwise: Human (M=23.29, SD=9.24) > Human–AI editor (M=16.96, SD=9.56), p=0.001; Co-creator human–AI (M=21.16, SD=9.03) > Human–AI editor, p=0.025; Co-creator human–AI vs. Human, p=0.247. Expert CAT ratings differed, ANOVA F(2,149)=5.77, p=0.004, partial η²=0.07. Pairwise: Human (M=15.74, SD=4.98) > Human–AI editor (M=12.53, SD=4.45), p=0.001; Co-creator human–AI (M=14.70, SD=5.06) > Human–AI editor, p=0.026; Co-creator human–AI vs. Human, p=0.278.

Mediation: Creative self-efficacy mediated the effect of the co-creator vs. editor role on expert-rated creativity (dummy coding: 0=Human–AI editor, 1=Co-creator), indirect effect B=0.78, SE=0.39, 95% CI [0.16, 1.68]. There was no significant indirect effect for Human vs. Co-creator (dummy: 0=Co-creator, 1=Human), B=0.09, SE=0.29, 95% CI [−0.50, 0.78].

Overall, editing AI-generated defaults reduced self-efficacy and creativity; co-creation restored self-efficacy and expert-rated creativity to levels comparable to human-only writing.
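To show how an indirect effect of this kind is typically estimated, here is a minimal Python sketch of a bootstrapped mediation analysis (condition → creative self-efficacy → expert-rated creativity). The data frame, column names, number of resamples, and regression setup are assumptions for illustration; the sketch follows a standard product-of-coefficients approach with percentile confidence intervals, not the authors' exact analysis script.

```python
# Minimal sketch of a bootstrapped mediation (indirect-effect) estimate.
# Assumed data: one row per participant with a 0/1 'condition' dummy
# (0 = human-AI editor, 1 = co-creator), 'self_efficacy' (1-7 scale), and
# 'creativity' (expert CAT rating). Column names are illustrative assumptions.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

def indirect_effect(df: pd.DataFrame) -> float:
    """Product of coefficients a*b: condition -> mediator, mediator -> outcome."""
    a = smf.ols("self_efficacy ~ condition", data=df).fit().params["condition"]
    b = smf.ols("creativity ~ condition + self_efficacy",
                data=df).fit().params["self_efficacy"]
    return a * b

def bootstrap_ci(df: pd.DataFrame, n_boot: int = 5000, seed: int = 0):
    """Percentile 95% CI of the indirect effect over bootstrap resamples."""
    rng = np.random.default_rng(seed)
    n = len(df)
    estimates = [
        indirect_effect(df.iloc[rng.integers(0, n, size=n)].reset_index(drop=True))
        for _ in range(n_boot)
    ]
    lo, hi = np.percentile(estimates, [2.5, 97.5])
    return float(lo), float(hi)

# Usage (with a suitable df): the indirect effect is treated as significant when
# the bootstrap CI excludes zero, as in the B=0.78, 95% CI [0.16, 1.68] result
# reported above.
```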
Discussion
Findings support the hypothesis that the role humans are given in AI collaboration critically shapes creative outcomes through creative self-efficacy. Editing a default AI output narrows autonomy, increases anchoring to defaults, and dampens intrinsic motivation, yielding lower self-efficacy and weaker expert-rated creativity. In contrast, co-creation (iterative, turn-taking generation with user-set topical themes) nurtures autonomy, improvisation, and intrinsic motivation, sustaining self-efficacy and raising creativity to human-only levels. This challenges the assumption that positioning humans as final editors of AI output is sufficient to boost creativity, and it aligns with human-centered HCI principles that emphasize empowering interfaces. The results suggest designing generative AI tools that position users as proactive co-creators rather than reactive editors in order to better harness AI for creative work.
Conclusion
Across two rigorously controlled experiments, the study demonstrates that generative AI augments human creativity most effectively when interfaces and collaboration modes place users in a co-creator role, thereby maintaining creative self-efficacy. Editing AI defaults leads to a creativity deficit relative to human-only writing, while co-creation eliminates this deficit and achieves expert-rated creativity comparable to human-only output. Contributions include experimental evidence with expert CAT evaluations, identification of creative self-efficacy as a mechanism, and practical design guidance for human–AI creative tools. Future research should test long-term effects, boundary conditions by writer expertise, and generalization across more advanced AI models and real-world settings.
Limitations
The experiments were conducted in controlled, artificial settings, which may limit generalizability to complex real-world creative environments where task and organizational climate factors influence outcomes. Longitudinal impacts of co-creation versus editing were not assessed; effects on self-efficacy and creativity over time remain unknown. Writer expertise was not examined as a boundary condition; benefits of co-creation may differ for novices versus experts. The study did not test more advanced generative AI models; while the authors expect role effects to generalize, this assumption requires validation with systems such as GPT-4o or Claude 3.5.