Experimental narratives: A comparison of human crowdsourced storytelling and AI storytelling

Interdisciplinary Studies

N. Beguš

This research by Nina Beguš introduces a framework that merges behavioral and computational experiments, using fictional prompts to surface cultural templates and social biases in storytelling. Analyzing 250 human-written stories and 80 AI-generated narratives, it finds that AI, particularly GPT-4, portrays more progressive themes in gender roles and sexuality, while human storytelling remains more imaginative. The study shows how fiction can serve as a lens on both human and AI social imaginaries.
Introduction

The study examines how fictional prompts can be used to investigate the collective cultural imagination and social biases in storytelling by humans and large language models (LLMs). Motivated by the growing role of LLMs in creative domains and their training on large textual corpora that include fiction, the paper seeks to compare human- and AI-generated narratives under controlled conditions. Using the Pygmalion myth (a human creates and falls in love with a humanoid) as a culturally salient trope, the research asks: (1) How do humans and LLMs render themes, settings, and narrative structures when given identical prompts? (2) What gender and racial/ethnic representations and biases emerge? (3) Do state-of-the-art LLMs exhibit innovation or creativity comparable to non-professional human writers? The work situates these questions within the broader context of artificial humanities, arguing that cultural imagination shapes technology design and use, and that fiction offers a lens on human–AI imaginaries.

Literature Review

The paper reviews scholarship on the Pygmalion myth and its evolution from artistic creation to scientific/technological creation, noting its dominance in Western literature and film and its frequent framing as a male fantasy (e.g., Ovid, Shaw, Shelley; analyses by Gross, Marshall, Joshua, Hersey, Stoichita). It traces the trope’s modern resonance in technology culture (e.g., inspirations cited by robotics/AI creators; media like Her, Ex Machina, Westworld, Black Mirror). Related work in computational creativity and narrative understanding is discussed, including assessments of creativity in humans and machines (Baer; Beaty & Johnson; Koivisto & Grassini), narrative theory for computational narratives (Piper et al.), and prior findings on LLM bias and representation (Bender et al.; Li & Bamman; Huang et al.). The review highlights that Pygmalionesque fiction historically centers on gendered dynamics (male creator, female creation), with limited attention to race/ethnicity, and situates the current study as a comparative, prompt-controlled investigation bridging narratology and data-driven analysis.

Methodology

Design: A two-part experiment used identical fictional prompts to elicit short stories from (a) human crowdworkers and (b) LLMs. Prompts: (1) Prompt 1: A human created an artificial human. Then this human (the creator/lover) fell in love with the artificial human. (2) Prompt 2: A human (the creator) created an artificial human. Then another human (the lover) fell in love with the artificial human. Prompts were neutral in role labels to minimize bias.

Human behavioral experiment: Conducted June 11, 2019 on Amazon Mechanical Turk (before public LLM availability). N=250 US-based, native English-speaking participants; random assignment to Prompt 1 or Prompt 2 (125 each). Instruction: write a 150–500-word story (minimum 1000 characters; no character counter shown). After the story, ~30 brief follow-up questions (fewer than half mandatory) captured story details for quantitative analysis, plus demographics. Eight submissions were non-fictional reflections; eleven contained material copied from online sources. Demographics: gender 56.4% male (141), 42.8% female (107), 0.8% non-binary (2); race 72.8% White (182), 10% Black/African American (25), 8% Asian (20), 4% Hispanic/Latino (10), 5.2% multiracial (13); education 26.8% high school, 60% college, 13.2% MA/PhD/JD/MD.

Computational experiment (GPT): Conducted March 17, 2023 using OpenAI GPT-3.5 and GPT-4 with default settings (no temperature or token adjustments). Reported training cutoff: September 2021. Each model generated 40 stories (20 per prompt), for 80 in total. For each prompt–model pair, 10 responses were generated in the same window across separate chats and 10 in new windows, to gauge potential context effects.

Open-source comparison (Llama 3): Conducted May 22, 2024 using Meta Llama 3 70B, which generated 50 stories under the same protocol (20 per prompt, plus 5 additional “playground” generations per prompt). Default parameters: temperature 0.5, top-p 0.9, max length ~2048 tokens; playground runs used temperature 1.0 and top-p 0.6. Outputs were qualitatively compared with GPT-4.

Additional probes: (a) Narrative/narratology evaluation using elements such as plot, discourse, setting, time/space, and narration/filter, informed by a grading rubric (Chakrabarty et al.). (b) Quantitative analysis of social aspects via inferential statistics: logistic regressions tested effects of prompt and participant factors on character gender distributions, with model selection via AIC. (c) “Data archaeology” of LLM familiarity with Pygmalionesque works via a cloze-style membership-inference query (Chang et al.), probing memorization of literary/film sources. (d) Limited exploratory prompting of GPT-4 in playground mode (e.g., “You are a fiction writer,” temperature ~1.1) to observe changes in narrative features.
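The logistic-regression-with-AIC step in (b) can be sketched as follows. This is a minimal illustration on simulated data, not the paper's analysis: the predictor `writer_male`, the simulated coefficients, and the sample size are all hypothetical, chosen only to mirror the reported setup (binary outcome = whether the artificial human is female; candidate predictors compared by AIC).

```python
# Sketch: fit logistic regressions predicting whether the artificial human is
# female, then compare candidate models by AIC (lower is better).
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)
n = 300
prompt = rng.integers(0, 2, n).astype(float)       # 0 = Prompt 1, 1 = Prompt 2
writer_male = rng.integers(0, 2, n).astype(float)  # hypothetical participant factor
# Simulated outcome: female creation likely overall, less likely under Prompt 2
p = 1.0 / (1.0 + np.exp(-(1.67 - 0.74 * prompt)))
female = rng.binomial(1, p).astype(float)

def fit_logit(X, y):
    """Maximize the Bernoulli log-likelihood; return (coefficients, AIC)."""
    X = np.column_stack([np.ones(len(y)), X])  # add intercept column
    def nll(beta):
        z = X @ beta
        # log(1 + e^z) computed stably via logaddexp(0, z)
        return float(np.sum(np.logaddexp(0.0, z) - y * z))
    res = minimize(nll, np.zeros(X.shape[1]), method="BFGS")
    aic = 2 * X.shape[1] + 2 * res.fun  # AIC = 2k - 2 log L
    return res.x, aic

coef_p, aic_prompt = fit_logit(prompt[:, None], female)
coef_b, aic_both = fit_logit(np.column_stack([prompt, writer_male]), female)
print(round(aic_prompt, 1), round(aic_both, 1))
```

With a real prompt effect and an uninformative extra predictor, the prompt-only model will usually achieve the lower AIC, which is the kind of comparison the paper's model selection performs.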

Ethics and data: IRB-approved (University of Washington, STUDY00007637). Informed consent obtained electronically. Data available at OSF: https://doi.org/10.17605/OSF.IO/K6FH7.

Key Findings

Themes and cultural imaginary: All 330 stories (human and LLM) reproduced a Pygmalionesque plot, confirming the trope’s pervasiveness. Every story framed the scenario via science/technology (AI, robots/androids, cyborgs, chatbots, sexbots, bio-humanoids). Human stories exhibited broader thematic variety (loneliness, grief, obsession, serendipity, violence, social disapproval, emancipation) and occasional original twists (e.g., creator replaced by creation; two artificial humans in love), whereas GPT outputs—especially GPT-3.5—were thematically homogeneous, generic, and moralizing (e.g., “love knows no boundaries”).

Gender and sexuality (human): Character counts skewed traditional but diversified relative to the canon. Overall character genders across human stories: 56% male (350), 41% female (256), 2% non-binary (12), 1% no gender (6). Artificial humans: female 68.8%, male 26.8%, no gender 2.4%, non-binary 2.0%. Same-sex relationships appeared in 7.3% of stories; 6.8% of respondents labeled relationships as pan/omni/digisexual/other. Logistic regression showed prompt as the only significant predictor of artificial human gender: female more frequent overall (β=1.67, z=2.8, p=0.005) and significantly less frequent in Prompt 2 vs Prompt 1 (β=−0.74, z=−2.4, p=0.02). In Prompt 2, creators were more often male than lovers (OR≈0.18; Fisher’s exact p<1e−5, excluding non-binary).
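The Fisher's exact test reported above (creators more often male than lovers in Prompt 2, OR≈0.18) can be reproduced in form with `scipy.stats.fisher_exact`. The 2×2 counts below are illustrative placeholders, not the study's data; the true table is in the paper's OSF repository.

```python
# Sketch: Fisher's exact test on a role-by-gender contingency table.
from scipy.stats import fisher_exact

# Rows: creators, lovers; columns: female, male (hypothetical counts)
table = [[20, 100],
         [70, 50]]
odds_ratio, p_value = fisher_exact(table, alternative="two-sided")
print(round(odds_ratio, 3), p_value)
```

An odds ratio well below 1 here means creators are far less often female (i.e., more often male) than lovers, matching the direction of the reported effect.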

Gender and sexuality (LLMs): Among 80 GPT stories, the modal pattern (35%) was male creator–female creation, but GPT-4 frequently cast female creators and increased gender/sexual diversity. Excluding 67 non-gendered character instances (largely GPT-3.5), gendered characters numbered 57 male vs 76 female. Combined regression (human+GPT) confirmed fewer female artificial humans in Prompt 2 vs Prompt 1 (β=−0.91, z=−2.77, p=0.006), with no significant interaction (a similar prompt effect in both experiments). GPT-4: female creators 21/40; female artificial humans 24/40; same-sex relationships in 12.5% of all GPT stories (all in GPT-4; 25% of GPT-4 outputs), including polyamory and all-female triads in Prompt 2. Overall, GPT models cast female creators in 25% of stories vs 10% in human writing.

Race and ethnicity: Human stories displayed more racial labeling primarily due to the questionnaire (artificial human: no race 22%, White 62.4%, other categories 16.8%; creators/lovers largely White 75.2%). Narrative descriptions rarely engaged with race/culture. GPT stories did not mention race/culture by default; when probed post hoc, GPT-4 assigned races with justifications that risked stereotypes or moralized takeaways.

Narrative skill: GPT-4 improved over GPT-3.5 in length, coherence, and richer language, aligning with prior evaluations (e.g., GPT-3.5 ~8.7/100 vs GPT-4 ~27.9/100 on creativity scales; human professional writing ~84.7/100). However, GPT-4 remained formulaic: standardized paragraphing (setup, relationship, ethics/society, external conflict, moralizing resolution), generic settings, descriptive “telling not showing,” limited dialogue/perspective, and platitudes. Human stories, though variable and sometimes unpolished, showed more originality, specificity, and narratological play. Llama 3 70B outputs were remarkably similar to GPT-4 in structure, tone, and techno-positive conclusions.

LLM familiarity/memorization: Cloze-style probes suggested memorization/familiarity with a range of Pygmalionesque texts and film scripts (e.g., Frankenstein, Pygmalion/My Fair Lady, Ex Machina, Her, Blade Runner, Metropolis), consistent with prior findings of LLMs’ preference for older classics and SF/fantasy.

Overall: GPT-3.5/4—especially GPT-4—were more progressive than human crowdworkers regarding gender roles/sexuality, but still retained biases in descriptive attributes (e.g., female artificial humans more often described by beauty/grace/kindness). Without tailored prompting/parameter tuning, LLM narratives were less imaginative in scenario and rhetoric than human-authored texts.

Discussion

The findings show that prompt-controlled experimental narratives can reveal shared cultural templates and social biases in both human and LLM storytelling. The universal recovery of Pygmalionesque plots underscores the myth’s deep embedding in collective imagination. LLMs, likely influenced by alignment training and broad web corpora including modern discourse, display more egalitarian casting (e.g., female creators, same-sex relationships) than non-professional human writers, yet they reproduce gendered descriptive biases and avoid racial/ethnic specificity unless prompted. The narrower thematic and narratological range of default LLM outputs suggests that, absent expert prompting and parameter adjustments, LLM creativity remains constrained and formulaic compared to human originality. These results connect back to the research aims by quantifying differences/similarities in representation, demonstrating prompt effects on gender distributions across human and AI, and highlighting the value of fiction as a diagnostic lens on human–AI imaginaries. The convergence across GPT-4 and Llama 3 indicates broader model-level tendencies rather than vendor-specific behavior, while cloze probes situate model knowledge within known cultural sources.

Conclusion

The paper contributes a mixed behavioral–computational framework for studying cultural imaginaries and social bias through fictional prompts, enabling direct comparisons between human and LLM storytelling. It shows: (1) strong, cross-source adherence to the Pygmalion paradigm; (2) LLMs (especially GPT-4) more frequently adopt progressive gender/sexuality roles than human crowdwriters, though descriptive biases persist; and (3) default LLM narratives are less imaginative in scenario and rhetoric and more formulaic/moralizing than human-authored stories, with improvements possible via targeted prompting and parameter tuning. The work proposes using such experimental narratives as baselines for future analyses (e.g., repeated MTurk cohorts to assess LLM use on platforms, expert creative-writing evaluations, richer prompt designs). It also outlines implications for technology design, HCI, and LLM training by providing comparative ground truths for narrative quality and representation. Future research should expand corpora, diversify prompts beyond well-known tropes, explore interactive co-creation workflows, and deepen quantitative modeling of narrative elements and bias.

Limitations
  • Sample and timing: Human data (2019) predates widespread LLM access; computational runs (2023–2024) used newer models, introducing temporal confounds.
  • Prompts: Highly generic Pygmalion prompts may channel formulaic responses in LLMs and cliché-driven human writing; trope familiarity limits novelty. Prompt effects on gender are strong and may overshadow other factors.
  • Corpus scope: Story counts are modest for some subgroup analyses; character-level statistics are smaller and relatively uniform vs large-scale literary corpora.
  • Data quality: A small portion of human submissions were non-fictional or copied. Crowdworkers are all US-based native English speakers, limiting cultural generalizability.
  • Model opacity: GPT training data are undisclosed; memorization assessments via cloze are indicative but not exhaustive. Default guardrails and settings constrain creativity; limited exploration of hyperparameters and interactive prompting.
  • Race/ethnicity measurement: Human racial diversity partly arose from post-story questionnaires; narratives seldom engaged race/culture. GPT avoided race unless asked, complicating direct comparative analysis of representation.