Computer Science

ABScribe: Rapid Exploration & Organization of Multiple Writing Variations in Human-AI Co-Writing Tasks using Large Language Models

M. Reza, P. Dushniku, et al.

Discover how ABScribe revolutionizes the writing process by allowing seamless exploration of writing variations with LLM prompts. This innovative interface was developed by Mohi Reza, Peter Dushniku, Tovi Grossman, Nathan Laundry, Michael Yu, Michael Liut, Joseph Jay Williams, Ilya Musabirov, Kashish Mittal, and Anastasia Kuzminykh from the University of Toronto.

00:00

Playback language: English

Index

Introduction

Revision is a cornerstone of effective writing, often involving numerous iterations and explorations of alternative phrasing and ideas. While Large Language Models (LLMs) offer powerful tools for generating writing variations, existing interfaces struggle to manage the resulting multitude of options. Current methods, whether linear revision histories or the pasting of variations into the document, disrupt the writing flow and increase cognitive load. This paper introduces ABScribe, a novel interface designed to address these shortcomings. ABScribe aims to facilitate rapid exploration and organization of multiple writing variations within a human-AI co-writing workflow. The central hypothesis is that providing writers with effective tools for managing multiple variations simultaneously will significantly reduce workload and improve the perceived quality of the revision process. This contrasts with the current limitations where the ease of LLM-based variation generation exacerbates the difficulty in managing and comparing these variations, leading to potential cognitive overload and hindering the iterative nature of writing.

Literature Review

Existing research in HCI and design emphasizes the benefits of exploring multiple design alternatives simultaneously to avoid fixation on a single solution and encourage more creative outcomes. This principle, while well-established in design, is less developed in the context of writing. Studies on the writing process itself demonstrate that revision is a complex, iterative, and often non-linear activity, involving repeated cycles of refinement at multiple granular levels (word, sentence, paragraph). Current word processing tools do not adequately support this non-linear process, primarily offering linear representations of revision history. Recent HCI research on LLMs in writing has focused mainly on prompt engineering and exploring generative capabilities, with limited attention to the management of multiple variations generated by these models. Studies on presenting multiple AI-generated suggestions have highlighted benefits for ideation but also acknowledged efficiency costs. The authors draw on these findings, integrating the object-oriented interaction concepts with LLMs and drawing from revision control research to create an interface that manages multiple LLM variations effectively.

Methodology

To evaluate ABScribe, a controlled within-subjects user study was conducted involving 12 participants with diverse writing experience and varying levels of familiarity with AI tools. Participants completed two writing tasks—drafting an email and a social media post—each requiring the exploration of eight variations across three text segments (title/subject line, sentence, paragraph). ABScribe was compared to a carefully constructed baseline interface: a GPT-4 powered rich-text editor with a chat-based AI assistant, matching the core functionalities of ABScribe except for the specific features aimed at managing multiple variations. The two interfaces were counterbalanced across participants. Subjective task workload was measured using the NASA-TLX, complemented by 11 Likert-scale measures assessing different aspects of the revision process (variation granularity, search, prompt reuse, comparison, editing, control, divergence, draft quality, intent match, diversity, and document clutter). Semi-structured interviews following each task provided qualitative data. Pairwise one-sided t-tests were used to analyze quantitative data, comparing the summed NASA-TLX scores and Likert scale measures between ABScribe and the baseline. Reflexive thematic analysis was applied to qualitative data from the interviews.

Key Findings

The study revealed statistically significant improvements with ABScribe compared to the baseline condition. ABScribe users showed a significant reduction in subjective task workload (d = 1.20, p < 0.001) and a significant increase in positive perceptions of the revision process (d = 2.41, p < 0.001). Qualitative analysis revealed several key themes contributing to these improvements: 1. **Reduced Document Clutter:** ABScribe's non-linear storage of variations within Variation Fields, accessed via a Popup Toolbar and organized in a Variation Sidebar, significantly reduced document clutter compared to the linear approach of the baseline. This allowed participants to maintain a better overview and reduced the cognitive load associated with searching for and comparing different versions. While some participants expressed a preference for linear organization in certain situations, the overall benefit of reduced clutter was highlighted. 2. **Enhanced Variation Management:** ABScribe facilitated more efficient variation management. The ability to easily create, store, access, and compare variations in context significantly reduced the mental effort associated with tracking and revisiting previous work. Participants noted ease of comparison of small segments of text with the Popup Toolbar. 3. **Improved AI Interaction:** ABScribe encouraged the use of more concise and reusable LLM prompts through AI Modifiers, which automatically encapsulated user instructions into reusable buttons. The change to a more imperative prompt style, compared to the baseline's more conversational approach, was noted as more efficient. ABScribe's AI Drafter allowed for in-place LLM generation, reducing context switching and improving workflow. 4. **Contextual Editing:** The in-place editing and comparison of variations within ABScribe allowed for easier maintenance of context during revision. Participants felt that they were better able to see how changes to a particular section would impact the overall coherence of the document, which was not the case with the Baseline interface. 5. **Increased Granularity:** ABScribe facilitated more fine-grained revisions, with participants making changes at sentence or even word levels more easily than in the baseline.

Discussion

The findings strongly support the hypothesis that providing writers with tools for efficiently managing multiple variations significantly improves the writing and revision process. ABScribe's design elements address several limitations of existing AI-assisted writing interfaces. The non-linear approach, emphasizing parallel storage and contextual comparison of variations, aligns well with the iterative and often non-linear nature of writing. The shift towards a more action-oriented, imperative prompt style, facilitated by the AI Modifiers, is more efficient and streamlined than the more conversational approach often seen in chat-based interfaces. The reduction in cognitive load associated with managing multiple variations, along with improved contextual awareness, contributed significantly to the perceived benefits of ABScribe. The positive impact on both task workload and user perceptions of the revision process points to the importance of designing interfaces that effectively support the inherent complexity of the writing process in the age of AI.

Conclusion

ABScribe demonstrates the significant potential for improving human-AI co-writing through thoughtful interface design. The study's key contribution is the design and evaluation of an LLM-enhanced interface specifically focused on managing multiple variations efficiently. ABScribe's success in reducing workload and improving user perceptions highlights the importance of considering the unique cognitive demands of writing in the design of AI-powered tools. Future research could explore adaptations of this design to other creative domains, such as image or video editing, investigate the use of alternative methods for organizing and presenting variations, and explore the long-term impact of ABScribe on writing fluency and creativity.

Limitations

The study used a custom-built baseline interface instead of a commercially available tool, potentially impacting ecological validity. The study was limited to English-speaking writers and a relatively small sample size (12 participants) with a single 1.5-hour session. Future work should address these limitations by replicating the study with larger samples, diverse linguistic backgrounds, and comparisons against established AI writing tools. Additionally, the potential for LLM-generated hallucinations was acknowledged, and the design of ABScribe includes features such as the “AI Drafter” and “AI Modifiers” to provide some safeguards. Future research should investigate potential mitigation strategies for this issue.

Related Publications

Explore these studies to deepen your understanding of the subject.

Medicine and Health

A framework for human evaluation of large language models in healthcare derived from literature review

T. Y. C. Tam, S. Sivarajkumar, et al.

Psychology

Understanding the Role of Large Language Models in Personalizing and Scaffolding Strategies to Combat Academic Procrastination

A. Bhattacharjee, Y. Zeng, et al.

Psychology

Understanding the Role of Large Language Models in Personalizing and Scaffolding Strategies to Combat Academic Procrastination

A. Bhattacharjee, Y. Zeng, et al.

Business

Gender stereotypes in artificial intelligence within the accounting profession using large language models

K. Leong and A. Sung

Listen, Learn & Level Up

Over 10,000 hours of research content in 25+ fields, available in 12+ languages.

No more digging through PDFs, just hit play and absorb the world's latest research in your language, on your time.

listen to research audio papers with researchbunny