ABScribe: Rapid Exploration & Organization of Multiple Writing Variations in Human-AI Co-Writing Tasks using Large Language Models

Computer Science


M. Reza, P. Dushniku, et al.

ABScribe is an interface that supports rapid, seamless exploration of multiple writing variations using LLM prompts. It was developed by Mohi Reza, Peter Dushniku, Tovi Grossman, Nathan Laundry, Michael Yu, Michael Liut, Joseph Jay Williams, Ilya Musabirov, Kashish Mittal, and Anastasia Kuzminykh from the University of Toronto.

Introduction
The paper addresses the challenge of revising text by exploring multiple writing variations, a core activity in effective writing. Existing writing tools largely support linear revision histories that make it difficult to simultaneously generate, organize, and compare multiple alternatives without overwriting content or cluttering documents. With the rise of LLMs that can easily generate variations, managing these proliferating alternatives becomes increasingly difficult. Drawing on HCI principles advocating consideration of multiple alternatives and the non-linear, iterative nature of revision, the authors hypothesize that supporting parallel exploration and visually structured management of multiple variations can reduce workload and improve the revision process. They introduce ABScribe, a human-AI co-writing interface featuring Variation Fields, a Popup Toolbar, a Variation Sidebar, AI Modifiers, and an AI Drafter to support rapid, in-context exploration and organization of variations.
Literature Review
The related work situates ABScribe at the intersection of (1) HCI and design practices promoting parallel exploration of multiple alternatives to avoid fixation and premature commitment; (2) writing and revision research emphasizing non-linear, recursive, and granular revision beyond surface-level edits; and (3) AI-assisted writing with LLMs, including prompting strategies and interfaces that present multiple suggestions. Prior work shows benefits of multiple parallel suggestions but flags efficiency costs and linear chat logs that impede in-place, context-sensitive comparison. The authors distinguish chat-based interfaces (e.g., ChatGPT) from in-place editing tools (e.g., Wordcraft, CoAuthor, Grammarly, Notion AI, Wordtune), noting that most overwrite content and preserve linear histories, complicating simultaneous variation management. ABScribe adopts in-place editing and introduces mechanisms for non-linear storage, organization, and comparison of multiple variations, and prompt reuse via reified interface elements.
Methodology
System design: Guided by four design requirements (minimize workload when exploring multiple variations; support visually structured variation management; enable context-sensitive comparison and revision; support revision-centric, reusable, non-linear LLM usage), the authors iteratively designed five interface elements: (1) Variation Fields for storing multiple variations within flexible text segments without overwriting; (2) a Popup Toolbar with hover-to-preview and click-to-select/delete interactions for rapid, in-context comparison; (3) a Variation Sidebar organizing all Variation Fields and their variations in an accordion UI; (4) AI Modifiers that reify user-written LLM instructions into labeled, reusable buttons applicable across Variation Fields; and (5) an AI Drafter enabling in-place generation via @ai <prompt>, with accept/modify/discard controls. The design process included paper prototyping, cognitive walkthroughs, and high-fidelity web prototypes with pilot users.
Evaluation: A within-subjects controlled study (N=12; 5 women, 7 men; ages 18–34; varied writing and AI-tool experience) compared ABScribe to a purpose-built Baseline: a rich-text editor with a GPT-4 chat assistant and in-document insertion, but without Variation Fields, the Popup Toolbar, the Variation Sidebar, or AI Modifiers.
Tasks: Two guided writing tasks (a LinkedIn post; an email to a professor), each requiring exploration of eight variations for three text segments (subject/title, a target sentence, and a paragraph), totaling 24 variations per condition.
Measures: Weighted NASA-TLX for subjective workload; eleven 7-point Likert items covering variation granularity, search, prompt reuse, comparison, editing, control, divergence, draft quality, intent match, diversity, and perceived clutter.
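To make the core mechanics concrete, the sketch below models two of the elements described above: a Variation Field that stores variations side by side instead of overwriting, and an AI Modifier that reifies a prompt into a reusable, labeled action. All names, data structures, and the stand-in LLM function are hypothetical illustrations, not the paper's implementation.

```python
from dataclasses import dataclass, field

@dataclass
class VariationField:
    """A text segment holding multiple variations in parallel (none overwritten)."""
    variations: list = field(default_factory=list)
    active: int = 0  # index of the currently displayed variation

    def add(self, text: str) -> None:
        self.variations.append(text)

    def select(self, i: int) -> None:
        self.active = i  # hover-to-preview / click-to-select would call this

    def current(self) -> str:
        return self.variations[self.active]

@dataclass
class AIModifier:
    """A user-written LLM instruction reified as a labeled, reusable button."""
    label: str
    instruction: str

    def apply(self, target: VariationField, llm) -> None:
        # Append a new variation in place of overwriting the existing text.
        target.add(llm(self.instruction, target.current()))

# Toy stand-in for an LLM call; a real system would query a model API here.
def fake_llm(instruction: str, text: str) -> str:
    return f"[{instruction}] {text}"

subject = VariationField()
subject.add("Meeting request")
formal = AIModifier("Formal", "Rewrite in a formal tone")
formal.apply(subject, fake_llm)   # reusable across any Variation Field
print(subject.variations)         # both variations stored; original intact
```

Because the modifier holds only a label and an instruction, the same object can be applied across Variation Fields, mirroring the prompt-reuse behavior the paper attributes to AI Modifiers.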
Procedure: Participants used both conditions (counterbalanced), received demonstrations, and completed two 15-minute tasks per condition, filling out the NASA-TLX and Likert scales after each; a 30-minute semi-structured interview followed.
Analysis: Paired one-tailed t-tests on aggregated scores, with directional hypotheses of lower workload and higher perception ratings for ABScribe than the Baseline; reflexive thematic analysis of interviews for qualitative insights. An a priori power analysis indicated the ability to detect d ≥ 0.8 with N=12, α=0.05, power=0.8.
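The analysis above can be sketched as follows: a paired one-tailed t-test on per-participant workload scores, plus a within-subjects Cohen's d (mean of the paired differences over their standard deviation). The scores here are randomly generated placeholders, not the study's data.

```python
import numpy as np
from scipy import stats

# Hypothetical per-participant NASA-TLX workload scores (N = 12), one pair per
# participant: Baseline vs. ABScribe. Values are illustrative only.
rng = np.random.default_rng(0)
baseline = rng.uniform(50, 80, size=12)
abscribe = baseline - rng.uniform(10, 25, size=12)  # lower workload expected

# One-tailed paired t-test for the directional hypothesis ABScribe < Baseline.
t, p = stats.ttest_rel(abscribe, baseline, alternative="less")

# Cohen's d for paired samples: mean difference over SD of the differences.
diff = abscribe - baseline
d = diff.mean() / diff.std(ddof=1)

print(f"t = {t:.2f}, one-tailed p = {p:.4f}, d = {d:.2f}")
```

With N=12 pairs, only large effects (roughly d ≥ 0.8, per the paper's power analysis) are reliably detectable at α=0.05, which is consistent with the large effect sizes the study reports.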
Key Findings
Quantitative: Compared to the Baseline, ABScribe significantly improved summed agreement on the efficacy of the revision process (d = 2.41, p < 0.001) and significantly reduced NASA-TLX subjective task workload (d = 1.20, p < 0.001).
Qualitative themes:
- F1 (Variation management): Non-linear storage and navigation reduced pressure to commit early and supported exploration of more variations without clutter.
- F2 (Mixed): Improved ability to explore finer-grained variations (e.g., sentences) in context, facilitated by in-place hover comparisons and easy AI application.
- F3 (AI integration): ABScribe nudged users toward imperative, action-oriented prompts, compared to the conversational styles typical of chat.
- F4 (AI integration): Users authored more generalizable, atomic prompts and combined ("stacked") them via reusable AI Modifiers.
- F5 (Variation management): Reduced document clutter was a major benefit; a minority preferred linear clutter for familiar workflows.
- F6 (Variation management): Easier variation storage and comparison decreased overhead and cognitive load; for side-by-side comparison of long text blocks, sequential viewing could help, a need partially addressed by the Variation Sidebar.
- F7 (Variation management): In-context viewing of variations aided coherence and fit within the surrounding text.
- F8 (Mixed): Reduced context switching, both for managing variations and for invoking AI within the editor, lowered effort and time.
- F9 (AI integration): Prompt reuse via AI Modifiers substantially reduced memory load and rewriting effort.
Discussion
The findings address RQ1 by showing that ABScribe’s ensemble improves user perceptions of the revision process through features that support parallel, structured, and in-context exploration of multiple variations, and scaffolded, reusable LLM interactions that encourage direct, generalizable prompting. For RQ2, ABScribe reduces workload by minimizing clutter, simplifying variation storage and comparison, maintaining context during edits, and reducing context switching for LLM use.
Design implications:
- D1: Emphasize parallel storage to encourage exploration of numerous, granular ideas.
- D2: Provide complementary linear viewing mechanisms (e.g., an enhanced Variation Sidebar or toggleable exploded views) for scenarios better served by linear representations.
- D3: Scaffold prompts within discrete, action-oriented UI elements to promote succinct, imperative prompting.
- D4: Reify prompts into a customizable, reusable toolkit (AI Modifiers) while ensuring users can refine both AI outputs and instructions, given AI unpredictability.
Conclusion
The authors present ABScribe, a human-AI co-writing interface enabling rapid, visually structured exploration and organization of multiple writing variations. In a within-subjects study with 12 writers, ABScribe significantly reduced subjective workload and improved perceptions of the revision process compared to a chat-based baseline. The interface’s non-linear variation management and reified, reusable AI interactions fostered fine-grained, in-context variation exploration and more direct, generalizable prompting workflows. The work informs HCI design of writing interfaces for managing proliferating AI-generated alternatives and suggests broader applications of parallel storage and prompt reification to other creative domains.
Limitations
- Baseline control: A custom-built baseline (rather than real-world systems) was used to maintain control and consistency, which may affect ecological validity.
- External validity: The study was limited to English-language tasks and a single 1.5-hour session with guided scenarios; results may not generalize to other languages, longer-term use, or self-chosen tasks.
- AI risks: LLMs can hallucinate or confabulate, so ABScribe’s AI features may introduce erroneous content, particularly in high-stakes contexts.
- Instrumentation: Detailed interaction metrics (e.g., word counts by human vs. AI, feature usage frequencies) were not logged, limiting fine-grained behavioral analysis.