Computer Science
ABScribe: Rapid Exploration of Multiple Writing Variations in Human-AI Co-Writing Tasks using Large Language Models
M. Reza, P. Dushniku, et al.
The paper addresses the challenge that current writing tools primarily support linear revision histories, making it difficult to simultaneously consider and manage multiple alternative versions during revision—a process that in practice is iterative, granular, and non-linear. While LLMs make it easier to generate many variations, existing chat-based and in-place editing workflows make such variations hard to store, compare, and modify without clutter or overwriting. The authors hypothesize that supporting parallel exploration of multiple variations, in a visually structured and context-preserving manner, will reduce workload and improve writers’ perceptions of the revision process. They present ABScribe, a human-AI co-writing interface designed to enable rapid generation, organization, and in-place comparison of multiple variations, and evaluate it against a widely used baseline workflow. The research questions focus on how ABScribe influences user perceptions of the revision process (RQ1) and subjective task workload (RQ2).
The related work situates the problem at the intersection of HCI practices encouraging parallel exploration of alternatives to avoid fixation and improve idea quality, and writing research emphasizing revision as a recursive, non-linear process that includes deep idea-level changes. Prior HCI work has shown benefits of considering multiple alternatives and explored novel text-editing affordances (e.g., reified selections, variantlets), but more support is needed for simultaneously managing multiple textual variations. Work on LLM-powered tools highlights prompt-engineering challenges, non-determinism, and systematic exploration, with limited understanding of how to organize prolific AI outputs in writing. The authors contrast chat-based interfaces (intuitive but linear, burying variations in chat logs) with in-place editing (tighter WYSIWYG integration but often overwriting past text or relegating it to linear histories). Systems like Wordcraft, CoAuthor, and commercial tools (Grammarly, Wordtune) provide in-place suggestions but do not enable concurrent management of many variations. This motivates a non-linear, object-oriented interaction approach (e.g., reification and reuse) to better support exploration and management of multiple LLM-assisted text variations.
Design and Implementation: The authors derive four design requirements: (R1) minimize task workload while exploring multiple variations; (R2) provide visually structured management of variations; (R3) support context-sensitive comparison and revision; and (R4) enable revision-centric, reusable, non-linear LLM usage. ABScribe comprises five interface elements: (i) Variation Components—reified, flexible text segments that store multiple human- or AI-generated variations without overwriting; (ii) Hover Buttons—dynamic controls above a selected component enabling rapid, in-context preview and selection of each variation; (iii) Variation Accordion—a side pane organizing all Variation Components and their variations for navigation and side-by-side viewing; (iv) AI Buttons—reified, reusable buttons auto-created from user-entered LLM instructions (editable prompts and labels) that can be applied to different text segments; and (v) AI Insert—an inline '@ai' mechanism for invoking LLM instructions directly within the text and inserting the output as a new variation.
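The component model described above could be sketched as a simple data structure. This is a hypothetical illustration of the Variation Component and AI Button abstractions, not the authors' implementation; the class names, fields, and the `llm` callable are assumptions introduced for this sketch.

```python
from dataclasses import dataclass, field

@dataclass
class VariationComponent:
    """A reified text segment holding multiple variations; none is overwritten."""
    variations: list = field(default_factory=list)
    active: int = 0  # index of the variation currently shown in the document

    def add(self, text):
        self.variations.append(text)

    def select(self, index):  # e.g., chosen via a Hover Button
        self.active = index

    def current(self):
        return self.variations[self.active]

@dataclass
class AIButton:
    """A reusable, editable LLM instruction that can be applied to any segment."""
    label: str
    prompt: str

    def apply(self, component, llm):
        # `llm` is a stand-in for whatever model call the tool actually makes.
        component.add(llm(f"{self.prompt}\n\n{component.current()}"))

# Usage: one button reused across segments; earlier drafts are never lost.
shorten = AIButton(label="Shorten", prompt="Rewrite this text more concisely:")
seg = VariationComponent(["Dear Professor, I am writing to introduce myself ..."])
shorten.apply(seg, llm=lambda p: "Hello Professor, a quick introduction ...")
seg.select(1)  # preview the new variation; the original remains stored
```

The key design point the sketch captures is non-destructive storage: applying an AI Button appends a variation rather than overwriting the current text.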
Study Design: A within-subjects evaluation compared ABScribe to a Baseline interface representative of current workflows (rich-text editor, chat-based AI assistant akin to ChatGPT, and direct AI insertion without copy-paste). Participants (N=12; ages 18–34; 5 women, 7 men; proficient in English; varied writing genres and AI experience) performed two guided tasks (counterbalanced): crafting a LinkedIn post to seek a copywriting job and writing an email to a professor to introduce themselves. For each scenario, participants drafted text using provided prompts and then, with either ABScribe or Baseline, explored 8 variations (e.g., change length, formality, word diversity, add emojis, plus two user-chosen) for 3 distinct segments (title/subject line, third sentence of second paragraph, entire third paragraph).
Measures and Procedure: After training on each interface, participants had 15 minutes per task. Post-task, they completed NASA-TLX and 11 Likert-scale items assessing perceptions of the revision process (e.g., ease of creating, storing, comparing, editing, controlling, and diversifying variations; perceived draft quality; intent match; and document clutter). Sessions (~1.5 hours) concluded with a 30-minute semi-structured interview. Data included transcripts, observations, and ratings.
Analysis: Interview data were coded via reflexive thematic analysis. For quantitative comparisons, paired one-sided t-tests compared summed NASA-TLX scores (directional hypothesis: ABScribe < Baseline, i.e., ABScribe reduces workload) and summed Likert-scale agreement (ABScribe > Baseline), with checks for normality. An a priori power analysis indicated 80% power to detect an effect of at least d = 0.8 with N = 12 at α = 0.05.
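The paired, one-sided comparison can be illustrated with a minimal sketch that computes the t statistic and Cohen's d for paired differences. The NASA-TLX sums below are synthetic placeholders for illustration only, not the study's data.

```python
import math

def paired_stats(baseline, treatment):
    """One-sided paired comparison: is `treatment` lower than `baseline`?
    Returns the t statistic and Cohen's d computed on the paired differences."""
    assert len(baseline) == len(treatment)
    diffs = [b - t for b, t in zip(baseline, treatment)]  # positive = reduction
    n = len(diffs)
    mean = sum(diffs) / n
    var = sum((x - mean) ** 2 for x in diffs) / (n - 1)   # sample variance
    sd = math.sqrt(var)
    t_stat = mean / (sd / math.sqrt(n))  # compare against t(n-1) critical value
    cohens_d = mean / sd                 # standardized effect size of differences
    return t_stat, cohens_d

# Hypothetical NASA-TLX sums for N = 12 participants (illustrative values only).
baseline = [68, 72, 55, 80, 61, 70, 66, 74, 59, 77, 63, 69]
abscribe = [45, 50, 40, 58, 42, 49, 47, 52, 41, 55, 44, 48]
t_stat, d = paired_stats(baseline, abscribe)
```

With these synthetic values, the t statistic exceeds the one-sided critical value for df = 11 at α = 0.05 (≈1.796), which is the decision rule a paired one-sided t-test applies.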
Quantitative outcomes: Compared to the Baseline, ABScribe yielded a significant increase in summed agreement on the efficacy of the revision process (d = 2.41, p < 0.001) and a significant reduction in NASA-TLX subjective workload (d = 1.20, p < 0.001), addressing RQ1 and RQ2.
Qualitative themes:
- F1: Reduced pressure to commit early; ABScribe’s non-linear variation storage (Variation Components, Hover Buttons, Accordion) encouraged exploring a greater number of variations before selecting.
- F2: Enabled finer-grained, in-context edits; users more readily worked at sentence-level while maintaining awareness of surrounding text, though some raised coherence concerns across small edits.
- F3: Prompt style shift; AI Buttons nudged toward imperative, concise prompts versus more conversational (and sometimes anthropomorphic) phrasing in chat.
- F4: Prompt generality and composability; users designed more generalizable, atomic prompts for reuse and sometimes “stacked” them to emulate complex instructions, trading off against highly specific, nuanced prompts used in chat.
- F5: Reduced document clutter; in-place, non-linear variation storage minimized walls of text and scrolling, though a minority preferred visible “messy” linear piles for familiar workflows.
- F6: Easier variation management; Hover Buttons and structured storage simplified tracking and comparison versus locating scattered versions in linear documents.
- F7: Context-preserving comparison; rapid in-place previews helped assess fit and cohesion within paragraphs, unlike linear lists or separate chats that obscured context.
- F8: Fewer context switches; in-place LLM use and integrated storage reduced effort compared to navigating version histories or external documents/chats.
- F9: Prompt reuse; reified AI Buttons lowered memory load and typing effort, making reuse straightforward and efficient.
Findings indicate that non-linear, object-oriented interactions for revision improve both perceived process efficacy and workload when exploring multiple LLM-assisted variations. ABScribe’s Variation Components and Hover Buttons offer an alternative to linear version histories, aligning with the recursive, granular nature of revision while preserving context. The AI Buttons scaffold shifted users from conversational prompting toward imperative, reusable instructions, enhancing efficiency but sometimes reducing nuance. The system’s structure mitigated clutter, eased variation comparison and management, and reduced context-switching, particularly beneficial for fine-grained edits.
The authors discuss implications for tool design: integrating non-linear revision controls into existing editors; scaffolding prompts around task-focused UI elements to influence prompt style and reuse; and extending beyond systematic exploration to systematic evaluation by connecting variation authoring with A/B testing frameworks (e.g., MOOClets, UpGrade, PlanOut) to measure real-world outcomes. Overall, the work suggests that aligning interface affordances with established theories of revision, and designing for parallel exploration, can help avoid premature fixation and support better variation discovery.
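The proposed bridge from variation authoring to systematic evaluation could look like the following minimal sketch: readers are randomly (and stickily) assigned to one of a component's stored variations, and outcomes are tallied per variation. This is an illustrative sketch only; the cited frameworks (MOOClets, UpGrade, PlanOut) have their own APIs, which are not reproduced here, and all names and numbers below are hypothetical.

```python
import random
from collections import defaultdict

class VariationExperiment:
    """Assign each reader to one stored variation and tally per-variation outcomes."""

    def __init__(self, variations, seed=None):
        self.variations = list(variations)
        self.rng = random.Random(seed)
        self.assignments = {}                        # reader_id -> variation index
        self.outcomes = defaultdict(lambda: [0, 0])  # index -> [successes, trials]

    def assign(self, reader_id):
        if reader_id not in self.assignments:        # sticky assignment per reader
            self.assignments[reader_id] = self.rng.randrange(len(self.variations))
        return self.variations[self.assignments[reader_id]]

    def record(self, reader_id, success):
        idx = self.assignments[reader_id]
        self.outcomes[idx][1] += 1
        self.outcomes[idx][0] += int(success)

    def rates(self):
        return {self.variations[i]: s / t
                for i, (s, t) in self.outcomes.items() if t}

# Hypothetical: two subject-line variations tested on simulated readers.
exp = VariationExperiment(["Subject A", "Subject B"], seed=42)
for reader in range(100):
    variation = exp.assign(reader)
    # Simulated click behavior; a real deployment would log actual outcomes.
    clicked = random.Random(reader).random() < (0.30 if variation == "Subject A" else 0.45)
    exp.record(reader, clicked)
```

Sticky assignment (each reader always sees the same variation) is the property such frameworks typically guarantee, so per-variation outcome rates remain interpretable.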
The paper introduces ABScribe, a human-AI co-writing interface comprising Variation Components, Hover Buttons, the Variation Accordion, AI Buttons, and AI Insert to support rapid, structured exploration of multiple text variations. In a within-subjects study with 12 writers, ABScribe significantly reduced subjective task workload (d = 1.20, p < 0.001) and increased agreement on revision-process efficacy (d = 2.41, p < 0.001) compared to a representative AI-integrated baseline. The study reveals a preference for non-linear over linear revision, especially for fine-grained edits, and shows that prompt scaffolding encourages reusable, imperative prompts. Future work includes integrating systematic evaluation (e.g., A/B tests) for variation selection and exploring broader contexts and languages, as well as longer-term, in-the-wild deployments.
Two main limitations affect external validity: (1) The evaluation focused on English-language tasks and participants, although the underlying LLM supports multiple languages; generalizability to other languages was not assessed. (2) The study was constrained to a single ~1.5-hour lab-like session with two guided tasks; longer-term, self-selected writing contexts may yield different behaviors and outcomes.