A Computational Analysis of Vagueness in Revisions of Instructional Texts

Computer Science

A. Debnath and M. Roth

This research by Alok Debnath and Michael Roth examines vagueness in instructional texts from the wikiHowToImprove dataset. By analyzing revisions that clarify vague instructions and developing a neural model to distinguish original from revised versions, the authors demonstrate clear improvements over existing techniques.
Introduction

Instructional texts aim to clearly and concisely describe the actions needed to accomplish tasks. wikiHow provides an extensive set of instructional guides with publicly available revision histories that capture user edits. The wikiHowToImprove dataset compiles these revisions, covering phenomena from typos to clarifications of ambiguity and vagueness. This work focuses on lexical vagueness—lexemes with a single but nonspecific meaning—operationalized via changes to the main verb between original and revised instructions. If the revised version’s main verb is contextually more specific, the original is considered vague. Examples show original verbs like make, go, get being revised to design, visit, purchase, which add specificity.

Identifying vague versus clarified instructions is a first step toward automatic text editing for clarification based on linguistic criteria. The paper creates a dataset of vague and clarified instructions, analyzes them using FrameNet frames, and evaluates neural models on a pairwise ranking task to distinguish original from revised versions, improving over existing baselines.

Literature Review

The study builds on using revision histories as corpora for NLP tasks (e.g., Wikipedia and WikiHow). Prior work categorized edit intentions in Wikipedia (Yang et al., 2016; 2017) and in WikiHow (Anthonio et al., 2020). Traditional computational treatments of vagueness often rely on logical representations (DeVault and Stone, 2004; Tang, 2008), whereas related work has examined context-dependent resolution of vague expressions such as color references (Meo et al., 2014), detection of vague definitions in ontologies (Alexopoulos and Pavlopoulos, 2014) and in privacy policies (Lebanoff and Liu, 2018), and uncertainty in historical texts (Vertan, 2019). For semantic characterization, the paper leverages FrameNet frame relations, relating to research on hypernymy/hyponymy detection using lexicosyntactic patterns and distributional methods (Snow et al., 2004; Shwartz et al., 2016; Roller et al., 2018).

Methodology

Data creation and preprocessing: Starting from the noisy wikiHowToImprove corpus of revision histories, the authors cleaned misspellings using the Enchant Python API, POS-tagged and dependency-parsed sentences with Stanza, and filtered sentence pairs to lengths between 4 and 50 words. They extracted a sub-corpus of instructional sentence pairs in which both original and revised versions met at least one criterion: imperative form (root verb without nominal subject), instructional indicative (nominal subject of root verb is 'you', 'it', or 'one'), or passive form with 'let'. They further filtered to pairs with character edit distance less than 10 to focus on minimal edits, typically verb changes, and to exclude additional edits or spam. This yielded 41,615 sentences.
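The minimal-edit filter described above can be sketched in plain Python. The function and threshold names below are illustrative stand-ins (the paper specifies only the length window of 4–50 words and a character edit distance below 10), and the real pipeline applies the syntactic criteria via Stanza parses before this step:

```python
def levenshtein(a: str, b: str) -> int:
    """Character-level edit distance (insertions, deletions, substitutions)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(prev[j] + 1,                 # deletion
                            curr[j - 1] + 1,             # insertion
                            prev[j - 1] + (ca != cb)))   # substitution
        prev = curr
    return prev[-1]

def is_minimal_edit(original: str, revised: str,
                    min_len: int = 4, max_len: int = 50,
                    max_dist: int = 10) -> bool:
    """Keep pairs whose versions fall in the length window and whose
    character edit distance stays below max_dist (typically a verb change)."""
    for sent in (original, revised):
        if not (min_len <= len(sent.split()) <= max_len):
            return False
    return levenshtein(original, revised) < max_dist

print(is_minimal_edit("Make a comic in Flash to share online.",
                      "Create a comic in Flash to share online."))
```

Filtering on edit distance this way keeps revisions that change little besides the main verb, while discarding rewrites, added sentences, and spam.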

Verb frame analysis: Using the INCEpTION FrameNet Tools neural parser, they identified frames evoked by the root verbs in original and revised sentences, and used the NLTK FrameNet API to determine frame-to-frame relations. Most edits fell into: Subframe-of (revised frame as a subevent of the original), Inherits-from (revised elaborates the original), and Uses (revised uses/weakly inherits properties of the original). Cases not covered by FrameNet or parser failures were grouped as Other. The analysis confirmed revised verbs are typically more specific than original verbs.
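The grouping of frame-to-frame relations into analysis categories can be sketched as a simple mapping. The paper's pipeline obtains the relation names from the NLTK FrameNet API; the dictionary and function below are hypothetical, with relation names supplied directly:

```python
from typing import Optional

# Map raw FrameNet relation names to the categories used in the analysis.
CATEGORY = {
    "Subframe": "Subframe-of",      # revised frame is a subevent of the original
    "Inheritance": "Inherits-from", # revised frame elaborates the original
    "Using": "Uses",                # revised frame weakly inherits properties
}

def categorize(relation_name: Optional[str]) -> str:
    """Bucket a relation name into an analysis category, defaulting to Other
    when the frame is missing from FrameNet or the parser failed."""
    if relation_name is None:
        return "Other"
    return CATEGORY.get(relation_name, "Other")

print(categorize("Subframe"))  # Subframe-of
print(categorize(None))        # Other
```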

Pairwise ranking experiments: The task is to distinguish original versus revised versions. The proposed neural architecture uses two initial BiLSTM encoders (one per sentence version), a joint BiLSTM layer over concatenated hidden states, and re-encoding BiLSTMs conditioned on the joint representation. Inputs are token embeddings from FastText (300-dimensional) or subword embeddings from BERT. The final classification layer uses self-attention and softmax to predict labels for each version (0=original, 1=revised), trained with cross-entropy. Hyperparameters: LSTM1A/1B/2A/2B hidden size 256, joint LSTMAB hidden size 512, dropout 0.2 (not applied to BiLSTMs or self-attention), batch size 32, learning rate 1e-5, 5 training epochs. Data splits follow wikiHowToImprove partitions: 30,044 training, 6,237 test, and 5,334 validation pairs.
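The final classification step, self-attention pooling over encoder states followed by a softmax, can be illustrated with NumPy. This is a sketch only: the weights below are random stand-ins for learned parameters, and the real model produces the hidden states with the BiLSTM stack described above:

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def attentive_classify(H, w, W_out):
    """Self-attention pooling over hidden states H (T x d), then softmax.

    H: per-token hidden states for one sentence version; w: attention
    query vector (d,); W_out: output projection (d x 2) over the labels
    0=original and 1=revised. All weights are illustrative stand-ins.
    """
    scores = softmax(H @ w)          # (T,) attention weights over time steps
    pooled = scores @ H              # (d,) attention-weighted sum of states
    return softmax(pooled @ W_out)   # (2,) distribution over the two labels

rng = np.random.default_rng(0)
T, d = 12, 512                       # 12 tokens, joint hidden size 512
probs = attentive_classify(rng.normal(size=(T, d)),
                           rng.normal(size=d),
                           rng.normal(size=(d, 2)))
print(probs.shape)                   # (2,)
```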

Key Findings
  • Dataset: 41,615 instructional sentence revisions where the revised main verb is more specific; splits of 30,044 train, 6,237 test, 5,334 validation.
  • Frame relations: Most edits mapped to Subframe-of, Inherits-from, or Uses relations; some fell into Other due to parser limitations or missing frames.
  • Pairwise ranking performance: Baseline BiLSTM-Attention (Anthonio et al., 2020) achieves 64.08% accuracy on the filtered corpus. The proposed model with FastText reaches 71.16% accuracy (+7.08% over baseline). The model with BERT embeddings is the most accurate overall (exact number not specified), outperforming FastText and the baseline.
  • Error patterns by frame relation (validation examples):
      • Usage: 503 errors out of 1,965 pairs (example: Make a comic in Flash → Create a comic in Flash)
      • Inheritance: 352/1,793 (Check the "made in" label → Inspect the "made in" label)
      • Subframe: 137/926 (Let your hair dry → Allow your hair to dry)
      • Other: 160/443 (Next, try to sneak out... → Next, attempt to sneak out...)
  • Distinguishability by relation: Subframe-of revisions are easiest to distinguish; Uses relations are most often confused. BERT performs better than FastText on pairs without clear FrameNet relations (Other).
  • Embedding similarity and confusion: The most commonly confused verb pairs (e.g., allow/permit; choose/decide; create/make) have cosine similarity ≥ 0.8, while average verb-pair similarity is 0.47, indicating that embeddings alone can be insufficient for this task.
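The similarity comparison behind the last finding can be sketched with a standard cosine-similarity computation. The vectors below are toy values for illustration only; the reported figures (≥ 0.8 for confused pairs, 0.47 on average) come from the paper's embeddings:

```python
import numpy as np

def cosine(u, v):
    """Cosine similarity between two embedding vectors."""
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

# Toy stand-ins for embeddings of a frequently confused verb pair.
allow  = np.array([0.90, 0.40, 0.10])
permit = np.array([0.85, 0.45, 0.12])
print(cosine(allow, permit) > 0.8)   # near-synonyms score high
```

When two verbs sit this close in embedding space, a model relying on embeddings alone has little signal to decide which version is the clarified one.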

Discussion

The study shows that revisions in instructional texts often clarify vague instructions by making the main verb more specific, and that a neural pairwise ranking model can learn to distinguish original from revised versions. Incorporating a joint representation of both versions improves over a strong BiLSTM-Attention baseline, and contextualized embeddings (BERT) further help, especially when explicit frame relations are absent. Analysis across FrameNet relations indicates that specificity increases that correspond to subevent relations are more detectable than weaker Uses-type relations. Observed failures on near-synonymous verb substitutions suggest that semantic proximity in distributional space limits discrimination, motivating the integration of additional linguistic and discourse features (e.g., FrameNet properties, sentence position) beyond embeddings.

Conclusion

The paper presents a methodology to extract and analyze clarifications of vague instructions from noisy revision histories, creating a large dataset where revised main verbs are more specific than originals. Using FrameNet, the authors characterize the semantic relations underlying these edits, and demonstrate a pairwise ranking approach that improves over prior baselines, with further gains from contextual embeddings. This work lays groundwork for automated editing beyond grammar and style, toward linguistically informed clarification of instructional text. Future directions include expanding to other linguistic phenomena, leveraging richer FrameNet features (including roles), adding discourse/context indicators, and improving robustness to near-synonymous verb substitutions.

Limitations
  • Data noise and filtering: Source data contains typos, errors, and non-instructional content; despite cleaning and constraints (length, edit distance), residual noise may remain and could affect generalizability.
  • Focus on main verbs: The approach centers on root-verb changes; clarifications involving other parts of the sentence (arguments, modifiers, multiword expressions) may be missed.
  • FrameNet coverage and parsing: Automatic frame identification is imperfect; some verbs are missing from FrameNet or not recognized by the parser, leading to an Other category and reduced analytic granularity. Role information was not used.
  • Modeling limitations: Models struggle with near-synonymous verb pairs due to high embedding similarity, indicating that embeddings alone may be insufficient without additional linguistic features.
  • Reporting: Exact accuracy for the BERT-based model is not provided; the summary states only that it outperforms the FastText variant and the baseline (the FastText variant itself beats the baseline by 7.08%), limiting precise comparison.