Abstract
Safe and reliable natural language inference (NLI) is crucial for clinical trial report (CTR) analysis. This paper introduces a novel data augmentation technique to improve the robustness of biomedical NLI models. Using generative models (e.g., GPT-3.5) and biomedical knowledge graphs, the authors generate synthetic data through semantic perturbations, domain-specific vocabulary replacement, and a new numerical reasoning task. The increased data diversity reduces shortcut learning. Combining this augmentation with multi-task learning and the DeBERTa architecture yielded significant performance gains on the NLI4CT 2024 benchmark, ranking 12th in faithfulness and 8th in consistency among 32 participants. Ablation studies validate the contribution of each augmentation method.
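The two augmentation ideas named in the abstract, vocabulary replacement and numerical perturbation, can be illustrated with a minimal sketch. Everything here is an assumption for illustration: the `SYNONYMS` map stands in for the biomedical knowledge graphs the paper actually uses, and the perturbation rule is a simplified stand-in for the paper's numerical reasoning task.

```python
import re

# Hypothetical domain-term map; the paper derives such pairs from
# biomedical knowledge graphs rather than a hand-written dictionary.
SYNONYMS = {
    "patients": "subjects",
    "cohort": "group",
    "adverse events": "side effects",
}

def vocab_replace(statement: str) -> str:
    """Label-preserving augmentation: swap domain terms for synonyms,
    producing a paraphrase with the same entailment label."""
    out = statement
    for term, synonym in SYNONYMS.items():
        out = out.replace(term, synonym)
    return out

def numeric_perturb(statement: str, delta: int = 1):
    """Label-flipping augmentation: shift the first number in the
    statement so it no longer matches the source CTR, turning an
    entailed claim into a contradiction. Returns None if the
    statement contains no number to perturb."""
    match = re.search(r"\d+", statement)
    if match is None:
        return None
    value = int(match.group())
    return statement[:match.start()] + str(value + delta) + statement[match.end():]

premise = "The trial enrolled 120 patients in the treatment cohort."
paraphrase = vocab_replace(premise)        # same label as the original
contradiction = numeric_perturb(premise)   # label flipped to contradiction
```

Pairs like `(premise, paraphrase, entailment)` and `(premise, contradiction, contradiction)` would then be added to the training set, diversifying surface forms so the model cannot rely on lexical shortcuts.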
Publisher
SemEval-2024
Published On
Authors
Yuqi Wang, Zeqiang Wang, Wei Wang, Qi Chen, Kaizhu Huang, Anh Nguyen, Suparna De
Tags
natural language inference
data augmentation
synthetic data
biomedical models
NLI performance
multi-task learning
DeBERTa