Safe and reliable natural language inference (NLI) is crucial for analyzing clinical trial reports (CTRs). This paper introduces a data augmentation technique to improve the robustness of biomedical NLI models. Using generative models (such as GPT-3.5) and biomedical knowledge graphs, the authors generate synthetic data through semantic perturbations, domain-specific vocabulary replacement, and a new numerical reasoning task. The increased data diversity reduces shortcut learning. Combining this augmentation with multi-task learning and the DeBERTa architecture yielded significant performance gains on the NLI4CT 2024 benchmark, ranking 12th in faithfulness and 8th in consistency among 32 participating teams. Ablation studies validate the contribution of each augmentation method.
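To make the numerical-perturbation idea concrete, the sketch below shows one simple way such augmentation could work: altering the numbers in an entailed statement so that the resulting variant contradicts the trial record, flipping its label. This is an illustrative assumption, not the authors' exact pipeline; `perturb_numbers` is a hypothetical helper.

```python
import random
import re

def perturb_numbers(statement: str, rng: random.Random) -> str:
    """Create a contradicting variant of a statement by altering its numbers.

    A simplified illustration of numerical perturbation for label-flipping
    augmentation (hypothetical helper, not the paper's actual method).
    """
    def alter(match: re.Match) -> str:
        value = int(match.group())
        # Shift by a random nonzero offset so the reported figure no longer
        # matches the clinical trial record.
        offset = rng.choice([-2, -1, 1, 2])
        return str(max(0, value + offset))

    return re.sub(r"\d+", alter, statement)

# An entailed statement becomes a contradiction once its figures disagree
# with the source CTR, yielding a new (statement, label) training pair.
statement = "Cohort 1 enrolled 45 patients and reported 3 adverse events."
augmented = perturb_numbers(statement, random.Random(0))
```

In practice such perturbations would be combined with the paper's other strategies (semantic perturbation via generative models and vocabulary replacement via knowledge graphs) to diversify the training set.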
Publisher
SemEval-2024
Authors
Yuqi Wang, Zeqiang Wang, Wei Wang, Qi Chen, Kaizhu Huang, Anh Nguyen, Suparna De
Tags
natural language inference
data augmentation
synthetic data
biomedical models
NLI performance
multi-task learning
DeBERTa