Abstract
This paper addresses the limitation that existing medical language models focus primarily on English by presenting a multilingual medical language model. The contributions are threefold: (1) a multilingual medical corpus (MMedC) containing 25.5 billion tokens across six languages; (2) a multilingual medical multiple-choice question-answering benchmark (MMedBench) with rationales; and (3) an evaluation of several open-source LLMs, including models further trained on MMedC. The final model, MMed-Llama 3 (8B parameters), surpasses other open-source models on MMedBench and on English benchmarks, even rivaling GPT-4.
Publisher
Nature Communications
Published On
Sep 27, 2024
Authors
Pengcheng Qiu, Chaoyi Wu, Xiaoman Zhang, Weixiong Lin, Haicheng Wang, Ya Zhang, Yanfeng Wang, Weidi Xie
Tags
multilingual
medical language model
question-answering
MMedC
MMedBench
evaluation
GPT-4