Towards building multilingual language model for medicine

Medicine and Health

P. Qiu, C. Wu, et al.

Researchers from Shanghai Jiao Tong University present a multilingual medical language model that outperforms existing open-source frameworks. Built on a large multilingual medical corpus and paired with a rationale-annotated question-answering benchmark, this study pushes the boundaries of healthcare AI.
Abstract
The development of open-source, multilingual medical language models can benefit a wide, linguistically diverse audience. This work contributes: (1) a multilingual medical corpus (MMedC) of ~25.5B tokens across six languages to enable auto-regressive domain adaptation; (2) a multilingual medical multiple-choice QA benchmark with rationales (MMedBench); and (3) comprehensive evaluations of open-source LLMs with and without additional training on MMedC. The final model, MMed-Llama 3 (8B), achieves superior performance among open-source models on MMedBench and strong results on English benchmarks, rivaling GPT-4. The paper releases datasets (with licensing caveats for some books), code, and models to support multilingual medical LLM development.
Publisher
Nature Communications
Published On
Sep 27, 2024
Authors
Pengcheng Qiu, Chaoyi Wu, Xiaoman Zhang, Weixiong Lin, Haicheng Wang, Ya Zhang, Yanfeng Wang, Weidi Xie
Tags
multilingual
medical language model
question-answering
MMedC
MMedBench
evaluation
GPT-4