Abstract
Limited access to textual data, particularly de-identified mental health records, hinders NLP development in the clinical domain. This study presents a method for generating artificial clinical documents using discharge summaries from a large mental healthcare provider and an intensive care unit. The generated text undergoes intrinsic evaluation (text preservation, memorization of training data, clinical validity) and extrinsic evaluation (impact on NLP text classification). Results show that training on artificial data yields classification results comparable to those obtained with original data, and that a small amount of original data is sufficient to condition generation, which minimizes the risk of retaining sensitive information. This approach is promising for creating shareable artificial clinical data to advance computational methods using healthcare data.
Publisher
npj Digital Medicine
Published On
May 14, 2020
Authors
Julia Ive, Natalia Viani, Joyce Kam, Lucia Yin, Somain Verma, Stephen Puntis, Rudolf N. Cardinal, Angus Roberts, Robert Stewart, Sumithra Velupillai
Tags
NLP
mental health
artificial data
clinical documents
text classification
sensitive information