Computer Science

Generation and evaluation of artificial mental health records for Natural Language Processing

J. Ive, N. Viani, et al.

Limited access to mental health records has held back NLP development in clinical settings. Researchers, including Julia Ive and Natalia Viani from Imperial College London, present a method for generating artificial clinical documents and show that text classifiers trained on such data can achieve results comparable to those trained on the original records, while holding promise for reducing the risk of exposing sensitive information.

Abstract

A serious obstacle to the development of Natural Language Processing (NLP) methods in the clinical domain is the accessibility of textual data. The mental health domain is particularly challenging, partly because clinical documentation relies heavily on free text that is difficult to de-identify completely. This problem could be tackled by using artificial medical data. In this work, we present an approach to generate artificial clinical documents. We apply this approach to discharge summaries from a large mental healthcare provider and discharge summaries from an intensive care unit. We perform an extensive intrinsic evaluation where we (1) apply several measures of text preservation; (2) measure how much the model memorises training data; and (3) estimate clinical validity of the generated text based on a human evaluation task. Furthermore, we perform an extrinsic evaluation by studying the impact of using artificial text in a downstream NLP text classification task. We found that using this artificial data as training data can lead to classification results that are comparable to the original results. Additionally, using only a small amount of information from the original data to condition the generation of the artificial data is successful, which holds promise for reducing the risk of these artificial data retaining rare information from the original data. This is an important finding for our long-term goal of being able to generate artificial clinical data that can be released to the wider research community and accelerate advances in developing computational methods that use healthcare data.
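
The abstract does not say how memorisation of training data was measured. As a rough illustration only, the sketch below uses one simple proxy that is an assumption here, not the authors' metric: the fraction of word n-grams in a generated text that also occur verbatim in the training documents. The function names, the whitespace tokenisation, the default n-gram length, and the toy sentences are all hypothetical.

```python
def word_ngrams(text, n):
    """Return the set of lowercase word n-grams in a text (whitespace tokenisation)."""
    tokens = text.lower().split()
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def memorised_fraction(generated, training_docs, n=5):
    """Fraction of the generated text's distinct n-grams found verbatim in the training docs.

    This is an assumed proxy for memorisation, not the measure used in the paper.
    Returns 0.0 when the generated text is shorter than n words.
    """
    generated_ngrams = word_ngrams(generated, n)
    if not generated_ngrams:
        return 0.0
    training_ngrams = set()
    for doc in training_docs:
        training_ngrams |= word_ngrams(doc, n)
    return len(generated_ngrams & training_ngrams) / len(generated_ngrams)


# Toy, invented sentences (not real clinical text).
training = ["the patient was admitted with low mood and poor sleep"]
copied = "the patient was admitted with low mood"
partial = "the patient was admitted after a fall"
novel = "she reports improved appetite and energy levels"

copied_score = memorised_fraction(copied, training, n=3)
partial_score = memorised_fraction(partial, training, n=3)
novel_score = memorised_fraction(novel, training, n=3)
```

A score near 1 suggests the generated text largely reproduces training text, while a score near 0 suggests little verbatim overlap. A real evaluation would need proper tokenisation and a choice of n suited to the corpus.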

Publisher

npj Digital Medicine

Published On

May 14, 2020

Authors

Julia Ive, Natalia Viani, Joyce Kam, Lucia Yin, Somain Verma, Stephen Puntis, Rudolf N. Cardinal, Angus Roberts, Robert Stewart, Sumithra Velupillai

DOI

https://doi.org/10.1038/s41746-020-0267-x
