Medicine and Health

Can We Trust Synthetic Data in Medicine? A Scoping Review of Privacy and Utility Metrics

B. Kaabachi, J. Despraz, et al.

This research by Bayrem Kaabachi, Jérémie Despraz, Thierry Meurers, Karen Otte, Mehmed Halilovic, Fabian Prasser, and Jean Louis Raisaro examines the balance between sharing health data for research and protecting patient privacy. Its scoping review uncovers a significant lack of standard evaluation methods for the privacy and utility of synthetic data, and calls for greater awareness and uniform approaches in medical data sharing.

Abstract

Introduction: Sharing and re-using health-related data beyond the scope of its initial collection is essential for accelerating research and for developing robust and trustworthy machine learning methods that can be translated into clinical settings. The sharing of synthetic data, artificially generated to resemble real patient data, is increasingly recognized as a promising means to enable such re-use while addressing the privacy concerns related to personal medical data. Nonetheless, no consensus exists yet on a standard approach for systematically and quantitatively evaluating the actual privacy gain and residual utility of synthetic data, de facto hindering its adoption.
Objective: In this work, we present and systematize current knowledge in the field of synthetic health-related data evaluation, in terms of both privacy and utility. We provide insights into and a critical analysis of the current state of the art, and propose concrete directions and steps forward for the research community.
Methods: We assess and contextualize existing knowledge in the field through a scoping review and the creation of a common ontology that encompasses all the methods and metrics used to assess synthetic data. We follow the PRISMA-ScR methodology to perform data collection and knowledge synthesis.
Results: We include 92 studies in the scoping review. We analyze and classify them according to the proposed ontology. We found 48 different methods to evaluate the residual statistical utility of synthetic data and 9 methods that are used to evaluate the residual privacy risks. Moreover, we observe that there is currently no consensus among researchers regarding either individual metrics or families of metrics for evaluating the privacy and utility of synthetic data. Our findings on the privacy of synthetic data show that there is an alarming tendency to trust the safety of synthetic data without properly evaluating it.
Conclusion: Although the use of synthetic data in healthcare promises to offer an easy and hassle-free alternative to real data, the lack of consensus in terms of evaluation hinders the adoption of this new technology. We believe that, by raising awareness and providing a comprehensive taxonomy on evaluation methods that takes into account the current state of literature, our work can foster the development and adoption of uniform approaches and consequently facilitate the use of synthetic data in the medical domain.

Publisher

medRxiv

Published On

Nov 28, 2023

Authors

Bayrem Kaabachi, Jérémie Despraz, Thierry Meurers, Karen Otte, Mehmed Halilovic, Fabian Prasser, Jean Louis Raisaro

DOI

https://doi.org/10.1101/2023.11.28.23299124

Related Publications

Explore these studies to deepen your understanding of the subject.

Psychology

Effectiveness of Augmented and Virtual Reality-Based Interventions in Improving Knowledge, Attitudes, Empathy and Stigma Regarding People with Mental Illnesses-A Scoping Review

T. J.l., X. H., et al.

Agriculture

A scoping review of adoption of climate-resilient crops by small-scale producers in low- and middle-income countries

M. Acevedo, K. Pixley, et al.

Medicine and Health

Can artificial intelligence improve the diagnosis and prognosis of disorders of consciousness? A scoping review

M. Bonanno, D. Cardile, et al.
