This study examines what makes a machine learning dataset responsible, focusing on fairness, privacy, and regulatory compliance. A large-scale audit of computer vision datasets, particularly in biometrics and healthcare, reveals widespread shortcomings in all three areas. The authors propose a rubric for evaluating dataset responsibility and, after analyzing 60 datasets, show an urgent need for better dataset creation methodologies that address fairness-privacy paradoxes and comply with global data protection legislation.
Publisher
Nature Machine Intelligence
Published On
Aug 12, 2024
Authors
Surbhi Mittal, Kartik Thakral, Richa Singh, Mayank Vatsa, Tamar Glaser, Cristian Canton Ferrer, Tal Hassner
Tags
machine learning
fairness
privacy
data protection
dataset responsibility
computer vision
healthcare