This paper introduces PURPLE, a machine learning method for estimating the relative prevalence of underreported health conditions, such as intimate partner violence (IPV). PURPLE addresses the problem that underreporting skews prevalence estimates, and that the degree of underreporting varies across demographic groups. The method builds on positive-unlabeled learning but avoids restrictive assumptions about data separability. Experiments on synthetic and real health data show that PURPLE recovers relative prevalence more accurately than existing methods. Applied to two large emergency department datasets, PURPLE finds higher IPV prevalence among Medicaid recipients, non-white patients, unmarried individuals, lower-income populations, and those residing in metropolitan counties. Correcting for underreporting with PURPLE yields more plausible estimates than methods that do not account for underdiagnosis.
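To illustrate why group-dependent underreporting skews relative prevalence, the following minimal sketch compares a true prevalence ratio with the ratio computed from recorded diagnoses alone. It is not an implementation of PURPLE. The function name, the prevalence values, and the reporting rates are all hypothetical, and the sketch assumes that underreporting is the only source of error (no false positives).

```python
def observed_relative_prevalence(true_prev_a, true_prev_b,
                                 report_rate_a, report_rate_b):
    """Ratio of recorded prevalence in group A to group B.

    Assumes (hypothetically) that each true case is recorded with a
    group-specific probability and that no non-cases are recorded.
    """
    observed_a = true_prev_a * report_rate_a  # recorded cases per person, group A
    observed_b = true_prev_b * report_rate_b  # recorded cases per person, group B
    return observed_a / observed_b


# Hypothetical numbers, not taken from the paper.
true_prev_a, true_prev_b = 0.04, 0.02      # group A is truly twice as affected
report_rate_a, report_rate_b = 0.25, 0.50  # but group A is reported half as often

true_ratio = true_prev_a / true_prev_b
naive_ratio = observed_relative_prevalence(
    true_prev_a, true_prev_b, report_rate_a, report_rate_b
)

print(f"true relative prevalence:  {true_ratio:.2f}")
print(f"naive relative prevalence: {naive_ratio:.2f}")
```

With these made-up inputs the recorded data suggest no difference between the groups even though the true prevalence differs by a factor of two, which is the kind of distortion that a correction for underreporting is meant to remove.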
Publisher
npj Women's Health
Published On
May 15, 2024
Authors
Divya Shanmugam, Kaihua Hou, Emma Pierson
Tags
machine learning
health conditions
underreporting
intimate partner violence
prevalence estimation
demographics
data accuracy