Introduction
Significant disparities exist in women's health across demographic groups, and accurately quantifying these disparities is crucial for developing effective and equitable healthcare policies. Relative prevalence, the ratio of a condition's prevalence in one group to its prevalence in another, provides a valuable metric for identifying groups that are disproportionately affected. However, accurate relative prevalence estimation is hindered by widespread underreporting of many women's health conditions, including intimate partner violence (IPV). Underreporting rates often vary across groups, biasing estimates of relative prevalence and obscuring true health disparities. Existing epidemiological methods often rely on data that are rarely available, such as ground truth annotations or results from multiple tests, or on unrealistic assumptions. Machine learning approaches, such as those based on positive unlabeled (PU) learning, typically assume that some region of feature space contains only true positives, an assumption that rarely holds in healthcare settings, particularly for complex conditions like IPV whose symptoms are not unique to the condition. This study addresses these limitations by developing a novel method, PURPLE, for accurately estimating relative prevalence even when underreporting is widespread and varies across groups.
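To make the target quantity and the effect of differential underreporting concrete, here is a toy numerical sketch; the group labels and all numbers are hypothetical and not taken from the study.

```python
# Illustrative only: hypothetical numbers, not drawn from the study.
prevalence_group_a = 0.06   # true prevalence of the condition in group A
prevalence_group_b = 0.03   # true prevalence of the condition in group B

relative_prevalence = prevalence_group_a / prevalence_group_b
print(relative_prevalence)  # 2.0 -> group A is twice as affected

# If group A's cases are diagnosed only half as often as group B's,
# a naive ratio of *diagnosed* cases reports 1.0 and hides the disparity.
diagnosis_rate_a, diagnosis_rate_b = 0.25, 0.50
naive_ratio = (prevalence_group_a * diagnosis_rate_a) / (prevalence_group_b * diagnosis_rate_b)
print(naive_ratio)  # 1.0
```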
Literature Review
The paper reviews existing methods for addressing underreporting and estimating prevalence. Epidemiological approaches often require data that are unavailable in many women's health contexts, such as ground truth labels or the sensitivity and specificity of diagnostic tests. Machine learning approaches, particularly those framed as positive-unlabeled (PU) learning, have also been explored, but many PU methods rely on restrictive assumptions, such as the existence of a region of feature space in which all examples are true positives. This assumption is often unrealistic in healthcare because symptoms are rarely specific to a single condition. The authors argue that their proposed PURPLE method offers a complementary approach that avoids these limitations.
Methodology
PURPLE (Positive Unlabeled Relative Prevalence Estimator) is a novel method for estimating the relative prevalence of underreported conditions. It differs from existing epidemiological and PU learning approaches in that it requires no external information (such as test sensitivity and specificity) and avoids restrictive assumptions about data separability. PURPLE makes three assumptions: (1) there are no false-positive diagnoses; (2) within each group, diagnosis occurs at random among true positives, so the probability of being diagnosed given the condition depends only on group membership and not on symptoms; and (3) the probability of having the condition given symptoms is the same across groups.

The method jointly estimates the probability of a true positive given symptoms and the group-specific diagnosis probability by modeling the probability of a positive diagnosis as the product of the two. A logistic regression model estimates the probability of having the condition given symptoms, and group-specific constants capture the probability of a positive diagnosis given a true positive; all parameters are fit by minimizing a cross-entropy loss (see the code sketch below). Although these probabilities are identified only up to a constant multiplicative factor, this is sufficient for relative prevalence: the estimate is obtained by dividing the mean estimated probability of a true positive given symptoms in one group by the corresponding mean in another group, and the unidentified constant cancels in this ratio. The paper also provides procedures for checking the validity of the underlying assumptions and shows that, even when the assumption of a constant probability of the condition given symptoms is violated, PURPLE still provides a useful lower bound on the magnitude of disparities.

PURPLE is validated on synthetic data and on semi-synthetic data generated from MIMIC-IV, where it is more accurate than existing PU learning methods in both separable and non-separable settings. The semi-synthetic data are generated by simulating a disease label from a set of symptoms under four scenarios: (1) common symptoms; (2) symptoms less common in one group; (3) endometriosis symptoms; and (4) IPV-related symptoms. The method's application to these conditions establishes the potential for accurate and actionable relative prevalence estimation. The method is implemented in PyTorch as a single-layer neural network (equivalent to logistic regression) with L1 regularization for high-dimensional data.
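A minimal sketch of this estimation procedure, written to the design stated above (a single linear layer equivalent to logistic regression, group-specific diagnosis constants, cross-entropy loss, and an L1 penalty), is shown below. Variable names, the optimizer choice, and hyperparameters are illustrative assumptions rather than the authors' released implementation.

```python
import torch
import torch.nn as nn

class PurpleSketch(nn.Module):
    def __init__(self, n_features: int, n_groups: int):
        super().__init__()
        # p(condition | symptoms): single linear layer, i.e. logistic regression
        self.symptom_model = nn.Linear(n_features, 1)
        # per-group free parameter; sigmoid keeps p(diagnosed | condition, group) in (0, 1)
        self.group_logit = nn.Parameter(torch.zeros(n_groups))

    def p_condition(self, x: torch.Tensor) -> torch.Tensor:
        return torch.sigmoid(self.symptom_model(x)).squeeze(-1)

    def forward(self, x: torch.Tensor, group: torch.Tensor) -> torch.Tensor:
        # p(diagnosed | symptoms, group) = p(condition | symptoms) * p(diagnosed | condition, group)
        return self.p_condition(x) * torch.sigmoid(self.group_logit)[group]

def fit(model, x, group, diagnosed, epochs=500, lr=0.05, l1=1e-3):
    """Minimize cross-entropy between predicted and observed diagnoses,
    with an L1 penalty on the symptom weights for high-dimensional data."""
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    bce = nn.BCELoss()
    for _ in range(epochs):
        opt.zero_grad()
        p_diag = model(x, group).clamp(1e-6, 1 - 1e-6)
        loss = bce(p_diag, diagnosed) + l1 * model.symptom_model.weight.abs().sum()
        loss.backward()
        opt.step()
    return model

def relative_prevalence(model, x, group, a=0, b=1) -> float:
    """Ratio of the mean estimated p(condition | symptoms) in group a vs. group b.
    The unidentified constant multiplicative factor cancels in this ratio."""
    with torch.no_grad():
        p = model.p_condition(x)
        return (p[group == a].mean() / p[group == b].mean()).item()

# Illustrative usage (shapes only): x is an (n, d) float tensor of symptom
# features, group an (n,) long tensor of group indices, diagnosed an (n,)
# float tensor of observed diagnosis labels.
# model = fit(PurpleSketch(d, n_groups=2), x, group, diagnosed)
# rp = relative_prevalence(model, x, group, a=0, b=1)
```

Parameterizing the group-specific diagnosis probabilities as sigmoid-transformed free parameters is one simple way to constrain them to (0, 1); the paper may parameterize them differently.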
Key Findings
PURPLE was applied to two large emergency department datasets, MIMIC-IV ED and NEDS, to estimate the relative prevalence of IPV across demographic groups. The MIMIC-IV ED dataset contained 293,297 emergency department visits, while the NEDS dataset contained approximately 33 million visits, representing 143 million US emergency department visits in 2019. The analysis was restricted to female patients aged 18 and older, both because symptom presentation is better understood in this population and to distinguish IPV from child abuse.

Several groups consistently showed higher IPV prevalence across both datasets. Patients on Medicaid had a significantly higher prevalence of IPV than those not on Medicaid (NEDS: 2.44 ± 0.07; MIMIC-IV ED: 2.65 ± 0.31), whereas patients on Medicare showed lower relative prevalence. Racial disparities were less consistent across the datasets, although the NEDS dataset indicated a significantly lower prevalence among white patients than among other racial groups (relative prevalence 0.82 ± 0.02). The MIMIC-IV dataset showed higher IPV prevalence among patients who were not legally married (relative prevalence 1.48 ± 0.21). The NEDS dataset, which includes income and county population data, further revealed higher IPV prevalence among patients from lower-income zip codes (relative prevalence 1.16 ± 0.02 in the bottom income quartile vs. 0.87 ± 0.03 in the top quartile) and among patients in metropolitan counties (relative prevalence 1.18 ± 0.02).

Importantly, correcting for underreporting with PURPLE produced more plausible estimates of relative prevalence across income groups; estimates that did not correct for underdiagnosis showed inconsistent patterns with respect to income. These findings suggest that IPV is less likely to be accurately diagnosed in lower-income women.
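For illustration, group-wise estimates like those above could be produced by aggregating per-visit probability estimates over each demographic flag. The sketch below is hypothetical: the column names are invented, and the bootstrap uncertainty is an assumption, since this summary does not state how the reported ± values were computed.

```python
# Illustrative aggregation of per-visit estimates p_hat = p(condition | symptoms)
# into a group-wise relative prevalence with a bootstrap-based uncertainty.
import numpy as np
import pandas as pd

def group_relative_prevalence(df: pd.DataFrame, flag: str, n_boot: int = 1000, seed: int = 0):
    """Relative prevalence of the flagged group (flag == 1) vs. the rest,
    using per-visit estimates stored in df['p_hat']."""
    rng = np.random.default_rng(seed)

    def rp(sample: pd.DataFrame) -> float:
        return (sample.loc[sample[flag] == 1, "p_hat"].mean()
                / sample.loc[sample[flag] == 0, "p_hat"].mean())

    point = rp(df)
    # Resample visits with replacement and recompute the ratio.
    boots = [rp(df.iloc[rng.integers(0, len(df), len(df))]) for _ in range(n_boot)]
    return point, float(np.std(boots))

# e.g., with a hypothetical DataFrame `visits` holding columns 'p_hat' and 'medicaid':
# point_estimate, uncertainty = group_relative_prevalence(visits, flag="medicaid")
```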
Discussion
The findings demonstrate PURPLE's ability to accurately estimate the relative prevalence of IPV, accounting for underreporting and variations in diagnosis rates across groups. The results highlight significant disparities in IPV prevalence across various socioeconomic and demographic factors. The observed higher prevalence among Medicaid recipients, unmarried individuals, lower-income groups, and those living in metropolitan areas aligns with previous research on IPV disparities. The somewhat inconsistent findings on racial disparities across the datasets underscore the importance of using large, diverse datasets to reliably assess such disparities. Correcting for underreporting is crucial for obtaining accurate and reliable estimates of health disparities. The observation that the probability of accurate diagnosis is lower among lower-income groups reflects a broader pattern of underdiagnosis among disadvantaged populations. PURPLE's ability to provide a lower bound on the magnitude of disparities even when assumptions are violated enhances its practical utility.
Conclusion
This study presents PURPLE, a novel machine learning method for accurately estimating the relative prevalence of underreported health conditions. PURPLE's success in accurately quantifying IPV disparities across various demographic groups demonstrates its value in public health research. The method offers a valuable tool for researchers and policymakers seeking to understand and address health inequities. Future work could explore applying PURPLE to other underreported conditions in women's health and beyond, including polycystic ovarian syndrome, endometriosis, and traumatic brain injuries. Expanding the method's applicability to other domains, such as quantifying disparities in police misconduct or hate speech, is also a promising avenue for future research.
Limitations
The study's reliance on emergency department data limits the generalizability of the findings to populations who do not seek care in emergency departments. While PURPLE addresses underreporting, it still relies on assumptions that might not always hold perfectly in real-world data. The assumption checks provided can help identify datasets where the method might be less reliable. Further research could explore more robust methods for handling violations of these assumptions and evaluating the impact of unobserved confounding factors.
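As one example of the kind of diagnostic such checks could involve (not necessarily the procedure the paper describes), note that under the model the ratio of diagnosis probabilities between two groups should be roughly constant across symptom profiles; a strongly varying ratio flags a likely violation of the constant-probability or random-diagnosis assumption. A hedged sketch under these assumptions, using hypothetical per-group data matrices:

```python
# Illustrative diagnostic only; the paper's own assumption checks may differ.
import numpy as np
from sklearn.linear_model import LogisticRegression

def diagnosis_ratio_spread(x_a, y_a, x_b, y_b, x_eval):
    """Fit per-group classifiers of the observed diagnosis label on hypothetical
    symptom matrices x_a/x_b with diagnosis labels y_a/y_b, then measure how much
    the group-A / group-B predicted-probability ratio varies over x_eval."""
    clf_a = LogisticRegression(max_iter=1000).fit(x_a, y_a)
    clf_b = LogisticRegression(max_iter=1000).fit(x_b, y_b)
    ratio = clf_a.predict_proba(x_eval)[:, 1] / clf_b.predict_proba(x_eval)[:, 1]
    # Spread of the ratio relative to its median; values near zero are
    # consistent with a constant ratio and hence with the model's assumptions.
    q25, q50, q75 = np.percentile(ratio, [25, 50, 75])
    return (q75 - q25) / q50
```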