A Discrimination Report Card

P. Kline, E. K. Rose, et al.

Discover how researchers Patrick Kline, Evan K. Rose, and Christopher R. Walters developed an Empirical Bayes grading scheme that measures racial bias among large U.S. employers while limiting the risk of ranking errors. Their findings reveal striking differences in firms' racial contact gaps, setting the stage for future work on bias in hiring practices.

Introduction
Despite the Civil Rights Act, employer discrimination persists. This paper addresses the lack of reliable information about specific organizations' discriminatory tendencies, hindering both job seekers and corporate executives. Job seekers cannot avoid biased firms without knowing their identities, and executives lack the data to benchmark their organizations against peers. The authors aim to create a discrimination report card summarizing the relative biases of Fortune 500 companies using a massive correspondence experiment (up to 1,000 applications per firm, totaling roughly 84,000 applications). Previous work using this experiment established that applications with Black names received 7% fewer contacts than those with White names, with this penalty varying significantly across firms. This paper builds upon this by developing methods to grade firms based on their biases while ensuring statistical reliability, offering a digestible summary for diverse audiences (including lay audiences, who often struggle with interpreting point estimates and standard errors). The authors draw parallels to existing report card systems for schools, hospitals, and other institutions, highlighting the increasing demand for simple summaries of complex performance data.
Literature Review
The paper builds on the existing literature on Empirical Bayes (EB) ranking methods. A substantial body of empirical research uses James-Stein shrinkage rules to rank entities such as teachers, schools, and hospitals. Laird and Louis (1989) proposed directly computing posterior mean ranks, but these can be noisy, especially when many units are compared. The recent econometrics literature acknowledges this issue, proposing methods to test hypotheses about the ranks or levels of highly ranked units. Gu and Koenker (2020) use non-parametric EB methods to select tail performers while controlling the False Discovery Rate (FDR). The present paper generalizes that work to accommodate more than two grades and avoids treating any one grade as a null hypothesis. The authors also draw parallels to the social choice literature, connecting their method to preference aggregation schemes such as the Borda and Condorcet rules.
Methodology
The authors introduce a novel grading scheme based on an Empirical Bayes approach. They frame the problem as ranking pairs of firms using noisy measurements of discrimination, where correctly ordering a pair yields a payoff and incorrectly ordering it yields a loss. Formulating the expected utility of assigning grades leads to a simple posterior threshold rule for pairwise comparisons. The trade-off between information (rank correlation) and reliability (the Discordance Rate, an analogue of the FDR) is governed by a parameter λ that controls discordance aversion: when λ = 1, the method maximizes rank correlation; when λ < 1, it prioritizes reliability by strictly ordering only those pairs with a sufficiently high posterior probability. Ranking all firms is then recast as ranking all pairs subject to transitivity constraints, yielding an integer linear programming problem whose solution minimizes posterior expected loss, balancing information against reliability. The authors further extend the method to a weighted loss that accounts for the magnitude of ranking mistakes, exploring both binary and square-weighted loss. They apply a Bartlett-style transformation to stabilize the variance of contact rates before running the ranking procedures. To estimate the distribution of latent discrimination levels, they employ both non-parametric maximum likelihood estimation (NPMLE) and Efron's log-spline deconvolution approach. A hierarchical random effects model allows them to estimate separate distributions of discrimination within and between industries, and they use the generalized method of moments (GMM) to estimate the model's parameters, including the dependence between firm bias and estimation precision.
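The pairwise posterior threshold rule described above can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: it assumes independent normal posteriors for two firms' latent contact penalties (the paper instead works with posteriors from a non-parametric deconvolution), and the function names are hypothetical.

```python
import math

def normal_cdf(x):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def pairwise_rank(theta_i, se_i, theta_j, se_j, lam=0.25):
    """Posterior threshold rule for one pair of firms.

    Returns +1 if firm i is confidently ranked as more biased than
    firm j, -1 for the reverse, and 0 if the pair is left unranked.
    A strict ordering requires posterior probability >= 1 / (1 + lam):
    lam = 1 maximizes rank correlation, lam < 1 favors reliability.
    """
    # P(theta_i > theta_j) under independent normal posteriors
    # (a parametric simplification of the paper's EB posteriors).
    z = (theta_i - theta_j) / math.hypot(se_i, se_j)
    p = normal_cdf(z)
    threshold = 1.0 / (1.0 + lam)  # lam = 0.25 -> 0.8
    if p >= threshold:
        return 1
    if 1.0 - p >= threshold:
        return -1
    return 0
```

With lam = 0.25, a pair is strictly ordered only when the posterior probability of that ordering is at least 80%; the full procedure then imposes transitivity across all pairs via integer linear programming.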
Key Findings
The authors first apply their method to rank the contact rates of the first names used in the correspondence experiment. A non-parametric deconvolution reveals two distinct clusters, corresponding to distinctively White and distinctively Black names. With a specific choice of λ (requiring 80% posterior confidence for a strict ordering), the method stratifies names into two groups that are strongly predictive of race, but not of sex. The primary application ranks firms by their bias against Black applicants. Using λ = 0.25 (80% posterior certainty), the baseline specification yields three unique grade levels, capturing 19% of the between-firm variation in contact penalties. The worst-graded firms exhibit a 24% contact gap, while the best-graded firms show only a 3% gap. Incorporating industry information through a hierarchical random effects model substantially improves results: the preferred four-grade ranking explains 47% of the variation in contact penalties and achieves a correlation of 0.32 with the true ranks, while limiting the mis-ranking rate to 5.2%. The worst-graded firms in this model still exhibit a contact gap of around 22%. The authors also find considerable variation in bias levels across industries, with the auto dealers/services industry showing the highest bias. Under square-weighted loss, the grading scheme becomes more granular (six grades without industry effects, eight with), explaining even more of the variance in contact penalties while maintaining control over the probability of large ranking mistakes.
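The spread between the worst and best grades can be made intuitive with a toy shrinkage calculation. The sketch below uses a normal-normal EB posterior, a parametric simplification of the paper's non-parametric deconvolution; the numbers and the function name are illustrative, not taken from the paper.

```python
import math

def eb_posterior(estimate, se, prior_mean, prior_sd):
    """Normal-normal Empirical Bayes posterior for a firm's latent
    contact penalty: a precision-weighted average of the noisy firm
    estimate and the (industry-level) prior mean."""
    prec = 1.0 / se**2 + 1.0 / prior_sd**2
    post_var = 1.0 / prec
    post_mean = post_var * (estimate / se**2 + prior_mean / prior_sd**2)
    return post_mean, math.sqrt(post_var)

# A firm with a large raw contact gap but a noisy estimate is pulled
# toward the prior mean (illustrative numbers only).
mean, sd = eb_posterior(estimate=0.24, se=0.05, prior_mean=0.07, prior_sd=0.05)
```

Because posteriors are shrunk toward the prior, only firms with precisely estimated, large penalties can be placed in the worst grade with the required confidence, which is one reason the reliability constraint leaves such large gaps between grade averages.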
Discussion
The findings directly address the paper's research question by providing a reliable and informative ranking of firms based on their discriminatory practices. The significant portion of variance explained by the relatively small number of grades (especially with industry effects) demonstrates the method's effectiveness in summarizing complex data. The large differences in average contact gaps across grades highlight the substantial heterogeneity in discriminatory behavior. The use of both ordinal and cardinal information in the report cards facilitates a more nuanced understanding of firm-level bias. The high reliability of the rankings, particularly for comparing high- and low-performing firms, makes the report card robust and useful for practical applications. The research also addresses the challenge of communicating complex statistical results to both experts and lay audiences.
Conclusion
This paper introduces a novel Empirical Bayes method for constructing informative and reliable report cards that rank and grade the discriminatory conduct of firms. The method successfully balances information and reliability, producing concise summaries suitable for diverse audiences. The application to a large-scale correspondence experiment reveals substantial heterogeneity in firm-level bias and provides actionable insights for improving hiring practices. Future research could explore the effectiveness of specific interventions to reduce discrimination, focusing on policies that may be most effective for firms exhibiting high levels of bias.
Limitations
The study's generalizability may be limited by its focus on Fortune 500 companies. The correspondence experiment relies on fictitious applications, which might not perfectly capture real-world applicant behavior. The authors acknowledge the potential for correlation between a firm's bias and the precision of its estimate, which they address by modeling precision dependence and reporting standard errors. The study could also benefit from coverage of a broader range of industries.