
Sociology
Estimating social bias in data sharing behaviours: an open science experiment
C. Acciai, J. W. Schneider, et al.
Explore the intriguing dynamics of ethnic, gender, and status biases in data-sharing willingness among scientists. This research conducted by Claudia Acciai, Jesper W. Schneider, and Mathias W. Nielsen reveals unexpected disparities in responsiveness towards data requests based on perceived ethnicity, shedding light on underlying stereotypes that may obstruct scientific collaboration.
~3 min • Beginner • English
Introduction
Scientific discoveries are made through cumulative and collective efforts, ideally based on full and open communication. For science to work, published claims must be subject to organized skepticism, and this ethos of rigorous, structured scrutiny is contingent on data sharing. Lack of data prevents results from being reexamined with new techniques and samples from being pooled for meta-analysis, ultimately hindering the cumulative knowledge-building that drives scientific progress. Open data improves the credibility of scientific claims, and while journal editors increasingly acknowledge the importance of disclosing data, many authors refrain from sharing their data even when they have promised to do so.

Previous research has focused on the supply-side determinants of data sharing. Surveys find that scientists' decisions to share data depend on (i) contextual factors such as journal requirements, funding incentives, and disciplinary competition; (ii) individual factors such as perceived risks of data misuse, lost publication opportunities, and the effort associated with making data available; and (iii) demographic factors such as experience level, tenure status, and gender. Much less attention has been given to the demand-side issues of data sharing. Ideally, everyone, irrespective of background, should be able to contribute to science; data access should therefore not differ depending on who is asking for the data. Yet research indicates persistent gender, ethnic, and status-related bias in science that likely also affects data-sharing practices.

Social bias in data sharing may arise from scientists' stereotypic beliefs about data requestors. According to status characteristics theory, nationality, ethnicity, gender, and institutional prestige are diffuse cues that, when salient, may influence scientists' impressions of requestors' trustworthiness, competence, or deservingness. Such status cues are more likely to guide people's judgments in ambiguous situations, where information is scarce. Status cues may also be critical for data sharing because knowledge transfer is generally more likely in high-trust situations, including the potential data sharer's trust in the requestor's competences and intentions.

The study preregistered and tested four hypotheses: scientists would be less willing to share data when a requestor (i) was located in China (compared with the US); (ii) was affiliated with a lower-status university (compared with a higher-status university); (iii) had a Chinese-sounding name (compared with a typical Anglo-Saxon name); or (iv) had a feminine-coded name (compared with a masculine-coded name). Beyond gender and institutional status, the study examined specific disadvantages facing researchers with Chinese names and university affiliations, given China's prominence in global scientific output and the large number of Chinese expatriate graduate students in the US.
Literature Review
Prior work on data sharing has predominantly examined supply-side determinants, documenting that sharing decisions are shaped by journal policies, funding incentives, disciplinary competition, perceived risks of misuse and lost opportunities, the effort required to share data, and researcher demographics such as experience, tenure status, and gender. Empirical studies also show that despite stated policies or declarations that data are available upon request, actual data provision is often limited across disciplines. On the demand side, research indicates persistent gender, ethnic, and status-related bias in science that could extend to data-sharing interactions. Status characteristics theory suggests that diffuse status cues (nationality, ethnicity, gender, institutional prestige) can affect perceptions of trustworthiness, competence, and deservingness, especially under ambiguity. Evidence from trust games, correspondence tests, and evaluation experiments shows differential treatment based on these cues in various academic and societal contexts, motivating hypotheses about potential disparities in data access conditioned on who requests the data.
Methodology
Design: A preregistered, randomized, unpaired audit experiment tested ethnic, gender, national, and status bias in data-sharing behaviors. Treatments manipulated four characteristics of the fictitious requestor: gender (masculine-coded vs. feminine-coded name), ethnicity (putatively Chinese vs. putatively Anglo-Saxon name), country of residence (China vs. United States), and institutional prestige (higher-status vs. lower-status university). To maintain realism, Anglo-Saxon requestors affiliated with Chinese universities were not included; the design therefore comprised 12 treatment conditions rather than a full 16-cell factorial.

Sampling and participants: The experimental population comprised authors of peer-reviewed papers published between 2017 and 2021 in PNAS and Nature-portfolio journals that indicated data were available upon request. Journal sites were queried for the string “available upon request,” yielding 6,798 papers. Authors listed as primary data contacts were deduplicated to their most recent paper when they appeared multiple times, retractions and corrections were excluded, and authors were matched to Clarivate Web of Science metadata for current emails and affiliations. The final emailing sample was 1,826 author–paper pairs; after bounced emails and post-debrief withdrawals (78), the analysis sample was 1,634. A registered power analysis indicated sufficient power (detecting Cohen’s f = 0.02 at α = 0.01 with power = 0.95).

Procedures: Fictitious prospective PhD students sent data requests referencing specific target publications. Four Gmail accounts represented the four gender–ethnicity combinations; the accounts were warmed up to avoid spam filtering. Nonresponders received a follow-up email one week later; data collection ran from April to May 2022. Email delivery and opens were tracked via YAMM. An alias-forwarding error affected early responses to one Chinese-female treatment but was corrected in follow-ups; analyses suggest minimal impact.

Manipulations: Names signaled ethnicity and gender. Anglo-Saxon: Jeffrey Killion (male) and Hilary Witmer (female). Chinese: 张嘉实 (Jia Shi Zhang; male, presented as Jiashi (Wilson) Zhang | 张嘉实) and 邢雅丹 (Yadan Xing; female, presented as Yadan (Cecilia) Xing | 邢雅丹). The Chinese treatments included an Anglo-Saxon middle name in parentheses to cue gender, a practice common among transnational Chinese students. Name typicality was validated using the rethnicity R package and manual checks. Request language matched the recipient's context (English for recipients outside China; Mandarin for recipients based in China). Institutional affiliation and country were varied via four universities identified through multiple rankings: higher-status (Carnegie Mellon University; Zhejiang University) vs. lower-status (Baylor University; Chongqing University). Universities ranked at the extreme top or bottom were avoided.

Measures: Treatments were coded as binary indicators: Country (US=0, China=1), Ethnicity (Anglo-Saxon=0, Chinese=1), Institution status (High=0, Low=1), and Gender (Masculine=0, Feminine=1). Models adjusted for field and publication outlet via five dichotomous controls.

Outcomes: (1) Preregistered primary outcome: data-sharing willingness (1 if the participant shared or indicated willingness to share any or all data; 0 otherwise). Responses were double-coded against a codebook (Cohen's kappa > 0.8), with coders blinded to treatments. (2) Secondary outcome (not preregistered): responsiveness (1 if any reply; 0 if no reply), an objective measure requiring no textual coding.
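As a concrete illustration of the double-coding check on the willingness outcome, the R sketch below computes Cohen's kappa for two blinded coders. The data frame, column names, and use of the irr package are illustrative assumptions, not the authors' actual coding scripts.

```r
# Minimal sketch (assumed data and object names): inter-rater agreement on the
# binary willingness outcome, double-coded by two coders blinded to treatments.
library(irr)

codings <- data.frame(
  coder_a = c(1, 0, 0, 1, 0, 1, 0, 0),  # 1 = shared / willing to share, 0 = otherwise
  coder_b = c(1, 0, 0, 1, 0, 1, 1, 0)
)

# Cohen's kappa for two raters; the study reports kappa > 0.8.
kappa2(codings)
```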
Analytic samples: Two samples were analyzed: an “opened emails” sample (excluding unopened emails) and a “full sample” (unopened emails coded as nonresponse/unwillingness).

Statistical analysis: Linear probability models (LPMs) estimated direct effects. Because no Anglo-Saxon treatments were located in China, two analysis sets were used: (a) participants exposed to US-affiliated treatments (estimating effects of ethnicity, gender, and institution status) and (b) participants exposed to Chinese-named treatments located in the US or China (estimating effects of gender, country, and institution status). Two-sided 95% and 99% confidence intervals were reported. Analyses were conducted in R using the estimatr package.

Ethics: Ex-post consent and debriefing were used to preserve validity; the study was approved by the University of Copenhagen Ethics Review Board (UCPH-SAMF-SOC-2022-03). Participants were informed post hoc and could withdraw without penalty (78 withdrawals).
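For readers who want to see the estimation step spelled out, here is a minimal R sketch of a linear probability model with robust standard errors, in the spirit of the estimatr-based analysis described above. The data frame, variable names, and control dummies are assumptions for illustration only, not the authors' code.

```r
# Sketch of the linear probability model (LPM) for responsiveness with robust SEs.
# 'requests' and all column names are hypothetical stand-ins for the analysis data.
library(estimatr)

us_affiliated <- subset(requests, country == 0)  # analysis set (a): US-affiliated treatments

m_response <- lm_robust(
  responded ~ ethnicity + gender + inst_status +                          # binary treatment indicators
    ctrl_field1 + ctrl_field2 + ctrl_field3 + ctrl_field4 + ctrl_outlet,  # field/outlet controls
  data    = us_affiliated,
  se_type = "HC2"          # heteroskedasticity-robust standard errors (estimatr default)
)

summary(m_response)        # coefficients with 95% CIs; refit with alpha = 0.01 for 99% CIs
```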
Key Findings
- Response and sharing rates: Of 1,634 authors, 884 responded (54%) and 226 (14%) shared or indicated willingness to share some/all data. Among opened emails (N=1,179), response was 75% (884) and sharing/willingness was 20% (226).
- US-affiliated treatments (opened emails, N=770): Institution status and requestor gender did not affect responsiveness. Chinese-sounding names received 7 percentage points fewer responses than Anglo-Saxon names (β ≈ -0.07; 95% CI -0.13 to -0.01; 99% CI -0.15 to 0.01), corresponding to an odds ratio of ≈0.66, i.e., 34% lower odds of a reply (see the arithmetic sketch after this list). For data-sharing willingness, the preregistered hypotheses were not supported; a slight tendency toward higher willingness for lower-status US institutions relative to higher-status ones (β ≈ 0.05; 95% CI -0.01 to 0.11; 99% CI -0.02 to 0.13) was not statistically conclusive.
- Chinese-named treatments located in the US vs. China (opened emails, N=802): No discernible effects of country, institution status, or gender on responsiveness; estimates were close to zero. For willingness, feminine-coded requestors showed a slight, non-significant advantage over masculine-coded requestors (β = 0.05; 95% CI -0.01 to 0.11; 99% CI -0.02 to 0.13).
- Pooled opened emails (N=1,179): Chinese-sounding names had a 7 percentage-point lower likelihood of response than Anglo-Saxon names (β = -0.07; 95% CI -0.12 to -0.02; 99% CI -0.14 to -0.00), corresponding to an odds ratio of ≈0.67 (33% lower odds). Institution status and gender main effects remained inconclusive.
- Intersectional effects (Ethnicity × Gender, opened emails): Male Chinese-named treatments faced consistent disadvantages: responsiveness β = -0.10 (95% CI -0.17 to -0.03; 99% CI -0.19 to -0.01) and willingness β = -0.07 (95% CI -0.15 to -0.00; 99% CI -0.17 to 0.02). Female Chinese-named treatments did not show comparable penalties. Results were similar in full-sample analyses but with smaller effect sizes and greater uncertainty.
- Overall, none of the four preregistered hypotheses predicting bias in willingness to share data were supported; the most robust disparity emerged for responsiveness against Chinese-sounding (especially male) requesters.
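To make the reported odds ratios concrete, the short R sketch below shows how a 7 percentage-point gap in response rates translates into odds. The baseline response rate is an assumed, illustrative value, and the published odds ratios come from covariate-adjusted models, so the result is approximate rather than a reproduction.

```r
# Illustrative arithmetic only: the baseline rate is assumed, and the published
# odds ratios are adjusted for field/outlet controls, so figures will not match exactly.
odds <- function(p) p / (1 - p)

p_anglo   <- 0.78             # assumed response rate for Anglo-Saxon-named requests
p_chinese <- p_anglo - 0.07   # 7 percentage-point penalty reported in the study

odds(p_chinese) / odds(p_anglo)  # ~0.69, in the neighborhood of the reported OR of ~0.66-0.67
```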
Discussion
The study set out to test whether scientists' data-sharing behaviors differ depending on who is asking for data, focusing on nationality, ethnicity, gender, and institutional prestige cues. While preregistered hypotheses about disparities in willingness to share data were not confirmed, analyses of responsiveness—a more objective precursor to data exchange—revealed a consistent ethnic penalty: requests from Chinese-sounding names were less likely to receive a reply than those from Anglo-Saxon names. This suggests that bias may operate at the very initial stage of interaction, consistent with theories of implicit attitudes guiding quick judgments under ambiguity. Interaction analyses further indicated that the ethnic penalty was concentrated among male Chinese-named requesters, aligning with evidence that minority men can face larger ethnic penalties than minority women, possibly due to stereotype content (trustworthiness, deservingness) and socio-political contexts (e.g., heightened concerns about intellectual property and pandemic-related prejudice during 2022). Contrary to expectations, prestige effects were weak; if anything, US-based requests from higher-status institutions were met with slightly lower willingness to share (not statistically conclusive), potentially reflecting perceived competitive risks of sharing with elite institutions. The findings underscore challenges for the FAIR data principles when data are available only upon request: not only is overall compliance modest, but discretionary sharing may introduce inequities in access. Policy implications include strengthening mandates for proactive data deposition and repository use (with licensing where needed), and improving support for ethically or practically constrained datasets (e.g., storage solutions, synthetic data options).
Conclusion
A large, preregistered audit experiment across authors in PNAS and Nature-portfolio journals found limited support for hypothesized disparities in willingness to share data based on nationality, ethnicity, gender, or institutional prestige. However, a robust and practically meaningful disparity emerged in responsiveness: requests from Chinese-sounding names—especially male—were less likely to receive replies. These results suggest that biases may act at the gateway to data exchange, potentially impeding the equitable circulation of knowledge. The study contributes new demand-side evidence to the open science literature, highlighting the limitations of “available upon request” policies and the need for stronger, standardized data-sharing requirements and supports. Future research should examine broader journal contexts beyond elite outlets, probe mechanisms behind male-specific ethnic penalties, and test interventions (e.g., blinded request workflows or repository-mediated access) to reduce bias and improve compliance.
Limitations
- Email channel: Requests were sent from Gmail rather than institutional accounts, potentially affecting spam filtering and perceived legitimacy. The study tracked opened versus unopened emails to mitigate this concern and observed relatively high response rates compared with prior studies that used institutional accounts.
- Generic request content: Standardized, non-discipline-specific emails may have raised suspicion among some recipients; this reflects a trade-off between treatment control and ecological validity.
- Sampling frame: Authors were limited to PNAS and Nature-portfolio journals; findings may not generalize to other outlets or disciplines with different data policies.
- Design constraints: To maintain realism, Anglo-Saxon treatments affiliated with Chinese universities were omitted, precluding a fully crossed 2×2×2×2 (16-cell) factorial design; an alias-forwarding error occurred early for one treatment but was corrected and appears to have had minimal impact.
- Statistical uncertainty: Many estimated effects, including prestige and gender main effects on willingness, had confidence intervals spanning zero; none of the preregistered hypotheses about willingness were confirmed, and intersectional findings should be interpreted as suggestive.
- Potential period effects: Data collection in 2022 coincided with a period of heightened geopolitical and pandemic-related prejudice, which could have influenced the observed biases.