How ethics combine with big data: a bibliometric analysis

Computer Science

M. Kuc-Czarnecka and M. Olczyk

Marta Kuc-Czarnecka and Magdalena Olczyk conducted a bibliometric analysis revealing that the exploration of ethical concerns in Big Data is surprisingly sparse in scientific literature, showing slow growth with a primary focus on health and technology.

Introduction
Big Data has grown rapidly into a significant area of research for both academics and practitioners. While substantial literature exists on its technical aspects, its intersection with ethical considerations remains under-explored. This study addresses that gap by using bibliometric methods to analyze how ethical concerns have evolved within Big Data research. Big Data, as a subfield of Data Science, involves analyzing large datasets to extract insights that improve decision-making in organizations. This process raises ethical dilemmas, including discriminatory practices towards minorities, abusive labor practices, consumer exploitation, and the use of algorithms in cyber warfare and election manipulation, as noted by various researchers (Boyd and Crawford, 2012; O’Neil, 2016; Halpern, 2019; McNamee, 2019; Broniatowski et al., 2018; Rankin, 2020; Harari, 2018). The complexity of these concerns calls for in-depth analytical approaches to understand the field's structure and development. Using bibliometric analysis, the study seeks to provide comprehensive insights into publication patterns in this area.
Literature Review
The authors reviewed existing literature to establish a context for their bibliometric analysis. They noted that little research specifically combines ethics and Big Data, despite the considerable body of work on Big Data's technical potential and the established field of technology ethics. A key early contribution was the work by Boyd and Crawford (2012), which highlighted the ethical implications of data privacy in social media and how ethical considerations are overlooked in Big Data processes. Other works examined the use of online data for social research and the growing ethical concerns about algorithms in various contexts, including the dangers to liberty posed by a Big Data-driven digital dictatorship. The researchers emphasized the scarcity of studies that address the combination of ethics and Big Data using bibliometric methods.
Methodology
The study employed three bibliometric methods: descriptive analysis, network-citation analysis, and co-occurrence analysis. Descriptive analysis examined indicators such as the number of research papers over time and global and local citation counts. Local cited references (LCR) counted citations within the created database, reflecting contributions to the field's development, while global citation scores (GCS) captured total citations in the Web of Science Core Collection. Network-citation analysis, performed with HistCite, generated a historiograph that visualizes the relationships between the most-cited publications, revealing timelines and publication influence. Co-occurrence analysis, performed with VOSviewer, measured how often terms co-occur in the text to identify trends and research hotspots. VOSviewer used natural language processing (NLP) to identify the strength of association among noun phrases and built co-occurrence maps using the SMACOF algorithm, grouping related terms into clusters that reveal major research themes. The Web of Science Core Collection (WoS) was chosen as the data source because of its superior data quality compared with Scopus, although the authors acknowledged its limited coverage of the social sciences, humanities, and non-English-language journals.
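The counting step behind a co-occurrence analysis can be illustrated with a minimal Python sketch. It is not the authors' pipeline and not VOSviewer's implementation: it assumes the terms have already been extracted for each record, counts each pair at most once per record, and omits the association-strength normalization, layout, and clustering steps. The function name `cooccurrence_counts` and the toy `records` are hypothetical.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_counts(documents):
    """Count how many documents contain each unordered pair of terms.

    `documents` is an iterable of term lists (one list per record).
    Each pair is counted at most once per document.
    """
    pair_counts = Counter()
    for terms in documents:
        # De-duplicate and sort so every pair has one canonical ordering.
        unique_terms = sorted(set(terms))
        for a, b in combinations(unique_terms, 2):
            pair_counts[(a, b)] += 1
    return pair_counts


# Hypothetical toy records, invented for illustration only.
records = [
    ["ethics", "big data", "health"],
    ["big data", "health", "privacy"],
    ["ethics", "big data", "regulation"],
]

counts = cooccurrence_counts(records)
for pair, n in counts.most_common(3):
    print(pair, n)
```

Pairs with high counts are the candidates that a mapping tool would place close together; turning raw counts into clusters requires the normalization and layout steps that this sketch leaves out.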
Key Findings
The analysis of 892 records revealed a substantial dispersion of publications across journals and moderate author concentration per journal. Figure 2 shows the growth of publications on ethics in Big Data from 2011 to 2020, peaking in 2018 and 2019. Key early publications include those by Helbing and Balietti (2011) and Boyd and Crawford (2012), the latter being particularly influential. Table 2 ranks the ten most-cited authors, with Crawford and Boyd leading, although Vayena published the most papers. Table 3 presents the ten most-cited publications, prominently featuring Boyd and Crawford (2012) and Mittelstadt and Floridi (2015); many of the listed works focus on biomedical contexts. The HistCite historiograph (Figure 3) illustrates the citation network, showing the seminal influence of Boyd and Crawford (2012) and Mittelstadt and Floridi (2015). The VOSviewer co-occurrence map (Figure 4) reveals three clusters: a legal cluster (governance, regulation, rights); a scientific cluster (data sharing, knowledge access); and a medical cluster (medicine, healthcare, AI). The medical cluster strongly dominated, with "science," "health," and "medicine" as the most frequent terms. The study also noted a gender imbalance, with female scholars receiving substantially more citations (429) than male scholars (211). Despite the study's scope, surprisingly few papers focused on the economic, political, and sociological implications of Big Data ethics.
Discussion
The findings address the research question by revealing the current state and trends in the intersection of ethics and Big Data. The dominance of Boyd, Crawford, Mittelstadt, and Floridi, and the focus on health and medical issues, highlight the field's current trajectory. The limited representation of other fields (economics, sociology, political science) indicates a significant gap in ethical considerations within Big Data applications beyond biomedical contexts. This concentration might be due to the sensitivity of medical data and potential consequences of faulty research, as seen with the controversy surrounding vaccines. The COVID-19 pandemic has further intensified ethical concerns regarding data privacy and security in contact tracing apps and challenge studies, underscoring the need for robust ethical assessments of Big Data applications in diverse settings. The relatively slow growth in this area, however, suggests a need for more interdisciplinary collaboration and research to address the broad ethical implications of Big Data in various societal domains.
Conclusion
This study provides a valuable bibliometric analysis of ethical considerations within Big Data research. The findings reveal the field's current concentration on health and medicine, the key influential authors, and the limited exploration of ethical issues in other sectors. The COVID-19 pandemic highlights the continuing relevance and urgency of this research. Future studies should focus on expanding ethical analysis into other fields and fostering interdisciplinary research to fully address the multifaceted implications of Big Data.
Limitations
The study's limitations stem from the inherent constraints of bibliometric research and the database used (Web of Science Core Collection). The database's limited coverage of social sciences, humanities, non-English publications, and books might influence the results. The focus solely on published articles and conference publications excludes other forms of scholarly communication. The co-occurrence analysis relies on the terms used by authors and may not fully capture the nuances of the underlying concepts.