A focus groups study on data sharing and research data management

Interdisciplinary Studies

D. R. Donaldson and J. W. Koepke

Discover how data sharing is revolutionizing scientific research in a focus group study conducted by Devan Ray Donaldson and Joshua Wolfgang Koepke. This research reveals crucial insights into the challenges of data repositories and management practices, highlighting the need for better metadata quality and security features to enhance repository utilization.

Introduction
Sharing scientific data offers numerous benefits: it strengthens initial publications through peer review and validation, enhances research integrity and transparency, and fosters further scientific inquiry by making data readily available for reuse. Open data in particular lowers research costs, especially for researchers in developing nations. The rapid development of COVID-19 vaccines serves as a compelling example of the societal impact of data sharing. Despite these advantages, many scientists do not share their data, primarily due to a lack of established data sharing or management guidelines across most disciplines. Existing scholarship on this topic is fragmented, lacking interdisciplinary studies and comprehensive analyses of important aspects like data librarianship and scientists' perceptions of repository features. This study uses the Knowledge Infrastructures (KI) framework and focuses on individual scientists, their repository use, their perceptions of librarians, and their data management plans (DMPs). It addresses the research question: What features are necessary in data repository systems to help scientists implement data sharing and preservation aspects of their DMPs?
Literature Review
The literature reviewed highlights the benefits of data sharing and the challenges of implementing it. It points to a lack of established guidelines, fragmented scholarship, and a scarcity of interdisciplinary research that adequately addresses scientists' perceptions of repository features and the role of librarians in data management. The Knowledge Infrastructures (KI) framework is introduced as a lens for understanding the creation, flow, and maintenance of knowledge, with repositories and human resources identified as key areas for improvement in data management and sharing.
Methodology
This study employed a mixed-methods approach, using focus groups as its primary data collection method. Participants were recruited from major conferences in five disciplines (atmospheric and earth science, computer science, chemistry, ecology, and neuroscience) using a combination of random sampling, snowball sampling, and informal network recruitment. All participants were vetted for scientific credentials. The focus groups were conducted via Zoom and involved approximately 1.5 hours of discussion on topics such as data management practices, DMPs, desired help with data management, the role of librarians, and preferred repository features. Incentives ($50 Amazon gift cards) were provided. The study was approved by the Indiana University Human Subjects Office (IRB Study #1907150522). Data analysis was performed using MAXQDA software, following a thematic analysis approach described in the literature. The collected data is publicly available on figshare. The study acknowledges limitations in its relatively small sample size, but argues that the benefits of the findings outweigh these limitations.
Key Findings
The study found that participants across disciplines generally had DMPs and utilized a variety of storage solutions, including institutional repositories (IRs) and proprietary cloud storage. Key desired repository features included:

* **Data Traceability:** Tracking data usage (views, citations, publications), changes post-deposit, and providing versioning and notification systems.
* **Metadata:** High-quality, automated metadata creation, quality control, expanded metadata types (e.g., GIS data), enhanced searchability, and machine readability.
* **Data Use Restrictions:** Clear explanations of permitted uses (research, publication, commercial use), simplified licensing options, and limited embargo periods.
* **Stable Infrastructure:** Long-term data preservation, versioning, format updates, and ensuring data usability.
* **Security:** Protecting data from unauthorized access, particularly sensitive information. Concerns were raised about potential data breaches and "scooping" of research.

Participants also identified areas needing improvement in data management, including help with metadata standardization and quality control, procedures for verifiable data deletion (to comply with IRBs), and more comprehensive data management training. Knowledge of FAIR principles varied among participants, with some highlighting challenges in applying these principles to complex datasets. Views on the role of librarians were mixed, with some considering librarians' expertise too limited or their time commitments too demanding to offer significant assistance, while others saw valuable roles for librarians in providing assistance with publication, literature searches, patents, copyright searches, data mandates, embargo enforcement, information literacy, and metadata standardization.
Discussion
This research contributes to understanding scientists' perspectives on data management, repositories, and librarians. It presents a rubric (Supplementary Table 1) based on the identified repository features. The findings suggest improvements to research data management and sharing within the KI framework, pointing to the interrelation between routines, practices, and technology allowances in repositories. The need for improved metadata quality control and standardized metadata, particularly for specialized data types like GIS data, is highlighted, acknowledging challenges faced by open access initiatives. The findings also concur with previous research on cloud storage utilization and low IR usage, suggesting areas for future research focusing on DMP implementation and repository integration. The diverse views on the role of librarians highlight the need for exploring scientists' perceptions to foster collaboration and improve data integrity and usage.
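As a rough illustration of how a feature-based rubric of this kind might be applied, the sketch below scores a repository against the five feature categories reported in Key Findings. The 0 to 2 rating scale, the equal weighting, the function names, and the example ratings are all assumptions made for illustration; they are not taken from Supplementary Table 1, which is not reproduced here.

```python
# The five feature categories reported in Key Findings.
FEATURES = (
    "data_traceability",
    "metadata",
    "data_use_restrictions",
    "stable_infrastructure",
    "security",
)

# Hypothetical scale (an assumption): 0 = absent, 1 = partial, 2 = fully supported.
MAX_RATING = 2


def score_repository(ratings: dict[str, int]) -> float:
    """Return the share of the maximum possible score, from 0.0 to 1.0."""
    missing = [name for name in FEATURES if name not in ratings]
    if missing:
        raise ValueError(f"missing ratings for: {missing}")
    for name in FEATURES:
        if not 0 <= ratings[name] <= MAX_RATING:
            raise ValueError(f"rating for {name} must be between 0 and {MAX_RATING}")
    total = sum(ratings[name] for name in FEATURES)
    return total / (MAX_RATING * len(FEATURES))


def weakest_features(ratings: dict[str, int]) -> list[str]:
    """Return the feature categories that received the lowest rating."""
    lowest = min(ratings[name] for name in FEATURES)
    return [name for name in FEATURES if ratings[name] == lowest]


# Made-up ratings for a hypothetical repository, for illustration only.
example_ratings = {
    "data_traceability": 1,
    "metadata": 0,
    "data_use_restrictions": 2,
    "stable_infrastructure": 2,
    "security": 1,
}

if __name__ == "__main__":
    print(score_repository(example_ratings))
    print(weakest_features(example_ratings))
```

Equal weights are the simplest choice; a repository manager who cares more about, say, security than traceability could replace the plain sum with a weighted one.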
Conclusion
The study provides a valuable rubric for scientists, librarians, and repository managers to evaluate and improve data repositories. The rubric aims to encourage increased data deposit, benefiting scientific advancement and research integrity, particularly for researchers in developing countries who may rely more heavily on existing datasets. Future studies can expand upon these findings by investigating scientists from other disciplines and geographical locations to assess the generalizability of the results and refine the rubric.
Limitations
The study's relatively small sample size may limit the generalizability of the findings, although the focus group sizes are consistent with prior research. The convenience sampling methods may also limit the representation of different demographic groups. Most participants were from developed Western countries, potentially limiting the generalizability to different contexts.