A focus groups study on data sharing and research data management

Interdisciplinary Studies

D. R. Donaldson and J. W. Koepke

This focus group study by Devan Ray Donaldson and Joshua Wolfgang Koepke examines how scientists manage and share research data. It identifies challenges with data repositories and management practices and highlights the need for better metadata quality and stronger security features to encourage repository use.

Introduction
The paper argues for the importance of sharing scientific research data, noting benefits such as enabling peer review and validation before publication, enhancing integrity and transparency, and fostering further inquiry without the cost of generating new data. Open data can especially benefit researchers in developing countries by lowering research costs through data reuse; the rapid development of COVID-19 vaccines, aided by openly shared genome data, illustrates this. Despite these benefits, many scientists still do not share data, and most disciplines lack established data sharing and management guidelines, except where funders or IRBs impose requirements. The literature on data management and sharing is fragmented: there are few interdisciplinary studies, data librarianship and repository feature needs receive limited attention, and many single-discipline studies are dated. The study applies the Knowledge Infrastructures (KI) framework (norms and values, artifacts, people, institutions, policies, routines and practices, technology) to examine scientists’ data management and sharing practices, repository use, views on librarians, and data management plans (DMPs). The research question asks: what features do scientists think data repository systems and services need to include to help them implement the data sharing and preservation parts of their DMPs? The authors report consensus on several desired repository features and identify areas where scientists need help (especially metadata and training), discipline-specific issues, and mixed views on librarians’ roles.
Literature Review
The authors situate the study within literature that documents benefits of open data for research integrity, transparency, and reuse, as well as barriers to sharing. Prior work shows many scientists do not share data, disciplines often lack standardized management/sharing guidelines, and existing scholarship is fragmented with few interdisciplinary analyses; where present, studies often omit data librarianship and repository feature perspectives and many single-discipline studies are dated. The Knowledge Infrastructures framework provides a lens to understand how norms, practices, policies, people, and technologies interact to shape data sharing. Prior studies also highlight metadata quality and standardization problems in repositories, variability in research data management practices across disciplines, rising reliance on cloud storage, and generally low but evolving institutional repository (IR) usage. The FAIR principles are widely promoted but can be challenging to apply to large or multimodal datasets. Collectively, the literature underscores the need for better metadata standards, tools, training, and repository features that support trust, integrity, and reuse.
Methodology
Design: Qualitative focus group study guided by the Knowledge Infrastructures framework.
Participants: A convenience sample was identified by browsing participant lists from major disciplinary conferences: the American Geophysical Union (AGU) for atmospheric/earth sciences, the American Chemical Society for chemistry, SOUPS’19/’20 for computer science, the Society for Freshwater Science for ecology, and Neuroscience’19/’20 for neuroscience. Recruitment emails were sent to randomly selected individuals; snowball sampling and informal community networks (e.g., discipline-specific Discord groups) added a few participants. All participants were vetted for credentials (completed or in-progress graduate training and disciplinary expertise). Participants worked and resided in developed Western countries (mostly the U.S., two in Western Europe). They came from varied institutions (universities, government, private enterprise) and career stages (graduate students, researchers, professors, and professionals such as lab data managers).
Procedure: Five discipline-specific focus groups were conducted via Zoom between April and August 2021. After introductions and demographic questions, moderators asked about participants’ data, recent and past projects, data management practices, DMPs, areas where they needed help, the role of libraries and librarians, and data sharing. Discussion then turned to the FAIR principles and expected repository features (accepted file sizes, licensing, embargo periods, discoverability, reuse). Each session lasted about 90 minutes.
Incentives: Each participant received a $50 Amazon e-gift card.
Ethics: The study was approved by the Indiana University Human Subjects Office (IRB #1907150522), and informed consent was obtained.
Analysis: Recordings were transcribed and analyzed in MAXQDA using thematic analysis following Braun and Clarke’s steps: familiarization, code generation, theme construction, theme revision and definition, and reporting. The de-identified dataset is publicly available on figshare (DOI: 10.6084/m9.figshare.19493060.v1).
Key Findings
Data practices: All participants reported having a DMP for at least one recent or current project. Storage solutions included institutional repositories (IRs), mentioned in four focus groups (atmospheric/earth science, chemistry, computer science, neuroscience), and proprietary cloud storage (Dropbox, GitHub, Google Drive), also mentioned in four focus groups (atmospheric/earth science, computer science, ecology, neuroscience). Concerns included file size limits, costs, long-term preservation, data mining by storage providers, and the burden of managing multiple storage solutions.
Desired repository features:
1) Data traceability: Participants in four focus groups (atmospheric/earth science, chemistry, ecology, neuroscience) wanted repositories to track views, citations, and publications derived from deposited data; support explicit versioning; allow updates after deposit; and send notifications about new versions or derivative works and about when data are viewed, cited, or used in publications.
2) Metadata: Participants in three focus groups (atmospheric/earth science, chemistry, neuroscience) wanted high-quality metadata, automated metadata creation to save time, assistance with metadata quality control, expanded metadata types (e.g., richer spatial metadata for GIS), and better searchability and machine readability to enable variable-level search across multiple parameters.
3) Data use restrictions: All five focus groups wanted clear, dataset-level licensing and permissions indicating whether reuse for new research, publication, or commercial purposes is allowed. Participants noted confusion about licenses and mislabeling (e.g., ill-suited licenses). Although many used Open Access or Creative Commons licenses, they stressed the need for restrictive or proprietary licensing options. Embargoes were seen as a “necessary evil” if limited (e.g., to a few years or until publication); indefinite embargoes were viewed negatively.
4) Stable infrastructure: Two focus groups (atmospheric/earth science, chemistry) emphasized long-term stability, preservation commitments, sustainable funding, and support for file format migration and versioning to keep data usable over time.
5) Security: Four focus groups (atmospheric/earth science, chemistry, computer science, neuroscience) wanted strong security to prevent data compromise and “scooping,” and to protect confidential, sensitive, or personally identifiable information (PII), both to maintain participant trust and to comply with IRB mandates.
Desired help with data management:
- Metadata standardization and quality control (four focus groups: atmospheric/earth science, chemistry, ecology, neuroscience), to support DMPs and improve discoverability.
- Verification that sensitive data have been deleted when required, requested especially by university-affiliated participants because of IRB compliance concerns and student turnover.
- Training (four focus groups: atmospheric/earth science, chemistry, ecology, neuroscience): awareness of discipline-specific repositories and more robust training for graduate students and new researchers; current training was perceived as piecemeal or limited to simpler tools. Computer science participants did not cite additional training needs.
Knowledge of FAIR: Results were mixed. 12 participants across all groups knew about the FAIR principles; 10 participants (chemistry, computer science, ecology, neuroscience) did not. Participants found FAIR challenging to apply to large or multimodal datasets.
Role of librarians: Two focus groups (atmospheric/earth science, chemistry) felt librarians should not have a role, citing the perceived technical specialization of their work and librarians’ workload. Other participants, across all groups, saw roles for librarians in publication support, literature and patent/copyright searches, managing mandates and embargoes, information literacy, and metadata standardization, with the strongest support for search assistance and data management support.
Quantitative highlights:
- 12 participants knew FAIR; 10 did not.
- In atmospheric/earth sciences, 60% wanted GIS-related metadata enhancements to improve discoverability.
- IR usage remained generally low but was present across four focus groups; cloud storage use was common across four focus groups.
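Participants asked for reuse permissions recorded at the dataset level (new research, publication, commercial use) and for embargoes that are limited in time. A minimal Python sketch of how a repository might store and check such a record is shown here. The `UsePermissions` class, its field names, and the three-year embargo cap are hypothetical illustrations chosen for this sketch; they are not features of any specific repository and not part of the authors’ rubric.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import List, Optional

# Hypothetical policy cap on embargo length (assumption, not from the study).
MAX_EMBARGO_YEARS = 3


@dataclass(frozen=True)
class UsePermissions:
    """Hypothetical dataset-level record of the reuse a depositor allows."""
    license_name: str                     # e.g. "CC-BY-4.0" or a proprietary license label
    new_research: bool                    # may the data be reused for new research?
    publication: bool                     # may results based on the data be published?
    commercial: bool                      # may the data be used commercially?
    embargo_until: Optional[date] = None  # None means the data are not embargoed


def allowed_uses(perms: UsePermissions) -> List[str]:
    """List the reuse purposes the depositor has explicitly permitted."""
    flags = (
        ("new research", perms.new_research),
        ("publication", perms.publication),
        ("commercial", perms.commercial),
    )
    return [name for name, ok in flags if ok]


def embargo_within_limit(deposited: date, perms: UsePermissions) -> bool:
    """Reject open-ended embargoes by capping their length (approximate years)."""
    if perms.embargo_until is None:
        return True
    cap = timedelta(days=365 * MAX_EMBARGO_YEARS)
    return perms.embargo_until - deposited <= cap


def is_accessible(perms: UsePermissions, today: date) -> bool:
    """Data become accessible once any embargo date has been reached."""
    return perms.embargo_until is None or today >= perms.embargo_until


if __name__ == "__main__":
    record = UsePermissions("CC-BY-4.0", True, True, False, date(2025, 1, 1))
    print(allowed_uses(record))                              # permitted purposes
    print(embargo_within_limit(date(2024, 1, 1), record))    # bounded embargo?
    print(is_accessible(record, date(2024, 6, 1)))           # still embargoed?
```

Keeping each permission as an explicit boolean, rather than inferring it from the license name, reflects participants’ concern that license labels are often confusing or ill-suited; a real repository would likely map these flags onto an established metadata schema instead.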
Discussion
The findings deepen understanding of how KI entities (routines and practices, technology, policies, norms and values) interact in research data management and sharing. Participants’ desire for better metadata searchability may depend on the quality of metadata that data producers supply at deposit, an area where repositories can guide but not enforce standards; this illustrates the interplay between practices and repository capabilities. The study corroborates persistent challenges in metadata regulation and standardization and the need for training to improve data integrity and trust, which are prerequisites for reuse. Discipline-specific needs also emerged, such as atmospheric/earth scientists’ call for enhanced GIS metadata, underscoring that repository features must accommodate heterogeneous data types. Trends aligned with prior work: cloud storage use is increasing, and IR use remains generally low but is slightly increasing. Perceptions of librarians’ roles were inconsistent and sometimes split between technical and traditional support, suggesting that institutional context shapes collaboration; clarifying librarians’ roles could enable joint efforts to improve dataset integrity and use. Despite general support for open data, some scientists hesitate to deposit, possibly because of perceived gaps in repository features (traceability, metadata, licensing clarity, stability, security). To address this, the authors propose a repository evaluation rubric reflecting the features scientists value, intended to guide scientists and librarians in selecting repositories and to help repository managers prioritize feature development. Wider adoption could increase deposits, strengthen integrity through expert scrutiny, and expand reuse, which would especially benefit resource-constrained contexts. Future research should test generalizability by engaging additional disciplines and researchers from countries at varying levels of development to refine and validate the rubric and feature set.
Conclusion
This study identifies cross-disciplinary and discipline-specific needs for repository features and data management support through focus groups with scientists in five fields. Key contributions include: (1) empirical characterization of desired repository features (data traceability, high-quality and searchable metadata, clear data use restrictions and licensing, stable infrastructure, and robust security); (2) identification of priority support needs (metadata standardization/quality control, verifiable deletion procedures, and targeted training); (3) insight into mixed perceptions of librarians’ roles; and (4) development of a practical repository evaluation rubric to aid repository selection and improvement. Collectively, these contributions advance the application of the Knowledge Infrastructures framework to research data practices and provide actionable guidance for scientists, librarians, and repository managers. Future work should assess the rubric’s applicability across more disciplines and international contexts, develop and test interventions to improve metadata quality at deposit, evaluate training models for graduate researchers, and examine mechanisms for integrating data traceability and notification features that incentivize sharing and reuse.
Limitations
Focus groups enabled rich, unscripted interactions and in-depth exploration of complex, individualized data management practices; however, the overall sample size was small, which may limit generalizability and repeatability. Although group sizes align with those of similar prior studies, convenience and snowball sampling within developed Western contexts may limit representativeness. Because practices were self-reported, responses may reflect perceptions rather than verified behavior.