Introduction
The sharing and management of animal research data lag behind the best practices established for human studies. The authors highlight the scarcity of publicly available animal neuroimaging datasets on platforms like OpenNeuro and Zenodo; those that exist often lack accompanying data such as microscopy files. This lack of standardization and sharing hinders reproducibility and runs counter to the 3R principle (replacement, reduction, refinement), which calls for minimizing animal experiments. The authors aim to promote the FAIR data principles (Findable, Accessible, Interoperable, Reusable) and Open Science by creating an easily applicable method for establishing and maintaining multimodal animal datasets. The approach provides access to raw and processed data, methods, and results, and thereby meets the requirements of funding agencies and publishers.
Literature Review
The introduction cites several works illustrating the challenges of data management and sharing in neuroimaging, particularly for small animal studies. References highlight the lack of standardization in data acquisition, storage, and sharing, and the consequent difficulties in comparing studies across laboratories. The reproducibility crisis and the need for adherence to FAIR principles and Open Science are underscored, emphasizing the importance of efficient and transparent research data management (RDM).
Methodology
The authors detail a workflow built on three main tools: an in-house relational database for experimental metadata, the GIN data platform, and the DataLad research data management software. The database collects metadata on longitudinal and multimodal animal experiments, including MRI, histology, electrophysiology, and behavior. GIN and DataLad build on Git and git-annex for version control and large-file management.
The workflow comprises four stages: project planning, documentation, data acquisition and storage, and data sharing. Project details are recorded in the database under standardized study IDs for findability. Data are organized according to YODA principles and are compatible with the BIDS standard. An automated incremental backup routine transfers data from a local workstation to a central network drive.
DataLad provides version control: it tracks changes and can restore previous states. Dataset nesting allows organization by publication, data type, or storage location. The authors describe initializing a DataLad dataset, placing data under version control, creating a GIN repository, and uploading data to GIN. They explain the relevant DataLad commands (e.g., `datalad create`, `datalad save`, `datalad push`, `datalad clone`, `datalad get`), including optional steps for programmatic changes and for working with published data.
The methodology also covers metadata collection and dataset publication, emphasizing the importance of annotation and GIN's ability to provide DOIs for datasets. The handling of different file formats within DataLad (holistic vs. difference tracking) is also discussed. The specified hardware and software include a Mac Pro workstation, a LaCie Thunderbolt drive, Carbon Copy Cloner, Python, GIN, and DataLad.
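The command sequence named above can be sketched as follows. This is a minimal illustration, not the authors' exact procedure: it only assembles the DataLad command lines and does not execute them. It assumes the GIN repository already exists (for example, created through the GIN web interface) and is attached with `datalad siblings add`. The function names, the sibling name `gin`, the paths, and the URLs are hypothetical placeholders.

```python
import shlex


def publish_commands(dataset_path, gin_url, message):
    """Return the DataLad calls for creating, saving, and uploading a dataset.

    Nothing is executed here. Each entry can be inspected, or passed to
    subprocess.run on a machine where DataLad is installed.
    """
    return [
        # Initialize a new DataLad dataset (a Git/git-annex repository).
        ["datalad", "create", dataset_path],
        # Record the current state of the dataset with a commit message.
        ["datalad", "save", "-d", dataset_path, "-m", message],
        # Register an existing GIN repository as a sibling named "gin"
        # (assumption: the repository was created on GIN beforehand).
        ["datalad", "siblings", "add", "-d", dataset_path,
         "-s", "gin", "--url", gin_url],
        # Upload the saved history and file content to the sibling.
        ["datalad", "push", "-d", dataset_path, "--to", "gin"],
    ]


def reuse_commands(gin_url, target_dir, file_path):
    """Return the DataLad calls for obtaining a published dataset."""
    return [
        # Clone the dataset; large files arrive as lightweight placeholders.
        ["datalad", "clone", gin_url, target_dir],
        # Download the content of one file on demand.
        ["datalad", "get", f"{target_dir}/{file_path}"],
    ]


if __name__ == "__main__":
    # Hypothetical dataset name, repository URLs, and file path.
    plan = publish_commands(
        "my_dataset", "git@gin.g-node.org:/user/my_dataset.git", "Add raw MRI data"
    ) + reuse_commands(
        "https://gin.g-node.org/user/my_dataset", "my_dataset_clone",
        "raw_data/scan.nii.gz",
    )
    for cmd in plan:
        print(shlex.join(cmd))
```

Keeping the commands as lists rather than shell strings makes each one easy to review before it runs and avoids quoting problems in commit messages.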
Key Findings
The authors present a practical workflow for managing multimodal animal datasets with DataLad and GIN, designed to be adoptable by researchers without specialized expertise. DataLad enables detailed version control, transparent collaboration, and efficient management of large files. A standardized file structure, study IDs, and YODA principles enhance data findability, and a structured backup scheme protects against data loss. GIN facilitates data sharing and publication with DOIs, promoting reproducibility and reuse. Dataset nesting allows flexible organization and controlled access to different parts of a project. The workflow is compatible with various data formats and existing IT infrastructures, so diverse data types and research methodologies can be integrated. The authors provide a detailed, step-by-step guide for implementing the workflow, accompanied by video tutorials and example datasets. Overall, the approach supports open data sharing and transparent research practices, strengthening collaboration and the reliability of animal research findings.
Discussion
This workflow directly addresses the need for improved data management and sharing in animal research. The authors demonstrate a practical, easily implemented solution built on free and open-source tools. The approach fosters reproducibility by providing access to raw and processed data, code, and detailed processing steps, while the structured data organization enhances findability and simplifies collaboration. Integrating DataLad with GIN supports version control, data sharing, and long-term data preservation. Although the approach is tailored to animal research, its principles can be adapted to other fields. The authors acknowledge that DataLad requires an initial time investment but highlight the long-term gains in efficiency, quality, and reliability. They also discuss the importance of metadata, promoting compatibility with standards like BIDS while acknowledging the practical need for flexibility when working with diverse data types. The detailed, step-by-step guide should further ease widespread adoption.
Conclusion
This paper provides a valuable blueprint for establishing and maintaining FAIR multimodal animal research datasets. The proposed workflow using DataLad and GIN facilitates reproducible research, improves data management, and promotes collaborative efforts. The detailed, step-by-step guide, combined with freely available software and online resources, lowers the barrier to entry for researchers seeking to enhance their data management practices. Future research could explore the integration of this workflow with other data platforms and the development of standardized metadata schemas for diverse animal research modalities.
Limitations
While the proposed workflow offers significant advantages, the authors acknowledge the need for an initial investment of time and resources for learning and implementing DataLad. The workflow's reliance on specific software (DataLad and GIN) might pose a limitation for researchers unfamiliar with these tools. Although the authors emphasize the compatibility of their workflow with various data formats and platforms, complete interoperability across all systems might not be immediately achievable. The successful implementation of the workflow depends on the researchers' adherence to the standardized naming conventions and data structure.
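The dependence on naming conventions and folder structure noted above could be reduced with an automated check. The following is a minimal sketch under stated assumptions: the study-ID pattern and the required subfolder names are hypothetical stand-ins, since this summary does not specify the authors' actual convention, and would need to be replaced with a laboratory's own rules.

```python
import re
import tempfile
from pathlib import Path
from typing import List

# Hypothetical convention: two capital letters followed by three digits ("AB001").
STUDY_ID_PATTERN = re.compile(r"^[A-Z]{2}\d{3}$")
# Hypothetical top-level folders expected inside every project directory.
REQUIRED_SUBDIRS = ("raw_data", "proc_data", "code")


def check_project(root: Path) -> List[str]:
    """Return a list of human-readable problems; an empty list means compliant."""
    problems = []
    # The project folder itself must be named after its study ID.
    if not STUDY_ID_PATTERN.match(root.name):
        problems.append(f"folder name '{root.name}' is not a valid study ID")
    # Every required subfolder must exist as a directory.
    for sub in REQUIRED_SUBDIRS:
        if not (root / sub).is_dir():
            problems.append(f"missing subfolder '{sub}'")
    return problems


if __name__ == "__main__":
    # Build a throwaway project that lacks one required folder and report it.
    with tempfile.TemporaryDirectory() as tmp:
        project = Path(tmp) / "AB001"
        for sub in ("raw_data", "code"):
            (project / sub).mkdir(parents=True)
        for problem in check_project(project):
            print(problem)
```

A check of this kind could be run before each save so that deviations are caught while they are still cheap to fix.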