Modeling community standards for metadata as templates makes data FAIR

Computer Science

M. A. Musen, M. J. O'Connor, et al.

This paper presents a template-based approach to assessing the FAIRness of datasets, with an emphasis on rich metadata and adherence to community standards. Written by Mark A. Musen, Martin J. O'Connor, Erik Schultes, Marcos Martínez-Romero, Josef Hardi, and John Graybeal, it shows how the CEDAR Workbench and the FAIRware Workbench can improve data management and sharing.

Introduction
The FAIR Guiding Principles, published in 2016, advocate making research data findable, accessible, interoperable, and reusable. Although the principles have been widely adopted, their abstract wording makes them hard to implement in practice. Many of the principles concern how repositories manage data, which lies outside investigators' direct control. Investigators are, however, responsible for ensuring that their metadata are rich and adhere to domain-relevant community standards. Existing automated tools for evaluating FAIRness struggle with these requirements because they cannot determine on their own what counts as rich metadata or which community standards are relevant. This paper proposes a solution: machine-processable metadata templates, created by scientific communities, that embody those communities' standards and guide data stewards.
Literature Review
The paper reviews how community standards for describing experimental data have evolved from textual reporting guidelines toward machine-readable templates. It cites examples such as CONSORT (for clinical trials), MIAME (for microarray experiments), and MIATA (for T-cell assays). Although helpful, these guidelines are not machine-readable, which hinders automated validation. The paper also notes the proliferation of minimal information checklists and the Microsoft Research proposal for 'datasheets' for datasets, and argues that data standards need machine-actionable representations.
Methodology
The core of the methodology is the creation of machine-readable metadata templates encoded in JSON Schema. Each template defines the attribute-value pairs that make up a standard metadata specification, so that the template encodes a community's standard for creating research metadata. In this way, templates capture the subjective, community-dependent judgments needed to operationalize the FAIR principles in a form that computers can interpret.
Two applications use these templates. CEDAR, a web-based platform, helps investigators author rich, standards-adherent metadata: a user selects a template, and CEDAR generates a form for entering values, checking that each value has the required data type or comes from the specified ontology. FAIRware, a prototype system, evaluates existing datasets for FAIRness by comparing their metadata against CEDAR templates; it identifies non-compliant fields, suggests improvements, and generates alternative, more standards-adherent metadata records.
The paper also describes how templates are formulated in "Metadata for Machines Workshops" (M4Ms) and how metadata are encoded in JSON-LD.
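The kind of comparison described above, checking a metadata record against a template's attributes, data types, and permitted values, can be illustrated with a toy Python sketch. It assumes a hypothetical template (the title, field names, and permitted values are invented) and interprets only a handful of JSON Schema keywords (required, properties, type, enum, additionalProperties). It is not the paper's implementation; a real system would use a full JSON Schema validator and ontology-backed value constraints rather than short hard-coded lists.

```python
"""Toy sketch: a metadata template as a small JSON Schema subset, plus a checker.

Hypothetical throughout: the template title, field names, and permitted values
are invented for illustration. Only the keywords "required", "properties",
"type", "enum", and "additionalProperties" are interpreted; this is not a full
JSON Schema validator and not CEDAR's or FAIRware's code.
"""
from typing import Any

# Hypothetical community template. Each property names one attribute of an
# attribute-value pair; "enum" stands in for a list of ontology-controlled terms.
TEMPLATE: dict[str, Any] = {
    "title": "Hypothetical tissue-sample template",
    "type": "object",
    "required": ["sample_id", "organ", "preservation_method"],
    "additionalProperties": False,
    "properties": {
        "sample_id": {"type": "string"},
        "organ": {"type": "string", "enum": ["heart", "kidney", "lung"]},
        "preservation_method": {"type": "string", "enum": ["frozen", "fixed"]},
        "age_years": {"type": "integer"},
    },
}

# Map JSON Schema type names to the Python types produced by json.loads.
JSON_TYPES: dict[str, Any] = {
    "string": str,
    "integer": int,
    "number": (int, float),
    "boolean": bool,
}


def check_record(template: dict[str, Any], record: dict[str, Any]) -> list[str]:
    """Return a list of human-readable problems; an empty list means compliant."""
    problems: list[str] = []
    properties = template.get("properties", {})

    # 1. Every required attribute must have a non-empty value.
    for field in template.get("required", []):
        if record.get(field) in (None, ""):
            problems.append(f"{field}: missing required value")

    # 2. Every supplied value must be defined by the template and well typed.
    for field, value in record.items():
        if value in (None, ""):
            continue  # blanks were already reported above if the field is required
        spec = properties.get(field)
        if spec is None:
            if template.get("additionalProperties", True) is False:
                problems.append(f"{field}: not defined by template")
            continue
        type_name = spec.get("type")
        expected = JSON_TYPES.get(type_name)
        # bool is a subclass of int in Python, so reject it for non-boolean types.
        wrong_bool = isinstance(value, bool) and type_name != "boolean"
        if expected is not None and (not isinstance(value, expected) or wrong_bool):
            problems.append(f"{field}: expected {type_name}, got {type(value).__name__}")
            continue
        if "enum" in spec and value not in spec["enum"]:
            problems.append(f"{field}: {value!r} is not a permitted value")
    return problems


if __name__ == "__main__":
    record = {"sample_id": "S-001", "organ": "Heart", "age_years": "45", "notes": "n/a"}
    for problem in check_record(TEMPLATE, record):
        print(problem)
```

Running the file prints four problems for the sample record: the missing preservation method, a capitalized organ name that is not in the permitted list, an age given as a string instead of an integer, and a "notes" attribute that the template does not define.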
Key Findings
The paper's key finding is that a template-based approach can operationalize the FAIR principles. The CEDAR Workbench helps users create rich, standards-compliant metadata by guiding them through community-defined templates, and it integrates with standard ontologies and repositories to keep metadata consistent and interoperable. FAIRware uses the same templates to evaluate existing datasets for FAIRness, reporting on metadata quality and pointing out where metadata can be improved. Encoding templates in JSON Schema and metadata instances in JSON-LD lets different applications exchange both readily. The HuBMAP project (Human BioMolecular Atlas Program) is a real-world application of CEDAR that demonstrates the feasibility and benefits of the approach. When applied to HuBMAP datasets, FAIRware identifies problems and proposes corrections, showing that it can improve the FAIRness of existing datasets. Together, CEDAR and FAIRware point toward a cohesive ecosystem of tools that support FAIR data. The Metadata for Machines Workshops (M4Ms) proved to be an efficient way to create templates, translating the abstract FAIR principles into concrete requirements.
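The idea of proposing corrections and generating an alternative, more standards-adherent record can be sketched in Python as follows. Everything here is an assumption for illustration: the permitted-value lists and the suggest_repairs helper are hypothetical, and fuzzy string matching with the standard-library difflib module is a stand-in technique, not the method FAIRware is documented to use.

```python
"""Toy sketch: propose a more standards-adherent alternative metadata record.

Hypothetical throughout: the permitted-value lists are invented, and fuzzy
string matching (difflib) is a stand-in chosen for illustration, not the
technique FAIRware is documented to use.
"""
import difflib
from typing import Any

# Hypothetical permitted values for two template fields.
PERMITTED: dict[str, list[str]] = {
    "organ": ["heart", "kidney", "lung", "spleen"],
    "preservation_method": ["frozen", "fixed"],
}


def suggest_repairs(
    record: dict[str, Any],
    permitted: dict[str, list[str]] = PERMITTED,
    cutoff: float = 0.6,
) -> tuple[dict[str, Any], list[str]]:
    """Return (alternative_record, report); the input record is left unchanged."""
    alternative = dict(record)
    report: list[str] = []
    for field, allowed in permitted.items():
        value = record.get(field)
        if not isinstance(value, str) or value in allowed:
            continue  # absent, non-string, or already compliant
        # Compare case-insensitively and ignore surrounding whitespace.
        by_lower = {term.lower(): term for term in allowed}
        matches = difflib.get_close_matches(
            value.strip().lower(), list(by_lower), n=1, cutoff=cutoff
        )
        if matches:
            fix = by_lower[matches[0]]
            alternative[field] = fix
            report.append(f"{field}: replace {value!r} with {fix!r}")
        else:
            report.append(f"{field}: {value!r} has no close permitted value; review manually")
    return alternative, report


if __name__ == "__main__":
    original = {"sample_id": "S-001", "organ": "Heart ", "preservation_method": "frozn"}
    alternative, report = suggest_repairs(original)
    print(alternative)
    print("\n".join(report))
```

The sketch leaves the source record untouched and returns a separate alternative record plus a report, in the spirit of generating alternative metadata rather than editing it in place. Values with no sufficiently close permitted term are flagged for a human instead of being guessed.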
Discussion
The template-based approach addresses the challenges posed by the subjective nature of the FAIR principles by encoding community standards directly into machine-readable templates. This contrasts with previous efforts to automate FAIRness assessment, which have struggled with context-dependent criteria. Together, CEDAR and FAIRware shift responsibility for defining FAIRness from individual evaluators to scientific communities, promoting consensus-based standards. The system's ability to identify problematic areas in existing metadata and suggest improvements is a significant contribution to data stewardship. The use of standard ontologies and the JSON-LD format enhances the interoperability and reusability of metadata. The paper acknowledges limitations, such as the risk of overly ambitious templates and the effort required to create them.
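Why ontology terms and JSON-LD help interoperability can be illustrated with a deliberately simplified Python sketch. The example.org IRIs, the field names, and the expand_terms helper are hypothetical; the helper mimics only the term-to-IRI mapping step of JSON-LD expansion, so a real system would use a conforming JSON-LD processor and real community ontologies.

```python
"""Toy sketch: why JSON-LD contexts and ontology identifiers aid interoperability.

Hypothetical throughout: the example.org IRIs stand in for real community
ontology terms, and expand_terms handles only flat term-to-IRI mappings. It is
not a conforming JSON-LD processor.
"""
import json
from typing import Any

RDFS = "http://www.w3.org/2000/01/rdf-schema#"
ORGAN_PROPERTY = "https://example.org/vocab/organ"  # hypothetical property IRI
HEART_TERM = "https://example.org/ontology/HEART"   # hypothetical ontology term IRI

# Two repositories use different local field names and labels for the same thing.
record_a = {
    "@context": {"organ": ORGAN_PROPERTY, "rdfs": RDFS},
    "organ": {"@id": HEART_TERM, "rdfs:label": "heart"},
}
record_b = {
    "@context": {"source_organ": ORGAN_PROPERTY, "rdfs": RDFS},
    "source_organ": {"@id": HEART_TERM, "rdfs:label": "Heart (whole organ)"},
}


def expand_terms(doc: dict[str, Any]) -> dict[str, Any]:
    """Map each top-level key to the IRI its @context assigns and reduce
    ontology-term values to their @id, so records compare by meaning."""
    context = doc.get("@context", {})
    expanded: dict[str, Any] = {}
    for key, value in doc.items():
        if key == "@context":
            continue
        iri = context.get(key, key)  # unmapped keys are kept as-is
        if isinstance(value, dict) and "@id" in value:
            value = value["@id"]  # compare terms by identifier, not by display label
        expanded[iri] = value
    return expanded


if __name__ == "__main__":
    # Serializing and re-parsing stands in for exchanging a record between applications.
    received_b = json.loads(json.dumps(record_b))
    print(expand_terms(record_a) == expand_terms(received_b))  # True
```

The two records use different field names and different display labels, yet both reduce to the same property IRI and ontology-term IRI, which is what lets a consumer recognize that they describe the same thing.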
Conclusion
This research demonstrates a successful method for operationalizing the FAIR principles through community-defined metadata templates. The CEDAR and FAIRware workbenches provide a framework both for authoring FAIR metadata and for evaluating the FAIRness of existing datasets. Future work could extend the system, for example with automated translation between community standards and more sophisticated reasoning about relationships among datasets.
Limitations
The success of this approach relies heavily on the active participation of scientific communities in creating and maintaining accurate, comprehensive metadata templates. Overly ambitious templates might burden data curators, hindering adoption. While the system aims to be technology-independent, effective implementation requires ongoing community engagement and maintenance.