
Making Mathematical Research Data FAIR: Pathways to Improved Data Sharing
T. O. F. Conrad, E. Ferrer, et al.
This paper by Tim O. F. Conrad, Eloi Ferrer, Daniel Mietchen, Larissa Pusch, Johannes Stegmüller, and Moritz Schubotz examines open data systems for mathematical research, identifies the main challenges to data sharing, and proposes concrete ways to address them.
Introduction
The paper notes that data sharing and data citation are becoming increasingly important in scientific research because they accelerate the generation of knowledge. While some disciplines have a long history of data sharing, many others, including mathematics, are still building robust infrastructure. The authors define key terms such as repository, portal, and platform and clarify how these differ in the context of open data systems. Mathematics, with its inherent structure and rigor, is well suited to data sharing, as shown by the repositories for theorems, proofs, and number sequences that have emerged over the past 15 years. The paper also stresses that scholarly publications and research data repositories play complementary roles: publications present theoretical advances, while repositories provide diverse types of data for empirical scrutiny and verification. The authors' research questions focus on the current status of open data systems, their essential requirements, and the obstacles hindering their successful implementation, particularly in mathematical research. Finally, the paper places open data within the broader Open Science movement and the FAIR principles, arguing that the two reinforce each other.
Literature Review
The paper reviews the existing literature on open data systems and the FAIR principles and derives key features for evaluating such systems: free use, accessibility, data submission mechanisms, FAIR compliance, data quality, metadata management, data formats and standardization, security and privacy, and a user-friendly interface. The authors discuss the benefits of Open Science, in particular the greater transparency and reproducibility that open data makes possible, and the role of the FAIR principles, with their emphasis on metadata, in data management and accessibility. The review thus provides a basis for evaluating existing systems against established best practices and community standards.
Methodology
The authors used a mixed-methods approach to identify and evaluate mathematical research data systems. They searched the literature with Google Scholar and zbMATH Open, supplemented these searches with keyword queries in general web search engines, and consulted data repository aggregators such as FAIRsharing, MathHub, OpenDOAR, and re3data.
The initial list of systems was then screened against predefined criteria: a system had to be free to use, publicly accessible, and open to data submissions. Systems that failed these criteria were excluded, as were systems that mainly aggregate metadata without accepting data submissions directly and systems that do not hold significant mathematical research data. Wiki-based systems, although important, were largely excluded because they often adhere only loosely to the FAIR principles.
The remaining systems were assessed with criteria adapted from Austin et al. (2015) and Wilkinson et al. (2016), covering infrastructure, preservation, security and privacy, archiving, submission, access and sharing, policy, and FAIR compliance. FAIR compliance was evaluated against the detailed sub-criteria of each principle: Findability (F1-F4), Accessibility (A1-A2), Interoperability (I1-I3), and Reusability (R1-R1.3). Each system's alignment with these criteria was verified through documentation reviews, literature analysis, and, where needed, direct communication with system providers. The authors acknowledge limitations related to the search criteria, language restrictions, reliance on publicly available sources, and potential bias arising from their own knowledge.
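The screening and scoring steps can be sketched in Python. This is a minimal illustration under stated assumptions, not the authors' procedure: the `Candidate` record, the example systems `RepoA` and `IndexB`, and the idea of scoring each principle as the fraction of sub-criteria judged as met are all hypothetical. Only the inclusion criteria and the FAIR sub-principle labels (as listed by Wilkinson et al., 2016) come from the text.

```python
from dataclasses import dataclass, field

# FAIR sub-principle labels (Wilkinson et al., 2016), grouped by principle.
FAIR_CRITERIA = {
    "F": ["F1", "F2", "F3", "F4"],
    "A": ["A1", "A1.1", "A1.2", "A2"],
    "I": ["I1", "I2", "I3"],
    "R": ["R1", "R1.1", "R1.2", "R1.3"],
}


@dataclass
class Candidate:
    """Hypothetical record for one system found during the search."""
    name: str
    free_to_use: bool
    publicly_accessible: bool
    accepts_submissions: bool
    metadata_only_aggregator: bool
    has_math_data: bool
    # Sub-criteria a reviewer judged as met, e.g. {"F1", "A1", "R1.1"}.
    fair_met: set = field(default_factory=set)


def passes_screening(c: Candidate) -> bool:
    """Apply the inclusion and exclusion criteria from the methodology."""
    return (
        c.free_to_use
        and c.publicly_accessible
        and c.accepts_submissions
        and not c.metadata_only_aggregator
        and c.has_math_data
    )


def fair_profile(c: Candidate) -> dict:
    """Assumed scoring: fraction of sub-criteria met per principle, rounded."""
    return {
        principle: round(sum(s in c.fair_met for s in subs) / len(subs), 2)
        for principle, subs in FAIR_CRITERIA.items()
    }


candidates = [
    Candidate("RepoA", True, True, True, False, True,
              {"F1", "F2", "F3", "F4", "A1", "A1.1", "A1.2", "A2", "I1", "R1"}),
    Candidate("IndexB", True, True, False, True, True),  # metadata-only index
]
included = [c for c in candidates if passes_screening(c)]
for c in included:
    print(c.name, fair_profile(c))  # RepoA {'F': 1.0, 'A': 1.0, 'I': 0.33, 'R': 0.25}
```

Treating every sub-criterion as a yes/no judgement keeps the sketch simple; in the study itself those judgements rest on documentation review, literature analysis, and correspondence with providers, which no script replaces.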
Key Findings
The authors compile a list of 22 open data systems relevant to mathematical research, categorized as multidisciplinary or specialized. They compare the systems on several key features: persistent identifiers for data, authors, and publications; metadata management, including APIs and controlled vocabularies; and data review and curation processes. The systems vary widely in their capabilities and in how closely they follow the FAIR principles.
Multidisciplinary systems such as arXiv, Figshare, and Zenodo offer robust features, including DOI assignment, version control, and a choice of licenses, but often fall short on interoperability and reusability. Specialized systems such as the Archive of Formal Proofs and the On-Line Encyclopedia of Integer Sequences (OEIS), by contrast, often excel in data quality and curation within their domain but lack the broader functionality of multidisciplinary platforms.
Across the 22 systems, compliance is high for the findability and accessibility principles and considerably lower for interoperability and reusability. Most systems assign unique identifiers such as DOIs to data, but fewer support persistent identifiers for authors. Similarly, many systems reference publications by DOI or other identifiers, yet often do not use controlled vocabularies or qualified references to other metadata. Several systems lack explicit licensing information or do not fully record data provenance, which hinders reuse. Ensuring data quality and curation remains a challenge, especially in multidisciplinary repositories: the authors find that specialized repositories tend to have more rigorous review and curation processes than general-purpose ones.
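Persistent identifiers only help if they are well formed. The sketch below validates ORCID iDs, a widely used kind of persistent author identifier, using the ISO 7064 MOD 11-2 check character that ORCID documents, and applies a simplified syntax pattern for DOIs. The DOI pattern is an assumption that covers most modern DOIs but not every legal one, and neither check confirms that an identifier actually resolves; the DOI in the usage lines is made up.

```python
import re

# Simplified DOI syntax (assumption): "10.", a 4-9 digit registrant code, "/", a suffix.
DOI_PATTERN = re.compile(r"10\.\d{4,9}/[-._;()/:A-Z0-9]+", re.IGNORECASE)


def looks_like_doi(value: str) -> bool:
    """Syntax check only; says nothing about whether the DOI resolves."""
    return DOI_PATTERN.fullmatch(value) is not None


def orcid_check_digit(base_digits: str) -> str:
    """ISO 7064 MOD 11-2 check character for the first 15 digits of an ORCID iD."""
    total = 0
    for ch in base_digits:
        total = (total + int(ch)) * 2
    result = (12 - total % 11) % 11
    return "X" if result == 10 else str(result)


def is_valid_orcid(orcid: str) -> bool:
    """Check the 0000-0000-0000-000X layout and the final check character."""
    if not re.fullmatch(r"\d{4}-\d{4}-\d{4}-\d{3}[\dX]", orcid):
        return False
    compact = orcid.replace("-", "")
    return orcid_check_digit(compact[:15]) == compact[15]


print(is_valid_orcid("0000-0002-1825-0097"))   # True (sample iD from ORCID documentation)
print(is_valid_orcid("0000-0002-1825-0098"))   # False (wrong check digit)
print(looks_like_doi("10.1234/example.5678"))  # True (made-up DOI, syntax only)
```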
Discussion
The findings highlight the diverse landscape of open data systems in mathematics and the challenges in achieving comprehensive FAIR compliance. The authors discuss the trade-offs between specialized, domain-specific repositories and general-purpose, multidisciplinary platforms. Specialized systems excel in data quality and curation but may lack broad functionality and interoperability, while general-purpose systems offer broader reach but may not provide the same level of domain-specific expertise in data curation. The analysis emphasizes the importance of persistent identifiers for data, authors, and publications, along with rich metadata and well-defined APIs, for enhancing data discoverability, accessibility, and reusability. The authors’ assessment of FAIR compliance suggests a need for improvement in the interoperability and reusability aspects, particularly related to the use of controlled vocabularies, qualified references to other metadata, and explicit licensing information. The discussion underscores the complex interplay of technical, user-centric, and legal factors influencing the success of open data systems. The varied levels of FAIR compliance across the examined systems suggest that a one-size-fits-all approach is insufficient, and tailored strategies are needed to address the specific needs of different mathematical subfields and data types.
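The interoperability and reusability gaps named here (controlled vocabularies, qualified references, explicit licensing), together with the provenance gap noted in the findings, lend themselves to automated checks. The sketch below assumes a simplified metadata record whose keys (`license`, `subjects`, `related`, `provenance`) are hypothetical and only loosely modeled on schemas such as DataCite. It uses the Mathematics Subject Classification (MSC 2020) as the controlled vocabulary, small illustrative subsets of SPDX license identifiers and DataCite relation types, and checks MSC codes by format only, not against the official code list.

```python
import re

# Illustrative subset of SPDX license identifiers (not exhaustive).
KNOWN_LICENSES = {"CC-BY-4.0", "CC-BY-SA-4.0", "CC0-1.0", "MIT", "Apache-2.0"}

# Illustrative subset of DataCite relationType values for qualified references.
RELATION_TYPES = {"Cites", "IsCitedBy", "References", "IsReferencedBy",
                  "IsSupplementTo", "IsSupplementedBy", "IsDerivedFrom", "IsPartOf"}

# Format-only pattern for MSC 2020 codes such as "11B83", "05Cxx", "11-XX", "11-01".
MSC_CODE = re.compile(r"\d{2}(?:-XX|-\d{2}|[A-Z](?:xx|\d{2}))")


def metadata_issues(record: dict) -> list:
    """Return human-readable problems found in a hypothetical metadata record."""
    issues = []

    # R1.1: a clear, machine-readable usage license.
    if record.get("license") not in KNOWN_LICENSES:
        issues.append("missing or unrecognized license (R1.1)")

    # I2: subjects drawn from a controlled vocabulary (here: MSC 2020).
    msc = [s.get("code", "") for s in record.get("subjects", [])
           if s.get("scheme") == "MSC2020"]
    if not msc:
        issues.append("no MSC2020 subject (I2)")
    issues.extend(f"malformed MSC code {code!r} (I2)"
                  for code in msc if not MSC_CODE.fullmatch(code))

    # I3: links to other (meta)data must state how they are related.
    for ref in record.get("related", []):
        if ref.get("relationType") not in RELATION_TYPES or not ref.get("identifier"):
            issues.append(f"unqualified reference {ref!r} (I3)")

    # R1.2: some statement of where the data came from.
    if not record.get("provenance"):
        issues.append("no provenance statement (R1.2)")

    return issues


good = {
    "license": "CC-BY-4.0",
    "subjects": [{"scheme": "MSC2020", "code": "11B83"}],
    "related": [{"relationType": "IsSupplementTo", "identifier": "10.1234/example.5678"}],
    "provenance": "Generated by a script archived alongside the data.",
}
bad = {
    "subjects": [{"scheme": "MSC2020", "code": "11B8"}],
    "related": [{"identifier": "10.1234/example.5678"}],
}

print(metadata_issues(good))  # []
for issue in metadata_issues(bad):
    print(issue)
```

Checks like these can only flag missing or malformed fields; whether a subject code or relation type is the right one for a dataset still needs the domain-specific curation the text attributes to specialized repositories.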
Conclusion
The study concludes that while significant progress has been made in making mathematical research data more readily available, there is still a considerable gap in achieving comprehensive FAIR compliance. The authors emphasize the need for a multi-faceted approach involving technical improvements, increased user engagement, and clearer standards for metadata and licensing. Future research should focus on developing tools and infrastructure that promote interoperability across different mathematical repositories and facilitate seamless integration of data from diverse sources. The paper suggests potential future directions, such as developing automated data anonymization tools, open-source tools that support open data standards, and more robust data validation and curation procedures. The authors also stress the importance of addressing the funding and sustainability challenges faced by open data initiatives in mathematics.
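Robust validation and curation procedures typically include routine integrity checks on deposited files. As one small, hypothetical ingredient of such tooling (the paper does not prescribe it), the sketch below records a SHA-256 checksum manifest for a dataset directory and later reports files that are missing or have changed. It uses only the Python standard library; the file name and contents in the usage lines are made up.

```python
import hashlib
import tempfile
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 16) -> str:
    """Hash a file in chunks so large datasets need not fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root: Path) -> dict:
    """Map each file's path (relative to root) to its SHA-256 checksum."""
    return {str(p.relative_to(root)): sha256_of(p)
            for p in sorted(root.rglob("*")) if p.is_file()}


def verify_manifest(root: Path, manifest: dict) -> list:
    """Report files that are missing or whose contents no longer match."""
    problems = []
    for rel, expected in manifest.items():
        p = root / rel
        if not p.is_file():
            problems.append(f"missing: {rel}")
        elif sha256_of(p) != expected:
            problems.append(f"changed: {rel}")
    return problems


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "sequence.txt").write_text("1 1 2 3 5 8 13\n")
    manifest = build_manifest(root)
    before = verify_manifest(root, manifest)
    (root / "sequence.txt").write_text("1 1 2 3 5 8 14\n")  # simulate silent corruption
    after = verify_manifest(root, manifest)

print(before)  # []
print(after)   # ['changed: sequence.txt']
```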
Limitations
The authors acknowledge several limitations of their study. The search strategy, which focused on English-language resources and specific keywords, may have overlooked niche or emerging systems. Relying on publicly available information may have missed systems without comprehensive online documentation. The authors also note potential subjective bias, since their own expertise may have influenced the selection and interpretation of the data. Finally, the evaluation of FAIR compliance relied on existing definitions and interpretations, so differences in how platforms understand and implement the FAIR principles may have affected the results.