Mathematics
Making Mathematical Research Data FAIR: Pathways to Improved Data Sharing
T. O. F. Conrad, E. Ferrer, et al.
The paper addresses the growing importance of sharing and citing research data to enhance reproducibility, validation, and the pace of knowledge creation. It distinguishes between repositories (long-term storage and dissemination), portals (aggregation and discovery), and platforms (broader collaboration and analysis tools), using the umbrella term open data systems when not distinguishing. Mathematics, with its structured nature (e.g., theorem libraries, number sequences), has made progress in data sharing, and repositories complement traditional scholarly publications by providing empirical and computational research data that enable verification. The study focuses on assessing the landscape of open data systems relevant to mathematics and their role in Open Science and FAIR data practices. Research questions: (1) What is the current status of open data systems in academia? (2) What are the main requirements for an open data system? (3) What are the biggest challenges and obstacles preventing widely used open data systems? The introduction underscores the importance of integrating repositories with publications to strengthen credibility and facilitate reuse, and it provides a taxonomy of mathematical research data types (symbolic, numeric, geometric, models, observational, and text).
The background situates the work within the Open Science movement and the FAIR principles. Open Science promotes accessible publications, software, data, and educational materials, exemplified by rapid COVID-19 data sharing. FAIR focuses on machine-friendly data management and metadata (Findable, Accessible, Interoperable, Reusable), complementing Open Science. For mathematics, open data enables replication, validation, cross-fertilization, and new research avenues. The paper reviews open data systems—repositories, portals, platforms—as digital environments that provide storage, exchange, access, metadata standardization, and sometimes analysis tools. Notable cross-disciplinary systems used by mathematicians include Zenodo and Figshare. The literature and community standards inform key features to evaluate systems: (a) essential features: free use with open licensing, broad accessibility (web), and data submission mechanisms; (b) additional features: FAIR compliance, data quality and curation, robust metadata management, adherence to data format standards and standardization (e.g., DDI, ISO/IEC 11179), security and privacy, and user-friendly interfaces. These criteria, shaped by prior work (e.g., Austin et al., Wilkinson et al.), frame the subsequent evaluation.
The authors combined a literature-informed and web-based discovery approach to identify open data systems relevant to mathematics. Primary discovery used Google Scholar with keyword combinations (e.g., mathematics, research data/scientific data/metadata, portal/repository/infrastructure/platform, metadata management, FAIR), augmented by searches in zbMATH Open, FAIRsharing, MathHub, OpenDOAR, and re3data, and complemented by authors’ domain knowledge. Searches were conducted June–October 2022 and March–June 2023, restricted to English-language content. Inclusion required that systems be free to use, publicly accessible, and allow user data submission irrespective of affiliation or location. Exclusions encompassed: systems without free submission; institutional repositories with member-only deposits; aggregators lacking direct user deposition (e.g., re3data, DataCite, Dimensions.ai); systems with insufficient mathematical data at the time (e.g., B2share, Dryad, Fairdomhub, Mendeley Data, Vivli); and many wiki-based portals that, despite being open and collaborative, typically lack FAIR essentials (e.g., PIDs, API metadata, explicit licenses). A non-exhaustive list of wiki-style systems is provided separately. Included systems were evaluated using Austin et al.’s framework across Infrastructure, Preservation, Security/Privacy, Archiving, Submission, Access/Sharing, Policy, and FAIR compliance, with detailed FAIR assessment mapped to F1–F4, A1–A2, I1–I3, and R1–R1.3. Verification drew on documentation, literature, and when needed, direct communication with providers.
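The screening step described above can be read as a simple conjunction of criteria. The following is a minimal sketch of that logic, not the authors' actual procedure: the `Candidate` fields, the `is_included` helper, and the example entries are all hypothetical and do not correspond to real systems or to the paper's data.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    """Hypothetical screening record for one candidate system."""
    name: str
    free_to_use: bool
    publicly_accessible: bool
    open_submission: bool      # deposits allowed irrespective of affiliation or location
    has_math_data: bool        # enough mathematical data at the time of the search


def is_included(c: Candidate) -> bool:
    # A system passes only if every inclusion criterion holds;
    # failing any single one corresponds to an exclusion reason.
    return (
        c.free_to_use
        and c.publicly_accessible
        and c.open_submission
        and c.has_math_data
    )


# Placeholder entries, invented for illustration only.
candidates = [
    Candidate("example-open-repository", True, True, True, True),
    Candidate("example-aggregator", True, True, False, True),        # no direct user deposition
    Candidate("example-institutional-repo", True, True, False, True),  # member-only deposits
    Candidate("example-sparse-repo", True, True, True, False),       # too little math data
]

included = [c.name for c in candidates if is_included(c)]
print(included)
```

In practice, several of these judgments (for example, whether a system holds "sufficient" mathematical data) are qualitative, so a boolean flag is a simplification of what the summary describes.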
- Scope: The study identifies and analyzes 22 open data systems with significant mathematical relevance, spanning multidisciplinary repositories (e.g., arXiv, HAL, Zenodo, Figshare, Harvard Dataverse, Science Data Bank, OSF, Open Science Library/Code Ocean, Wikidata) and math-specialized systems (e.g., Archive of Formal Proofs, OEIS, FindStat, SuiteSparse Matrix Collection, House of Graphs, Encyclopedia of Graphs, π-Base, polyDB, Database of Ring Theory, Network Repository).
- Persistent identifiers and metadata: Most systems assign persistent identifiers (often DOIs; others use internal IDs), facilitating citability and discoverability. However, many do not explicitly reference the persistent identifier within the metadata, limiting compliance with FAIR F3. Author identifiers (e.g., ORCID) are only optionally supported and are more common in multidisciplinary systems.
- APIs and interoperability: About two-thirds provide APIs for metadata access; fewer use controlled vocabularies and qualified references (e.g., DOIs for publications, ORCID for authors). This gap reduces interoperability across systems and tools.
- FAIR compliance patterns (Table 5): Findability and Accessibility are generally well met; shortcomings concentrate in Interoperability and Reusability. A2 (metadata available even if data are removed) and F3 (explicit PID references in metadata) are frequently unmet. Reusability is hindered by absent or inconsistent licenses, limited contextual metadata, and sparse provenance.
- Curation and review: Specialized repositories tend to curate and review submissions to ensure quality; general-purpose systems often provide limited curation (policy compliance checks, metadata review) with timestamped versioning. Some offer curation services (e.g., Dataverse-based).
- Functional features (Table 4): Many general-purpose systems support DOIs, ORCID, versioning/timestamps, and private datasets; advanced integration is available in several platforms (e.g., GitHub/GitLab, OAI-PMH, REST APIs). Specialized math repositories vary widely in machine-readability and licensing clarity.
- Wiki-based math sharing: Commonly used but frequently lack FAIR essentials (persistent IDs, comprehensive machine-readable metadata via APIs, clear licenses, qualified references).
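To make the recurring gaps above concrete (missing PID references in metadata, unqualified references, absent licenses), here is a rough sketch of how a few FAIR sub-principles might be checked against a single metadata record. The record layout (`identifier`, `data_identifier`, `creators`, `related_publications`, `license`), the `check_record` function, and the regular expressions are assumptions made for illustration; they are not a schema or an assessment method taken from the paper, and real FAIR evaluation involves considerably more judgment.

```python
import re

# Simplified shape checks; real DOI and ORCID validation is stricter.
DOI_PATTERN = re.compile(r"^10\.\d{4,9}/\S+$")
ORCID_PATTERN = re.compile(r"^\d{4}-\d{4}-\d{4}-\d{3}[\dX]$")


def check_record(record: dict) -> dict:
    """Return a pass/fail flag for a handful of FAIR sub-principles."""
    creators = record.get("creators", [])
    publications = record.get("related_publications", [])
    return {
        # F1: the record carries some persistent identifier (DOI or internal ID).
        "F1": bool(record.get("identifier")),
        # F3: the metadata explicitly names the identifier of the data it describes.
        "F3": bool(record.get("data_identifier")),
        # I3: references are qualified, here approximated as ORCID-shaped
        # creator IDs and DOI-shaped publication IDs.
        "I3": bool(creators)
        and all(ORCID_PATTERN.match(c.get("orcid", "")) for c in creators)
        and all(DOI_PATTERN.match(p) for p in publications),
        # R1.1: a usage license is stated.
        "R1.1": bool(record.get("license")),
    }


# Placeholder records; the identifiers are example values, not real datasets.
complete = {
    "identifier": "10.1234/example.5678",
    "data_identifier": "10.1234/example.5678",
    "creators": [{"name": "A. Author", "orcid": "0000-0002-1825-0097"}],
    "related_publications": ["10.1234/paper.0001"],
    "license": "CC-BY-4.0",
}
sparse = {
    "identifier": "internal-42",
    "creators": [{"name": "B. Author"}],
}

print(check_record(complete))
print(check_record(sparse))
```

The second record mirrors the pattern reported above: an identifier exists, so findability looks satisfied at first glance, yet the explicit PID reference, qualified creator reference, and license are all missing.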
The findings show a maturing but uneven ecosystem of open data systems for mathematics. Regarding RQ1 (current status), the landscape comprises general-purpose multidisciplinary repositories with robust infrastructure and identifiers, alongside math-specific repositories offering deep domain utility but varying FAIR adherence and interoperability. For RQ2 (requirements), evidence across systems highlights the need for persistent identifiers across resources and references, rich and standardized metadata with qualified links (authors, publications, datasets), machine-readable access via APIs, clear licensing, provenance, and accessible, user-friendly tooling. For RQ3 (challenges), the study identifies technical barriers (lack of controlled vocabularies, inconsistent metadata schemas, limited APIs), organizational issues (varying curation models, sustainability and funding), and conceptual gaps (ambiguous interpretations of FAIR sub-principles, lack of community standards in niche math subfields). The results emphasize selecting repositories aligned with data type, discipline specificity, and required features (e.g., DOIs, versioning, APIs, long-term preservation), and they argue that improved metadata management, PIDs for creators and references, and stronger standardization would substantially increase interoperability and reuse. The analysis also clarifies that while findability and access are largely solved, reusability and interoperability demand coordinated community and infrastructure efforts—especially around controlled vocabularies, provenance capture, and domain standards.
This work surveys and evaluates 22 open data systems relevant to mathematical research and assesses their features and FAIR compliance. It contributes: (1) a structured overview of systems and their capabilities; (2) an empirically grounded appraisal of FAIR adherence, revealing strong performance on findability and accessibility but gaps in interoperability and reusability; (3) a set of foundational requirements for open data systems spanning technical, user-centric, and legal/ethical dimensions; (4) an articulation of challenges and obstacles with proposed solution directions; and (5) practical lessons learned to guide repository selection and development. Future work should focus on: establishing domain-specific community standards for mathematical data and metadata; implementing consistent, machine-actionable licensing and provenance; expanding support for persistent identifiers for creators and referenced resources; enhancing APIs and controlled vocabularies; developing automated and community-assisted curation workflows; improving usability and training resources; and exploring sustainable funding and governance models to ensure longevity and quality. Strengthening wiki-based math platforms with FAIR-enabling features is a promising avenue to broaden participation while improving machine-readability and reuse.
The study is limited by: (1) search criteria constraints—specialized or emerging systems not matching chosen keywords may have been missed; (2) English-language restriction, potentially excluding non-English systems and repositories; (3) reliance on publicly available sources (publications, search engines, repository aggregators), which may omit low-profile or community-specific systems; and (4) potential bias from authors’ knowledge and networks informing the final list. Consequently, the catalog is valuable but not exhaustive.