Introduction
Open computational science relies on open access to data, software, and infrastructure to ensure that scientific results are reproducible and verifiable. The challenge lies in translating this principle into sustainable practice. Funding agencies increasingly recognize this need and promote data management guidelines aligned with the FAIR (Findable, Accessible, Interoperable, Reusable) principles. The paper proposes that open-science platforms, similar to successful code-sharing platforms like GitHub, can revolutionize scientific discourse. While this applies broadly to computational science, the paper focuses on materials science, a field where digital data is naturally produced and many tools are openly available. Existing platforms such as nanoHUB, AFLOWlib, the Materials Project, OQMD, and JARVIS have made progress, but challenges remain: materials simulations often involve complex workflows that combine simulations at different scales with iterative post-processing, and screening materials for an application can involve thousands of simulations, which demands substantial computational resources and efficient record-keeping. An ideal open-science platform (OSP) should support open-source codes and tools, provide an open framework for managing workflows, offer turnkey solutions accessible to diverse users, and enable FAIR sharing of data and workflows for reproducibility. The Materials Cloud platform aims to meet these needs.
Literature Review
The paper's introduction reviews several existing platforms and initiatives that have contributed to open science in computational materials science. These include nanoHUB, which offers interactive simulation tools and educational materials; integrated data repositories and software frameworks such as AFLOWlib (with aflow), the Materials Project (with pymatgen, custodian, fireworks, atomate), OQMD (with qmpy), and JARVIS (with JARVIS-Tools); and central data repositories such as NOMAD, which collect materials science calculations. The review also highlights the limitations of these platforms in handling complex workflows, managing large-scale simulations, and keeping complete and efficient records, which motivates the development of the Materials Cloud platform.
Methodology
Materials Cloud is structured around five sections that mirror the research lifecycle: LEARN, WORK, DISCOVER, EXPLORE, and ARCHIVE. ARCHIVE is a moderated repository for long-term storage of research data with persistent identifiers (DOIs). It supports various data formats and ensures data longevity through pre-paid storage at CSCS (the Swiss National Supercomputing Centre) and a contingency plan. DISCOVER offers curated datasets with tailored visualizations that act as entry points for exploring the data. EXPLORE provides access to the provenance graph of AiiDA workflows, allowing interactive exploration and reproducibility. WORK focuses on simulation services and tools accessible through a web browser, including stand-alone tools and the AiiDAlab environment. AiiDAlab provides a containerized workspace with access to applications, databases, and computational resources, simplifying software setup and workflow management. LEARN provides educational materials, video lectures, and tutorials. The platform's architecture is modular, using AngularJS applications for the frontend and containerized tools for the backend. The EXPLORE section interacts directly with AiiDA's REST API, which returns data as JSON. ARCHIVE builds on the Invenio 3 framework for scalability and data management. Materials Cloud is deployed on an OpenStack cloud platform at CSCS, with deployment automated through Ansible for redundancy and resilience. The platform uses several visualization libraries, including Highstock/Highcharts, D3.js, JSmol, and Vis.
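To make the JSON-based access pattern used by EXPLORE more concrete, the following is a minimal Python sketch of how a client might build a paginated request URL and read node summaries from a decoded response. The base URL, the `nodes` collection path, the `limit`/`offset` parameters, and the `data.nodes` response layout are assumptions made for illustration and should be checked against the REST API documentation of the deployment in use. The response is a hand-written stand-in so the example runs offline.

```python
import json
from urllib.parse import urlencode, urljoin

# Hypothetical base URL (assumption); a real deployment publishes its own endpoint.
BASE_URL = "https://example.org/api/v4/"


def nodes_url(base_url, limit=5, offset=0):
    """Build a paginated URL for an assumed `nodes` collection."""
    query = urlencode({"limit": limit, "offset": offset})
    return urljoin(base_url, "nodes") + "?" + query


def node_summaries(payload):
    """Extract (uuid, node_type) pairs from a decoded JSON response.

    The `data` -> `nodes` layout is an assumption for this sketch.
    """
    nodes = payload.get("data", {}).get("nodes", [])
    return [(n["uuid"], n["node_type"]) for n in nodes]


# Hand-written stand-in for a server response, so no network access is needed.
SAMPLE_RESPONSE = json.loads("""
{"data": {"nodes": [
  {"uuid": "aaaa-1111", "node_type": "data.core.structure.StructureData."},
  {"uuid": "bbbb-2222", "node_type": "process.calculation.calcjob.CalcJobNode."}
]}}
""")

print(nodes_url(BASE_URL))
print(node_summaries(SAMPLE_RESPONSE))
```

In practice the sample payload would be replaced by the body of an HTTP GET on the generated URL; keeping URL construction and parsing in separate functions lets each be tested without a live server.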
Key Findings
Materials Cloud successfully integrates data archiving, simulation services, data analytics, and educational resources into a single, user-friendly platform. The ARCHIVE section provides long-term data storage and implements the FAIR data principles through DOIs, metadata harvesting, and compliance with the requirements of major funding agencies. The platform allows researchers to share data with interactive visualizations (DISCOVER), fully reproducible workflows (EXPLORE, through AiiDA), and easily accessible simulation tools (WORK, AiiDAlab). AiiDAlab simplifies access to high-performance computing resources and removes barriers related to installing and configuring simulation software. The LEARN section provides educational resources that support knowledge dissemination and training. The modular design and the use of containerization provide flexibility, scalability, and ease of maintenance and deployment, while the open-source components encourage extensibility and community contribution. The platform already hosts a significant amount of data and tools from various research communities.
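The idea behind a browsable provenance graph can be illustrated with a toy example. The Python sketch below models provenance as a directed acyclic graph in which each node points to the nodes it was directly derived from, and then collects everything upstream of a result. The node names and the dictionary representation are hypothetical and chosen only for illustration; this is not AiiDA's data model or API.

```python
from collections import deque

# Toy provenance graph (hypothetical names): each key maps to the nodes
# it was directly derived from.
PARENTS = {
    "band_structure": ["bands_calc"],
    "bands_calc": ["relaxed_structure", "pseudopotential"],
    "relaxed_structure": ["relax_calc"],
    "relax_calc": ["input_structure", "pseudopotential"],
    "input_structure": [],
    "pseudopotential": [],
}


def ancestors(node, parents):
    """Return every node upstream of `node`, in breadth-first order."""
    seen = []
    queue = deque(parents.get(node, []))
    while queue:
        current = queue.popleft()
        if current in seen:
            # Shared inputs (here the pseudopotential) are reported once.
            continue
        seen.append(current)
        queue.extend(parents.get(current, []))
    return seen


print(ancestors("band_structure", PARENTS))
```

Walking the graph upstream from a final result recovers every calculation and input that contributed to it, which is the information needed to reproduce that result.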
Discussion
Materials Cloud addresses the challenges of reproducibility and accessibility in computational materials science by providing a comprehensive platform that supports the entire research lifecycle. Its emphasis on FAIR data principles, detailed provenance tracking, and easy-to-use interfaces makes it a valuable resource for researchers of all levels. The platform's success in attracting contributions and integrating various tools and datasets demonstrates its potential to become a central hub for open computational materials science. The modular design and open-source nature of the platform foster community engagement and continuous improvement.
Conclusion
Materials Cloud is a successful example of an open-science platform for computational materials science. It promotes reproducibility through its emphasis on detailed provenance tracking and FAIR data principles. The platform's modular architecture ensures flexibility and ease of maintenance, while its user-friendly interfaces make it accessible to a wide range of users. Future work will focus on lowering the barrier to submission of new tools and visualizations, improving the platform's governance model, and promoting interoperability with other open-science platforms through standards like OPTIMADE. The long-term sustainability of such digital research infrastructures is crucial and requires a robust funding model.
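As an illustration of what interoperability through OPTIMADE can look like from the client side, the Python sketch below builds a query URL for the `structures` endpoint with a filter on chemical elements. The provider base URL is hypothetical, and the helper name `structures_query` is introduced here only for illustration; the filter syntax follows the general form of the OPTIMADE filter language and should be verified against the specification version a given provider implements.

```python
from urllib.parse import parse_qs, quote, urlencode, urlsplit

# Hypothetical provider base URL (assumption for this sketch).
BASE_URL = "https://example.org/optimade"


def structures_query(base_url, elements, page_limit=10):
    """Build a URL asking for structures made of exactly `elements`."""
    quoted = ",".join('"{}"'.format(e) for e in elements)
    filter_expr = "elements HAS ALL {} AND nelements={}".format(
        quoted, len(elements)
    )
    # quote_via=quote percent-encodes spaces as %20 instead of '+'.
    query = urlencode(
        {"filter": filter_expr, "page_limit": page_limit}, quote_via=quote
    )
    return "{}/v1/structures?{}".format(base_url.rstrip("/"), query)


url = structures_query(BASE_URL, ["Si", "O"])
print(url)
# Decode the query again to show the filter expression in readable form.
print(parse_qs(urlsplit(url).query)["filter"][0])
```

Because the same endpoint and filter grammar are shared across providers, a client of this kind would in principle only need a different base URL to query another database.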
Limitations
While Materials Cloud significantly advances open science, certain limitations remain. Submitting new tools and interactive visualizations currently requires technical expertise. The platform relies on community contributions for the maintenance and further development of its functionality. Securing long-term financial stability and continued maintenance beyond project-based funding is a challenge. Finally, the platform's effectiveness depends on adoption and participation by the research community.