Materials Cloud, a platform for open computational science

L. Talirz, S. Kumbhar, et al.

Discover the innovative Materials Cloud platform, designed to revolutionize open sharing in computational science through invaluable resources and tools for materials modeling. This pioneering approach enables seamless collaboration and reproducibility of research outcomes. The exciting work was conducted by authors including Leopold Talirz, Snehal Kumbhar, Elsa Passaro, and many others from the École Polytechnique Fédérale de Lausanne.

Introduction

The paper addresses the challenge of practicing open, reproducible computational science by providing an accessible platform that makes data, software, and infrastructure FAIR (findable, accessible, interoperable, reusable). Drawing inspiration from code-sharing platforms like GitHub, the authors argue that similar open-science platforms can transform scientific practice, especially in computational materials science, where data is born digital and tools are often open-source. The field faces three challenges: complex, multi-step workflows; large-scale screening that requires high-performance computing (HPC) resources; and the need for meticulous provenance to ensure reproducibility and reusability. The authors outline requirements for an open-science platform: support for open-source tools, workflow management, turnkey solutions and curated datasets, and FAIR sharing of data and workflows. They then introduce Materials Cloud as their implementation of such a platform.

Literature Review

The authors review prior efforts that partially address aspects of open computational materials science: nanoHUB provides interactive simulation tools and educational content via the browser; integrated data-software platforms such as AFLOWlib (with aflow), Materials Project (pymatgen, custodian, fireworks, atomate), OQMD (qmpy), and JARVIS (JARVIS-Tools) link repositories to computation frameworks; NOMAD aggregates large numbers of materials calculations. Despite progress, gaps remain in handling complex, multi-scale workflows, comprehensive provenance, and scalable sharing and reuse, motivating the need for a flexible, provenance-centric open-science platform.

Methodology

Materials Cloud employs a modular architecture to enable independent evolution of its sections (LEARN, WORK, DISCOVER, EXPLORE, ARCHIVE). Front-end user interfaces are AngularJS applications themed with Bootstrap and Angular Material, using visualization libraries such as Highcharts/Highstock, D3.js, JSmol, and Vis. LEARN uses a Slideshot backend for synchronized video-slide playback. WORK tools are encapsulated in Docker containers with their own web frontends; AiiDAlab is a customized JupyterHub deploying one container per user with persistent storage and optional connections to external HPC resources. EXPLORE interfaces directly with the AiiDA REST API to retrieve calculations, workflows, codes, and data from AiiDA provenance graphs in JSON, enabling interactive provenance browsing and data downloads. ARCHIVE is implemented on the Invenio 3 framework and stores file payloads in an OpenStack Swift Object Store, with daily tape backups; metadata is served via Invenio REST and OAI-PMH endpoints.

Materials Cloud is deployed on OpenStack VMs at the Swiss National Supercomputing Centre (CSCS), with duplicated production servers, periodic backups to object storage, and external monitoring at 60-second intervals. Deployment and configuration are automated via Ansible playbooks and roles, facilitating redeployment or federation.

AiiDA underpins workflow execution and provenance capture; its plugin system determines what outputs are persisted while ensuring sufficient information is retained to reproduce results. Generic APIs such as OPTIMADE are supported for cross-database structure search.
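
To make the OPTIMADE structure search mentioned above more concrete, here is a minimal Python sketch that only builds a query URL and sends no request. The `/v1/structures` endpoint, the `filter` and `page_limit` parameters, and the `elements HAS ALL` and `nelements` filter syntax come from the OPTIMADE specification. The base URL `https://example.org/optimade` and the helper name `optimade_structures_url` are hypothetical placeholders, not taken from the paper.

```python
from urllib.parse import urlencode, quote


def optimade_structures_url(base_url, elements, nelements=None, page_limit=10):
    """Build an OPTIMADE /v1/structures URL for structures containing all `elements`.

    `base_url` is a placeholder: substitute the base URL of a real
    OPTIMADE-compliant provider.
    """
    # OPTIMADE filter strings quote each element symbol, e.g. "Si","O".
    quoted = ",".join('"{}"'.format(symbol) for symbol in elements)
    clauses = ["elements HAS ALL " + quoted]
    if nelements is not None:
        # Restrict to structures with exactly this many distinct elements.
        clauses.append("nelements={}".format(nelements))
    params = {"filter": " AND ".join(clauses), "page_limit": page_limit}
    # quote_via=quote percent-encodes spaces as %20 instead of '+'.
    query = urlencode(params, quote_via=quote)
    return base_url.rstrip("/") + "/v1/structures?" + query


if __name__ == "__main__":
    # Hypothetical provider; prints the URL a client would request.
    print(optimade_structures_url("https://example.org/optimade", ["Si", "O"], nelements=2))
```

Because every OPTIMADE provider accepts the same filter grammar, the same function could in principle target several databases by changing only `base_url`.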

Key Findings

Materials Cloud delivers an integrated, FAIR-compliant, provenance-centric platform for computational materials science across five sections: LEARN (educational resources), WORK (simulation tools and AiiDAlab apps), DISCOVER (curated datasets with tailored visualizations), EXPLORE (interactive AiiDA provenance graphs), and ARCHIVE (moderated repository with persistent identifiers). Key features and data points include:

- ARCHIVE: DOIs for every record; public metadata under CC BY-SA 4.0; machine-readable metadata via HTML meta tags (Dublin Core), OAI-PMH, and JSON-LD (schema.org); storage at CSCS; non-commercial and free; guaranteed preservation for at least 10 years; size limits of 5 GB for general records and 50 GB for AiiDA databases; moderators can approve larger sets with 0.5 PB allocated overall.
- Indexing and compliance: ARCHIVE is listed in re3data and FAIRsharing, indexed by Google Dataset Search and B2FIND, and recommended by Nature Scientific Data; DMP templates are provided.
- DISCOVER: curated, interactive visualizations linked to underlying ARCHIVE records; examples include a 2D materials dataset and a COF dataset with nearly 70,000 structures and computed properties.
- EXPLORE: interactive access to complete AiiDA provenance (calculations, inputs/outputs, codes, computers, timestamps); node-specific visualizations and file downloads; databases can be imported locally into AiiDA to continue research from published results.
- WORK and AiiDAlab: browser-based launching and control of automated workflows encoded in Jupyter notebooks; persistent, containerized user environments; app store model for sharing workflows; connectivity to external HPC and OPTIMADE-compatible databases; availability of Quantum Mobile for local teaching and offline use.
- Education: LEARN hosts lecture series (e.g., MARVEL distinguished lectures), synchronized slides/video via Slideshot; Quantum Mobile bundles major open-source codes (Quantum ESPRESSO, Yambo, FLEUR, SIESTA, CP2K, Wannier90), SSSP pseudopotentials, visualization tools, and AiiDA/AiiDAlab preconfigured.

Collectively, these components implement FAIR data practices, enhance reproducibility through automated provenance, and lower barriers to reusing and extending computational workflows and datasets.
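
As an illustration of what machine-readable metadata over OAI-PMH with Dublin Core can look like to a client, the following Python sketch builds a standard `ListRecords` request URL and parses a response. The verb, the `oai_dc` metadata prefix, and the XML namespaces are defined by the OAI-PMH 2.0 and Dublin Core specifications. The endpoint `https://example.org/oai`, the helper names, and the sample record (title, creator, identifier) are invented for illustration and do not describe an actual ARCHIVE entry.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

# XML namespaces fixed by the OAI-PMH 2.0 and Dublin Core specifications.
NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "dc": "http://purl.org/dc/elements/1.1/",
}

# Hand-written, hypothetical response; not real repository output.
SAMPLE_RESPONSE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header><identifier>oai:example.org:1</identifier></header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>Hypothetical dataset</dc:title>
          <dc:creator>Doe, Jane</dc:creator>
          <dc:identifier>doi:10.0000/placeholder</dc:identifier>
        </oai_dc:dc>
      </metadata>
    </record>
  </ListRecords>
</OAI-PMH>"""


def list_records_url(base_url, metadata_prefix="oai_dc"):
    """Build a standard OAI-PMH ListRecords request URL."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    return base_url + "?" + urlencode(params)


def parse_records(xml_text):
    """Extract title, creator, and identifier from each Dublin Core record."""
    root = ET.fromstring(xml_text)
    records = []
    for record in root.iterfind(".//oai:record", NS):
        records.append({
            "title": record.findtext(".//dc:title", default="", namespaces=NS),
            "creator": record.findtext(".//dc:creator", default="", namespaces=NS),
            "identifier": record.findtext(".//dc:identifier", default="", namespaces=NS),
        })
    return records


if __name__ == "__main__":
    print(list_records_url("https://example.org/oai"))
    print(parse_records(SAMPLE_RESPONSE))
```

A real harvester would also need to follow the protocol's resumption tokens to page through large result sets, which this sketch leaves out.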

Discussion

Materials Cloud strengthens the reliability and reproducibility of computational materials research by coupling FAIR-compliant data publication with explicit provenance via AiiDA. The approach enables researchers to inspect and reuse entire workflow histories, not just end results, thereby facilitating verification, extension, and data mining. The platform’s open, modular design supports both dissemination of computed results (ARCHIVE, DISCOVER, EXPLORE) and execution of standardized workflows (WORK/AiiDAlab), broadening access to non-specialists through turnkey apps and educational resources. Interoperability is advanced through adoption of community APIs (OPTIMADE) and alignment with emerging metadata standards, while automated provenance tracking future-proofs datasets pending convergence of ontologies and schemas. The platform model provides a path for institutions and companies to redeploy services locally, fostering a federated ecosystem of reproducible computational research.
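
The idea of inspecting an entire workflow history rather than only the end result can be sketched with a toy provenance graph. The Python example below uses plain dictionaries and does not use the AiiDA data model or API. The node names and the helper `trace_provenance` are hypothetical, chosen only to show how a published result can be traced back through every calculation and input that produced it.

```python
from collections import deque

# Toy provenance graph: each node maps to the nodes it was directly derived from.
# All names are hypothetical and serve only as an illustration.
INPUTS = {
    "band_structure": ["bands_calc"],
    "bands_calc": ["relaxed_structure", "pseudopotentials", "code"],
    "relaxed_structure": ["relax_calc"],
    "relax_calc": ["initial_structure", "pseudopotentials", "code"],
    "initial_structure": [],
    "pseudopotentials": [],
    "code": [],
}


def trace_provenance(node, inputs=INPUTS):
    """Return every upstream node of `node` in breadth-first order, without duplicates."""
    seen = {node}
    order = []
    queue = deque([node])
    while queue:
        current = queue.popleft()
        for parent in inputs.get(current, []):
            if parent not in seen:
                # Record each ancestor once, even if several nodes share it.
                seen.add(parent)
                order.append(parent)
                queue.append(parent)
    return order


if __name__ == "__main__":
    # Lists every calculation and input that the final result depends on.
    print(trace_provenance("band_structure"))
```

Shared inputs such as the pseudopotentials appear only once in the trace, which mirrors how a provenance graph avoids duplicating data that several calculations reuse.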

Conclusion

The paper presents Materials Cloud as a comprehensive, open, and reproducible research platform for computational materials science that integrates FAIR data publication, interactive provenance exploration, curated visualizations, browser-based workflow execution, and educational outreach. By leveraging AiiDA for workflow management and provenance, and by adopting interoperable interfaces such as OPTIMADE, the platform enables rigorous reproducibility and reuse. Future directions include lowering the technical barriers for submitting interactive tools and visualizations via a platform-as-a-service architecture, evolving community-driven governance (e.g., through the Materials Cloud GO FAIR Implementation Network), further standardizing semantic assets and data formats, and ensuring sustainable long-term funding models for digital research infrastructure.

Limitations

Current barriers include the technical expertise required to submit new tools and interactive visualizations; the authors plan to address this via a platform-as-a-service approach. Long-term sustainability depends on continued funding and governance structures; while a 10-year data preservation guarantee is prepaid and contingency plans exist (LTS at CSCS), indefinite maintenance is not assured. It is not always feasible to preserve all raw outputs; AiiDA plugins select which data to retain while aiming to keep all information necessary for reproduction. Repository moderation enforces scope and basic interoperability, which may limit some submissions. Interoperability standards and ontologies across platforms are still evolving, potentially affecting immediate cross-repository integration.
