Applying FAIR4RS principles to develop an integrated modeling environment for the magnetic confinement fusion
X. Liu, Z. Yu, et al.
Xiaojuan Liu, Zhi Yu, and Nong Xiang present FyDev, a research software management tool that applies FAIR4RS principles to improve transparency and reproducibility in scientific research.
Introduction
Magnetic confinement fusion experiments (tokamaks) involve multiple physical processes across scales, requiring many research software packages to interoperate within an integrated modeling (IM) and analysis environment. The field has moved from isolated proprietary tools to open, flexible platforms that enable sharing, communication, and workflow orchestration. Parallel advances in data stewardship introduced the FAIR principles for data and, more recently, FAIR4RS for research software, emphasizing findability, accessibility, interoperability, and reusability. In IM, large volumes of simulation data and complex software stacks make transparency of computational environments and reproducibility particularly challenging. The build and runtime environments are often insufficiently described, hindering provenance and reproducibility. Research software management is therefore as critical as data management. This work presents FyDev, an IM management tool for EAST (the Experimental Advanced Superconducting Tokamak) that operationalizes FAIR4RS principles. FyDev unifies building, deploying, and invoking diverse research software within a Python-based context, assigns human- and machine-readable identifiers, captures rich metadata and provenance, and integrates package-building tools to ensure consistent and reproducible software environments in IM.
Methodology
The authors implement and demonstrate FyDev within the EAST IM environment, using Python as the orchestration layer. FyDev centers on three core elements: IDs, metadata, and a Build module.
IDs: FyDev assigns a human- and machine-readable identifier of the form name-version-toolchain-version-suffix. The toolchain-version represents a set of base packages (e.g., compiler, MPI, math/data libraries) and the suffix captures custom modifications. IDs are used to retrieve both installed executables (software repository) and metadata templates (metadata repository). FyDev maintains an internal mapping from IDs to software and environments, prioritizing readability while acknowledging that global uniqueness is outside its scope.
Metadata: FyDev uses YAML key-value metadata templates stored in a remote Git-based metadata repository and local description files stored alongside executables. The template records software information (name, version, dependencies, build and run methods, I/O parameters, source locations, license, webpages). When software is built on a specific platform, the local description file augments the template with environment-adaptive provenance such as installation directories, module loads, run commands and parameters, and checksums.
Build module: When an ID is requested and the corresponding local description file does not exist, FyDev triggers the Build module within the current Python runtime. It parses the metadata, fetches source code (e.g., from Git or archives), generates or uses existing build configuration files for an underlying package management system, and invokes the tool to compile, install, and deploy the software. FyDev encapsulates mature tools (primarily EasyBuild with Lmod for HPC), while exposing flexible interfaces to incorporate others (e.g., Spack, Guix, containers). The build process is dynamically tracked and recorded into the local description file, ensuring complete provenance and reproducibility.
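The identifier scheme described above can be illustrated with a small sketch. It is not FyDev's implementation: the `SoftwareID` class, the `parse_id` function, and the sample values are hypothetical, and the sketch assumes that only the optional suffix may contain hyphens, which the summary does not state.

```python
from dataclasses import dataclass
import re


@dataclass(frozen=True)
class SoftwareID:
    """Hypothetical structured form of a name-version-toolchain-version-suffix ID."""
    name: str
    version: str
    toolchain: str
    toolchain_version: str
    suffix: str = ""  # optional tag for custom modifications

    def __str__(self) -> str:
        base = f"{self.name}-{self.version}-{self.toolchain}-{self.toolchain_version}"
        return f"{base}-{self.suffix}" if self.suffix else base


# Assumption: no field except the suffix contains a hyphen.
_ID_PATTERN = re.compile(
    r"^(?P<name>[^-]+)-(?P<version>[^-]+)-(?P<toolchain>[^-]+)"
    r"-(?P<toolchain_version>[^-]+)(?:-(?P<suffix>.+))?$"
)


def parse_id(text: str) -> SoftwareID:
    """Split an identifier string into its fields, or raise ValueError."""
    match = _ID_PATTERN.match(text)
    if match is None:
        raise ValueError(f"not a valid identifier: {text!r}")
    fields = match.groupdict()
    fields["suffix"] = fields["suffix"] or ""  # missing suffix becomes empty
    return SoftwareID(**fields)


# Illustrative values only; they are not taken from the paper.
example = parse_id("genray-10.8-foss-2019b")
```

Keeping the fields in a frozen dataclass makes the ID usable as a dictionary key, which fits the idea of an internal mapping from IDs to software and environments.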
Repository structures and APIs: FyDev standardizes lookup paths so IDs are directly retrievable as local description files and as metadata records (e.g., RootPath/software/fydev/physics/name and RootPath/repository/{name}-{version}-{toolchain}{version-suffix}.yaml). A unified Python API (e.g., module.fetch(ID), module.load(ID)) handles source retrieval and executable loading.
Interoperability and I/O: FyDev records standardized I/O descriptions in metadata/description files and integrates with SpDB, a data integration tool based on the IMAS data model, to translate inputs/outputs and provide a unified API across heterogeneous software components.
Demonstration with GENRAY and CQL3D: Using a code snippet, the authors show FyDev initializing its repositories, generating IDs for GENRAY and CQL3D, and executing them. For GENRAY (not yet installed), FyDev queries metadata, fetches sources, activates the build system (EasyBuild), compiles, deploys, writes the description file, parses it to construct a Python-callable module, and executes genray.run(). For CQL3D (present under an existing Lmod stack but not in FyDev), FyDev discovers it via Lmod, wraps it into a Python-callable module using existing environment information, and runs it; if it were absent, FyDev would not auto-install via Lmod in this scenario. Execution flow is summarized in a workflow (Fig. 2) detailing ID generation, description file search, metadata fallback, build activation, and run.
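The lookup order in that workflow (description file search, metadata fallback, build activation) can be sketched as follows. This is a minimal illustration under stated assumptions, not FyDev's API: `resolve`, its parameters, and the in-memory `metadata_repo` are hypothetical, the `build` callable stands in for a backend such as EasyBuild, and JSON replaces the YAML files so that only the standard library is needed.

```python
import json
from pathlib import Path
from typing import Callable


def resolve(
    software_id: str,
    description_dir: Path,
    metadata_repo: dict[str, dict],
    build: Callable[[dict], dict],
) -> dict:
    """Return the local description for software_id, building it if needed."""
    description_file = description_dir / f"{software_id}.json"

    # 1. Description file search: reuse an existing local record.
    if description_file.exists():
        return json.loads(description_file.read_text())

    # 2. Metadata fallback: look up the template for this ID.
    template = metadata_repo.get(software_id)
    if template is None:
        raise LookupError(f"no metadata template for {software_id!r}")

    # 3. Build activation: the callable returns platform-specific provenance.
    provenance = build(template)

    # 4. Record template plus provenance so the next request skips the build.
    description = {**template, **provenance}
    description_dir.mkdir(parents=True, exist_ok=True)
    description_file.write_text(json.dumps(description, indent=2))
    return description
```

Passing the build step in as a callable mirrors the idea of interchangeable backends: the lookup logic stays the same whichever package management tool performs the build.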
Key Findings
- FyDev operationalizes FAIR4RS in an IM context by unifying find, access, build, and invocation of research software via a consistent Python API and standardized IDs.
- Findability: Human- and machine-readable IDs (name-version-toolchain-version-suffix) enable retrieval of both executables and metadata; repository directory conventions improve discoverability.
- Accessibility: A simple, unified API encapsulates ID-based access to binaries and metadata. Executable access is governed by system-level UID/GID; metadata ensures long-term accessibility as long as source is available.
- Interoperability: Metadata/description files specify standardized I/O and runtime context; SpDB provides a unified API based on a standard data model to facilitate software coupling.
- Reusability: The Build module integrates the build process into runtime, automatically recording complete provenance (dependencies, configuration, commands) in local description files and converting software into Python wrapper modules, thereby enhancing reproducibility and reusability.
- Metadata design: YAML templates capture static and dynamic fields (e.g., install_dir, prescripts, run commands, inputs/outputs). Local description files are stored with executables as upgraded, platform-specific provenance records.
- Tooling integration: EasyBuild and Lmod are encapsulated as default HPC-oriented tools, with flexibility to incorporate others (e.g., Spack, Guix, containers). A comparative analysis highlights trade-offs among software management approaches.
- Compatibility: FyDev coexists with traditional Lmod-based stacks and can wrap externally managed software while offering a richer provenance model when FyDev performs the build.
- Provenance and portability: By coupling IDs, metadata, and the Build module, FyDev supports migration of IM software environments across sites while maintaining consistency of versions and base toolchains.
- Community alignment: FyDev emphasizes readable IDs and practical reproducibility within its domain, with plans to integrate metadata hashes and potentially map to globally unique identifiers maintained by community registries.
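The dual-level metadata design listed above, a static template augmented with platform-specific provenance, can be sketched as below. The field names `install_dir` and `prescripts` come from the summary; the `run` and `checksum` keys, the function names, and the choice of SHA-256 are assumptions made for illustration.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Hex SHA-256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def describe_install(template: dict, executable: Path, prescripts: list[str]) -> dict:
    """Augment a static template with platform-specific provenance."""
    return {
        **template,
        "install_dir": str(executable.parent),
        "prescripts": list(prescripts),  # e.g. module load commands
        "run": str(executable),  # hypothetical key
        "checksum": {"sha256": sha256_of(executable)},  # hypothetical key
    }


def is_unmodified(description: dict) -> bool:
    """True if the executable still matches its recorded checksum."""
    recorded = description["checksum"]["sha256"]
    return sha256_of(Path(description["run"])) == recorded
```

Storing a checksum next to the run command lets a later user confirm that the executable being invoked is the one the description file was written for.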
Discussion
The work demonstrates a practical pathway to applying FAIR4RS principles to complex, heterogeneous IM software environments in magnetic confinement fusion. Focusing on the usage perspective, FyDev shows that combining readable, structured identifiers with rich metadata and an integrated Build module can significantly improve transparency, provenance, reproducibility, and reuse. The unified Python API provides consistent access and execution, while standardized I/O descriptions and integration with a community data model (via SpDB) support interoperability between diverse codes. The authors note that while FyDev inherits source-level identifiers such as DOIs, its build-level IDs prioritize human readability and within-system uniqueness; true global uniqueness would require a community registry. Similarly, FyDev concentrates on deployment and invocation rather than prescribing development practices, acknowledging that some FAIR4RS elements (e.g., licensing, community development standards) remain outside its scope. Overall, FyDev addresses key barriers to reproducibility in IM by embedding the build process into the runtime context, closing gaps between package building and scientific usage metadata, and maintaining detailed, platform-specific provenance alongside executables.
Conclusion
The paper presents FyDev as a prototype software management tool that applies FAIR4RS principles in the EAST IM environment. Its core contributions include: a human- and machine-readable ID scheme linking software, versions, and toolchains; a dual-level metadata approach (templates plus platform-specific description files) that records comprehensive provenance; and an integrated Build module that automates deployment using mature tools (e.g., EasyBuild/Lmod), ensuring consistent, reproducible invocation through a unified Python API. FyDev enhances findability, accessibility, interoperability, and reusability of research software in IM, supports coexistence with traditional stacks, and facilitates environment migration across sites. Future directions include incorporating metadata hashes, establishing mappings to globally unique community-maintained identifiers, expanding supported package management backends, and further standardizing I/O semantics and provenance capture to strengthen cross-community interoperability and reproducibility.
Limitations
- Scope: FyDev focuses on software usage, deployment, and invocation, not on software development practices; FAIR4RS items such as explicit licensing (R1.1) and adherence to community development standards (R3) are not addressed.
- Global uniqueness: The ID scheme emphasizes readability and within-system uniqueness; FyDev does not guarantee global uniqueness of builds without a community registry.
- Authentication/authorization: The current API does not implement application-level authentication; access control relies on system-level UID/GID and HPC user/group management.
- External stacks: When software exists only under external environment modules (e.g., Lmod) and not in FyDev, automatic installation via those stacks may not be attempted, limiting automation in some scenarios.
- Generalizability: While designed for IM in fusion and HPC contexts, effectiveness may vary across domains until broader community standards and registries are established.
- Data sharing: No datasets accompany the work, limiting empirical evaluation beyond the presented demonstration.