
A dynamic knowledge graph approach to distributed self-driving laboratories
J. Bai, S. Mosbach, et al.
This paper by Jiaru Bai, Sebastian Mosbach, Connor J. Taylor, and their team presents an architecture for distributed self-driving laboratories. A dynamic knowledge graph, operated on by autonomous agents, coordinates design-make-test-analyze cycles. The approach is demonstrated with a closed-loop optimization of an aldol condensation reaction, run across continents in three days.
Introduction
The accelerating pace of scientific discovery demands efficient resource integration and knowledge sharing across organizations, especially for tackling global challenges. Self-driving laboratories (SDLs), which automate experimental processes, have advanced research across many disciplines. However, current SDL implementations are often centralized within a single organization, which limits global collaboration. This paper addresses the need for decentralized, globally collaborative research networks by proposing an architecture for distributed SDLs based on a dynamic knowledge graph.
The approach uses semantic web technologies, specifically ontologies, to create a common language for communication and resource allocation among geographically dispersed research groups. The World Avatar project, an all-encompassing digital twin built on a dynamic knowledge graph, provides the foundational framework. The dynamic knowledge graph represents static information about resources and data and also integrates software agents as executable knowledge components. These agents update the knowledge graph as research goals evolve and experimental results arrive. Recording data provenance according to the FAIR (Findable, Accessible, Interoperable, Reusable) principles supports the reproducibility and reliability of experimental findings.
The architecture addresses three key challenges: orchestrating diverse resources across different computing environments and vendors, sharing data across organizations through standardized languages, and recording data provenance following FAIR principles. Existing solutions for resource orchestration (ChemOS, ESCALATE, HELAO), data sharing (XDL, AnIML), and data provenance recording have limitations regarding interoperability and customized data interfaces.
The proposed dynamic knowledge graph approach aims to overcome these limitations by providing a holistic and adaptable solution, in line with the goal of the Nobel Turing Challenge: to create automated systems capable of scientific discovery.
Literature Review
Several efforts have addressed the challenges of building collaborative research environments. Middleware such as ChemOS, ESCALATE, and HELAO has been developed to manage diverse components and hardware resources within SDLs. For data sharing, the standardized protocols XDL and AnIML have been created for synthesis and analysis, respectively. Mitchell et al. (2022) proposed data pipelines for managing large datasets, while Statt et al. (2023) explored using knowledge graphs to record experiment provenance in materials research. These initiatives are valuable, but they typically operate in isolation with limited interoperability.
This research builds on previous work that uses semantic web technologies, particularly knowledge graphs, to address these limitations. The authors' previous research (Bai et al., 2022) highlights the advantages of knowledge graphs for seamless data exchange and adaptability in laboratory automation. The World Avatar project aims to encompass all aspects of scientific research laboratories as a digital twin, integrating experimental setups, handling agents, and laboratory infrastructure within a unified knowledge graph.
Methodology
The proposed architecture for distributed SDLs integrates data flows, material flows, and the interface between the virtual and physical worlds within a design-make-test-analyze (DMTA) cycle. The system has five key components: a research goal monitor that parses scientist requests, a research goal iterator that manages experiment iterations, a design of experiment (DoE) component that proposes new experimental conditions, an experiment scheduler that allocates experiments to available laboratories, and a data processing component that analyzes results and determines the next steps.
The architecture operates within a dynamic knowledge graph framework. The knowledge graph serves as a central hub for information exchange, where software components are abstracted as agents that communicate via messages. Physical entities are represented as digital twins, enabling real-time control regardless of geographic location. Closed-loop optimization is thereby reformulated as information flowing through the knowledge graph and actuating changes in the physical world.
Ontologies are central to the system. They were developed to represent several levels of abstraction, from high-level research goals to chemical reactions, DoE strategies, and physical hardware. They capture the relationships between abstract chemical knowledge and the concrete hardware used for experimentation, bridging the gap between the virtual and the physical world. Chemical ontologies, drawing inspiration from OntoCAPE and the ORD and UDM schemas, capture information about reactants, catalysts, solvents, reaction conditions, and performance indicators. Hardware ontologies, building on the SAREF ontology, represent the digital twins of laboratories and equipment (e.g., Vapourtec reactors and HPLC systems).
The knowledge graph’s dynamic nature comes from software agents, each responsible for a specific task within the DMTA cycle. The agents use the derived information framework to update the knowledge graph and trigger actions in the physical world. New devices and algorithms are integrated by modifying the knowledge graph, which makes the system flexible and extensible. For closed-loop optimization, the system draws a parallel between the pursuit of optimal objectives and goal-driven reasoning cycles: a goal set with individual goals, plans, and steps is formulated and managed using the derived information framework, mirroring iterative workflows.
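The agent-driven cycle described above can be illustrated with a toy model. The following is a minimal sketch, not the authors' implementation and not The World Avatar's API: the `KnowledgeGraph` class, the predicate names (`hasState`, `assignedTo`, and so on), the lab identifiers, and the yield model are all hypothetical stand-ins. It only shows the general idea of agents reading from and writing to a shared graph of triples while a goal iterator drives the loop.

```python
from dataclasses import dataclass, field


# Hypothetical in-memory stand-in for a knowledge graph: a set of
# (subject, predicate, object) triples. A real system would use an
# RDF triple store and ontology-defined terms.
@dataclass
class KnowledgeGraph:
    triples: set = field(default_factory=set)

    def add(self, s, p, o):
        self.triples.add((s, p, o))

    def objects(self, s, p):
        return [o for (s2, p2, o) in self.triples if s2 == s and p2 == p]

    def subjects(self, p, o):
        return [s for (s, p2, o2) in self.triples if p2 == p and o2 == o]


def doe_agent(kg, iteration):
    """Design: propose conditions (a fixed grid stands in for a DoE algorithm)."""
    exp = f"exp{iteration}"
    kg.add(exp, "hasTemperatureC", 30 + 10 * iteration)
    return exp


def schedule_agent(kg, exp):
    """Assign the experiment to the first lab recorded as available."""
    labs = sorted(kg.subjects("hasState", "available"))
    if not labs:
        raise RuntimeError("no laboratory available")
    kg.add(exp, "assignedTo", labs[0])
    return labs[0]


def execution_agent(kg, exp):
    """Make and test: a made-up yield model that peaks at 60 C."""
    temp = kg.objects(exp, "hasTemperatureC")[0]
    yield_pct = 90 - abs(temp - 60)
    kg.add(exp, "hasYieldPercent", yield_pct)
    return yield_pct


def goal_iterator(kg, target_yield, max_iterations):
    """Analyze: repeat the cycle until the goal is met or the budget runs out."""
    for i in range(max_iterations):
        exp = doe_agent(kg, i)
        schedule_agent(kg, exp)
        result = execution_agent(kg, exp)
        if result >= target_yield:
            return exp, result
    return None, None


kg = KnowledgeGraph()
kg.add("lab_cambridge", "hasState", "available")
kg.add("lab_singapore", "hasState", "unavailable")  # e.g. an instrument is down
exp, result = goal_iterator(kg, target_yield=85, max_iterations=10)
print(exp, result, kg.objects(exp, "assignedTo"))  # exp3 90 ['lab_cambridge']
```

Because every agent communicates only through the graph, adding a laboratory in this sketch amounts to adding a `hasState` triple; no agent code changes.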
Key Findings
The proposed framework was validated through a real-time, collaborative closed-loop optimization experiment involving two SDLs located in Cambridge and Singapore. The chosen reaction was a pharmaceutically relevant aldol condensation of benzaldehyde and acetone, catalyzed by sodium hydroxide. The optimization targeted two objectives: run material cost and yield. The system successfully generated a Pareto front for these objectives over three days, demonstrating real-time collaboration between the two geographically separated SDLs. The highest yield achieved was 93%. The system demonstrated resilience to hardware failures (an HPLC failure in the Singapore lab), with the Cambridge SDL continuing to contribute to the optimization progress. Analysis of the experimental data revealed correlations between design variables (molar equivalents of reactants, residence time, and reaction temperature) and the objectives. For instance, increasing the molar equivalent of acetone beyond a certain point resulted in reduced yield due to side product formation. The system successfully managed data provenance, recording all data in the knowledge graph, allowing for visualization of the optimization progress. The complete provenance records are provided as supplementary data along with an interactive animation.
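For readers unfamiliar with the term, a Pareto front for these two objectives is the set of experiments for which no other experiment is both cheaper and higher-yielding. The sketch below shows one simple way to extract it. The function name `pareto_front` and the data points are invented for illustration only; they are not results from the paper, and the summary does not say how the authors computed their front.

```python
def pareto_front(points):
    """Return the non-dominated (cost, yield) pairs.

    Lower cost and higher yield are both better. A point is dominated if
    another point is at least as good on both objectives and strictly
    better on at least one.
    """
    front = []
    for cost, yld in points:
        dominated = any(
            (c <= cost and y >= yld) and (c < cost or y > yld)
            for c, y in points
        )
        if not dominated:
            front.append((cost, yld))
    return sorted(front)


# Made-up (run material cost, yield %) pairs, for illustration only.
experiments = [(1.0, 40), (2.0, 70), (2.5, 65), (3.0, 85), (3.5, 80)]
print(pareto_front(experiments))  # [(1.0, 40), (2.0, 70), (3.0, 85)]
```

The pairwise comparison is quadratic in the number of experiments, which is adequate for the tens to hundreds of runs typical of a laboratory campaign.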
Discussion
This study demonstrates the feasibility of a dynamic knowledge graph approach to creating distributed SDLs. The results show the system's ability to integrate diverse resources, share data across organizations, and manage experimental provenance. Real-time collaboration between geographically dispersed SDLs led to faster optimization progress and made the system resilient to hardware failures. The integration of chemical ontologies and hardware digital twins enables interoperability between different experimental setups and data sources, addressing significant challenges encountered in previous approaches. The approach offers a promising path towards globally collaborative research networks, allowing scientists to focus on high-level research goals while the system autonomously manages the execution and data management aspects of experimentation.
Conclusion
This work presents a novel dynamic knowledge graph approach for realizing distributed self-driving laboratories, successfully demonstrating real-time collaboration between geographically separated labs. The system leverages ontologies and autonomous agents to manage data and material flow, ensuring data provenance and facilitating seamless integration of new resources. Future research should focus on improving robustness to network disruptions and enhancing automated quality control mechanisms, including incorporating human-in-the-loop strategies for handling unexpected experimental results. Federating SDLs, with local data storage and a central registry, should be explored. The approach is not limited to flow chemistry; its principles can be adapted to other domains.
Limitations
While this work successfully demonstrates the potential of a dynamic knowledge graph approach to distributed SDLs, several limitations exist. The system's robustness against network disruptions needs further improvement, as internet outages could impact operation. The current quality control mechanisms require enhancement, particularly to handle and diagnose abnormal data points generated from unexpected equipment failures or software malfunctions. More sophisticated strategies for handling high-dimensional optimization problems, involving complex reactions, are needed. Finally, security and access control need further development, including mechanisms for authentication and authorization, especially in a federated setting.