SciAGENTS: Automating Scientific Discovery Through Multi-Agent Intelligent Graph Reasoning
A. Ghafarollahi and M. J. Buehler
SciAgents, developed by Alireza Ghafarollahi and Markus J. Buehler at MIT, combines large-scale ontological knowledge graphs with large language models in a multi-agent system that automates parts of scientific discovery. The system surfaces hidden interdisciplinary relationships and is demonstrated on bio-inspired materials design.
Introduction
The paper addresses the grand challenge of automating parts of the scientific discovery process by enabling AI systems to mine, organize, and reason over vast, multi-disciplinary scientific corpora. Traditional human-driven research is limited by individual expertise and by the scale of available data. Large language models have advanced idea generation and hypothesis formation, yet they remain limited in domain expertise and prone to hallucination, and they raise accountability and transparency concerns. The authors propose enhancing LLMs in three ways: in-context learning with accurately retrieved knowledge, ontological knowledge graphs that structure scientific concepts and relations, and multi-agent collaboration that decomposes complex reasoning into steps. The research question is whether a modular, LLM-powered multi-agent system grounded in a large-scale ontological knowledge graph can autonomously generate, refine, and critique scientific hypotheses with novelty and feasibility that rival or surpass conventional methods, particularly in bio-inspired materials design.
Literature Review
The work builds on several strands of prior research: (1) advances in LLMs (e.g., GPT-4 family) and their emergent abilities for reasoning, code generation, and scientific text processing; (2) in-context learning and retrieval augmentation to mitigate hallucinations and inject domain knowledge; (3) construction of ontological knowledge graphs from scientific literature to structure entities, relations, and mechanisms; (4) applications of LLMs to scientific information extraction and hypothesis generation; and (5) multi-agent AI systems (e.g., AutoGen) that enable role-specialized collaboration, adversarial critique, and planning to improve reasoning depth. Prior work has shown that LLMs can extract structured data from text, integrate it with knowledge graphs, and assist in materials design, yet a comprehensive framework that integrates graph-grounded reasoning, multi-agent orchestration, and novelty/feasibility assessment tools for end-to-end scientific ideation remains underexplored. This study positions SciAgents as a synthesis of these components, extending earlier graph-reasoning and agentic approaches to automated hypothesis generation and critique in materials science.
Methodology
System overview: The authors design two complementary multi-agent frameworks powered by GPT-4-family models, accessed via the OpenAI API and orchestrated with AutoGen. Both approaches ground agent reasoning in an ontological knowledge graph derived from ~1,000 papers on biological and bio-inspired materials. The graph contains 33,159 nodes and 48,753 edges (giant component; 92 communities), and node embeddings are computed with the BAAI/bge-large-en-v1.5 embedding model.
Knowledge path extraction: A heuristic pathfinding algorithm combines embedding-based distance heuristics with a randomized variant of Dijkstra’s algorithm (randomness factor α=0.2) and random waypoints to discover diverse, non-shortest paths between two nodes (user-specified or randomly chosen). After path discovery, a subgraph consisting of the path nodes and second-hop neighbors is generated to provide rich context. This strategy contrasts with earlier shortest-path-only approaches and is intended to yield richer, cross-domain conceptual substrates for ideation.
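To make the idea concrete, here is a minimal sketch of a randomized Dijkstra search followed by second-hop subgraph expansion. It is not the authors' implementation: the way randomness enters (each edge cost is the cosine distance between node embeddings, scaled by a random factor in [1 − α, 1 + α]) is an assumption, random waypoints are omitted for brevity, and the toy graph `ADJ` and 2-D embeddings `EMB` are invented for illustration.

```python
import heapq
import math
import random

def cosine_distance(u, v):
    """1 - cosine similarity, clamped at 0 to guard against rounding."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return max(0.0, 1.0 - dot / norm)

def randomized_path(adj, emb, source, target, alpha=0.2, seed=None):
    """Dijkstra over randomly perturbed embedding distances (assumed scheme)."""
    rng = random.Random(seed)
    best = {source: 0.0}
    parent = {source: None}
    heap = [(0.0, source)]
    done = set()
    while heap:
        cost, node = heapq.heappop(heap)
        if node in done:
            continue
        done.add(node)
        if node == target:
            break
        for nbr in adj[node]:
            if nbr in done:
                continue
            # Perturb the edge cost so repeated runs can return different paths.
            step = cosine_distance(emb[node], emb[nbr])
            step *= 1.0 + alpha * rng.uniform(-1.0, 1.0)
            new_cost = cost + step
            if new_cost < best.get(nbr, math.inf):
                best[nbr] = new_cost
                parent[nbr] = node
                heapq.heappush(heap, (new_cost, nbr))
    if target not in parent:
        return None  # target unreachable or not in the graph
    path, node = [], target
    while node is not None:
        path.append(node)
        node = parent[node]
    return path[::-1]

def two_hop_subgraph(adj, path):
    """Path nodes plus every node within two hops of the path."""
    nodes = set(path)
    frontier = set(path)
    for _ in range(2):
        frontier = {n for f in frontier for n in adj[f]} - nodes
        nodes |= frontier
    return {n: [m for m in adj[n] if m in nodes] for n in nodes}

# Hypothetical toy graph and embeddings, for illustration only.
ADJ = {
    "silk": ["fibroin", "degumming"],
    "fibroin": ["silk", "beta-sheet", "pigment"],
    "degumming": ["silk", "energy-intensive"],
    "beta-sheet": ["fibroin", "energy-intensive"],
    "pigment": ["fibroin"],
    "energy-intensive": ["degumming", "beta-sheet"],
}
EMB = {
    "silk": (1.0, 0.1), "fibroin": (0.9, 0.3), "degumming": (0.5, 0.8),
    "beta-sheet": (0.7, 0.6), "pigment": (0.8, -0.4),
    "energy-intensive": (0.1, 1.0),
}

if __name__ == "__main__":
    path = randomized_path(ADJ, EMB, "silk", "energy-intensive", seed=0)
    print(path)
    print(sorted(two_hop_subgraph(ADJ, path)))
```

Because α < 1 and cosine distance is non-negative, the perturbed edge costs stay non-negative, so the search remains a valid Dijkstra run over a randomly reweighted graph. Varying the seed is what yields alternative, non-shortest paths.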
Agent roles and pipelines:
- Pre-programmed interaction pipeline: A fixed sequence of agents acts over the sampled subgraph: Ontologist (defines concepts/relations); Scientist 1 (drafts a detailed, 7-key research hypothesis: hypothesis, outcome, mechanisms, design principles, unexpected properties, comparison, novelty); Scientist 2 (expands each aspect with quantitative details and methods); Critic (summarizes, reviews, identifies weaknesses, proposes improvements, and prioritizes modeling/experimental tasks).
- Autonomous interaction pipeline: A flexible, planner-driven team (Human, Planner, Ontologist, Scientist 1, Scientist 2, Critic, an Assistant with tool access, and a Group chat manager) dynamically selects the next speaker based on context, shares memory among agents, and integrates external tools. Tools include: (a) the knowledge path generator (as above) and (b) a novelty assistant that queries the Semantic Scholar API (three distinct keyword combinations; top 10 results per query) to rate novelty and feasibility (1–10) with a critical summary.
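The pre-programmed pipeline above amounts to a fixed chain of role-specific prompts in which each agent sees everything produced before it. The sketch below illustrates that control flow only. The instruction strings are paraphrases invented for this example, and `llm` is a hypothetical callable standing in for a real model call; nothing here reproduces the authors' prompts or AutoGen setup.

```python
from typing import Callable, List, Tuple

# Role order follows the summary; the instruction texts are illustrative.
ROLES: List[Tuple[str, str]] = [
    ("Ontologist", "Define each concept and relation in the subgraph."),
    ("Scientist 1", "Draft a research hypothesis covering the seven keys."),
    ("Scientist 2", "Expand each aspect with quantitative details and methods."),
    ("Critic", "Summarize, review weaknesses, and prioritize next tasks."),
]

def run_pipeline(subgraph_text: str,
                 llm: Callable[[str, str], str]) -> List[Tuple[str, str]]:
    """Call each role in fixed order, feeding it the accumulated context."""
    transcript: List[Tuple[str, str]] = []
    context = subgraph_text
    for role, instruction in ROLES:
        reply = llm(instruction, context)
        transcript.append((role, reply))
        # Later agents see the subgraph plus every earlier reply.
        context = context + "\n\n" + f"[{role}]\n{reply}"
    return transcript

def echo_llm(instruction: str, context: str) -> str:
    """Stand-in model: reports how much context it received."""
    return f"received {len(context)} characters of context"

if __name__ == "__main__":
    for role, reply in run_pipeline("silk -- degumming -- energy-intensive",
                                    echo_llm):
        print(f"{role}: {reply}")
```

The autonomous pipeline differs in that the next speaker is chosen at run time rather than read from a fixed list such as `ROLES`.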
Graph reasoning workflow: From initial keywords, a subgraph is sampled via the path algorithm. The Ontologist expands nodes and edges with definitions and relation context. Scientist 1 produces a structured JSON hypothesis with the seven keys. Each key is subsequently expanded through targeted prompts to add quantitative details (chemical formulas, sequences, parameters, processing conditions), mechanisms across scales, and modeling/experimental plans. The expanded document is then critically reviewed by the Critic, which also outlines high-impact modeling (e.g., MD, DFT, FEA, CFD) and experimental (e.g., synthetic biology, materials fabrication, characterization) priorities. The system compiles final outputs as integrated documents (PDF/CSV) for further analysis.
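A structured hypothesis of this kind is easy to validate before it is passed on for expansion. The following sketch checks that a model reply contains all seven keys and then builds one follow-up prompt per key. The exact key spellings (for example `design_principles`) and the prompt wording are assumptions for illustration; the summary names the seven aspects but not their JSON field names.

```python
import json

# Seven aspects named in the summary; the snake_case spellings are assumed.
HYPOTHESIS_KEYS = (
    "hypothesis", "outcome", "mechanisms", "design_principles",
    "unexpected_properties", "comparison", "novelty",
)

def parse_hypothesis(raw: str) -> dict:
    """Parse a JSON reply and insist that every expected key is present."""
    data = json.loads(raw)
    missing = [k for k in HYPOTHESIS_KEYS if k not in data]
    if missing:
        raise ValueError("missing keys: " + ", ".join(missing))
    # Return the keys in canonical order and drop any extras.
    return {k: data[k] for k in HYPOTHESIS_KEYS}

def expansion_prompts(hypothesis: dict) -> list:
    """One targeted follow-up prompt per key (wording is illustrative)."""
    return [
        f"Expand the '{key}' section with quantitative details, "
        f"mechanisms, and modeling or experimental plans:\n{text}"
        for key, text in hypothesis.items()
    ]

# Placeholder reply used only to exercise the functions.
EXAMPLE_RAW = json.dumps({k: f"placeholder text for {k}" for k in HYPOTHESIS_KEYS})

if __name__ == "__main__":
    hyp = parse_hypothesis(EXAMPLE_RAW)
    for prompt in expansion_prompts(hyp):
        print(prompt.splitlines()[0])
```

Rejecting incomplete replies early keeps the per-key expansion step from silently skipping an aspect.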
Implementation details: The autonomous system is implemented in AutoGen using AssistantAgent for AI agents and GroupChatManager for coordination; the Human uses UserProxyAgent. Profiles (system messages) define roles (Planner, Assistant, Ontologist, Scientist 1/2, Critic). Function tools are defined with names, descriptions, and typed inputs. Semantic Scholar analysis is invoked by a dedicated "novelty assistant" agent that repeatedly calls the API until success, analyzes abstracts, assigns novelty/feasibility ratings, and returns a recommendation.
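The retry behavior of the novelty check can be sketched with the standard library alone. The example below builds a Semantic Scholar paper-search URL, retries on failure, and gathers titles and abstracts for several keyword combinations. Several points are assumptions: the endpoint and query parameters follow the public Semantic Scholar Graph API as commonly documented and may not match what the authors called, the cap on attempts (`max_attempts`) is added here for safety whereas the summary describes retrying until success, and the rating step performed by the LLM is left out.

```python
import json
import time
import urllib.parse
import urllib.request

# Public paper-search endpoint (assumed; check the current API docs).
SEARCH_URL = "https://api.semanticscholar.org/graph/v1/paper/search"

def build_url(keywords, limit=10):
    """Search URL for one keyword combination, top `limit` results."""
    params = {
        "query": " ".join(keywords),
        "limit": limit,
        "fields": "title,abstract",
    }
    return SEARCH_URL + "?" + urllib.parse.urlencode(params)

def http_get(url):
    """Default fetcher; replaceable so the logic can be tested offline."""
    with urllib.request.urlopen(url, timeout=30) as resp:
        return resp.read().decode("utf-8")

def search_with_retry(keywords, fetch=http_get, max_attempts=5, delay_s=1.0):
    """Retry the query until it succeeds or attempts run out."""
    url = build_url(keywords)
    for attempt in range(1, max_attempts + 1):
        try:
            return json.loads(fetch(url)).get("data", [])
        except Exception:
            if attempt == max_attempts:
                raise
            time.sleep(delay_s)

def collect_papers(keyword_sets, **kwargs):
    """Run one search per keyword combination and pool the results."""
    papers = []
    for keywords in keyword_sets:
        papers.extend(search_with_retry(keywords, **kwargs))
    return papers
```

Passing `fetch` as a parameter keeps the network call out of the retry logic, so a stub that fails a few times and then returns canned JSON is enough to exercise it.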
Case study and experiments: The framework is applied to bio-inspired materials. Two principal demonstrations are reported: (1) a deep-dive case study starting from the concepts "silk" and "energy-intensive," comparing shortest-path and random-path contexts and generating an 8,100-word research dossier; and (2) five automated experiments from randomly selected endpoints, producing diverse hypotheses with novelty/feasibility ratings.
Key Findings
- Richer graph sampling improves ideation: Random path sampling (vs shortest path) injects additional concepts and relations, leading to more sophisticated, cross-domain hypotheses and deeper agentic reasoning.
- Case study (silk + dandelion pigments composite):
• Proposed integrating silk fibroin with dandelion-derived pigments via low-temperature processing (<50°C) to create structurally colored, energy-efficient biomaterials.
• Predicted tensile strength up to ~1.5 GPa (vs typical silk 0.5–1.0 GPa), attributed to hierarchical organization and pigment reinforcement (e.g., luteolin C15H12O6; taraxasterol C30H50O).
• Estimated ~30% reduction in processing energy vs traditional high-temperature degumming.
• Mechanisms and design principles included pigment-guided nano-assembly (photonic structures), hierarchical structuring, pigment concentration control (0.1–1.0 wt%), and biocompatible crosslinking (e.g., genipin, C11H14O5).
• Unexpected properties predicted: self-healing (up to ~80% mechanical recovery within 24 h), humidity/temperature-responsive color shifts (10–50 nm per 10% RH change), UV protection (>90%), antimicrobial activity (10–15 mm inhibition zones vs E. coli, S. aureus).
• Modeling/experiments: MD (100–500 ns, CHARMM/AMBER; CGenFF for pigments); optical FDTD (MEEP) for reflectance; FEA for mechanics; AFM/SEM/TEM/SAXS, FTIR/XRD/DSC/DMA; LCA for energy.
- Autonomous multi-agent vs pre-programmed: Both generate coherent hypotheses; the autonomous system, with shared memory and tool access (novelty scoring), provides deeper integration, dynamic planning, and external validation against literature.
- Automated experiments (5 examples with novelty/feasibility):
1) Biomimetic microfluidic chips (lamellar keratin-inspired) for improved heat transfer: +20–30% heat transfer, −15% failure rate; novelty 8, feasibility 7.
2) Collagen hierarchical 3D porous material for crashworthiness and stiffness memory: +30% crashworthiness, ~85% stiffness recovery, +25% Young’s modulus; novelty 8, feasibility 7.
3) Collagen scaffolds with tunable processability and nanocomposites (GO/HA/CNT): +50% tensile strength, +40% elasticity, controlled pore sizes; novelty 6, feasibility 8.
4) Nacre-inspired coatings with amyloid fibrils: water contact angle >150°, fracture toughness ≥10 MPa·m^0.5; novelty 7, feasibility 8.
5) Graphene–amyloid fibril composites for bioelectronics: improved conductivity/stability with gene-circuit control over protein expression/assembly; novelty 8, feasibility 7.
- Novelty assistant example: For biomimetic microfluidic chips, literature checks yielded novelty 8/10 and feasibility 7/10; no direct matches found, with recognized implementation challenges.
- Scale and outputs: The system consistently generates extensive, structured research documents (thousands of words), including critique and prioritized modeling/experimental plans.
Discussion
The findings show that grounding LLM-based agents in ontological knowledge graphs and orchestrating them in multi-agent workflows can meaningfully address the challenges of automated scientific discovery. The richer context from random graph paths increases conceptual diversity and hypothesis novelty. The multi-agent division of labor (ontology, synthesis, expansion, critique, planning) mirrors human scientific workflows, with adversarial/critical interactions analogous to peer review, thereby improving rigor and feasibility. Tool integration for literature-based novelty assessment adds a reality check that filters redundant ideas. The case studies demonstrate that SciAgents can produce quantitatively detailed, testable hypotheses (mechanisms, parameters, modeling/experimental protocols), bridging the gap from ideation to actionable research plans. These capabilities suggest that AI can scale ideation across thousands of iterations to create high-quality hypothesis databases, accelerate cross-disciplinary discovery, and stimulate the generation of new physics-based datasets via prioritized simulations and experiments.
Conclusion
SciAgents integrates ontological knowledge graphs, LLM-based multi-agent reasoning, and external tooling into a modular framework that autonomously generates, refines, and critiques scientific hypotheses. Applied to bio-inspired materials, it reveals hidden relationships, proposes novel, feasible research directions, and outlines actionable modeling and experimental priorities. The system’s autonomous planning and collaboration indicate emergent problem-solving behaviors that are likely to strengthen as foundation models improve. Future work includes incorporating agents that can run simulations or execute laboratory protocols, expanding to multimodal data ingestion, scaling to generate large first-principles datasets, and deploying the framework at scale to build extensive, novelty-filtered ideation repositories for generative materials informatics.
Limitations
Identified limitations include: (1) integration challenges at the nanoscale (e.g., uniform pigment incorporation, stable self-assembly); (2) scalability and reproducibility of fabrication workflows (e.g., electrospinning, soft lithography, biomimetic templating); (3) environmental and safety concerns (e.g., solvent use for pigment extraction), suggesting green chemistry alternatives; (4) limited long-term stability data (optical/mechanical durability, UV/weathering resistance); (5) dependency on LLMs with known issues in hallucination and explainability; (6) reliance on retrieval and novelty tools (Semantic Scholar coverage, query sensitivity); and (7) potential biases from the source corpus used to build the knowledge graph and embeddings. The authors propose pilot studies, detailed energy and durability analyses, improved solvent strategies, and enhanced validation protocols to address these gaps.

