Large language models empowered agent-based modeling and simulation: a survey and perspectives
C. Gao, X. Lan, et al.
Agent-based modeling and simulation reveal emergent behaviors in complex systems, and integrating large language models promises to enhance those capabilities. This survey examines the motivation, key challenges (environment perception, human alignment, action generation, evaluation), and recent LLM-empowered simulations across cyber, physical, social, and hybrid domains, and outlines open problems and future directions.
Introduction
The paper surveys how large language models (LLMs) can enhance agent-based modeling and simulation (ABMS), a computational paradigm that emulates real-world processes by simulating heterogeneous agents interacting within environments. The authors argue that LLM-driven agents can better capture human-like decision-making, communication, and adaptation, potentially yielding more realistic simulations across domains such as economics, biology, sociology, and ecology. Motivations include LLM agents' ability to act without explicit instructions, plan adaptively, and interact with other agents and humans. The survey identifies the need for agents with autonomy, social ability, reactivity, and proactiveness, and posits LLMs as a new paradigm enabling these capabilities. The work addresses a gap in the literature by systematically summarizing recent advances, unresolved issues, and research directions, and by articulating how LLM agents meet ABMS requirements through perception, reasoning/decision-making, adaptive learning, and heterogeneity. Following PRISMA guidelines, the authors outline their structured approach to literature collection, filtering (ensuring true ABMS with LLM agents), and categorization by domain and environment. They contribute: (1) the first comprehensive review of LLM-based ABMS and why LLMs advance ABMS beyond rule-based and conventional neural approaches; (2) a taxonomy of ABMS applications across physical, cyber, social, and hybrid domains; and (3) identification of open problems around scaling, open platforms, robustness, and ethical risks.
Literature Review
The survey provides extensive background on ABMS, including its core components (agents, environment, interaction), required agent capabilities (autonomy, social ability, reactivity, proactiveness, bounded rationality), traditional methodologies (predefined rules, symbolic equations, stochastic modeling, machine learning), and their limitations (reactive architectures, lack of generalizable agents across environments, difficulty supporting descriptive, explanatory, predictive, and exploratory goals simultaneously). It then reviews the evolution and capabilities of LLMs, highlighting their human-like reasoning, planning, and communication. The authors frame four critical abilities of LLM agents for ABMS: perception (direct and indirect, via language and tools, multi-modality, and a first-person perspective); reasoning and decision-making (autonomy, planning); adaptive learning and evolution (in-context learning, continual adaptation); and heterogeneity/personalization (prompting and fine-tuning for diverse roles and preferences). Challenges and approaches are reviewed at a high level: (1) environment construction and interfaces (virtual vs. real environments; text-based I/O augmented with tools and multimodal inputs; agent-agent communication), (2) human alignment and personalization (prompt engineering, domain-specific fine-tuning, RLHF and preference modeling), (3) action-generation mechanisms (planning, memory, reflection), and (4) evaluation (micro- and macro-level realism, explanation, ethics). The paper then surveys recent advances across domains: social (social network dynamics, cooperation frameworks, individual social behavior; economic systems at the individual, interactive/game-theoretic, and market levels), physical (mobility, navigation, transportation, wireless networks), cyber (web browsing and recommendation), and hybrid (sandbox societies, epidemics, war, macroeconomics, urban digital twins). The literature indicates that LLM agents can replicate social phenomena (e.g., bias propagation, multi-peak epidemic patterns, network-formation principles), collaborate via roles and debates to solve complex tasks, exhibit human-like economic behaviors and reasoning (yet sometimes deviate from Nash equilibria), and operate in embodied or real-world web environments. The review also covers evaluation practices and ethics (bias, fairness, harmful content).
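To make the heterogeneity/personalization dimension concrete, below is a minimal sketch of role-conditioned prompting, which the survey identifies as the lightest-weight way to derive diverse agents from a single base model. The `call_llm` wrapper and the prompt wording are assumptions for illustration, not an API from any surveyed system:

```python
# A minimal, hypothetical sketch of role-conditioned prompting for agent
# heterogeneity; `call_llm` is a placeholder for any chat-completion API.
from dataclasses import dataclass

def call_llm(system_prompt: str, user_message: str) -> str:
    """Placeholder: wire this to whatever LLM API is available."""
    raise NotImplementedError

@dataclass
class Persona:
    name: str
    traits: str       # e.g., "risk-averse, cooperative"
    preferences: str  # e.g., "prefers saving over spending"

def system_prompt(p: Persona) -> str:
    # Conditioning on a persona makes identical observations yield
    # heterogeneous, agent-specific behavior.
    return (f"You are {p.name}, an agent in a simulation. "
            f"Personality traits: {p.traits}. Preferences: {p.preferences}. "
            "Respond in the first person with one concrete decision.")

def decide(p: Persona, observation: str) -> str:
    return call_llm(system_prompt(p),
                    f"Observation: {observation}\nWhat do you do next?")
```

Per the survey, systems combine such prompting with domain-specific fine-tuning when deeper preference alignment is required than prompts alone can provide.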
Methodology
As a review, the paper follows PRISMA principles for literature identification and synthesis. Method steps include:
1. Eligibility criteria: include works that (a) use LLM agents and (b) address agent-based modeling and simulation; exclude papers where LLMs serve only as assistants or decision helpers (20+ papers filtered out).
2. Information sources: peer-reviewed journals, conferences, preprint archives (arXiv, SSRN), and databases (IEEE Xplore, ACM DL, Elsevier, Clarivate WoS), plus citation chaining via Google Scholar.
3. Search strategy: keyword and controlled-vocabulary searches on terms such as "large language models," "agent-based simulation," "intelligent agents," "AI-driven simulation," and combinations referencing LLMs and ABMS.
4. Selection process: verify that each work performs genuine ABMS with LLM agents; categorize selected works by domain (social, physical, cyber, hybrid) and environment type (virtual or real).
5. Data collection and items: extract the problem domain, environment setup, agent design and interfaces (text, tools, multimodal), alignment/personalization techniques (prompting, tuning), action-generation components (planning, memory, reflection), and evaluation protocols (micro/macro realism, explanation, ethics).
Beyond PRISMA, the paper organizes its analysis around four capability dimensions crucial to ABMS: perception; reasoning and decision-making (including autonomy and planning); adaptive learning and evolution (in-context and continual adaptation); and heterogeneity/personalization (role conditioning, preference modeling via prompts and tuning). The authors also present a structural overview of environment construction and interfaces (virtual sandboxes and real-world settings; text I/O augmented by tool use and multimodal perception; direct agent-agent communication), and detail typical cognitive mechanisms for LLM agents: planning (task decomposition, curriculum, adaptive refinement), memory (external stores, episodic/semantic skills, retrieval), and reflection (self-evaluation, verbal reinforcement learning). Evaluation methodology comprises micro-level behavior prediction and rationality checks, macro-level emergent-pattern matching (e.g., epidemic peaks and macroeconomic regularities such as Okun's law and the Phillips curve), explanation-based assessment (agent-reported rationales), and ethics auditing (bias/fairness, harmful content). Statistics reported across the surveyed works: ~50% use planning, ~50% use reflection, and nearly all implement memory; ~80% conduct real-world evaluation; ~40% include ethical evaluation.
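As an illustration of the selection (step 4) and data-extraction (step 5) stages, here is a hedged Python sketch of the kind of screening schema the process implies. The record fields and filter are assumptions made for clarity, not the authors' actual tooling:

```python
# Hedged sketch of a PRISMA-style screening and data-extraction schema;
# field names and the eligibility filter are illustrative assumptions.
from dataclasses import dataclass, field

DOMAINS = {"social", "physical", "cyber", "hybrid"}
ENVIRONMENTS = {"virtual", "real"}

@dataclass
class Record:
    title: str
    uses_llm_agents: bool        # eligibility criterion (a)
    is_true_abms: bool           # criterion (b): simulation, not LLM-as-assistant
    domain: str = "social"       # one of DOMAINS
    environment: str = "virtual" # one of ENVIRONMENTS
    mechanisms: set[str] = field(default_factory=set)  # {"planning", "memory", "reflection"}
    evaluation: set[str] = field(default_factory=set)  # {"micro", "macro", "explanation", "ethics"}

def eligible(r: Record) -> bool:
    """Keep only works where LLM agents perform genuine ABMS."""
    return r.uses_llm_agents and r.is_true_abms

def screen_and_categorize(records: list[Record]) -> dict[str, list[Record]]:
    # Filter by eligibility, then bucket the kept works by domain.
    kept = [r for r in records if eligible(r)]
    return {d: [r for r in kept if r.domain == d] for d in sorted(DOMAINS)}
```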
Key Findings
- LLM agents align well with ABMS requirements (autonomy, social ability, reactivity, proactiveness) and address limitations of rule-based/reactive architectures by offering human-like perception, reasoning/planning, adaptation, and controllable heterogeneity.
- Environment and interfaces: Most systems use text as the primary I/O, augmented by tool use and increasingly multimodal inputs (vision/audio). Agents communicate via text directly; indirect interactions arise from domain rules (e.g., economic markets).
- Alignment and personalization: Prompt engineering and tuning (including domain-specific fine-tuning and RLHF) effectively inject domain knowledge and values and enable controllable heterogeneity (e.g., personalities, preferences).
- Action generation: Core cognitive modules, namely planning (task decomposition, curriculum, adaptive plan refinement), memory (external, long/short-term, skill libraries), and reflection (self-evaluation, verbal RL), substantially improve performance and realism; a minimal loop combining the three modules is sketched after this list.
- Evaluation practices: Realness validation at the micro level (individual behavior prediction) and macro level (emergent patterns) is widespread; explanation-based evaluation leverages agents' textual rationales; ethics evaluations consider bias, fairness, and harmful content. Reported survey statistics: ~50% of papers implement planning and ~50% reflection; almost all have memory; ~80% conduct real-world evaluation; ~40% include ethics evaluation.
- Social domain findings: LLM agents can reproduce social network dynamics (e.g., multi-peak epidemic patterns), exhibit biases mirroring human communication (stereotype-consistent and threat-related content), and replicate social principles in network formation (triadic closure, homophily). Cooperative task solving via role-based frameworks (e.g., CHATDEV, MetaGPT, CAMEL, AgentVerse) demonstrates effective multi-agent collaboration; debate mechanisms (MAD, ChatEval) foster divergent thinking and improved judgments. Individual social behavior can be made more human-like by modeling needs, emotions, and relationships (Humanoid Agents) and via social alignment training (Stable Alignment).
- Economic systems: LLMs show human-like behavior in behavioral economics tasks (altruism, fairness, status quo bias) and rationality in budget decisions, but can underperform specialized predictors for stock movement without domain-specific tuning. In games, LLMs often display partial rationality, sometimes failing to reach Nash equilibria, with GPT-4 stronger than smaller models; preferences embedded in prompts modulate cooperation. System-level market simulations reproduce known effects (price convergence, Matthew effect, collusion under communication) and enable rational intermediary roles in information markets.
- Physical and cyber domains: In physical environments, LLMs support navigation and planning (LM-Nav, LLM-Planner, NLMap), driving behavior simulation (reducing collisions, more human-like driving), and multi-agent wireless tasks (energy saving with iterative adaptation). In cyber, real-world web agents (WebAgent, Mind2Web, WebArena) decompose instructions, condense HTML, and synthesize programs; LLM user simulators interact with recommender systems (RecAgent, Agent4Rec) revealing filter bubbles and causal dynamics.
- Hybrid domain: Generative Agents demonstrate credible individual and social behaviors (memory, reflection, planning); epidemic simulations combine social relations and spatial movement; war simulations model international conflicts; macroeconomic simulations with hundreds of LLM agents produce realistic indicators and regularities (e.g., a correctly signed Phillips curve), surpassing rule-based and RL baselines; urban digital-twin platforms (UGI) enable city-scale embodied simulations. A macro-realism check of the Phillips-curve kind is sketched after this list.
- Open problems: Scaling efficiency (compute/memory), comprehensive benchmarks for ABMS simulation (beyond planning), robustness to adversarial/OOD and multi-agent propagation risks, open platforms integrating complex environments, stability/reproducibility, and ethical safeguards.
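To ground the action-generation finding, the following is a minimal sketch of how planning, memory, and reflection typically compose in an LLM simulation agent. The `llm` function is a placeholder, and the prompts are illustrative rather than taken from any surveyed system:

```python
# Minimal sketch of a planning / memory / reflection loop for an LLM
# simulation agent; `llm` is a placeholder for any chat/completion API.
from collections import deque

def llm(prompt: str) -> str:
    """Placeholder: plug in any chat/completion API."""
    raise NotImplementedError

class SimAgent:
    def __init__(self, persona: str, memory_size: int = 100):
        self.persona = persona
        self.memory = deque(maxlen=memory_size)  # bounded external memory store
        self.plan: list[str] = []                # current task decomposition

    def act(self, observation: str) -> str:
        self.memory.append(("obs", observation))
        if not self.plan:
            # Planning: decompose the current situation into short steps.
            steps = llm(f"{self.persona}\nSituation: {observation}\n"
                        "Decompose your next goal into short steps, one per line.")
            self.plan = [s.strip() for s in steps.splitlines() if s.strip()]
        recent = "\n".join(f"{kind}: {text}" for kind, text in list(self.memory)[-5:])
        # Acting: condition on persona, retrieved memory, and the current plan step.
        action = llm(f"{self.persona}\nRecent memory:\n{recent}\n"
                     f"Current step: {self.plan[0]}\nChoose one concrete action.")
        self.memory.append(("act", action))
        return action

    def reflect(self) -> None:
        # Reflection ("verbal RL"): self-evaluate the trajectory, then advance the plan.
        recent = "\n".join(f"{kind}: {text}" for kind, text in list(self.memory)[-10:])
        lesson = llm(f"{self.persona}\nTrajectory:\n{recent}\n"
                     "In one sentence, what should be done differently next time?")
        self.memory.append(("reflection", lesson))
        self.plan = self.plan[1:]  # drop the completed step
```

The bounded store plus retrieval of recent entries mirrors the external-memory pattern the survey describes; real systems add embedding-based retrieval and skill libraries on top of this skeleton.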
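And to illustrate macro-level emergent-pattern matching, the evaluation style behind the Phillips-curve and epidemic-peak findings above, here is a hedged sketch; the synthetic series stand in for simulation logs and are not data from the paper:

```python
# Hedged sketch of a macro-level realism check: does a simulated economy
# reproduce the Phillips-curve regularity (inflation and unemployment
# negatively correlated)?
import numpy as np

def phillips_curve_holds(inflation: np.ndarray, unemployment: np.ndarray) -> bool:
    """Pass the check if the two series are negatively correlated."""
    r = np.corrcoef(inflation, unemployment)[0, 1]
    print(f"corr(inflation, unemployment) = {r:.2f}")
    return r < 0

# Synthetic stand-in for per-period simulation outputs:
rng = np.random.default_rng(0)
u = rng.uniform(0.03, 0.10, size=50)            # unemployment rate per period
pi = 0.08 - 0.5 * u + rng.normal(0, 0.005, 50)  # inflation, inversely tied to u
assert phillips_curve_holds(pi, u)              # macro pattern matched
```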
Discussion
The survey demonstrates that LLM agents can substantially advance ABMS by providing general-purpose, human-like cognition and communication, addressing core challenges of perception, decision-making, adaptability, and heterogeneity. By integrating planning, memory, and reflection, LLM agents bridge the gap between reactive architectures and the deliberative reasoning required for complex, long-horizon tasks. Empirical results across social, economic, physical, cyber, and hybrid scenarios show that LLM-driven simulations can replicate micro-level decisions and macro-level emergent patterns, offering interpretable explanations for actions and enabling richer evaluations. The taxonomy and synthesis indicate that LLM agents are versatile across virtual and real environments, with text interfaces extended via tools and multimodality. However, significant challenges remain in achieving consistent performance, scalability, robust behavior under adversarial or distributional shift, and ethical safety. The findings argue for standardized benchmarks tailored to ABMS realism and emergence, and for open platforms that lower integration barriers with complex environments (e.g., urban digital twins), in order to accelerate progress and validate generalizability. Overall, the survey positions LLM-empowered ABMS as a promising paradigm for simulating complex systems and informing decision-making across domains, while underscoring the need for methodological rigor and safeguards.
Conclusion
This review takes a first systematic step in surveying LLM-empowered agent-based modeling and simulation. It explains why LLM agents are well-suited to meet ABMS requirements and how to design environments, interfaces, agent cognition (planning, memory, reflection), and evaluation protocols. It synthesizes advances across social, physical, cyber, and hybrid domains, showing that LLM agents can replicate human-like behaviors and emergent phenomena and provide interpretable rationales. The paper identifies pressing future directions: (1) efficiency and scalability for large multi-agent societies; (2) comprehensive ABMS-specific benchmarks and evaluation standards (micro/macro realism, explanation, ethics); (3) robustness and stability against adversarial prompts, OOD shifts, and multi-agent propagation of failures; (4) open platforms that integrate LLM agents with rich, real or digital-twin environments; and (5) ethical alignment, interpretability, and governance. These directions aim to unlock LLM agents’ full potential as realistic simulators and tools for scientific inquiry and policy analysis.
Limitations
- Computational and scalability constraints make large-scale multi-agent simulations expensive, limiting exploration of emergent phenomena and practical deployment.
- Robustness issues (adversarial vulnerability, out-of-distribution generalization, stability/reproducibility of agent outputs) can undermine reliability, especially in tool-using, human-interactive, or multi-agent settings where failures may propagate.
- Evaluation gaps: existing benchmarks emphasize planning/decision tasks rather than ABMS realism; quantitative and qualitative metrics for emergent social/market dynamics and long-horizon behaviors are still developing.
- Ethical risks: biases in LLMs can be reflected and amplified in simulations; harmful content generation remains a concern despite safeguards.
- Interface constraints: heavy reliance on text interfaces may limit fidelity in multimodal or embodied settings without careful tool/multimodal integration.
- Review scope limitations: despite PRISMA-inspired procedures, the field's rapid pace and the filtering criteria (excluding LLM-as-assistant works) may leave relevant studies out, and the categorization by domain/environment may oversimplify hybrid scenarios.