Computer Science

PENTESTGPT: Evaluating and Harnessing Large Language Models for Automated Penetration Testing

G. Deng, Y. Liu, et al.

LLMs promise to transform penetration testing—this study builds a real-world benchmark and shows they excel at sub-tasks but struggle with whole-context reasoning. Introducing PENTESTGPT, a three-module, LLM-driven framework that boosts task completion by 228.6% over GPT-3.5 and succeeds on real-world targets and CTFs. The research was conducted by the authors present in <Authors> tag and PENTESTGPT is open-sourced with strong community uptake.... show more

Abstract

Penetration testing, a crucial industrial practice for ensuring system security, has traditionally resisted automation due to the extensive expertise required by human professionals. Large Language Models (LLMs) have shown significant advancements in various domains, and their emergent abilities suggest their potential to revolutionize industries. In this work, we establish a comprehensive benchmark using real-world penetration testing targets and further use it to explore the capabilities of LLMs in this domain. Our findings reveal that while LLMs demonstrate proficiency in specific sub-tasks within the penetration testing process, such as using testing tools, interpreting outputs, and proposing subsequent actions, they also encounter difficulties maintaining a whole context of the overall testing scenario. Based on these insights, we introduce PENTESTGPT, an LLM-empowered automated penetration testing framework that leverages the abundant domain knowledge inherent in LLMs. PENTESTGPT is meticulously designed with three self-interacting modules, each addressing individual sub-tasks of penetration testing, to mitigate the challenges related to context loss. Our evaluation shows that PENTESTGPT not only outperforms LLMs with a task-completion increase of 228.6% compared to the GPT-3.5 model among the benchmark targets, but also proves effective in tackling real-world penetration testing targets and CTF challenges. Having been open-sourced on GitHub, PENTESTGPT has garnered over 6,500 stars in 12 months and fostered active community engagement, attesting to its value and impact in both the academic and industrial spheres.

Publisher

Proceedings of the 33rd USENIX Security Symposium

Published On

Aug 14, 2024

Authors

Gelei Deng, Yi Liu, Víctor Mayoral-Vilches, Peng Liu, Yuekang Li, Yuan Xu, Tianwei Zhang, Yang Liu, Martin Pinzger, Stefan Rass

Related Publications

Explore these studies to deepen your understanding of the subject.

Linguistics and Languages

Applying large language models for automated essay scoring for non-native Japanese

W. Li and H. Liu

Computer Science

The Potential and Limitations of Large Language Models for Text Classification through Synthetic Data Generation

A. K. P. Venkata and L. Gudala

Computer Science

The Goldilocks paradigm: comparing classical machine learning, large language models, and few-shot learning for drug discovery applications

S. H. Snyder, P. A. Vignaux, et al.

Education

How Does ChatGPT Perform on the United States Medical Licensing Examination? The Implications of Large Language Models for Medical Education and Knowledge Assessment

A. Gilson, C. W. Safranek, et al.

Listen, Learn & Level Up

Over 10,000 hours of research content in 25+ fields, available in 12+ languages.

No more digging through PDFs, just hit play and absorb the world's latest research in your language, on your time.

listen to research audio papers with researchbunny