PENTESTGPT: Evaluating and Harnessing Large Language Models for Automated Penetration Testing

G. Deng, Y. Liu, et al.

LLMs promise to transform penetration testing. This study builds a real-world benchmark and shows that LLMs excel at individual sub-tasks but struggle to reason over the whole testing context. It introduces PENTESTGPT, a three-module, LLM-driven framework that improves task completion by 228.6% over GPT-3.5 on the benchmark targets and succeeds on real-world targets and CTF challenges. PENTESTGPT is open-sourced and has seen strong community uptake.

Abstract
Penetration testing, a crucial industrial practice for ensuring system security, has traditionally resisted automation because of the extensive expertise it demands from human professionals. Large Language Models (LLMs) have shown significant advancements in various domains, and their emergent abilities suggest they could transform many industries. In this work, we establish a comprehensive benchmark using real-world penetration testing targets and use it to explore the capabilities of LLMs in this domain. Our findings reveal that while LLMs are proficient at specific sub-tasks within the penetration testing process, such as using testing tools, interpreting their outputs, and proposing subsequent actions, they struggle to maintain an integrated understanding of the overall testing scenario. Based on these insights, we introduce PENTESTGPT, an LLM-empowered automated penetration testing framework that leverages the abundant domain knowledge inherent in LLMs. PENTESTGPT is designed with three self-interacting modules, each addressing an individual sub-task of penetration testing, to mitigate the problem of context loss. Our evaluation shows that PENTESTGPT not only outperforms the underlying LLMs, increasing task completion by 228.6% over GPT-3.5 on the benchmark targets, but also proves effective against real-world penetration testing targets and CTF challenges. Since being open-sourced on GitHub, PENTESTGPT has garnered over 6,500 stars in 12 months and fostered active community engagement, attesting to its value and impact in both academia and industry.
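The abstract's central design idea is to split the work across three cooperating modules so that no single LLM session has to hold the entire test in context. The sketch below illustrates that division of labour under stated assumptions; it is not PENTESTGPT's code. It assumes the shared state is a tree of pending sub-tasks (the full paper describes reasoning, generation, and parsing roles working over such a task tree). All names here (`TaskNode`, `generate`, `parse`, `reason`, `fake_execute`, `run`) are hypothetical, the LLM calls are replaced by deterministic stubs, and the target address `10.0.0.5` and wordlist `words.txt` are placeholders.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class TaskNode:
    """Hypothetical task-tree node: one pentest sub-task and its status."""
    name: str
    status: str = "todo"  # "todo" or "done"
    children: list[TaskNode] = field(default_factory=list)

    def add(self, name: str) -> TaskNode:
        child = TaskNode(name)
        self.children.append(child)
        return child

    def next_todo(self) -> Optional[TaskNode]:
        # Depth-first search for the first unfinished leaf sub-task.
        if self.status == "todo" and not self.children:
            return self
        for child in self.children:
            found = child.next_todo()
            if found is not None:
                return found
        return None


def generate(task_name: str) -> str:
    # Generation stand-in: turn an abstract sub-task into a concrete command.
    # A real system would prompt an LLM here; this lookup table is a stub.
    commands = {
        "scan ports": "nmap -sV 10.0.0.5",
        "enumerate web directories": "gobuster dir -u http://10.0.0.5 -w words.txt",
    }
    return commands.get(task_name, f"echo 'manual step: {task_name}'")


def parse(raw_output: str, max_chars: int = 120) -> str:
    # Parsing stand-in: keep only lines that look like findings and truncate,
    # so verbose tool output does not flood the reasoning context.
    findings = [line.strip() for line in raw_output.splitlines() if "open" in line]
    summary = "; ".join(findings) if findings else raw_output.strip()
    return summary[:max_chars]


def reason(task: TaskNode, summary: str) -> None:
    # Reasoning stand-in: mark the sub-task done and grow the tree from findings.
    task.status = "done"
    if "80/tcp open" in summary:
        task.add("enumerate web directories")
    if "22/tcp open" in summary:
        task.add("check ssh credentials")


def fake_execute(command: str) -> str:
    # Simulated tool output so the loop runs without touching a real host.
    if command.startswith("nmap"):
        return "80/tcp open http Apache 2.4\n22/tcp open ssh OpenSSH 8.2"
    return "no findings"


def run(
    root: TaskNode, execute: Callable[[str], str], max_steps: int = 10
) -> list[tuple[str, str, str]]:
    # Only the short summary, never the raw output, reaches the reasoning step.
    log: list[tuple[str, str, str]] = []
    for _ in range(max_steps):
        task = root.next_todo()
        if task is None:
            break
        command = generate(task.name)
        summary = parse(execute(command))
        reason(task, summary)
        log.append((task.name, command, summary))
    return log


if __name__ == "__main__":
    root = TaskNode("compromise target")
    root.add("scan ports")
    for name, command, summary in run(root, fake_execute):
        print(f"{name}: {command} -> {summary}")
```

Run as shown, the loop scans ports, then visits the two follow-up tasks that the reasoning stub added from the scan summary, and stops once no unfinished leaf remains. The point of the split is that each stage sees only what it needs: the tree tracks overall progress, while the parsing stage keeps raw tool output from crowding out that state.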
Publisher
Proceedings of the 33rd USENIX Security Symposium
Published On
Aug 14, 2024
Authors
Gelei Deng, Yi Liu, Víctor Mayoral-Vilches, Peng Liu, Yuekang Li, Yuan Xu, Tianwei Zhang, Yang Liu, Martin Pinzger, Stefan Rass
Tags
Penetration testing
Large Language Models (LLMs)
PENTESTGPT
Automated security assessment
Benchmarking
Context management
Tool integration