Computer Science

M+: Extending MemoryLLM with Scalable Long-Term Memory

Y. Wang, D. Krotov, et al.

Large language models often lose information from the distant past. MemoryLLM compresses past context into a 1B-parameter latent memory but struggles to retain knowledge beyond ~20k tokens. This paper presents M+, which augments MemoryLLM with a long-term memory and a co-trained retriever that dynamically fetches relevant information during generation, extending retention from under 20k to over 160k tokens with similar GPU memory overhead. Research conducted by Yu Wang, Dmitry Krotov, Yuanzhe Hu, Yifan Gao, Wangchunshu Zhou, Julian McAuley, Dan Gutfreund, Rogerio Feris, and Zexue He.

Abstract

Equipping large language models (LLMs) with latent-space memory has attracted increasing attention as they can extend the context window of existing language models. However, retaining information from the distant past remains a challenge. For example, MemoryLLM (Wang et al., 2024a), as a representative work with latent-space memory, compresses past information into hidden states across all layers, forming a memory pool of 1B parameters. While effective for sequence lengths up to 16k tokens, it struggles to retain knowledge beyond 20k tokens. In this work, we address this limitation by introducing M+, a memory-augmented model based on MemoryLLM that significantly enhances long-term information retention. M+ integrates a long-term memory mechanism with a co-trained retriever, dynamically retrieving relevant information during text generation. We evaluate M+ on diverse benchmarks, including long-context understanding and knowledge retention tasks. Experimental results show that M+ significantly outperforms MemoryLLM and recent strong baselines, extending knowledge retention from under 20k to over 160k tokens with similar GPU memory overhead. We open-source our code at https://github.com/wangyu-ustc/MemoryLLM.
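
The abstract describes a bounded latent memory paired with a long-term store and a retriever that pulls relevant entries back in during generation. The toy sketch below illustrates that general pattern only; it is not the authors' implementation. All names (`LongTermMemory`, `MemoryAugmentedModel`, `short_capacity`) are hypothetical, the oldest-first eviction rule and dot-product scoring are assumptions made for illustration, and the retriever here is a fixed similarity function rather than a co-trained one.

```python
from dataclasses import dataclass, field


@dataclass
class LongTermMemory:
    """Toy long-term store of (key, token) pairs. Hypothetical, not M+'s code."""
    keys: list = field(default_factory=list)
    tokens: list = field(default_factory=list)

    def add(self, key, token):
        self.keys.append(key)
        self.tokens.append(token)

    def retrieve(self, query, k):
        # Assumption: relevance is a plain dot product between query and key.
        scores = [sum(q * x for q, x in zip(query, key)) for key in self.keys]
        top = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]
        # Return the selected tokens in the order they were originally stored.
        return [self.tokens[i] for i in sorted(top)]


class MemoryAugmentedModel:
    """Bounded short-term memory whose evicted entries move to long-term memory."""

    def __init__(self, short_capacity):
        self.short_capacity = short_capacity
        self.short_term = []  # list of (key, token) pairs
        self.long_term = LongTermMemory()

    def write(self, key, token):
        self.short_term.append((key, token))
        # Assumption: evict oldest-first once the short-term pool is full.
        while len(self.short_term) > self.short_capacity:
            old_key, old_token = self.short_term.pop(0)
            self.long_term.add(old_key, old_token)

    def read(self, query, k):
        # Memory visible at generation time: retrieved long-term entries
        # followed by everything still held in short-term memory.
        retrieved = self.long_term.retrieve(query, k)
        return retrieved + [token for _, token in self.short_term]


model = MemoryAugmentedModel(short_capacity=2)
model.write([1.0, 0.0], "a")
model.write([0.0, 1.0], "b")
model.write([1.0, 1.0], "c")
model.write([0.5, 0.5], "d")
visible = model.read([1.0, 0.0], k=1)
```

In this sketch the short-term pool stays at a fixed size no matter how much has been written, while only `k` long-term entries are pulled in per read, which is one way to picture how retention could grow without a matching growth in active memory.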

Authors

Yu Wang, Dmitry Krotov, Yuanzhe Hu, Yifan Gao, Wangchunshu Zhou, Julian McAuley, Dan Gutfreund, Rogerio Feris, Zexue He

DOI

https://doi.org/10.48550/arXiv.2502.00592
