Computer Science
M+: Extending MemoryLLM with Scalable Long-Term Memory
Y. Wang, D. Krotov, et al.
Large language models struggle to retain information from the distant past. MemoryLLM compresses past context into a 1B-parameter latent memory pool, but its retention degrades beyond roughly 20k tokens. This paper presents M+, which augments MemoryLLM with a long-term memory and a co-trained retriever that dynamically fetches relevant information during generation, extending knowledge retention from under 20k to over 160k tokens with similar GPU memory overhead. The research was conducted by Yu Wang, Dmitry Krotov, Yuanzhe Hu, Yifan Gao, Wangchunshu Zhou, Julian McAuley, Dan Gutfreund, Rogerio Feris, and Zexue He.
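The idea summarized above, a bounded working memory backed by a larger long-term store that is queried at generation time, can be illustrated with a toy sketch. This is not the paper's implementation. The class `ToyMemory`, the first-in-first-out eviction rule, the dot-product scoring, and the hand-written key vectors are all assumptions made for illustration. In M+ the memory consists of latent hidden states and the retriever is co-trained with the model.

```python
from dataclasses import dataclass, field


def dot(a, b):
    """Plain dot product, standing in for a learned retriever score (assumption)."""
    return sum(x * y for x, y in zip(a, b))


@dataclass
class ToyMemory:
    """Hypothetical two-tier memory: a small short-term pool plus a long-term store."""
    short_capacity: int = 2
    short_term: list = field(default_factory=list)  # (key, value) pairs
    long_term: list = field(default_factory=list)   # entries evicted from short_term

    def write(self, key, value):
        # New entries go to the short-term pool. Overflow is moved to the
        # long-term store instead of being discarded.
        # FIFO eviction is a simplifying assumption of this sketch.
        self.short_term.append((key, value))
        while len(self.short_term) > self.short_capacity:
            self.long_term.append(self.short_term.pop(0))

    def read(self, query, top_k=1):
        # Fetch the top_k long-term entries most similar to the query,
        # then append everything still held in the short-term pool.
        ranked = sorted(self.long_term, key=lambda kv: dot(query, kv[0]), reverse=True)
        retrieved = [value for _, value in ranked[:top_k]]
        recent = [value for _, value in self.short_term]
        return retrieved + recent


mem = ToyMemory(short_capacity=2)
mem.write([1.0, 0.0, 0.0], "a")
mem.write([0.0, 1.0, 0.0], "b")
mem.write([0.0, 0.0, 1.0], "c")
mem.write([1.0, 1.0, 0.0], "d")

# "a" and "b" have been evicted to long-term memory; the query matches "b".
result = mem.read([0.0, 1.0, 0.0], top_k=1)
print(result)  # ['b', 'c', 'd']
```

The point of the sketch is only that the working set handed to the model stays bounded (`top_k` plus `short_capacity` entries) while the long-term store can keep growing, which is one way to read the summary's claim of longer retention at similar GPU memory overhead.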

