
Computer Science

HMT: Hierarchical Memory Transformer for Efficient Long Context Language Processing

Z. He, Y. Cao, et al.

The Hierarchical Memory Transformer (HMT) imitates the human memory hierarchy to boost long-context processing: it preserves tokens from early segments, passes memory embeddings forward, and recalls relevant history using memory-augmented segment-level recurrence, improving language modeling, question answering, and summarization while using far fewer parameters and much less inference memory. Research conducted by Zifan He, Yingqi Cao, Zongyue Qin, Neha Prakriya, Yizhou Sun, and Jason Cong.
Abstract
Transformer-based large language models (LLMs) have been widely used in language processing applications. However, due to device memory constraints, most of them restrict the context window. Although recurrent models in prior work can memorize past tokens to enable unlimited context and maintain effectiveness, they have "flat" memory architectures, which limit their ability to select and filter information. Since humans are good at learning and self-adjustment, we believe that imitating the brain's memory hierarchy is beneficial for model memorization. Thus, we propose the Hierarchical Memory Transformer (HMT), a novel framework that facilitates a model's long-context processing ability by imitating human memorization behavior. Leveraging memory-augmented segment-level recurrence, we organize the memory hierarchy by preserving tokens from early input segments, passing memory embeddings along the sequence, and recalling relevant information from history. Evaluating on general language modeling, question-answering, and summarization tasks, we show that HMT consistently improves the long-context processing ability of existing models. Furthermore, HMT achieves generation quality comparable or superior to long-context LLMs with 2-57× fewer parameters and 2.5-116× less inference memory, significantly outperforming previous memory-augmented models.
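To make the recurrence concrete, below is a minimal sketch of HMT-style memory-augmented segment-level recurrence in PyTorch. It is illustrative only: the toy backbone, the mean-pooled segment summary, and all names (MemoryRecall, hmt_forward, d_model, seg_len, n_preserve) are assumptions, not the authors' implementation, which wraps pretrained LLMs and learns these components. The three memory levels described in the abstract appear as preserved tokens from the previous segment, a memory embedding passed segment to segment, and a cache of past memory embeddings queried by cross-attention (memory recall).

# A minimal, hypothetical sketch of memory-augmented segment-level recurrence
# in PyTorch. Nothing here is the authors' code; HMT wraps pretrained LLMs.
import torch
import torch.nn as nn
import torch.nn.functional as F

d_model, seg_len, n_preserve = 64, 16, 4      # toy sizes (assumed)

class MemoryRecall(nn.Module):
    """Cross-attention from a segment summary over cached memory embeddings."""
    def __init__(self, d):
        super().__init__()
        self.q, self.k, self.v = nn.Linear(d, d), nn.Linear(d, d), nn.Linear(d, d)

    def forward(self, summary, cache):        # summary: (1, 1, d); cache: (1, T, d)
        scores = self.q(summary) @ self.k(cache).transpose(1, 2)
        att = F.softmax(scores / cache.size(-1) ** 0.5, dim=-1)
        return att @ self.v(cache)            # (1, 1, d) recalled memory

# Stand-in backbone; HMT would use a pretrained decoder-only LLM instead.
backbone = nn.TransformerEncoder(
    nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True),
    num_layers=2,
)
recall = MemoryRecall(d_model)

def hmt_forward(segments):
    """Process segments left to right, carrying a hierarchical memory."""
    cache = []                                    # long-term: one embedding per past segment
    mem = torch.zeros(1, 1, d_model)              # short-term: embedding passed forward
    tail = torch.zeros(1, n_preserve, d_model)    # preserved tokens from the previous segment
    outs = []
    for seg in segments:                          # seg: (1, seg_len, d_model) token embeddings
        summary = seg.mean(dim=1, keepdim=True)   # cheap query summarizing the segment
        recalled = recall(summary, torch.cat(cache + [mem], dim=1))
        x = torch.cat([recalled, mem, tail, seg], dim=1)
        h = backbone(x)
        outs.append(h[:, -seg_len:])              # hidden states for this segment's tokens
        mem = h[:, -1:].detach()                  # new memory embedding passed forward
        cache.append(mem)                         # stored for later recall
        tail = seg[:, -n_preserve:]               # preserve tokens for the next segment
    return torch.cat(outs, dim=1)

tokens = torch.randn(1, 4 * seg_len, d_model)     # a toy long input of 4 segments
out = hmt_forward(list(tokens.split(seg_len, dim=1)))
print(out.shape)                                  # torch.Size([1, 64, 64])

The design point this sketch highlights is that each segment attends only to a small fixed-size prefix (recalled memory, carried memory, preserved tokens), so per-step inference memory stays roughly constant regardless of total sequence length.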
Publisher
arXiv
Published On
Feb 06, 2025
Authors
Zifan He, Yingqi Cao, Zongyue Qin, Neha Prakriya, Yizhou Sun, Jason Cong
Tags
Hierarchical Memory Transformer
long-context processing
memory-augmented recurrence
segment-level recurrence
language modeling
question-answering
efficient inference
Listen, Learn & Level Up
Over 10,000 hours of research content in 25+ fields, available in 12+ languages.
No more digging through PDFs, just hit play and absorb the world's latest research in your language, on your time.
listen to research audio papers with researchbunny