Understanding the Benefits of Hardware-Accelerated Communication in Model-Serving Applications

Computer Science

W. A. Hanafy, L. Wang, et al.

This research by Walid A. Hanafy, Limin Wang, Hyunseok Chang, Sarit Mukherjee, T. V. Lakshman, and Prashant Shenoy examines how hardware-accelerated communication can reduce latency in machine learning pipelines. Using RDMA and GPUDirect RDMA, the study demonstrates latency savings of 15-50% compared to traditional TCP, offering practical insight into where communication overhead arises and how to optimize it.

Abstract
This paper investigates the latency overhead in machine learning (ML)-based computation pipelines and analyzes the potential benefits of hardware-accelerated communication using RDMA and GPUDirect RDMA (GDR). The authors built a model-serving framework supporting multiple communication mechanisms to identify performance bottlenecks. The study shows that GDR can reduce model-serving latency by 15-50% (70-160 ms) compared to TCP, and highlights the roles of the communication fraction, protocol translation, and data-copy optimization in realizing these savings.
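The "communication fraction" mentioned above is the share of end-to-end serving latency spent moving data rather than computing on it; the larger this fraction under TCP, the more GDR can help. Below is a minimal, self-contained sketch (not the paper's framework) that estimates this fraction for a TCP baseline: it times the transfer of a tensor-sized byte buffer over a loopback socket against a stand-in "inference" delay. The payload size and compute time are arbitrary illustrative assumptions.

```python
import socket
import threading
import time

# Stand-in for a serialized input tensor (size is an arbitrary assumption).
PAYLOAD = b"\x00" * (4 * 1024 * 1024)  # 4 MiB

def _server(ready, result):
    """Accept one connection and drain the payload, like a serving endpoint."""
    srv = socket.socket()
    srv.bind(("127.0.0.1", 0))  # ephemeral port
    srv.listen(1)
    ready["port"] = srv.getsockname()[1]
    ready["event"].set()
    conn, _ = srv.accept()
    received = 0
    while received < len(PAYLOAD):
        chunk = conn.recv(1 << 20)
        if not chunk:
            break
        received += len(chunk)
    result["received"] = received
    conn.close()
    srv.close()

def measure_communication_fraction(compute_s=0.01):
    """Return (time spent on TCP transfer) / (transfer + simulated compute)."""
    ready = {"event": threading.Event()}
    result = {}
    t = threading.Thread(target=_server, args=(ready, result))
    t.start()
    ready["event"].wait()

    cli = socket.create_connection(("127.0.0.1", ready["port"]))
    t0 = time.perf_counter()
    cli.sendall(PAYLOAD)   # the communication step a GDR path would accelerate
    cli.close()
    t.join()
    comm = time.perf_counter() - t0

    time.sleep(compute_s)  # stand-in for model inference on the GPU
    return comm / (comm + compute_s)
```

Intuitively, if this fraction is small, even an ideal zero-copy transport leaves total latency nearly unchanged; the paper's 15-50% savings correspond to pipelines where communication is a substantial share of the total.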
Publisher
This information was not provided in the paper.
Published On
Jan 01, 2023
Authors
Walid A Hanafy, Limin Wang, Hyunseok Chang, Sarit Mukherjee, T V Lakshman, Prashant Shenoy
Tags
latency
machine learning
hardware acceleration
RDMA
GPUDirect RDMA
model-serving
performance optimization