Multimodal graph representation learning for robust surgical workflow recognition with adversarial feature disentanglement

Engineering and Technology


L. Bai, B. Ma, et al.

Discover how a team of researchers, including Long Bai and Boyi Ma, tackled the challenges of surgical workflow recognition. Their innovative GRAD approach combines vision and kinematic data to enhance automation and decision-making while overcoming data corruption issues. Experience a leap in surgical technology!

Abstract
Surgical workflow recognition is crucial for automating tasks, aiding decision-making, and training surgeons. However, data corruption (e.g., occlusion, transmission errors) hinders performance. This paper proposes GRAD, a robust multimodal graph-based approach integrating vision and kinematic data. GRAD uses a Multimodal Disentanglement Graph Network (MDGNet) to capture fine-grained visual information and model vision-kinematic relationships. A Vision-Kinematic Adversarial (VKA) framework aligns feature spaces, and a Contextual Calibrated Decoder enhances robustness. Experiments on two public datasets show high accuracy (86.87% and 92.38%) and robustness to data corruption.
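The abstract's Vision-Kinematic Adversarial (VKA) idea can be illustrated with a toy sketch: a discriminator is trained to tell vision features from kinematic features, and alignment succeeds when it can no longer do so. The sketch below is a minimal numpy illustration under assumptions of my own (synthetic Gaussian "features", a logistic-regression discriminator, and a simple recentering step standing in for learned encoder alignment); it is not the authors' implementation.

```python
import numpy as np

# Illustrative sketch of adversarial feature alignment (not the paper's code):
# a discriminator tries to separate vision features from kinematic features;
# once the two feature spaces are aligned, its loss rises toward chance level.

rng = np.random.default_rng(0)
dim = 8  # assumed toy feature dimension

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def discriminator_loss(w, b, vis, kin):
    """Binary cross-entropy: label 1 = vision feature, label 0 = kinematic."""
    feats = np.vstack([vis, kin])
    labels = np.concatenate([np.ones(len(vis)), np.zeros(len(kin))])
    p = sigmoid(feats @ w + b)
    eps = 1e-9
    return -np.mean(labels * np.log(p + eps) + (1 - labels) * np.log(1 - p + eps))

# Synthetic "features" from two modalities with different means (misaligned).
vis = rng.normal(loc=+1.0, size=(64, dim))
kin = rng.normal(loc=-1.0, size=(64, dim))

# Train a logistic-regression discriminator by gradient descent.
w, b = rng.normal(size=dim) * 0.01, 0.0
for _ in range(200):
    feats = np.vstack([vis, kin])
    labels = np.concatenate([np.ones(len(vis)), np.zeros(len(kin))])
    p = sigmoid(feats @ w + b)
    w -= 0.5 * (feats.T @ (p - labels) / len(labels))
    b -= 0.5 * np.mean(p - labels)

loss_separated = discriminator_loss(w, b, vis, kin)

# Stand-in for encoder alignment: recenter both modalities onto a shared
# distribution. The fixed discriminator's loss climbs toward ln(2) ~ 0.693,
# i.e. it can no longer tell the modalities apart.
vis_aligned = vis - vis.mean(axis=0)
kin_aligned = kin - kin.mean(axis=0)
loss_aligned = discriminator_loss(w, b, vis_aligned, kin_aligned)

print(f"separated: {loss_separated:.3f}  aligned: {loss_aligned:.3f}")
```

In the actual VKA framework, the encoders would be updated against the discriminator (e.g. via a gradient-reversal or min-max objective) rather than by the hand-coded recentering used here for brevity.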
Publisher
Information Fusion
Published On
May 16, 2025
Authors
Long Bai, Boyi Ma, Ruohan Wang, Guankun Wang, Beilei Cui, Zhongliang Jiang, Mobarakol Islam, Zhe Min, Jiewen Lai, Nassir Navab, Hongliang Ren
Tags
surgical workflow recognition
data corruption
multimodal graph-based approach
MDGNet
VKA framework
robustness
kinematic data