Multimodal graph representation learning for robust surgical workflow recognition with adversarial feature disentanglement

Engineering and Technology


L. Bai, B. Ma, et al.

Discover how a team of researchers, including Long Bai and Boyi Ma, tackled the challenges of surgical workflow recognition. Their innovative GRAD approach combines vision and kinematic data to enhance automation and decision-making while overcoming data corruption issues. Experience a leap in surgical technology!

Abstract
Surgical workflow recognition is crucial for automating tasks, aiding decision-making, and training surgeons. However, data corruption (e.g., occlusion, transmission errors) hinders performance. This paper proposes GRAD, a robust multimodal graph-based approach integrating vision and kinematic data. GRAD uses a Multimodal Disentanglement Graph Network (MDGNet) to capture fine-grained visual information and model vision-kinematic relationships. A Vision-Kinematic Adversarial (VKA) framework aligns feature spaces, and a Contextual Calibrated Decoder enhances robustness. Experiments on two public datasets show high accuracy (86.87% and 92.38%) and robustness to data corruption.
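The abstract's Vision-Kinematic Adversarial (VKA) idea can be illustrated with a toy sketch: a discriminator is trained to tell vision features from kinematic features, and alignment succeeds when it can no longer do so. The sketch below is a minimal numpy illustration under assumptions of my own (synthetic Gaussian "features", a logistic-regression discriminator, and a simple recentering step standing in for learned encoder alignment); it is not the authors' implementation.

```python
import numpy as np

# Illustrative sketch of adversarial feature alignment (not the paper's code):
# a discriminator tries to separate vision features from kinematic features;
# once the two feature spaces are aligned, its loss rises toward chance level.

rng = np.random.default_rng(0)
dim = 8  # assumed toy feature dimension

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def discriminator_loss(w, b, vis, kin):
    """Binary cross-entropy: label 1 = vision feature, label 0 = kinematic."""
    feats = np.vstack([vis, kin])
    labels = np.concatenate([np.ones(len(vis)), np.zeros(len(kin))])
    p = sigmoid(feats @ w + b)
    eps = 1e-9
    return -np.mean(labels * np.log(p + eps) + (1 - labels) * np.log(1 - p + eps))

# Synthetic "features" from two modalities with different means (misaligned).
vis = rng.normal(loc=+1.0, size=(64, dim))
kin = rng.normal(loc=-1.0, size=(64, dim))

# Train a logistic-regression discriminator by gradient descent.
w, b = rng.normal(size=dim) * 0.01, 0.0
for _ in range(200):
    feats = np.vstack([vis, kin])
    labels = np.concatenate([np.ones(len(vis)), np.zeros(len(kin))])
    p = sigmoid(feats @ w + b)
    w -= 0.5 * (feats.T @ (p - labels) / len(labels))
    b -= 0.5 * np.mean(p - labels)

loss_separated = discriminator_loss(w, b, vis, kin)

# Stand-in for encoder alignment: recenter both modalities onto a shared
# distribution. The fixed discriminator's loss climbs toward ln(2) ~ 0.693,
# i.e. it can no longer tell the modalities apart.
vis_aligned = vis - vis.mean(axis=0)
kin_aligned = kin - kin.mean(axis=0)
loss_aligned = discriminator_loss(w, b, vis_aligned, kin_aligned)

print(f"separated: {loss_separated:.3f}  aligned: {loss_aligned:.3f}")
```

In the actual VKA framework, the encoders would be updated against the discriminator (e.g. via a gradient-reversal or min-max objective) rather than by the hand-coded recentering used here for brevity.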
Publisher
Information Fusion
Published On
May 16, 2025
Authors
Long Bai, Boyi Ma, Ruohan Wang, Guankun Wang, Beilei Cui, Zhongliang Jiang, Mobarakol Islam, Zhe Min, Jiewen Lai, Nassir Navab, Hongliang Ren
Tags
surgical workflow recognition
data corruption
multimodal graph-based approach
MDGNet
VKA framework
robustness
kinematic data