Introduction
Surgical complications are a major source of morbidity, mortality, and healthcare costs, particularly in underserved regions lacking access to surgical expertise. Many adverse events stem from errors in the surgical team's cognitive processes, such as misjudging anatomical planes during dissection. Deep learning offers the potential for real-time guidance by interpreting surgical scenes and identifying potential risks. While several AI models have been developed for minimally invasive surgery, their clinical deployment is hampered by two major limitations: 1) a lack of generalizability across diverse operating theaters with varying equipment and data acquisition methods, which complicates pre-processing and model training; and 2) reliance on high-performance hardware and fast internet connections, making deployment impractical in resource-limited settings. This study aims to overcome these limitations by developing a scalable, equipment-agnostic framework for real-time surgical AI, focusing on laparoscopic cholecystectomy as a high-impact use case in which bile duct injuries are a major concern. The GoNoGoNet algorithm, previously developed for semantic segmentation of safe and dangerous dissection zones, serves as the basis, but with improved model architectures and deployment strategy to enhance generalizability and scalability.
Literature Review
Existing research demonstrates the potential of deep learning for computer vision tasks in surgery, including image classification and semantic segmentation. Previous work has explored models for predicting safe and dangerous zones in laparoscopic cholecystectomy, tool identification, and surgical phase recognition. However, these studies primarily focus on proof-of-concept models without addressing the challenges of clinical deployment. Specifically, there is a need for models that generalize across operating room environments and adapt to the computational resources available in diverse settings, including resource-limited regions with constrained hardware and internet connectivity. Current state-of-the-art approaches often rely on computationally expensive architectures, limiting real-time applicability and accessibility in many locations.
Methodology
This study comprised two phases: model development and validation, and platform development and testing.
**Phase 1: Model Development and Validation:** Two lightweight deep learning models, U-Net and SegFormer, were chosen for their suitability for real-time inference. To address the data heterogeneity inherent in surgical videos from various sources (different cameras, resolutions, aspect ratios, etc.), model-specific pre-processing was employed, as sketched below. For the U-Net model, frames were divided into overlapping patches to mitigate the effects of varying aspect ratios. For SegFormer, inputs were resized to a fixed height while preserving aspect ratio and then padded to a fixed shape. The models were trained on a large, diverse multicenter dataset (Dataset 1) from 136 institutions and 37 countries, using a 70%/15%/15% split for training, validation, and testing. Training used a cross-entropy loss, hyperparameters were tuned on the validation set, and the best-performing models were selected based on validation performance.
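To illustrate the two input pipelines, the minimal sketch below shows overlapping patch extraction (U-Net) and fixed-height resizing with padding (SegFormer). The patch size, stride, target height, and padded width are hypothetical values for illustration only; the study's actual settings are not specified here.

```python
# Illustrative pre-processing for the two model inputs.
# NOTE: patch/stride/target_h/padded_w are assumed example values, not the study's settings.
import numpy as np
import cv2

def extract_overlapping_patches(frame: np.ndarray, patch: int = 256, stride: int = 128):
    """Split a frame into overlapping square patches (U-Net-style input)."""
    h, w = frame.shape[:2]
    patches = []
    for y in range(0, max(h - patch, 0) + 1, stride):
        for x in range(0, max(w - patch, 0) + 1, stride):
            patches.append(frame[y:y + patch, x:x + patch])
    return patches

def resize_and_pad(frame: np.ndarray, target_h: int = 512, padded_w: int = 1024):
    """Resize to a fixed height (preserving aspect ratio), then pad width to a fixed shape (SegFormer-style input)."""
    h, w = frame.shape[:2]
    new_w = int(round(w * target_h / h))
    resized = cv2.resize(frame, (new_w, target_h), interpolation=cv2.INTER_LINEAR)
    pad_w = max(padded_w - new_w, 0)
    return cv2.copyMakeBorder(resized, 0, 0, 0, pad_w, cv2.BORDER_CONSTANT, value=0)
```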
**Phase 2: Platform Development and Testing:** A web-based platform was developed using ReactJS (frontend) and Flask (backend) to enable real-time inference on a surgical video stream from any edge device (laptop, smartphone, etc.). The platform supports both synchronous (live inference) and asynchronous (offline processing) modes. The design prioritizes usability, with a color-coded overlay that displays Go/No-Go zones and transparency that reflects confidence levels. A slider allows surgeons to adjust the threshold for displaying Go zones according to their risk tolerance for a given procedure. The backend used a shared pool of four GPU workers on a private cloud, coordinated by a round-robin queueing system (see the sketch below). A flow-control algorithm optimized the data pipeline to handle variable network speeds and minimize latency. To make the platform usable in low-bandwidth settings, an option was included to downscale frames and predictions before transmission, significantly reducing data transfer volume. Platform performance was tested across network speeds of 1-32 Mbps, simulated with Chrome Developer Tools; metrics included frames per second (fps) and round-trip delay.
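The round-robin dispatch and the low-bandwidth downscaling path can be illustrated with a minimal Flask sketch. The `/predict` route, the `scale` query parameter, and `run_inference` are hypothetical names introduced for this example; the platform's actual API is not described at this level of detail.

```python
# Minimal sketch of backend dispatch with round-robin GPU workers and optional
# frame downscaling; route name, "scale" parameter, and run_inference are assumptions.
import itertools
import cv2
import numpy as np
from flask import Flask, request

app = Flask(__name__)
N_WORKERS = 4                                      # shared pool of four GPU workers
worker_cycle = itertools.cycle(range(N_WORKERS))   # round-robin assignment

def run_inference(worker_id: int, frame: np.ndarray) -> np.ndarray:
    """Placeholder for the segmentation call on the selected GPU worker."""
    raise NotImplementedError

@app.route("/predict", methods=["POST"])
def predict():
    # Decode the incoming frame, assumed to arrive as a JPEG byte stream.
    buf = np.frombuffer(request.data, dtype=np.uint8)
    frame = cv2.imdecode(buf, cv2.IMREAD_COLOR)

    # Optional low-bandwidth mode: downscale before inference so both the upload
    # and the returned prediction are smaller.
    scale = float(request.args.get("scale", 1.0))
    if scale < 1.0:
        frame = cv2.resize(frame, None, fx=scale, fy=scale)

    mask = run_inference(next(worker_cycle), frame)  # Go/No-Go mask, assumed uint8
    ok, encoded = cv2.imencode(".png", mask)
    return encoded.tobytes(), 200, {"Content-Type": "image/png"}
```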
Key Findings
Both U-Net and SegFormer achieved satisfactory performance in segmenting Go and No-Go zones. On Dataset 2 (an independent multicenter dataset), U-Net achieved mean Dice scores of 57% (Go zone) and 76% (No-Go zone), while SegFormer achieved 60% and 76%, respectively. Precision and recall were also high for the No-Go zone, indicating good performance in identifying high-risk regions. The web platform demonstrated robustness across a wide range of network speeds. With the flow-control algorithm, a prediction stream of at least 60 fps was maintained with acceptable latency (under 150 ms) even at 2 Mbps. With frame downscaling, 60 fps was achievable with under 200 ms delay even at 1 Mbps, demonstrating suitability for low-connectivity settings. The minimal performance drop (less than 2% for Dice, precision, and recall) with downscaling validated this optimization strategy. The platform's ability to function with minimal latency even at low bandwidth demonstrates its potential for deployment in resource-constrained environments.
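For reference, the per-frame overlap metrics reported here (Dice, precision, recall) can be computed from binary predicted and ground-truth masks as in the NumPy sketch below; this is the standard formulation, not the study's exact evaluation code.

```python
# Standard per-frame overlap metrics for a binary (Go or No-Go) segmentation mask.
import numpy as np

def overlap_metrics(pred: np.ndarray, truth: np.ndarray, eps: float = 1e-7):
    """Dice, precision, and recall between boolean prediction and ground-truth masks."""
    pred, truth = pred.astype(bool), truth.astype(bool)
    tp = np.logical_and(pred, truth).sum()    # true positives
    fp = np.logical_and(pred, ~truth).sum()   # false positives
    fn = np.logical_and(~pred, truth).sum()   # false negatives
    dice = 2 * tp / (2 * tp + fp + fn + eps)
    precision = tp / (tp + fp + eps)
    recall = tp / (tp + fn + eps)
    return dice, precision, recall
```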
Discussion
This study demonstrates a feasible and cost-effective framework for deploying real-time AI-driven surgical decision support in diverse settings. The combination of lightweight model architectures, a well-designed web platform, and an optimized network pipeline addresses previous limitations in generalizability and scalability. The high performance and robustness across variable network conditions, particularly the successful operation in low-bandwidth scenarios, highlight the potential for democratizing access to advanced surgical AI. The framework's adaptability to different devices and network conditions broadens its applicability and helps bridge healthcare disparities. Future work should focus on incorporating temporal information into the model architecture to improve prediction consistency and on further reducing latency. Further research is also needed on user interaction design, integration into surgical workflows, and the impact on clinical outcomes and cost-effectiveness.
Conclusion
This study presents a novel, scalable, and equipment-agnostic framework for deploying real-time surgical AI. The use of lightweight models and a well-optimized web platform enables high-performance prediction streams even over low-bandwidth connections, making the framework suitable for resource-limited settings. The promising results demonstrate the feasibility of AI-assisted surgical decision support and its potential for wider adoption, bridging healthcare disparities and improving surgical safety globally. Future research should focus on refining the models, enhancing user interaction design, and conducting clinical trials to evaluate the impact on surgical outcomes and cost-effectiveness.
Limitations
While the models demonstrated good performance, several limitations warrant consideration. The models were trained on single frames, neglecting potential temporal context within surgical videos, which could affect prediction consistency. While the dataset was diverse, including data from various institutions and countries, additional data, especially from edge cases (complex anatomies, complications), would further enhance model performance and robustness. Traditional computer vision metrics may not fully capture the clinical relevance of the models, and additional validation studies using clinical outcome measures are needed. Finally, this study focuses on technical feasibility and does not fully address the integration into the operating room workflow or implications for patient outcomes.