
Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.


MuST: Multi-Scale Transformers for Surgical Phase Recognition

Alejandra Pérez, Santiago Rodríguez, Nicolás Ayobi, Nicolás Aparicio, Eugénie Dessevres, Pablo Arbeláez

2024-07-24 | Tasks: Surgical phase recognition, Online surgical phase recognition
Links: Paper, PDF, Code (official)

Abstract

Phase recognition in surgical videos is crucial for enhancing computer-aided surgical systems, as it enables automated understanding of sequential procedural stages. Existing methods often rely on fixed temporal windows for video analysis to identify dynamic surgical phases. Thus, they struggle to simultaneously capture the short-, mid-, and long-term information necessary to fully understand complex surgical procedures. To address these issues, we propose Multi-Scale Transformers for Surgical Phase Recognition (MuST), a novel Transformer-based approach that combines a Multi-Term Frame Encoder with a Temporal Consistency Module to capture information across multiple temporal scales of a surgical video. Our Multi-Term Frame Encoder computes interdependencies across a hierarchy of temporal scales by sampling sequences at increasing strides around the frame of interest. Furthermore, we employ a long-term Transformer encoder over the frame embeddings to further enhance long-term reasoning. MuST achieves higher performance than previous state-of-the-art methods on three different public benchmarks.
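The idea of "sampling sequences at increasing strides around the frame of interest" can be illustrated with a small index-sampling sketch. This is not the official MuST code: the helper name `multi_term_indices`, the window length, the stride values, the clamping of out-of-range indices to the video bounds, and the `online` flag (which keeps only past frames, as an online setting would require) are all illustrative assumptions.

```python
from typing import List


def multi_term_indices(center: int, num_frames: int, seq_len: int,
                       strides: List[int], online: bool = False) -> List[List[int]]:
    """Hypothetical helper: one window of frame indices per temporal stride.

    Each window has `seq_len` frames spaced `stride` apart. Offline windows
    are centered on `center`; online windows end at `center` (past frames
    only). Indices outside [0, num_frames - 1] are clamped, which is an
    assumed padding choice, not necessarily what the paper does.
    """
    half = seq_len // 2
    if online:
        offsets = range(-(seq_len - 1), 1)      # causal: only current and past frames
    else:
        offsets = range(-half, seq_len - half)  # centered on the frame of interest
    windows = []
    for stride in strides:
        window = []
        for k in offsets:
            idx = center + k * stride
            idx = min(max(idx, 0), num_frames - 1)  # clamp to the video bounds
            window.append(idx)
        windows.append(window)
    return windows


if __name__ == "__main__":
    # Short-, mid-, and long-term views of frame 100 in a 1000-frame video.
    for w in multi_term_indices(center=100, num_frames=1000, seq_len=5, strides=[1, 4, 16]):
        print(w)
```

Each window covers a longer time span with the same number of frames, so a per-scale encoder sees fine detail at stride 1 and broader context at larger strides; how the resulting sequences are fused is left out here.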

Results

Task                        Dataset             Metric  Value  Model
Surgical phase recognition  GraSP               mAP     79.14  MuST
Surgical phase recognition  HeiChole Benchmark  F1      77.25  MuST
Surgical phase recognition  MISAW               mAP     98.08  MuST
Surgical phase recognition  Cholec80            F1      85.57  MuST
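Two of the results above are reported as mAP. A common definition for frame-wise phase classification is the mean, over phases, of each phase's average precision. The sketch below is a minimal, dependency-free illustration of that definition only. It is an assumption, not the official evaluation script of GraSP or MISAW, which may differ (for example, per-video averaging or interpolated AP). Ties in scores are broken by frame order, and phases absent from the evaluated frames are skipped. Both are assumed choices.

```python
from typing import List, Sequence


def average_precision(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Non-interpolated AP for one class: mean precision at each positive,
    with frames ranked by descending score (ties keep original order)."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    hits, precisions = 0, []
    for rank, i in enumerate(order, start=1):
        if labels[i]:
            hits += 1
            precisions.append(hits / rank)
    return sum(precisions) / len(precisions) if precisions else 0.0


def mean_average_precision(probs: List[List[float]], targets: List[int]) -> float:
    """Mean of per-phase AP. probs[f][c] is the score of phase c for frame f;
    targets[f] is the ground-truth phase index of frame f."""
    num_classes = len(probs[0])
    aps = []
    for c in range(num_classes):
        labels = [1 if t == c else 0 for t in targets]
        if not any(labels):
            continue  # assumption: skip phases that never occur in the evaluated frames
        aps.append(average_precision([p[c] for p in probs], labels))
    return sum(aps) / len(aps)


if __name__ == "__main__":
    probs = [[0.9, 0.1], [0.2, 0.8], [0.6, 0.4]]
    targets = [0, 1, 0]
    print(mean_average_precision(probs, targets))
```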

Related Papers

Holistic Surgical Phase Recognition with Hierarchical Input Dependent State Space Models (2025-06-26)
Recognizing Surgical Phases Anywhere: Few-Shot Test-time Adaptation and Task-graph Guided Refinement (2025-06-25)
Meta-SurDiff: Classification Diffusion Model Optimized by Meta Learning is Reliable for Online Surgical Phase Recognition (2025-06-17)
ReSW-VL: Representation Learning for Surgical Workflow Analysis Using Vision-Language Model (2025-05-19)
Surgeons vs. Computer Vision: A comparative analysis on surgical phase recognition capabilities (2025-04-26)
Federated EndoViT: Pretraining Vision Transformers via Federated Learning on Endoscopic Image Collections (2025-04-23)
Surg-3M: A Dataset and Foundation Model for Perception in Surgical Settings (2025-03-25)
fine-CLIP: Enhancing Zero-Shot Fine-Grained Surgical Action Recognition with Vision-Language Models (2025-03-25)