Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.

Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.

Rendezvous in Time: An Attention-based Temporal Fusion approach for Surgical Triplet Recognition

Saurav Sharma, Chinedu Innocent Nwoye, Didier Mutter, Nicolas Padoy

2022-11-30 | Action Triplet Recognition

Abstract

One of the recent advances in surgical AI is the recognition of surgical activities as triplets of (instrument, verb, target). Although they provide detailed information for computer-assisted intervention, current triplet recognition approaches rely only on single-frame features. Exploiting the temporal cues from earlier frames would improve the recognition of surgical action triplets from videos. In this paper, we propose Rendezvous in Time (RiT), a deep learning model that extends the state-of-the-art model, Rendezvous, with temporal modeling. Focusing more on the verbs, our RiT explores the connectedness of current and past frames to learn temporal attention-based features for enhanced triplet recognition. We validate our proposal on the challenging surgical triplet dataset, CholecT45, demonstrating improved recognition of the verb and triplet, along with other interactions involving the verb such as (instrument, verb). Qualitative results show that RiT produces smoother predictions for most triplet instances than state-of-the-art methods. We present a novel attention-based approach that leverages the temporal fusion of video frames to model the evolution of surgical actions and exploit their benefits for surgical triplet recognition.
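
To make the idea of attending from the current frame to past frames concrete, here is a minimal sketch of attention-based temporal fusion. It is not the RiT architecture: the function name `temporal_attention_fusion`, the toy two-dimensional features, and the use of plain scaled dot-product attention without learned projections are all assumptions made for illustration.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def temporal_attention_fusion(frame_features):
    """Fuse per-frame feature vectors of a clip into a single vector.

    frame_features: list of equal-length lists, ordered from the oldest
    frame to the current frame (last element). Hypothetical helper, not
    taken from the paper's code.
    """
    query = frame_features[-1]       # the current frame acts as the query
    dim = len(query)
    scale = math.sqrt(dim)           # standard scaled dot-product scaling
    # Similarity of the current frame to every frame in the clip.
    scores = [
        sum(q * k for q, k in zip(query, feat)) / scale
        for feat in frame_features
    ]
    weights = softmax(scores)
    # Attention-weighted sum of the frame features.
    fused = [
        sum(w * feat[d] for w, feat in zip(weights, frame_features))
        for d in range(dim)
    ]
    return fused, weights


# Toy clip: three past frames followed by the current frame.
clip = [[0.1, 0.9], [0.2, 0.8], [1.0, 0.0], [0.9, 0.1]]
fused, weights = temporal_attention_fusion(clip)
```

In this toy clip, past frames whose features resemble the current frame receive larger weights, so the fused vector is pulled toward temporally consistent evidence. A trained model would learn the query, key, and value projections rather than use raw features.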

Results

Task | Dataset | Metric | Value | Model
Activity Recognition | CholecT50 (Challenge) | mAP | 30.94 | RiT: Rendezvous-in-Time
Action Recognition | CholecT50 (Challenge) | mAP | 30.94 | RiT: Rendezvous-in-Time

Related Papers

Federated EndoViT: Pretraining Vision Transformers via Federated Learning on Endoscopic Image Collections (2025-04-23)
Surgical Triplet Recognition via Diffusion Model (2024-06-19)
EndoViT: pretraining vision transformers on a large collection of endoscopic images (2024-04-03)
CholecTriplet2022: Show me a tool and tell me the triplet -- an endoscopic vision challenge for surgical action triplet detection (2023-02-13)
Why Deep Surgical Models Fail?: Revisiting Surgical Action Triplet Recognition through the Lens of Robustness (2022-09-18)
Dissecting Self-Supervised Learning Methods for Surgical Computer Vision (2022-07-01)
Data Splits and Metrics for Method Benchmarking on Surgical Action Triplet Datasets (2022-04-11)
CholecTriplet2021: A benchmark challenge for surgical action triplet recognition (2022-04-10)