Chinedu Innocent Nwoye, Deepak Alapatt, Tong Yu, Armine Vardazaryan, Fangfang Xia, Zixuan Zhao, Tong Xia, Fucang Jia, Yuxuan Yang, Hao Wang, Derong Yu, Guoyan Zheng, Xiaotian Duan, Neil Getty, Ricardo Sanchez-Matilla, Maria Robu, Li Zhang, Huabin Chen, Jiacheng Wang, Liansheng Wang, Bokai Zhang, Beerend Gerats, Sista Raviteja, Rachana Sathish, Rong Tao, Satoshi Kondo, Winnie Pang, Hongliang Ren, Julian Ronald Abbing, Mohammad Hasan Sarhan, Sebastian Bodenstedt, Nithya Bhasker, Bruno Oliveira, Helena R. Torres, Li Ling, Finn Gaida, Tobias Czempiel, João L. Vilaça, Pedro Morais, Jaime Fonseca, Ruby Mae Egging, Inge Nicole Wijma, Chen Qian, GuiBin Bian, Zhen Li, Velmurugan Balasubramanian, Debdoot Sheet, Imanol Luengo, Yuanbo Zhu, Shuai Ding, Jakob-Anton Aschenbrenner, Nicolas Elini van der Kar, Mengya Xu, Mobarakol Islam, Lalithkumar Seenivasan, Alexander Jenke, Danail Stoyanov, Didier Mutter, Pietro Mascagni, Barbara Seeliger, Cristians Gonzalez, Nicolas Padoy
Context-aware decision support in the operating room can foster surgical safety and efficiency by leveraging real-time feedback from surgical workflow analysis. Most existing works recognize surgical activities at a coarse-grained level, such as phases, steps, or events, leaving out the fine-grained interaction details of the surgical activity; yet these details are needed for more helpful AI assistance in the operating room. Recognizing surgical actions as <instrument, verb, target> triplets delivers comprehensive details about the activities taking place in surgical videos. This paper presents CholecTriplet2021: an endoscopic vision challenge organized at MICCAI 2021 for the recognition of surgical action triplets in laparoscopic videos. The challenge granted private access to the large-scale CholecT50 dataset, which is annotated with action triplet information. In this paper, we present the challenge setup and an assessment of the state-of-the-art deep learning methods proposed by the participants during the challenge. A total of 4 baseline methods from the challenge organizers and 19 new deep learning algorithms from competing teams are presented, recognizing surgical action triplets directly from surgical videos and achieving mean average precision (mAP) ranging from 4.2% to 38.1%. This study also analyzes the significance of the obtained results, performs a thorough methodological comparison and in-depth result analysis of the presented approaches, and proposes a novel ensemble method for enhanced recognition. Our analysis shows that surgical workflow analysis is not yet solved, and it highlights interesting directions for future research on fine-grained surgical activity recognition, which is of utmost importance for the development of AI in surgery.
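The ranking metric below is mean average precision (mAP) over triplet classes, treating each video frame as a multi-label sample. The official evaluation uses the organizers' toolkit; the following is only a minimal self-contained sketch of per-class AP averaged into mAP, not the challenge's exact implementation (function names and the tie-breaking behavior are illustrative assumptions):

```python
import numpy as np

def average_precision(scores, labels):
    """AP for one triplet class: precision summed at each positive's
    rank, divided by the number of positives (illustrative sketch)."""
    order = np.argsort(-scores)          # rank frames by descending score
    labels = labels[order]
    hits = np.cumsum(labels)             # positives retrieved at each rank
    precisions = hits / (np.arange(len(labels)) + 1)
    n_pos = labels.sum()
    if n_pos == 0:
        return np.nan                    # class absent; excluded from mAP
    return float((precisions * labels).sum() / n_pos)

def mean_average_precision(score_matrix, label_matrix):
    """mAP over classes: mean of per-class APs, skipping classes
    with no positive frames (frames x classes matrices)."""
    aps = [average_precision(score_matrix[:, c], label_matrix[:, c])
           for c in range(score_matrix.shape[1])]
    aps = [a for a in aps if not np.isnan(a)]
    return float(np.mean(aps))
```

A model scoring every positive frame above every negative one reaches an AP of 1.0 for that class; the leaderboard values below are these per-class APs averaged over the 100 triplet classes, reported as percentages.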
| Task | Dataset | Metric | Value | Model |
|---|---|---|---|---|
| Activity Recognition | CholecT50 (Challenge) | mAP | 38.1 | Team Trequartista |
| Activity Recognition | CholecT50 (Challenge) | mAP | 36.9 | Team 2Ai |
| Activity Recognition | CholecT50 (Challenge) | mAP | 35.8 | Team SIAT CAMI |
| Activity Recognition | CholecT50 (Challenge) | mAP | 32.9 | Team HFUT-MedIA |
| Activity Recognition | CholecT50 (Challenge) | mAP | 32.7 | Rendezvous (TensorFlow v1) |
| Activity Recognition | CholecT50 (Challenge) | mAP | 32.0 | Team CITI SJTU |
| Activity Recognition | CholecT50 (Challenge) | mAP | 31.9 | Team ANL |
| Activity Recognition | CholecT50 (Challenge) | mAP | 31.7 | Team Digital Surgery |
| Activity Recognition | CholecT50 (Challenge) | mAP | 26.7 | Team Casia Robotics |
| Activity Recognition | CholecT50 (Challenge) | mAP | 26.3 | Team Lsgroup |
| Activity Recognition | CholecT50 (Challenge) | mAP | 25.6 | Team J&M |
| Activity Recognition | CholecT50 (Challenge) | mAP | 25.5 | Attention Tripnet (TensorFlow v1) |
| Activity Recognition | CholecT50 (Challenge) | mAP | 25.2 | Team Ceaiik |
| Activity Recognition | CholecT50 (Challenge) | mAP | 24.8 | Team SJTU-IMR |
| Activity Recognition | CholecT50 (Challenge) | mAP | 18.4 | Team SK |
| Activity Recognition | CholecT50 (Challenge) | mAP | 18.1 | Team MMLAB |
| Activity Recognition | CholecT50 (Challenge) | mAP | 16.0 | Team Band of Broeders |
| Activity Recognition | CholecT50 (Challenge) | mAP | 10.4 | Team NCT-TSO |
| Activity Recognition | CholecT50 (Challenge) | mAP | 9.8 | Team HFUT-NUS |
| Activity Recognition | CholecT50 (Challenge) | mAP | 9.3 | Team CAMP |
| Activity Recognition | CholecT50 (Challenge) | mAP | 4.2 | Team Med Recognizer |
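The abstract also mentions a novel ensemble method built on top of the team submissions. Its exact formulation is not given here; a minimal sketch of one common ensembling baseline, weighted averaging of per-model class-probability matrices (frames x triplet classes), with all names being illustrative assumptions:

```python
import numpy as np

def ensemble_scores(model_scores, weights=None):
    """Weighted average of per-model probability matrices.

    model_scores: list of (frames, classes) arrays, one per model.
    weights: optional per-model weights (e.g. validation mAP);
             uniform if omitted. Weights are normalized to sum to 1.
    """
    stack = np.stack(model_scores)       # (models, frames, classes)
    if weights is None:
        weights = np.ones(len(model_scores))
    weights = np.asarray(weights, dtype=float)
    weights = weights / weights.sum()
    # contract the model axis against the weight vector
    return np.tensordot(weights, stack, axes=1)
```

Weighting by each model's validation score is a simple way to let stronger submissions dominate the average without discarding weaker ones entirely.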