Deep Reinforcement Learning for Surgical Gesture Segmentation and Classification

Daochang Liu, Tingting Jiang

2018-06-21
Action Segmentation, Sequential Decision Making, Reinforcement Learning, Segmentation, Decision Making, Surgical Gesture Recognition, General Classification, Classification


Abstract

Recognition of surgical gestures is crucial for surgical skill assessment and efficient surgery training. Prior work on this task is based either on variants of graphical models such as HMMs and CRFs, or on deep learning models such as Recurrent Neural Networks and Temporal Convolutional Networks. Most current approaches suffer from over-segmentation and therefore from low segment-level edit scores. In contrast, we present a fundamentally different methodology that models the task as a sequential decision-making process. An intelligent agent is trained using reinforcement learning with hierarchical features from a deep model. Temporal consistency is integrated into our action design and reward mechanism to reduce over-segmentation errors. Experiments on the JIGSAWS dataset demonstrate that the proposed method outperforms state-of-the-art methods in terms of edit score and is on par with them in frame-wise accuracy. Our code will be released later.
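To make the contrast between frame-wise accuracy and the segment-level edit score concrete, the following is a minimal sketch of how the two metrics are commonly computed in the action segmentation literature. It is not the authors' evaluation code: the function names, the toy gesture labels `G1` and `G2`, and the normalization by the longer segment sequence are assumptions made for illustration.

```python
from itertools import groupby


def segments(frame_labels):
    """Collapse per-frame labels into the ordered list of segment labels."""
    return [label for label, _ in groupby(frame_labels)]


def levenshtein(a, b):
    """Edit distance between two label sequences (insert, delete, substitute)."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, 1):
        curr = [i]
        for j, y in enumerate(b, 1):
            curr.append(min(prev[j] + 1,              # delete x
                            curr[j - 1] + 1,          # insert y
                            prev[j - 1] + (x != y)))  # substitute or match
        prev = curr
    return prev[-1]


def edit_score(pred, truth):
    """Segment-level edit score in [0, 100]; higher is better (assumed normalization)."""
    p, t = segments(pred), segments(truth)
    return 100.0 * (1 - levenshtein(p, t) / max(len(p), len(t), 1))


def frame_accuracy(pred, truth):
    """Percentage of frames whose predicted label matches the ground truth."""
    return 100.0 * sum(p == t for p, t in zip(pred, truth)) / len(truth)


# Toy example (hypothetical labels): one spurious frame splits the first gesture.
truth = ["G1"] * 10 + ["G2"] * 10
pred = ["G1"] * 4 + ["G2"] + ["G1"] * 5 + ["G2"] * 10

print(frame_accuracy(pred, truth))  # 95.0
print(edit_score(pred, truth))      # 50.0
```

In this toy case a single mislabeled frame costs only five points of frame-wise accuracy but halves the edit score, which illustrates why over-segmentation is penalized so heavily at the segment level.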

Results

Task	Dataset	Metric	Value	Model
Action Localization	JIGSAWS	Accuracy	81.43	RL (full)
Action Localization	JIGSAWS	Edit Distance	87.96	RL (full)
Action Localization	JIGSAWS	F1@10	92	RL (full)
Action Localization	JIGSAWS	F1@25	90.5	RL (full)
Action Localization	JIGSAWS	F1@50	82.2	RL (full)
Action Segmentation	JIGSAWS	Accuracy	81.43	RL (full)
Action Segmentation	JIGSAWS	Edit Distance	87.96	RL (full)
Action Segmentation	JIGSAWS	F1@10	92	RL (full)
Action Segmentation	JIGSAWS	F1@25	90.5	RL (full)
Action Segmentation	JIGSAWS	F1@50	82.2	RL (full)
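
The F1@k rows above are not defined on this page. A minimal sketch of the segmental F1 score as it is commonly defined for action segmentation is given below, assuming that a predicted segment counts as a true positive when its intersection-over-union (IoU) with an unmatched ground-truth segment of the same label is at least k percent. The function names and the toy labels are hypothetical, and the exact protocol behind the reported numbers may differ.

```python
from itertools import groupby


def to_segments(frame_labels):
    """Turn per-frame labels into (label, start, end) triples, end exclusive."""
    segs, start = [], 0
    for label, run in groupby(frame_labels):
        length = len(list(run))
        segs.append((label, start, start + length))
        start += length
    return segs


def f1_at_k(pred, truth, k):
    """Segmental F1 (0 to 100) at an IoU threshold of k percent (assumed definition)."""
    p_segs, t_segs = to_segments(pred), to_segments(truth)
    matched = [False] * len(t_segs)
    tp = fp = 0
    for label, ps, pe in p_segs:
        best_iou, best_idx = 0.0, -1
        for idx, (t_label, ts, te) in enumerate(t_segs):
            if t_label != label:
                continue
            inter = max(0, min(pe, te) - max(ps, ts))
            # For disjoint intervals this is the hull, but inter is 0 there anyway.
            union = max(pe, te) - min(ps, ts)
            iou = inter / union
            if iou > best_iou:
                best_iou, best_idx = iou, idx
        if best_idx >= 0 and best_iou >= k / 100 and not matched[best_idx]:
            tp += 1
            matched[best_idx] = True  # each ground-truth segment is matched once
        else:
            fp += 1
    fn = matched.count(False)
    if tp == 0:
        return 0.0
    precision, recall = tp / (tp + fp), tp / (tp + fn)
    return 100.0 * 2 * precision * recall / (precision + recall)


# Toy example (hypothetical labels): an over-segmented first gesture.
truth = ["G1"] * 10 + ["G2"] * 10
pred = ["G1"] * 4 + ["G2"] + ["G1"] * 5 + ["G2"] * 10

print(round(f1_at_k(pred, truth, 50), 2))  # 66.67
```

Because every extra predicted fragment becomes a false positive, this metric, like the edit score, penalizes over-segmentation even when most frames are labeled correctly.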
