Jiyang Gao, Zhenheng Yang, Ram Nevatia
Action anticipation aims to detect an action before it happens. Many real-world applications in robotics and surveillance rely on this predictive capability. Current methods address the problem by first anticipating visual representations of future frames and then classifying the anticipated representations into actions. However, their anticipation is based on a single past frame's representation, which ignores the trend in the history. Moreover, they can anticipate only a fixed future time. We propose a Reinforced Encoder-Decoder (RED) network for action anticipation. RED takes multiple history representations as input and learns to anticipate a sequence of future representations. One salient aspect of RED is that a reinforcement module is adopted to provide sequence-level supervision; the reward function is designed to encourage the system to make correct predictions as early as possible. We test RED on the TVSeries, THUMOS-14 and TV-Human-Interaction datasets for action anticipation and achieve state-of-the-art performance on all of them.
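The abstract says the reward is designed to favor correct predictions made as early as possible, but it does not give the formula. The sketch below illustrates the idea with an assumed scheme that is not taken from the paper: a correct prediction at step `t` earns `decay ** t`, and an incorrect one earns nothing. The function names `early_correct_rewards` and `sequence_return` and the `decay` parameter are hypothetical.

```python
from typing import List, Sequence


def early_correct_rewards(
    predictions: Sequence[int], target: int, decay: float = 0.5
) -> List[float]:
    """Per-step rewards for one anticipated sequence.

    Assumed scheme (illustrative only, not the paper's reward):
    a correct prediction at 0-based step t earns decay ** t,
    an incorrect prediction earns 0.
    """
    return [decay ** t if p == target else 0.0 for t, p in enumerate(predictions)]


def sequence_return(
    predictions: Sequence[int], target: int, decay: float = 0.5
) -> float:
    """Sequence-level signal: the sum of the per-step rewards."""
    return sum(early_correct_rewards(predictions, target, decay))


# Two anticipated sequences for the same ground-truth class 3.
early = sequence_return([3, 3, 3], target=3)  # correct from the first step
late = sequence_return([0, 0, 3], target=3)   # correct only at the last step
```

Under this assumed scheme the sequence that is correct from the first step receives a larger return than the one that becomes correct only at the end, which is the behavior a sequence-level reward of this kind is meant to encourage.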
| Task | Dataset | Metric | Value | Model |
|---|---|---|---|---|
| Activity Recognition | EPIC-KITCHENS-55 (Seen test set (S1)) | Top 1 Accuracy - Act. | 8.08 | ED |
| Activity Recognition | EPIC-KITCHENS-55 (Seen test set (S1)) | Top 1 Accuracy - Noun | 16.07 | ED |
| Activity Recognition | EPIC-KITCHENS-55 (Seen test set (S1)) | Top 1 Accuracy - Verb | 29.35 | ED |
| Activity Recognition | EPIC-KITCHENS-55 (Seen test set (S1)) | Top 5 Accuracy - Act. | 18.19 | ED |
| Activity Recognition | EPIC-KITCHENS-55 (Seen test set (S1)) | Top 5 Accuracy - Noun | 38.83 | ED |
| Activity Recognition | EPIC-KITCHENS-55 (Seen test set (S1)) | Top 5 Accuracy - Verb | 74.49 | ED |
| Activity Recognition | EPIC-KITCHENS-55 (Unseen test set (S2)) | Top 1 Accuracy - Act. | 2.65 | ED |
| Activity Recognition | EPIC-KITCHENS-55 (Unseen test set (S2)) | Top 1 Accuracy - Noun | 7.81 | ED |
| Activity Recognition | EPIC-KITCHENS-55 (Unseen test set (S2)) | Top 1 Accuracy - Verb | 22.52 | ED |
| Activity Recognition | EPIC-KITCHENS-55 (Unseen test set (S2)) | Top 5 Accuracy - Act. | 7.57 | ED |
| Activity Recognition | EPIC-KITCHENS-55 (Unseen test set (S2)) | Top 5 Accuracy - Noun | 21.42 | ED |
| Activity Recognition | EPIC-KITCHENS-55 (Unseen test set (S2)) | Top 5 Accuracy - Verb | 62.65 | ED |
| Action Recognition | EPIC-KITCHENS-55 (Seen test set (S1)) | Top 1 Accuracy - Act. | 8.08 | ED |
| Action Recognition | EPIC-KITCHENS-55 (Seen test set (S1)) | Top 1 Accuracy - Noun | 16.07 | ED |
| Action Recognition | EPIC-KITCHENS-55 (Seen test set (S1)) | Top 1 Accuracy - Verb | 29.35 | ED |
| Action Recognition | EPIC-KITCHENS-55 (Seen test set (S1)) | Top 5 Accuracy - Act. | 18.19 | ED |
| Action Recognition | EPIC-KITCHENS-55 (Seen test set (S1)) | Top 5 Accuracy - Noun | 38.83 | ED |
| Action Recognition | EPIC-KITCHENS-55 (Seen test set (S1)) | Top 5 Accuracy - Verb | 74.49 | ED |
| Action Recognition | EPIC-KITCHENS-55 (Unseen test set (S2)) | Top 1 Accuracy - Act. | 2.65 | ED |
| Action Recognition | EPIC-KITCHENS-55 (Unseen test set (S2)) | Top 1 Accuracy - Noun | 7.81 | ED |
| Action Recognition | EPIC-KITCHENS-55 (Unseen test set (S2)) | Top 1 Accuracy - Verb | 22.52 | ED |
| Action Recognition | EPIC-KITCHENS-55 (Unseen test set (S2)) | Top 5 Accuracy - Act. | 7.57 | ED |
| Action Recognition | EPIC-KITCHENS-55 (Unseen test set (S2)) | Top 5 Accuracy - Noun | 21.42 | ED |
| Action Recognition | EPIC-KITCHENS-55 (Unseen test set (S2)) | Top 5 Accuracy - Verb | 62.65 | ED |
| Action Anticipation | EPIC-KITCHENS-55 (Seen test set (S1)) | Top 1 Accuracy - Act. | 8.08 | ED |
| Action Anticipation | EPIC-KITCHENS-55 (Seen test set (S1)) | Top 1 Accuracy - Noun | 16.07 | ED |
| Action Anticipation | EPIC-KITCHENS-55 (Seen test set (S1)) | Top 1 Accuracy - Verb | 29.35 | ED |
| Action Anticipation | EPIC-KITCHENS-55 (Seen test set (S1)) | Top 5 Accuracy - Act. | 18.19 | ED |
| Action Anticipation | EPIC-KITCHENS-55 (Seen test set (S1)) | Top 5 Accuracy - Noun | 38.83 | ED |
| Action Anticipation | EPIC-KITCHENS-55 (Seen test set (S1)) | Top 5 Accuracy - Verb | 74.49 | ED |
| Action Anticipation | EPIC-KITCHENS-55 (Unseen test set (S2)) | Top 1 Accuracy - Act. | 2.65 | ED |
| Action Anticipation | EPIC-KITCHENS-55 (Unseen test set (S2)) | Top 1 Accuracy - Noun | 7.81 | ED |
| Action Anticipation | EPIC-KITCHENS-55 (Unseen test set (S2)) | Top 1 Accuracy - Verb | 22.52 | ED |
| Action Anticipation | EPIC-KITCHENS-55 (Unseen test set (S2)) | Top 5 Accuracy - Act. | 7.57 | ED |
| Action Anticipation | EPIC-KITCHENS-55 (Unseen test set (S2)) | Top 5 Accuracy - Noun | 21.42 | ED |
| Action Anticipation | EPIC-KITCHENS-55 (Unseen test set (S2)) | Top 5 Accuracy - Verb | 62.65 | ED |
| 2D Human Pose Estimation | EPIC-KITCHENS-55 (Seen test set (S1)) | Top 1 Accuracy - Act. | 8.08 | ED |
| 2D Human Pose Estimation | EPIC-KITCHENS-55 (Seen test set (S1)) | Top 1 Accuracy - Noun | 16.07 | ED |
| 2D Human Pose Estimation | EPIC-KITCHENS-55 (Seen test set (S1)) | Top 1 Accuracy - Verb | 29.35 | ED |
| 2D Human Pose Estimation | EPIC-KITCHENS-55 (Seen test set (S1)) | Top 5 Accuracy - Act. | 18.19 | ED |
| 2D Human Pose Estimation | EPIC-KITCHENS-55 (Seen test set (S1)) | Top 5 Accuracy - Noun | 38.83 | ED |
| 2D Human Pose Estimation | EPIC-KITCHENS-55 (Seen test set (S1)) | Top 5 Accuracy - Verb | 74.49 | ED |
| 2D Human Pose Estimation | EPIC-KITCHENS-55 (Unseen test set (S2)) | Top 1 Accuracy - Act. | 2.65 | ED |
| 2D Human Pose Estimation | EPIC-KITCHENS-55 (Unseen test set (S2)) | Top 1 Accuracy - Noun | 7.81 | ED |
| 2D Human Pose Estimation | EPIC-KITCHENS-55 (Unseen test set (S2)) | Top 1 Accuracy - Verb | 22.52 | ED |
| 2D Human Pose Estimation | EPIC-KITCHENS-55 (Unseen test set (S2)) | Top 5 Accuracy - Act. | 7.57 | ED |
| 2D Human Pose Estimation | EPIC-KITCHENS-55 (Unseen test set (S2)) | Top 5 Accuracy - Noun | 21.42 | ED |
| 2D Human Pose Estimation | EPIC-KITCHENS-55 (Unseen test set (S2)) | Top 5 Accuracy - Verb | 62.65 | ED |
| Action Recognition In Videos | EPIC-KITCHENS-55 (Seen test set (S1)) | Top 1 Accuracy - Act. | 8.08 | ED |
| Action Recognition In Videos | EPIC-KITCHENS-55 (Seen test set (S1)) | Top 1 Accuracy - Noun | 16.07 | ED |
| Action Recognition In Videos | EPIC-KITCHENS-55 (Seen test set (S1)) | Top 1 Accuracy - Verb | 29.35 | ED |
| Action Recognition In Videos | EPIC-KITCHENS-55 (Seen test set (S1)) | Top 5 Accuracy - Act. | 18.19 | ED |
| Action Recognition In Videos | EPIC-KITCHENS-55 (Seen test set (S1)) | Top 5 Accuracy - Noun | 38.83 | ED |
| Action Recognition In Videos | EPIC-KITCHENS-55 (Seen test set (S1)) | Top 5 Accuracy - Verb | 74.49 | ED |
| Action Recognition In Videos | EPIC-KITCHENS-55 (Unseen test set (S2)) | Top 1 Accuracy - Act. | 2.65 | ED |
| Action Recognition In Videos | EPIC-KITCHENS-55 (Unseen test set (S2)) | Top 1 Accuracy - Noun | 7.81 | ED |
| Action Recognition In Videos | EPIC-KITCHENS-55 (Unseen test set (S2)) | Top 1 Accuracy - Verb | 22.52 | ED |
| Action Recognition In Videos | EPIC-KITCHENS-55 (Unseen test set (S2)) | Top 5 Accuracy - Act. | 7.57 | ED |
| Action Recognition In Videos | EPIC-KITCHENS-55 (Unseen test set (S2)) | Top 5 Accuracy - Noun | 21.42 | ED |
| Action Recognition In Videos | EPIC-KITCHENS-55 (Unseen test set (S2)) | Top 5 Accuracy - Verb | 62.65 | ED |