Yaser Souri, Yazan Abu Farha, Fabien Despinoy, Gianpiero Francesca, Juergen Gall
We introduce FIFA, a fast approximate inference method for action segmentation and alignment. Unlike previous approaches, FIFA does not rely on expensive dynamic programming for inference. Instead, it uses an approximate differentiable energy function that can be minimized using gradient descent. FIFA is a general approach that can replace exact inference, speeding it up by more than 5 times while maintaining performance. FIFA is an anytime inference algorithm that provides a better speed vs. accuracy trade-off than exact inference. We apply FIFA on top of state-of-the-art approaches for weakly supervised action segmentation and alignment, as well as for fully supervised action segmentation. FIFA achieves state-of-the-art results on most metrics on two action segmentation datasets.
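The core idea above, replacing dynamic-programming inference with gradient descent on a differentiable energy, can be illustrated with a small sketch in the alignment setting, where the ordered transcript of actions is known and only the segment lengths must be inferred. This is not the authors' implementation. The sigmoid-based soft segment masks, the `sharpness` parameter, the parameterization of lengths as `T * softmax(z)`, the learning rate, and all function names are our own assumptions, and the energy keeps only a frame-likelihood term (no length prior). Gradients are derived by hand so that only NumPy is needed.

```python
import numpy as np

# Minimal sketch (not the authors' code): approximate alignment by gradient
# descent on a smooth energy instead of dynamic programming.


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()


def energy_and_grad(z, log_probs, transcript, sharpness):
    """Smooth alignment energy and its gradient w.r.t. the length logits z.

    Hypothetical parameterization: segment lengths = T * softmax(z), so they
    stay positive and always sum to the number of frames T.
    """
    T = log_probs.shape[0]
    p = softmax(z)
    ends = np.cumsum(T * p)            # e_n
    starts = ends - T * p              # s_n
    centers = np.arange(T) + 0.5       # frame t spans [t, t + 1)
    # Soft mask of segment n over frame t: sigmoid "rise" at s_n times "fall" at e_n.
    rise = sigmoid(sharpness * (centers[None, :] - starts[:, None]))  # (N, T)
    fall = sigmoid(sharpness * (ends[:, None] - centers[None, :]))    # (N, T)
    cost = -log_probs[:, transcript].T  # (N, T): cost of frame t in segment n
    energy = np.sum(rise * fall * cost)

    # Partial derivatives of the energy w.r.t. each start s_n and end e_n.
    dE_ds = -sharpness * np.sum(cost * rise * (1 - rise) * fall, axis=1)
    dE_de = sharpness * np.sum(cost * rise * fall * (1 - fall), axis=1)
    # e_n = sum_{j<=n} l_j and s_n = sum_{j<n} l_j, hence
    # dE/dl_j = sum_{n>=j} dE/de_n + sum_{n>j} dE/ds_n.
    grad_l = np.cumsum(dE_de[::-1])[::-1] + np.cumsum(dE_ds[::-1])[::-1] - dE_ds
    # Chain rule through l = T * softmax(z).
    grad_z = T * p * (grad_l - np.dot(p, grad_l))
    return energy, grad_z


def approximate_alignment(log_probs, transcript, steps=500, lr=0.002, sharpness=2.0):
    """Plain gradient descent on the smooth energy; returns segment lengths in frames."""
    z = np.zeros(len(transcript))  # start from equal-length segments
    for _ in range(steps):  # stopping early still yields a valid segmentation
        _, grad = energy_and_grad(z, log_probs, transcript, sharpness)
        z -= lr * grad
    return log_probs.shape[0] * softmax(z)


def lengths_to_labels(lengths, transcript, T):
    """Hard frame labels: frame t gets the segment whose [s_n, e_n) holds t + 0.5."""
    ends = np.cumsum(lengths)
    seg = np.searchsorted(ends, np.arange(T) + 0.5, side="right")
    return np.asarray(transcript)[np.minimum(seg, len(transcript) - 1)]


if __name__ == "__main__":
    # Toy input: 30 frames, transcript [0, 1, 2], true lengths 5, 15, 10.
    T = 30
    true_labels = np.array([0] * 5 + [1] * 15 + [2] * 10)
    probs = np.full((T, 3), 0.1)
    probs[np.arange(T), true_labels] = 0.8
    lengths = approximate_alignment(np.log(probs), [0, 1, 2])
    print(np.round(lengths, 2))
    print(lengths_to_labels(lengths, [0, 1, 2], T))
```

Because every iterate already defines a complete segmentation, the loop can be stopped after any number of steps, which loosely mirrors the anytime property described in the abstract; in this sketch, `steps` is the knob that trades accuracy for speed.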
| Task | Dataset | Metric | Value | Model | Supervision |
|---|---|---|---|---|---|
| Action Segmentation, Action Localization | Breakfast | Acc | 68.6 | FIFA + MS-TCN | Full |
| Action Segmentation, Action Localization | Breakfast | Average F1 | 66.8 | FIFA + MS-TCN | Full |
| Action Segmentation, Action Localization | Breakfast | Edit | 78.5 | FIFA + MS-TCN | Full |
| Action Segmentation, Action Localization | Breakfast | F1@10% | 75.5 | FIFA + MS-TCN | Full |
| Action Segmentation, Action Localization | Breakfast | F1@25% | 70.2 | FIFA + MS-TCN | Full |
| Action Segmentation, Action Localization | Breakfast | F1@50% | 54.8 | FIFA + MS-TCN | Full |
| Action Segmentation, Action Localization | Breakfast | Acc | 51.3 | FIFA + MuCon | Weak |