Adrien Deliège, Anthony Cioppa, Silvio Giancola, Meisam J. Seikavandi, Jacob V. Dueholm, Kamal Nasrollahi, Bernard Ghanem, Thomas B. Moeslund, Marc Van Droogenbroeck
Understanding broadcast videos is a challenging task in computer vision, as it requires generic reasoning capabilities to appreciate the content offered by the video editing. In this work, we propose SoccerNet-v2, a novel large-scale corpus of manual annotations for the SoccerNet video dataset, along with open challenges to encourage more research in soccer understanding and broadcast production. Specifically, we release around 300k annotations within SoccerNet's 500 untrimmed broadcast soccer videos. We extend current soccer tasks to include action spotting and camera shot segmentation with boundary detection, and we define a novel replay grounding task. For each task, we provide and discuss benchmark results, reproducible with our open-source adapted implementations of the most relevant works in the field. SoccerNet-v2 is presented to the broader research community to help push computer vision closer to automatic solutions for more general video understanding and production purposes.
| Task | Dataset | Metric | Value | Model |
|---|---|---|---|---|
| Video | SoccerNet-v2 | Average-AP | 41.8 | CALF (Cioppa et al.) |
| Video | SoccerNet-v2 | Average-AP | 24.3 | NetVLAD (Giancola et al.) |
| Video | SoccerNet-v2 | Average-mAP | 39.9 | AudioVid (Vanderplaetse et al.) |
| Video | SoccerNet-v2 | Average-mAP | 31.4 | NetVLAD (Giancola et al.) |
| Scene Parsing | SoccerNet-v2 | mIoU | 47.3 | CALF (Cioppa et al.) |
| Scene Parsing | SoccerNet-v2 | mIoU | 35.8 | Baseline |
| Video Semantic Segmentation | SoccerNet-v2 | mIoU | 47.3 | CALF (Cioppa et al.) |
| Video Semantic Segmentation | SoccerNet-v2 | mIoU | 35.8 | Baseline |
| Scene Understanding | SoccerNet-v2 | mIoU | 47.3 | CALF (Cioppa et al.) |
| Scene Understanding | SoccerNet-v2 | mIoU | 35.8 | Baseline |
| Video Retrieval | SoccerNet-v2 | Average-AP | 41.8 | CALF (Cioppa et al.) |
| Video Retrieval | SoccerNet-v2 | Average-AP | 24.3 | NetVLAD (Giancola et al.) |
| Video Segmentation | SoccerNet-v2 | mAP | 78.5 | Histogram (Scikit-Video) |
| Video Segmentation | SoccerNet-v2 | mAP | 64.0 | Intensity (Scikit-Video) |
| Video Segmentation | SoccerNet-v2 | mAP | 62.2 | Content (PySceneDetect) |
| Video Segmentation | SoccerNet-v2 | mAP | 59.6 | CALF (Cioppa et al.) |
| 2D Semantic Segmentation | SoccerNet-v2 | mIoU | 47.3 | CALF (Cioppa et al.) |
| 2D Semantic Segmentation | SoccerNet-v2 | mIoU | 35.8 | Baseline |
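
Several rows above report mIoU. As a rough illustration of what that metric measures, the following minimal sketch computes a mean intersection-over-union between two sequences of per-frame class labels. It assumes integer class ids and averages only over classes that appear in either sequence; the function name `mean_iou` and these conventions are illustrative assumptions, not the evaluation code used to produce the numbers in the table.

```python
def mean_iou(pred, target, num_classes):
    """Mean IoU over classes for two equal-length sequences of per-frame labels.

    Illustrative sketch only: classes absent from both sequences are skipped,
    which is one common convention but not necessarily the benchmark's.
    """
    if len(pred) != len(target):
        raise ValueError("pred and target must have the same length")

    intersection = [0] * num_classes
    union = [0] * num_classes

    for p, t in zip(pred, target):
        if p == t:
            # The frame is in both the predicted and the true set of class p.
            intersection[p] += 1
            union[p] += 1
        else:
            # The frame is in the predicted set of p and the true set of t.
            union[p] += 1
            union[t] += 1

    ious = [intersection[c] / union[c] for c in range(num_classes) if union[c] > 0]
    return sum(ious) / len(ious) if ious else 0.0


if __name__ == "__main__":
    # Hypothetical labels for four frames and two classes.
    print(mean_iou([0, 0, 1, 1], [0, 1, 1, 1], num_classes=2))
```

In this toy input, class 0 has an IoU of 1/2 and class 1 has an IoU of 2/3, so the mean is 7/12.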