Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.

Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.

Towards Streaming Perception

Mengtian Li, Yu-Xiong Wang, Deva Ramanan

2020-05-21 | ECCV 2020 8
Tasks: Real-Time Multi-Object Tracking, Real-Time Object Detection, Motion Forecasting, Semantic Segmentation, Scheduling, Instance Segmentation, Object Detection

Abstract

Embodied perception refers to the ability of an autonomous agent to perceive its environment so that it can (re)act. The responsiveness of the agent is largely governed by latency of its processing pipeline. While past work has studied the algorithmic trade-off between latency and accuracy, there has not been a clear metric to compare different methods along the Pareto optimal latency-accuracy curve. We point out a discrepancy between standard offline evaluation and real-time applications: by the time an algorithm finishes processing a particular frame, the surrounding world has changed. To these ends, we present an approach that coherently integrates latency and accuracy into a single metric for real-time online perception, which we refer to as "streaming accuracy". The key insight behind this metric is to jointly evaluate the output of the entire perception stack at every time instant, forcing the stack to consider the amount of streaming data that should be ignored while computation is occurring. More broadly, building upon this metric, we introduce a meta-benchmark that systematically converts any single-frame task into a streaming perception task. We focus on the illustrative tasks of object detection and instance segmentation in urban video streams, and contribute a novel dataset with high-quality and temporally-dense annotations. Our proposed solutions and their empirical analysis demonstrate a number of surprising conclusions: (1) there exists an optimal "sweet spot" that maximizes streaming accuracy along the Pareto optimal latency-accuracy curve, (2) asynchronous tracking and future forecasting naturally emerge as internal representations that enable streaming perception, and (3) dynamic scheduling can be used to overcome temporal aliasing, yielding the paradoxical result that latency is sometimes minimized by sitting idle and "doing nothing".
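The streaming setup described in the abstract can be illustrated with a small simulation. This is a minimal sketch, not the authors' evaluation code: it replaces AP with a simple proxy ("staleness", the number of frames between a query frame and the frame behind the latest finished output), assumes a fixed runtime, and measures time in integer ticks to avoid floating-point edge cases. The function names `simulate` and `mean_staleness` and all parameter values are hypothetical.

```python
from bisect import bisect_right


def simulate(runtime, frame_interval, n_frames, wait_for_next_frame):
    """Run a single-worker perception loop over a fixed-rate stream.

    Frame i arrives at time i * frame_interval (integer ticks).
    Returns a list of (finish_time, frame_index), one per processed frame.
    Assumption: every frame takes exactly `runtime` ticks to process.
    """
    outputs = []
    t = 0
    last_processed = -1
    while True:
        latest = t // frame_interval  # newest frame that has arrived by time t
        if wait_for_next_frame and t % frame_interval != 0:
            # Idle policy: sit idle until the next frame arrives.
            latest += 1
            t = latest * frame_interval
        elif latest == last_processed:
            # Nothing new has arrived, so waiting is unavoidable.
            latest += 1
            t = latest * frame_interval
        if latest >= n_frames:
            break
        t += runtime
        outputs.append((t, latest))
        last_processed = latest
    return outputs


def mean_staleness(outputs, frame_interval, n_frames):
    """Average frame gap between each query frame and the latest ready output.

    At the arrival time of frame i, the evaluator uses the most recent
    output that has already finished, however old its input frame is.
    Query frames that precede the first finished output are skipped.
    """
    finish_times = [finish for finish, _ in outputs]
    gaps = []
    for i in range(n_frames):
        k = bisect_right(finish_times, i * frame_interval)
        if k == 0:
            continue  # no output is available yet
        gaps.append(i - outputs[k - 1][1])
    return sum(gaps) / len(gaps)


if __name__ == "__main__":
    # Hypothetical numbers: frames every 10 ticks, runtime of 19 ticks.
    for wait in (False, True):
        outs = simulate(19, 10, 101, wait_for_next_frame=wait)
        label = "idle until next frame" if wait else "greedy"
        print(label, round(mean_staleness(outs, 10, 101), 3))
```

With these particular numbers the idle policy yields a lower mean staleness than the greedy one, which mirrors the "doing nothing" observation in the abstract. This does not hold for every runtime: under this proxy, a runtime of 15 ticks makes always waiting slightly worse than the greedy policy, which is why the abstract speaks of dynamic scheduling rather than always idling.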

Results

Task                | Dataset                              | Metric | Value | Model
--------------------|--------------------------------------|--------|-------|----------------------------
Object Detection    | Argoverse-HD (Full-Stack, Val)       | AP     | 21.06 | Official challenge baseline
Object Detection    | Argoverse-HD (Full-Stack, Test)      | AP     | 21.06 | Official challenge baseline
Object Detection    | Argoverse-HD (Detection-Only, Test)  | AP     | 13.61 | Official challenge baseline
Object Detection    | Argoverse-HD (Detection-Only, Val)   | AP     | 14.91 | Official challenge baseline
3D                  | Argoverse-HD (Full-Stack, Val)       | AP     | 21.06 | Official challenge baseline
3D                  | Argoverse-HD (Full-Stack, Test)      | AP     | 21.06 | Official challenge baseline
3D                  | Argoverse-HD (Detection-Only, Test)  | AP     | 13.61 | Official challenge baseline
3D                  | Argoverse-HD (Detection-Only, Val)   | AP     | 14.91 | Official challenge baseline
2D Classification   | Argoverse-HD (Full-Stack, Val)       | AP     | 21.06 | Official challenge baseline
2D Classification   | Argoverse-HD (Full-Stack, Test)      | AP     | 21.06 | Official challenge baseline
2D Classification   | Argoverse-HD (Detection-Only, Test)  | AP     | 13.61 | Official challenge baseline
2D Classification   | Argoverse-HD (Detection-Only, Val)   | AP     | 14.91 | Official challenge baseline
2D Object Detection | Argoverse-HD (Full-Stack, Val)       | AP     | 21.06 | Official challenge baseline
2D Object Detection | Argoverse-HD (Full-Stack, Test)      | AP     | 21.06 | Official challenge baseline
2D Object Detection | Argoverse-HD (Detection-Only, Test)  | AP     | 13.61 | Official challenge baseline
2D Object Detection | Argoverse-HD (Detection-Only, Val)   | AP     | 14.91 | Official challenge baseline
16k                 | Argoverse-HD (Full-Stack, Val)       | AP     | 21.06 | Official challenge baseline
16k                 | Argoverse-HD (Full-Stack, Test)      | AP     | 21.06 | Official challenge baseline
16k                 | Argoverse-HD (Detection-Only, Test)  | AP     | 13.61 | Official challenge baseline
16k                 | Argoverse-HD (Detection-Only, Val)   | AP     | 14.91 | Official challenge baseline

Related Papers

SeC: Advancing Complex Video Object Segmentation via Progressive Concept Construction (2025-07-21)
An End-to-End DNN Inference Framework for the SpiNNaker2 Neuromorphic MPSoC (2025-07-18)
DiffOSeg: Omni Medical Image Segmentation via Multi-Expert Collaboration Diffusion Model (2025-07-17)
SCORE: Scene Context Matters in Open-Vocabulary Remote Sensing Instance Segmentation (2025-07-17)
Unified Medical Image Segmentation with State Space Modeling Snake (2025-07-17)
A Privacy-Preserving Semantic-Segmentation Method Using Domain-Adaptation Technique (2025-07-17)
Fremer: Lightweight and Effective Frequency Transformer for Workload Forecasting in Cloud Services (2025-07-17)
Transient-Stability-Aware Frequency Provision in IBR-Rich Grids via Information Gap Decision Theory and Deep Learning (2025-07-17)