Maximilian Schäfer, Kun Zhao, Anton Kummert
Predicting the future motion of road users is a critical task in supporting advanced driver-assistance systems (ADAS). It plays an even more crucial role in autonomous driving (AD), where it enables the planning and execution of safe driving maneuvers. Building on our previous work, the Context-Aware Scene Prediction Network (CASPNet), we propose an improved system, CASPNet++. In this work, we focus on further enhancing interaction modeling and scene understanding to support the joint prediction of all road users in a scene, using spatiotemporal grids to model future occupancy. Moreover, an instance-based output head is introduced to provide multi-modal trajectories for agents of interest. In extensive quantitative and qualitative analyses, we demonstrate the scalability of CASPNet++ in utilizing and fusing diverse environmental input sources such as HD maps, radar detections, and lidar segmentation. Tested on the urban-focused prediction dataset nuScenes, CASPNet++ reaches state-of-the-art performance. The model has been deployed in a test vehicle, running in real time with moderate computational resources.
| Task | Dataset | Metric | Value | Model |
|---|---|---|---|---|
| Trajectory Prediction | nuScenes | MinADE_10 | 0.92 | CASPNet++ |
| Trajectory Prediction | nuScenes | MinADE_5 | 1.16 | CASPNet++ |
| Trajectory Prediction | nuScenes | MinFDE_1 | 6.18 | CASPNet++ |
| Trajectory Prediction | nuScenes | MissRateTopK_2_10 | 0.29 | CASPNet++ |
| Trajectory Prediction | nuScenes | MissRateTopK_2_5 | 0.50 | CASPNet++ |
| Trajectory Prediction | nuScenes | OffRoadRate | 0.01 | CASPNet++ |
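
The displacement and miss-rate metrics in the table can be illustrated with a minimal sketch for a single agent. It assumes the definitions commonly used for the nuScenes prediction benchmark: MinADE_k is the lowest average pointwise L2 error among the k most likely modes, MinFDE_k is the lowest final-point error, and a sample counts as a miss for MissRateTopK_2_k when every one of the k modes deviates from the ground truth by more than 2 m at some time step. The function names and the toy trajectories are hypothetical and are not taken from the CASPNet++ evaluation code.

```python
import math

def pointwise_errors(pred, gt):
    """L2 distance between predicted and ground-truth positions per time step."""
    return [math.hypot(px - gx, py - gy) for (px, py), (gx, gy) in zip(pred, gt)]

def min_ade(modes, gt, k):
    """Lowest average displacement error among the k most likely modes.

    Assumes `modes` is sorted by decreasing predicted probability.
    """
    return min(sum(pointwise_errors(m, gt)) / len(gt) for m in modes[:k])

def min_fde(modes, gt, k):
    """Lowest final displacement error among the k most likely modes."""
    return min(pointwise_errors(m, gt)[-1] for m in modes[:k])

def is_miss(modes, gt, k, threshold=2.0):
    """True if all of the top-k modes exceed `threshold` metres at some step."""
    return all(max(pointwise_errors(m, gt)) > threshold for m in modes[:k])

# Toy example (made-up numbers): two modes for one agent, three time steps.
gt = [(1.0, 0.0), (2.0, 0.0), (3.0, 0.0)]
modes = [
    [(1.0, 0.0), (2.0, 0.0), (3.0, 3.0)],  # most likely mode, drifts at the end
    [(1.0, 0.5), (2.0, 0.5), (3.0, 0.5)],  # second mode, constant 0.5 m offset
]

ade_1 = min_ade(modes, gt, k=1)
ade_2 = min_ade(modes, gt, k=2)
fde_1 = min_fde(modes, gt, k=1)
miss_1 = is_miss(modes, gt, k=1)
miss_2 = is_miss(modes, gt, k=2)
```

Dataset-level values such as those reported above would then be averages of these per-agent quantities over all evaluated samples. OffRoadRate is omitted from the sketch because it additionally requires the drivable-area map.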