Kashyap Chitta, Aditya Prakash, Andreas Geiger
Efficient reasoning about the semantic, spatial, and temporal structure of a scene is a crucial prerequisite for autonomous driving. We present NEural ATtention fields (NEAT), a novel representation that enables such reasoning for end-to-end imitation learning models. NEAT is a continuous function which maps locations in Bird's Eye View (BEV) scene coordinates to waypoints and semantics, using intermediate attention maps to iteratively compress high-dimensional 2D image features into a compact representation. This allows our model to selectively attend to relevant regions in the input while ignoring information irrelevant to the driving task, effectively associating the images with the BEV representation. In a new evaluation setting involving adverse environmental conditions and challenging scenarios, NEAT outperforms several strong baselines and achieves driving scores on par with the privileged CARLA expert used to generate its training data. Furthermore, visualizing the attention maps for models with NEAT intermediate representations provides improved interpretability.
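The mechanism described in the abstract, where a BEV query location drives attention maps that repeatedly compress image features before decoding, can be illustrated with a toy sketch. This is not the authors' implementation. The class name `ToyAttentionField`, the random linear layers, the two-iteration default, the mean-feature initialization, and the output layout (semantic class probabilities plus a 2D waypoint offset) are all assumptions made for illustration.

```python
import math
import random


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def linear(weights, vec):
    """Apply a weight matrix (list of rows) to a vector."""
    return [sum(w * v for w, v in zip(row, vec)) for row in weights]


def random_matrix(rng, rows, cols, scale=0.1):
    return [[rng.uniform(-scale, scale) for _ in range(cols)] for _ in range(rows)]


class ToyAttentionField:
    """Hypothetical stand-in: untrained random weights, illustrative sizes only."""

    def __init__(self, num_patches, feat_dim, query_dim=2,
                 num_classes=3, num_iters=2, seed=0):
        rng = random.Random(seed)
        in_dim = query_dim + feat_dim
        self.num_iters = num_iters
        self.w_attn = random_matrix(rng, num_patches, in_dim)  # attention head
        self.w_sem = random_matrix(rng, num_classes, in_dim)   # semantics head
        self.w_off = random_matrix(rng, 2, in_dim)             # waypoint offset head

    def compress(self, query, patch_feats):
        """Iteratively pool patch features into one vector, guided by the query."""
        num_patches = len(patch_feats)
        feat_dim = len(patch_feats[0])
        # Assumption: start from the mean feature, i.e. uniform attention.
        feat = [sum(p[c] for p in patch_feats) / num_patches for c in range(feat_dim)]
        attn = [1.0 / num_patches] * num_patches
        for _ in range(self.num_iters):
            # Attention over patches depends on the query and the current feature.
            attn = softmax(linear(self.w_attn, list(query) + feat))
            # The attention-weighted sum becomes the next compact feature.
            feat = [sum(a * p[c] for a, p in zip(attn, patch_feats))
                    for c in range(feat_dim)]
        return feat, attn

    def __call__(self, query, patch_feats):
        feat, attn = self.compress(query, patch_feats)
        decoder_input = list(query) + feat
        return {
            "semantics": softmax(linear(self.w_sem, decoder_input)),
            "offset": linear(self.w_off, decoder_input),
            "attention": attn,
        }


# Usage: six made-up image patches with four feature channels each,
# queried at one BEV location (x, y).
rng = random.Random(1)
patches = [[rng.random() for _ in range(4)] for _ in range(6)]
field = ToyAttentionField(num_patches=6, feat_dim=4)
out = field((1.5, -2.0), patches)
```

Because the field is a function of the query location, the same image features can be queried at any number of BEV points, and the final attention map for each query can be inspected to see which patches contributed to that prediction.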
| Task | Dataset | Metric | Value | Model |
|---|---|---|---|---|
| Autonomous Vehicles | CARLA Leaderboard | Driving Score | 21.83 | NEAT |
| Autonomous Vehicles | CARLA Leaderboard | Infraction penalty | 0.65 | NEAT |
| Autonomous Vehicles | CARLA Leaderboard | Route Completion | 41.71 | NEAT |
| Novel View Synthesis | X3D | PSNR | 36.01 | NeAT |
| Novel View Synthesis | X3D | SSIM | 0.9638 | NeAT |
| CARLA longest6 | CARLA | Driving Score | 24 | Neural Attention Fields (NEAT) |
| CARLA longest6 | CARLA | Infraction Score | 0.71 | Neural Attention Fields (NEAT) |
| CARLA longest6 | CARLA | Route Completion | 62 | Neural Attention Fields (NEAT) |
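A note on reading the CARLA rows: the reported Driving Score is not the product of the reported Route Completion and infraction penalty (for example, 41.71 × 0.65 ≈ 27.1, not 21.83). On the CARLA Leaderboard the driving score is computed per route as route completion times infraction penalty and then averaged over routes, so the mean of the products generally differs from the product of the means. The sketch below shows this with hypothetical per-route values; the numbers and the helper name `driving_score` are illustrative assumptions, not data from the paper.

```python
def driving_score(routes):
    """Mean over routes of route completion (%) times infraction penalty."""
    return sum(rc * pen for rc, pen in routes) / len(routes)


# Hypothetical per-route (route completion %, infraction penalty) pairs.
routes = [(100.0, 0.4), (20.0, 1.0), (30.0, 0.7)]

mean_rc = sum(rc for rc, _ in routes) / len(routes)
mean_pen = sum(pen for _, pen in routes) / len(routes)

score = driving_score(routes)   # mean of per-route products
naive = mean_rc * mean_pen      # product of the two means

print(f"driving score: {score:.2f}, product of means: {naive:.2f}")
```

Here the long route with many infractions pulls the mean of products below the product of means, the same direction as the gap seen in the table.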