Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.


Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.


Multi-Modal Fusion Transformer for End-to-End Autonomous Driving

Aditya Prakash, Kashyap Chitta, Andreas Geiger

2021-04-19 · CVPR 2021

Tags: Sensor Fusion · Imitation Learning · Motion Forecasting · Autonomous Driving · Semantic Segmentation · Object Detection

Paper · PDF · Code (official)

Abstract

How should representations from complementary sensors be integrated for autonomous driving? Geometry-based sensor fusion has shown great promise for perception tasks such as object detection and motion forecasting. However, for the actual driving task, the global context of the 3D scene is key, e.g. a change in traffic light state can affect the behavior of a vehicle geometrically distant from that traffic light. Geometry alone may therefore be insufficient for effectively fusing representations in end-to-end driving models. In this work, we demonstrate that imitation learning policies based on existing sensor fusion methods under-perform in the presence of a high density of dynamic agents and complex scenarios, which require global contextual reasoning, such as handling traffic oncoming from multiple directions at uncontrolled intersections. Therefore, we propose TransFuser, a novel Multi-Modal Fusion Transformer, to integrate image and LiDAR representations using attention. We experimentally validate the efficacy of our approach in urban settings involving complex scenarios using the CARLA urban driving simulator. Our approach achieves state-of-the-art driving performance while reducing collisions by 76% compared to geometry-based fusion.
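The core idea — letting image and LiDAR features exchange information via attention rather than geometric projection — can be illustrated with a toy sketch. This is a simplified, single-head stand-in, not the paper's architecture: TransFuser interleaves transformer modules with convolutional branches at multiple resolutions, whereas here random weights stand in for learned projections and the function name `fuse_with_attention` is hypothetical.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def fuse_with_attention(img_tokens, lidar_tokens, rng):
    """Toy single-head self-attention over the concatenation of image and
    LiDAR tokens, so every token can attend across both modalities — the
    basic mechanism behind attention-based sensor fusion."""
    d = img_tokens.shape[1]
    tokens = np.concatenate([img_tokens, lidar_tokens], axis=0)  # (N_img + N_lidar, d)
    # Random projections stand in for learned query/key/value weights.
    Wq, Wk, Wv = (rng.standard_normal((d, d)) / np.sqrt(d) for _ in range(3))
    q, k, v = tokens @ Wq, tokens @ Wk, tokens @ Wv
    attn = softmax(q @ k.T / np.sqrt(d))   # (N, N): cross-modal attention weights
    fused = attn @ v                       # each token mixes features from both sensors
    n_img = img_tokens.shape[0]
    return fused[:n_img], fused[n_img:]    # split back per modality

rng = np.random.default_rng(0)
img = rng.standard_normal((16, 64))    # e.g. a flattened 4x4 image feature grid, 64-dim
lidar = rng.standard_normal((16, 64))  # e.g. a flattened 4x4 LiDAR BEV grid, 64-dim
img_fused, lidar_fused = fuse_with_attention(img, lidar, rng)
print(img_fused.shape, lidar_fused.shape)
```

Because attention is computed over the joint token set, an image token can be influenced by any LiDAR token regardless of spatial distance — the global-context property the abstract argues geometric fusion lacks.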

Results

Task                   Dataset            Metric                    Value  Model
Autonomous Driving     Town05 Short       RC (Route Completion)     86.91  Geometric Fusion
Autonomous Driving     Town05 Short       DS (Driving Score)        54.52  TransFuser
Autonomous Driving     Town05 Short       RC (Route Completion)     78.41  TransFuser
Autonomous Driving     Town05 Long        RC (Route Completion)     69.17  Geometric Fusion
Autonomous Driving     Town05 Long        DS (Driving Score)        33.15  TransFuser
Autonomous Driving     Town05 Long        RC (Route Completion)     56.36  TransFuser
Autonomous Driving     CARLA Leaderboard  Driving Score             16.93  TransFuser
Autonomous Driving     CARLA Leaderboard  Infraction Penalty         0.42  TransFuser
Autonomous Driving     CARLA Leaderboard  Route Completion          51.82  TransFuser
Semantic Segmentation  KITTI-360          mIoU                      56.57  TransFuser (RGB-LiDAR)

Related Papers

SeC: Advancing Complex Video Object Segmentation via Progressive Concept Construction (2025-07-21)
GEMINUS: Dual-aware Global and Scene-Adaptive Mixture-of-Experts for End-to-End Autonomous Driving (2025-07-19)
AGENTS-LLM: Augmentative GENeration of Challenging Traffic Scenarios with an Agentic LLM Framework (2025-07-18)
The Imitation Game: Turing Machine Imitator is Length Generalizable Reasoner (2025-07-17)
Supervised Fine Tuning on Curated Data is Reinforcement Learning (and can be improved) (2025-07-17)
World Model-Based End-to-End Scene Generation for Accident Anticipation in Autonomous Driving (2025-07-17)
Orbis: Overcoming Challenges of Long-Horizon Prediction in Driving World Models (2025-07-17)
Channel-wise Motion Features for Efficient Motion Segmentation (2025-07-17)