Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.

CarLLaVA: Vision language models for camera-only closed-loop driving

Katrin Renz, Long Chen, Ana-Maria Marcu, Jan Hünermann, Benoit Hanotte, Alice Karnsund, Jamie Shotton, Elahe Arani, Oleg Sinavski

Published 2024-06-14. Tasks: CARLA Leaderboard 2.0, Autonomous Driving, Language Modelling

Abstract

In this technical report, we present CarLLaVA, a Vision Language Model (VLM) for autonomous driving, developed for the CARLA Autonomous Driving Challenge 2.0. CarLLaVA uses the vision encoder of the LLaVA VLM and the LLaMA architecture as its backbone, achieving state-of-the-art closed-loop driving performance with camera input only and without the need for complex or expensive labels. Additionally, we show preliminary results on predicting language commentary alongside the driving output. CarLLaVA uses a semi-disentangled output representation consisting of both path predictions and waypoints, combining the advantage of the path for lateral control with the advantage of the waypoints for longitudinal control. We propose an efficient training recipe for large driving datasets that avoids wasting compute on easy, trivial data. CarLLaVA ranked 1st in the sensor track of the CARLA Autonomous Driving Challenge 2.0, outperforming the previous state of the art by 458% and the best concurrent submission by 32.6%.
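The semi-disentangled output can be illustrated with a minimal sketch of how two separate predictions might feed two separate controllers. It assumes both outputs are already in the ego frame (x forward, y left, metres), that path points are spaced by distance while waypoints are spaced by a fixed time step, and that steering follows a simple pure-pursuit-style heading. The function names, the 3 m lookahead and the 0.25 s step are hypothetical choices for illustration, not CarLLaVA's actual controller.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]  # (x forward, y left) in metres, ego frame (assumed convention)


def steering_angle_from_path(path: List[Point], lookahead_m: float = 3.0) -> float:
    """Lateral control (hypothetical): heading angle in radians to the first
    path point at least lookahead_m away; positive means steer left."""
    for x, y in path:
        if math.hypot(x, y) >= lookahead_m:
            return math.atan2(y, x)
    x, y = path[-1]  # path shorter than the lookahead: aim at its last point
    return math.atan2(y, x)


def target_speed_from_waypoints(waypoints: List[Point], dt_s: float = 0.25) -> float:
    """Longitudinal control (hypothetical): mean speed in m/s implied by
    waypoints that are assumed to be dt_s seconds apart in time."""
    if not waypoints:
        raise ValueError("need at least one waypoint")
    pts = [(0.0, 0.0)] + list(waypoints)  # the ego vehicle sits at the origin
    steps = [math.dist(a, b) for a, b in zip(pts, pts[1:])]
    return sum(steps) / (len(steps) * dt_s)


if __name__ == "__main__":
    path = [(1.0, 0.0), (2.0, 0.1), (3.0, 0.3), (4.0, 0.6)]  # gentle left curve
    waypoints = [(1.0, 0.0), (2.0, 0.0), (3.0, 0.0), (4.0, 0.0)]  # 1 m per 0.25 s
    print(steering_angle_from_path(path))          # small positive angle (left)
    print(target_speed_from_waypoints(waypoints))  # 4.0 m/s
```

One intuition for splitting the outputs this way: when the car slows down or stops, time-spaced waypoints bunch up around the ego vehicle and carry little directional information, whereas a distance-spaced path keeps its geometry regardless of speed.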

Results

| Task | Dataset | Metric | Value | Model |
|---|---|---|---|---|
| Autonomous Vehicles | Bench2Drive | Driving Score | 85.94 | SimLingo-Base (CarLLaVa) |
| Autonomous Vehicles | CARLA | Driving Score | 6.87 | CarLLaVA |
| Autonomous Vehicles | CARLA | Infraction Score | 0.42 | CarLLaVA |
| Autonomous Vehicles | CARLA | Route Completion | 18.08 | CarLLaVA |
| Autonomous Vehicles | CARLA | Driving Score | 6.25 | CarLLaVA (Map Track) |
| Autonomous Vehicles | CARLA | Infraction Score | 0.39 | CarLLaVA (Map Track) |
| Autonomous Vehicles | CARLA | Route Completion | 18.89 | CarLLaVA (Map Track) |
| Autonomous Driving | Bench2Drive | Driving Score | 85.94 | SimLingo-Base (CarLLaVa) |
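The aggregate CARLA Driving Score (6.87) is not simply the product of the aggregate Route Completion and Infraction Score (18.08 × 0.42 ≈ 7.59). This is consistent with the CARLA leaderboard computing a driving score per route (route completion in percent times infraction penalty in [0, 1]) and then averaging over routes. The sketch below assumes that aggregation; the per-route numbers are made up for illustration and are not taken from the paper.

```python
from statistics import mean
from typing import Sequence


def driving_score(route_completion: Sequence[float],
                  infraction_penalty: Sequence[float]) -> float:
    """Assumed leaderboard aggregation: mean over routes of
    route completion (%) x infraction penalty (0..1)."""
    if len(route_completion) != len(infraction_penalty):
        raise ValueError("need one infraction penalty per route")
    return mean(rc * ip for rc, ip in zip(route_completion, infraction_penalty))


if __name__ == "__main__":
    # Two made-up routes: the one completed further also has more infractions.
    rc = [30.0, 6.0]
    ip = [0.2, 0.8]
    print(driving_score(rc, ip))  # ≈ 5.4, the mean of 6.0 and 4.8
    print(mean(rc) * mean(ip))    # ≈ 9.0 (18.0 x 0.5), not the driving score
```

Whenever routes with higher completion also accumulate more infractions, the mean of per-route products falls below the product of the means, as in the table above.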

Related Papers

Visual-Language Model Knowledge Distillation Method for Image Quality Assessment (2025-07-21)
GEMINUS: Dual-aware Global and Scene-Adaptive Mixture-of-Experts for End-to-End Autonomous Driving (2025-07-19)
AGENTS-LLM: Augmentative GENeration of Challenging Traffic Scenarios with an Agentic LLM Framework (2025-07-18)
World Model-Based End-to-End Scene Generation for Accident Anticipation in Autonomous Driving (2025-07-17)
Orbis: Overcoming Challenges of Long-Horizon Prediction in Driving World Models (2025-07-17)
Channel-wise Motion Features for Efficient Motion Segmentation (2025-07-17)
LaViPlan : Language-Guided Visual Path Planning with RLVR (2025-07-17)
Making Language Model a Hierarchical Classifier and Generator (2025-07-17)