Robot Manipulation on CALVIN

Metric: avg. sequence length (D to D) (higher is better)
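
For readers unfamiliar with the metric, the following is a minimal sketch of how an average sequence length could be computed. It assumes the usual CALVIN long-horizon protocol, in which each evaluation rollout is a chain of five language instructions and a rollout's score is the number of tasks completed in a row before the first failure, so the maximum is 5. The function name and the example data are hypothetical and are not taken from the leaderboard.

```python
from typing import Sequence


def avg_sequence_length(rollouts: Sequence[Sequence[bool]]) -> float:
    """Mean number of consecutive successes counted from the start of each chain.

    Each rollout is a sequence of per-task success flags (assumed to be five
    tasks per chain). Counting stops at the first failure.
    """
    if not rollouts:
        raise ValueError("need at least one rollout")
    total = 0
    for chain in rollouts:
        completed = 0
        for success in chain:
            if not success:
                break  # later tasks in the chain do not count
            completed += 1
        total += completed
    return total / len(rollouts)


# Hypothetical example: three chains that complete 5, 2, and 0 tasks in a row.
example = [
    [True, True, True, True, True],
    [True, True, False, True, True],
    [False, True, True, True, True],
]
print(avg_sequence_length(example))
```

Under this reading, a score of 4.44 means the policy completes between four and five chained tasks on average.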

Results

#	Model	avg. sequence length (D to D)	Extra Data	SOTA	Paper	Date	Code
1	DreamVLA	4.44	No	Yes	DreamVLA: A Vision-Language-Action Model Dreamed with Comprehensive World Knowledge	2025-07-06	Code
2	VPP	4.29	No	Yes	Video Prediction Policy: A Generalist Robot Policy with Predictive Visual Representations	2024-12-19	Code
3	RoboVLMs	4.25	No	Yes	Towards Generalist Robot Policies: What Matters in Building Vision-Language-Action Models	2024-12-18	Code
4	OpenHelix	4.08	No	-	OpenHelix: A Short Survey, Empirical Analysis, and Open-Source Dual-System VLA Model for Robotic Manipulation	2025-05-06	Code
5	UP-VLA	4.08	No	-	UP-VLA: A Unified Understanding and Prediction Model for Embodied Agent	2025-01-31	-
6	GR-MG	4.04	No	Yes	GR-MG: Leveraging Partially Annotated Data via Multi-Modal Goal-Conditioned Policy	2024-08-26	Code
7	MoDE	4.01	No	-	Efficient Diffusion Transformer Policies with Mixture of Expert Denoisers for Multitask Learning	2024-12-17	Code
8	RoboUniView	3.855	No	Yes	RoboUniView: Visual-Language Model with Unified View Representation for Robotic Manipulation	2024-06-27	Code
9	UniVLA	3.8	No	-	UniVLA: Learning to Act Anywhere with Task-centric Latent Actions	2025-05-09	Code
10	RoboDual	3.66	No	-	Towards Synergistic, Generalized, and Efficient Dual-System for Robotic Manipulation	2024-10-10	-
11	VidMan	3.42	No	-	VidMan: Exploiting Implicit Dynamics from Video Diffusion Model for Effective Robot Manipulation	2024-11-14	-
12	3DDA	3.35	No	Yes	3D Diffuser Actor: Policy Diffusion with 3D Scene Representations	2024-02-16	Code
13	OpenVLA	3.27	No	-	OpenVLA: An Open-Source Vision-Language-Action Model	2024-06-13	Code
14	3D Diffuser Actor	3.27	No	-	3D Diffuser Actor: Policy Diffusion with 3D Scene Representations	2024-02-16	Code
15	GR-1	3.06	No	Yes	Unleashing Large-Scale Video Generative Pre-training for Visual Robot Manipulation	2023-12-20	Code
16	Roboflamingo	2.47	No	Yes	Vision-Language Foundation Models as Effective Robot Imitators	2023-11-02	-
17	LCB	1.78	No	-	From LLMs to Actions: Latent Codes as Bridges in Hierarchical Robot Control	2024-05-08	-
18	Uni-Pi	0.92	No	Yes	Learning Universal Policies via Text-Guided Video Generation	2023-01-31	-
19	RT-1	0.9	No	Yes	RT-1: Robotics Transformer for Real-World Control at Scale	2022-12-13	Code
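
The results mark some entries as SOTA without saying how the marker is assigned. One plausible reading, stated here as an assumption rather than the site's documented rule, is that an entry is marked when its score exceeded every entry published before it. The sketch below recomputes that set from the scores and dates listed above; the function name is hypothetical, and ordering same-day entries by descending score is a further assumption.

```python
from datetime import date

# (model, avg. sequence length, publication date), copied from the results above.
ENTRIES = [
    ("DreamVLA", 4.44, date(2025, 7, 6)),
    ("VPP", 4.29, date(2024, 12, 19)),
    ("RoboVLMs", 4.25, date(2024, 12, 18)),
    ("OpenHelix", 4.08, date(2025, 5, 6)),
    ("UP-VLA", 4.08, date(2025, 1, 31)),
    ("GR-MG", 4.04, date(2024, 8, 26)),
    ("MoDE", 4.01, date(2024, 12, 17)),
    ("RoboUniView", 3.855, date(2024, 6, 27)),
    ("UniVLA", 3.8, date(2025, 5, 9)),
    ("RoboDual", 3.66, date(2024, 10, 10)),
    ("VidMan", 3.42, date(2024, 11, 14)),
    ("3DDA", 3.35, date(2024, 2, 16)),
    ("OpenVLA", 3.27, date(2024, 6, 13)),
    ("3D Diffuser Actor", 3.27, date(2024, 2, 16)),
    ("GR-1", 3.06, date(2023, 12, 20)),
    ("Roboflamingo", 2.47, date(2023, 11, 2)),
    ("LCB", 1.78, date(2024, 5, 8)),
    ("Uni-Pi", 0.92, date(2023, 1, 31)),
    ("RT-1", 0.9, date(2022, 12, 13)),
]


def records_at_publication(entries):
    """Return models whose score beat every earlier entry, oldest first."""
    # Assumption: entries sharing a date are ordered by descending score,
    # so only the best same-day entry can set a record.
    ordered = sorted(entries, key=lambda e: (e[2], -e[1]))
    best = float("-inf")
    records = []
    for model, score, _published in ordered:
        if score > best:
            best = score
            records.append(model)
    return records


print(records_at_publication(ENTRIES))
```

On the data listed here, this running-maximum rule selects the same ten entries that carry the SOTA marker, which is consistent with the assumed reading but does not confirm it.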