Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.


Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.


Octo: An Open-Source Generalist Robot Policy

Octo Model Team, Dibya Ghosh, Homer Walke, Karl Pertsch, Kevin Black, Oier Mees, Sudeep Dasari, Joey Hejna, Tobias Kreiman, Charles Xu, Jianlan Luo, You Liang Tan, Lawrence Yunliang Chen, Pannag Sanketi, Quan Vuong, Ted Xiao, Dorsa Sadigh, Chelsea Finn, Sergey Levine

2024-05-20 · Robot Manipulation
Paper · PDF

Abstract

Large policies pretrained on diverse robot datasets have the potential to transform robotic learning: instead of training new policies from scratch, such generalist robot policies may be finetuned with only a little in-domain data, yet generalize broadly. However, to be widely applicable across a range of robotic learning scenarios, environments, and tasks, such policies need to handle diverse sensors and action spaces, accommodate a variety of commonly used robotic platforms, and finetune readily and efficiently to new domains. In this work, we aim to lay the groundwork for developing open-source, widely applicable, generalist policies for robotic manipulation. As a first step, we introduce Octo, a large transformer-based policy trained on 800k trajectories from the Open X-Embodiment dataset, the largest robot manipulation dataset to date. It can be instructed via language commands or goal images and can be effectively finetuned to robot setups with new sensory inputs and action spaces within a few hours on standard consumer GPUs. In experiments across 9 robotic platforms, we demonstrate that Octo serves as a versatile policy initialization that can be effectively finetuned to new observation and action spaces. We also perform detailed ablations of design decisions for the Octo model, from architecture to training data, to guide future research on building generalist robot models.
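The abstract describes a policy that accepts either a language command or a goal image as the task specification and maps camera observations to robot actions. The minimal sketch below illustrates that interface shape only; it is not Octo's actual API, and every class and method name here is hypothetical (a real implementation would run a transformer over tokenized observations and task embeddings).

```python
import numpy as np

# Illustrative sketch of a generalist-policy interface (hypothetical names,
# not Octo's real API): task = language command OR goal image; the policy
# maps an observation plus task to a short chunk of low-level actions.

class GeneralistPolicy:
    def __init__(self, action_dim=7, chunk_len=4, seed=0):
        self.action_dim = action_dim  # e.g. 6-DoF end-effector delta + gripper
        self.chunk_len = chunk_len    # many policies predict short action chunks
        self.rng = np.random.default_rng(seed)

    def encode_task(self, language=None, goal_image=None):
        # Exactly one task modality must be given, mirroring the
        # language-command / goal-image conditioning in the abstract.
        assert (language is None) != (goal_image is None)
        return {"language": language, "goal_image": goal_image}

    def sample_actions(self, observation, task):
        # Placeholder for the transformer forward pass: return a dummy
        # action chunk with the right shape.
        assert observation["image"].ndim == 3  # (H, W, C) camera frame
        return self.rng.normal(size=(self.chunk_len, self.action_dim))

policy = GeneralistPolicy()
task = policy.encode_task(language="pick up the spoon")
obs = {"image": np.zeros((256, 256, 3), dtype=np.uint8)}
actions = policy.sample_actions(obs, task)
print(actions.shape)  # (4, 7)
```

Swapping the observation encoder or the action head in such an interface is what "finetuned to robot setups with new sensory inputs and action spaces" amounts to in practice: the transformer backbone is kept, and only the input/output adapters are retrained.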

Results

| Task | Dataset | Metric | Value | Model |
| --- | --- | --- | --- | --- |
| Robot Manipulation | SimplerEnv-Google Robot | Variant Aggregation | 0.012 | Octo-Base |
| Robot Manipulation | SimplerEnv-Google Robot | Variant Aggregation-Move Near | 0.031 | Octo-Base |
| Robot Manipulation | SimplerEnv-Google Robot | Variant Aggregation-Open/Close Drawer | 0.011 | Octo-Base |
| Robot Manipulation | SimplerEnv-Google Robot | Variant Aggregation-Pick Coke Can | 0.006 | Octo-Base |
| Robot Manipulation | SimplerEnv-Google Robot | Visual Matching | 0.168 | Octo-Base |
| Robot Manipulation | SimplerEnv-Google Robot | Visual Matching-Move Near | 0.042 | Octo-Base |
| Robot Manipulation | SimplerEnv-Google Robot | Visual Matching-Open/Close Drawer | 0.227 | Octo-Base |
| Robot Manipulation | SimplerEnv-Google Robot | Visual Matching-Pick Coke Can | 0.17 | Octo-Base |
| Robot Manipulation | SimplerEnv-Widow X | Average | 0.3 | Octo-Small |
| Robot Manipulation | SimplerEnv-Widow X | Put Carrot on Plate | 0.097 | Octo-Small |
| Robot Manipulation | SimplerEnv-Widow X | Put Spoon on Towel | 0.472 | Octo-Small |
| Robot Manipulation | SimplerEnv-Widow X | Stack Green Block on Yellow Block | 0.042 | Octo-Small |
| Robot Manipulation | SimplerEnv-Widow X | Average | 0.16 | Octo-Base |
| Robot Manipulation | SimplerEnv-Widow X | Put Carrot on Plate | 0.083 | Octo-Base |
| Robot Manipulation | SimplerEnv-Widow X | Put Spoon on Towel | 0.125 | Octo-Base |
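The Widow X rows above report both Octo-Small and Octo-Base on overlapping metrics. A short sketch (values copied from the table; success rates in [0, 1]) restructures them to ask which variant scores higher on each shared metric:

```python
# Widow X results from the table above, keyed by model then metric.
widowx = {
    "Octo-Small": {
        "Average": 0.3,
        "Put Carrot on Plate": 0.097,
        "Put Spoon on Towel": 0.472,
        "Stack Green Block on Yellow Block": 0.042,
    },
    "Octo-Base": {
        "Average": 0.16,
        "Put Carrot on Plate": 0.083,
        "Put Spoon on Towel": 0.125,
    },
}

# For each metric reported for both models, pick the higher-scoring variant.
shared = set(widowx["Octo-Small"]) & set(widowx["Octo-Base"])
better = {
    m: max(widowx, key=lambda model: widowx[model].get(m, float("-inf")))
    for m in sorted(shared)
}
print(better)  # Octo-Small wins every shared Widow X metric here
```

On this benchmark slice the smaller model dominates the larger one, which is the kind of observation the paper's ablations on architecture and training data are meant to explain.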

Related Papers

DreamVLA: A Vision-Language-Action Model Dreamed with Comprehensive World Knowledge (2025-07-06)
Geometry-aware 4D Video Generation for Robot Manipulation (2025-07-01)
CapsDT: Diffusion-Transformer for Capsule Robot Manipulation (2025-06-19)
Robust Instant Policy: Leveraging Student's t-Regression Model for Robust In-context Imitation Learning of Robot Manipulation (2025-06-18)
SENIOR: Efficient Query Selection and Preference-Guided Exploration in Preference-based Reinforcement Learning (2025-06-17)
What Matters in Learning from Large-Scale Datasets for Robot Manipulation (2025-06-16)
Demonstrating Multi-Suction Item Picking at Scale via Multi-Modal Learning of Pick Success (2025-06-12)
BridgeVLA: Input-Output Alignment for Efficient 3D Manipulation Learning with Vision-Language Models (2025-06-09)