Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.

Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.


RT-1: Robotics Transformer for Real-World Control at Scale

Anthony Brohan, Noah Brown, Justice Carbajal, Yevgen Chebotar, Joseph Dabis, Chelsea Finn, Keerthana Gopalakrishnan, Karol Hausman, Alex Herzog, Jasmine Hsu, Julian Ibarz, Brian Ichter, Alex Irpan, Tomas Jackson, Sally Jesmonth, Nikhil J Joshi, Ryan Julian, Dmitry Kalashnikov, Yuheng Kuang, Isabel Leal, Kuang-Huei Lee, Sergey Levine, Yao Lu, Utsav Malla, Deeksha Manjunath, Igor Mordatch, Ofir Nachum, Carolina Parada, Jodilyn Peralta, Emily Perez, Karl Pertsch, Jornell Quiambao, Kanishka Rao, Michael Ryoo, Grecia Salazar, Pannag Sanketi, Kevin Sayed, Jaspiar Singh, Sumedh Sontakke, Austin Stone, Clayton Tan, Huong Tran, Vincent Vanhoucke, Steve Vega, Quan Vuong, Fei Xia, Ted Xiao, Peng Xu, Sichun Xu, Tianhe Yu, Brianna Zitkovich

2022-12-13 | Robot Manipulation
Paper | PDF | Code (official)

Abstract

By transferring knowledge from large, diverse, task-agnostic datasets, modern machine learning models can solve specific downstream tasks either zero-shot or with small task-specific datasets to a high level of performance. While this capability has been demonstrated in other fields such as computer vision, natural language processing or speech recognition, it remains to be shown in robotics, where the generalization capabilities of the models are particularly critical due to the difficulty of collecting real-world robotic data. We argue that one of the keys to the success of such general robotic models lies with open-ended task-agnostic training, combined with high-capacity architectures that can absorb all of the diverse, robotic data. In this paper, we present a model class, dubbed Robotics Transformer, that exhibits promising scalable model properties. We verify our conclusions in a study of different model classes and their ability to generalize as a function of the data size, model size, and data diversity based on a large-scale data collection on real robots performing real-world tasks. The project's website and videos can be found at robotics-transformer1.github.io
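The abstract describes the architecture only at a high level. For orientation, below is a minimal PyTorch sketch of an RT-1-style policy: a language-conditioned image tokenizer, a token-reduction step, a Transformer backbone, and discretized per-dimension action logits. It is an assumption-laden illustration, not the paper's implementation; the layer sizes, the small CNN standing in for the FiLM EfficientNet tokenizer, the attention-pooling stand-in for TokenLearner, and the mean-pooled readout are all placeholders chosen here for brevity.

# Minimal sketch of an RT-1-style policy (illustrative only, not the paper's code).
# Placeholder choices: a tiny CNN instead of the FiLM EfficientNet-B3 tokenizer,
# attention pooling instead of TokenLearner, a plain self-attention stack instead
# of the paper's causal decoder-only Transformer, and arbitrary layer sizes.
import torch
import torch.nn as nn

class RT1Sketch(nn.Module):
    def __init__(self, vocab=512, d=256, frames=6, tokens_per_frame=8,
                 action_dims=11, action_bins=256):
        super().__init__()
        self.frames, self.tokens_per_frame = frames, tokens_per_frame
        self.action_dims, self.action_bins = action_dims, action_bins
        # Language: stand-in bag-of-tokens embedding for the sentence encoder.
        self.text_embed = nn.EmbeddingBag(vocab, d)
        # Vision: small CNN producing a grid of visual tokens per frame.
        self.cnn = nn.Sequential(
            nn.Conv2d(3, 64, 5, stride=4, padding=2), nn.ReLU(),
            nn.Conv2d(64, d, 5, stride=4, padding=2), nn.ReLU(),
        )
        # FiLM-style conditioning: scale/shift visual tokens by the instruction.
        self.film = nn.Linear(d, 2 * d)
        # Token reduction (the paper uses TokenLearner; here, learned attention pooling).
        self.pool_queries = nn.Parameter(torch.randn(tokens_per_frame, d))
        self.pool_attn = nn.MultiheadAttention(d, 4, batch_first=True)
        # Transformer over the flattened per-frame token sequence.
        layer = nn.TransformerEncoderLayer(d, nhead=8, dim_feedforward=4 * d,
                                           batch_first=True)
        self.transformer = nn.TransformerEncoder(layer, num_layers=8)
        # Discretized action head: one categorical distribution per action dimension.
        self.action_head = nn.Linear(d, action_dims * action_bins)

    def forward(self, images, instruction_ids):
        # images: (B, frames, 3, H, W); instruction_ids: (B, L) token ids
        B = images.shape[0]
        lang = self.text_embed(instruction_ids)                     # (B, d)
        gamma, beta = self.film(lang).chunk(2, dim=-1)              # (B, d) each
        feats = self.cnn(images.flatten(0, 1))                      # (B*frames, d, h, w)
        tokens = feats.flatten(2).transpose(1, 2)                   # (B*frames, h*w, d)
        g = gamma.repeat_interleave(self.frames, 0).unsqueeze(1)
        b = beta.repeat_interleave(self.frames, 0).unsqueeze(1)
        tokens = tokens * (1 + g) + b                               # FiLM-style modulation
        queries = self.pool_queries.expand(tokens.shape[0], -1, -1)
        pooled, _ = self.pool_attn(queries, tokens, tokens)         # (B*frames, 8, d)
        seq = pooled.reshape(B, self.frames * self.tokens_per_frame, -1)
        out = self.transformer(seq).mean(dim=1)                     # (B, d)
        logits = self.action_head(out)
        return logits.view(B, self.action_dims, self.action_bins)   # per-dim bin logits

# Usage: one forward pass on random data.
model = RT1Sketch()
logits = model(torch.rand(2, 6, 3, 128, 128), torch.randint(0, 512, (2, 10)))
print(logits.shape)  # torch.Size([2, 11, 256])

The sketch only mirrors the overall data flow (instruction-conditioned image tokens in, discretized action logits out); consult the official code linked above for the actual model.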

Results

Task               | Dataset                 | Metric                                  | Value | Model
Robot Manipulation | CALVIN                  | avg. sequence length (D to D)           | 0.9   | RT-1
Robot Manipulation | SimplerEnv-Google Robot | Variant Aggregation                     | 0.397 | RT-1-X
Robot Manipulation | SimplerEnv-Google Robot | Variant Aggregation-Move Near           | 0.323 | RT-1-X
Robot Manipulation | SimplerEnv-Google Robot | Variant Aggregation-Open/Close Drawer   | 0.294 | RT-1-X
Robot Manipulation | SimplerEnv-Google Robot | Variant Aggregation-Pick Coke Can       | 0.49  | RT-1-X
Robot Manipulation | SimplerEnv-Google Robot | Visual Matching                         | 0.534 | RT-1-X
Robot Manipulation | SimplerEnv-Google Robot | Visual Matching-Move Near               | 0.317 | RT-1-X
Robot Manipulation | SimplerEnv-Google Robot | Visual Matching-Open/Close Drawer       | 0.597 | RT-1-X
Robot Manipulation | SimplerEnv-Google Robot | Visual Matching-Pick Coke Can           | 0.567 | RT-1-X
Robot Manipulation | SimplerEnv-Widow X      | Average                                 | 0.011 | RT-1-X
Robot Manipulation | SimplerEnv-Widow X      | Put Carrot on Plate                     | 0.042 | RT-1-X

Related Papers

DreamVLA: A Vision-Language-Action Model Dreamed with Comprehensive World Knowledge (2025-07-06)
Geometry-aware 4D Video Generation for Robot Manipulation (2025-07-01)
CapsDT: Diffusion-Transformer for Capsule Robot Manipulation (2025-06-19)
Robust Instant Policy: Leveraging Student's t-Regression Model for Robust In-context Imitation Learning of Robot Manipulation (2025-06-18)
SENIOR: Efficient Query Selection and Preference-Guided Exploration in Preference-based Reinforcement Learning (2025-06-17)
What Matters in Learning from Large-Scale Datasets for Robot Manipulation (2025-06-16)
Demonstrating Multi-Suction Item Picking at Scale via Multi-Modal Learning of Pick Success (2025-06-12)
BridgeVLA: Input-Output Alignment for Efficient 3D Manipulation Learning with Vision-Language Models (2025-06-09)