148 machine learning datasets
148 dataset results
Benchmark for physical reasoning that contains a set of simple classical mechanics puzzles in a 2D physical environment. The benchmark is designed to encourage the development of learning algorithms that are sample-efficient and generalize well across puzzles.
SMACv2 (StarCraft Multi-Agent Challenge v2) is a new version of the benchmark where scenarios are procedurally generated and require agents to generalise to previously unseen settings (from the same distribution) during evaluation.
The StarCraft II Learning Environment (S2LE) is a reinforcement learning environment based on the game StarCraft II. The environment consists of three sub-components: a Linux StarCraft II binary, the StarCraft II API and PySC2. The StarCraft II API allows programmatic control of StarCraft II. It can be used to start a game, get observations, take actions, and review replays. PyC2 is a Python environment that wraps the StarCraft II API to ease the interaction between Python reinforcement learning agents and StarCraft II. It defines an action and observation specification, and includes a random agent and a handful of rule-based agents as examples. It also includes some mini-games as challenges and visualization tools to understand what the agent can see and do.
The NetHack Learning Environment (NLE) is a Reinforcement Learning environment based on NetHack 3.6.6. It is designed to provide a standard reinforcement learning interface to the game, and comes with tasks that function as a first step to evaluate agents on this new environment. NetHack is one of the oldest and arguably most impactful videogames in history, as well as being one of the hardest roguelikes currently being played by humans. It is procedurally generated, rich in entities and dynamics, and overall an extremely challenging environment for current state-of-the-art RL agents, while being much cheaper to run compared to other challenging testbeds. Through NLE, the authors wish to establish NetHack as one of the next challenges for research in decision making and machine learning.
Gibson is an opensource perceptual and physics simulator to explore active and real-world perception. The Gibson Environment is used for Real-World Perception Learning.
MINOS is a simulator designed to support the development of multisensory models for goal-directed navigation in complex indoor environments. MINOS leverages large datasets of complex 3D environments and supports flexible configuration of multimodal sensor suites.
Obstacle Tower is a high fidelity, 3D, 3rd person, procedurally generated environment for reinforcement learning. An agent playing Obstacle Tower must learn to solve both low-level control and high-level planning problems in tandem while learning from pixels and a sparse reward signal. Unlike other benchmarks such as the Arcade Learning Environment, evaluation of agent performance in Obstacle Tower is based on an agent’s ability to perform well on unseen instances of the environment.
Physion is a visual and physical prediction benchmark to measure the performance of machine learning models on making predictions about commonplace real world physical events. In realistically simulating a wide variety of physical phenomena -- rigid and soft-body collisions, stable multi-object configurations, rolling and sliding, projectile motion -- this dataset presents a more comprehensive challenge than existing benchmarks. Moreover, the dataset also contains human responses for the stimuli so that model predictions can be directly compared to human judgments.
Bridge Data is a large multi-domain and multi-task dataset, with 7,200 demonstrations constituting 71 tasks across 10 environments. The dataset is collected using a low-cost yet versatile 6-DoF WidowX250 robot arm and contains 7,200 demonstrations of a robot performing 71 kitchen tasks across 10 environments with varying lighting, robot positions, and backgrounds. It can be used to boosting generalization of robotic skills and empirically study how it can improve the learning of new tasks in new environments.
PasticineLab is a differentiable physics benchmark, which includes a diverse collection of soft body manipulation tasks. In each task, the agent uses manipulators to deform the plasticine into the desired configuration. The underlying physics engine supports differentiable elastic and plastic deformation using the DiffTaichi system, posing many under-explored challenges to robotic agents.
RTMV is a large-scale synthetic dataset for novel view synthesis consisting of ∼300k images rendered from nearly 2000 complex scenes using high-quality ray tracing at high resolution (1600 × 1600 pixels). The dataset is orders of magnitude larger than existing synthetic datasets for novel view synthesis, thus providing a large unified benchmark for both training and evaluation. Using 4 distinct sources of high-quality 3D meshes, the scenes of our dataset exhibit challenging variations in camera views, lighting, shape, materials, and textures.
SUMMIT is a high-fidelity simulator that facilitates the development and testing of crowd-driving algorithms. By leveraging the open-source OpenStreetMap map database and a heterogeneous multi-agent motion prediction model developed in our earlier work, SUMMIT simulates dense, unregulated urban traffic for heterogeneous agents at any worldwide locations that OpenStreetMap supports. SUMMIT is built as an extension of CARLA and inherits from it the physical and visual realism for autonomous driving simulation. SUMMIT supports a wide range of applications, including perception, vehicle control, planning, and end-to-end learning.
RGB-Stacking is a benchmark for vision-based robotic manipulation. The robot is trained to learn how to grasp objects and balance them on top of one another.
LANI is a 3D navigation environment and corpus, where an agent navigates between landmarks. Lani contains 27,965 crowd-sourced instructions for navigation in an open environment. Each datapoint includes an instruction, a human-annotated ground-truth demonstration trajectory, and an environment with various landmarks and lakes. The dataset train/dev/test split is 19,758/4,135/4,072. Each environment specification defines placement of 6–13 landmarks within a square grass field of size 50m×50m.
MineRL BASALT is an RL competition on solving human-judged tasks. The tasks in this competition do not have a pre-defined reward function: the goal is to produce trajectories that are judged by real humans to be effective at solving a given task.
Mario AI was a benchmark environment for reinforcement learning. The gameplay in Mario AI, as in the original Nintendo’s version, consists in moving the controlled character, namely Mario, through two-dimensional levels, which are viewed sideways. Mario can walk and run to the right and left, jump, and (depending on which state he is in) shoot fireballs. Gravity acts on Mario, making it necessary to jump over cliffs to get past them. Mario can be in one of three states: Small, Big (can kill enemies by jumping onto them), and Fire (can shoot fireballs).
A rich, extensible and efficient environment that contains 45,622 human-designed 3D scenes of visually realistic houses, ranging from single-room studios to multi-storied houses, equipped with a diverse set of fully labeled 3D objects, textures and scene layouts, based on the SUNCG dataset (Song et.al.)
The StarCraft Multi-Agent Challenges+ requires agents to learn completion of multi-stage tasks and usage of environmental factors without precise reward functions. The previous challenges (SMAC) recognized as a standard benchmark of Multi-Agent Reinforcement Learning are mainly concerned with ensuring that all agents cooperatively eliminate approaching adversaries only through fine manipulation with obvious reward functions. This challenge, on the other hand, is interested in the exploration capability of MARL algorithms to efficiently learn implicit multi-stage tasks and environmental factors as well as micro-control. This study covers both offensive and defensive scenarios. In the offensive scenarios, agents must learn to first find opponents and then eliminate them. The defensive scenarios require agents to use topographic features. For example, agents need to position themselves behind protective structures to make it harder for enemies to attack.
CHALET is a 3D house simulator with support for navigation and manipulation. Unlike existing systems, CHALET supports both a wide range of object manipulation, as well as supporting complex environemnt layouts consisting of multiple rooms. The range of object manipulations includes the ability to pick up and place objects, toggle the state of objects like taps or televesions, open or close containers, and insert or remove objects from these containers. In addition, the simulator comes with 58 rooms that can be combined to create houses, including 10 default house layouts. CHALET is therefore suitable for setting up challenging environments for various AI tasks that require complex language understanding and planning, such as navigation, manipulation, instruction following, and interactive question answering.
Griddly is an environment for grid-world based research. Griddly provides a highly optimized game state and rendering engine with a flexible high-level interface for configuring environments. Not only does Griddly offer simple interfaces for single, multi-player and RTS games, but also multiple methods of rendering, configurable partial observability and interfaces for procedural content generation.