19,997 machine learning datasets
Artie Bias Corpus is an open dataset for detecting demographic bias in speech applications.
The current OOD benchmark VQA-CP v2 considers only one type of shortcut (from question type to answer) and thus still cannot guarantee that the model relies on the intended solution rather than a solution specific to this shortcut. To overcome this limitation, VQA-VS proposes a new dataset that considers varying types of shortcuts by constructing different distribution shifts in multiple OOD test sets. In addition, VQA-VS overcomes three troubling practices in the use of VQA-CP v2 (e.g., selecting models using OOD test sets) and further standardizes the OOD evaluation procedure. VQA-VS provides a more rigorous and comprehensive testbed for shortcut learning in VQA.
KETOD (Knowledge-Enriched Task-Oriented Dialogue) is a dataset containing system responses designed for enriching task-oriented dialogues with chit-chat based on relevant entity knowledge. There are a total of 5,324 dialogues with enriched system responses.
The Industrial Metal Objects dataset is a diverse dataset of industrial metal objects. These objects are symmetric, textureless and highly reflective, leading to challenging conditions not captured in existing datasets. The dataset contains both real-world and synthetic multi-view RGB images with 6D object pose labels.
DIBCO 2013 is the international Document Image Binarization Contest organized in the context of the ICDAR 2013 conference. The general objective of the contest is to identify current advances in document image binarization for both machine-printed and handwritten document images using evaluation performance measures that conform to document image analysis and recognition.
H-DIBCO 2012 is the International Document Image Binarization Competition dedicated to handwritten document images, organized in conjunction with the ICFHR 2012 conference. The objective of the contest is to identify current advances in handwritten document image binarization using meaningful evaluation performance measures.
This discourse treebank includes annotated instructional texts originally assembled at the Information Technology Research Institute, University of Brighton. This dataset contains 176 documents with an average of 32.6 EDUs for a total of 5,744 EDUs and 53,250 words.
DBE-KT22 contains student exercise answering activities collected through an online practicing platform for the database systems course taught at the Australian National University within the period 2018-2021. The dataset is useful for research targeting students' knowledge tracing given historical sequences of exercise answering.
Crises such as the COVID-19 pandemic continuously threaten our world and emotionally affect billions of people worldwide in distinct ways. Understanding the triggers leading to people's emotions is of crucial importance. Social media posts can be a good source of such analysis, yet these texts tend to be charged with multiple emotions, with triggers scattering across multiple sentences. This paper takes a novel angle, namely, emotion detection and trigger summarization, aiming to both detect perceived emotions in text, and summarize events that trigger each emotion. To support this goal, we introduce CovidET (Emotions and their Triggers during Covid-19), a dataset of ~1,900 English Reddit posts related to COVID-19, which contains manual annotations of perceived emotions and abstractive summaries of their triggers described in the post. We develop strong baselines to jointly detect emotions and summarize emotion triggers. Our analyses show that CovidET presents new challenges in emotion
CrossRE is a cross-domain benchmark for Relation Extraction (RE), which comprises six distinct text domains and includes multi-label annotations. The dataset also includes metadata collected during annotation, such as explanations and flags for difficult instances.
Avalon is a benchmark for generalization in Reinforcement Learning (RL). The benchmark consists of a set of tasks in which embodied agents in highly diverse procedural 3D worlds must survive by navigating terrain, hunting or gathering food, and avoiding hazards. Avalon is unique among existing RL benchmarks in that the reward function, world dynamics, and action space are the same for every task, with tasks differentiated solely by altering the environment; its 20 tasks, ranging in complexity from eat and throw to hunt and navigate, each create worlds in which the agent must perform specific skills in order to survive. This benchmark setup enables investigations of generalization within tasks, between tasks, and to compositional tasks that require combining skills learned from previous tasks.
BioNLI is a dataset for biomedical natural language inference. This dataset contains abstracts from biomedical literature and mechanistic premises generated with nine different strategies.
A dataset for the following task: given two entities, generate a coherent sentence describing the relation between them.
SpaRTUN is a dataset synthesized for transfer learning on spatial question answering (SQA) and spatial role labeling (SpRL).
CoP3D is a collection of crowd-sourced videos showing around 4,200 distinct pets. CoP3D is a large-scale dataset for benchmarking non-rigid 3D reconstruction "in the wild".
The dataset uses VGG-Sound, which consists of 10-second clips collected from YouTube for 309 sound classes. A subset of ‘temporally sparse’ classes is selected using the following procedure: 5–15 videos are randomly picked from each of the 309 VGG-Sound classes and manually annotated as to whether audio-visual cues are only sparsely available. As a result, 12 classes are selected (∼4% of all classes), yielding 6.5k and 0.6k videos in the train and test sets, respectively. The classes include 'dog barking', 'chopping wood', 'lion roaring', 'skateboarding', etc.
FinRL-Meta is a universe of market environments for data-driven financial reinforcement learning. It follows the de facto standard of OpenAI Gym and the lean principle of software development. Its distinguishing features are a layered structure and extensibility, a training-testing-trading pipeline, and a plug-and-play mode.
IGLU is a dataset designed for interactive grounded language understanding. It has a total of 8,136 single-turn data pairs of instructions and actions. Every sample is randomly initialized with a pre-built structure from previously collected multi-turn interaction data.
NLPeer is a multidomain corpus of more than 5k papers and 11k review reports from five different venues. In addition to the new datasets of paper drafts, camera-ready versions and peer reviews from the NLP community, this dataset has a unified data representation and augments previous peer review datasets to include parsed, structured paper representations, rich metadata and versioning information.
The aiMotive dataset is a multimodal dataset for robust autonomous driving with long-range perception. The dataset consists of 176 scenes with synchronized and calibrated LiDAR, camera, and radar sensors covering a 360-degree field of view. The data was captured in highway, urban, and suburban areas during daytime, night, and rain, and is annotated with 3D bounding boxes with consistent identifiers across frames.