19,997 machine learning datasets
CC-Stories (or STORIES) is a dataset for commonsense reasoning and language modeling. It was constructed by aggregating documents from the CommonCrawl dataset that have the most overlapping n-grams with the questions in commonsense reasoning tasks. The top 1.0% of the highest-ranked documents are chosen as the new training corpus.
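The selection step can be pictured as "score every document by n-gram overlap with the question pool, then keep the top 1%". The sketch below is illustrative only: the n-gram order (`n=2`), the lowercase-and-split tokenization, the overlap score, and the helper names `ngrams`, `overlap_score` and `select_top_fraction` are all assumptions, not the procedure actually used to build CC-Stories.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return a Counter of the n-grams (as tuples) in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def overlap_score(doc_tokens, question_ngrams, n):
    """Count document n-gram occurrences that also appear in the question pool."""
    return sum(count for gram, count in ngrams(doc_tokens, n).items()
               if gram in question_ngrams)


def select_top_fraction(documents, questions, n=2, fraction=0.01):
    """Rank documents by n-gram overlap with the questions and keep the top fraction.

    Hypothetical helper: tokenization is a simple lowercase + whitespace split.
    """
    question_ngrams = set()
    for q in questions:
        question_ngrams.update(ngrams(q.lower().split(), n))  # adds the n-gram keys
    scores = [overlap_score(d.lower().split(), question_ngrams, n) for d in documents]
    # Sort indices by descending score; sorted() is stable, so ties keep input order.
    ranked = sorted(range(len(documents)), key=lambda i: -scores[i])
    k = max(1, int(len(documents) * fraction))  # e.g. 1% of the corpus, at least one doc
    return [documents[i] for i in ranked[:k]]
```

On a real CommonCrawl dump this would be run in a distributed setting, but the ranking logic stays the same.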
The MM-WHS 2017 dataset is a dataset for multi-modality whole heart segmentation. It provides 20 labeled and 40 unlabeled CT volumes, as well as 20 labeled and 40 unlabeled MR volumes. In total there are 120 multi-modality cardiac images acquired in a real clinical environment.
The Retrieval Question-Answering (ReQA) benchmark tests a model's ability to retrieve relevant answers efficiently from a large set of documents.
FERET-Morphs is a dataset of morphed faces selected from the publicly available FERET dataset.
The Bigram Relatedness Dataset (BiRD) is a large, fine-grained bigram relatedness dataset created using a comparative annotation technique called Best-Worst Scaling. Each of BiRD's 3,345 English term pairs involves at least one bigram. BiRD is made freely available to foster further research on how meaning can be represented and how meaning can be composed.
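In Best-Worst Scaling, annotators see a small tuple of items (commonly four) and pick the best and the worst one, here the most and least related term pair. A common way to turn those choices into real-valued scores is simple counting: the fraction of times an item was chosen best minus the fraction of times it was chosen worst. The sketch below assumes that counting method; the `BwsAnnotation` type and `bws_scores` function are hypothetical, and BiRD's released scores may be computed or rescaled differently.

```python
from collections import Counter
from typing import NamedTuple


class BwsAnnotation(NamedTuple):
    """One Best-Worst Scaling judgment over a small tuple of items (e.g. 4 term pairs)."""
    items: tuple
    best: str   # item judged most related
    worst: str  # item judged least related


def bws_scores(annotations):
    """Counting-based BWS score: (times best - times worst) / times shown, in [-1, 1]."""
    shown, best, worst = Counter(), Counter(), Counter()
    for a in annotations:
        shown.update(a.items)
        best[a.best] += 1
        worst[a.worst] += 1
    # Counter returns 0 for items never picked as best or worst.
    return {item: (best[item] - worst[item]) / n for item, n in shown.items()}
```

An item always chosen as most related scores 1.0, one always chosen as least related scores -1.0, and items never singled out land at 0.0.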
Rainbow is a multi-task benchmark for commonsense reasoning that combines six existing QA datasets: αNLI, Cosmos QA, HellaSWAG, Physical IQa, Social IQa, and WinoGrande.
The Room Rearrangement task consists of an agent exploring a room and recording the objects' initial configurations. The agent is then removed, and the poses and states (e.g., open/closed) of some objects in the room are changed. The agent must then restore the initial configurations of all objects in the room.
The Action-Based Conversations Dataset (ABCD) is a fully labeled, goal-oriented dialogue dataset with over 10K human-to-human dialogues covering 55 distinct user intents, each requiring a unique sequence of policy-constrained actions to achieve task success. The dataset is proposed to study customer service dialogue systems in more realistic settings.
The Caltech Mouse Social Interactions (CalMS21) dataset is a multi-agent dataset from behavioral neuroscience. The dataset consists of trajectory data of social interactions, recorded from videos of freely behaving mice in a standard resident-intruder assay. The CalMS21 dataset is part of the Multi-Agent Behavior Challenge 2021.
XFORMAL is a multilingual benchmark for formality style transfer, consisting of multiple formal reformulations of informal text in Brazilian Portuguese, French, and Italian.
We release expert-made scribble annotations for the medical ACDC dataset. The released data should be considered an extension of the original ACDC dataset. The ACDC dataset contains cardiac MRI images paired with hand-made segmentation masks. The segmentation masks provided in the ACDC dataset can be used to evaluate the performance of methods trained using only scribble supervision.
ChestX-Det is a chest X-ray dataset with instance-level annotations (boxes and masks). ChestX-Det is a subset of the public NIH ChestX-ray14 dataset. It contains about 3,500 images covering 13 common disease categories, labeled by three board-certified radiologists.
MoGaze is a dataset of full-body motion for everyday manipulation tasks, which includes 1) long sequences of manipulation tasks, 2) the 3D model of the workspace geometry, and 3) eye-gaze. The motion data was captured using a traditional motion capture system based on reflective markers. The eye-gaze was captured using a wearable pupil-tracking device.
The Vehicular Reference Misbehavior (VeReMi) dataset is a dataset for evaluating misbehavior detection mechanisms for VANETs (vehicular ad hoc networks). It consists of message logs of on-board units, including a labeled ground truth, generated from a simulation environment. The dataset includes malicious messages intended to trigger incorrect application behavior, which is what misbehavior detection mechanisms aim to prevent. The initial dataset contains a number of simple attacks: the idea of this release is not just to provide a baseline for comparing detection mechanisms, but also to serve as a starting point for more complex attacks.
e-ViL is a benchmark for explainable vision-language tasks. e-ViL spans three datasets of human-written NLEs (natural language explanations) and provides a unified evaluation framework designed to be reusable in future work.
ATD-12K is a large-scale animation triplet dataset comprising 12,000 manually inspected triplets (10k for training and 2k for testing). The 2k test triplets come with rich annotations, including levels of difficulty, Regions of Interest (RoIs) on movements, and tags on motion categories.
OpenMEVA is a benchmark for evaluating open-ended story generation metrics. OpenMEVA provides a comprehensive test suite to assess the capabilities of metrics, including (a) the correlation with human judgments, (b) the generalization to different model outputs and datasets, (c) the ability to judge story coherence, and (d) the robustness to perturbations. To this end, OpenMEVA includes both manually annotated stories and auto-constructed test examples.
Satellite images are snapshots of the Earth's surface. We propose to forecast them. We frame Earth surface forecasting as the task of predicting satellite imagery conditioned on future weather. EarthNet2021 is a large dataset suitable for training deep neural networks on this task. It contains Sentinel-2 satellite imagery at 20 m resolution, matching topography, and mesoscale (1.28 km) meteorological variables, packaged into 32,000 samples. Additionally, we frame EarthNet2021 as a challenge allowing for model intercomparison. The resulting forecasts will improve more than 50-fold on the spatial resolution found in numerical models. This allows localized impacts of extreme weather to be predicted, supporting downstream applications such as crop yield prediction, forest health assessment, and biodiversity monitoring. Find data, code, and how to participate at www.earthnet.tech.
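As a quick check on the more-than-50-fold resolution gain, assuming it compares the 1.28 km grid of the meteorological variables with the 20 m grid of the Sentinel-2 imagery (the text does not state the baseline explicitly):

```latex
\frac{1.28\ \text{km}}{20\ \text{m}} = \frac{1280\ \text{m}}{20\ \text{m}} = 64 > 50
```

Under that assumption the gain is 64× per spatial axis, so a single meteorological grid cell covers 64 × 64 = 4096 image pixels.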
NeoRL is a collection of environments and datasets for offline reinforcement learning with a special focus on real-world applications. The design reflects real-world properties such as conservative behavior policies, limited amounts of data, high-dimensional state and action spaces, and highly stochastic environments. The datasets cover robotics, industrial control, financial trading, and city management tasks, each offered in three dataset sizes and three levels of data quality to mimic the datasets encountered in offline RL scenarios. Users can use the datasets to evaluate offline RL algorithms in settings close to real-world applications.