SemKITTI-DVPS is derived from the SemanticKITTI dataset, which is based on the odometry dataset of the KITTI Vision Benchmark and provides perspective images and panoptic-labeled 3D point clouds. To convert it for DVPS, we project the 3D point clouds onto the image plane and name the derived dataset SemKITTI-DVPS. SemKITTI-DVPS is distributed under the Creative Commons Attribution-NonCommercial-ShareAlike license.
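A minimal sketch of this kind of point-cloud-to-image projection, assuming KITTI-style calibration (a 3x4 camera projection matrix and a 4x4 LiDAR-to-camera transform); this is an illustration of the technique, not the authors' conversion code:

```python
import numpy as np

def project_lidar_to_image(points, Tr_velo_to_cam, P):
    """Project Nx3 LiDAR points onto the image plane.

    points:          (N, 3) xyz coordinates in the LiDAR frame
    Tr_velo_to_cam:  (4, 4) rigid transform from LiDAR to camera frame
    P:               (3, 4) camera projection matrix
    Returns (M, 2) pixel coordinates and the mask of points kept.
    """
    # Homogeneous coordinates: (N, 4)
    pts_h = np.hstack([points, np.ones((points.shape[0], 1))])
    # Into the camera frame: (4, N)
    cam = Tr_velo_to_cam @ pts_h.T
    # Keep only points in front of the camera
    front = cam[2] > 0
    # Onto the image plane, then perspective divide -> pixel (u, v)
    pix = P @ cam[:, front]
    pix = pix[:2] / pix[2]
    return pix.T, front
```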
Breast MRI scans of 922 cancer patients from Duke University, with tumor bounding-box annotations as well as clinical, imaging, and many other features.
The Bacteria Biotope (BB) Task is part of the BioNLP Open Shared Tasks and meets the BioNLP-OST standards of quality, originality, and data formats. Manually annotated data is provided for training, development, and evaluation of information extraction methods. Tools for the detailed evaluation of system outputs are available. Support for linguistic processing is provided in the form of analyses created by various state-of-the-art tools on the dataset texts.
The BioDiv dataset includes manually labeled tables from the biodiversity domain for Column Type Annotation (CTA) and Cell Entity Annotation (CEA).
LSOIE is a large-scale OpenIE dataset converted from QA-SRL 2.0 in two domains, Wikipedia and Science. It is 20 times larger than the next-largest human-annotated OpenIE dataset, and is thus reliable for fair evaluation. LSOIE provides n-ary OpenIE annotations; gold tuples are in the ⟨ARG0, Relation, ARG1, ..., ARGn⟩ format. The dataset has two subsets, LSOIE-wiki and LSOIE-sci, for comprehensive evaluation. LSOIE-wiki has 24,251 sentences and LSOIE-sci has 47,919 sentences.
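For illustration only, such an n-ary gold tuple can be represented as a plain Python tuple; the sentence and arguments below are made up, not drawn from LSOIE:

```python
# Illustrative n-ary OpenIE tuple in the <ARG0, Relation, ARG1, ..., ARGn> format;
# the sentence is a made-up example, not an actual LSOIE annotation.
sentence = "Marie Curie won the Nobel Prize in 1903."
gold_tuple = (
    "Marie Curie",      # ARG0
    "won",              # Relation
    "the Nobel Prize",  # ARG1
    "in 1903",          # ARG2
)
```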
We manually performed the task of Open Information Extraction on 5 short documents, elaborating tentative guidelines for the task and producing a ground-truth reference of 347 tuples.
MO-Gymnasium is an open source Python library for developing and comparing multi-objective reinforcement learning algorithms by providing a standard API to communicate between learning algorithms and environments, as well as a standard set of environments compliant with that API. Essentially, the environments follow the standard Gymnasium API, but return vectorized rewards as numpy arrays.
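A minimal usage sketch, assuming the mo_gymnasium package and its deep-sea-treasure-v0 environment are installed; note that step() returns the reward as a numpy vector with one entry per objective:

```python
import mo_gymnasium as mo_gym
import numpy as np

# Create a multi-objective environment; the API mirrors Gymnasium.
env = mo_gym.make("deep-sea-treasure-v0")
obs, info = env.reset(seed=42)

# step() returns a vector reward (one entry per objective) as a numpy array.
obs, vector_reward, terminated, truncated, info = env.step(env.action_space.sample())
assert isinstance(vector_reward, np.ndarray)
```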
X-Humans consists of 20 subjects (11 male, 9 female) with various clothing types and hairstyles. The collection of this dataset has been approved by an internal ethics committee. For each subject, we split the motion sequences into a training and a test set. In total, there are 29,036 poses for training and 6,439 test poses. X-Humans also contains ground-truth SMPL-X parameters, obtained via a custom SMPL-X registration pipeline specifically designed to deal with low-resolution body parts.
DarkTrack2021 is a nighttime UAV tracking benchmark containing 110 challenging sequences with over 100K frames in total.
Similar to CVUSA and CVACT, the VIGOR dataset contains satellite and street imagery to be matched against each other in order to localize the street imagery. For this purpose, data from four major American cities were used: San Francisco, New York, Seattle, and Chicago. Unlike the previous datasets, there are two settings (summarized in the sketch below): in the same-area setting, images of all cities are available in the training and validation splits; in the cross-area setting, training is done on two cities (New York, Seattle) and evaluation on Chicago and San Francisco. In addition, the dataset contains semi-positive images, which are very close to an actual ground-truth image and thus serve as a distraction for the matching task. In total, the dataset consists of 90,618 satellite images and 105,214 street images.
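The two evaluation protocols can be condensed into a small configuration sketch; the city lists come from the description above, while the dict layout itself is just an illustration:

```python
# The two VIGOR evaluation protocols, as described above.
SPLITS = {
    "same_area": {
        # all four cities appear in both training and validation
        "train": ["San Francisco", "New York", "Seattle", "Chicago"],
        "eval":  ["San Francisco", "New York", "Seattle", "Chicago"],
    },
    "cross_area": {
        # train on two cities, evaluate on the two held-out cities
        "train": ["New York", "Seattle"],
        "eval":  ["Chicago", "San Francisco"],
    },
}
```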
The OPRA Dataset was introduced in Demo2Vec: Reasoning Object Affordances From Online Videos (CVPR'18) for reasoning about object affordances from online demonstration videos. It contains 11,505 demonstration clips and 2,512 object images scraped from 6 popular YouTube product review channels, along with the corresponding affordance annotations. More details can be found at https://sites.google.com/view/demo2vec/.
We address the problem of automatically learning the main steps to complete a certain task, such as changing a car tire, from a set of narrated instruction videos. The contributions of this paper are three-fold. First, we develop a new unsupervised learning approach that takes advantage of the complementary nature of the input video and the associated narration. The method solves two clustering problems, one in text and one in video, applied one after the other and linked by joint constraints to obtain a single coherent sequence of steps in both modalities. Second, we collect and annotate a new challenging dataset of real-world instruction videos from the Internet. The dataset contains about 800,000 frames for five different tasks (how to: change a car tire, perform cardiopulmonary resuscitation (CPR), jump-start a car, repot a plant, and make coffee) that include complex interactions between people and objects, and are captured in a variety of indoor and outdoor settings. Third, we experimentally demonstrate that the proposed method can automatically discover, in an unsupervised manner, the main steps to achieve the task and locate the steps in the input videos.
LLCM (Low-Light Cross-Modality) dataset is constructed to facilitate the study of the low-light cross-modality person Re-ID task. It contains 46,767 person images of 1,064 identities, and each identity is captured by at least one RGB camera and one IR camera. The LLCM dataset is divided into a training set and a testing set at a ratio of about 2:1. The training set contains 30,921 bounding boxes of 713 identities (16,946 bounding boxes are from the VIS modality and 13,975 bounding boxes are from the IR modality), and the testing set contains 13,909 bounding boxes of 351 identities (8,680 bounding boxes are from the VIS modality and 7,166 bounding boxes are from the IR modality).
The LongForm dataset is created by leveraging English corpus examples with augmented instructions. It contains a diverse set of human-written documents from existing corpora such as C4 and Wikipedia, with instructions for the given documents generated via LLMs (a schematic sketch of the pipeline follows below). The examples generated from raw text corpora via LLMs include structured corpus examples as well as various NLP task examples such as email writing, grammar error correction, story/poem generation, and text summarization.
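A schematic sketch of this reverse-instruction idea under stated assumptions: call_llm is a hypothetical stand-in for whatever LLM client is used, and the prompt text is illustrative, not the paper's actual prompt:

```python
# Schematic reverse-instruction pipeline: given a human-written document,
# ask an LLM to produce an instruction that the document would answer.
# `call_llm` is a hypothetical placeholder, not a real library call.

def call_llm(prompt: str) -> str:
    raise NotImplementedError("plug in your LLM client here")

def make_training_pair(document: str) -> dict:
    prompt = (
        "Write the instruction a user might have given "
        "such that this text answers it:\n\n" + document
    )
    instruction = call_llm(prompt)
    # The (instruction, document) pair becomes a long-form training example.
    return {"instruction": instruction, "output": document}
```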
The IMUPoser Dataset is a dataset for estimating body pose using IMUs already in devices that many users own -- namely smartphones, smartwatches, and earbuds.
The WebUI dataset contains 400K web UIs captured over a period of 3 months at a crawling cost of about $500. We grouped web pages together by their domain name, then generated training (70%), validation (10%), and testing (20%) splits. This ensured that similar pages from the same website appear in the same split (a sketch of domain-grouped splitting follows below). We created four versions of the training dataset. Three of these were generated by randomly sampling a subset of the training split: Web-7k, Web-70k, and Web-350k. We chose 70k as a baseline size, since it is approximately the size of existing UI datasets. We also generated an additional split (Web-7k-Resampled) to provide a small, higher-quality split for experimentation. Web-7k-Resampled was generated using a class-balancing sampling technique, and we removed screens with possible visual defects (e.g., very small, occluded, or invisible elements). The validation and test splits were always kept the same.
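A minimal sketch of domain-grouped splitting; the hash-based assignment here is an illustrative choice, not necessarily the mechanism the authors used:

```python
import hashlib

def split_for_domain(url_domain: str) -> str:
    """Assign every page of a domain to the same split (70/10/20).

    Hashing the domain name keeps the assignment deterministic and
    guarantees pages from one website never leak across splits.
    """
    bucket = int(hashlib.md5(url_domain.encode()).hexdigest(), 16) % 100
    if bucket < 70:
        return "train"
    if bucket < 80:
        return "val"
    return "test"

# Every page of a given website lands in the same split.
assert split_for_domain("example.com") == split_for_domain("example.com")
```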
AmaSum is the largest abstractive opinion summarization dataset, consisting of more than 33,000 human-written summaries for Amazon products. Each summary is paired, on average, with more than 320 customer reviews. Summaries consist of verdicts, pros, and cons; a schematic sketch of this structure follows below.
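Since the original example is not reproduced here, a hedged sketch of the summary structure; the field names and all text below are illustrative placeholders, not actual dataset records or the official schema:

```python
from dataclasses import dataclass, field

@dataclass
class AmaSumSummary:
    # Illustrative structure only; field names are not the official schema.
    verdict: str                                   # one-sentence overall judgment
    pros: list[str] = field(default_factory=list)  # short positive points
    cons: list[str] = field(default_factory=list)  # short negative points

# Made-up placeholder content, for illustration only.
example = AmaSumSummary(
    verdict="A solid budget option for everyday use.",
    pros=["easy to set up", "good battery life"],
    cons=["flimsy hinge"],
)
```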
The RoNIN dataset contains over 40 hours of IMU sensor data from 100 human subjects with 3D ground-truth trajectories under natural human movements. It provides accelerometer, gyroscope, and magnetometer measurements together with ground-truth trajectories (orientation and position) across 327 sequences, recorded at 200 Hz. A two-device data collection protocol was developed: a harness was used to attach one phone to the body for 3D tracking, leaving subjects free to control the other phone, which collected the IMU data. Note that the ground-truth trajectory can only be obtained from the 3D-tracking phone attached to the harness, so the body trajectory is estimated rather than that of the IMU device. In total, RoNIN contains 42.7 hours of IMU-motion data over 276 sequences in 3 buildings, collected from 100 human subjects with three Android devices.
YouTube-ASL is a large-scale, open-domain corpus of American Sign Language (ASL) videos and accompanying English captions drawn from YouTube. With ~1000 hours of videos and >2500 unique signers, YouTube-ASL is ~3x as large and has ~10x as many unique signers as the largest prior ASL dataset.
A dataset of primarily English Reddit entries that addresses several limitations of prior work: it (1) contains six conceptually distinct primary categories as well as secondary categories, (2) has labels annotated in the context of the conversation thread, (3) contains rationales, and (4) uses an expert-driven group-adjudication process for high-quality annotations.