The PETRAW dataset is composed of 150 sequences of peg transfer training sessions. The objective of a peg transfer session is to transfer six blocks from the left side of the board to the right and back. Each block must be extracted from a peg with one hand, transferred to the other hand, and inserted onto a peg on the other side of the board. All cases were acquired by a non-medical expert at the LTSI Laboratory of the University of Rennes. The dataset is divided into a training set of 90 cases and a test set of 60 cases. Each case comprises kinematic data, a video, a semantic segmentation of each frame, and workflow annotations.
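A minimal sketch of how one such case might be loaded; the file names and layout below are assumptions for illustration, not the official PETRAW release format:

```python
import json
import numpy as np
import cv2  # opencv-python

def load_petraw_case(case_dir):
    """Load one peg-transfer case (file names are hypothetical)."""
    # Kinematic data: assumed CSV of per-frame instrument poses.
    kinematics = np.loadtxt(f"{case_dir}/kinematics.csv", delimiter=",", skiprows=1)
    # Video of the session.
    video = cv2.VideoCapture(f"{case_dir}/video.mp4")
    # Workflow annotation: assumed per-frame phase/step/activity labels.
    with open(f"{case_dir}/workflow.json") as f:
        workflow = json.load(f)
    return kinematics, video, workflow
```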
MeLa BitChute is a near-complete dataset of over 3M videos from 61K channels, collected over 2.5 years (June 2019 to December 2021) from the social video hosting platform BitChute, a commonly used alternative to YouTube. The dataset also includes a variety of video-level metadata, including comments, channel descriptions, and view counts for each video.
Taillard's instances for the permutation flow shop, job shop, and open shop scheduling problems. We restrict ourselves to the basic problems: processing times are fixed, and there are no set-up times, due dates, or release dates. The objective is minimization of the makespan.
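For the permutation flow shop, the makespan of a given job order follows the standard recurrence C(j, m) = max(C(j−1, m), C(j, m−1)) + p(j, m). A minimal sketch in Python (the toy instance is illustrative, not one of Taillard's):

```python
import numpy as np

def pfsp_makespan(proc_times, permutation):
    """Makespan of a permutation flow shop schedule.

    proc_times: (n_jobs, n_machines) array of fixed processing times.
    permutation: the job order, identical on every machine.
    """
    n_machines = proc_times.shape[1]
    completion = np.zeros(n_machines)  # last completion time on each machine
    for job in permutation:
        for m in range(n_machines):
            # A job starts once machine m is free and the job has left machine m-1.
            start = max(completion[m], completion[m - 1] if m > 0 else 0.0)
            completion[m] = start + proc_times[job, m]
    return completion[-1]

# Toy instance: 3 jobs x 2 machines.
p = np.array([[3, 2], [1, 4], [2, 2]])
print(pfsp_makespan(p, [1, 0, 2]))  # -> 9.0
```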
Synth-Colon is a synthetic dataset for polyp segmentation. It is the first such dataset generated using zero annotations from medical professionals. The dataset is composed of 20,000 images with a resolution of 500×500. Synth-Colon additionally includes realistic colon images generated with a CycleGAN and the Kvasir training set images. Synth-Colon can also be used for the colon depth estimation task because it provides depth and 3D information for each image. In summary, Synth-Colon includes:
– Synthetic images of the colon and one polyp.
– Masks indicating the location of the polyp.
– Realistic images of the colon and polyps, generated using CycleGAN and the Kvasir dataset.
– Depth images of the colon and polyp.
– 3D meshes of the colon and polyp in OBJ format.
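Polyp segmentation on such masks is typically scored with the Dice coefficient; a minimal sketch, assuming binary PNG masks (file names are hypothetical):

```python
import numpy as np
from PIL import Image

def dice_score(pred, gt, eps=1e-7):
    """Dice coefficient between two binary masks, the usual polyp-segmentation metric."""
    pred, gt = np.asarray(pred, dtype=bool), np.asarray(gt, dtype=bool)
    inter = np.logical_and(pred, gt).sum()
    return (2.0 * inter + eps) / (pred.sum() + gt.sum() + eps)

# File names are illustrative; the masks mark the polyp pixels.
gt = np.array(Image.open("mask_00001.png").convert("L")) > 127
pred = np.array(Image.open("pred_00001.png").convert("L")) > 127
print(f"Dice: {dice_score(pred, gt):.3f}")
```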
Contains square blocks of 48×48 pixels including 13 Sentinel-2 bands. Each 480-m block was mined from a large geographical area of interest (102 km × 42 km) located north of Munich, Germany.
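Extracting such blocks from a resampled multi-band raster amounts to a simple reshape; a sketch, assuming a 13-band array at 10 m resolution (the tiling procedure is illustrative, not the dataset's documented mining pipeline):

```python
import numpy as np

def tile_blocks(scene, block=48):
    """Cut a (bands, H, W) scene into non-overlapping (bands, block, block)
    tiles, dropping edge remainders. At 10 m ground sampling distance a
    48x48 tile covers 480 m x 480 m."""
    bands, h, w = scene.shape
    h, w = (h // block) * block, (w // block) * block
    return (scene[:, :h, :w]
            .reshape(bands, h // block, block, w // block, block)
            .transpose(1, 3, 0, 2, 4)              # (rows, cols, bands, 48, 48)
            .reshape(-1, bands, block, block))

scene = np.random.rand(13, 480, 480)   # stand-in for a 13-band Sentinel-2 scene
print(tile_blocks(scene).shape)        # (100, 13, 48, 48)
```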
MuLD (Multitask Long Document Benchmark) is a set of 6 NLP tasks where the inputs consist of at least 10,000 words. The benchmark covers a wide variety of task types, including translation, summarization, question answering, and classification. Additionally, there is a range of output lengths, from a single-word classification label all the way up to an output longer than the input text.
The MCVQA dataset consists of 248,349 training questions and 121,512 validation questions for real images, in Hindi and in code-mixed form. For each Hindi question, we also provide its 10 corresponding answers in Hindi.
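The split sizes and the 10 answers per question mirror the VQA v2 protocol, so evaluation presumably uses the usual consensus metric; a sketch of its simplified form (an assumption, not a documented MCVQA script):

```python
def vqa_accuracy(predicted, human_answers):
    """Simplified VQA consensus accuracy: an answer counts as fully
    correct if at least 3 of the 10 annotators gave it."""
    matches = sum(a == predicted for a in human_answers)
    return min(matches / 3.0, 1.0)

# Toy example with 10 reference answers (Hindi strings in practice).
refs = ["हाँ"] * 8 + ["नहीं"] * 2
print(vqa_accuracy("हाँ", refs))   # 1.0
print(vqa_accuracy("नहीं", refs))  # 0.666...
```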
KMIR (Knowledge Memorization, Identification, and Reasoning) is a benchmark that covers 3 types of knowledge, including general knowledge, domain-specific knowledge, and commonsense, and provides 184,348 well-designed questions. KMIR can be used for evaluating knowledge memorization, identification and reasoning abilities of language models.
Visible and thermal images were acquired using a TESTO 880-3 thermographic camera, equipped with an uncooled detector with a spectral sensitivity range of 8–14 μm and a germanium optical lens, at an approximate cost of 8,000 EUR. For the NIR, a customized Logitech QuickCam Messenger E2500 was used, fitted with a silicon-based CMOS image sensor sensitive to the full visible spectrum and the lower half of the NIR (up to approximately 1,000 nm), at a cost of approximately 30 EUR. We replaced the default optical filter of this camera with a pair of Kodak daylight IR filters inserted between the optics and the sensor. Both have similar spectral responses and are coded as Wratten filters 87 and 87C, respectively. In addition, we used a special-purpose printed circuit board (PCB) with a set of 16 infrared LEDs (IREDs) emitting in the 820–1,000 nm range to provide the required illumination.
We created a new dataset, named DFDM, with 6,450 Deepfake videos generated by different Autoencoder models. Specifically, five Autoencoder models with variations in encoder, decoder, intermediate layer, and input resolution, respectively, were selected to generate Deepfakes from the same input. We observed visible but subtle visual differences among the different Deepfakes, demonstrating evidence of model attribution artifacts.
The Visual-Inertial-Dynamical (VID) dataset not only focuses on traditional six degrees of freedom (6-DOF) pose estimation, but also provides dynamical characteristics of the flight platform for external force perception or dynamics-aided estimation. The VID dataset contains hardware synchronized imagery and inertial measurements, with accurate ground truth trajectories for evaluating common visual-inertial estimators. Moreover, the proposed dataset highlights rotor speed and motor current measurements, control inputs, and ground truth 6-axis force data to evaluate external force estimation. To the best of our knowledge, the proposed VID dataset is the first public dataset containing visual-inertial and complete dynamical information in the real world for pose and external force evaluation.
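Dynamics-aided external force estimation commonly treats the unmodeled force as the residual of the rigid-body translational dynamics; a simplified sketch, assuming a multirotor with known mass and a quadratic thrust model fitted from the provided rotor speeds (the thrust coefficient below is a placeholder, not a value from the dataset):

```python
import numpy as np

def external_force(mass, R_wb, accel_w, rotor_speeds, k_thrust=1.5e-6):
    """External force as the residual of rigid-body translational dynamics:

        m * a = R * [0, 0, T]^T + m * g + f_ext
        => f_ext = m * a - m * g - R * [0, 0, T]^T

    mass: vehicle mass [kg]; R_wb: 3x3 body-to-world rotation;
    accel_w: linear acceleration in the world frame [m/s^2];
    rotor_speeds: rotor speeds [rad/s], with thrust model T = k * sum(w^2).
    """
    g = np.array([0.0, 0.0, -9.81])
    thrust_b = np.array([0.0, 0.0, k_thrust * np.sum(np.square(rotor_speeds))])
    return mass * accel_w - mass * g - R_wb @ thrust_b
```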
We cleaned the noisy IMDB-WIKI dataset using a constrained clustering method, resulting in this new benchmark for in-the-wild age estimation. The annotations also allow the dataset to be used for other tasks, such as gender classification and face recognition/verification. For more details, please refer to our FPAge paper.
Largest, first-of-its-kind, in-the-wild, fine-grained workout/exercise posture analysis dataset, covering three different exercises: BackSquat, Barbell Row, and Overhead Press. Seven different types of exercise errors are covered. Unlabeled data is also provided to facilitate self-supervised learning.
Symbolic Interactive Language Grounding (SILG) is a multi-environment benchmark which unifies a collection of diverse grounded language learning environments under a common interface. SILG consists of grid-world environments that require generalization to new dynamics, entities, and partially observed worlds (RTFM, Messenger, NetHack), as well as symbolic counterparts of visual worlds that require interpreting rich natural language with respect to complex scenes (ALFWorld, Touchdown). Together, these environments provide diverse grounding challenges in richness of observation space, action space, language specification, and plan complexity.
This dataset contains both infeasible and feasible data points as described in PRIME. The descriptors of the collected data are presented in the table below.
The Japanese Adversarial NLI (JaNLI) dataset is designed to require understanding of Japanese linguistic phenomena and illuminate the vulnerabilities of models. Please see the paper Assessing the Generalization Capacity of Pre-trained Language Models through Japanese Adversarial Natural Language Inference for details.
The NVIDIA HOPE datasets consist of RGBD images and video sequences with labeled 6-DoF poses for 28 toy grocery objects. The toy grocery objects are readily available for purchase and have ideal size and weight for robotic manipulation. 3D textured meshes for generating synthetic training data are provided.
The HOPE-Video dataset contains 10 video sequences (2,038 frames) with 5–20 objects in a tabletop scene, captured by a RealSense D415 RGBD camera mounted on a robot arm. In each sequence, the camera is moved to capture multiple views of a set of objects in the robotic workspace. COLMAP was first applied to refine the camera poses (keyframes at 6 fps) provided by forward kinematics and the RGB calibration from the RealSense to Baxter's wrist camera. A dense 3D point cloud was then generated via CascadeStereo (included for each sequence as 'scene.ply'). Ground truth poses for the HOPE object models in the world coordinate system were annotated manually using the CascadeStereo point clouds. Per-frame annotations are provided.
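A quick way to inspect one of the per-sequence reconstructions, assuming Open3D is installed (the path is illustrative):

```python
import open3d as o3d

# Each HOPE-Video sequence ships a 'scene.ply'; adjust the path to your copy.
pcd = o3d.io.read_point_cloud("hope_video/sequence_00/scene.ply")
print(pcd)                                 # point count and attributes
o3d.visualization.draw_geometries([pcd])   # interactive viewer
```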
PACS (Physical Audiovisual CommonSense) is the first audiovisual benchmark annotated for physical commonsense attributes. PACS contains a total of 13,400 question-answer pairs, involving 1,377 unique physical commonsense questions and 1,526 videos. The dataset provides new opportunities to advance research on physical reasoning by making audio a core component of this multimodal problem.
PRW-TBPS is a dataset for the text-based person search task.