19,997 machine learning datasets
TAPVid-3D is a dataset and benchmark for evaluating the task of long-range Tracking Any Point in 3D (TAP-3D). The dataset consists of 4,000+ real-world videos and 2.1 million metric 3D point trajectories, spanning a variety of object types, motion patterns, and indoor and outdoor environments.
Tiny ImageNet-R is a subset of the ImageNet-R dataset by Hendrycks et al. ("The Many Faces of Robustness: A Critical Analysis of Out-of-Distribution Generalization") with 10,456 images spanning 62 of the 200 Tiny ImageNet classes. It is a test set obtained by collecting images of the classes shared by Tiny ImageNet and ImageNet. The images, resized to 64×64, contain art, cartoons, deviantart, graffiti, embroidery, graphics, origami, paintings, patterns, plastic objects, plush objects, sculptures, sketches, tattoos, toys, and video game renditions of ImageNet classes. For further information on ImageNet-R, visit the original GitHub repository of ImageNet-R.
MedMNIST-C is an open-source data set collection comprising algorithmically generated corruptions applied to the test sets of the MedMNIST collection following the concept of ImageNet-C. To maintain the integrity of the medical data, we have excluded any weather-dependent corruptions (“Snow”, “Frost”, “Fog”). Hence, each data set in the MedMNIST-C collection comprises 16 different corruptions (12 test corruptions and 4 validation corruptions) spanning 5 severity levels. For further information on the corruptions visit the original GitHub repository of ImageNet-C.
This dataset comprises a 3D building information model (in IFC and Revit formats), manually created from the terrestrial laser scan of sequence 2 of ConSLAM, and the refined ground truth (GT) poses (in TUM format) of sessions 2, 3, 4, and 5 of the open-access ConSLAM dataset (which provides camera, LiDAR, and IMU measurements).
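As a rough illustration of reading poses in TUM format, the sketch below assumes the commonly used TUM trajectory convention of one pose per line, `timestamp tx ty tz qx qy qz qw`, with `#` marking comment lines. The `Pose` class, the `parse_tum_trajectory` function, and the sample values are hypothetical and not taken from the dataset; the actual file layout should be checked against the dataset's documentation.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Pose:
    """One pose in the assumed TUM trajectory convention (hypothetical helper)."""
    timestamp: float
    translation: Tuple[float, float, float]        # (tx, ty, tz)
    quaternion: Tuple[float, float, float, float]  # (qx, qy, qz, qw)


def parse_tum_trajectory(text: str) -> List[Pose]:
    """Parse lines of 'timestamp tx ty tz qx qy qz qw' into Pose objects."""
    poses = []
    for line_number, line in enumerate(text.splitlines(), start=1):
        line = line.strip()
        # Skip blank lines and comment lines.
        if not line or line.startswith("#"):
            continue
        values = [float(v) for v in line.split()]
        if len(values) != 8:
            raise ValueError(
                f"line {line_number}: expected 8 values, got {len(values)}"
            )
        t, tx, ty, tz, qx, qy, qz, qw = values
        poses.append(Pose(t, (tx, ty, tz), (qx, qy, qz, qw)))
    return poses


# Hypothetical sample data, for illustration only.
SAMPLE = """# timestamp tx ty tz qx qy qz qw
0.0 0.0 0.0 0.0 0.0 0.0 0.0 1.0
0.1 0.5 0.0 0.0 0.0 0.0 0.0 1.0
"""

if __name__ == "__main__":
    for pose in parse_tum_trajectory(SAMPLE):
        print(pose.timestamp, pose.translation, pose.quaternion)
```

The parser rejects lines with the wrong number of fields rather than guessing, since a silently misread pose file is harder to diagnose than an early error.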
The Lund University Vision, Radio, and Audio (LuViRA) positioning dataset consists of 89 trajectories recorded in the Lund University Humanities Lab's Motion Capture (Mocap) Studio using a MIR200 robot as the target platform. Each trajectory contains data from four different systems: vision, radio, audio, and a ground truth system that provides localization accuracy within 0.5 mm. A Motion Capture (Mocap) system in the environment is used as the ground truth system, which provides 3D or 6DoF tracking of a camera, a single antenna, and a speaker. These targets are mounted on top of the MIR200 robot and put in motion. The 3D positions of the 11 static microphones are also provided.
The HEterogeneous Materials and Elastic Waves with Source variability in 3D (HEMEWS-3D) database comprises 30,000 simulations of elastic wave propagation in 3D geological domains. Each domain is parametrized by a different geological model built from a random arrangement of layers augmented by random fields that represent heterogeneities. Elastic waves originate from a randomly located pointwise source parametrized by a random moment tensor. For each simulation, ground motion is synthesized at the surface by a grid of virtual sensors. The high frequency of waveforms ($f_{max}$ = 5 Hz) allows for extensive analyses of surface ground motion.
The first dataset contains annotated natural language queries (in Mandarin) with their Cypher equivalents. It is made up of a Neo4j database and 10,000 pairs of Text-Cypher queries.
The ARKitFace dataset is established by this work in order to train and evaluate both 3D face shape and 6DoF pose in the setting of perspective projection. A total of 500 volunteers, aged 9 to 60, were invited to record the dataset. They sit in a random environment, and the 3D acquisition equipment is fixed in front of them at a distance ranging from about 0.3 m to 0.9 m. Each subject is asked to perform 33 specific expressions with two head movements (from looking left to looking right, and from looking up to looking down). The 3D acquisition equipment used is an iPhone 11. The shape and location of the human face are tracked by a structured-light sensor. The triangle mesh and 6DoF information of the RGB images are obtained by the built-in ARKit toolbox. The triangle mesh is made up of 1,220 vertices and 2,304 triangles. In total, 902,724 2D facial images (resolution 1280×720 or 1440×1280) with ground-truth 3D mesh and 6DoF pose annotation are collected.
SA-Det-100k is a large-scale class-agnostic object detection dataset for research purposes only. The dataset is based on a subset of SA-1B (see LICENSE), and all objects belong to the same category, "objects". Because it covers a large number of scenarios but does not provide class-specific annotations, we believe it may be a good choice for pre-training models for a variety of downstream tasks with different categories. The dataset contains about 100k images, and each image is resized using opencv-python so that the larger of its height and width is 1333, which is consistent with the data augmentation commonly used to train on COCO. For an example project based on this dataset, please see Relation-DETR (https://github.com/xiuqhou/Relation-DETR).
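The resizing rule described for SA-Det-100k can be sketched as a small shape computation. This is a minimal sketch that assumes the aspect ratio is preserved and the scaled shorter side is rounded to the nearest integer; the function name `resized_shape` is hypothetical, and the interpolation and rounding the dataset authors actually used are not stated here.

```python
def resized_shape(height: int, width: int, longer_side: int = 1333) -> tuple:
    """Return (new_height, new_width) so that the larger dimension equals
    `longer_side` and the aspect ratio is approximately preserved.

    Assumption: the shorter side is scaled by the same factor and rounded.
    """
    if height <= 0 or width <= 0:
        raise ValueError("height and width must be positive")
    if height >= width:
        # Height is the longer side: pin it, scale the width.
        return longer_side, max(1, round(width * longer_side / height))
    # Width is the longer side: pin it, scale the height.
    return max(1, round(height * longer_side / width)), longer_side


if __name__ == "__main__":
    new_h, new_w = resized_shape(1500, 2000)
    print(new_h, new_w)
    # With opencv-python, the actual resize could then be done with
    # cv2.resize(image, (new_w, new_h)); note that cv2.resize takes the
    # target size as (width, height).
```

Pinning the longer side to exactly 1333 and rounding only the shorter side avoids an off-by-one on the dimension the description fixes.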
The Clarity Speech Corpus is a forty-speaker British English speech dataset. The corpus was created for the purpose of running listening tests to gauge speech intelligibility and quality in the Clarity Project, which has the goal of advancing speech signal processing by hearing aids through a series of challenges. The dataset is suitable for machine learning and other uses in speech and hearing technology, acoustics, and psychoacoustics. The data comprises recordings of approximately 10,000 sentences drawn from the British National Corpus (BNC) with suitable length, words, and grammatical construction for speech intelligibility testing. The collection process involved the selection of a subset of BNC sentences, the recording of these sentences by 40 British English speakers, and the processing of these recordings to create individual sentence recordings with associated prompts and metadata.
The availability of high-quality datasets plays a crucial role in advancing research and development, especially for safety-critical and autonomous systems. In this paper, we present AssistTaxi, a comprehensive novel dataset of images for runway and taxiway analysis. The dataset comprises more than 300,000 frames of diverse and carefully collected data, gathered from Melbourne (MLB) and Grant-Valkaria (X59) general aviation airports. The importance of AssistTaxi lies in its potential to advance autonomous operations, enabling researchers and developers to train and evaluate algorithms for efficient and safe taxiing.
Introduced originally by Xiaohan Yu, Yang Zhao, Yongsheng Gao, Xiaohui Yuan, Shengwu Xiong (2021). Benchmark Platform for Ultra-Fine-Grained Visual Categorization Beyond Human Performance. In ICCV 2021.
The SynthSOD dataset contains more than 47 hours of multitrack music obtained by synthesizing orchestra and ensemble pieces from the Symbolic Orchestral Database (SOD) using the Spitfire BBC Symphony Orchestra Professional library. To synthesize the MIDI files from the SOD, we needed to convert the original files to the General MIDI standard, select a subset of files that met our requirements (e.g., containing only instruments that we could synthesize), and develop a new system to generate musically motivated random annotations for tempo, dynamics, and articulation.
The Industrial Objects in Varied Contexts (InVar) Dataset was internally produced by our team and contains 100 objects in 20,800 total images (208 images per class). The objects consist of common automotive, machine, and robotics lab parts. Each class contains 4 sub-categories (52 images each) with different attributes and visual complexities.
The first benchmark comprising 473 prompts designed to assess the ability of LLMs to resist malicious code generation.
WeatherKITTI is currently the most realistic all-weather simulated enhancement of the KITTI dataset. The WeatherKITTI dataset simulates the three weather conditions that most affect visual perception in real-world scenarios: rain, snow, and fog. Each type of weather has two intensity levels: severe and extremely severe. Together with clear weather, these two levels create a weather-enhanced dataset featuring three levels and seven weather scenarios.
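The count of seven scenarios follows from three weather types at two intensities each, plus clear weather. A minimal sketch of that enumeration is shown below; the constant and function names are hypothetical and do not reflect the dataset's actual directory or label naming.

```python
from itertools import product

# Names taken from the description; identifiers are hypothetical.
WEATHER_TYPES = ("rain", "snow", "fog")
INTENSITIES = ("severe", "extremely severe")


def weather_scenarios():
    """List every (weather, intensity) scenario, with clear weather first."""
    scenarios = [("clear", None)]
    scenarios.extend(product(WEATHER_TYPES, INTENSITIES))
    return scenarios


if __name__ == "__main__":
    for weather, intensity in weather_scenarios():
        label = weather if intensity is None else f"{weather} ({intensity})"
        print(label)
    # 1 clear + 3 weather types x 2 intensities = 7 scenarios
    print(len(weather_scenarios()))
```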
Lifespan HCP Release 2.0 includes cross-sectional visit 1 (V1) preprocessed structural and functional imaging data, unprocessed V1 imaging data for all included modalities (structural, high-res hippocampal T2, resting state fMRI, task fMRI, diffusion, and ASL), and non-imaging demographic and behavioral assessment data from 725 HCP-Aging (HCP-A, ages 36-100+) healthy participants (22+ TB of data).