19,997 machine learning datasets
To create the TED-talks dataset, 3,035 YouTube videos were downloaded using the "TED talks" query. From these initial candidates, videos in which the upper part of the person is visible for at least 64 frames and the height of the person's bounding box is at least 384 pixels were selected. Static videos, and videos in which the person is doing something other than presenting, were then manually filtered out.
JVS is a Japanese multi-speaker voice corpus which contains voice data of 100 speakers in three styles (normal, whisper, and falsetto). The corpus contains 30 hours of voice data including 22 hours of parallel normal voices.
The dataset contains 21 full-HD videos, each around 1 hr long, captured at six different locations. Vehicles in the videos (20,865 instances in total) are annotated with precise speed measurements from optical gates using LiDAR, verified against several reference GPS tracks. The dataset is available for download and contains the videos and metadata (calibration, lengths of features in the image, annotations, and so on) for future comparison and evaluation.
CirCor DigiScope is currently the largest pediatric heart sound dataset. A total of 5,282 recordings have been collected from the four main auscultation locations of 1,568 patients; in the process, 215,780 heart sounds have been manually annotated. Each cardiac murmur has been manually annotated by an expert annotator according to its timing, shape, pitch, grading, and quality.
A large set of images of cats and dogs.
DialFact is a testing benchmark dataset of 22,245 annotated conversational claims, paired with pieces of evidence from Wikipedia. There are three sub-tasks in DialFact: 1) the verifiable claim detection task distinguishes whether a response carries verifiable factual information; 2) the evidence retrieval task retrieves the most relevant Wikipedia snippets as evidence; 3) the claim verification task classifies a dialogue response as supported, refuted, or not enough information.
A video dataset for benchmarking upsampling methods. Inter4K contains 1,000 ultra-high-resolution videos at 60 frames per second (fps) from online resources. The dataset provides standardized video resolutions at ultra-high definition (UHD/4K), quad high definition (QHD/2K), full high definition (FHD/1080p), (standard) high definition (HD/720p), one quarter of full HD (qHD/540p), and one ninth of full HD (nHD/360p). Frame rates of 60, 50, 30, 24, and 15 fps are used for each resolution. Based on this standardization, both super-resolution and frame interpolation tests can be performed at different scaling factors ($\times 2$, $\times 3$ and $\times 4$). In this paper, we use Inter4K to address frame upsampling and interpolation. Inter4K provides both standardized UHD resolution and 60 fps for all videos while containing a diverse set of 1,000 5-second clips; differences between scenes originate from the equipment (e.g., professional 4K cameras or phones) and lighting conditions.
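The standardized resolution ladder makes it easy to see which resolution pairs support the stated super-resolution scaling factors. A minimal sketch (the dictionary keys and the helper `scale_factor` are illustrative, not part of the Inter4K release; note qHD is 960×540):

```python
# Standard pixel dimensions for the resolution names mentioned above.
LADDER = {
    "UHD/4K": (3840, 2160),
    "QHD/2K": (2560, 1440),
    "FHD/1080p": (1920, 1080),
    "HD/720p": (1280, 720),
    "qHD/540p": (960, 540),
    "nHD/360p": (640, 360),
}

def scale_factor(src, dst):
    """Return the integer factor by which dst upscales to src, or None."""
    sw, sh = LADDER[src]
    dw, dh = LADDER[dst]
    if sw % dw == 0 and sh % dh == 0 and sw // dw == sh // dh:
        return sw // dw
    return None

# All (source, target) pairs usable for x2/x3/x4 super-resolution tests.
sr_pairs = [(s, d, scale_factor(s, d))
            for s in LADDER for d in LADDER
            if scale_factor(s, d) in (2, 3, 4)]
```

For example, UHD to FHD gives ×2, UHD to HD gives ×3, and UHD to qHD gives ×4, matching the scaling sizes listed above.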
Yelp-Fraud is a multi-relational graph dataset built upon the Yelp spam review dataset, which can be used in evaluating graph-based node classification, fraud detection, and anomaly detection models.
GUM is an open source multilayer English corpus of richly annotated texts from twelve text types. Annotations include:
A large-scale single-object tracking dataset containing 108 sequences with a total length of 1.5 hours. FE108 provides ground-truth annotations in both the frame and event domains. The annotation frequency is up to 40 Hz for the frame domain and 240 Hz for the event domain. FE108 is the largest event-frame dataset for single-object tracking, and also offers the highest annotation frequency in the event domain.
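Since the event-domain annotation rate (240 Hz) is an integer multiple of the frame-domain rate (40 Hz), aligning the two streams reduces to a fixed index stride. A minimal sketch (the indexing convention is an assumption for illustration, not FE108's published API):

```python
FRAME_HZ = 40   # frame-domain annotation rate stated above
EVENT_HZ = 240  # event-domain annotation rate stated above
STEP = EVENT_HZ // FRAME_HZ  # 6 event annotations per frame annotation

def frame_to_event_index(frame_idx: int) -> int:
    """Map a frame-domain annotation index to the co-temporal
    event-domain annotation index (hypothetical 0-based convention)."""
    return frame_idx * STEP
```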
L3DAS22: Machine Learning for 3D Audio Signal Processing. This dataset supports the L3DAS22 IEEE ICASSP Grand Challenge. The challenge is supported by a Python API that facilitates dataset download and preprocessing, training and evaluation of the baseline models, and results submission.
A dataset for joint reasoning about temporal and causal relations.
This is a dataset for a super-resolution task. It contains 480x270 videos compressed at 6 different bitrates (100 - 4000 kbps) using 5 different codecs (the H.264, H.265, H.266, AV1, and AVS3 standards). The dataset contains indoor and outdoor videos as well as animation. All videos have low SI/TI values and simple textures; the dataset was designed to minimize compression artifacts so that restoration of details remains possible.
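The codec-by-bitrate design yields a grid of compressed variants per source clip. A small bookkeeping sketch, assuming hypothetical clip identifiers (the exact intermediate bitrate values are an assumption; the description only states six bitrates spanning 100-4000 kbps):

```python
CODECS = ["H.264", "H.265", "H.266", "AV1", "AVS3"]
# Six bitrate points in the stated 100-4000 kbps range;
# the intermediate values here are illustrative, not from the dataset.
BITRATES_KBPS = [100, 200, 500, 1000, 2000, 4000]

def clip_ids(video_name):
    """Enumerate hypothetical identifiers for every compressed
    variant (codec x bitrate) of one source clip."""
    return [f"{video_name}_{codec}_{kbps}kbps"
            for codec in CODECS for kbps in BITRATES_KBPS]
```

Each source video thus maps to 5 × 6 = 30 compressed variants for evaluation.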
The 160x160 subset of the GasHisSDB dataset.
SynWoodScape is a synthetic version of the WoodScape surround-view dataset, covering many of its weaknesses and extending it. It comprises four surround-view cameras and nine tasks, including segmentation, depth estimation, 3D bounding box detection, and a novel soiling detection task. Semantic annotation of 40 classes at the instance level is provided for over 10,000 images. With SynWoodScape, we would like to encourage the community to adapt computer vision models for the fisheye camera instead of using naive rectification.
RGB-Stacking is a benchmark for vision-based robotic manipulation. The robot is trained to learn how to grasp objects and balance them on top of one another.
FaceVerse-High Quality 3D Face Dataset contains 2,688 high-quality head scans (21 expressions from 128 identities) captured by a dense DSLR rig. For each scan, we provide the 3D model (.obj), the corresponding texture map (.jpeg), and the FaceVerse fitted model (.ply) with the same topology.
Prophesee's GEN1 Automotive Detection Dataset is the largest event-based dataset to date.
Most existing dialogue systems fail to respond properly to potentially unsafe user utterances by either ignoring or passively agreeing with them.
The goal of ISIC 2019 is to classify dermoscopic images among nine different diagnostic categories. 25,331 images are available for training across 8 different categories. Two tasks are available for participation: 1) classify dermoscopic images without meta-data, and 2) classify images with additional available meta-data.