19,997 machine learning datasets
To create the TED-talks dataset, 3,035 YouTube videos were downloaded using the "TED talks" query. From these initial candidates, videos in which the upper part of the person is visible for at least 64 frames and the height of the person's bounding box is at least 384 pixels were selected. Static videos, and videos in which the person is doing something other than presenting, were then manually filtered out.
JVS is a Japanese multi-speaker voice corpus which contains voice data of 100 speakers in three styles (normal, whisper, and falsetto). The corpus contains 30 hours of voice data including 22 hours of parallel normal voices.
The dataset contains 21 full-HD videos, each around 1 hr long, captured at six different locations. Vehicles in the videos (20,865 instances in total) are annotated with precise speed measurements from optical gates using LiDAR, verified against several reference GPS tracks. The dataset is available for download and contains the videos and metadata (calibration, lengths of features in the image, annotations, and so on) for future comparison and evaluation.
CirCor DigiScope is currently the largest pediatric heart sound dataset. A total of 5,282 recordings have been collected from the four main auscultation locations of 1,568 patients; in the process, 215,780 heart sounds have been manually annotated. Each cardiac murmur has been manually annotated by an expert annotator according to its timing, shape, pitch, grading, and quality.
A large set of images of cats and dogs.
DialFact is a testing benchmark dataset of 22,245 annotated conversational claims, paired with pieces of evidence from Wikipedia. There are three sub-tasks in DialFact: 1) the verifiable claim detection task distinguishes whether a response carries verifiable factual information; 2) the evidence retrieval task retrieves the most relevant Wikipedia snippets as evidence; 3) the claim verification task classifies a dialogue response as supported, refuted, or not enough information.
A video dataset for benchmarking upsampling methods. Inter4K contains 1,000 ultra-high-resolution videos at 60 frames per second (fps) from online resources. The dataset provides standardized video resolutions at ultra-high definition (UHD/4K), quad high definition (QHD/2K), full high definition (FHD/1080p), (standard) high definition (HD/720p), one quarter of full HD (qHD/540p), and one ninth of full HD (nHD/360p). Frame rates of 60, 50, 30, 24, and 15 fps are used for each resolution. Based on this standardization, both super-resolution and frame interpolation tests can be performed at different scaling factors ($\times 2$, $\times 3$ and $\times 4$). In this paper, we use Inter4K to address frame upsampling and interpolation. Inter4K provides both standardized UHD resolution and 60 fps for all videos while containing a diverse set of 1,000 5-second clips; differences between scenes originate from the equipment (e.g., professional 4K cameras or phones) and lighting conditions.
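The standardized resolution ladder makes it easy to see which resolution pairs support the stated super-resolution scaling factors. A minimal sketch (the dictionary keys and the helper `scale_factor` are illustrative, not part of the Inter4K release; note qHD is 960×540):

```python
# Standard pixel dimensions for the resolution names mentioned above.
LADDER = {
    "UHD/4K": (3840, 2160),
    "QHD/2K": (2560, 1440),
    "FHD/1080p": (1920, 1080),
    "HD/720p": (1280, 720),
    "qHD/540p": (960, 540),
    "nHD/360p": (640, 360),
}

def scale_factor(src, dst):
    """Return the integer factor by which dst upscales to src, or None."""
    sw, sh = LADDER[src]
    dw, dh = LADDER[dst]
    if sw % dw == 0 and sh % dh == 0 and sw // dw == sh // dh:
        return sw // dw
    return None

# All (source, target) pairs usable for x2/x3/x4 super-resolution tests.
sr_pairs = [(s, d, scale_factor(s, d))
            for s in LADDER for d in LADDER
            if scale_factor(s, d) in (2, 3, 4)]
```

For example, UHD to FHD gives ×2, UHD to HD gives ×3, and UHD to qHD gives ×4, matching the scaling sizes listed above.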
Yelp-Fraud is a multi-relational graph dataset built upon the Yelp spam review dataset, which can be used in evaluating graph-based node classification, fraud detection, and anomaly detection models.
GUM is an open source multilayer English corpus of richly annotated texts from twelve text types. Annotations include:
A large-scale single-object tracking dataset containing 108 sequences with a total length of 1.5 hours. FE108 provides ground-truth annotations in both the frame and event domains. The annotation frequency is up to 40 Hz for the frame domain and 240 Hz for the event domain. FE108 is the largest event-frame dataset for single-object tracking, and also offers the highest annotation frequency in the event domain.
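Since the event-domain annotation rate (240 Hz) is an integer multiple of the frame-domain rate (40 Hz), aligning the two streams reduces to a fixed index stride. A minimal sketch (the indexing convention is an assumption for illustration, not FE108's published API):

```python
FRAME_HZ = 40   # frame-domain annotation rate stated above
EVENT_HZ = 240  # event-domain annotation rate stated above
STEP = EVENT_HZ // FRAME_HZ  # 6 event annotations per frame annotation

def frame_to_event_index(frame_idx: int) -> int:
    """Map a frame-domain annotation index to the co-temporal
    event-domain annotation index (hypothetical 0-based convention)."""
    return frame_idx * STEP
```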
L3DAS22: Machine Learning for 3D Audio Signal Processing. This dataset supports the L3DAS22 IEEE ICASSP Grand Challenge. The challenge is supported by a Python API that facilitates dataset download and preprocessing, training and evaluation of the baseline models, and results submission.
A dataset for joint reasoning about temporal and causal relations.
This is a dataset for a super-resolution task. It contains 480x270 videos compressed at 6 different bitrates (100 - 4000 kbps) using 5 different codecs (the H.264, H.265, H.266, AV1, and AVS3 standards). The dataset contains indoor and outdoor videos as well as animation. All videos have low SI/TI values and simple textures; the dataset was designed to minimize compression artifacts so that restoration of details remains possible.
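The codec-by-bitrate design yields a grid of compressed variants per source clip. A small bookkeeping sketch, assuming hypothetical clip identifiers (the exact intermediate bitrate values are an assumption; the description only states six bitrates spanning 100-4000 kbps):

```python
CODECS = ["H.264", "H.265", "H.266", "AV1", "AVS3"]
# Six bitrate points in the stated 100-4000 kbps range;
# the intermediate values here are illustrative, not from the dataset.
BITRATES_KBPS = [100, 200, 500, 1000, 2000, 4000]

def clip_ids(video_name):
    """Enumerate hypothetical identifiers for every compressed
    variant (codec x bitrate) of one source clip."""
    return [f"{video_name}_{codec}_{kbps}kbps"
            for codec in CODECS for kbps in BITRATES_KBPS]
```

Each source video thus maps to 5 × 6 = 30 compressed variants for evaluation.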
The 160x160 subset of the GasHisSDB dataset.
SynWoodScape is a synthetic version of the WoodScape surround-view dataset, covering many of its weaknesses and extending it. It comprises four surround-view cameras and nine tasks, including segmentation, depth estimation, 3D bounding box detection, and a novel soiling detection task. Semantic annotation of 40 classes at the instance level is provided for over 10,000 images. With SynWoodScape, we would like to encourage the community to adapt computer vision models for the fisheye camera instead of using naive rectification.
RGB-Stacking is a benchmark for vision-based robotic manipulation. The robot is trained to learn how to grasp objects and balance them on top of one another.
FaceVerse-High Quality 3D Face Dataset contains 2,688 high-quality head scans (21 expressions from 128 identities) captured by a dense DSLR rig. For each scan, we provide the 3D model (.obj), the corresponding texture map (.jpeg), and the FaceVerse fitted model (.ply) with the same topology.
Prophesee's GEN1 Automotive Detection Dataset is the largest event-based dataset to date.
Most existing dialogue systems fail to respond properly to potentially unsafe user utterances by either ignoring or passively agreeing with them.
The goal of ISIC 2019 is to classify dermoscopic images among nine different diagnostic categories. 25,331 images are available for training across 8 different categories. Two tasks are available for participation: 1) classify dermoscopic images without meta-data, and 2) classify images with additional available meta-data.