19,997 machine learning datasets
The rise of singing voice synthesis presents critical challenges to artists and industry stakeholders over unauthorized voice usage. Unlike synthesized speech, synthesized singing voices are typically released in songs containing strong background music that may hide synthesis artifacts. Additionally, singing voices present different acoustic and linguistic characteristics from speech utterances. These unique properties make singing voice deepfake detection a relevant but significantly different problem from synthetic speech detection. In this work, we propose the singing voice deepfake detection task. We first present SingFake, the first curated in-the-wild dataset consisting of 28.93 hours of bonafide and 29.40 hours of deepfake song clips in five languages from 40 singers. We provide a train/val/test split where the test sets include various scenarios. We then use SingFake to evaluate four state-of-the-art speech countermeasure systems trained on speech utterances. We find these systems lag significantly behind their performance on speech test data.
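Countermeasure systems like the ones evaluated here are conventionally scored with the Equal Error Rate (EER), the operating point where the false acceptance and false rejection rates coincide. A minimal sketch of the metric, assuming higher scores mean "more bonafide" (the scores below are hypothetical, not SingFake results):

```python
import numpy as np

def compute_eer(bonafide_scores, spoof_scores):
    """Equal Error Rate: the point where the false rejection rate
    (bonafide scored below threshold) equals the false acceptance
    rate (spoof scored at or above threshold)."""
    scores = np.concatenate([bonafide_scores, spoof_scores])
    labels = np.concatenate([np.ones_like(bonafide_scores),
                             np.zeros_like(spoof_scores)])
    labels = labels[np.argsort(scores)]            # sort labels by ascending score
    frr = np.cumsum(labels) / labels.sum()         # bonafide rejected so far
    far = 1.0 - np.cumsum(1 - labels) / (1 - labels).sum()  # spoof still accepted
    idx = np.argmin(np.abs(frr - far))             # closest crossing point
    return (frr[idx] + far[idx]) / 2.0

# Hypothetical scores: bonafide clips should score higher than deepfakes.
print(compute_eer(np.array([0.9, 0.8, 0.7]), np.array([0.2, 0.4, 0.6])))
```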
A dataset of videos synthetically degraded with Adobe After Effects to exhibit artifacts resembling those of real-world analog videotapes. The original high-quality videos belong to the Venice scene of the Harmonic dataset. The artifacts taken into account are: 1) tape mistracking; 2) VHS edge waving; 3) chroma loss along the scanlines; 4) tape noise; 5) undersaturation. The dataset comprises a total of 26,392 frames corresponding to 40 clips. The clips are randomly divided into training and test sets with a 75%-25% ratio.
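The 75%-25% clip split is simple to reproduce in spirit; a hedged sketch (clip names and seed are hypothetical, not the released split):

```python
import random

clips = [f"clip_{i:02d}" for i in range(40)]  # hypothetical clip identifiers
random.seed(0)                                # fixed seed for reproducibility
random.shuffle(clips)
train, test = clips[:30], clips[30:]          # 75% / 25% of the 40 clips
```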
The SSC dataset is a spiking version of the Speech Commands dataset released by Google. SSC was generated using Lauscher, an artificial cochlea model, which converts each audio recording into spikes across 700 input channels. The dataset covers 35 word categories uttered by a large number of speakers under controlled conditions.
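A minimal loading sketch, assuming the HDF5 layout used by the Heidelberg spiking datasets (ragged "spikes/times" and "spikes/units" arrays plus "labels"; verify against the released files):

```python
import h5py
import numpy as np

# Assumed file name and layout; check the actual SSC release.
with h5py.File("ssc_train.h5", "r") as f:
    times = f["spikes/times"][0]   # spike times (seconds) of the first utterance
    units = f["spikes/units"][0]   # input channel indices in [0, 700)
    label = int(f["labels"][0])    # one of the 35 word categories

# Bin the event stream into a dense (time_bins, 700) array, e.g. for an SNN.
n_bins, n_channels = 100, 700
binned = np.zeros((n_bins, n_channels), dtype=np.float32)
t_idx = np.minimum((times / times.max() * n_bins).astype(int), n_bins - 1)
np.add.at(binned, (t_idx, units.astype(int)), 1.0)  # count spikes per bin/channel
```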
The first human-annotated corpus containing 26k spans on 11k comments.
SOTAB V2 features two annotation tasks: Column Type Annotation (CTA) and Columns Property Annotation (CPA). The goal of the Column Type Annotation (CTA) task is to annotate the columns of a table using 82 types from the Schema.org vocabulary, such as telephone, Duration, Mass, or Organization. The goal of the Columns Property Annotation (CPA) task is to annotate pairs of table columns with one out of 108 Schema.org properties, such as gtin, startDate, priceValidUntil, or recipeIngredient. The benchmark consists of 45,834 tables annotated for CTA and 30,220 tables annotated for CPA originating from 55,511 different websites. The tables are split into training, validation, and test sets for both tasks. The tables cover 17 popular Schema.org types including Product, LocalBusiness, Event, and JobPosting.
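To make the two tasks concrete, here is a sketch of what single CTA and CPA annotations might look like; the field names are illustrative, not the schema of the released files:

```python
# Illustrative record shapes only; consult the SOTAB V2 release for the real format.
cta_example = {
    "table_id": "example.com_Product_0001",  # hypothetical table identifier
    "column_index": 2,
    "label": "telephone",                    # one of the 82 Schema.org types
}
cpa_example = {
    "table_id": "example.com_Product_0001",
    "main_column": 0,                        # subject column of the table
    "target_column": 3,
    "label": "priceValidUntil",              # one of the 108 Schema.org properties
}
```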
Evaluating human-scene interaction requires precise annotations for camera pose and scene geometry. However, such information is not available in existing datasets for egocentric human pose estimation. To solve this issue, we collected a new real-world dataset using a head-mounted fisheye camera combined with a calibration board. The ground truth scene geometry is obtained with the SfM method from a multi-view capture system with 120 synced 4K resolution cameras, and the ground truth egocentric camera pose is obtained by localizing a calibration board rigidly attached to the egocentric camera. This dataset contains around 28K frames of two actors performing various human-scene interaction motions such as sitting, reading a newspaper, and using a computer. This dataset is evenly split into training and testing splits. We fine-tuned the method on the training split before the evaluation. This dataset will be made publicly available, and additional details are provided in the supplementary material.
ConfAIde is a benchmark that evaluates the inference-time privacy implications of large language models (LLMs) in interactive settings.
The MMVP-VLM (Multimodal Visual Patterns - Visual Language Models) benchmark is designed to systematically evaluate how well recent CLIP-based models understand and process visual patterns.
XSafety is the first multilingual safety benchmark specifically designed for large language models (LLMs). It is motivated by the global deployment of LLMs in practical applications.
The Sunnybrook Cardiac Data (SCD), also known as the 2009 Cardiac MR Left Ventricle Segmentation Challenge data, consist of 45 cine-MRI images from a mix of patients and pathologies: healthy, hypertrophy, heart failure with infarction, and heart failure without infarction. A subset of this dataset was first used in the automated myocardium segmentation challenge from short-axis MRI, held at a MICCAI workshop in 2009. The complete dataset is now available in the CAP database under a public domain license.
NExT-QA is a VideoQA benchmark targeting the explanation of video contents. It challenges QA models to reason about causal and temporal actions and understand the rich object interactions in daily activities. This page records results that use LLMs for answer evaluation.
The CheXmask Database presents a comprehensive, uniformly annotated collection of chest radiographs, constructed from five public databases: ChestX-ray8, CheXpert, MIMIC-CXR-JPG, PadChest, and VinDr-CXR. The database aggregates 657,566 anatomical segmentation masks derived from images processed with the HybridGNet model to ensure consistent, high-quality segmentation. To confirm the quality of the segmentations, we include in this database individual Reverse Classification Accuracy (RCA) scores for each of the segmentation masks. This dataset is intended to catalyze further innovation and refinement in the field of semantic chest X-ray analysis, offering a significant resource for researchers in the medical imaging domain.
This dataset contains 70 sequences (30 falls + 40 activities of daily living, ADL). Fall events are recorded with two Microsoft Kinect (RGB + depth) cameras and corresponding accelerometer data; ADL events are recorded with only one camera and accelerometer. Sensor data was collected using PS Move (60 Hz) and x-IMU (256 Hz) devices.
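Because the modalities are sampled at different rates, fusing them requires resampling onto a common time grid; a hedged sketch using linear interpolation (the timestamps and signal below are synthetic placeholders):

```python
import numpy as np

def resample(t_src, x_src, rate_hz):
    """Linearly interpolate a 1-D signal onto a uniform grid at rate_hz."""
    t_new = np.arange(t_src[0], t_src[-1], 1.0 / rate_hz)
    return t_new, np.interp(t_new, t_src, x_src)

# e.g., bring a 256 Hz x-IMU acceleration magnitude onto the 60 Hz PS Move grid
t_imu = np.arange(0.0, 10.0, 1.0 / 256.0)      # synthetic timestamps (s)
acc_mag = np.abs(np.random.randn(t_imu.size))  # synthetic |acc| signal
t60, acc60 = resample(t_imu, acc_mag, 60.0)
```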
EvoEval is a holistic benchmark suite created by evolving HumanEval problems. It contains 828 new problems across 5 semantic-altering and 2 semantic-preserving benchmarks, allowing evaluation and comparison across different dimensions and problem types, such as Difficult, Creative, or Tool Use problems.
The first real-world, large-scale Roadside Cooperative Perception Dataset, RCooper, is released to foster research on roadside cooperative perception for practical applications. It provides more than 50k images and 30k point clouds, manually annotated with 3D bounding boxes and trajectories for ten semantic classes.
The odometry benchmark consists of 22 stereo sequences, saved in lossless PNG format. We provide 11 sequences (00-10) with ground truth trajectories for training and 11 sequences (11-21) without ground truth for evaluation. For this benchmark you may provide results using monocular or stereo visual odometry, laser-based SLAM, or algorithms that combine visual and LIDAR information. The only restriction we impose is that your method is fully automatic (e.g., no manual loop-closure tagging is allowed) and that the same parameter set is used for all sequences. A development kit provides details about the data format. More details are available at: https://www.cvlibs.net/datasets/kitti/eval_odometry.php.
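The ground truth trajectories are distributed as plain-text files with one pose per line: twelve space-separated values forming a row-major 3x4 [R | t] matrix. A minimal parsing sketch (the file path is hypothetical and depends on your download layout):

```python
import numpy as np

def load_kitti_poses(path):
    """Parse a KITTI odometry ground-truth file: one pose per line,
    12 row-major values forming a 3x4 [R | t] transformation."""
    poses = []
    with open(path) as f:
        for line in f:
            if not line.strip():
                continue
            mat = np.array(line.split(), dtype=np.float64).reshape(3, 4)
            pose = np.eye(4)      # lift to homogeneous 4x4 for easy chaining
            pose[:3, :] = mat
            poses.append(pose)
    return poses

poses = load_kitti_poses("poses/00.txt")  # hypothetical path to sequence 00
```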
To comprehensively evaluate the effectiveness and generalization ability of style transfer methods, we build StyleBench, which covers 73 distinct styles ranging from paintings and flat illustrations to 3D renderings and sculptures with varying materials. For each style, we collect 5-7 distinct images with variations. In total, StyleBench contains 490 images across diverse styles.
We construct a style-balanced dataset, called StyleGallery, drawing on several open-source datasets. Specifically, StyleGallery includes JourneyDB, a dataset comprising a broad spectrum of diverse styles derived from MidJourney; WikiArt, with extensive fine-grained painting styles such as pointillism and ink drawing; and a subset of stylized images from LAION-Aesthetics.