19,997 machine learning datasets
The rise of singing voice synthesis presents critical challenges to artists and industry stakeholders over unauthorized voice usage. Unlike synthesized speech, synthesized singing voices are typically released in songs containing strong background music that may hide synthesis artifacts. Additionally, singing voices present different acoustic and linguistic characteristics from speech utterances. These unique properties make singing voice deepfake detection a relevant but significantly different problem from synthetic speech detection. In this work, we propose the singing voice deepfake detection task. We first present SingFake, the first curated in-the-wild dataset consisting of 28.93 hours of bonafide and 29.40 hours of deepfake song clips in five languages from 40 singers. We provide a train/val/test split where the test sets include various scenarios. We then use SingFake to evaluate four state-of-the-art speech countermeasure systems trained on speech utterances. We find these systems lag significantly behind their performance on speech test data.
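Countermeasure systems like the ones evaluated here are conventionally scored with the Equal Error Rate (EER), the operating point where the false acceptance and false rejection rates coincide. A minimal sketch of the metric, assuming higher scores mean "more bonafide" (the scores below are hypothetical, not SingFake results):

```python
import numpy as np

def compute_eer(bonafide_scores, spoof_scores):
    """Equal Error Rate: the point where the false rejection rate
    (bonafide scored below threshold) equals the false acceptance
    rate (spoof scored at or above threshold)."""
    scores = np.concatenate([bonafide_scores, spoof_scores])
    labels = np.concatenate([np.ones_like(bonafide_scores),
                             np.zeros_like(spoof_scores)])
    labels = labels[np.argsort(scores)]            # sort labels by ascending score
    frr = np.cumsum(labels) / labels.sum()         # bonafide rejected so far
    far = 1.0 - np.cumsum(1 - labels) / (1 - labels).sum()  # spoof still accepted
    idx = np.argmin(np.abs(frr - far))             # closest crossing point
    return (frr[idx] + far[idx]) / 2.0

# Hypothetical scores: bonafide clips should score higher than deepfakes.
print(compute_eer(np.array([0.9, 0.8, 0.7]), np.array([0.2, 0.4, 0.6])))
```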
A dataset of videos synthetically degraded with Adobe After Effects to exhibit artifacts resembling those of real-world analog videotapes. The original high-quality videos belong to the Venice scene of the Harmonic dataset. The artifacts taken into account are: 1) tape mistracking; 2) VHS edge waving; 3) chroma loss along the scanlines; 4) tape noise; 5) undersaturation. The dataset comprises a total of 26,392 frames corresponding to 40 clips. The clips are randomly divided into training and test sets with a 75%-25% ratio.
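The 75%-25% clip split is simple to reproduce in spirit; a hedged sketch (clip names and seed are hypothetical, not the released split):

```python
import random

clips = [f"clip_{i:02d}" for i in range(40)]  # hypothetical clip identifiers
random.seed(0)                                # fixed seed for reproducibility
random.shuffle(clips)
train, test = clips[:30], clips[30:]          # 75% / 25% of the 40 clips
```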
The SSC dataset is a spiking version of the Speech Commands dataset released by Google. SSC was generated using Lauscher, an artificial cochlea model, which converts each audio recording into spikes across 700 input channels. The dataset covers 35 word categories uttered by a large number of speakers under controlled conditions.
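A minimal loading sketch, assuming the HDF5 layout used by the Heidelberg spiking datasets (ragged "spikes/times" and "spikes/units" arrays plus "labels"; verify against the released files):

```python
import h5py
import numpy as np

# Assumed file name and layout; check the actual SSC release.
with h5py.File("ssc_train.h5", "r") as f:
    times = f["spikes/times"][0]   # spike times (seconds) of the first utterance
    units = f["spikes/units"][0]   # input channel indices in [0, 700)
    label = int(f["labels"][0])    # one of the 35 word categories

# Bin the event stream into a dense (time_bins, 700) array, e.g. for an SNN.
n_bins, n_channels = 100, 700
binned = np.zeros((n_bins, n_channels), dtype=np.float32)
t_idx = np.minimum((times / times.max() * n_bins).astype(int), n_bins - 1)
np.add.at(binned, (t_idx, units.astype(int)), 1.0)  # count spikes per bin/channel
```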
The first human-annotated corpus containing 26k spans on 11k comments.
SOTAB V2 features two annotation tasks: Column Type Annotation (CTA) and Columns Property Annotation (CPA). The goal of the Column Type Annotation (CTA) task is to annotate the columns of a table using 82 types from the Schema.org vocabulary, such as telephone, Duration, Mass, or Organization. The goal of the Columns Property Annotation (CPA) task is to annotate pairs of table columns with one out of 108 Schema.org properties, such as gtin, startDate, priceValidUntil, or recipeIngredient. The benchmark consists of 45,834 tables annotated for CTA and 30,220 tables annotated for CPA originating from 55,511 different websites. The tables are split into training, validation, and test sets for both tasks. The tables cover 17 popular Schema.org types including Product, LocalBusiness, Event, and JobPosting.
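To make the two tasks concrete, here is a sketch of what single CTA and CPA annotations might look like; the field names are illustrative, not the schema of the released files:

```python
# Illustrative record shapes only; consult the SOTAB V2 release for the real format.
cta_example = {
    "table_id": "example.com_Product_0001",  # hypothetical table identifier
    "column_index": 2,
    "label": "telephone",                    # one of the 82 Schema.org types
}
cpa_example = {
    "table_id": "example.com_Product_0001",
    "main_column": 0,                        # subject column of the table
    "target_column": 3,
    "label": "priceValidUntil",              # one of the 108 Schema.org properties
}
```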
Evaluating human-scene interaction requires precise annotations for camera pose and scene geometry. However, such information is not available in existing datasets for egocentric human pose estimation. To solve this issue, we collected a new real-world dataset using a head-mounted fisheye camera combined with a calibration board. The ground truth scene geometry is obtained with the SfM method from a multi-view capture system with 120 synced 4K resolution cameras, and the ground truth egocentric camera pose is obtained by localizing a calibration board rigidly attached to the egocentric camera. This dataset contains around 28K frames of two actors performing various human-scene interaction motions such as sitting, reading a newspaper, and using a computer. This dataset is evenly split into training and testing splits. We fine-tuned the method on the training split before the evaluation. This dataset will be made publicly available, and additional details are provided in the supplementary material.
ConfAIde is a benchmark that evaluates the inference-time privacy implications of large language models (LLMs) in interactive settings.
The MMVP-VLM (Multimodal Visual Patterns - Visual Language Models) benchmark is designed to systematically evaluate how well recent CLIP-based models understand and process visual patterns.
XSafety is the first multilingual safety benchmark specifically designed for large language models (LLMs). It is motivated by the global deployment of LLMs in practical applications.
The Sunnybrook Cardiac Data (SCD), also known as the 2009 Cardiac MR Left Ventricle Segmentation Challenge data, consist of 45 cine-MRI images from a mix of patients and pathologies: healthy, hypertrophy, heart failure with infarction, and heart failure without infarction. A subset of this dataset was first used in the automated myocardium segmentation challenge from short-axis MRI, held at a MICCAI workshop in 2009. The complete dataset is now available in the CAP database under a public domain license.
NExT-QA is a VideoQA benchmark targeting the explanation of video contents. It challenges QA models to reason about causal and temporal actions and understand the rich object interactions in daily activities. This page records results that use LLMs for answer evaluation.
The CheXmask Database presents a comprehensive, uniformly annotated collection of chest radiographs, constructed from five public databases: ChestX-ray8, CheXpert, MIMIC-CXR-JPG, PadChest, and VinDr-CXR. The database aggregates 657,566 anatomical segmentation masks derived from images processed with the HybridGNet model to ensure consistent, high-quality segmentation. To confirm the quality of the segmentations, we include in this database individual Reverse Classification Accuracy (RCA) scores for each of the segmentation masks. This dataset is intended to catalyze further innovation and refinement in the field of semantic chest X-ray analysis, offering a significant resource for researchers in the medical imaging domain.
This dataset contains 70 sequences (30 falls + 40 activities of daily living, ADL). Fall events are recorded with two Microsoft Kinect (RGB + depth) cameras and corresponding accelerometer data; ADL events are recorded with only one camera and accelerometer. Sensor data was collected using PS Move (60 Hz) and x-IMU (256 Hz) devices.
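Because the modalities are sampled at different rates, fusing them requires resampling onto a common time grid; a hedged sketch using linear interpolation (the timestamps and signal below are synthetic placeholders):

```python
import numpy as np

def resample(t_src, x_src, rate_hz):
    """Linearly interpolate a 1-D signal onto a uniform grid at rate_hz."""
    t_new = np.arange(t_src[0], t_src[-1], 1.0 / rate_hz)
    return t_new, np.interp(t_new, t_src, x_src)

# e.g., bring a 256 Hz x-IMU acceleration magnitude onto the 60 Hz PS Move grid
t_imu = np.arange(0.0, 10.0, 1.0 / 256.0)      # synthetic timestamps (s)
acc_mag = np.abs(np.random.randn(t_imu.size))  # synthetic |acc| signal
t60, acc60 = resample(t_imu, acc_mag, 60.0)
```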
EvoEval is a holistic benchmark suite created by evolving HumanEval problems. It contains 828 new problems across 5 semantic-altering and 2 semantic-preserving benchmarks, allowing evaluation and comparison across different dimensions and problem types, such as Difficult, Creative, or Tool Use problems.
The first real-world, large-scale Roadside Cooperative Perception Dataset, RCooper, is released to foster research on roadside cooperative perception for practical applications. It provides more than 50k images and 30k point clouds, manually annotated with 3D bounding boxes and trajectories for ten semantic classes.
The odometry benchmark consists of 22 stereo sequences, saved in lossless PNG format. We provide 11 sequences (00-10) with ground truth trajectories for training and 11 sequences (11-21) without ground truth for evaluation. For this benchmark you may provide results using monocular or stereo visual odometry, laser-based SLAM, or algorithms that combine visual and LIDAR information. The only restriction we impose is that your method is fully automatic (e.g., no manual loop-closure tagging is allowed) and that the same parameter set is used for all sequences. A development kit provides details about the data format. More details are available at: https://www.cvlibs.net/datasets/kitti/eval_odometry.php.
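The ground truth trajectories are distributed as plain-text files with one pose per line: twelve space-separated values forming a row-major 3x4 [R | t] matrix. A minimal parsing sketch (the file path is hypothetical and depends on your download layout):

```python
import numpy as np

def load_kitti_poses(path):
    """Parse a KITTI odometry ground-truth file: one pose per line,
    12 row-major values forming a 3x4 [R | t] transformation."""
    poses = []
    with open(path) as f:
        for line in f:
            if not line.strip():
                continue
            mat = np.array(line.split(), dtype=np.float64).reshape(3, 4)
            pose = np.eye(4)      # lift to homogeneous 4x4 for easy chaining
            pose[:3, :] = mat
            poses.append(pose)
    return poses

poses = load_kitti_poses("poses/00.txt")  # hypothetical path to sequence 00
```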
To comprehensively evaluate the effectiveness and generalization ability of style transfer methods, we build StyleBench, which covers 73 distinct styles ranging from paintings and flat illustrations to 3D renderings and sculptures with varying materials. For each style, we collect 5-7 distinct images with variations. In total, StyleBench contains 490 images across diverse styles.
We construct a style-balanced dataset, called StyleGallery, drawing on several open-source datasets. Specifically, StyleGallery includes JourneyDB, a dataset comprising a broad spectrum of diverse styles derived from MidJourney; WikiArt, with extensive fine-grained painting styles such as pointillism and ink drawing; and a subset of stylized images from LAION-Aesthetics.