19,997 machine learning datasets
Annotated audio files (with a separate, combined annotation file) of lung sounds recorded from various positions on the chest wall. The annotation includes the sound type (Inspiratory: I, Expiratory: E, Wheezes: W, Crackles: C, Normal: N), the diagnosis as determined by a specialist (asthma, COPD, BRON, heart failure, lung fibrosis, etc.), and the location on the chest wall where the recording was taken (Posterior: P, Lower: L, Left: L, Right: R, Upper: U, Anterior: A, Middle: M). The audio file names are coded as follows: 1. Filter type: B: Bell (20-200 Hz), Diaphragm (100-500 Hz), Extended range (50-500 Hz). 2. Patient number: P1-P112.
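The file-name coding for the lung-sound recordings above could be decoded with a small parser like the one below. This is a minimal sketch under loudly stated assumptions: the description only gives the letter "B" for the Bell filter, so the letters "D" (Diaphragm) and "E" (Extended range) and the exact prefix layout (filter letter immediately followed by `P<number>`, e.g. `BP12_...`) are hypothetical, and the function name `parse_recording_name` is illustrative only.

```python
import re
from typing import NamedTuple

# Filter codes. Only "B" (Bell) is given in the description;
# "D" and "E" are ASSUMED letters for the Diaphragm and Extended-range filters.
FILTERS = {
    "B": ("Bell", 20, 200),
    "D": ("Diaphragm", 100, 500),
    "E": ("Extended range", 50, 500),
}

# HYPOTHETICAL prefix layout: one filter letter, then "P" and the patient number.
NAME_PATTERN = re.compile(r"^([BDE])P(\d+)")


class RecordingInfo(NamedTuple):
    filter_name: str
    low_hz: int
    high_hz: int
    patient: int


def parse_recording_name(file_name: str) -> RecordingInfo:
    """Decode filter type, pass band, and patient number from a file name."""
    match = NAME_PATTERN.match(file_name)
    if match is None:
        raise ValueError(f"unrecognised file name: {file_name!r}")
    filter_name, low_hz, high_hz = FILTERS[match.group(1)]
    patient = int(match.group(2))
    # The description lists patients P1-P112.
    if not 1 <= patient <= 112:
        raise ValueError(f"patient number out of range: {patient}")
    return RecordingInfo(filter_name, low_hz, high_hz, patient)


print(parse_recording_name("BP12_Asthma.wav"))
```

Returning a `NamedTuple` keeps the decoded fields addressable by name while staying a plain tuple; if the real naming scheme differs, only `FILTERS` and `NAME_PATTERN` need to change.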
DL3DV-10K is a dataset of real-world videos with scene annotations and camera parameters.
MFW+ is a benchmark dataset for masked face recognition and an extended version of MFW. The original MFW, published as a benchmark for masked face recognition, is composed of 300 IDs and 3,000 images. However, because two of its IDs are duplicates, MFW actually contains 298 unique IDs and 2,980 images. To evaluate models under various mask conditions and environments, we manually gathered additional data from the web. The refined and extended MFW, which we named MFW+, contains 606 IDs, 2,911 unmasked face images, and 2,838 masked face images. Paper: https://bmvc2022.mpi-inf.mpg.de/0723.pdf
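The MFW/MFW+ counts above can be cross-checked with a few lines of arithmetic. This sketch assumes, since the description does not say so explicitly, that every MFW identity contributes the same number of images; under that assumption, removing two duplicate IDs accounts exactly for the drop from 3,000 to 2,980 images.

```python
# Counts quoted in the MFW / MFW+ description.
MFW_IDS, MFW_IMAGES = 300, 3_000
DUPLICATE_IDS = 2
MFW_PLUS_UNMASKED, MFW_PLUS_MASKED = 2_911, 2_838

# ASSUMPTION: every MFW identity has the same number of images.
images_per_id = MFW_IMAGES // MFW_IDS

unique_ids = MFW_IDS - DUPLICATE_IDS
unique_images = MFW_IMAGES - DUPLICATE_IDS * images_per_id
mfw_plus_images = MFW_PLUS_UNMASKED + MFW_PLUS_MASKED

print(images_per_id, unique_ids, unique_images, mfw_plus_images)  # 10 298 2980 5749
```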
The WiFall dataset contains data for fall detection, action recognition, and person identification in a meeting-room scenario. Each sample provides synchronised CSI, RSSI, and timestamp data.
This ImageNet-100 dataset was introduced in the following paper,
A dataset for eyeblink detection in the wild.
We introduce MUSIC-AVQA-R, the first dataset for evaluating the robustness of AVQA models. Its construction involves two key processes: rephrasing and splitting. Rephrasing is applied to the questions in the test split of MUSIC-AVQA, while splitting categorizes the questions into frequent (head) and rare (tail) subsets.
M2QA (Multi-domain Multilingual Question Answering) is an extractive question answering benchmark for evaluating joint language and domain transfer. M2QA includes 13,500 SQuAD 2.0-style question-answer instances in German, Turkish, and Chinese for the domains of product reviews, news, and creative writing. 40% of the questions are unanswerable and 60% are answerable.
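The composition of M2QA described above works out as follows. The 40/60 split is stated in the description; the per-setting figure assumes the 13,500 instances are spread evenly over the nine language-domain combinations, which the description does not confirm.

```python
TOTAL_INSTANCES = 13_500
UNANSWERABLE_PERCENT = 40  # stated share of unanswerable questions

LANGUAGES = ("German", "Turkish", "Chinese")
DOMAINS = ("product reviews", "news", "creative writing")

# Integer arithmetic avoids floating-point rounding in the percentage.
unanswerable = TOTAL_INSTANCES * UNANSWERABLE_PERCENT // 100
answerable = TOTAL_INSTANCES - unanswerable

settings = [(lang, domain) for lang in LANGUAGES for domain in DOMAINS]
# ASSUMPTION: instances are evenly balanced across language-domain settings.
per_setting = TOTAL_INSTANCES // len(settings)

print(unanswerable, answerable, len(settings), per_setting)  # 5400 8100 9 1500
```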
ThermoHands is the first benchmark dataset specifically designed for egocentric 3D hand pose estimation from thermal images. It addresses the challenges of hand pose estimation in low-light conditions and when the hand is occluded by gloves or other wearables, scenarios in which traditional RGB- or NIR-based systems struggle.
The dataset available for download on this webpage represents a 5x5x5µm section taken from the CA1 hippocampus region of the brain, corresponding to a 1065x2048x1536 volume. The resolution of each voxel is approximately 5x5x5nm. The data is provided as multipage TIF files that can be loaded in Fiji. We annotated mitochondria in two sub-volumes. Each sub-volume consists of the first 165 slices of the 1065x2048x1536 image stack. In the publications mentioned at the bottom of this page, the top part was used to train our algorithm and the bottom part was used for testing.
Understanding and analyzing animal behavior is increasingly important for protecting endangered species. However, advanced computer vision techniques have seen little use in this area, largely because of a lack of large and diverse datasets for training deep models.
ApisTox contains molecules in SMILES format for predicting pesticide toxicity to honey bees.
The evaluation of vision-language models (VLMs) has mainly relied on English-language benchmarks, leaving significant gaps in both multilingual and multicultural coverage. While multilingual benchmarks have expanded in both size and language coverage, many rely on translations of English datasets and fail to capture cultural nuances. In this work, we propose Kaleidoscope, the most comprehensive exam benchmark to date for the multilingual evaluation of vision-language models. Kaleidoscope is a large-scale, in-language multimodal benchmark designed to evaluate VLMs across diverse languages and visual inputs. It covers 18 languages and 14 different subjects, amounting to a total of 20,911 multiple-choice questions. Built through an open science collaboration with a diverse group of researchers worldwide, Kaleidoscope ensures linguistic and cultural authenticity. We evaluate top-performing multilingual vision-language models and find that they perform poorly on low-resource languages.
CLEAR-Bias is a benchmark dataset designed to evaluate the robustness of large language models (LLMs) against bias elicitation, particularly under adversarial conditions. It comprises 4,400 prompts across two task formats: multiple-choice and sentence completion. These prompts span seven core bias categories (age, disability, ethnicity, gender, religion, sexual orientation, and socioeconomic status) as well as three intersectional categories, enabling the exploration of overlapping social biases that standard evaluations often overlook. Each category includes 20 carefully crafted base prompts (10 per task type), which are further expanded using seven jailbreak techniques: machine translation, obfuscation, prefix injection, prompt injection, refusal suppression, reward incentives, and role-playing. Each technique is implemented with three variants.
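The stated total of 4,400 CLEAR-Bias prompts follows from the category and jailbreak counts above. The sketch below assumes (our reading, not stated explicitly) that the total counts the base prompts plus one jailbreak prompt per base prompt, technique, and variant; under that reading the numbers reproduce the stated total exactly.

```python
CORE_CATEGORIES = (
    "age", "disability", "ethnicity", "gender",
    "religion", "sexual orientation", "socioeconomic status",
)
INTERSECTIONAL_CATEGORIES = 3
BASE_PROMPTS_PER_CATEGORY = 20  # 10 multiple-choice + 10 sentence completion
JAILBREAK_TECHNIQUES = (
    "machine translation", "obfuscation", "prefix injection", "prompt injection",
    "refusal suppression", "reward incentives", "role-playing",
)
VARIANTS_PER_TECHNIQUE = 3

categories = len(CORE_CATEGORIES) + INTERSECTIONAL_CATEGORIES
base_prompts = categories * BASE_PROMPTS_PER_CATEGORY

# ASSUMPTION: every base prompt is expanded once per technique variant.
jailbreak_prompts = base_prompts * len(JAILBREAK_TECHNIQUES) * VARIANTS_PER_TECHNIQUE
total_prompts = base_prompts + jailbreak_prompts

print(categories, base_prompts, jailbreak_prompts, total_prompts)  # 10 200 4200 4400
```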
https://github.com/dialogue-evaluation/RuSentNE-evaluation
435 vocal presets retrieved from MedleyDB and a private collection of multi-track mixes.