19,997 machine learning datasets
19,997 dataset results
The Free Universal Sound Separation (FUSS) dataset is a database of arbitrary sound mixtures and source-level references, for use in experiments on arbitrary sound separation. FUSS is based on FSD50K corpus.
The Groove MIDI Dataset (GMD) is composed of 13.6 hours of aligned MIDI and (synthesized) audio of human-performed, tempo-aligned expressive drumming. The dataset contains 1,150 MIDI files and over 22,000 measures of drumming.
JuICe is a corpus of 1.5 million examples with a curated test set of 3.7K instances based on online programming assignments. Compared with existing contextual code generation datasets, JuICe provides refined human-curated data, open-domain code, and an order of magnitude more training data.
To address the need for a standard open domain table benchmark dataset, the author propose a novel weak supervision approach to automatically create the TableBank, which is orders of magnitude larger than existing human labeled datasets for table analysis. Distinct from traditional weakly supervised training set, our approach can obtain not only large scale but also high quality training data.
AgeDB contains 16, 488 images of various famous people, such as actors/actresses, writers, scientists, politicians, etc. Every image is annotated with respect to the identity, age and gender attribute. There exist a total of 568 distinct subjects. The average number of images per subject is 29. The minimum and maximum age is 1 and 101, respectively. The average age range for each subject is 50.3 years.
Toyota Smarthome Untrimmed (TSU) is a dataset for activity detection in long untrimmed videos. The dataset contains 536 videos with an average duration of 21 mins. Since this dataset is based on the same footage video as Toyota Smarthome Trimmed version, it features the same challenges and introduces additional ones. The dataset is annotated with 51 activities.
3D Hand Pose is a multi-view hand pose dataset consisting of color images of hands and different kind of annotations for each: the bounding box and the 2D and 3D location on the joints in the hand.
Includes more than two million traffic sign images that are based on real-world and simulator data.
DynaSent is an English-language benchmark task for ternary (positive/negative/neutral) sentiment analysis. DynaSent combines naturally occurring sentences with sentences created using the open-source Dynabench Platform, which facilities human-and-model-in-the-loop dataset creation. DynaSent has a total of 121,634 sentences, each validated by five crowdworkers.
The FIVR-200K dataset has been collected to simulate the problem of Fine-grained Incident Video Retrieval (FIVR). The dataset comprises 225,960 videos associated with 4,687 Wikipedia events and 100 selected video queries.
Aims to extract events and their arguments from multimedia documents. Develops the first benchmark and collect a dataset of 245 multimedia news articles with extensively annotated events and arguments.
MinneApple is a benchmark dataset for apple detection and segmentation. The fruits are labelled using polygonal masks for each object instance to aid in precise object detection, localization, and segmentation. Additionally, the dataset also contains data for patch-based counting of clustered fruits. The dataset contains over 41, 000 annotated object instances in 1000 images.
A database with 2,000 videos captured by surveillance cameras in real-world scenes.
SemArt is a multi-modal dataset for semantic art understanding. SemArt is a collection of fine-art painting images in which each image is associated to a number of attributes and a textual artistic comment, such as those that appear in art catalogues or museum collections. It contains 21,384 samples that provides artistic comments along with fine-art paintings and their attributes for studying semantic art understanding.
The database includes 3000 semi-natural utterances, equivalent to 3 hours and 25 minutes of speech data extracted from online radio plays. The ShEMO covers speech samples of 87 native-Persian speakers for five basic emotions including anger, fear, happiness, sadness and surprise, as well as neutral state.
SketchGraphs is a dataset of 15 million sketches extracted from real-world CAD models intended to facilitate research in both ML-aided design and geometric program induction. Each sketch is represented as a geometric constraint graph where edges denote designer-imposed geometric relationships between primitives, the nodes of the graph.
A novel large dataset of social media posts from users with one or multiple mental health conditions along with matched control users.
TextSeg is a large-scale fine-annotated and multi-purpose text detection and segmentation dataset, collecting scene and design text with six types of annotations: word- and character-wise bounding polygons, masks and transcriptions.
TV show Caption is a large-scale multimodal captioning dataset, containing 261,490 caption descriptions paired with 108,965 short video moments. TVC is unique as its captions may also describe dialogues/subtitles while the captions in the other datasets are only describing the visual content.
VehicleX is a large-scale synthetic dataset. Created in Unity, it contains 1,362 vehicles of various 3D models with fully editable attributes.