19,997 machine learning datasets
Artificial hierarchical datasets to study how neural networks learn hierarchical tasks. See papers for details.
Description Detection Dataset ($D^3$, /dikju:b/) is an attempt at creating a next-generation object detection dataset. Unlike traditional detection datasets, the class names of the objects are no longer simple nouns or noun phrases, but complex and descriptive, such as a dog not being held by a leash. For each image in the dataset, any object that matches the description is annotated. The dataset provides annotations such as bounding boxes and finely crafted instance masks. It comprises 422 well-designed descriptions and 24,282 positive object-description pairs.
REFinD is a large-scale annotated dataset of relations, with ∼29K instances and 22 relations amongst 8 types of entity pairs, generated entirely over financial documents.
Although deep face recognition has achieved impressive results in recent years, there is increasing controversy regarding racial and gender bias of the models, questioning their trustworthiness and deployment into sensitive scenarios. DemogPairs is a validation set with 10.8K facial images and 58.3M identity verification pairs, distributed in demographically-balanced folds of Asian, Black and White females and males. We also propose a benchmark of experiments using DemogPairs over state-of-the-art deep face recognition models in order to analyze their cross-demographic behavior and potential demographic biases.
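Identity-verification pairs like DemogPairs' are typically scored by thresholding embedding similarity; computing this per demographic fold exposes cross-demographic accuracy gaps. The sketch below is a generic illustration only (the embedding model, threshold value, and array layout are assumptions, not part of the dataset):

```python
import numpy as np

def verification_accuracy(emb_a, emb_b, same_identity, threshold=0.5):
    """Score identity-verification pairs by cosine similarity.

    emb_a, emb_b: (N, D) arrays of face embeddings, one row per pair side.
    same_identity: (N,) boolean array, True if the pair shares an identity.
    threshold: cosine-similarity cutoff (a hypothetical value, tuned per model).
    Returns the fraction of pairs classified correctly at `threshold`.
    """
    # L2-normalize so the dot product equals cosine similarity
    a = emb_a / np.linalg.norm(emb_a, axis=1, keepdims=True)
    b = emb_b / np.linalg.norm(emb_b, axis=1, keepdims=True)
    sims = np.sum(a * b, axis=1)          # one cosine similarity per pair
    predicted_same = sims >= threshold
    return float(np.mean(predicted_same == same_identity))
```

Running this separately on each demographically-balanced fold, then comparing the per-fold accuracies, is one simple way to quantify the bias the benchmark is designed to surface.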
FunnyBirds is a synthetic vision dataset that is developed to automatically and quantitatively analyze XAI methods. It consists of 50,500 images (50k train, 500 test) of 50 synthetic bird species.
CommitPackFT is a 2GB version of CommitPack, filtered to contain only high-quality commit messages that resemble natural language instructions.
ChatHaruhi is a dataset covering 32 Chinese / English TV / anime characters with over 54k simulated dialogues.
GRAZPEDWRI-DX is a public dataset of 20,327 pediatric wrist trauma X-ray images released by the University of Medicine of Graz. These X-ray images were collected by multiple pediatric radiologists at the Department for Pediatric Surgery of the University Hospital Graz between 2008 and 2018, involving 6,091 patients and a total of 10,643 studies. This dataset is annotated with 74,459 image labels, featuring a total of 67,771 labeled objects.
Autism spectrum disorder (ASD) is characterized by qualitative impairment in social reciprocity, and by repetitive, restricted, and stereotyped behaviors/interests. Previously considered rare, ASD is now recognized to occur in more than 1% of children. Despite continuing research advances, their pace and clinical impact have not kept up with the urgency to identify ways of determining the diagnosis at earlier ages, selecting optimal treatments, and predicting outcomes. For the most part this is due to the complexity and heterogeneity of ASD. To face these challenges, large-scale samples are essential, but single laboratories cannot obtain sufficiently large datasets to reveal the brain mechanisms underlying ASD. In response, the Autism Brain Imaging Data Exchange (ABIDE) initiative has aggregated functional and structural brain imaging data collected from laboratories around the world to accelerate our understanding of the neural bases of autism.
These are the files containing the Convex Hull and Traveling Salesman Problem datasets used in the “Pointer Networks” paper.
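Assuming these files follow the plain-text layout commonly used with the Pointer Networks data (flattened 2-D point coordinates, the literal token `output`, then 1-based indices into the point list; this layout is an assumption, not a documented spec), one record can be parsed like this:

```python
def parse_line(line):
    """Parse one (assumed) Ptr-Net record of the form:
    'x1 y1 x2 y2 ... output i1 i2 ... ik'

    Returns (points, indices): points is a list of (x, y) floats,
    indices are 1-based positions into points (a tour for TSP,
    hull vertices for Convex Hull).
    """
    coords, _, targets = line.partition("output")
    vals = [float(v) for v in coords.split()]
    points = list(zip(vals[0::2], vals[1::2]))   # pair up x and y
    indices = [int(v) for v in targets.split()]
    return points, indices
```

For TSP instances the index sequence typically starts and ends at the same city, closing the tour.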
ViP-Bench is a comprehensive benchmark designed to assess the capability of multimodal models in understanding visual prompts across multiple dimensions. It aims to evaluate how well these models interpret various visual prompts, including recognition, OCR, knowledge, math, relationship reasoning, and language generation. ViP-Bench includes a diverse set of 303 images and questions, providing a thorough assessment of visual understanding capabilities at the region level. This benchmark sets a foundation for future research into multimodal models with arbitrary visual prompts.
The WinoWhy dataset is a resource that provides human-annotated reasons for answering Winograd Schema Challenge (WSC) questions. It includes the original WSC dataset and 4,095 WinoWhy reasons (15 for each WSC question) that could justify the pronoun coreference choices in WSC.
The OCW dataset evaluates creative problem-solving by curating problems and human performance results from the popular British quiz show Only Connect.
CriticBench is a comprehensive benchmark designed to assess the abilities of Large Language Models (LLMs) to critique and rectify their reasoning across various tasks, spanning five reasoning domains.
Earnings-22 is a practical benchmark designed to evaluate automatic speech recognition (ASR) systems' performance on real-world, accented audio.
Realistic Video DeSnowing Dataset (RVSD) contains a total of 110 pairs of videos. Each pair contains snowy and hazy videos and corresponding snow-free and haze-free ground truth videos. We use a rendering engine (Unreal Engine 5) and various augmentation techniques to generate snow and haze with diverse and realistic physical properties. This results in more realistic and varied synthesized videos, which improve the model’s performance on real-world data.
Memory Maze is a 3D domain of randomized mazes designed for evaluating the long-term memory abilities of RL agents. Memory Maze isolates long-term memory from confounding challenges, such as exploration, and requires remembering several pieces of information: the positions of objects, the wall layout, and the agent's own position.
This is a dataset containing audio captions and corresponding audio tags for 3,930 audio files of the TAU Urban Acoustic Scenes 2019 development dataset (airport, public square, and park). The files were annotated using a web-based tool. Each file was annotated by multiple annotators, who provided tags and a one-sentence description of the audio content.
The GraphInstruct dataset is part of a benchmark proposed in the paper titled "GraphInstruct: Empowering Large Language Models with Graph Understanding and Reasoning Capability." This benchmark is designed to evaluate and enhance the graph understanding abilities of large language models (LLMs). It includes 21 classical graph reasoning tasks, providing diverse graph generation pipelines and detailed reasoning steps.
StableToolBench is a new benchmark for tool learning that aims to provide a well-balanced combination of stability and reality, building upon its predecessor, ToolBench. It was developed to address the instability issues of previous tool learning benchmarks, which either relied on hand-crafted online tools with limited scale or large-scale real online APIs that suffered from instability due to API status changes.