19,997 machine learning datasets
19,997 dataset results
300 news articles annotated with 1,727 bias spans and find evidence that informational bias appears in news articles more frequently than lexical bias.
Evidence Inference is a corpus for this task comprising 10,000+ prompts coupled with full-text articles describing RCTs.
IntrA is an open-access 3D intracranial aneurysm dataset that makes the application of points-based and mesh-based classification and segmentation models available. This dataset can be used to diagnose intracranial aneurysms and to extract the neck for a clipping operation in medicine and other areas of deep learning, such as normal estimation and surface reconstruction.
KPTimes is a large-scale dataset of news texts paired with editor-curated keyphrases.
TG-ReDial is a a topic-guided conversational recommendation dataset for research on conversational/interactive recommender systems.
VQA-HAT (Human ATtention) is a dataset to evaluate the informative regions of an image depending on the question being asked about it. The dataset consists of human visual attention maps over the images in the original VQA dataset. It contains more than 60k attention maps.
The Korean DeepFake Detection Dataset (KoDF) is a large-scale collection of synthesized and real videos focused on Korean subjects, used for the task of deepfake detection.
The Ryerson Audio-Visual Database of Emotional Speech and Song (RAVDESS) contains 7,356 files (total size: 24.8 GB). The database contains 24 professional actors (12 female, 12 male), vocalizing two lexically-matched statements in a neutral North American accent. Speech includes calm, happy, sad, angry, fearful, surprise, and disgust expressions, and song contains calm, happy, sad, angry, and fearful emotions. Each expression is produced at two levels of emotional intensity (normal, strong), with an additional neutral expression. All conditions are available in three modality formats: Audio-only (16bit, 48kHz .wav), Audio-Video (720p H.264, AAC 48kHz, .mp4), and Video-only (no sound). Note, there are no song files for Actor_18.
iNat2021 is a large-scale image dataset collected and annotated by community scientists that contains over 2.7M images from 10k different species.
RadarScenes is a real-world radar point cloud dataset for automotive applications.
Dataset of 64x64 images of a robot pushing objects on a table top. From Berkeley AI Research (BAIR).
CaseHOLD (Case Holdings On Legal Decisions) is a law dataset comprised of over 53,000+ multiple choice questions to identify the relevant holding of a cited case. This dataset presents a fundamental task to lawyers and is both legally meaningful and difficult from an NLP perspective (F1 of 0.4 with a BiLSTM baseline). The citing context from the judicial decision serves as the prompt for the question. The answer choices are holding statements derived from citations following text in a legal decision. There are five answer choices for each citing text. The correct answer is the holding statement that corresponds to the citing text. The four incorrect answers are other holding statements.
A dataset of 90,000 high-resolution nature landscape images, crawled from Unsplash and Flickr and preprocessed with Mask R-CNN and Inception V3.
Nighttime Driving is a dataset of road scenes consisting of 35,000 images ranging from daytime to twilight time and to nighttime.
Gaofen Image Dataset (GID) is a large-scale land-cover dataset constructed with Gaofen-2 (GF-2) satellite images. This dataset has superiorities over the existing land-cover dataset because of its large coverage, wide distribution, and high spatial resolution. It contains 150 GF-2 images annotated at the pixel level for 5 categories: built-up, farmland, forest, meadow, and water.
Dataset of high-resolution (4096×2160), high-fps (1000fps) video frames with extreme motion. X-TEST consists of 15 video clips with 33-length of 4K-1000fps frames. X-TRAIN consists of 4,408 clips from various types of 110 scenes. The clips are 65-length of 1000fps frames
The data set contains 38 patches (of the same size), each consisting of a true orthophoto (TOP) extracted from a larger TOP mosaic.
S2Looking is a building change detection dataset that contains large-scale side-looking satellite images captured at varying off-nadir angles. The S2Looking dataset consists of 5,000 registered bitemporal image pairs (size of 1024*1024, 0.5 ~ 0.8 m/pixel) of rural areas throughout the world and more than 65,920 annotated change instances. We provide two label maps to separately indicate the newly built and demolished building regions for each sample in the dataset. We establish a benchmark task based on this dataset, i.e., identifying the pixel-level building changes in the bi-temporal images.
UVO is a new benchmark for open-world class-agnostic object segmentation in videos. Besides shifting the problem focus to the open-world setup, UVO is significantly larger, providing approximately 8 times more videos compared with DAVIS, and 7 times more mask (instance) annotations per video compared with YouTube-VOS and YouTube-VIS. UVO is also more challenging as it includes many videos with crowded scenes and complex background motions. Some highlights of the dataset include:
MedMNIST v2 is a large-scale MNIST-like collection of standardized biomedical images, including 12 datasets for 2D and 6 datasets for 3D. All images are pre-processed into 28 x 28 (2D) or 28 x 28 x 28 (3D) with the corresponding classification labels, so that no background knowledge is required for users. Covering primary data modalities in biomedical images, MedMNIST v2 is designed to perform classification on lightweight 2D and 3D images with various data scales (from 100 to 100,000) and diverse tasks (binary/multi-class, ordinal regression and multi-label). The resulting dataset, consisting of 708,069 2D images and 10,214 3D images in total, could support numerous research / educational purposes in biomedical image analysis, computer vision and machine learning.