19,997 machine learning datasets
Multimodal C4 (MMC4) is an augmentation of the popular text-only C4 corpus with interleaved images. The corpus contains 103M documents with 585M images interleaved among 43B English tokens.
Multispectral imaging using multiplexed illumination.
AGIQA-3K is a fine-grained AI-generated image (AGI) subjective quality assessment database. It was created to address the need for quality models that are consistent with human subjective ratings, given the large quality variance among different AGIs. The database covers various popular AGI models, generates AGIs with different prompts and model parameters, and collects subjective scores at both the perceptual-quality and text-to-image-alignment levels.
Weather is recorded every 10 minutes for the whole of 2020 and contains 21 meteorological indicators, such as air temperature and humidity. The dataset in CSV format can be downloaded at https://drive.google.com/file/d/1Tc7GeVN7DLEl-RAs-JVwG9yFMf--S8dy/view?usp=share_link.
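A minimal sketch of loading such a file with pandas is shown below; the file name, the timestamp column ("date"), and the temperature column ("T") are assumptions and should be adjusted to the downloaded file's actual header.

```python
# Minimal sketch: load the 10-minute weather CSV and resample one indicator.
# File name and column names ("date", "T") are assumptions, not confirmed by the dataset description.
import pandas as pd

df = pd.read_csv("weather_2020.csv", parse_dates=["date"])

# 10-minute sampling over a full year gives roughly 365 * 24 * 6 = 52,560 rows
# and 21 indicator columns plus the timestamp.
print(df.shape)

# Example: hourly mean of a hypothetical air-temperature column "T".
hourly_temp = df.set_index("date")["T"].resample("1H").mean()
print(hourly_temp.head())
```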
The Tox21 data set comprises 12,060 training samples and 647 test samples that represent chemical compounds. There are 801 "dense features" that represent chemical descriptors, such as molecular weight, solubility, or surface area, and 272,776 "sparse features" that represent chemical substructures (ECFP10, DFS6, DFS8; stored in Matrix Market format). Machine learning methods can use either the sparse or the dense features, or combine them. For each sample there are 12 binary labels that represent the outcome (active/inactive) of 12 different toxicological experiments. Note that the label matrix contains many missing values (NAs). The original data source and Tox21 challenge site is https://tripod.nih.gov/tox21/challenge/.
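A sketch of one way to combine the dense and sparse feature blocks and mask the missing labels per assay is given below; the file names are assumptions, as the challenge distribution may name the files differently.

```python
# Sketch: combine Tox21 dense and sparse features and mask missing labels.
# File names are assumptions about how the distributed files might be named.
import numpy as np
import pandas as pd
from scipy.io import mmread
from scipy.sparse import hstack, csr_matrix

dense = pd.read_csv("tox21_dense_train.csv").values       # 12,060 x 801 chemical descriptors
sparse = csr_matrix(mmread("tox21_sparse_train.mtx"))      # 12,060 x 272,776 substructure features
labels = pd.read_csv("tox21_labels_train.csv").values      # 12,060 x 12, with NaN for missing outcomes

# Methods may use the dense block, the sparse block, or both.
features = hstack([csr_matrix(dense), sparse]).tocsr()

# Train one binary classifier per assay, restricted to compounds with a label.
for task in range(labels.shape[1]):
    mask = ~np.isnan(labels[:, task])
    X_task, y_task = features[mask], labels[mask, task]
    # ... fit a classifier on (X_task, y_task)
```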
The Multiview 3D Event dataset was captured by me and Xiaohan Nie at UCLA. It contains RGB, depth, and human skeleton data captured simultaneously by three Kinect cameras. The dataset includes 10 action categories: pick up with one hand, pick up with two hands, drop trash, walk around, sit down, stand up, donning, doffing, throw, and carry. Each action is performed by 10 actors, and the data are taken from a variety of viewpoints. The dataset is distributed in 16 parts (part-1 through part-16). We also created a version of the dataset that contains only RGB videos.
FakeNewsNet is collected from two fact-checking websites, GossipCop and PolitiFact, and contains news content with labels annotated by professional journalists and experts, along with social context information.
QuaRTz is a crowdsourced dataset of 3864 multiple-choice questions about open domain qualitative relationships. Each question is paired with one of 405 different background sentences (sometimes short paragraphs).
DeepFashion2 is a versatile benchmark covering four tasks: clothes detection, pose estimation, segmentation, and retrieval. It has 801K clothing items, each with rich annotations such as style, scale, viewpoint, occlusion, bounding box, dense landmarks, and masks. There are also 873K Commercial-Consumer clothes pairs.
The General-100 dataset is a dataset for image super-resolution. It contains 100 BMP-format images (with no compression). The sizes of the 100 images range from 710 x 704 (large) to 131 x 112 (small).
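A common way to use such a set for super-resolution is to derive low/high-resolution training pairs by bicubic downsampling; the sketch below assumes a local "General-100" directory and a scale factor of 3, and is an illustration rather than the protocol of any particular paper.

```python
# Sketch: build low/high-resolution pairs from the General-100 BMP images
# by bicubic downsampling. Directory name and scale factor are assumptions.
from pathlib import Path
from PIL import Image

scale = 3
for path in sorted(Path("General-100").glob("*.bmp")):
    hr = Image.open(path)
    # Crop so both dimensions are divisible by the scale factor.
    w, h = (hr.width // scale) * scale, (hr.height // scale) * scale
    hr = hr.crop((0, 0, w, h))
    lr = hr.resize((w // scale, h // scale), Image.BICUBIC)
    # (lr, hr) is one training pair for a super-resolution model.
```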
The HELP dataset is an automatically created natural language inference (NLI) dataset that combines lexical and logical inferences focusing on monotonicity (i.e., phrase-replacement-based reasoning). HELP (ver. 1.0) has 36K inference pairs covering upward monotone, downward monotone, non-monotone, conjunction, and disjunction.
A new large multiview dataset for human body expressions with natural clothing. The goal of HUMBI is to facilitate modeling view-specific appearance and geometry of gaze, face, hand, body, and garment from assorted people. 107 synchronized HD cameras are used to capture 772 distinctive subjects across gender, ethnicity, age, and physical condition.
InteriorNet is an RGB-D dataset for large-scale interior scene understanding and mapping. The dataset contains 20M images created by a pipeline.
ShoeV2 is a dataset of 2,000 photos and 6,648 sketches of shoes. The dataset is designed for fine-grained sketch-based image retrieval.
An interactive, first-person, partially observed visual environment that uses Google Street View for its photographic content and broad coverage, and provides performance baselines for a challenging goal-driven navigation task.
A new multilingual language model benchmark composed of 40+ languages spanning several scripts and linguistic families, containing around 40 billion characters, aimed at accelerating research on multilingual modeling.
Holl-E is a dataset containing movie chats wherein each response is explicitly generated by copying and/or modifying sentences from unstructured background knowledge such as plots, comments and reviews about the movie.
ECtHR is a dataset comprising European Court of Human Rights cases, including annotations for paragraph-level rationales. This dataset comprises 11k ECtHR cases and can be viewed as an enriched version of the ECtHR dataset of Chalkidis et al. (2019), which did not provide ground truth for alleged article violations (articles discussed) and rationales. It is released with silver rationales obtained from references in court decisions, and gold rationales provided by ECHR-experienced lawyers.
Action Genome Question Answering (AGQA) is a benchmark for compositional spatio-temporal reasoning. AGQA contains 192M unbalanced question-answer pairs for 9.6K videos. It also contains a balanced subset of 3.9M question-answer pairs, three orders of magnitude larger than existing benchmarks, that minimizes bias by balancing the answer distributions and types of question structures.
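One simple way to balance an answer distribution is to downsample each answer group to a common size; the sketch below illustrates that idea only and is not the AGQA authors' exact balancing procedure.

```python
# Sketch: balance a set of QA pairs by downsampling each answer group so that
# no single answer dominates. Illustrative only; not the AGQA procedure.
import random
from collections import defaultdict

def balance_by_answer(qa_pairs, seed=0):
    """qa_pairs: list of dicts with at least an 'answer' key."""
    by_answer = defaultdict(list)
    for qa in qa_pairs:
        by_answer[qa["answer"]].append(qa)
    # Keep at most as many examples per answer as the rarest answer has.
    cap = min(len(group) for group in by_answer.values())
    rng = random.Random(seed)
    balanced = []
    for group in by_answer.values():
        balanced.extend(rng.sample(group, cap))
    rng.shuffle(balanced)
    return balanced
```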
VGG-SS (VGG Sound Source) is a benchmark for evaluating sound source localisation in videos. The dataset consists of a new set of annotations for the recently introduced VGG-Sound dataset, in which the sound sources visible in each video clip are explicitly marked with bounding-box annotations. This dataset is 20 times larger than analogous existing ones, contains 5K videos spanning over 200 categories, and, unlike Flickr SoundNet, is video-based.