The Extended Complex Scene Saliency Dataset (ECSSD) comprises complex scenes with textures and structures common to real-world images. ECSSD contains 1,000 intricate images and their corresponding ground-truth saliency maps, each created by averaging the labelings of five human participants.
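The averaging step can be sketched as follows. This is a minimal illustration with synthetic masks, not ECSSD's actual annotation pipeline; the mask shapes and random labels are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical binary saliency masks from five annotators for one image
# (1 = pixel marked salient, 0 = not salient).
masks = rng.integers(0, 2, size=(5, 8, 8)).astype(float)

# The ground-truth saliency map is the per-pixel mean of the five labelings,
# so each pixel takes a value in {0.0, 0.2, 0.4, 0.6, 0.8, 1.0}.
gt = masks.mean(axis=0)
```

Pixels where all five annotators agree end up at 0 or 1, while disagreement yields intermediate saliency values.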
BioGRID is a biomedical interaction repository with data compiled through comprehensive curation efforts. The current build, version 4.2.192, indexes 75,868 publications covering 1,997,840 protein and genetic interactions, 29,093 chemical interactions, and 959,750 post-translational modifications from major model organism species.
MOT20 is a dataset for multiple object tracking. It contains 8 challenging video sequences (4 train, 4 test) filmed in unconstrained environments at crowded places such as train stations, town squares, and a sports stadium. Image source: https://motchallenge.net/vis/MOT20-04
The Ciao dataset contains ratings given by users to items, along with item category information. The data comes from the Epinions dataset.
A multimodal retrieval dataset. TVR requires systems to understand both videos and their associated subtitle (dialogue) texts, making the task more realistic. The dataset contains 109K queries collected on 21.8K videos from 6 TV shows of diverse genres, where each query is associated with a tight temporal window.
PEMS-BAY is a dataset for traffic prediction.
The LM (Linemod) dataset was introduced by Stefan Hinterstoisser and colleagues in their work on model-based training, detection, and pose estimation of texture-less 3D objects in heavily cluttered scenes.
The ProPara dataset is designed to train and test comprehension of simple paragraphs describing processes (e.g., photosynthesis), designed for the task of predicting, tracking, and answering questions about how entities change during the process.
The Segmentation of Underwater IMagery (SUIM) dataset contains over 1,500 images with pixel annotations for eight object categories: fish (vertebrates), reefs (invertebrates), aquatic plants, wrecks/ruins, human divers, robots, sea-floor, and background (waterbody). The images were rigorously collected during oceanic explorations and human-robot collaborative experiments, and annotated by human participants.
A dataset for robot grasp planning based on physics simulation. The dataset contains 17.7M parallel-jaw grasps, spanning 8872 objects from 262 different categories, each labeled with the grasp result obtained from a physics simulator.
CoVoST is a large-scale multilingual speech-to-text translation corpus. Its latest (second) version covers translations from 21 languages into English and from English into 15 languages, with a total of 2,880 hours of speech drawn from 78K speakers across 66 accents.
A benchmark dataset that contains 500K document pages with fine-grained token-level annotations for document layout analysis. DocBank is constructed in a simple yet effective way, using weak supervision from the \LaTeX{} documents available on arXiv.org.
DR(eye)VE is a large dataset of driving scenes for which eye-tracking annotations are available. It features more than 500,000 registered frames matching ego-centric views (from glasses worn by drivers) with car-centric views (from a roof-mounted camera), further enriched by other sensor measurements.
The EgoHands dataset contains 48 Google Glass videos of complex, first-person interactions between two people. The main intention of this dataset is to enable better, data-driven approaches to understanding hands in first-person computer vision.
OxUvA is a dataset and benchmark for evaluating single-object tracking algorithms.
A dataset consisting of 180,662 triplets of dual-pol synthetic aperture radar (SAR) image patches, multi-spectral Sentinel-2 image patches, and MODIS land cover maps.
THCHS-30 is a free Chinese speech database that can be used to build a full-fledged Chinese speech recognition system.
UMDFaces is a face dataset divided into two parts: annotated still images and video frames.
PolyU Dataset is a large dataset of real-world noisy images with reasonably obtained corresponding "ground truth" images. The basic idea is to capture the same, unchanged scene many times (e.g., 500) and compute the mean image, which can be roughly taken as the "ground truth" for the real-world noisy images. The rationale behind this strategy is that at each pixel the noise randomly pushes the value above or below its true level; sampling the same pixel many times and averaging approximates the true pixel value and significantly suppresses the noise.
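The multi-capture averaging idea can be sketched with synthetic data. This is an illustrative simulation, not the PolyU acquisition pipeline: the clean image, noise level, and capture count below are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical clean scene (the "true" image we would like to recover).
clean = rng.uniform(0.0, 1.0, size=(64, 64))

# Capture the same scene 500 times; each capture adds zero-mean noise.
captures = clean + rng.normal(0.0, 0.1, size=(500, 64, 64))

# Averaging the captures approximates the clean image: the zero-mean noise
# cancels out, its standard deviation shrinking by a factor of sqrt(500).
mean_image = captures.mean(axis=0)

single_error = np.abs(captures[0] - clean).mean()
averaged_error = np.abs(mean_image - clean).mean()
```

With zero-mean noise of standard deviation sigma, the averaged image's residual noise has standard deviation sigma / sqrt(N), which is why a large capture count like 500 yields a usable "ground truth".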
OGB Large-Scale Challenge (OGB-LSC) is a collection of three real-world datasets for advancing the state-of-the-art in large-scale graph ML. OGB-LSC provides graph datasets that are orders of magnitude larger than existing ones and covers three core graph learning tasks: link prediction, graph regression, and node classification.