SKILL-102 consists of 102 image classification datasets. Each one supports a distinct classification task, and each dataset was obtained from a previously published source (e.g., task 1: classify flowers into 102 classes, such as lily, rose, and petunia, using 8,185 train/val/test images (Nilsback & Zisserman, 2008a); task 2: classify 67 types of scenes, such as kitchen, bedroom, gas station, and library, using 15,524 images (Quattoni & Torralba, 2009)). In total, SKILL-102 comprises 102 tasks, 5,033 classes, and 2,041,225 training images. To the best of our knowledge, SKILL-102 is the most challenging completely real (not synthesized or permuted) image classification benchmark for LL and SKILL algorithms, with the largest number of tasks, the largest number of classes, and the highest inter-task variance.
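A minimal sketch of how such a benchmark might be consumed by a sequential (lifelong) learner; the `TaskSpec` structure and `skill102_tasks` list are hypothetical illustrations, since the release's actual loading API is not described here.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class TaskSpec:
    """One of the 102 classification tasks (hypothetical structure)."""
    name: str          # e.g. "flowers-102"
    num_classes: int   # e.g. 102 for task 1, 67 for task 2
    num_images: int    # train/val/test images combined

# Illustrative entries only; the full benchmark has 102 tasks,
# 5,033 classes, and 2,041,225 training images in total.
skill102_tasks: List[TaskSpec] = [
    TaskSpec("flowers-102", 102, 8_185),
    TaskSpec("indoor-scenes-67", 67, 15_524),
]

# A lifelong learner sees tasks one at a time, never revisiting old data.
for task in skill102_tasks:
    print(f"Training on {task.name}: {task.num_classes} classes, "
          f"{task.num_images} images")
```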
Schools of inland silversides (Menidia beryllina, n = 14 individuals per school) were recorded in the Lauder Lab at Harvard University while swimming at 15 speeds (0.5 to 8 body lengths per second (BL/s), in 0.5 BL/s increments) in a flow tank with a working section of 28 x 28 x 40 cm, as described in previous work, at a constant temperature (18±1°C) and salinity (33 ppt) and a Reynolds number of approximately 10,000 (based on BL). Dorsal views of steady swimming across these speeds were recorded by high-speed video cameras (FASTCAM Mini AX50, Photron USA, San Diego, CA, USA) at 60-125 frames per second (feeding videos at 60 fps, solo swimming at 125 fps). The dorsal view was recorded from above the swim tunnel, and a floating Plexiglas panel at the water surface prevented surface ripples from interfering with the videos. Five keypoints were labeled per fish (tip, gill, peduncle, dorsal fin tip, caudal tip), and 100 frames were labeled, making this a real-world-sized laboratory dataset.
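A minimal sketch of the pose-annotation layout such multi-animal keypoint datasets typically use; the array shapes follow from the numbers above, but the variable names and NaN convention are assumptions, not the release's schema.

```python
import numpy as np

# 100 labeled frames, 14 fish per school, 5 keypoints each,
# (x, y) pixel coordinates in the dorsal view.
N_FRAMES, N_FISH, N_KEYPOINTS = 100, 14, 5
KEYPOINT_NAMES = ["tip", "gill", "peduncle", "dorsal_fin_tip", "caudal_tip"]

# labels[f, i, k] = (x, y) of keypoint k on fish i in frame f;
# NaN marks keypoints that are occluded or unannotated (an assumption).
labels = np.full((N_FRAMES, N_FISH, N_KEYPOINTS, 2), np.nan)
```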
ccHarmony is a color-checker (cc) based image harmonization dataset. It contains 350 real images and 426 segmented foregrounds, where each real image has one or two segmented foregrounds. Each foreground is associated with 10 synthetic composite images, so the dataset has 4,260 pairs (426 × 10) of synthetic composite images and ground-truth real images in total. We split all pairs into 3,080 training pairs and 1,180 test pairs.
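A minimal sketch of how the composite/ground-truth pairing might be enumerated; the loop structure is an illustration of the counts above, not the released file layout.

```python
# Each of the 426 foregrounds has 10 synthetic composites, all paired
# with the same ground-truth real image the foreground came from.
n_foregrounds, composites_per_fg = 426, 10
pairs = [(fg, comp) for fg in range(n_foregrounds)
                    for comp in range(composites_per_fg)]
assert len(pairs) == 4_260  # matches 3,080 train + 1,180 test
```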
FractureAtlas is a musculoskeletal bone fracture dataset with annotations for deep learning tasks such as classification, localization, and segmentation. It contains a total of 4,083 X-ray images with annotations in COCO, VGG, YOLO, and Pascal VOC formats. The dataset is freely available for any purpose: the data may be copied, shared, or redistributed in any medium or format, and may be adapted, remixed, transformed, and built upon. It is licensed under CC BY 4.0. Note that using the dataset correctly requires knowledge of medicine and radiology to interpret the results and draw conclusions, and the possibility of labeling errors should be considered.
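Since the annotations ship in COCO format (among others), a standard loader such as pycocotools can read them; the annotation filename below is a placeholder, as the release's exact file layout is not specified here.

```python
from pycocotools.coco import COCO

# Placeholder path: substitute the COCO-format annotation file
# that ships with FractureAtlas.
coco = COCO("fractureatlas_coco_annotations.json")

img_ids = coco.getImgIds()
print(f"{len(img_ids)} images")  # expected: 4,083

# Fracture annotations (boxes/segmentations) for the first image.
ann_ids = coco.getAnnIds(imgIds=img_ids[0])
for ann in coco.loadAnns(ann_ids):
    print(ann["category_id"], ann.get("bbox"))
```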
Real-CE is a real-world Chinese-English benchmark dataset for scene text image super-resolution (STISR), with an emphasis on restoring structurally complex Chinese characters. The benchmark provides 1,935/783 real LR-HR text image pairs (33,789 text lines in total) for training/testing in 2× and 4× zooming modes, complemented by detailed annotations including detection boxes and text transcripts.
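A minimal sketch of the LR-HR pairing implied by the two zooming modes; the crop size and variable names are illustrative, not taken from the release.

```python
# For a given HR text-line crop, the paired LR image is smaller by the
# zoom factor; a 4x model must recover 4x finer stroke detail.
hr_height, hr_width = 128, 512          # illustrative HR crop size
for scale in (2, 4):                    # the two zooming modes
    lr_height, lr_width = hr_height // scale, hr_width // scale
    print(f"x{scale}: LR {lr_height}x{lr_width} -> HR {hr_height}x{hr_width}")
```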
PUMaVOS is a video object segmentation dataset of challenging and practical use cases inspired by the movie production industry.
dacl10k stands for damage classification 10k images; it is a multi-label semantic segmentation dataset covering 19 classes (13 damage types and 6 object types) found on bridges.
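A minimal sketch of what a multi-label (as opposed to mutually exclusive) segmentation target looks like; the image size, class indices, and mask contents are illustrative.

```python
import numpy as np

N_CLASSES = 19                      # 13 damage types + 6 object types
H, W = 512, 512                     # illustrative image size

# Multi-label: a pixel can carry several classes at once (e.g. a damage
# class on top of an object class), so the target is a stack of binary
# masks rather than a single class-index map.
target = np.zeros((N_CLASSES, H, W), dtype=bool)
target[0, 100:200, 100:200] = True  # an illustrative damage class
target[13, 50:400, 50:400] = True   # an illustrative object class beneath it
```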
Video sequences recorded at a field on Campus Kleinaltendorf (CKA), University of Bonn, by BonBot-I, an autonomous weeding robot. The data was captured with an Intel RealSense D435i sensor mounted with a nadir view of the ground.
Celeb-HQ Facial Identity Recognition Dataset
Celeb-HQ Face Gender Recognition Dataset
The Multi-pose Anomaly Detection (MAD) dataset represents the first attempt to evaluate the performance of pose-agnostic anomaly detection. The MAD dataset contains 4,000+ high-resolution multi-pose-view RGB images with camera/pose information of 20 shape-complex LEGO animal toys for training, as well as 7,000+ simulated and real-world RGB images (without camera/pose information) with pixel-precise ground-truth annotations for three types of anomalies in the test sets. Note that MAD is further divided into MAD-Sim and MAD-Real for simulation-to-reality studies, bridging the gap between academic research and the demands of industrial manufacturing.
We present a cellular microscopy image dataset for investigating channel-adaptive models. We collected and pre-processed images from three publicly available sources: 1) the WTC-11 hiPSC dataset from the Allen Institute (Viana et al., 2023), 2) the Human Protein Atlas dataset (Thul et al., 2017), and 3) a combined Cell Painting dataset from the Broad Institute (Gustafsdottir et al., 2013; Bray et al., 2017; Way et al., 2021). The images contain 3, 4, or 5 channels, with a different cellular structure highlighted in each channel. The goal of the dataset is to facilitate the creation and evaluation of novel computer vision models that are invariant to the number of channels.
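A minimal sketch of one common way to handle variable channel counts (embedding each channel independently, then pooling across channels); this illustrates the problem setting only, and is not one of the models evaluated on the dataset.

```python
import torch
import torch.nn as nn

class ChannelAdaptiveStem(nn.Module):
    """Embed each channel with a shared conv, then average across
    channels, so the same network accepts 3-, 4-, or 5-channel images."""
    def __init__(self, dim: int = 64):
        super().__init__()
        self.per_channel = nn.Conv2d(1, dim, kernel_size=3, padding=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, C, H, W) with C in {3, 4, 5}
        feats = [self.per_channel(x[:, c:c + 1]) for c in range(x.shape[1])]
        return torch.stack(feats, dim=0).mean(dim=0)  # (batch, dim, H, W)

stem = ChannelAdaptiveStem()
for c in (3, 4, 5):
    out = stem(torch.randn(2, c, 128, 128))
    print(c, out.shape)  # same output shape regardless of C
```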
The dataset includes two parts corresponding to the cities of Abakan (65,524 nodes, 340,012 edges) and Omsk (231,688 nodes, 1,149,492 edges). Along with the road network graph, it includes trip records represented as sequences of visited nodes, making the dataset suitable for both path-blind and path-aware settings. There are two target values for the regression task: the real travel time and the real length of each trip.
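A minimal sketch of the data layout the description implies (a road graph plus node-sequence trips with two regression targets); the field names and edge attributes are assumptions, not the released schema.

```python
import networkx as nx

# Road network: nodes are intersections, edges are road segments.
g = nx.DiGraph()
g.add_edge(101, 102, length_m=350.0)   # illustrative segments
g.add_edge(102, 103, length_m=120.0)

# A trip record: the visited-node sequence enables path-aware models;
# path-blind models see only the (origin, destination) endpoints.
trip = {
    "nodes": [101, 102, 103],      # sequence of visited nodes
    "travel_time_s": 95.0,         # regression target 1: real travel time
    "length_m": 470.0,             # regression target 2: real trip length
}
origin, destination = trip["nodes"][0], trip["nodes"][-1]
```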
Depth vision has recently been used in many locomotion devices with the objective of easing the lives of people with disabilities and moving toward a more ecological lifestyle, because such cameras are cheap and compact and can provide rich information about the environment. Our dataset provides many recordings of point clouds and other types of data during different locomotion modes in an urban context. If you use this data, please cite the following papers: 1. Depth Vision based Terrain Detection Algorithm during Human Locomotion; 2. Using Depth Vision for Terrain Detection during Active Locomotion.
Dataset of validated OCT and chest X-ray images described and analyzed in "Deep learning-based classification and referral of treatable human diseases". The OCT images are split into a training set and a testing set of independent patients. OCT images are labeled as (disease)-(randomized patient ID)-(image number for this patient) and split into four directories: CNV, DME, DRUSEN, and NORMAL.
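A minimal sketch of parsing that filename convention; the hyphen delimiter, file extension, and example name are assumptions based on the pattern described above.

```python
from pathlib import Path

def parse_oct_filename(path: Path) -> dict:
    """Split '(disease)-(patient id)-(image number).jpeg' into fields.
    Assumes hyphen-delimited names as described above."""
    disease, patient_id, image_num = path.stem.split("-")
    return {"disease": disease, "patient_id": patient_id,
            "image_num": int(image_num)}

# Hypothetical example filename following the stated convention:
print(parse_oct_filename(Path("CNV-1234-17.jpeg")))
# {'disease': 'CNV', 'patient_id': '1234', 'image_num': 17}
```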
MMVax-Stance includes 113 vaccine hesitancy framings found on Twitter about the COVID-19 vaccines. Language experts annotated multimodal image-text tweets as Relevant or Not Relevant, and then further annotated the Relevant tweets with their stance toward each framing.
A fundamental characteristic common to both human vision and natural language is their compositional nature. Yet, despite the performance gains contributed by large vision and language pretraining, we find that, across 7 architectures trained with 4 algorithms on massive datasets, they struggle at compositionality. To arrive at this conclusion, we introduce a new compositionality evaluation benchmark, CREPE, which measures two important aspects of compositionality identified by the cognitive science literature: systematicity and productivity. To measure systematicity, CREPE consists of a test dataset containing over 370K image-text pairs and three different seen-unseen splits. The three splits are designed to test models trained on three popular training datasets: CC-12M, YFCC-15M, and LAION-400M. We also generate 325K, 316K, and 309K hard negative captions for a subset of the pairs. To test productivity, CREPE contains 17K image-text pairs with nine different complexities, plus 183K hard negative captions.
The S2-100K dataset is a dataset of 100,000 multi-spectral satellite images and their corresponding locations (latitude/longitude coordinates of the image centroid) sampled from Sentinel-2 via the Microsoft Planetary Computer. The Copernicus Sentinel data was captured between January 1, 2021 and May 17, 2023. The dataset is sampled approximately uniformly over landmass and only includes images without cloud coverage.
StreetTryOn, a new in-the-wild virtual try-on dataset, consists of 12,364 and 2,089 street person images for training and validation, respectively. It is derived from the large fashion retrieval dataset DeepFashion2, from which we filter out over 90% of images that are infeasible for try-on tasks (e.g., non-frontal views, large occlusions, dark environments). Combined with the garment and person images in VITON-HD, it yields a comprehensive suite of in-domain and cross-domain try-on tasks with garment and person inputs from various sources: Shop2Model, Model2Model, Shop2Street, and Street2Street, as sketched below.
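A minimal sketch of how the four task settings pair a garment source with a person-image source; the source descriptions are my reading of the task names, not an official loading API.

```python
# Each try-on task pairs a garment source with a person-image source.
TASKS = {
    "Shop2Model":    ("shop garment",  "model photo"),   # in-domain (VITON-HD)
    "Model2Model":   ("model photo",   "model photo"),
    "Shop2Street":   ("shop garment",  "street photo"),  # cross-domain
    "Street2Street": ("street photo",  "street photo"),
}
for task, (garment_src, person_src) in TASKS.items():
    print(f"{task}: garment from {garment_src}, person from {person_src}")
```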
CholecTrack20 is a surgical video dataset of laparoscopic cholecystectomy designed for surgical tool tracking, featuring 20 annotated videos. The dataset includes detailed labels for multi-class, multi-tool tracking, offering trajectories at three levels: tool visibility within the camera scope, intracorporeal movement within the patient's body, and the lifelong intraoperative trajectory of each tool. Annotations cover spatial coordinates, tool class, operator identity, surgical phase, visual conditions (occlusion, bleeding, smoke), and more for tools such as grasper, bipolar, hook, scissors, clipper, irrigator, and specimen bag, provided at 1 frame per second across 35K frames with 65K tool instance labels. The dataset uses official splits, allocating 10 videos for training, 2 for validation, and 8 for testing.
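A minimal sketch of a per-frame annotation record carrying the fields listed above; the field names and encoding are assumptions, not the release's actual schema.

```python
from dataclasses import dataclass
from typing import Tuple

@dataclass
class ToolAnnotation:
    """One tool instance in one frame (hypothetical schema)."""
    frame_idx: int       # annotated at 1 fps
    track_id: int        # identity linking the three trajectory levels
    bbox: Tuple[float, float, float, float]  # spatial coords (x, y, w, h)
    tool_class: str      # grasper, bipolar, hook, scissors, ...
    operator: str        # who is handling the tool
    phase: str           # surgical phase
    occluded: bool = False   # visual conditions
    bleeding: bool = False
    smoke: bool = False

ann = ToolAnnotation(120, 3, (0.41, 0.22, 0.18, 0.25), "grasper",
                     "main surgeon", "calot-triangle-dissection")
```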