3,275 machine learning datasets
This dataset contains 1,200 images (1,000 WLI images and 200 FICE images) with fine-grained segmentation annotations. The training set consists of 1,000 images, and the test set consists of 200 images. All polyps are classified as neoplastic or non-neoplastic, denoted by red and green colors, respectively. This dataset is part of a larger dataset called NeoPolyp.
CaDIS: a Cataract Dataset for Image Segmentation is a dataset for semantic segmentation created by Digital Surgery Ltd. on top of the CATARACTS dataset. CaDIS consists of 4,670 images sampled from the 25 videos in the CATARACTS training set. Each pixel in each image is labeled with its respective instrument or anatomical class from a set of 36 identified classes. More details about the dataset can be found in the paper (https://arxiv.org/pdf/1906.11586.pdf).
HUMAN4D is a large and multimodal 4D dataset that contains a variety of human activities simultaneously captured by a professional marker-based MoCap, a volumetric capture and an audio recording system. By capturing 2 female and 2 male professional actors performing various full-body movements and expressions, HUMAN4D provides a diverse set of motions and poses encountered as part of single- and multi-person daily, physical and social activities (jumping, dancing, etc.), along with multi-RGBD (mRGBD), volumetric and audio data.
CUHK03-C is an evaluation set that consists of algorithmically generated corruptions applied to the CUHK03 test set. These corruptions consist of Noise: Gaussian, shot, impulse, and speckle; Blur: defocus, frosted glass, motion, zoom, and Gaussian; Weather: snow, frost, fog, brightness, spatter, and rain; Digital: contrast, elastic, pixel, JPEG compression, and saturate. Each of the 20 corruption types has five severity levels, resulting in 100 distinct corruptions.
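As a quick sanity check on the counts above, the corruption taxonomy can be enumerated programmatically. This is a minimal sketch: the dictionary simply transcribes the type names from the description and is not the official CUHK03-C tooling.

```python
# Enumerate the CUHK03-C corruption grid described above.
# Type names come from the dataset description; identifiers are illustrative.
from itertools import product

CORRUPTIONS = {
    "noise":   ["gaussian", "shot", "impulse", "speckle"],
    "blur":    ["defocus", "frosted_glass", "motion", "zoom", "gaussian"],
    "weather": ["snow", "frost", "fog", "brightness", "spatter", "rain"],
    "digital": ["contrast", "elastic", "pixel", "jpeg_compression", "saturate"],
}
SEVERITIES = range(1, 6)  # five severity levels per corruption type

# Every (category, type, severity) combination in the evaluation set.
grid = [(cat, name, s)
        for cat, names in CORRUPTIONS.items()
        for name, s in product(names, SEVERITIES)]

print(len(grid))  # 20 types x 5 severities = 100 distinct corruptions
```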
Adaptiope is a domain adaptation dataset with 123 classes in three domains: synthetic, product, and real life. One of the main goals of Adaptiope is to offer a clean and well-curated set of images for domain adaptation. This was necessary because many other common datasets in the area suffer from label noise and low-quality images. Additionally, Adaptiope's class set was chosen to minimize overlap with the class set of the commonly used ImageNet pretraining, thereby preventing information leakage in a domain adaptation setup.
The AmsterTime dataset offers a collection of 2,500 well-curated image pairs capturing the same scene: a present-day street-view image matched to a historical archival image from the city of Amsterdam. The image pairs capture the same place with different cameras, viewpoints, and appearances. Unlike existing benchmark datasets, AmsterTime is directly crowdsourced in a GIS navigation platform (Mapillary). In turn, all the matching pairs are checked by a human expert to verify the correct matches and to evaluate human competence in the Visual Place Recognition (VPR) task for future reference.
EgoHOS is a labeled dataset consisting of 11,243 egocentric images with per-pixel segmentation labels of hands and the objects being interacted with during a diverse array of daily activities. The data were collected from multiple sources: 7,458 frames from Ego4D, 2,212 frames from EPIC-KITCHEN, 806 frames from THU-READ, and 350 frames from our own egocentric videos of people playing Escape Room. This dataset is designed for tasks including hand state classification, video activity recognition, 3D mesh reconstruction of hand-object interactions, and video inpainting of hand-object foregrounds in egocentric videos.
The EntitySeg dataset contains 33,227 images with high-quality mask annotations. Compared with existing datasets, EntitySeg has three distinct properties. First, 71.25% of the images have a resolution of at least 2000×2000 px and 86.23% of at least 1000×1000 px, which is more consistent with current digital imaging trends. Second, the dataset is open-world and is not limited to predefined classes. Third, the mask annotations along object boundaries are more accurate than in existing datasets.
Social media are interactive platforms that facilitate the creation and sharing of information, ideas, and other forms of expression among people. This exchange is not free from offensive, trolling, or malicious content targeting users or communities. One way of trolling is by making memes, which in most cases combine an image with a concept or catchphrase. The challenge of dealing with memes is that they are region-specific and their meaning is often obscured in humour or sarcasm. To facilitate the computational modelling of trolling in memes for Indian languages, we created a meme dataset for Tamil (TamilMemes). We annotated and released the dataset containing suspected troll and not-troll memes. In this paper, we use an image classifier to address the difficulties involved in classifying troll memes with existing methods. We found that identifying a troll meme with such an image classifier is not feasible, which has been corroborated by the precision and related evaluation metrics.
ClueWeb22, the newest iteration of the ClueWeb line of datasets, provides 10 billion web pages together with rich associated information. Its design was influenced by the need for a high-quality, large-scale web corpus to support a range of academic and industry research, for example, in information systems, retrieval-augmented AI systems, and model pretraining. Compared with earlier ClueWeb corpora, ClueWeb22 is larger, more varied, of higher quality, and aligned with the document distributions in commercial web search. Besides raw HTML, the dataset includes rich information about the web pages provided by industry-standard document understanding systems, including the visual representation of pages rendered by a web browser, parsed HTML structure information from a neural network parser, and pre-processed, cleaned document text.
VOT2020 is a Visual Object Tracking benchmark for short-term tracking in RGB.
The dataset contains single-shot videos taken from moving cameras in underwater environments. The first shard of the new Marine Video Kit dataset is presented to serve video retrieval and other computer vision challenges. In addition to basic metadata statistics, several insights are presented based on low-level features as well as semantic annotations of selected keyframes. The dataset comprises 1,379 videos ranging in length from 2 s to 4.95 min, with mean and median durations of 29.9 s and 25.4 s, respectively. Data were captured in 11 different regions and countries between 2011 and 2022.
The Zenseact Open Dataset (ZOD) is a large-scale and diverse multi-modal autonomous driving (AD) dataset, created by researchers at Zenseact. It was collected over a 2-year period in 14 different European countries, using a fleet of vehicles equipped with a full sensor suite. The dataset consists of three subsets: Frames, Sequences, and Drives, designed to encompass both data diversity and support for spatiotemporal learning, sensor fusion, localization, and mapping.
SAMRS is a remote sensing segmentation dataset which provides object category, location, and instance information that can be used for semantic segmentation, instance segmentation, and object detection, either individually or in combination.
PerSeg is a dataset for personalized segmentation. The raw images are collected from the training data of subject-driven diffusion models: DreamBooth, Textual Inversion, and Custom Diffusion. PerSeg contains 40 objects of various categories in total, including daily necessities, animals, and buildings. Contextualized in different poses or scenes, each object is associated with 5–7 images with our annotated masks.
This is the first general Underwater Image Instance Segmentation (UIIS) dataset, containing 4,628 images across 7 categories with pixel-level annotations for the underwater instance segmentation task.
This dataset provides seven multiple-exposure ground-truth images per static scene, covering EV 0, ±1, ±2, and ±3. It comprises 96 different scene image stacks (672 images in total; outdoor: 504, indoor: 168).
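The counts above are internally consistent, as a small bookkeeping sketch shows. The EV values and image counts are taken from the description; the variable names are illustrative.

```python
# Exposure-stack bookkeeping for the multi-exposure dataset described above.
EVS = [-3, -2, -1, 0, +1, +2, +3]  # seven exposures per static scene

outdoor_images, indoor_images = 504, 168     # per the description
outdoor_scenes = outdoor_images // len(EVS)  # 72 outdoor scene stacks
indoor_scenes = indoor_images // len(EVS)    # 24 indoor scene stacks

total_scenes = outdoor_scenes + indoor_scenes  # 96 scene stacks
total_images = total_scenes * len(EVS)         # 672 images
print(total_scenes, total_images)  # -> 96 672
```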
The PhenoBench dataset contains multiple image segmentation challenges from the agricultural domain.
We introduce a novel Image Quality Assessment (IQA) dataset comprising 6,073 UHD-1 (4K) images, annotated at a fixed width of 3840 pixels. Contrary to existing No-Reference (NR) IQA datasets, ours focuses on highly aesthetic photos of high technical quality, filling a gap in the literature. The images, carefully curated to exclude synthetic content, are sufficiently diverse to train general NR-IQA models. Importantly, the dataset is annotated with perceptual quality ratings obtained through a crowdsourcing study. Ten expert raters, comprising photographers and graphics artists, assessed each image at least twice in multiple sessions spanning several days, resulting in highly reliable labels. Annotators were rigorously selected based on several metrics, including self-consistency, to ensure their reliability. The dataset also includes rich metadata with user- and machine-generated tags from over 5,000 categories, as well as popularity indicators such as favorites, likes, downloads, and views.
The iWildCam2020-WILDS dataset is a variant of the iWildCam 2020 dataset. iWildCam2020-WILDS is a benchmark dataset designed to test OOD generalization for the task of species classification. The label space consists of 182 species. Each domain corresponds to a different location of the camera trap. The training and test images belong to disjoint sets of locations in the OOD setting.
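A location-disjoint split of this kind can be sketched as follows. This is a minimal illustration, not the official WILDS split protocol; the record format, split ratio, and helper name are assumptions.

```python
# Sketch: split image records so train and test share no camera-trap location.
import random

def location_disjoint_split(images, test_frac=0.3, seed=0):
    """Assign whole locations (not individual images) to train or test."""
    locations = sorted({img["location"] for img in images})
    rng = random.Random(seed)
    rng.shuffle(locations)
    n_test = max(1, int(len(locations) * test_frac))
    test_locs = set(locations[:n_test])
    train = [img for img in images if img["location"] not in test_locs]
    test = [img for img in images if img["location"] in test_locs]
    return train, test

# Toy usage with fake records spread over 5 locations:
imgs = [{"id": i, "location": f"loc{i % 5}"} for i in range(20)]
train, test = location_disjoint_split(imgs)

# No location appears on both sides of the split.
assert not ({t["location"] for t in train} & {t["location"] for t in test})
```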