3,275 machine learning datasets
Wikipedia Webpage 2M (WikiWeb2M) is a multimodal open-source dataset consisting of over 2 million English Wikipedia articles. It was created by rescraping the ∼2M English articles in WIT. Each webpage sample includes the page URL and title; section titles, text, and indices; and images with their captions.
Despite recent advances in vision-and-language tasks, most progress is still focused on resource-rich languages such as English. Furthermore, many widely used vision-and-language datasets adopt images representative of American or European cultures, resulting in bias. We therefore introduce ParsVQA-Caps, the first Persian benchmark for Visual Question Answering (VQA) and image captioning. We collect data for each task in two ways: human-based and template-based for VQA, and human-based and web-based for image captioning. The image captioning dataset consists of over 7.5k images and about 9k captions. The VQA dataset consists of almost 11k images and 28.5k question-answer pairs with short and long answers, usable for both classification and generation VQA.
A View From Somewhere (AVFS) is a dataset of 638,180 face similarity judgments over 4,921 faces. Each judgment identifies the odd-one-out (i.e., least similar) face in a triplet of faces and is accompanied by both the identifier and demographic attributes of the annotator who made it.
COVIDx CXR-3 is an open-access benchmark dataset that we generated, comprising 30,882 CXR images across 17,026 patient cases. Images may be added over time to improve the dataset.
HAC is a dataset for learning and benchmarking restoration under arbitrary Hybrid Adverse Conditions. HAC contains 31 scenarios composed of arbitrary combinations of five common weather conditions, with a total of 316K adverse-weather/clean pairs.
The dataset can be used by anyone interested in performing morphological classification of galaxies. It was originally provided by Kaggle user Jay Lin (https://www.kaggle.com/jay1985) four years ago and was used in the conference paper "Morphological Classification of Galaxies Using SpinalNet".
The directory HiCIS contains two datasets for instance segmentation of honeycombs in concrete in COCO format. One dataset originates from images scraped from the internet; the other is provided by Metis Systems AG. The directory HiCC/web contains the dataset built from the internet images, and HiCC/metis contains the dataset built from the images provided by Metis Systems AG as part of the research project Smart Design and Construction (SDaC).
We introduce Slovo, a large-scale video dataset for Russian Sign Language recognition. The Slovo dataset is about 16 GB in size and contains 20,400 RGB videos of 1,000 sign language gestures performed by 194 signers. Each class has 20 samples. The dataset is split into training and test sets by subject (user_id): the training set includes 15,300 videos and the test set includes 5,100 videos. The total video recording time is ~9.2 hours. About 35% of the videos are recorded in HD format, and 65% are in FullHD resolution. The average video length with gesture is 50 frames.
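The subject-wise split described above (no signer appears in both training and test sets) can be sketched as follows; the record layout and field names are assumptions for illustration, not the actual Slovo release format.

```python
import random

def split_by_subject(samples, test_fraction=0.25, seed=0):
    """Split samples into train/test so that no subject (user_id)
    appears in both sets, as in a subject-disjoint protocol."""
    subjects = sorted({s["user_id"] for s in samples})
    rng = random.Random(seed)
    rng.shuffle(subjects)
    n_test = max(1, int(len(subjects) * test_fraction))
    test_ids = set(subjects[:n_test])
    train = [s for s in samples if s["user_id"] not in test_ids]
    test = [s for s in samples if s["user_id"] in test_ids]
    return train, test

# Toy example with hypothetical records: 12 videos from 4 subjects
samples = [{"video": f"v{i}.mp4", "user_id": i % 4} for i in range(12)]
train, test = split_by_subject(samples, test_fraction=0.25)
assert not {s["user_id"] for s in train} & {s["user_id"] for s in test}
```

Splitting by subject rather than by video prevents a model from scoring well simply by memorizing individual signers.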
A dataset of 100K synthetic images of skin lesions, together with ground-truth (GT) segmentations of lesions and healthy skin, GT segmentations of seven body parts (head, torso, hips, legs, feet, arms, and hands), and GT binary masks of non-skin regions in the texture maps of 215 scans from the 3DBodyTex.v1 dataset [2], [3], created using the framework described in [1]. The dataset is primarily intended to enable the development of skin lesion analysis methods. Synthetic image creation consisted of two main steps. First, skin lesions from the Fitzpatrick 17k dataset were blended onto skin regions of high-resolution three-dimensional human scans from the 3DBodyTex dataset [2], [3]. Second, two-dimensional renders of the modified scans were generated.
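The first step above (blending a lesion onto a skin texture) can be sketched as simple per-pixel alpha blending; the arrays, mask, and placement below are illustrative toy data, not the actual blending procedure of [1].

```python
import numpy as np

def blend_lesion(texture, lesion, mask, top, left):
    """Alpha-blend a lesion patch onto a skin texture map.

    texture: (H, W, 3) float array in [0, 1], the skin texture
    lesion:  (h, w, 3) float array in [0, 1], the lesion patch
    mask:    (h, w) float array in [0, 1], per-pixel blend weight
    top, left: where to place the patch on the texture
    """
    out = texture.copy()
    h, w = mask.shape
    region = out[top:top + h, left:left + w]
    alpha = mask[..., None]  # broadcast the weight over RGB channels
    out[top:top + h, left:left + w] = alpha * lesion + (1 - alpha) * region
    return out

# Toy example: blend a dark 4x4 lesion onto a light 8x8 texture at 50% opacity
texture = np.full((8, 8, 3), 0.8)
lesion = np.full((4, 4, 3), 0.2)
mask = np.full((4, 4), 0.5)
result = blend_lesion(texture, lesion, mask, top=2, left=2)
```

A soft (feathered) mask at the lesion boundary would avoid visible seams; the actual framework additionally maps the blended texture back onto the 3D scan before rendering.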
3D confocal stacks with corresponding 2D Light-field microscope images
This dataset contains annual Sentinel-2 MSI composites (wet and dry season) for Kigali for the period 2016-2020. In addition, a metadata file containing population count at the grid level (100 x 100 m) for 2020 and at the census level (administrative units) for 2016 and 2020 is provided. Ancillary data such as the administrative boundaries of Kigali are also available.
Iran's Built Heritage Binary Image Classification Dataset contains approximately 10,500 CHB images gathered from four different sources:
The images come from the CCD camera of a highway measurement vehicle. Cracks and sealed cracks have been labeled. Unlike traditional block annotations, the labels use redundant, dense annotation boxes. Some of the data is manually annotated, while the rest consists of model-generated annotations that have undergone careful manual inspection.
PTCGA200 is a public pathological H&E image dataset of TCGA patches, each covering 200 microns at 512 px.
This dataset is a collection of fluorescent images from mice, created to test an automatic cell counting tool that we developed. It includes 62 images taken from 2 or 3 different fields of view. In brief, the dataset was derived from brain sections of a model of HIV-induced brain injury (HIVgp120tg), which expresses soluble gp120 envelope protein in astrocytes under the control of a modified GFAP promoter. The mice were in a mixed C57BL/6.129/SJL genetic background, and two genotypes of 9-month-old male mice were selected: wild-type controls (Resting, n = 3) and transgenic littermates (HIVgp120tg, Activated, n = 3). No randomization was performed. Among other hallmarks of human HIV neuropathology, HIVgp120tg mice show an increase in microglia numbers, which indicates activation of the cells compared to non-transgenic littermate controls.
ISOD contains 2,000 manually labelled RGB-D images from 20 diverse sites, each featuring over 30 types of small objects randomly placed amidst the items already present in the scenes. These objects, typically ≤3 cm in height, include LEGO blocks, rags, slippers, gloves, shoes, cables, crayons, chalk, glasses, smartphones (and their cases), fake banana peels, fake pet waste, and piles of toilet paper, among others. These items were chosen because they either threaten the safe operation of indoor mobile robots or create messes if run over.
The Belfort dataset includes minutes of the Belfort municipal council drawn up between 1790 and 1946. The documents include deliberations, lists of councillors, convocations, and agendas. The dataset comprises 24,105 text-line images that were automatically detected from the pages. Up to 4 transcriptions are available for each line image: two from humans and two from automatic models.
This is the pre-generated dataset referenced in the GenPlot paper.
The Multi-Person Interaction Motion (MI-Motion) dataset includes skeleton sequences of multiple individuals, collected with motion capture systems and then refined and synthesized using a game engine. The dataset contains 167k frames of interacting people's skeleton poses, categorized into 5 different activity scenes.
DeepGraviLens is a dataset of simulated gravitational lenses consisting of images associated with brightness-variation time series. Both non-transient and transient phenomena (supernova explosions) are simulated.