3,275 machine learning datasets
The ULI-RI dataset is generated using Unreal Engine 4 to simulate various outdoor environments with 115 high-quality 3D human models. For each person identity, we controlled and quantitatively labeled the illumination intensity, viewpoint (model z-rotation angle), and background to create 512 images. In total, there are 115 × 512 = 58,880 images in the ULI-RI dataset.
50K synthetic renders of the human foot, with surface normals, masks and keypoints.
Overview The Surgical Instruments Recognition Dataset is a groundbreaking collection of high-resolution images (1280x960 pixels) specifically designed for the recognition and categorization of surgical instruments. This dataset captures the intricate details and complexity of surgical tools, particularly when arranged in scenarios reminiscent of an operating room.
Intrinsic component extension of MIT Multi-Illumination Dataset proposed in the paper "Intrinsic Image Decomposition via Ordinal Shading", Chris Careaga and Yağız Aksoy, ACM Transactions on Graphics, 2023
The CryoPPP dataset consists of ground truth data for 34 EMPIAR entries and metadata for 335 EMPIAR IDs. The ground truth data comprises 9,893 micrographs (~300 cryo-EM images per EMPIAR ID) with manually curated coordinates of picked protein particles. The metadata covers 1,698,802 high-resolution micrographs deposited in EMPIAR, along with their respective FTP and Globus download paths.
The ITCPR dataset is a comprehensive collection specifically designed for the Zero-Shot Composed Person Retrieval (ZS-CPR) task. It consists of a total of 2,225 annotated triplets, derived from three distinct datasets: Celeb-reID, PRCC, and LAST.
The CLCXray dataset contains 9,565 X-ray images: 4,543 (real data) obtained from a real subway scene and 5,022 (simulated data) scanned from manually designed baggage. The dataset covers 12 categories: 5 types of cutters (blade, dagger, knife, scissors, and Swiss Army knife) and 7 types of liquid containers (can, carton drink, glass bottle, plastic bottle, vacuum cup, spray can, and tin). Annotations are provided in COCO format.
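Since the annotations follow the COCO format, they can be read with a standard COCO-style loader. The sketch below parses a minimal, invented toy annotation file (the file contents and values are hypothetical; only the category names follow the description above) and groups labeled bounding boxes per image:

```python
import json

# A minimal COCO-format annotation file (hypothetical toy example; the
# category names follow the CLCXray description, but all values are invented).
coco = json.loads("""
{
  "images": [{"id": 1, "file_name": "xray_0001.png", "width": 1280, "height": 960}],
  "categories": [{"id": 1, "name": "blade"}, {"id": 2, "name": "cans"}],
  "annotations": [
    {"id": 10, "image_id": 1, "category_id": 1, "bbox": [100, 150, 40, 60]}
  ]
}
""")

# Index category names and group (label, bbox) pairs per image,
# the way a typical COCO loader does. COCO bboxes are [x, y, width, height].
cat_names = {c["id"]: c["name"] for c in coco["categories"]}
boxes_by_image = {}
for ann in coco["annotations"]:
    boxes_by_image.setdefault(ann["image_id"], []).append(
        (cat_names[ann["category_id"]], ann["bbox"])
    )

print(boxes_by_image[1])  # [('blade', [100, 150, 40, 60])]
```

The same pattern works with the real annotation file, or with `pycocotools` for larger-scale use.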
The Algonauts 2023 Challenge focuses on predicting responses in the human brain as participants perceive complex natural visual scenes. Through collaboration with the Natural Scenes Dataset (NSD) team, the Challenge runs on the largest suitable brain dataset available, opening new venues for data-hungry modeling.
A pioneering dataset for vignette removal. Vigset includes 983 pairs of vignetted and vignetting-free high-resolution (5340×3697) real-world images captured under various conditions.
The amount of data accessible through internet search engines can reach hundreds of millions, or even billions, of samples. Such large weakly labeled databases have gained importance in the training of face recognition algorithms. Starting from the publicly available YFCC100M, we propose a weakly labeled subset for multi-label face recognition with self-supervised methods. The subset contains 392K images at 128×128 resolution, obtained by querying YFCC100M for 40 facial attributes. We have made this dataset publicly available.
MapReader in GeoHumanities workshop (SIGSPATIAL 2022): Gold standards and outputs
Automating the creation of catalogues for radio galaxies in next-generation deep surveys necessitates the identification of components within extended sources and their respective infrared hosts. We present RadioGalaxyNET, a multimodal dataset, tailored for machine learning tasks to streamline the automated detection and localization of multi-component extended radio galaxies and their associated infrared hosts. The dataset encompasses 4,155 instances of galaxies across 2,800 images, incorporating both radio and infrared channels. Each instance furnishes details about the extended radio galaxy class, a bounding box covering all components, a pixel-level segmentation mask, and the keypoint position of the corresponding infrared host galaxy. RadioGalaxyNET is the first dataset to include images from the highly sensitive Australian Square Kilometre Array Pathfinder (ASKAP) radio telescope, corresponding infrared images, and instance-level annotations for galaxy detection.
A synthetic dataset comprising three different environments for multi-camera dynamic novel view synthesis for soccer. The dataset is compatible with Nerfstudio and includes data parsers with various settings to reproduce the experiments of our paper "Dynamic NeRFs for Soccer Scenes" and more.
topex-printer is a dataset of 102 machine parts from a label printing machine. Each part is provided in two domains: real photos and CAD-rendered models.
Neural fields (NeFs) have recently emerged as a versatile method for modeling signals of various modalities, including images, shapes, and scenes. Subsequently, many works have explored the use of NeFs as representations for downstream tasks, e.g. classifying an image based on the parameters of a NeF that has been fit to it. However, the impact of the NeF hyperparameters on their quality as downstream representation is scarcely understood and remains largely unexplored. This is partly caused by the large amount of time required to fit datasets of neural fields.
The Generic Object Decoding (GOD) Dataset is a specialized resource developed for fMRI-based decoding. It aggregates fMRI data gathered through the presentation of images from 200 representative object categories, originating from the 2011 fall release of ImageNet. The training session incorporated 1,200 images (8 per category from 150 distinct object categories). In contrast, the test session included 50 images (one from each of 50 object categories). It is noteworthy that the categories in the test session were distinct from those in the training session and were presented in a randomized sequence across runs. fMRI scanning was conducted on five subjects.
ODSI-DB is an image database of oral and dental reflectance spectral images of human test subjects. Image sets of the test subjects contain the front view and the occlusal surfaces of the lower and upper teeth, the oral mucosa, and the face surrounding the mouth. Other features of interest have been imaged on a case-by-case basis. The spectral images in the database have been annotated by dental experts.
The dataset includes polarimetric, RGB and depth automotive (on the road) data.
Recent advances in large language models have led to the development of multimodal LLMs (MLLMs), which take both image data and text as input. Virtually all of these models have been announced within the past year, leading to a significant need for benchmarks evaluating the abilities of these models to reason truthfully and accurately on a diverse set of tasks. When Google announced Gemini (Gemini Team et al., 2023), they showcased its ability to solve rebuses: wordplay puzzles which involve creatively adding and subtracting letters from words derived from text and images. The diversity of rebuses allows for a broad evaluation of multimodal reasoning capabilities, including image recognition, multi-step reasoning, and understanding the human creator's intent. We present REBUS: a collection of 333 hand-crafted rebuses spanning 13 diverse categories, including hand-drawn and digital images created by nine contributors. Samples are presented in Table 1. Notably, GPT-4V, the most powe