Datasets

3,275 machine learning datasets

Holopix50k

An in-the-wild stereo image dataset, comprising 49,368 image pairs contributed by users of the Holopix mobile social platform.

12 papers | 0 benchmarks | Images, Stereo

INTEL-TAU

A large dataset for illumination estimation. The dataset, called INTEL-TAU, contains 7,022 images in total, making it the largest available high-resolution dataset for illumination estimation research.

12 papers | 0 benchmarks | Images

IRS (Indoor Robotics Stereo)

IRS is an open dataset for indoor robotics vision tasks, especially disparity and surface normal estimation. It contains 103,316 samples in total, covering a wide range of indoor scenes such as homes, offices, stores and restaurants.

12 papers | 0 benchmarks | Images

MoVi (Large Multipurpose Motion and Video Dataset)

Contains 60 female and 30 male actors performing a collection of 20 predefined everyday actions and sports movements, and one self-chosen movement.

12 papers | 5 benchmarks | Images

RICE (Remote sensing Image Cloud rEmoving)

RICE is a remote sensing image dataset for cloud removal. It consists of two parts: RICE1 contains 500 pairs of images, each pair consisting of a cloudy image and a cloud-free image of size 512×512; RICE2 contains 450 sets of images, each set containing three 512×512 images: the cloud-free reference image, the cloudy image, and the corresponding cloud mask.

12 papers | 0 benchmarks | Images

TJU-DHD

TJU-DHD is a high-resolution dataset for object detection and pedestrian detection. The dataset contains 115,354 high-resolution images (52% of the images have a resolution of 1,624×1,200 pixels and 48% have a resolution of at least 2,560×1,440 pixels) and 709,330 labelled objects in total, with a large variance in scale and appearance.

12 papers | 0 benchmarks | Images

VizWiz-Captions

Consists of over 39,000 images originating from people who are blind, each paired with five captions.

12 papers | 0 benchmarks | Images, Texts

MuST-Cinema

MuST-Cinema is a Multilingual Speech-to-Subtitles corpus ideal for building subtitle-oriented machine and speech translation systems. It comprises audio recordings from English TED Talks, which are automatically aligned at the sentence level with their manual transcriptions and translations.

12 papers | 0 benchmarks | Images

Aesthetic Visual Analysis

Aesthetic Visual Analysis is a dataset for aesthetic image assessment that contains over 250,000 images along with a rich variety of meta-data including a large number of aesthetic scores for each image, semantic labels for over 60 categories as well as labels related to photographic style.

12 papers | 2 benchmarks | Images

ADAM (Adam: automatic detection challenge on age-related macular degeneration)

ADAM is organized as a half-day challenge, a satellite event of the ISBI 2020 conference in Iowa City, Iowa, USA.

12 papers | 1 benchmark | Images, Medical

Hyper-Kvasir Dataset

The HyperKvasir dataset contains 110,079 images and 374 videos capturing anatomical landmarks as well as pathological and normal findings, for a total of around 1 million images and video frames.

12 papers | 3 benchmarks | Biomedical, Images, Medical, Videos

Semi-iNat (Semi-Supervised iNaturalist)

Semi-iNat is a challenging dataset for semi-supervised classification with a long-tailed distribution of classes, fine-grained categories, and domain shifts between labeled and unlabeled data. The data is obtained from iNaturalist, a community-driven project aimed at collecting observations of biodiversity.

12 papers | 0 benchmarks | Images

SODA10M

SODA10M is a large-scale object detection benchmark for standardizing the evaluation of different self-supervised and semi-supervised approaches by learning from raw data. SODA10M contains 10 million unlabeled images and 20K images labeled with 6 representative object categories. To improve diversity, the images are collected at one frame every ten seconds across 32 different cities under different weather conditions, periods and location scenes.

12 papers | 0 benchmarks | Images

OPA (Object Placement Assessment)

Object-Placement-Assessment (OPA) is a task consisting of verifying whether a composite image is plausible in terms of object placement. The foreground object should be placed at a reasonable location on the background, considering location, size, occlusion, semantics, etc.

12 papers | 0 benchmarks | Images

SpaceNet 7 (Multi-Temporal Urban Development SpaceNet Dataset)

Satellite imagery analytics have numerous human development and disaster response applications, particularly when time series methods are involved. For example, quantifying population statistics is fundamental to 67 of the 232 United Nations Sustainable Development Goals, but the World Bank estimates that more than 100 countries currently lack effective Civil Registration systems. The SpaceNet 7 Multi-Temporal Urban Development Challenge aims to help address this deficit and develop novel computer vision methods for non-video time series data. In this challenge, participants will identify and track buildings in satellite imagery time series collected over rapidly urbanizing areas. The competition centers around a new open source dataset of Planet satellite imagery mosaics, which includes 24 images (one per month) covering ~100 unique geographies. The dataset will comprise over 40,000 square kilometers of imagery and exhaustive polygon labels of building footprints in the imagery, total

12 papers | 0 benchmarks | Images

Causal3DIdent

Update on 3DIdent, where we introduce six additional object classes (Hare, Dragon, Cow, Armadillo, Horse, and Head), and impose a causal graph over the latent variables. For further details, see Appendix B in the associated paper (https://arxiv.org/abs/2106.04619).

12 papers | 1 benchmark | Images

RENOIR

A dataset of color images corrupted by natural noise due to low-light conditions, together with spatially and intensity-aligned low noise images of the same scenes.

12 papers | 2 benchmarks | Images

ONCE-3DLanes (Monocular 3D Lane Detection Dataset)

ONCE-3DLanes is a real-world autonomous driving dataset with lane layout annotation in 3D space. A dataset annotation pipeline is designed to automatically generate high-quality 3D lane locations from 2D lane annotations by exploiting the explicit relationship between point clouds and image pixels in 211,000 road scenes.

12 papers | 0 benchmarks | Images

Bongard-HOI

Bongard-HOI tests the extent to which a few-shot visual learner can quickly induce the true human-object interaction (HOI) concept from a handful of images and reason with it. The learner is also expected to transfer the learned few-shot skills to novel HOI concepts compositionally.

12 papers | 2 benchmarks | Images, Texts

RECON (RECON Outdoor Navigation Dataset)

https://sites.google.com/view/recon-robot/dataset

12 papers | 0 benchmarks | Images, RGB Video, Stereo
