Datasets

3,275 machine learning datasets

MIT Traffic

MIT Traffic is a dataset for research on activity analysis and crowded scenes. It includes a 90-minute traffic video sequence recorded by a stationary camera. The scene size is 720 by 480 pixels, and the video is divided into 20 clips.

6 papers | 0 benchmarks | Images

UCO-LAEO

A dataset for building models that detect people Looking At Each Other (LAEO) in video sequences.

6 papers | 0 benchmarks | Images

UIT-ViIC

UIT-ViIC contains manually written captions for images from the Microsoft COCO dataset relating to sports played with a ball. UIT-ViIC consists of 19,250 Vietnamese captions for 3,850 images.

6 papers | 0 benchmarks | Images

MLe2e

MLe2e is a dataset for the evaluation of scene text end-to-end reading systems and all intermediate stages, such as text detection, script identification, and text recognition. The dataset contains a total of 711 scene images covering four different scripts (Latin, Chinese, Kannada, and Hangul).

6 papers | 0 benchmarks | Images

2D HeLa

2D HeLa is a dataset of fluorescence microscopy images of HeLa cells stained with various organelle-specific fluorescent dyes. The images cover 10 organelles: DNA (Nuclei), ER (Endoplasmic reticulum), Giantin (cis/medial Golgi), GPP130 (cis Golgi), Lamp2 (Lysosomes), Mitochondria, Nucleolin (Nucleoli), Actin, TfR (Endosomes), and Tubulin. The purpose of the dataset is to train a computer program to automatically identify sub-cellular organelles.

6 papers | 0 benchmarks | Biology, Images

LLAMAS (Labeled Lane Markers)

The Unsupervised Labeled Lane MArkerS (LLAMAS) dataset is a dataset for lane detection and segmentation. It contains over 100,000 annotated images, with annotations of over 100 meters at a resolution of 1276 x 717 pixels. The dataset was annotated by creating high-definition maps for automated driving, including lane markers, based on LiDAR.

6 papers | 4 benchmarks | Images

VIsual PERception (VIPER)

VIPER is a benchmark suite for visual perception. The benchmark is based on more than 250K high-resolution video frames, all annotated with ground-truth data for both low-level and high-level vision tasks, including optical flow, semantic instance segmentation, object detection and tracking, object-level 3D scene layout, and visual odometry. Ground-truth data for all tasks is available for every frame. The data was collected while driving, riding, and walking a total of 184 kilometers in diverse ambient conditions in a realistic virtual world.

6 papers | 0 benchmarks | 3D, Images, Videos

DAGM2007

This is a synthetic dataset for defect detection on textured surfaces. It was originally created for a competition at the 2007 symposium of the DAGM (Deutsche Arbeitsgemeinschaft für Mustererkennung e.V., the German chapter of the International Association for Pattern Recognition). The competition was hosted together with the GNNS (German Chapter of the European Neural Network Society).

6 papers | 2 benchmarks | Images

PieAPP dataset

The PieAPP dataset is a large-scale dataset used for training and testing perceptually consistent image-error prediction algorithms. The dataset can be downloaded either from a server hosting a zip file with all data (2.2 GB) or from Google Drive (ideal for quick browsing).

6 papers | 0 benchmarks | Images

AcinoSet

AcinoSet is a dataset of free-running cheetahs in the wild that contains 119,490 frames of multi-view synchronized high-speed video footage, camera calibration files and 7,588 human-annotated frames. The authors utilized markerless animal pose estimation with DeepLabCut to provide 2D keypoints (in the 119K frames). It also includes 3D trajectories, human-checked 3D ground truth, and an interactive tool to inspect the data.

6 papers | 0 benchmarks | Images

IIIT-ILST

IIIT-ILST is a dataset and benchmark for scene text recognition in three Indic scripts: Devanagari, Telugu, and Malayalam. IIIT-ILST contains nearly 1,000 real images per script, annotated with scene text bounding boxes and transcriptions.

6 papers | 0 benchmarks | Images, Texts

OmniFlow

OmniFlow is a synthetic omnidirectional human optical flow dataset. Using a rendering engine, the authors created a naturalistic 3D indoor environment with textured rooms, characters, actions, objects, illumination, and motion blur, in which all components are shuffled during data capture. The simulation outputs rendered images of household activities together with the corresponding forward and backward optical flow. The dataset consists of 23,653 image pairs with their forward and backward optical flow.

6 papers | 0 benchmarks | Images

VMRD (Visual Manipulation Relationship Dataset)

VMRD is a multi-object grasp dataset. It was collected and labeled using hundreds of objects from 31 categories. In total, there are 5,185 images containing 17,688 object instances and 51,530 manipulation relationships.

6 papers | 0 benchmarks | Images

BAAI-VANJEE

BAAI-VANJEE is a dataset for benchmarking and training various computer vision tasks such as 2D/3D object detection and multi-sensor fusion. The BAAI-VANJEE roadside dataset consists of LiDAR data and RGB images collected by a VANJEE smart base station mounted about 4.5 m high on the roadside. The dataset contains 2,500 frames of LiDAR data and 5,000 frames of RGB images, including 20% collected at the same time. It also contains 12 classes of objects, 74K 3D object annotations, and 105K 2D object annotations.

6 papers | 0 benchmarks | Images, LiDAR

ZS-F-VQA

The ZS-F-VQA dataset is a new split of the F-VQA dataset for the zero-shot problem. First, the authors combine the original train/test splits of the F-VQA dataset and extract the triples whose answers are among the 500 most frequent. Next, they randomly divide this set of answers into a new training split (a.k.a. seen) $\mathcal{A}_s$ and a testing split (a.k.a. unseen) $\mathcal{A}_u$ at a ratio of 1:1. Following the standard F-VQA dataset, the division process is repeated 5 times. Each $(i,q,a)$ triplet in the original F-VQA dataset is assigned to the training set if $a \in \mathcal{A}_s$, and to the testing set otherwise. The overlap of answer instances between the training and testing sets is $2565$ in F-VQA, compared to $0$ in ZS-F-VQA.
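The split procedure described above can be illustrated with a minimal Python sketch. This is not the authors' code: the function name `zero_shot_split`, the tuple layout `(image, question, answer)`, the seeding scheme, and the toy data are all assumptions made for illustration. The sketch also assumes that triples whose answers fall outside the most frequent set are dropped before the split.

```python
import random
from collections import Counter


def zero_shot_split(triples, top_k=500, seed=0):
    """Split (image, question, answer) triples so train and test share no answers.

    Hypothetical helper for illustration; not taken from the ZS-F-VQA release.
    """
    # Count answer frequency over the combined train/test data.
    counts = Counter(answer for _, _, answer in triples)
    top_answers = [answer for answer, _ in counts.most_common(top_k)]

    # Randomly divide the retained answers 1:1 into seen and unseen sets.
    rng = random.Random(seed)
    rng.shuffle(top_answers)
    half = len(top_answers) // 2
    seen, unseen = set(top_answers[:half]), set(top_answers[half:])

    # Assumption: triples with answers outside the top-k set are discarded.
    train = [t for t in triples if t[2] in seen]
    test = [t for t in triples if t[2] in unseen]
    return train, test, seen, unseen


# Toy data (made up); repeat the division 5 times with different seeds.
toy = [
    ("img1", "q1", "cat"),
    ("img2", "q2", "dog"),
    ("img3", "q3", "cat"),
    ("img4", "q4", "ball"),
]
splits = [zero_shot_split(toy, top_k=2, seed=s) for s in range(5)]
```

Because the split is made over answers rather than over triples, no answer can appear in both the training and testing sets, which is what drives the answer overlap to zero.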

6 papers | 1 benchmark | Graphs, Images, Texts

GD-VCR

Geo-Diverse Visual Commonsense Reasoning (GD-VCR) is a new dataset to test vision-and-language models' ability to understand cultural and geo-location-specific commonsense.

6 papers | 2 benchmarks | Images, Texts

CAMELS Multifield Dataset

The CAMELS Multifield Dataset (CMD) is a publicly available collection of hundreds of thousands of 2D maps and 3D grids containing different properties of the gas, dark matter, and stars from more than 2,000 different universes. The data was generated from thousands of state-of-the-art (magneto-)hydrodynamic and gravity-only N-body simulations from the CAMELS project.

6 papers | 0 benchmarks | 3D, Images, Physics

Galaxy Zoo DECaLS

Approximately 300,000 images of galaxies, labelled by shape.

6 papers | 0 benchmarks | Images

Modern Office-31

Modern Office-31 is a refurbished version of the commonly used Office-31 dataset. It rectifies many of the annotation errors and low-quality images in the Amazon domain of the original Office-31 dataset. Additionally, it adds a synthetic domain based on the Adaptiope dataset.

6 papers | 0 benchmarks | Images

CeyMo

CeyMo is a novel benchmark dataset for road marking detection that covers a wide variety of challenging urban, suburban, and rural road scenarios. The dataset consists of a total of 2,887 images at 1920 × 1080 resolution, with 4,706 road marking instances belonging to 11 classes. The test set is divided into six categories: normal, crowded, dazzle light, night, rain, and shadow.

6 papers | 1 benchmark | Images
